QS Deduplication Job taking 2hrs for 34000 records

Infosphere's Quality Product


saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Hi all,
I have a job that uses the QS plugin for DS to run a QS job within DS.
It's an Undup job with 1 input file and 2 output files, and it has 6 passes. Strangely, it never took this long in our Quality and Dev environments, but in Production it takes 2 hrs to deduplicate 34000 records.
How can I analyse where the problem lies, since QS doesn't provide any monitoring system?
Please give your valuable suggestions on this topic.

Thanks,
SM
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Your QualityStage text log file will tell you a lot: it has an entry for each step with a timestamp, so you can roughly tell how long each step is taking (I hope!). Strange that it is slower in prod. Are any traces turned on?
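
As a rough illustration of reading step timings out of a timestamped log, here is a minimal Python sketch. The log layout is an assumption: it supposes each entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp followed by the step text, which may not match the real QualityStage log, so `TIMESTAMP_FORMAT` and `TIMESTAMP_LENGTH` (both hypothetical names) would need adjusting.

```python
from datetime import datetime

# Assumed layout: "2005-06-01 10:00:00 <step text>". Adjust to the real log.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
TIMESTAMP_LENGTH = 19


def step_durations(log_lines):
    """Return (step_text, seconds_until_next_entry) for each timestamped line."""
    entries = []
    for line in log_lines:
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        entries.append((stamp, line[TIMESTAMP_LENGTH:].strip()))
    # Pair each entry with the one after it; the last entry has no duration.
    return [
        (text, (next_stamp - stamp).total_seconds())
        for (stamp, text), (next_stamp, _) in zip(entries, entries[1:])
    ]
```

Sorting the result by duration shows which step dominates the run.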
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you really want to nail it, you could modify the scripts that QualityStage runs and add timing points to them. This would help you identify the hot spots. How many rows were processed in the development environment? What other activities were occurring at the time on the production machine?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Hi,
The Production server runs only DS and QS. In the Dev and Quality environments we tested with the same number of records, and the job went through in 2 min.
The input link of the plugin processes 4 rows per sec for the 34000 rows. Why?
Is it a DS problem or a QS problem? There are no traces on the P server.
Please advise.
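
A quick back-of-the-envelope check, sketched in Python, suggests that the 4 rows/sec input link alone would account for the whole runtime. The function names are made up for illustration; the figures are the ones quoted in this thread.

```python
def estimated_runtime_seconds(row_count, rows_per_second):
    """Time needed to push row_count rows through a link at a steady rate."""
    return row_count / rows_per_second


def implied_rows_per_second(row_count, elapsed_seconds):
    """Average rate implied by processing row_count rows in elapsed_seconds."""
    return row_count / elapsed_seconds


def format_hms(seconds):
    """Format a duration in seconds as 'Hh MMm SSs'."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Production: 34000 rows at 4 rows/sec.
print(format_hms(estimated_runtime_seconds(34000, 4)))  # 2h 21m 40s
# Dev/Quality: 34000 rows in 2 minutes.
print(round(implied_rows_per_second(34000, 120)))  # 283
```

Since 34000 rows at 4 rows/sec comes to roughly the observed 2 hrs, the delay appears to sit in whatever feeds or drains that link rather than in a separate slow step.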

Thanks,
SM
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What's different between the two systems? Be as detailed as possible.
There are no traces on either system - you have to add them yourself.
saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Hi Ray,
Do you want the server config information for the 2 servers (Q and P)? Actually, I have one more QS job attached as an After Job Subroutine with ExecDOS. This is a match job, and it executes within seconds for the same number of records. What do you foresee?

Regards,
SM
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You must be the detective - I'm not there. Take the QS pieces out of the DS job and run some timings. Run the QS jobs separately and get more timings. I doubt that qsrtmngr is likely to be the bottleneck, but you could trace that also. Monitor the system while all these things are happening, particularly %Idle for CPU, PF/S for memory. Monitor both systems, in case this shows up any discrepancies.
saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Hi Ray,
I ran the QS job independently through QS Designer, and it executes within a few seconds for 34000 recs. So I assume that QS itself is not the problem.
Ascential Support replied: "Then it has to be the server differences with the id's being used. When certain id's are used on the particular servers the user id invokes default settings. Swap space, etc.."

I am confused, and the problem still persists.
Please advise.

Thanks,
SM
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This is where you have to begin being a real detective. Maybe a good first step is to use a before/after subroutine to report who you are, your environment, and so on. If you have MKS Toolkit you can use UNIX commands like id and env to report these factors. Look for differences.
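
If neither `id` nor `env` is to hand, the same "who am I and what is my environment" report can be sketched in Python. This is only an illustration under assumptions: the function names are hypothetical, the script would have to be launched by the same account that runs the job (for example from a before/after subroutine), and the user name is read from the `USERNAME` or `USER` environment variable.

```python
import json
import os
import platform


def snapshot_environment():
    """Collect the user, host name, and environment variables of this process."""
    user = os.environ.get("USERNAME") or os.environ.get("USER") or "unknown"
    return {"user": user, "host": platform.node(), "env": dict(os.environ)}


def save_snapshot(snapshot, path):
    """Write a snapshot to a JSON file so it can be copied to the other server."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot, handle, indent=2, sort_keys=True)


def diff_snapshots(first, second):
    """Return {variable: (value_in_first, value_in_second)} where they differ."""
    names = set(first["env"]) | set(second["env"])
    return {
        name: (first["env"].get(name), second["env"].get(name))
        for name in sorted(names)
        if first["env"].get(name) != second["env"].get(name)
    }
```

Run it once on each server, then load both JSON files and pass them to `diff_snapshots` to list only the settings that differ.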

Similarly, look for free disk space differences. QualityStage is particularly hungry for disk resources. Get your administrator to monitor disk I/O (you can do this with Task Manager: on the Processes tab choose View > Select Columns..., or use Performance Monitor software). Again, you're looking for differences between the two systems while the two jobs are running. And take snapshots of disk free space while the jobs are running on each system: use the $DSHOME\bin\avail command for this or, if you have MKS Toolkit, use the df -kPt command.
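
Where neither `avail` nor `df` is convenient, free-space snapshots can also be taken with a few lines of Python. This is a minimal sketch, not a replacement for proper monitoring; the function names and the sampling interval are illustrative choices.

```python
import shutil
import time
from datetime import datetime

MEGABYTE = 1024 * 1024


def free_space_snapshot(path):
    """Return a timestamped reading of free and total space for path's volume."""
    usage = shutil.disk_usage(path)
    return {
        "time": datetime.now().isoformat(timespec="seconds"),
        "path": path,
        "free_mb": usage.free // MEGABYTE,
        "total_mb": usage.total // MEGABYTE,
    }


def sample_free_space(path, samples, interval_seconds):
    """Take several snapshots, pausing interval_seconds between readings."""
    readings = []
    for index in range(samples):
        readings.append(free_space_snapshot(path))
        if index < samples - 1:
            time.sleep(interval_seconds)
    return readings
```

For example, `sample_free_space("D:\\", 60, 10)` would take a reading every 10 seconds for about 10 minutes; the drive letter here is a placeholder for whichever volume holds the QualityStage working files.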
saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Hi Ray,
Just now I tried running the QS job from a DS job as a Before Job subroutine instead of using a DS job with the QS plugin. It runs in 1 min.
Also, my server runs Windows Server, so I cannot use UNIX commands.
I also checked the server performance. It is as good as we want, since no other application runs on the system. If the server is the problem, then why does it affect only this job, which processes 4 rows per sec, when other jobs process some 200 rows/sec?

Thx,
SM
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You CAN run the UNIX commands. Get MKS Toolkit or CygWin. You get MKS Toolkit for free with DataStage EE 7.5x2.
wendor
Participant
Posts: 1
Joined: Fri Oct 01, 2004 11:02 am

Post by wendor »

ray.wurlod wrote: You CAN run the UNIX commands. Get MKS Toolkit or CygWin. You get MKS Toolkit for free with DataStage EE 7.5x2.
Actually, if he's running QualityStage 7.x on a Windows server, then MKS Toolkit is already installed as part of QualityStage.
saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Hi Ray and all,
I have finally realised that beating around the bush does not help. Lack of documentation and experience makes you do this.
The issue is resolved.
Solution:
When I transported the DS job that uses the QS plugin, the host system within the QS plugin was still pointing to the old D machine instead of P. We need to change the host system in the QS plugin manually after transport to P or to any new machine. I had assumed that the QS plugin would take this information from the run profile, which I had configured properly. The job was running at 4 rows/sec but did not abort, I suppose because I had shared the D machine on my P environment; I am not sure. Now, after changing the host IP in the QS plugin, the job runs at 1000 rows/sec.
I hope this info helps everyone who is new to using the QS plugin.
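
One way to catch this kind of leftover setting after a transport is to search the exported job design for the old machine's name or IP. The Python sketch below assumes the job has been exported to a plain-text file and that the host appears in it literally; the file name, host name, and address in the usage note are placeholders, not values from this system.

```python
def find_host_references(export_lines, stale_hosts):
    """Return (line_number, text) for every line that mentions a stale host."""
    lowered = [host.lower() for host in stale_hosts]
    matches = []
    for number, line in enumerate(export_lines, start=1):
        text = line.lower()
        if any(host in text for host in lowered):
            matches.append((number, line.rstrip("\n")))
    return matches


def scan_export_file(path, stale_hosts):
    """Read an exported job file and list the lines that mention a stale host."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_host_references(handle, stale_hosts)
```

For instance, `scan_export_file("my_job_export.dsx", ["devserver01", "10.0.0.5"])` would list every line that still names the development machine, which can then be corrected in the plugin properties before the job runs in Production.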

Thanks,
SM
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Well done! I'm sure we would have gotten there, systematically, eventually. Sounds like a good opportunity to use a job parameter!