improving performance

dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

Hi,

I have several DataStage jobs that update multiple dimension and fact tables in a data warehousing application.

Most of the jobs take around 3 to 4 minutes to complete, and 2 or 3 jobs take around 40 minutes each. Altogether the run takes around 4 to 5 hours. We have an SLA for these jobs to complete within 6 hours. On most days they finish on time, but on 2 or 3 days a month we miss the SLA.

Also, the job sometimes aborts because of high CPU utilization, as the same Unix server is shared by multiple applications.

On days when the run takes more than 6 hours, I am not able to find the reason. The number of records seems to be almost the same, and the log doesn't show anything different.

I suspect two things:

1. Higher CPU utilization. I am not sure whether this affects the speed of the job; does it?

2. Higher DB usage. I am checking with the DBA, who hasn't responded yet.

What other parameters do we need to look at?
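
One way to narrow this down is to compare each job's elapsed time on a slow day against its own typical time, so you can see whether the whole run slowed down or only one or two jobs did. Below is a minimal Python sketch of that idea. The job names, day labels, and timings are invented for illustration, and it assumes you have already collected per-job elapsed times from the job logs by some means of your own; the 1.5x threshold is also just an assumption.

```python
from statistics import median

# Hypothetical run history: job name -> list of (run_label, elapsed_minutes).
# Every name and number here is made up for illustration only.
history = {
    "load_dim_customer": [("day-01", 3.5), ("day-02", 3.8), ("day-03", 3.6), ("day-04", 3.7)],
    "load_fact_sales": [("day-01", 41.0), ("day-02", 39.5), ("day-03", 97.0), ("day-04", 40.2)],
}


def slow_runs(history, factor=1.5):
    """Return (job, run_label, elapsed, typical) for every run that took
    longer than `factor` times that job's median elapsed time."""
    flagged = []
    for job, runs in history.items():
        typical = median(elapsed for _, elapsed in runs)
        for run_label, elapsed in runs:
            if elapsed > factor * typical:
                flagged.append((job, run_label, elapsed, typical))
    return flagged


for job, run_label, elapsed, typical in slow_runs(history):
    print(f"{job} on {run_label}: {elapsed:.1f} min (typical {typical:.1f} min)")
```

If only a few jobs are flagged on the bad days, look at what those jobs touch (tables, disks, other applications active at that time); if everything is flagged, the cause is more likely server-wide load.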
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Other things happening on the machine at the same time are an often overlooked factor.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
greggknight
Premium Member
Posts: 120
Joined: Thu Oct 28, 2004 4:24 pm

Post by greggknight »

Yes, if your CPU is saturated you will see degradation.

Just a thought.
I did a little testing once.
I have a 4-core machine.

So I set up four config.apt files:
one node
two nodes
three nodes
four nodes.

I then ran a batch with some jobs in it on one node and got my time.
I then ran on two nodes; the time was reduced by almost 50%.
I ran on three nodes; it was not three times faster than one node. It was actually only slightly faster than two nodes, by a minute or so.
I then ran on four nodes and performance actually went the other way.
Reason: the more nodes you have, the more processes are spawned, and the more processes, the more CPU consumption. You have to find the balance. Your configuration is a good starting place, as well as what else is running. If other processes are running and your process maxes out the CPU by itself, then everything will slow down.
Just a thought; I don't know your config, but those were my test results. I use 50%: for 4 cores I use 2 nodes, with 2 controllers for my resource disks and a third for my scratch.
Just some thoughts.
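
To put numbers on that kind of test, you can work out speedup and per-node efficiency from the elapsed time at each node count. This is only a rough Python sketch: the timings are hypothetical values chosen to follow the shape described above (two nodes about half the time, three slightly better, four worse), not actual measurements, and the 5% tolerance is an assumption.

```python
# Hypothetical elapsed minutes for the same batch at each node count.
# These are NOT measured values; they only mimic the pattern described.
timings = {1: 60.0, 2: 31.0, 3: 30.0, 4: 34.0}


def scaling_report(timings):
    """Map node count -> (speedup vs. one node, efficiency per node)."""
    base = timings[1]
    report = {}
    for nodes in sorted(timings):
        speedup = base / timings[nodes]
        report[nodes] = (speedup, speedup / nodes)
    return report


def smallest_good_enough(timings, tolerance=0.05):
    """Smallest node count whose time is within `tolerance` of the best time.

    Preferring fewer nodes leaves CPU headroom for whatever else is
    running on the same server.
    """
    best = min(timings.values())
    for nodes in sorted(timings):
        if timings[nodes] <= best * (1 + tolerance):
            return nodes


for nodes, (speedup, efficiency) in scaling_report(timings).items():
    print(f"{nodes} node(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
print("suggested node count:", smallest_good_enough(timings))
```

With numbers shaped like these, three nodes is marginally fastest, but two nodes gets nearly the same time with fewer processes, which is the kind of balance described above.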
You could also look at the designs of the slower jobs; there may be changes that could be made there as well.

Bottom line: you just need to analyze the whole process.
"Don't let the bull between you and the fence"

Thanks
Gregg J Knight

"Never Never Never Quit"
Winston Churchill