Transaction Size vs Array Size

gayathrisivakumar
Premium Member
Posts: 60
Joined: Thu Dec 17, 2009 1:36 am

Transaction Size vs Array Size

Post by gayathrisivakumar »

Hi,

I would like to know the best values to use for Transaction Size and Array Size for a DataStage update.

I am updating 10 million rows based on one key column. The input comes from a sequential file and the target is a DB2 Connector stage.

Also, does the option 'Keep conductor connections alive' have any impact on the performance of the job?

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Short answer is - there are no "best values".
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And only because you have "vs" in your subject... are you clear on what the two settings control?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What does "best" mean in your context?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Use DB2 Connector partitioning in the target stage for better performance.
For Transaction Size and Array Size, pick moderate values, neither very large nor very small.
As everyone has said, there is no single best value, so you have to find the best combination yourself.
It depends on various factors such as database load, DataStage server load, the number of jobs running, the complexity of the job, and so on.

The values should not be very large (say 1-2 lakhs, i.e. 100,000-200,000), because if the job fails, all of the uncommitted data will be rolled back.
And if they are too low (around 100-200), the frequent commits can slow down the database.
So try combinations like 5000/10000 or 10000/20000 for array size/transaction size, perhaps with a harness like the sketch below.
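A minimal timing harness for comparing combinations, assuming a Python DB-API style target (sqlite3 stands in for DB2 purely so the sketch runs anywhere; the table and row shapes are made up):

import sqlite3
import time

def load(rows, array_size, txn_size):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
    pending = 0
    for i in range(0, len(rows), array_size):
        batch = rows[i:i + array_size]           # one "array" per round trip
        con.executemany("INSERT INTO t VALUES (?, ?)", batch)
        pending += len(batch)
        if pending >= txn_size:
            con.commit()                         # one transaction boundary
            pending = 0
    con.commit()                                 # flush the tail
    con.close()

rows = [(i, "x" * 100) for i in range(100_000)]
for array_size, txn_size in [(5_000, 10_000), (10_000, 20_000)]:
    start = time.perf_counter()
    load(rows, array_size, txn_size)
    print(array_size, txn_size, round(time.perf_counter() - start, 2), "seconds")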
Hope it helps
Thanx and Regards,
ETL User
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I disagree with just setting the two values without actually knowing what they entail; the recommendation above will only work for a certain combination of record size, database connection, and database and system sizing.

The "Transaction Size" is the number of records after which a commit will be done. Partly the setting is dependent upon what the goal of a job is. If the volumes are low and your job does an all-or-nothing load, then setting the transaction size to 0 would make sense. If the volume is high and/or the record sizes very large, then doing a single commit might result in the transaction log filling up. Thus you need to know your (a) goals (b) sizing (c) logging policy and use those to decide how big your transactions size should be. Generally the larger this value is, the faster your database load will run.

The "Array Size" determines how many records will be written in one "block" sent to the database. This can be both the network transport layer block size and also the database block size. I recall in the early days of DataStage this setting really made a big difference, but recently I seem to see little difference on this setting. The goal is to fit as many records as possible into a block as possible with little wastage.

If we take a 4KB block and 1KB records, an array size of 1 would leave the transport block only 25% utilized; an array size of 4 would be optimal in this case.
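The same arithmetic, worked out in a short script:

block_size = 4096    # 4 KB transport block
record_size = 1024   # 1 KB records

for array_size in (1, 4):
    used = array_size * record_size
    print(f"array size {array_size}: {used / block_size:.0%} of the block used")
# array size 1: 25% of the block used
# array size 4: 100% of the block used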