Hi,
I would like to know the best values to use for Transaction Size and Array Size for a DataStage update.
I am updating 10 million rows based on one key column. The input comes from a Sequential File stage and the target is a DB2 Connector.
Also, does the option 'Keep Conductor Connections Alive' have any impact on the performance of the job?
Thanks
Transaction Size vs Array Size
Last edited by gayathrisivakumar on Thu Oct 03, 2013 11:58 am, edited 1 time in total.
Use DB2 Connector partitioning in the target stage for better performance.
As for Transaction Size and Array Size, give them sensible values, neither too big nor too small. You will have to determine the best combination yourself because, as everyone has said, there is no single best value. It depends on various factors such as database load, DataStage server load, the number of jobs running, the complexity of the job, etc.
The values should not be too big (around 1-2 lakhs, i.e. 100,000-200,000 rows) because if the job fails, all the data that has not yet been committed will be rolled back. And if the value is too low (around 100-200), the frequent commits can slow down the database.
So try combinations like 5,000/10,000 or 10,000/20,000 for array size/transaction size.
Hope it helps
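Since DataStage jobs aren't scriptable in a post, here is an illustration-only sketch of what these two knobs do, using Python's sqlite3 as a stand-in database (the in-memory database, table name `t`, and the `load` helper are my own assumptions, not DataStage internals): the batch passed to `executemany` plays the role of Array Size, and the commit interval plays the role of Transaction Size.

```python
import sqlite3

def load(rows, array_size, txn_size):
    """Insert `rows` in executemany batches of `array_size` (the Array
    Size analogue), committing every `txn_size` rows (the Transaction
    Size analogue; 0 means a single commit at the end)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
    since_commit = 0
    for i in range(0, len(rows), array_size):
        batch = rows[i:i + array_size]
        conn.executemany("INSERT INTO t VALUES (?, ?)", batch)
        since_commit += len(batch)
        if txn_size and since_commit >= txn_size:
            conn.commit()            # fewer, larger commits = less overhead,
            since_commit = 0         # but more rows lost on a failure
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return count

# Try the combinations suggested above on a small sample.
rows = [(i, "x") for i in range(100_000)]
for array_size, txn_size in [(5_000, 10_000), (10_000, 20_000)]:
    print(array_size, txn_size, load(rows, array_size, txn_size))
```

Timing the `load` call for each combination on your own volumes is the simplest way to pick values, since the trade-off (commit overhead vs. rollback exposure) is exactly the one described above.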
Last edited by chandra.shekhar@tcs.com on Mon Oct 07, 2013 2:47 am, edited 1 time in total.
Thanks and Regards,
ETL User
I disagree with just setting the two values without knowing what they entail; the recommendation above will only work for a particular combination of record size, database connection, database, and system sizing.
The "Transaction Size" is the number of records after which a commit is done. The right setting depends partly on the goal of the job. If the volumes are low and your job does an all-or-nothing load, then setting the transaction size to 0 (a single commit at the end) makes sense. If the volume is high and/or the records are very large, a single commit might fill up the transaction log. So you need to know your (a) goals, (b) sizing, and (c) logging policy, and use those to decide how big your transaction size should be. Generally, the larger this value, the faster your database load will run.
The "Array Size" determines how many records are written in one "block" sent to the database. This can be both the network transport layer block size and the database block size. I recall that in the early days of DataStage this setting made a big difference, but recently I seem to see little difference from it. The goal is to fit as many records as possible into each block, with as little wasted space as possible.
If we take a 4KB block and 1KB records, an array size of 1 would leave the transport block only 25% utilized. An array size of 4 would be optimal in this case.
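The block-utilization arithmetic can be written out as a tiny back-of-envelope calculation (the 4 KB block and 1 KB record figures are just the example numbers from the post, not measured values):

```python
# Block utilization for a given array size, using the example figures
# from the post: a 4 KB transport block and 1 KB records.
BLOCK_KB = 4
RECORD_KB = 1

def utilization(array_size):
    """Fraction of a block filled by one array of records,
    capped at 100% (a larger array simply spans more blocks)."""
    return min(array_size * RECORD_KB / BLOCK_KB, 1.0)

print(utilization(1))  # 0.25 -> one 1 KB record in a 4 KB block
print(utilization(4))  # 1.0  -> the block is fully packed
```

Plugging in your own record width and block size gives a first guess at an array size before you start timing runs.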