Db2 bulk load job - Performance Issue

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

SachinCho
Participant
Posts: 45
Joined: Thu Jan 14, 2010 1:23 am
Location: Pune

Db2 bulk load job - Performance Issue

Post by SachinCho »

Hi,
We have a multi-instance DB2 bulk load job that reads a dataset and loads the data to a DB2 target via the DB2 Connector stage. There are no other stages in the job except these two.
Other details are as below:
- Approximate volume handled - around 13 to 15 million records
- Time taken - normally 1 hr 30 mins plus; occasionally 2 hrs+
- Write Mode - Bulk Load
- Table Action - Append
- Transaction and array size - 50000 (although I believe it's insignificant in this case)
- Load control properties:
  Load mode - Insert
  Load method - Named Pipe
  Save count - 0
  Non recoverable load - Yes

If we watch the detailed log in Director after the job starts, it stays at "db2_Insert_Target,0: DB2 Load is starting ... " and completes only after the timeframe mentioned above. The rows/second value observed is around 2800.
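For reference, my understanding is that with these settings the connector effectively drives a LOAD along the lines of the sketch below (the pipe path and table name are placeholders, not our actual ones):

Code: Select all

# rough equivalent of the LOAD the connector issues for Named Pipe / Insert /
# Save count 0 / Non recoverable load (placeholder names throughout)
db2 "LOAD FROM /tmp/ds_load_pipe OF DEL
     SAVECOUNT 0
     MESSAGES /tmp/ds_load.msg
     INSERT INTO MYSCHEMA.MYTABLE
     NONRECOVERABLE"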

My questions are:
Is this acceptable behaviour, or should the job finish much earlier given it is a straight bulk load?
Are there any issues with the parameters configured in the DB2 Connector stage?

Happy to provide any additional information required.

Thanks,
Sachin
Sachin C
SachinCho
Participant
Posts: 45
Joined: Thu Jan 14, 2010 1:23 am
Location: Pune

Post by SachinCho »

Hi All,
Any pointers on the above issue would be much appreciated.

Additional information on this job: RCP is enabled for this one.

Happy to provide any additional information required.
Sachin C
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

1) You haven't said anything about average record size... and that's one of the most critical pieces of information.
2) How many records total are in the table?
3) Is this DB2 z/OS or regular DB2?
4) Do you access this database with any other jobs? Do you have ANY jobs that access this database with good (or great) performance?
5) Why do you think Array / Transaction size is inconsequential?
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
SachinCho
Participant
Posts: 45
Joined: Thu Jan 14, 2010 1:23 am
Location: Pune

Post by SachinCho »

Thank you Andy for the reply.
1. The record layout is as below. Nothing indicates a big record size - even with the varchar fully populated it works out to roughly 300 bytes per row:
Field 1 - integer
Field 2 - varchar(250)
Field 3 - date
Field 4 - time
Field 5 - timestamp
Field 6 - integer
Field 7 - integer
Field 8 - integer
Field 9 - timestamp
Field 10 - char(1)

2. The target table count is something I need to check.
3. Regular DB2 - database server = DB2/AIX64 10.1.3.
4. We are getting better performance from another, similar job which loads around 30-40 million records in half an hour or so.
5. What I understood from reading the DB2 Connector load control properties is that it's not the transaction size but the Save count property that defines consistency points for a bulk load. It looks like I'm not interpreting that correctly.

Based on the 4th point above, I had a look at the better performing job, and the only difference between the two jobs is the array size and transaction size. The better performing job has it as 2000, while the job with the issue has it as 50000.

Is it time to tweak the transaction size and array size in the badly performing job?

Thanks,
Sachin
Sachin C
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

I haven't done a bulk load to DB2 in ages, and I can't check it at the moment, so this is basically from memory...

The total size of the table and the number of indexes can be important. If I remember correctly, DB2 has to rebuild every index after the bulk load completes - which can take some time and affect the overall job completion time (which lowers the overall rows/second figure).
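From memory again, something along these lines should show whether the index rebuild is where the time goes (schema and table names are placeholders - adjust to suit):

Code: Select all

# which indexes will LOAD have to rebuild? (placeholder names)
db2 "SELECT INDNAME, COLNAMES, UNIQUERULE
     FROM SYSCAT.INDEXES
     WHERE TABSCHEMA = 'MYSCHEMA' AND TABNAME = 'MYTABLE'"

# while the job sits at 'DB2 Load is starting ...', this shows which phase
# (LOAD vs. BUILD - the index rebuild) the utility is actually in
db2 LIST UTILITIES SHOW DETAIL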

Also - I don't remember Array Size being ignored for bulk loads. If it isn't ignored, then 50000 is WAY too large and would be negatively affecting performance.

If someone with more recent experience can chime in... please do so!
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sorry, not one of the chimers requested as my experience is primarily with Oracle not DB2 but...

If it's anything like Oracle, you can't bulk load into a table with indexes; if they exist, they must be dropped and rebuilt afterwards. There are times when that fact alone rules bulk loading out, as the added speed of the load is more than offset by the time taken to rebuild the indexes.

And I'm not sure why the OP mentions Array Size and Transaction Size together all the time as if they were one setting. A bulk load is not 'transactional' and thus does not issue commits, but it sounds like DB2 has a 'save count' property? Not sure that has much, if any, impact on performance.

Now, as noted and in my experience, Array Size is an issue no matter what kind of load you are doing, and I've seen plenty of examples in the past where setting it too high had an adverse effect on performance. I think you'll find some formulas posted here to compute it based on packet size and average row length (ARL), but sometimes there's more art than science involved. Finding the 'sweet spot' can involve trying a series of sizes in a repeatable scenario and tracking the results: crank it up until performance starts to degrade, then back it down. This rather than just pulling a number out of... the air.
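If you want to make that repeatable, a dsjob loop is one way to do it. This assumes the job exposes Array Size as a job parameter - the project, job and parameter names below are made up:

Code: Select all

# sweep a range of array sizes, pulling the run stats after each pass
# (ArraySize, MyProject and Db2LoadJob are illustrative names)
for sz in 500 1000 2000 5000 10000
do
    dsjob -run -mode NORMAL -param ArraySize=$sz -wait MyProject Db2LoadJob
    dsjob -report MyProject Db2LoadJob BASIC
done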

Still looking for someone with more recent (and relevant) experience to chime in. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
SachinCho
Participant
Posts: 45
Joined: Thu Jan 14, 2010 1:23 am
Location: Pune

Post by SachinCho »

Thank you Andy and Craig for your replies. We have started investigating from the DB2 table perspective (counts, etc.) and have the information below.

Code: Select all

                         Table1                  Table2
Record count             7,595,154,159           13,564,905,740
Daily avg load volume    50-70 million           10-20 million
Time taken               20-30 mins              100 mins plus
Load method              bulk-append             bulk-append
Commit size              2000                    2000 (recently changed)
Index                    only single column      four columns
So, as rightly pointed out by Andy, it might be something to do with the existing records in the table and the indexes. We are doing more investigation and I will keep this forum posted as well.
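If the index rebuild (the BUILD phase) does turn out to be the culprit, one thing we plan to look at is the LOAD utility's incremental index maintenance. This is a sketch only, with placeholder names, and I still need to check whether the connector exposes the option:

Code: Select all

# with 13+ billion rows and a four-column index, rebuilding the whole index
# on every load is expensive; INDEXING MODE INCREMENTAL asks LOAD to extend
# the existing indexes instead (placeholder pipe/table names)
db2 "LOAD FROM /tmp/ds_load_pipe OF DEL
     SAVECOUNT 0
     INSERT INTO MYSCHEMA.TABLE2
     NONRECOVERABLE
     INDEXING MODE INCREMENTAL"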

Apologies for the formatting - I need to learn how to do that properly.

Thanks,
Sachin
Sachin C
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

[code] tags preserve whitespace, followed by lots of 'preview' passes until it all lines up. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers