Deleting DB2 table's rows very slow

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

sec105105
Participant
Posts: 44
Joined: Fri Mar 20, 2009 7:21 am
Location: Ottawa

Deleting DB2 table's rows very slow

Post by sec105105 »

I have a job that takes rows from a hashed file and deletes records from a DB2 (9.5) table. The target table has 122 million rows. The deletion process is very slow - rarely above 5 rows/second. Each week the hashed file is refreshed and contains about 1 million rows. The deletes are done to clear out old records before loading new ones.
I've thought about loading the contents of the hashed file into a staging table and letting the database do the work (delete from the target table where the key exists in the staging table - sketched below), but creating a new table in my environment is a huge chore, taking weeks.
I've also thought about abandoning the delete-then-insert approach and just using Change Data Capture to load the target table (replacing existing rows where necessary), but the CDC stage seems to load all 122 million rows of the target table, which takes hours.

Can anyone suggest:
- a different way of doing the delete
- a different way to do the CDC
- I'm OK with a server job or PX solution!
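
For reference, the staging-table idea would look roughly like this - a minimal sketch only, where STAGE_KEYS, TARGET_TBL and the KEY columns are placeholder names rather than the real schema:

    DELETE FROM TARGET_TBL T
    WHERE EXISTS (
        SELECT 1
        FROM   STAGE_KEYS S       -- weekly keys bulk-loaded from the hashed file
        WHERE  S.KEY1 = T.KEY1    -- two leading columns of the primary key
          AND  S.KEY2 = T.KEY2
    );

A set-based delete like this lets DB2 pick one access plan for the whole batch instead of being driven one key at a time from DataStage.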
Last edited by sec105105 on Thu Jun 13, 2013 8:51 am, edited 1 time in total.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Can you get your DBA to trace what is happening during your deletes? They might (indeed should) find a method, within DB2, to speed up your processing.

At what speed can you write your new records to this DB2 table?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Does your database access allow you to do an explain plan or session trace in DB2? That would help a lot in explaining why your deletes are so slow. I would first try to solve that problem in the database, and if that fails then think of DS alternatives.
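
For example, if you can run SQL against the database, something along these lines would capture and format the access plan for one of the generated deletes - a sketch only, assuming the explain tables have already been created, with placeholder table, column and literal values:

    -- capture the plan for a representative delete
    EXPLAIN PLAN FOR
      DELETE FROM TARGET_TBL
      WHERE KEY1 = 12345
        AND KEY2 = 67890;

    -- then format the most recent plan from the command line, e.g.:
    --   db2exfmt -d MYDB -1 -o delete_plan.txt

The formatted plan should show whether the delete is using the primary-key index or scanning the table.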
sec105105
Participant
Posts: 44
Joined: Fri Mar 20, 2009 7:21 am
Location: Ottawa

Post by sec105105 »

When we do the subsequent insert, it runs at 200+ rows/second.

When the delete occurs, the job feeds the two leading columns of a three-column primary key. Multiple rows in the target table may be deleted for each key that's fed, but that's OK.

E.g. 700K keys may result in 1.5m rows deleted.
I will ask the DBA whether it would be better to feed the full set of 1.5m (3-column) keys, so there is a 1:1 key-to-row ratio.

I've contacted the DBA - hopefully I will be able to run the job, and have them monitor it. But if anyone has any other ideas, I'm all ears!
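
For what it's worth, the per-key SQL the job is effectively issuing today looks like this (placeholder names; KEY3 stands for the trailing primary-key column that is not supplied):

    DELETE FROM TARGET_TBL
    WHERE KEY1 = ?     -- leading PK column, from the hashed file
      AND KEY2 = ?;    -- second PK column, from the hashed file
    -- KEY3 is not referenced, so one statement can remove multiple rows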
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Is there an index over the columns named in the WHERE clause of the DELETE statement?
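
One way to check (assuming a DB2 LUW catalog; the schema and table names below are placeholders) is to query SYSCAT.INDEXES and see whether an index's COLNAMES list starts with the columns used in the WHERE clause:

    SELECT INDNAME, COLNAMES, UNIQUERULE
    FROM   SYSCAT.INDEXES
    WHERE  TABSCHEMA = 'MYSCHEMA'
      AND  TABNAME   = 'TARGET_TBL';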
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sec105105
Participant
Posts: 44
Joined: Fri Mar 20, 2009 7:21 am
Location: Ottawa

Post by sec105105 »

Yes:
The job feeds the two leading columns of a three-column primary key. Multiple rows in the target table may be deleted for each key that's fed, but that's OK.

It appears there's an average of 120 rows for each key passed. So in reality the job is passing 4 rows/second to the DB2 stage, but probably 400-500 rows/second are actually being deleted from the table.

I have re-jigged the job to provide all three columns for the delete SQL.
The throughput is now averaging 170 rows/second (which translates into 170 rows/second actually being deleted from the table).
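
For clarity, the re-jigged job now effectively issues the full-key form (placeholder names again), so each statement matches exactly one row through the primary-key index:

    DELETE FROM TARGET_TBL
    WHERE KEY1 = ?
      AND KEY2 = ?
      AND KEY3 = ?;   -- all three primary-key columns supplied: one key, one row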

Again, this is all for a delete-then-insert approach. If there's a way to do CDC without loading the entire 120m rows from the target table...