I have a job that takes rows from a hashed file and deletes records from a DB2 (9.5) table. The target table has 122 million rows. The deletion process is very slow - rarely above 5 rows/second. Each week the hashed file is refreshed and contains about 1 million rows. The deletes are done to clear out old records before loading new ones.
I've thought about loading the contents of the hashed file into a staging table, then letting the DB do the work (delete from target_table where exists in staging table), but creating a new table in my environment is a huge chore, taking weeks.
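For reference, the staging-table variant would look something like this (sketch only - `stg_keys`, `target_table`, and the key column names are hypothetical stand-ins for your actual objects):

```
-- stg_keys: staging table holding the ~1M keys from the hashed file
-- Deletes every target row whose leading key columns appear in staging
DELETE FROM target_table t
WHERE EXISTS (
    SELECT 1
    FROM stg_keys s
    WHERE s.key1 = t.key1
      AND s.key2 = t.key2
);
```

A set-based delete like this lets DB2 pick a join plan against the index instead of executing a million individual statements, which is usually why it is dramatically faster.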
I've thought about abandoning the delete-then-insert approach, and just use Change Data Capture to load the target table (replacing existing rows if necessary), but the CDC stage seems to load all 122m rows of the target table, which takes hours.
Can anyone suggest:
- a different way of doing the delete, or
- a different way to do the CDC?
- I'm OK with a server job or PX solution!
Deleting DB2 table's rows very slow
Can you get your DBA to trace what is happening during your deletes? They might [should] discover a method, within DB2, to speed up your processing.
At what speed can you write your new records to this DB2 table?
Does your database access allow you to do an explain plan or session trace in DB2? That would help a lot in explaining why your deletes are so slow. I would first try to solve that problem in the database, and if that fails then think of DS alternatives.
When we do the subsequent insert, it runs at around 200+ rows/second.
When the delete occurs, the job feeds the 2 leading columns of a 3-column primary key. Multiple rows from the target table may be deleted for each key that's fed, but that's OK.
E.g. 700K keys may result in 1.5m rows deleted.
I will ask the DBA if it would be better to feed the full set of 1.5m (3-column) keys, so there will be a 1:1 key-to-row ratio.
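For illustration, the difference between the two delete statements would be roughly this (column names are hypothetical; `?` marks the parameter bound per input row):

```
-- Current: 2 leading columns of the 3-column PK;
-- each statement may delete several rows
DELETE FROM target_table WHERE key1 = ? AND key2 = ?;

-- Alternative: full 3-column key, giving a 1:1 key-to-row ratio
DELETE FROM target_table WHERE key1 = ? AND key2 = ? AND key3 = ?;
```

With the full key, each statement is a single unique-index probe, so the per-statement cost is more predictable, though the job then has to issue roughly twice as many statements in this example.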
I've contacted the DBA - hopefully I will be able to run the job, and have them monitor it. But if anyone has any other ideas, I'm all ears!
Yes:
It's feeding the 2 leading columns of a 3-column Primary key. Multiple rows from the target table may be deleted for each key that's fed, but that's OK.
It appears an average of 120 rows are deleted for each row passed. So in reality, the job is passing 4 rows/second to the DB2 stage, but probably 400-500 rows are actually being deleted from the table.
I have re-jigged the job to provide all three columns for the delete SQL.
The throughput now averages 170 rows/second (which translates into 170 rows/second actually being deleted from the table).
Again, this is all for a delete-then-insert approach. If there's a way to do CDC without loading the entire 120m rows from the target table...
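One possible way to avoid delete-then-insert entirely is an upsert via `MERGE`, which DB2 9.x supports. A minimal sketch, assuming a staging table `stg_rows` holds the weekly data (all table and column names here are hypothetical - and note this still needs a staging table, which you said is hard to get in your environment):

```
-- Upsert: update rows whose full key already exists,
-- insert the rest, in one pass over the target index
MERGE INTO target_table t
USING stg_rows s
  ON  t.key1 = s.key1
  AND t.key2 = s.key2
  AND t.key3 = s.key3
WHEN MATCHED THEN
  UPDATE SET t.col_a = s.col_a
WHEN NOT MATCHED THEN
  INSERT (key1, key2, key3, col_a)
  VALUES (s.key1, s.key2, s.key3, s.col_a);
```

Unlike the CDC stage, this never reads the full 122m-row target table into the job - matching is done inside the database via the primary-key index.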