Deleting DB2 table's rows very slow

I have a job that takes rows from a hashed file and deletes the matching records from a DB2 (9.5) table. The target table has 122 million rows. The deletion process is very slow, rarely above 5 rows/second. Each week the hashed file is refreshed and contains about 1 million rows. The deletes are done to clear out old records before loading new ones.
I've thought about loading the contents of the hashed file into a staging table and letting the database do the work (DELETE FROM the target table WHERE EXISTS in the staging table), but creating a new table in my environment is a huge chore that takes weeks.
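For illustration, here is a minimal sketch of that set-based delete. All names (stage_keys, key1, key2) are hypothetical, and it assumes a USER TEMPORARY tablespace is available, in which case a declared global temporary table might sidestep the weeks-long process for creating a permanent table:

    -- Hypothetical names throughout; a declared temp table lives only for
    -- the session and usually needs no formal DDL approval.
    DECLARE GLOBAL TEMPORARY TABLE session.stage_keys (
        key1 INTEGER NOT NULL,
        key2 INTEGER NOT NULL
    ) ON COMMIT PRESERVE ROWS NOT LOGGED;

    -- Bulk-load the ~1M keys from the hashed file into session.stage_keys,
    -- then let DB2 do one set-based delete instead of a million singletons:
    DELETE FROM target_table t
    WHERE EXISTS (
        SELECT 1
        FROM session.stage_keys s
        WHERE s.key1 = t.key1
          AND s.key2 = t.key2
    );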
I've also thought about abandoning the delete-then-insert approach and just using Change Data Capture to load the target table (replacing existing rows where necessary), but the CDC stage seems to read all 122 million rows of the target table, which takes hours.
Can anyone suggest:
- a different way of doing the delete
- a different way to do the CDC
- I'm OK with either a server job or a PX solution!
When we do the subsequent insert, it runs at 200+ rows/second.
When the delete occurs, the job feeds the two leading columns of a three-column primary key. Multiple rows may be deleted from the target table for each key fed, but that's OK. E.g., 700K keys may result in 1.5 million rows deleted.
I will ask the DBA whether it would be better to feed the full set of 1.5 million (three-column) keys, so there is a 1:1 key-to-row ratio (see the sketch below).
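To make that trade-off concrete, the two delete shapes would look something like this (key1/key2/key3 are hypothetical stand-ins for the real primary-key columns):

    -- Current approach: two leading PK columns per parameter set;
    -- one statement may delete many rows.
    DELETE FROM target_table WHERE key1 = ? AND key2 = ?;

    -- Proposed approach: all three PK columns per parameter set;
    -- each statement deletes at most one row (1:1 key-to-row).
    DELETE FROM target_table WHERE key1 = ? AND key2 = ? AND key3 = ?;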
I've contacted the DBA - hopefully I will be able to run the job, and have them monitor it. But if anyone has any other ideas, I'm all ears!
Yes:
It's feeding the two leading columns of a three-column primary key. Multiple rows may be deleted from the target table for each key fed, but that's OK.
It appears an average of 120 rows are deleted for each key row passed. So in reality the job is passing 4 rows/second to the DB2 stage, but at 120 rows per key that works out to roughly 400-500 rows/second actually being deleted from the table.
I have re-jigged the job to provide all three key columns to the delete SQL. Throughput now averages 170 rows/second, which translates into 170 rows/second actually being deleted from the table.
Again, this is all for the delete-then-insert approach. If only there were a way to do CDC without reading the entire 122 million rows of the target table...
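One alternative worth raising with the DBA, sketched under the assumption that the weekly feed carries complete replacement rows (all names here are hypothetical): a DB2 MERGE collapses delete-then-insert into a single upsert pass, so nothing ever has to read the full 122-million-row target.

    -- Assumes session.stage_rows holds the weekly rows: the full
    -- three-column key plus the data columns (data_col is a placeholder).
    MERGE INTO target_table t
    USING session.stage_rows s
        ON  t.key1 = s.key1
        AND t.key2 = s.key2
        AND t.key3 = s.key3
    WHEN MATCHED THEN
        UPDATE SET t.data_col = s.data_col
    WHEN NOT MATCHED THEN
        INSERT (key1, key2, key3, data_col)
        VALUES (s.key1, s.key2, s.key3, s.data_col);

One caveat: MERGE replaces and adds rows but does not remove target rows whose keys disappear from the feed; if those must go, a set-based delete like the one sketched earlier is still needed for the leftover keys.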