Delete then insert to replace a set of records

PhilHibbs
Premium Member

Delete then insert to replace a set of records

Post by PhilHibbs »

I have a requirement to replace a set of records in a table. The parent key is a bigint called DIN and I want to delete all the records for that DIN and then re-load the new set. Can I do this with a "Delete then Insert" operation, just setting the DIN as the key column? I thought that this was all that was needed, but it is leaving me with a random subset of the records that should be there. I am sorting the data.
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant

Post by ArndW »

Phil,

If the original DIN key had 10 records and you only replace 4 of them, the remaining 6 would be left unchanged (and thus be incorrect for your implementation).

I wonder if you could remove these with a before-stage SQL statement to delete and commit, then an insert of the new records.
Last edited by ArndW on Mon Sep 24, 2012 7:42 am, edited 1 time in total.
PhilHibbs
Premium Member

Post by PhilHibbs »

Surely the "delete" statement will be based on the "key" columns, and thus all 10 records would be deleted? If it isn't based on the columns that have "key" checked, then what is the delete statement based on? It can't surely put every value in the row in the "where" clause? There could be hundreds of very large columns.

The before-stage SQL would have to know all the keys to delete, which in my case is hundreds of thousands of keys. I don't want to leave the table half empty in between the execution of two separate jobs.
Phil Hibbs | Capgemini
Technical Consultant
chulett
Charter Member

Post by chulett »

Yes, the Key columns go into the where clause of the delete, so it will be based on them. However, as Arnd notes, only the values you send in will be deleted. You'd need to clarify the relationship between what you are trying to delete versus the "random subset" left behind.

For example:

1234
1234
1234
4567
4567
9876
9876

If you sent in three rows, one with each of the key values above, all seven would be deleted. Is that not what you are seeing?

Also, what stage and database? The Oracle Connector states this for "Delete then Insert":
For each input record, the connector first tries to delete the matching rows in the target table. Regardless of whether rows were actually deleted or not, the connector then runs the insert statement to insert the record as a new row in the target table.
Note that the delete is described as removing "rows" (plural) and that 'matching' would be based on your key column(s).
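In other words, per input record it should effectively run something along these lines (just a sketch, with the table and non-key column names assumed):

DELETE FROM MY_TABLE WHERE DIN = ?
INSERT INTO MY_TABLE (DIN, SOME_DATE, SOME_VALUE) VALUES (?, ?, ?)

where the where clause of the delete is built from whatever column(s) you have marked as key.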
-craig

"You can never have too many knives" -- Logan Nine Fingers
PhilHibbs
Premium Member

Post by PhilHibbs »

OK so if DB2 works the same way as Oracle in this respect, I should only have one row in the database, the last one that was inserted. Yet I have three or four rows out of five. I'm baffled.

So if "Delete then insert" isn't what I want, how do I refresh a set of rows for a parent key?
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant

Post by ArndW »

Hmm... I wonder how it truly works. Is your database primary key just the DIN column or are there other columns as part of the PK?
chulett
Charter Member

Post by chulett »

That's not supposed to matter but who knows? Worst case, it seems you could drive the deletes first from the data and then do the inserts, either as separate steps or as properly ordered targets.
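For example (again just a sketch, with the table and column names assumed), a first link set to Delete keyed on DIN only:

DELETE FROM MY_TABLE WHERE DIN = ?

and a second link set to Insert:

INSERT INTO MY_TABLE (DIN, SOME_DATE, SOME_VALUE) VALUES (?, ?, ?)

with the link ordering on the target stage set so the deletes are processed before the inserts.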
-craig

"You can never have too many knives" -- Logan Nine Fingers
BI-RMA
Premium Member

Post by BI-RMA »

ArndW wrote:Hmm... I wonder how it truly works. Is your database primary key just the DIN column or are there other columns as part of the PK?
This should not matter, because DataStage does not need the database primary key for update and delete operations. It generates the SQL clauses based only on the key-column definitions in the target stage. If these happen to match the database primary key, that should just speed up the operation.
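For example (a sketch, with the table name and second key column assumed): with only DIN checked as key, the generated delete would be roughly

DELETE FROM MY_TABLE WHERE DIN = ?

whereas with DIN and, say, LOAD_DATE both checked as key it becomes

DELETE FROM MY_TABLE WHERE DIN = ? AND LOAD_DATE = ?

which removes only exact matches rather than the whole set of rows for the DIN.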
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
ArndW
Participant

Post by ArndW »

I am fully aware of that; I am just trying to explain this job's behaviour.
PhilHibbs
Premium Member

Post by PhilHibbs »

chulett wrote:That's not supposed to matter but who knows? Worst case it seems you could do the deletes first from the data and then the inserts as either separate steps or as properly ordered targets.
The table does have a unique key that is the DIN plus another date column.

What do you mean by "as properly ordered targets"? *Edit* OK, I get it: multiple input links to the stage.
Last edited by PhilHibbs on Mon Sep 24, 2012 10:47 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant