Delete then insert to replace a set of records

PhilHibbs
Premium Member

Delete then insert to replace a set of records

Post by PhilHibbs »

I have a requirement to replace a set of records in a table. The parent key is a bigint called DIN and I want to delete all the records for that DIN and then re-load the new set. Can I do this with a "Delete then Insert" operation, just setting the DIN as the key column? I thought that this was all that was needed, but it is leaving me with a random subset of the records that should be there. I am sorting the data.
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant

Post by ArndW »

Phil,

If the original DIN key had 10 records and you only replace 4 of them, the remaining 6 would be left unchanged (and thus be incorrect for your implementation).

I wonder if you could remove these with a before-stage SQL statement to delete and commit, then an insert of the new records.
Last edited by ArndW on Mon Sep 24, 2012 7:42 am, edited 1 time in total.
PhilHibbs
Premium Member

Post by PhilHibbs »

Surely the "delete" statement will be based on the "key" columns, and thus all 10 records would be deleted? If it isn't based on the columns that have "key" checked, then what is the delete statement based on? It can't surely put every value in the row in the "where" clause? There could be hundreds of very large columns.

The before-stage SQL would have to know all the keys to delete, which in my case is hundreds of thousands of keys. I don't want to leave the table half empty in between the execution of two separate jobs.
Phil Hibbs | Capgemini
Technical Consultant
chulett
Charter Member

Post by chulett »

Yes, the Key columns go into the where clause of the delete, so it will be based on them. However, as Arnd notes, only the values you send in will be deleted. You'd need to clarify the relationship between what you are trying to delete versus the "random subset" left behind.

For example:

1234
1234
1234
4567
4567
9876
9876

If you sent in three rows, one with each of the key values above, all seven would be deleted. Is that not what you are seeing?

Also, what stage and database? The Oracle Connector states this for "Delete then Insert":
For each input record, the connector first tries to delete the matching rows in the target table. Regardless of whether rows were actually deleted or not, the connector then runs the insert statement to insert the record as a new row in the target table.
Note that the delete is described as removing "rows" (plural) and that 'matching' would be based on your key column(s).
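In other words, per input record it should effectively run something along these lines (just a sketch, with the table and non-key column names assumed):

DELETE FROM MY_TABLE WHERE DIN = ?
INSERT INTO MY_TABLE (DIN, SOME_DATE, SOME_VALUE) VALUES (?, ?, ?)

where the where clause of the delete is built from whatever column(s) you have marked as key.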
-craig

"You can never have too many knives" -- Logan Nine Fingers
PhilHibbs
Premium Member

Post by PhilHibbs »

OK so if DB2 works the same way as Oracle in this respect, I should only have one row in the database, the last one that was inserted. Yet I have three or four rows out of five. I'm baffled.

So if "Delete then insert" isn't what I want, how do I refresh a set of rows for a parent key?
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant

Post by ArndW »

Hmm... I wonder how it truly works. Is your database primary key just the DIN column or are there other columns as part of the PK?
chulett
Charter Member

Post by chulett »

That's not supposed to matter but who knows? Worst case, it seems you could drive the deletes first from the data and then do the inserts, either as separate steps or as properly ordered targets.
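For example (again just a sketch, with the table and column names assumed), a first link set to Delete keyed on DIN only:

DELETE FROM MY_TABLE WHERE DIN = ?

and a second link set to Insert:

INSERT INTO MY_TABLE (DIN, SOME_DATE, SOME_VALUE) VALUES (?, ?, ?)

with the link ordering on the target stage set so the deletes are processed before the inserts.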
-craig

"You can never have too many knives" -- Logan Nine Fingers
BI-RMA
Premium Member

Post by BI-RMA »

ArndW wrote:Hmm... I wonder how it truly works. Is your database primary key just the DIN column or are there other columns as part of the PK?
This should not matter, because DataStage does not need the database primary key for update and delete operations. It generates the SQL clauses based only on the key-column definitions in the target stage. If these happen to match the database primary key, that should just speed up the operation.
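For example (a sketch, with the table name and second key column assumed): with only DIN checked as key, the generated delete would be roughly

DELETE FROM MY_TABLE WHERE DIN = ?

whereas with DIN and, say, LOAD_DATE both checked as key it becomes

DELETE FROM MY_TABLE WHERE DIN = ? AND LOAD_DATE = ?

which removes only exact matches rather than the whole set of rows for the DIN.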
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
ArndW
Participant

Post by ArndW »

I am fully aware of that; I am just trying to explain this job's behaviour.
PhilHibbs
Premium Member

Post by PhilHibbs »

chulett wrote:That's not supposed to matter but who knows? Worst case it seems you could do the deletes first from the data and then the inserts as either separate steps or as properly ordered targets.
The table does have a unique key that is the DIN plus another date column.

What do you mean by "as properly ordered targets"? *Edit* OK, I get it: multiple input links to the stage.
Last edited by PhilHibbs on Mon Sep 24, 2012 10:47 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant