Delete then insert option



dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Delete then insert option

Post by dnat »

Hi,

I am using "Delete then insert" option to load data in target Oracle table. Source is a .csv file. Every file is identified with an id and data needs to be replaced if we get the same file again even with different data.

I have defined the key as the file id and the load option as "Delete then insert". So, ideally, it should delete all the records in the table with that id and then insert.

I had a file with around 3000 records and when I ran the job, the log said all 3000 had been inserted, but there were only 400 records in the table. I changed the option to insert only and it inserted 3000 records without any problem.

What could be causing the problem?

With 100 records there are no issues, but the problem started when loading more than 2000 records.
rschirm
Premium Member
Posts: 27
Joined: Fri Dec 13, 2002 2:53 pm

Post by rschirm »

The Delete then Insert logic works like this.

A row of data comes in; it goes to the database, deletes records based on the keys specified, then inserts the current row. The next row of data comes in, goes to the database, deletes records based on the keys specified, and then inserts that row.

If I were to guess I would say that you have approx. 400 rows with unique key values.
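
To illustrate that per-row behaviour, here is a rough SQL sketch. The table and column names (TARGET_TBL, ID, VALUE) are made up for the example; they are not from your job.

-- Incoming row (1, 'abcd'):
DELETE FROM TARGET_TBL WHERE ID = 1;                     -- nothing there yet
INSERT INTO TARGET_TBL (ID, VALUE) VALUES (1, 'abcd');

-- Incoming row (1, 'defg'):
DELETE FROM TARGET_TBL WHERE ID = 1;                     -- wipes out the 'abcd' row just inserted
INSERT INTO TARGET_TBL (ID, VALUE) VALUES (1, 'defg');

-- ...and so on, so only the last row processed for each key value survives.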
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

My data looks like this:

ID value
1 abcd
1 defg
1 rtrt
1 6767
1 7887

As such there is no key defined on the table. I am specifying the ID column as the key column in DataStage for the delete then insert option.

My assumption was that when I choose "Delete then insert" based on the key (ID), it deletes all the records with that ID (in this case 1) and then inserts all the new records.

So if it works row by row, then the first record would get deleted when it tries to insert the 2nd record. When the data is small, around 1000 records, it works, but with just 3000 records it has this problem.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It doesn't work, it just appears to with small volumes, and your assumption as to how it would work is incorrect, as noted by Rick. I would suggest two passes through the data, or perhaps two targets, controlling which is executed first. The first pass or target would get all of the distinct "key" values from the data and delete all existing records in the target with those keys. Then a second pass would insert all of the source records.
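
As a rough sketch of what those two passes would end up issuing against Oracle (table and column names are placeholders, and the single file id is shown as a literal; in the job the keys would come from the source link itself):

-- Pass/target 1: delete every existing row for the key values present in this file
DELETE FROM TARGET_TBL WHERE ID = 1;

-- Pass/target 2: plain inserts of all source rows, run only after the delete has finished
INSERT INTO TARGET_TBL (ID, VALUE) VALUES (1, 'abcd');
INSERT INTO TARGET_TBL (ID, VALUE) VALUES (1, 'defg');
-- ...one insert per source record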
-craig

"You can never have too many knives" -- Logan Nine Fingers
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

Thanks Rick and Craig!! And it looks like the array size is what made the job appear to work when the record count is low. Since the array size was set to 2000, it didn't cause any problem when it processed 1000 records or so. I tested it by increasing it to 5000 and it worked fine as long as the record count stayed below 5000.

But anyway, I changed the design to use a before-SQL statement to delete based on the key and changed the write mode to "insert new rows".
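
Roughly, the before-SQL delete looks like this (TARGET_TBL and ID are placeholders for my actual names, and #FILE_ID# stands in for the job parameter carrying the file's id):

-- Before-SQL on the target stage: clear out anything already loaded for this file
DELETE FROM TARGET_TBL WHERE ID = #FILE_ID#;

The write mode then only has to do plain inserts.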
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Again, that depends on your definition of "works fine". :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers