Insert records from where it last failed

dsxchuser
Participant
Posts: 1
Joined: Tue Mar 19, 2013 1:24 pm

Insert records from where it last failed

Post by dsxchuser »

Is it possible for a parallel job to resume inserting records into a database from the point where it last failed, without looking up the target table or simply updating the existing records?

Say the job aborted after inserting and committing 900,000 records, and still has 100,000 rows left to insert.

Does "Running from point of failure" achieve this?
This doubt was expressed lot of times everywhere but without a satisfactory answer.


Thanks,
DSX User.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

The short answer is "No".

You have to design in a mechanism to "remember" which rows were inserted if you don't want to query the target.
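
For illustration only (plain Python rather than a DataStage job; the checkpoint file name, batch size and commit stub are all invented), such a mechanism could write the last committed key to a file after every commit and, on restart, skip everything at or below it:

    import os

    CHECKPOINT = "load_checkpoint.txt"   # invented checkpoint file name
    BATCH_SIZE = 100                     # rows per commit, scaled down for the example

    def commit(batch):
        # Stand-in for the real insert-and-commit against the target.
        print("committed %d rows up to key %d" % (len(batch), batch[-1][0]))

    def load(rows):
        # rows: iterable of (key, payload) pairs, sorted by key.
        resume_after = None
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                resume_after = int(f.read().strip())
        batch = []
        for key, payload in rows:
            if resume_after is not None and key <= resume_after:
                continue                 # committed before the failure; skip
            batch.append((key, payload))
            if len(batch) == BATCH_SIZE:
                commit(batch)
                # Written after the commit, so a crash between the two
                # can still re-insert one batch on the rerun.
                with open(CHECKPOINT, "w") as f:
                    f.write(str(batch[-1][0]))
                batch = []
        if batch:
            commit(batch)
            with open(CHECKPOINT, "w") as f:
                f.write(str(batch[-1][0]))
        if os.path.exists(CHECKPOINT):
            os.remove(CHECKPOINT)        # clean finish; clear the checkpoint

    load((k, "row %d" % k) for k in range(1, 1001))

The point is the same whatever the tooling: the job itself has to persist its progress at commit boundaries, because the engine will not do it for you.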
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
harishkumar.upadrasta
Participant
Posts: 18
Joined: Tue Dec 25, 2012 10:39 pm
Location: Detroit,MI

Post by harishkumar.upadrasta »

Hi Ray,

Would the following be a possible solution to eliminate the issue of re-insertion/re-update when a job loading a huge volume of data fails?

Job 1:

1. Partition the source data into separate datasets using the round robin partitioning method, preferably one per available node so the data is divided evenly.

Job 2:

This job runs in a loop, reading the datasets one after another and loading the data into the target. The current file name is stored in a temporary file on every iteration. If the job fails, we resume from the last failed file, eliminating re-updates of data that is already loaded. A sketch of the loop controller is below.
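
Roughly what I mean, as plain Python rather than a sequence job (the dataset names and the progress file name are just examples):

    import os

    DATASETS = ["DS1.ds", "DS2.ds", "DS3.ds", "DS4.ds"]  # example names
    PROGRESS = "last_loaded.txt"   # temporary file tracking finished datasets

    def load_dataset(name):
        # Stand-in for invoking the real load job against one dataset.
        print("loading", name)

    done = set()
    if os.path.exists(PROGRESS):
        with open(PROGRESS) as f:
            done = set(f.read().split())

    for ds in DATASETS:
        if ds in done:
            continue               # finished before the failure; skip it
        load_dataset(ds)           # if this aborts, the rerun resumes here
        with open(PROGRESS, "a") as f:
            f.write(ds + "\n")     # record the dataset as finished

    os.remove(PROGRESS)            # whole load finished; clear the tracker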

Kindly suggest whether this will work.
Harish
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No, because that's not how Round Robin partitioning works. In any case, what happens if the failure occurs halfway through a Data Set?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
harishkumar.upadrasta
Participant
Posts: 18
Joined: Tue Dec 25, 2012 10:39 pm
Location: Detroit,MI

Post by harishkumar.upadrasta »

We run this in a sequence with checkpointing enabled.

Job 1 runs only once per load, so the data is extracted from the source only once.

Then Job 2 runs in a loop.

For example, if DS1, DS2, DS3 and DS4 were created, DS1 and DS2 were loaded, and the job fails during the DS3 load, then we only have to reload DS3 and then DS4.

For DS3 we have to reload the data from the beginning anyway.
We can at least save the reloading time for DS1 and DS2.

This approach is worthwhile only if the data volume is very large.
Harish
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Is the job always going to insert?
Do you have a surrogate key or an increasing sequence number in the table? If so, get the maximum value before the load and put a filter between the dataset and the connector that passes only keys greater than that maximum. A rough sketch follows.
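
A rough sketch of the idea in Python, using an in-memory SQLite table to stand in for the target (the table and column names are invented):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE target (seq_no INTEGER PRIMARY KEY, payload TEXT)")
    # Pretend 900,000 of 1,000,000 rows were committed before the abort
    # (scaled down to 900 of 1,000 here to keep the example quick).
    db.executemany("INSERT INTO target VALUES (?, ?)",
                   [(i, "row %d" % i) for i in range(1, 901)])
    db.commit()

    # Step 1: get the max sequence number before the load.
    max_loaded = db.execute(
        "SELECT COALESCE(MAX(seq_no), 0) FROM target").fetchone()[0]

    # Step 2: the filter between dataset and connector passes only higher keys.
    source = [(i, "row %d" % i) for i in range(1, 1001)]
    remaining = [r for r in source if r[0] > max_loaded]

    db.executemany("INSERT INTO target VALUES (?, ?)", remaining)
    db.commit()
    print("inserted %d rows after key %d" % (len(remaining), max_loaded))

This only works if the job purely inserts and the key really increases monotonically in load order.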

And which database are you loading into? Is the target table partitioned?
Thanks,
Prasanna
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

harishkumar.upadrasta wrote: For example, if DS1, DS2, DS3 and DS4 were created, DS1 and DS2 were loaded, and the job fails during the DS3 load, then we only have to reload DS3 and then DS4.
Yes. But you still need a mechanism to find out which records from DS3 were already loaded to the database and which were not. So you either have to query the target and use change capture, or you have to "remember" the last sort-key value successfully loaded to the database and load only values higher than that.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm
Contact:

Post by daignault »

You don't mention the volume of data, and that plays a part in designing the solution.

You could always create a dataset which contains ONLY KEYS for the data being inserted.

When the job runs, load those keys as a lookup reference and perform a lookup against it for each incoming row. If the key exists, ignore the row. If it does not exist, perform an insert. A sketch of the idea follows.
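
A minimal sketch of that lookup pattern in Python, with a set standing in for the keys-only dataset (all names and the insert stub are invented):

    # Keys already inserted, kept in a separate keys-only store.
    inserted_keys = set(range(1, 901))     # e.g. 900 keys from the failed run

    def insert(key, payload):
        # Stand-in for the real database insert.
        print("insert", key)

    def load_row(key, payload):
        if key in inserted_keys:
            return                         # key exists: ignore the row
        insert(key, payload)               # key missing: perform the insert
        inserted_keys.add(key)

    for k in range(1, 1001):
        load_row(k, "row %d" % k)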

Of course, I would much rather examine why your job fails often enough that you need to be concerned about building this type of logic into it.