restartability in db2 enterprise stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

lakshmipriya
Participant
Posts: 31
Joined: Tue Jul 13, 2004 5:26 am
Location: chennai
Contact:

restartability in db2 enterprise stage

Post by lakshmipriya »

While writing to the DB2 Enterprise stage, if the job aborts, is there any way to restart it from within the same job by deleting the records written in the previous unsuccessful run, other than looking up the same table again in the same job before writing, or running an additional job for this purpose?

Is there any feature in this stage to solve this problem?
Lakshmi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Best practice dictates that every record loaded is marked with some form of identification of the load run, perhaps a batch ID, perhaps a timestamp.
If you do this, unwinding inserts becomes a simple constrained DELETE statement.

If you've staged a before-image of the table you're loading, unwinding becomes an only slightly more complex operation using rows from the before-image. This would be the approach to take if you ever perform UPDATE or DELETE operations against DB2.
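For illustration only, a minimal Python sketch of both unwind approaches, assuming a DB-API connection object conn from whatever DB2 driver is in use; the table name TARGET_TBL, the before-image copy TARGET_TBL_BI, and the BATCH_ID column are all hypothetical names, not part of any DataStage or DB2 API:

# A constrained DELETE unwinds an insert-only load for the failed run's batch ID.
def unwind_inserts(conn, batch_id):
    cur = conn.cursor()
    cur.execute("DELETE FROM TARGET_TBL WHERE BATCH_ID = ?", (batch_id,))
    conn.commit()

# If UPDATEs or DELETEs were performed (and updates also stamp BATCH_ID), remove
# the failed batch, then re-insert the rows staged in the before-image for this run.
def unwind_with_before_image(conn, batch_id):
    cur = conn.cursor()
    cur.execute("DELETE FROM TARGET_TBL WHERE BATCH_ID = ?", (batch_id,))
    cur.execute("INSERT INTO TARGET_TBL SELECT * FROM TARGET_TBL_BI")
    conn.commit()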

There's nothing in the ETL tool; what you're asking to do isn't strictly ETL. So you have to plan and design your recovery strategy yourself.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsxdev
Participant
Posts: 92
Joined: Mon Sep 20, 2004 8:37 am

Re: restartability in db2 enterprise stage

Post by dsxdev »

lakshmipriya wrote:While writing to the DB2 Enterprise stage, if the job aborts, is there any way to restart it from within the same job by deleting the records written in the previous unsuccessful run, other than looking up the same table again in the same job before writing, or running an additional job for this purpose?

Is there any feature in this stage to solve this problem?
In the DB2 stage you have the options Update Commit Interval and Row Commit Interval. Both are available when the write mode is Upsert; only Row Commit Interval is available when the write mode is Write (insert only).

Row Commit Interval commits after the specified number of records has been inserted; if the job aborts, the rows inserted since the last commit are rolled back.
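Purely as a conceptual illustration of what a row commit interval means (the stage property does this for you; the function, table, and column names below are made up, assuming a DB-API connection conn):

# Commit every `interval` rows so an abort only rolls back the rows
# inserted since the last commit, not the whole load.
def load_with_commit_interval(conn, rows, interval=2000):
    cur = conn.cursor()
    for i, row in enumerate(rows, start=1):
        cur.execute("INSERT INTO TARGET_TBL (C1, C2) VALUES (?, ?)", row)
        if i % interval == 0:
            conn.commit()
    conn.commit()  # commit any remaining rows after the last full interval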
Happy DataStaging
lakshmipriya
Participant
Posts: 31
Joined: Tue Jul 13, 2004 5:26 am
Location: chennai
Contact:

Post by lakshmipriya »

I am using Write mode. Thanks for your great response; I hope this will be helpful. Could you please throw some more light on the maximum commit interval that we can give to DB2?
Lakshmi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That's a question best posed to your DB2 DBA, who must configure sufficient space to handle transactions of the size you desire. Or, more likely, who will tell you to use smaller transactions. :cry:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You face a larger issue with a partitioned database load. At high volumes you simply cannot issue a commit at the end, as the data streams independently to each node. PX bypasses the coordinator node because it has the ability to read the system catalog. It automatically partitions the data to match the database's partitioning and fires off independent processes to write each partition directly. You will not be able to "commit at the end" because of the independent nature of this activity.

What you will have to do is have an audit number in your target table that will identify each row to a given load cycle, like a batch number. To make a self-correcting and restartable load job, you'll have to query the database to find rows you've already loaded and take them out of the load set as you stream it.

What we just finished designing for a major DataStage PX customer is just such a restartable template job. The job takes the load-ready data, cuts out the primary keys, loads them into an ETL work table in the target database, and inner-joins that work table to the target table, extracting any keys already there into a filtering stream. The load data then streams against that filter, and only rows that aren't in the filter stream are allowed through. To optimize this process, we added a job parameter set to the state of that job's run to indicate whether it's a first try or a retry. On a first try, no primary-key rows get written to the ETL work table, so that extra stream of logic cuts off immediately and lets all rows pass through efficiently. A lot of work, but necessary to achieve a high-performance restartable load template job.
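As a rough sketch of the filtering idea only (not the actual PX template, which does this with an ETL work table and an inner join inside the job; the names TARGET_TBL, PK, and BATCH_ID are hypothetical), the retry path boils down to something like:

# On a retry run, look up the keys already loaded for this batch and drop
# them from the load set; on a first try, skip the lookup entirely.
def filter_already_loaded(conn, load_rows, batch_id, is_retry):
    if not is_retry:
        return load_rows  # first try: nothing to filter, pass everything through
    cur = conn.cursor()
    cur.execute("SELECT PK FROM TARGET_TBL WHERE BATCH_ID = ?", (batch_id,))
    already_loaded = {r[0] for r in cur.fetchall()}
    return [row for row in load_rows if row[0] not in already_loaded]  # row[0]: the key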
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle