More than 9 active oracle connections in a datastage job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

deepticr
Participant
Posts: 32
Joined: Wed Mar 19, 2008 7:01 am
Location: Bangalore, India

More than 9 active oracle connections in a datastage job

Post by deepticr »

Hi,

While extracting data from a given source system (10 tables), we decided to build a single ETL job for it. The source and target are both databases, so to extract and load the 10 tables we used 20 Oracle Enterprise stages in one job.

When executing the job, we encountered the following error:

Insert statement did not prepare correctly....Unable to find a free dynamic SQL instance. Instance Number must be 0<-1<9.

1. Does this mean that a single DataStage job can support only 9 active database connections at a time?

2. Also, could anyone suggest the pros and cons of placing all the extracts in a single job as opposed to having 10 ETL jobs, one for each entity?
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Re: More than 9 active oracle connections in a datastage job

Post by datastage »

deepticr wrote: 2. Also, could anyone suggest the pros and cons of placing all the extracts in a single job as opposed to having 10 ETL jobs, one for each entity?

The only pro I can think of in a design like this is that you have fewer total jobs. You have 1 job instead of 10; some people are annoyed by having many jobs, and it means potentially fewer objects to migrate from dev to test to prod, or fewer to manage in a source code control system. The DataStage Repository and Director applications can also slow down in projects with a very large number of jobs, but that effect is minimal.

DataStage best practices tend to recommend an architecture with the loads split out into 10 single jobs. There are many reasons, and hopefully others will add their thoughts, but the first one that comes to my mind is restartability. These 10 loads may vary greatly in run time: the fastest tables may load in 2 minutes while another job may take an hour. So there is a scenario where 8 or 9 of the loads have finished and the final one or two abort. Because they are all in the same job, you would now have to restart all of them in recovery. That may place an unwanted burden on the source or target databases, and it also adds extra processing on the ETL server, since you are redoing work that didn't need to be redone for the loads that had already finished.
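To make the restartability point concrete, here is a minimal shell sketch of orchestrating ten independent table loads so that a rerun only retries the failures. The `load_table` function is a stub standing in for the real per-table DataStage job; in practice each call would be a `dsjob -run -jobstatus <project> <job>` invocation (or a Sequence job with checkpoints), and the table names and `STATE_DIR` path are illustrative.

```shell
#!/bin/sh
# Restartable orchestration sketch: one marker file per completed load,
# so rerunning the script skips tables that already finished.

STATE_DIR=./load_state
mkdir -p "$STATE_DIR"

load_table() {
    # Stub for the per-table load; replace with something like:
    #   dsjob -run -jobstatus "$PROJECT" "load_$1"
    echo "loading $1"
}

for table in CUSTOMERS ORDERS ITEMS; do   # ...list all 10 tables here
    marker="$STATE_DIR/$table.done"
    if [ -f "$marker" ]; then
        echo "skipping $table (already loaded)"
        continue
    fi
    if load_table "$table"; then
        touch "$marker"    # record success; a rerun will skip this table
    else
        echo "load of $table failed; rerun to retry only the failures" >&2
        exit 1
    fi
done
echo "all loads complete"
```

With one monolithic job there is no equivalent of the marker files: an abort near the end forces every extract to run again.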

Also, while at the moment it may seem fine to combine everything in one job, with no obvious reason why that would change, environments and requirements always change. There may come a point when these 10 loads need to run at different times, forcing you to split up the original job because of dependencies. There may also be cases where one developer needs to modify one of the loads while another developer needs to modify a different load in the same job; the second developer is then locked out and has to wait. So even if it seems simpler now to have them all in one job, as the years go by you'll run into issues like these, and a design of 10 separate jobs would make them much easier for the ETL team to deal with.
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure