Multiple Input links and Performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sensiva
Premium Member
Premium Member
Posts: 21
Joined: Tue Aug 22, 2017 10:39 am

Multiple Input links and Performance

Post by sensiva »

Hi,

It has been around 8 months that I started to work in datastage and dsxchange role was very vital. Now that i am comfortable with datastage basics, would like to go a little further and learn more.

I did learn that it would be performant to split the update and insert operations for DB rather choosing "insert then update" which I implemented in my jobs with 2 DB connectors (one for insert and one for update) and its really good. Recently i came across jobs that contain multiple input links to a single oracle connector with one link doing update and another doing insert. I did learn that mulitple links are made to a connector to handle all the actions in a single unit of work to better maintain transactions, which i totally agree.

I wanted to know if that(multiple links) would equally enhance the performance as well? because in our project context, we dont need to maintain the transaction, we need to insert or update millions of records.. I did make a test but at times the one with 2 connector perform better and at times the single connector with multiple link perform better. I am yet to make a full fledged test, but thought of posting here to get you views.

Would it not be that we overload the connector with different actions which could potentially degrade the performance ? May be i am wrong. It would be great if you could show some light on this topic..

Thanks
sen
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Excellent question and thanks for the background information, saves a lot of possible back-and-forth information gathering. 8)

I expect people with good experience to chime in on this but let me put something out there. These kinds of questions are really hard to answer as there are SO many variables at play that will effect this, variable that are specific to your shop. In terms of general advice, one man's meat can be another man's poison, so to speak. Tip of the iceberg things to consider - database configuration (including per user and per task limits), hardware capabilities, network speed, shared vs dedicated resources, volume fluctuations, degree of SQL parallelism used, etc. etc. etc. etc. Heck, even time of day plays a role here since we have to contend with crazy BI processes running once the sun comes up. :wink:

I would think in a very general sense that it probably doesn't matter significantly whether you use a single connector doing multiple tasks or several dedicated connectors, assuming your Oracle configuration allows that. But that's just a gut feeling after decades of Oracle and ETL. Curious what others think.
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I personally do not know the answer but I suspect that it would not matter one way or the other. You initial tests show inconsistent results which probably just indicates the common real-world scenario that there is other varying workload activity on the database server and/or the DataStage server.

Given that, i would choose the job design that will be the easiest to document and maintain for other current or future developers who may be new to the tool. Developers tend to come and go a lot more frequently than server hardware is upgraded to improve overall performance. I don't know about your organization, but in ours, the cost of labor is by far the most expensive cost.
Choose a job you love, and you will never have to work a day in your life. - Confucius
Post Reply