Page 1 of 1

Multiple Input links and Performance

Posted: Wed Jun 20, 2018 7:23 am
by sensiva
Hi,

It has been around 8 months that I started to work in datastage and dsxchange role was very vital. Now that i am comfortable with datastage basics, would like to go a little further and learn more.

I did learn that it would be performant to split the update and insert operations for DB rather choosing "insert then update" which I implemented in my jobs with 2 DB connectors (one for insert and one for update) and its really good. Recently i came across jobs that contain multiple input links to a single oracle connector with one link doing update and another doing insert. I did learn that mulitple links are made to a connector to handle all the actions in a single unit of work to better maintain transactions, which i totally agree.

I wanted to know if that(multiple links) would equally enhance the performance as well? because in our project context, we dont need to maintain the transaction, we need to insert or update millions of records.. I did make a test but at times the one with 2 connector perform better and at times the single connector with multiple link perform better. I am yet to make a full fledged test, but thought of posting here to get you views.

Would it not be that we overload the connector with different actions which could potentially degrade the performance ? May be i am wrong. It would be great if you could show some light on this topic..

Thanks

Posted: Wed Jun 20, 2018 8:50 am
by chulett
Excellent question and thanks for the background information, saves a lot of possible back-and-forth information gathering. 8)

I expect people with good experience to chime in on this but let me put something out there. These kinds of questions are really hard to answer as there are SO many variables at play that will effect this, variable that are specific to your shop. In terms of general advice, one man's meat can be another man's poison, so to speak. Tip of the iceberg things to consider - database configuration (including per user and per task limits), hardware capabilities, network speed, shared vs dedicated resources, volume fluctuations, degree of SQL parallelism used, etc. etc. etc. etc. Heck, even time of day plays a role here since we have to contend with crazy BI processes running once the sun comes up. :wink:

I would think in a very general sense that it probably doesn't matter significantly whether you use a single connector doing multiple tasks or several dedicated connectors, assuming your Oracle configuration allows that. But that's just a gut feeling after decades of Oracle and ETL. Curious what others think.

Posted: Thu Jun 21, 2018 5:52 am
by qt_ky
I personally do not know the answer but I suspect that it would not matter one way or the other. You initial tests show inconsistent results which probably just indicates the common real-world scenario that there is other varying workload activity on the database server and/or the DataStage server.

Given that, i would choose the job design that will be the easiest to document and maintain for other current or future developers who may be new to the tool. Developers tend to come and go a lot more frequently than server hardware is upgraded to improve overall performance. I don't know about your organization, but in ours, the cost of labor is by far the most expensive cost.