Running a stage within a job with a different config file

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am


Post by abc123 »

I have a job with several stages, the last of which is an Oracle Connector stage. I would like that last stage to run with a different config file than the other stages. I looked into the node map constraint but it wasn't much help. Any ideas?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Why don't we start with the why of this... what would be different about the configuration for this last stage?
-craig

"You can never have too many knives" -- Logan Nine Fingers
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

Nothing would be different. I am just wondering whether the Oracle Connector stage alone can be run with a higher number of nodes to improve performance. The SQL takes a long time due to the volume of data and I was hoping to speed it up.
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

If you run into something that cannot be set as a parameter, and there are very few things like that, you can make two copies of the parallel job with the different setups and call the one you want dynamically from the sequence job.

That said, I can't recall whether this is something you can stuff into a parameter or not.

From memory only (I don't do a lot of this kind of tampering), I think you can put these config file names into parameter sets, pick the one you want at the sequence level, and pass it down from there. I know we have a couple of jobs where we can pick 1 CPU or 4 this way.
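
From what I remember it looks something like this (a sketch only; the Configurations directory shown is the default install path and the file names are made up). You add the $APT_CONFIG_FILE environment variable to the job as a parameter, then set its value per run from the Job Activity in the sequence:

    $APT_CONFIG_FILE = /opt/IBM/InformationServer/Server/Configurations/one_node.apt    (1 CPU)
    $APT_CONFIG_FILE = /opt/IBM/InformationServer/Server/Configurations/four_node.apt   (4 CPUs)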
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ah, okay. Perhaps you can handle this on the database side? Possibly with a parallel hint, or by setting a default degree of parallelism on the tables involved. Each node would get the portion it has to work on and that work could be further "parallel-ized" by Oracle.
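
For example (the table name and degree here are made up, and check with your DBA before hard-coding a hint):

    -- statement-level hint on the query
    SELECT /*+ PARALLEL(big_table, 8) */ *
    FROM   big_table;

    -- or set a default degree of parallelism on the table itself
    ALTER TABLE big_table PARALLEL 8;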
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You could run the whole job using a configuration file with the higher number of nodes, but use node pool(s) to constrain the other stages to execute in a smaller subset of those nodes.
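
As a sketch (the fastname, resource paths and pool name are all invented), a four-node configuration where only the first two nodes also belong to a "small" pool:

    {
      node "node1"
      {
        fastname "etlhost"
        pools "" "small"
        resource disk "/data/ds/d1" {pools ""}
        resource scratchdisk "/scratch/ds/s1" {pools ""}
      }
      node "node2"
      {
        fastname "etlhost"
        pools "" "small"
        resource disk "/data/ds/d2" {pools ""}
        resource scratchdisk "/scratch/ds/s2" {pools ""}
      }
      node "node3"
      {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/d3" {pools ""}
        resource scratchdisk "/scratch/ds/s3" {pools ""}
      }
      node "node4"
      {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/d4" {pools ""}
        resource scratchdisk "/scratch/ds/s4" {pools ""}
      }
    }

Set "small" as the node pool constraint (on the stage's Advanced tab) for every stage except the Oracle Connector; the connector, left unconstrained, executes across all four nodes.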
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

More is not always better.

Look to see how you are accessing that target Oracle system.

Are you doing updates?
Are you using bulk load?
Is the table truncated before you are loading it?
Is your data pre-sorted based upon the keys of your table?
Any locks on that table during your load time?

Have you spoken to your Oracle DBA to see if runstats/reorg needs to be done on the table, so that you don't waste time?
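
On Oracle that usually means gathering optimizer statistics rather than a literal runstats; something like this (schema and table names made up):

    EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'ETL', tabname => 'BIG_TABLE');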

I have seen cases where the target database was the bottleneck because it was super busy servicing customer queries at the time of the load. So if you put even MORE work on it by upping the parallelism, you could be impacting customers.


=============

I have NEVER seen a need to push out any target Connector stage to its own APT file.

You'd be repartitioning data left and right if your parallelism changed.

You'd potentially be crossing host names if your APT file pushed it to a different host. (TCP/IP traffic is slower than memory to memory traffic)

Your DataStage admin would slap you if you were running on a grid install and hard coded an APT file just for that stage.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

All good points. :wink:

Came back here after realizing my simplistic suggestion wasn't really appropriate; I guess I was thinking more from a source standpoint than a target. D'oh. I too should have asked about the nature of the target load you are doing - updates versus inserts versus a mixture of the two - and what seems to be the issue / bottleneck. Are you sure the target is being fed fast enough? Have you tried replacing the target Oracle Connector with a Sequential File stage to check?
-craig

"You can never have too many knives" -- Logan Nine Fingers