Running jobs in parallel

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

bhushan
Participant
Posts: 4
Joined: Mon Nov 16, 2015 3:55 am

Running jobs in parallel

Post by bhushan »

Interesting question.

I have 10 separate parallel jobs, and when I run them from a sequencer (or anything else) I want all of the jobs to start at the same time, not sequentially one after another. Can anyone help me find a solution? :D

I don't want to include all 10 parallel jobs in one job; the jobs are separate, loading 10 different tables.

Thanks in advance
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

That want aside: put all 10 in a sequence job with NO links between them and they will all start at once. Starting 10 jobs at once may or may not punish your server, depending on your system and the job complexity.

You can also use Director, or whatever your company uses to schedule jobs, to kick them off at the same time.

Same time is relative, of course. When you get down into the nanoseconds, maybe not, but you can easily start them all within a second or so. If you need sub-second precision, you may need something more powerful.
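As a sketch of the scheduler approach (assuming cron as the scheduler and dsjob on the PATH; the project and job names are invented for illustration), entries like these would start all the jobs in the same minute:

```
# Hypothetical crontab entries -- all fire at 02:00 each day.
# MyProject and Job_1..Job_3 are placeholder names.
0 2 * * * dsjob -run MyProject Job_1
0 2 * * * dsjob -run MyProject Job_2
0 2 * * * dsjob -run MyProject Job_3
```

Any enterprise scheduler can do the equivalent; cron is just the simplest example.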
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Most schedulers will allow you to do this.

Sequence jobs will not even come close, unless (perhaps) you start your jobs using Execute Command activities to launch background processes running the dsjob command line.

But that gives you no convenient way to monitor them, except Director or Operations Console.
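A minimal sketch of that dsjob approach, assuming the standard `dsjob -run project job` command line. The project and job names below are invented, and an echo stands in for the real call so the script runs without a DataStage engine:

```shell
#!/bin/sh
launch_job() {
    # On a real engine the Execute Command activity would run something like:
    #   dsjob -run -jobstatus "$1" "$2"
    # echo stands in here so the sketch is runnable anywhere.
    echo "would run: dsjob -run $1 $2"
}

PROJECT=MyProject          # placeholder project name
STARTED=0
for JOB in load_customers load_orders load_products; do
    launch_job "$PROJECT" "$JOB" &    # & = background, so all launches go out together
    STARTED=$((STARTED + 1))
done

wait    # block until every background launch has returned
echo "started $STARTED jobs concurrently"
```

Each `&` puts the launch in the background, so every job is submitted before the script waits; on a real server you would replace the echo with the actual dsjob invocation.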
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Is that an environment thing, Ray? I have found that unconnected sequence jobs kick off a small number of parallel jobs more or less at the same time (again, if you're not concerned about a few seconds between real start times). I had been doing it that way, but there are no dependencies in mine; the run-at-the-same-time approach was for trimming the actual run time down versus running the jobs one at a time.
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

There are a few posts here about the runtime engine making jobs wait to run due to resource limitations. I don't remember the correct search parameters for finding them, but I do recall that there's an environment variable which governs this limit.

Putting all of the px jobs in one job sequence, with no links, will effectively attempt to run all of them concurrently. The alternative is to have your scheduler invoke the separate jobs together, though that will also mean more parent processes.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Are you perhaps thinking of Workload and/or Queue Management?
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I don't have admin rights, Craig, but it might be related to APT_PM_NODE_TIMEOUT (Startup timeout). It defaults to blank, and as I recall there's a default setting for it somewhere for which this variable is an override.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
bhushan
Participant
Posts: 4
Joined: Mon Nov 16, 2015 3:55 am

DataStage 11.5

Post by bhushan »

UCDI,

Apologies for the late reply.
I tried it with no links in the sequencer, but when I ran the jobs they didn't all run at once: one runs and the remaining nine go into the queue.
Thanks in advance :lol:
bhushan
Participant
Posts: 4
Joined: Mon Nov 16, 2015 3:55 am

Post by bhushan »

Thanks, Ray. Let me clarify the problem:

10 parallel jobs in one sequencer; each parallel job takes 15 minutes to complete.
When I trigger the 10 parallel jobs using one sequencer, one job runs and the remaining 9 sit with status Queued until the first job finishes; the same pattern repeats for 150 minutes.
All 10 jobs should be in the Running state as soon as the sequencer is triggered, so that they run in parallel rather than sequentially.
Timato
Participant
Posts: 24
Joined: Tue Sep 30, 2014 10:51 pm

Post by Timato »

If 1 runs and 9 are queued, then your jobs are hitting the CPU and/or memory limit designated in the Operations Console. From memory it defaults to 80% of each, so perhaps some job tuning is in order, and some nagging of the infrastructure providers to beef up the DataStage engine host?
cdp
Premium Member
Posts: 113
Joined: Tue Dec 15, 2009 9:28 pm
Location: New Zealand

Post by cdp »

Example of a job sequence that allows us to load up to 10 tables at a time (250+ tables):

[Screenshot of the job sequence omitted]

The first job just creates 10 flat files for looping through the list of tables; you don't need that part.
How about you try something like what is in the green square? (Sequencer set to 'All' -> the n jobs you want to run in parallel -> Sequencer set to 'All')
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Timato wrote:If 1 runs and 9 are queued, then your jobs are hitting the CPU and/or memory limit designated in the Operations Console. From memory it defaults to 80% of each, so perhaps some job tuning is in order, and some nagging of the infrastructure providers to beef up the DataStage engine host?
There are also limits on the number of jobs that can kick off at once. It could be that as well, but it seems really unlikely that the setup has that limit set so low.

I second this post: either your jobs are very inefficient, your hardware is very weak, or your settings are wrong. One of those three things (or a combination of them) is causing the no-link sequence job to misbehave. Start with the settings; that is easy to fix. Then check the jobs to see if a simple change would make them more efficient. Buying a new server is the absolute last thing to consider.


I have a realtime background (not database stuff). As a rule of thumb, if you try to run more processes than you have CPU cores, you are usually causing more problems than you are solving. You typically want one core totally free for OS-level work and as a fudge factor for randomness on the box; the rest you want dedicated to burning through the work, with as little context switching and as few interruptions as possible.

Long story short: DataStage parallelism is difficult to deal with. Each job starts all kinds of processes, and those are by default mostly going to run in parallel internally, because DataStage is a parallel tool. Run 10 jobs at once and that becomes 30 or more processes, each trying to split across the CPUs; it quickly becomes less efficient. The specifics of your hardware make it difficult to say where the sweet spot is, but it is very likely that running 10 at once will take longer than running them in smaller groups of, say, 2 or 3. If something interdependent is going on and they NEED to run in parallel, then you do it; otherwise, you may want to experiment with how many you can run at once and still get good results.
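A runnable sketch of that smaller-groups idea in plain shell. The job names are placeholders and a no-op stands in for the real dsjob call, which is hypothetical here:

```shell
#!/bin/sh
# Run a list of jobs in batches of BATCH, draining each batch with
# wait before launching the next. Tune BATCH to your core count.
BATCH=3
COUNT=0
for JOB in job01 job02 job03 job04 job05 job06 job07 job08 job09 job10; do
    # Placeholder work; a real version might run: dsjob -run MyProject "$JOB"
    ( echo "ran $JOB" >/dev/null ) &
    COUNT=$((COUNT + 1))
    if [ $((COUNT % BATCH)) -eq 0 ]; then
        wait    # block until the whole current batch finishes
    fi
done
wait    # pick up the partial final batch (job10 here)
echo "processed $COUNT jobs in batches of up to $BATCH"
```

Note this drains each batch completely before starting the next; keeping exactly BATCH jobs busy at all times would need a slightly smarter loop, but this is usually close enough for coarse-grained ETL jobs.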