Running jobs in parallel

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

bhushan
Participant
Posts: 4
Joined: Mon Nov 16, 2015 3:55 am

Running jobs in parallel

Post by bhushan »

Interesting question.

I have 10 separate parallel jobs, and when I run them from a sequencer (or anything else) I want all of the jobs to start at the same time, not sequentially one after another. Can anyone help me find a solution? :D

I don't want to include all 10 parallel jobs in one job; the jobs are separate, loading 10 different tables.

Thanks in advance
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

That want aside: put all 10 in a sequence job with NO links between them and they will all start at once. Starting 10 jobs at once may or may not punish your server, depending on your system and the job complexity.

You can also use Director, or whatever your company uses to schedule jobs, to kick them off at the same time.

Same time is relative, of course. When you get down into the nanoseconds, maybe not, but you can easily start them all within a second or so. If you need sub-second precision, you may need something more powerful.
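As a sketch of the scheduler approach (assuming cron as the scheduler and dsjob on the PATH; the project and job names are invented for illustration), entries like these would start all the jobs in the same minute:

```
# Hypothetical crontab entries -- all fire at 02:00 each day.
# MyProject and Job_1..Job_3 are placeholder names.
0 2 * * * dsjob -run MyProject Job_1
0 2 * * * dsjob -run MyProject Job_2
0 2 * * * dsjob -run MyProject Job_3
```

Any enterprise scheduler can do the equivalent; cron is just the simplest example.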
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Most schedulers will allow you to do this.

Sequence jobs will not even come close, unless (perhaps) you start your jobs using Execute Command activities to launch background processes running the dsjob command line.

But that gives you no convenient way to monitor them, except Director or Operations Console.
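A minimal sketch of that dsjob approach, assuming the standard `dsjob -run project job` command line. The project and job names below are invented, and an echo stands in for the real call so the script runs without a DataStage engine:

```shell
#!/bin/sh
launch_job() {
    # On a real engine the Execute Command activity would run something like:
    #   dsjob -run -jobstatus "$1" "$2"
    # echo stands in here so the sketch is runnable anywhere.
    echo "would run: dsjob -run $1 $2"
}

PROJECT=MyProject          # placeholder project name
STARTED=0
for JOB in load_customers load_orders load_products; do
    launch_job "$PROJECT" "$JOB" &    # & = background, so all launches go out together
    STARTED=$((STARTED + 1))
done

wait    # block until every background launch has returned
echo "started $STARTED jobs concurrently"
```

Each `&` puts the launch in the background, so every job is submitted before the script waits; on a real server you would replace the echo with the actual dsjob invocation.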
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Is that an environment thing, Ray? I have found that unconnected sequence jobs kick off a small number of parallel jobs more or less at the same time (again, if you're not concerned about a few seconds between real start times). I had been doing it that way, but there are no dependencies in mine; the run-at-the-same-time approach was for trimming the actual run time down versus running the jobs one at a time.
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

There are a few posts here about the runtime engine making jobs wait to run due to resource limitations. I don't remember the correct search parameters for finding them, but I do recall that there's an environment variable which governs this limit.

Putting all of the px jobs in one job sequence, with no links, will effectively attempt to run all of them concurrently. The alternative is to have your scheduler invoke the separate jobs together, though that will also mean more parent processes.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Are you perhaps thinking of Workload and/or Queue Management?
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I don't have admin rights, Craig, but it might be related to APT_PM_NODE_TIMEOUT (Startup timeout). It defaults to blank, and as I recall there's a default setting for it somewhere for which this variable is an override.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
bhushan
Participant
Posts: 4
Joined: Mon Nov 16, 2015 3:55 am

DataStage 11.5

Post by bhushan »

UCDI,

Apologies for the late reply.
I tried it with no links in the sequencer, but when I ran the jobs they didn't all run at once: one runs and the remaining nine go into the queue.
Thanks in advance :lol:
bhushan
Participant
Posts: 4
Joined: Mon Nov 16, 2015 3:55 am

Post by bhushan »

Thanks, Ray. Let me clarify the problem:

10 parallel jobs in one sequencer; each parallel job takes 15 minutes to complete.
When I trigger the 10 parallel jobs using one sequencer, one job runs and the remaining 9 sit with status Queued until the first job finishes; the same pattern repeats for 150 minutes.
All 10 jobs should be in the Running state as soon as the sequencer is triggered, so that they run in parallel rather than sequentially.
Timato
Participant
Posts: 24
Joined: Tue Sep 30, 2014 10:51 pm

Post by Timato »

If 1 runs and 9 are queued, then your jobs are hitting the CPU and/or memory limit designated in the Operations Console. From memory it defaults to 80% of each, so perhaps some job tuning is in order, and some nagging of the infrastructure providers to beef up the DataStage engine host?
cdp
Premium Member
Posts: 113
Joined: Tue Dec 15, 2009 9:28 pm
Location: New Zealand

Post by cdp »

Example of a job sequence that allows us to load up to 10 tables at a time (250+ tables):

[Screenshot of the job sequence omitted]

The first job just creates 10 flat files for looping through the list of tables; you don't need that part.
How about you try something like what is in the green square? (Sequencer set to 'All' -> the n jobs you want to run in parallel -> Sequencer set to 'All')
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Timato wrote:If 1 runs and 9 are queued, then your jobs are hitting the CPU and/or memory limit designated in the Operations Console. From memory it defaults to 80% of each, so perhaps some job tuning is in order, and some nagging of the infrastructure providers to beef up the DataStage engine host?
There are also limits on the number of jobs that can kick off at once. It could be that as well, but it seems really unlikely that the setup has that limit set so low.

I second this post: either your jobs are very inefficient, your hardware is very weak, or your settings are wrong. One of those three things (or a combination of them) is causing the no-link sequence job to misbehave. Start with the settings; that is easy to fix. Then check the jobs to see if a simple change would make them more efficient. Buying a new server is the absolute last thing to consider.


I have a realtime background (not database stuff). As a rule of thumb, if you try to run more processes than you have CPU cores, you are usually causing more problems than you are solving. You typically want one core totally free for OS-level work and as a fudge factor for randomness on the box; the rest you want dedicated to burning through the work, with as little context switching and as few interruptions as possible.

Long story short: DataStage parallelism is difficult to deal with. Each job starts all kinds of processes, and those are by default mostly going to run in parallel internally, because DataStage is a parallel tool. Run 10 jobs at once and that becomes 30 or more processes, each trying to split across the CPUs; it quickly becomes less efficient. The specifics of your hardware make it difficult to say where the sweet spot is, but it is very likely that running 10 at once will take longer than running them in smaller groups of, say, 2 or 3. If something interdependent is going on and they NEED to run in parallel, then you do it; otherwise, you may want to experiment with how many you can run at once and still get good results.
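A runnable sketch of that smaller-groups idea in plain shell. The job names are placeholders and a no-op stands in for the real dsjob call, which is hypothetical here:

```shell
#!/bin/sh
# Run a list of jobs in batches of BATCH, draining each batch with
# wait before launching the next. Tune BATCH to your core count.
BATCH=3
COUNT=0
for JOB in job01 job02 job03 job04 job05 job06 job07 job08 job09 job10; do
    # Placeholder work; a real version might run: dsjob -run MyProject "$JOB"
    ( echo "ran $JOB" >/dev/null ) &
    COUNT=$((COUNT + 1))
    if [ $((COUNT % BATCH)) -eq 0 ]; then
        wait    # block until the whole current batch finishes
    fi
done
wait    # pick up the partial final batch (job10 here)
echo "processed $COUNT jobs in batches of up to $BATCH"
```

Note this drains each batch completely before starting the next; keeping exactly BATCH jobs busy at all times would need a slightly smarter loop, but this is usually close enough for coarse-grained ETL jobs.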