Anyway to force job to run on all compute nodes on a grid?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Anyway to force job to run on all compute nodes on a grid?

Post by bobyon »

I suspect that the ulimit settings are not as they should be across all of the compute nodes in our grid. I would like to create a sequence/job that will run on each compute node in the grid and return the results of a couple of ulimit statements. I have the few statements that are required (ulimit -Sa; ulimit -Ha) in a shell script.
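For reference, a minimal sketch of such a script is shown here. The function name report_limits and the header lines are illustrative only, and the host name is printed first so that output collected from several nodes can be told apart.

```shell
#!/bin/bash
# Print the host name, then the soft and hard limits of the calling process.
# report_limits is a hypothetical name chosen for this sketch.
report_limits() {
    # uname -n is a fallback in case the hostname command is not installed.
    echo "host: $(hostname 2>/dev/null || uname -n)"
    echo "--- soft limits (ulimit -Sa) ---"
    ulimit -Sa
    echo "--- hard limits (ulimit -Ha) ---"
    ulimit -Ha
}

report_limits
```

The limits reported are those of the process that runs the script, which is why it has to be launched from inside a DataStage job rather than from an interactive login.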

I would like the process to be as generic as possible so I can port it to different environments (grids) and run without modification.

However, the only way I can think of to control which node a job runs on is via a config file and I don't really want to create a bunch of config files.

Am I brain dead? Is there a way to do this? Or am I reinventing a wheel here and there is some other way than datastage to do this?

TIA
Bob
deepak.hsbc
Participant
Posts: 39
Joined: Sun Apr 15, 2007 11:30 pm

Re: Anyway to force job to run on all compute nodes on a gri

Post by deepak.hsbc »

I used a Peek stage to display the result on each node, with ulimit -ah run in the job's start subroutine. That should work for you.
"Books are as useful to a stupid person as a mirror is useful to a Blind person."
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

When you create configuration files for grid execution you don't specify the exact nodes, only the number that you require (at two levels). The grid management software actually allocates machines to nodes. This is fully documented in the manuals.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

Thanks for the response Ray. I do understand the dynamic nature of building the config file.

What I am trying to find is a way to coax the grid management software into running my job on each of the compute nodes, so that I can see the ulimit settings on each and every node.

My thought regarding creating config files would have required NOT grid enabling the job.
Bob
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

So you mean that you want to run a non-grid job on every compute node in a grid?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

OK, an explicit configuration file will do it. Create a job that uses an External Source stage to execute hostname ; ulimit -a and capture all the lines of output into wherever makes sense. This operator should execute on each compute node.
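As a rough sketch of the kind of command the External Source stage could run, the script below prefixes every line of ulimit output with the host name, so rows from different nodes stay distinguishable after they are collected. The function name tag_limits and the "|" separator are assumptions made for this example, not anything DataStage requires.

```shell
#!/bin/bash
# Emit one row per ulimit line, each prefixed with the host that produced it.
# tag_limits and the "|" separator are illustrative choices.
tag_limits() {
    host=$(hostname 2>/dev/null || uname -n)
    ulimit -a | while IFS= read -r line; do
        printf '%s|%s\n' "$host" "$line"
    done
}

tag_limits
```

With a single string column on the stage's output link, each row then carries both the node and the limit it reports.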
bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

lstsaur wrote: So you mean that you want to run a non-grid job on every compute node in a grid?
Actually, it does not matter to me whether it is a grid-enabled job or not, as long as I can get the job to run on all the nodes in the grid.

All I am trying to do is confirm the ulimit settings on each server.
Bob
bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

ray.wurlod wrote: OK, an explicit configuration file will do it. Create a job that uses an External Source stage to execute hostname ; ulimit -a and capture all the lines of output into wherever makes sense. This operator should execute on each compute node.
Now we are getting to the heart of my question. I have the job that captures the ulimit output. But, how do I get it to run on all the compute nodes?

Is the only way to put one job in a sequence for each compute node, passing an explicit config file to each job?
Bob
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Why don't you just ask your administrator to confirm the settings according to the user id you are using?
bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

Well, 2 reasons basically:
1 - If you are referring to a Unix Admin, because they have no way to see what the value of ulimit is as seen from a DataStage job. It is often different from what is seen by just issuing commands on the Unix command line.
and
2 - If you are referring to a DataStage Admin, because I am the admin, however unfortunate that might be. :-)
Bob
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

bobyon wrote: Is the only way to put one job in a sequence for each compute node, passing an explicit config file to each job?
That, or a job that has a configuration file mentioning every compute node.
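A configuration file of that kind could be generated rather than hand-written. The sketch below writes one logical node per host name; the host names, the resource paths and the function name write_all_nodes_config are all made up for illustration and would need to match the real environment.

```shell
#!/bin/bash
# Hypothetical resource paths; substitute the ones your environment uses.
DISK_DIR="/tmp/ds/datasets"
SCRATCH_DIR="/tmp/ds/scratch"

# Write a configuration file with one logical node per host name given.
# Usage: write_all_nodes_config OUTPUT_FILE HOST [HOST...]
write_all_nodes_config() {
    out=$1
    shift
    n=0
    {
        echo "{"
        for host in "$@"; do
            n=$((n + 1))
            cat <<EOF
  node "node${n}"
  {
    fastname "${host}"
    pools ""
    resource disk "${DISK_DIR}" {pools ""}
    resource scratchdisk "${SCRATCH_DIR}" {pools ""}
  }
EOF
        done
        echo "}"
    } > "$out"
}

# Example with made-up host names.
write_all_nodes_config "${TMPDIR:-/tmp}/all_nodes.apt" compute01 compute02 compute03
cat "${TMPDIR:-/tmp}/all_nodes.apt"
```

Because the host list is the only input, the same script can be pointed at a different grid by changing the arguments.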
daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm

Post by daignault »

We have a large grid, so what we do is create an APT configuration file for each compute node on the grid.

We then disable the grid through the environment variable and resubmit the job once for each compute node.

Ray D
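The per-node approach could be scripted along these lines. This is only a sketch: the project name, job name, host names and resource paths are hypothetical, the job is assumed to expose $APT_CONFIG_FILE as a job parameter, and the grid-disabling variable is left out because the thread does not name it. The loop prints each dsjob command instead of running it.

```shell
#!/bin/bash
# Hypothetical project, job and directory names for this sketch.
PROJECT="MyProject"
JOB="CheckUlimits"
CONF_DIR="${TMPDIR:-/tmp}/apt_per_node"

# Write a one-node configuration file for the given host and print its path.
write_single_node_config() {
    host=$1
    file="${CONF_DIR}/${host}.apt"
    mkdir -p "$CONF_DIR"
    cat > "$file" <<EOF
{
  node "node1"
  {
    fastname "${host}"
    pools ""
    resource disk "/tmp/ds/datasets" {pools ""}
    resource scratchdisk "/tmp/ds/scratch" {pools ""}
  }
}
EOF
    echo "$file"
}

# Made-up host names; one submission per compute node.
for host in compute01 compute02; do
    conf=$(write_single_node_config "$host")
    # Dry run: print the command rather than executing it.
    echo dsjob -run -wait -param "\$APT_CONFIG_FILE=${conf}" "$PROJECT" "$JOB"
done
```

Removing the echo in front of dsjob would submit the job for real, once per generated file.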