LoadLeveler Error: There is no active resource manager job.

bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

Recently we have been having jobs fail very intermittently with the following messages:

<Dynamic_grid.sh>Error: There is no active resource manager job with name pj_crt_fact_pr_invstmnt_pr_intrm_stg_files_4186, or this job has been already terminated. Exiting...

<Dynamic_grid.sh>Error: Job error, submit error.

Parallel job reports failure (code 1)

There is virtually nothing else in the DS Director log that helps.

Have you seen anything like this before? Do you know a cause or resolution, or even where to find more information regarding the error messages?


Unfortunately our support provider (IBM) is not much help, since the product is no longer supported. We are building new servers and moving to Platform LSF as a replacement, but we didn't get there before this started happening.

Thanks
Bob
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Go look in your grid job dir for that job submission. There may be info in the stdout, or maybe in an error file.
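
If the grid job dir holds a lot of files, a small script can narrow the search. The following is only a rough sketch in Python, not part of the toolkit: `find_job_mentions` is a made-up helper name, and it assumes you already know where your grid job dir lives and that the files in it are plain text.

```python
from pathlib import Path
from typing import List, Tuple


def find_job_mentions(job_dir: str, job_name: str) -> List[Tuple[str, int, str]]:
    """Return (file, line number, line text) for every line mentioning job_name.

    job_dir is whatever directory your grid setup writes its per-job
    stdout/error files to (an assumption; the location varies by site).
    """
    hits = []
    for path in sorted(Path(job_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            # errors="replace" keeps the scan going if a file has odd bytes
            text = path.read_text(errors="replace")
        except OSError:
            # Unreadable file (permissions, vanished mid-scan): skip it
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if job_name in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Call it with the grid job dir and the job name from the error message, for example `pj_crt_fact_pr_invstmnt_pr_intrm_stg_files_4186`, and read the files it points to.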

Run a diagnostic command and hit each server in your grid.

Go to $GRIDHOME and run the test.sh script there. Modify it to span multiple nodes.

My guess is that one of your mounts is missing on a compute node, so the job could not kick off the handshaking script used in the Grid Enablement Toolkit.
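
A quick way to test the missing-mount theory is to check the same paths on every compute node. The sketch below is only an illustration and makes several assumptions: passwordless ssh from the head node to each compute node, and a known file or directory underneath each mount to test for (an empty mount-point directory still exists when the filesystem is not mounted). The helper names and the example node names and paths are hypothetical.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def remote_path_exists(node: str, path: str) -> bool:
    """Return True if `path` exists on `node`, checked with `test -e` over ssh.

    Any non-zero exit status counts as "missing", including an
    unreachable node or a failed login.
    """
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", node, "test", "-e", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_missing_mounts(
    nodes: Iterable[str],
    paths: Iterable[str],
    exists: Callable[[str, str], bool] = remote_path_exists,
) -> Dict[str, List[str]]:
    """Map each node to the paths that are missing on it.

    Nodes where every path is present are left out of the result.
    """
    paths = list(paths)
    missing: Dict[str, List[str]] = {}
    for node in nodes:
        absent = [p for p in paths if not exists(node, p)]
        if absent:
            missing[node] = absent
    return missing
```

For example, `find_missing_mounts(["node1", "node2"], ["/your/grid/mount/some_known_file"])` returns an empty dict when every node sees every path. Since the failures are intermittent, it may be worth running the check repeatedly, or right after a failure.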