I got the warning message "job control process (pid xxxx) has failed" and then the job aborted. After searching the IBM site, I found this.
Problem (Abstract)
Sequence job control process (pid xxxx) has failed
Cause
A Sequence job that runs continuously in a loop appends to the dsenv settings after each run, causing the length of the LD_LIBRARY_PATH (Sun/Linux), LIBPATH (AIX), or LIB_PATH (HPUX) environment variable to exceed 8192 bytes.
Diagnosing the problem
If the issue persists after you have followed the steps in Technote http://www-01.ibm.com/support/docview.w ... wg21397247, and you are running a Sequence job continuously in a loop, the next action is to check the length of your LD_LIBRARY_PATH (Sun/Linux), LIBPATH (AIX), or LIB_PATH (HPUX) environment variable and ensure that the length of this string has NOT exceeded 8192 bytes.
If it has, then the likely cause is that the dsenv is being sourced continuously in a loop as well.
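The length check described above can be sketched in Python. This is only an illustration: the helper name path_var_report is made up for this post, the 8192-byte limit and the variable names are taken from the technote text, and it assumes you run it in the same environment the job control process inherits.

```python
import os

LIMIT_BYTES = 8192  # threshold quoted in the technote above

# Variable names as listed in the technote, one per platform.
PATH_VARS = ("LD_LIBRARY_PATH", "LIBPATH", "LIB_PATH")


def path_var_report(environ=os.environ, limit=LIMIT_BYTES):
    """Return {name: (length_in_bytes, exceeds_limit)} for each variable that is set.

    Hypothetical helper for illustration; not part of DataStage.
    """
    report = {}
    for name in PATH_VARS:
        value = environ.get(name)
        if value is None:
            continue  # variable not set on this platform
        size = len(value.encode("utf-8"))
        report[name] = (size, size > limit)
    return report


if __name__ == "__main__":
    for name, (size, too_long) in path_var_report().items():
        note = " (over limit)" if too_long else ""
        print(f"{name}: {size} bytes{note}")
```

Running it before and after a few iterations of the loop should show whether the value keeps growing.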
Resolving the problem
Set the environment settings outside the loop, or set absolute strings (such as "LD_LIBRARY_PATH=<all-paths>"). Do not append :$LD_LIBRARY_PATH to this, because that can cause the path settings to be repeated over multiple runs and eventually cause the crash.
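To see why the appending form eventually crosses the limit while the absolute form does not, here is a small Python simulation. It is a sketch under assumptions: the function names are invented for illustration, the real dsenv is a shell script, and the 8192-byte limit is the figure from the technote.

```python
LIMIT_BYTES = 8192  # threshold quoted in the technote above


def source_appending(current, new_paths):
    """Mimic LD_LIBRARY_PATH=<new_paths>:$LD_LIBRARY_PATH (simplified: no trailing colon when empty)."""
    return f"{new_paths}:{current}" if current else new_paths


def source_absolute(current, new_paths):
    """Mimic LD_LIBRARY_PATH=<all-paths>, ignoring the previous value."""
    return new_paths


def runs_until_overflow(source, new_paths, max_runs=10_000, limit=LIMIT_BYTES):
    """Return the run number at which the value first exceeds the limit, or None if it never does."""
    value = ""
    for run in range(1, max_runs + 1):
        value = source(value, new_paths)  # one pass through the loop "sources dsenv" once
        if len(value.encode("utf-8")) > limit:
            return run
    return None


if __name__ == "__main__":
    # Hypothetical library paths, for illustration only.
    paths = "/opt/example/lib:/opt/example/branded_odbc/lib"
    print("appending:", runs_until_overflow(source_appending, paths))
    print("absolute: ", runs_until_overflow(source_absolute, paths))
```

With the appending form the value grows by the same amount on every pass, so a long-running loop will hit the limit sooner or later; with the absolute form the length stays constant.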
job control process (pid xxxx) has failed
Re: how to understand this error
I designed a sequence job that contains only a routine. In the routine, I first trigger jobs A, B, C, D to run one by one (using a for loop).
Then I have a for loop from 1 to 9 that submits jobs index1...index9 to run in parallel.
This is the log where it aborted.
[info]a..JobControl (DSRunJob): Waiting for job index1 to start
[warn]Job control process (pid 28967492) has failed
wuruimao
Re: how to understand this error
I simply reran the job, without any changes. Now it's processing jobs 1-9 with no error.
wuruimao
So... the sequence job itself had the PID failure or one of the jobs it attempted to run had the failure? For the latter, anything in that job's log?
For an intermittent error like this, something you can't reproduce, in your shoes I would involve support.
-craig
"You can never have too many knives" -- Logan Nine Fingers
We actually dislike this kind of support ticket. "It failed, and then worked again, do our work for us!"
Get a consultant to help diagnose the system issue if you do not have a resource skilled enough to evaluate your server. Do not lean on IBM Support without specific details, such as "Why is running x, y, z producing action a, b, c on this server?"
Tickets that complain that it failed and then worked, with no further investigation done, will most likely require specific consulting assistance. It is your server, which is unlike most of our other customers' servers, with different settings, configurations, and software installed. We need you to investigate how you set it up, and find out what is going on at the system level, before we can help explain the why.
Thanks for your long response.
The job failed with a message I could not understand. Even though I found the explanation on the IBM website, it was not clear to me, which is why I posted here. I suspect this is a server resource issue, but who knows. After the rerun the job resumed; I just want to know what the error means.
Teej wrote: We actually dislike this kind of support ticket. "It failed, and then worked again, do our work for us!"
Get a consultant to help diagnose the system issue if you do not have a resource skilled enough to evaluate your server. Do not lean on IBM Support without specific details, such as "Why is running x, y, z producing action a, b, c on this server?"
Tickets that complain that it failed and then worked, with no further investigation done, will most likely require specific consulting assistance. It is your server, which is unlike most of our other customers' servers, with different settings, configurations, and software installed. We need you to investigate how you set it up, and find out what is going on at the system level, before we can help explain the why.
wuruimao