Issues Aborting Jobs
Posted: Thu Dec 21, 2017 9:12 am
Hi Guys,
I've been running into some issues lately with debugging jobs. Sometimes I will use the debug breakpoints
and review the data at that point, then stop the job to adjust for whatever I've seen there.
Recently I've been having trouble stopping a running job: I tell it to stop, and it does not. This happens
both in debug mode and when running a job normally. The log shows that the SIGINT, SIGTERM, and SIGKILL
signals are sent to the process, but it never exits. The job does stop processing data, but it
stays in a running state.
I've left a job in this state overnight and it was still stuck the next morning. I CAN get it to finally die if I use the
Cleanup Resources option in the Director and then log off the process associated with the job.
Here is where the REALLY annoying issue comes into play. After I have done this cleanup, about 50% of the time
the job will fail to run again until the Engine server is rebooted. The log linked below is from one such job
that I am currently having this issue with. I've opened a case with IBM, but so far they have not responded
in two days.
The Job log can be found here:
https://raw.githubusercontent.com/jacks ... rorlog.txt