Issues Aborting Jobs

jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton

Hi Guys,
I've been running into some issues lately with debugging jobs. Sometimes I use debug breakpoints to review
the data at that point, then stop the job to adjust for whatever I've seen there.

Recently I've had trouble getting a running job to stop. This happens both in debug mode and when running
a job normally. The log shows that SIGINT, SIGTERM and SIGKILL signals are sent to the process, but it never
ends: the job does stop processing data, yet it stays in a running state.
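The SIGINT, SIGTERM, SIGKILL sequence in the log is the standard escalation for stopping a process. The Python sketch below reproduces that escalation against an ordinary child process; it is a generic POSIX illustration, not how the DataStage engine actually stops a job, and the helper name `escalate_stop` and the grace period are hypothetical. If a process survives even SIGKILL, the usual Linux explanations are that it is blocked in an uninterruptible kernel wait (state D), where the kill may not take effect until the wait ends, or that it has already exited and is a zombie its parent has not reaped. Either could leave a job that has stopped processing data but still looks like it is running.

```python
import signal
import subprocess
import sys
from typing import Optional


def escalate_stop(proc: subprocess.Popen, grace: float = 2.0) -> Optional[str]:
    """Hypothetical helper: send SIGINT, then SIGTERM, then SIGKILL to a running
    child, waiting up to `grace` seconds after each one.

    Returns the name of the signal after which the child exited, or None if it
    is still alive after SIGKILL (for example, stuck in uninterruptible sleep).
    """
    if proc.poll() is not None:
        raise ValueError("process has already exited")
    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGKILL):
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace)  # returns once the child exits and is reaped
            return sig.name
        except subprocess.TimeoutExpired:
            continue  # still alive: escalate to the next, harsher signal
    return None


if __name__ == "__main__":
    # Demo child that ignores SIGINT and SIGTERM, so only SIGKILL can end it.
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time\n"
         "signal.signal(signal.SIGINT, signal.SIG_IGN)\n"
         "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
         "print('ready', flush=True)\n"
         "time.sleep(60)"],
        stdout=subprocess.PIPE, text=True,
    )
    child.stdout.readline()  # wait until the child has installed its handlers
    print(escalate_stop(child, grace=1.0))  # prints SIGKILL
```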

I've left a job in this state overnight and it was unchanged the next morning. I CAN finally get it to die if I
use the Cleanup Resources option in the Director and then log off the process associated with the job.

Here is where the REALLY annoying issue comes into play. After I have done this cleanup, 50% of the time the
job will not run again until the Engine server is rebooted. The following log is from one such job that I am
currently having this issue with. I've opened a case with IBM, but so far they haven't responded in two days.

The Job log can be found here:
https://raw.githubusercontent.com/jacks ... rorlog.txt
Last edited by jackson.eyton on Fri Dec 22, 2017 3:10 pm, edited 1 time in total.
-Me
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell

These kinds of issues are usually quite difficult to diagnose. Have you checked for project corruption with SyncProject recently?

The reason you are having to reboot the engine to restore job operation is that some resource tied to the job, like a semaphore
or lock, is not being cleared by the Cleanup Resources option. Rebooting is clearing that, and the job runs again.
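If you want to see what is left behind after Cleanup Resources, a read-only sketch like the following lists the System V semaphore sets owned by a given user. It assumes a Linux engine tier, where the kernel publishes the semaphore table in /proc/sysvipc/sem (the same data `ipcs -s` reports). Whether the resource tying up the job really is a System V semaphore, rather than a lock held inside the engine, is an assumption, and the function names are hypothetical; the sketch deliberately removes nothing.

```python
import os
from typing import Dict, List

SEM_TABLE = "/proc/sysvipc/sem"  # Linux-specific; absent on other platforms


def parse_sysvipc_sem(text: str) -> List[Dict[str, str]]:
    """Turn the contents of /proc/sysvipc/sem into one dict per semaphore set.

    The first non-blank line is a header naming the columns (key, semid,
    perms, nsems, uid, gid, ...). Values are kept as strings because perms
    is printed in octal while the other columns are decimal.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def read_semaphores(path: str = SEM_TABLE) -> List[Dict[str, str]]:
    """Read and parse the semaphore table; empty list if the file does not exist."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return parse_sysvipc_sem(f.read())


def sets_owned_by(rows: List[Dict[str, str]], uid: int) -> List[Dict[str, str]]:
    """Keep only the semaphore sets whose owner uid matches."""
    return [row for row in rows if int(row["uid"]) == uid]


if __name__ == "__main__":
    # Hypothetical usage: substitute the uid the DataStage jobs run under.
    for row in sets_owned_by(read_semaphores(), os.getuid()):
        print("semid", row["semid"], "nsems", row["nsems"], "perms", row["perms"])
```

Taking one snapshot before running the job and another after Cleanup Resources, then comparing the semid values, would show which sets (if any) were left behind.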

I don't think I'd worry about figuring out which resource is being tied up; that's really a symptom. The real
problem is the job hanging when you attempt to stop it.

There's no way for you to diagnose that kind of issue without customer service. What they are going to need you to do
is run a stack trace against a hung job. That will tell them which internal routine the job is currently running. At that
point they'll have to contact engineering and get them to say what that routine is executing.
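When collecting that information, it may also help to note which scheduler state the hung process is in. The sketch below assumes a Linux engine tier: it reads the one-letter state from /proc/<pid>/stat (documented in proc(5)) and builds a gdb command line that attaches to the process, prints a backtrace for every thread, and exits. Using gdb is an assumption (IBM support may ask for pstack or their own procedure instead), the helper names are hypothetical, and attaching to another user's process normally requires root or the same user.

```python
import os
from typing import List

# Meanings of the state letters in /proc/<pid>/stat (see proc(5)).
STATE_NAMES = {
    "R": "running",
    "S": "sleeping (interruptible)",
    "D": "uninterruptible wait (often I/O); even SIGKILL may not act until it returns",
    "Z": "zombie (exited, not yet reaped by its parent)",
    "T": "stopped",
    "t": "tracing stop",
    "X": "dead",
    "I": "idle kernel thread",
}


def process_state(pid: int) -> str:
    """Return the one-letter state of `pid` from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may itself contain spaces
    # or ')', so locate the LAST ')' and skip the following space.
    return data[data.rindex(")") + 2]


def gdb_backtrace_cmd(pid: int) -> List[str]:
    """Command line that attaches gdb to `pid`, prints all thread backtraces, then exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


if __name__ == "__main__":
    pid = os.getpid()  # placeholder: substitute the PID of the hung job process
    state = process_state(pid)
    print(pid, state, STATE_NAMES.get(state, "unknown"))
    # Run this command (with suitable privileges) to capture the stack trace.
    print(" ".join(gdb_backtrace_cmd(pid)))
```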

If you haven't already, I'd suggest running ISALite on your server and attaching its output, along with the Version.xml files,
the job .dsx export and the job logs, to your IBM ticket. Engineering won't even look at your problem unless they have ISALite
output showing that the server is in good working order. They'll need the Version.xml files to know exactly which code base to
look at internally during diagnosis.

Side note: the weird layout of your post is caused by the long PATH statements in a "code" block.
The browser doesn't want to mess with the code, even to insert line breaks, so the window gets VERY wide...
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett

Yeah, I usually go in and nuke those lines from the post since they don't really add any value. Just wasn't in the mood when I had the time and didn't have the time when I was in the mood. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton

Thank you both. I have edited my original post so that the job log is in a better-suited location. I usually think of that, and I can't really remember what I was or wasn't thinking with this post, so my apologies there.

I have run the health check for IBM and will continue to work with them. However, I am not familiar with SyncProject.
-Me