Sequence job aborted after 'Waiting for job to start'

Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus

Post by Palermo »

Hi all,

I am facing a serious problem. Sequence jobs have been aborting all week, and it happens to different sequence jobs seemingly at random. Before that, the jobs had worked fine for the last 6 months.

Here is an example from the log. As you can see, TRIGGERS_JobSeq (1) started PRCSSD_TRGGR_IND_SET_Y_PJob (2), but (1) aborted while (2) finished. Starting time = 120 seconds.

[Images: two log screenshots]

After rerunning (1) they both finished successfully:

[Image: log screenshot]

What has been done so far?
1) The DS server was restarted.
2) DSWaitStartup and DSWaitStartup were changed to 120 (although the log doesn't show any timeout-related errors. Why not, if that is the problem?)
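
To illustrate how a controller can abort while the job it launched still finishes, here is a minimal, generic sketch of a start-up wait with a timeout. It is not DataStage's actual implementation; `wait_for_start`, `has_started`, and the injectable `clock`/`sleep` parameters are hypothetical names used only for this illustration.

```python
import time
from typing import Callable


def wait_for_start(has_started: Callable[[], bool],
                   timeout_s: float,
                   poll_interval_s: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until has_started() returns True or timeout_s elapses.

    Returns True if the child was seen starting in time, False on timeout.
    Hypothetical helper: a generic polling loop, not DataStage code.
    """
    deadline = clock() + timeout_s
    while True:
        if has_started():
            return True
        if clock() >= deadline:
            # The controller gives up here; the child may still start
            # later and run to completion on its own.
            return False
        sleep(poll_interval_s)
```

In a loop like this, a child that needs 150 seconds to start under a 120-second limit makes the caller give up, even though the child itself later runs normally. That would match the pattern of (1) aborting while (2) finishes.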

Please advise how to fix this. Many thanks in advance for your help.
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Did anything else change? Are you running more jobs, was there a server OS update, anything like that? How many jobs are running, and how many are allowed (Operations Console)?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ah yes, the proverbial question - what changed? Obviously something did.

Typically, when I see someone post about seemingly random issues with jobs not starting within the timeout limit but then running fine later, it is almost always an indication of a resource issue on the server. So I have the same questions UCDI posted...
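
One way to check for this kind of resource pressure is to log timestamped load samples and compare them with the times the sequences aborted. Below is a minimal sketch, assuming a Unix server (`os.getloadavg` is not available on Windows). The function names and the "load above CPU count" threshold are illustrative assumptions, and load average is only one of several resources worth watching.

```python
import os
import time
from datetime import datetime


def is_overloaded(load_1min: float, cpu_count: int, ratio: float = 1.0) -> bool:
    """True when the 1-minute load average exceeds ratio * CPU count.

    The threshold is an assumption for illustration, not a DataStage rule.
    """
    return load_1min > ratio * max(cpu_count, 1)


def format_sample(timestamp: datetime, load_1min: float, cpu_count: int) -> str:
    """Render one log line that can be lined up against job log times."""
    flag = "OVERLOADED" if is_overloaded(load_1min, cpu_count) else "ok"
    return (f"{timestamp:%Y-%m-%d %H:%M:%S} "
            f"load1={load_1min:.2f} cpus={cpu_count} {flag}")


def sample(count: int, interval_s: float = 30.0) -> None:
    """Print `count` samples, one every `interval_s` seconds (Unix only)."""
    cpus = os.cpu_count() or 1
    for i in range(count):
        load_1min, _, _ = os.getloadavg()
        print(format_sample(datetime.now(), load_1min, cpus), flush=True)
        if i < count - 1:
            time.sleep(interval_s)
```

Calling `sample(120, 30.0)` would cover roughly an hour. Because the load average counts every process on the machine, it also reflects non-DataStage work.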
-craig

"You can never have too many knives" -- Logan Nine Fingers
Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus

Post by Palermo »

UCDI,

At the moment 44 jobs are running. The OS was not updated. Workload Management was disabled 1.5 years ago, and I am not sure whether the following parameters limit the number of running jobs: T30FILE=4096, RLTABSZ=480 (Maximum running jobs=900)

chulett - I agree with you. The support team opened a PMR ticket to monitor and assess server resources.

Thanks.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Those settings tend to limit the number of running jobs by causing any over the limit to blow up... and throw very specific errors pointing to them as the culprit, from what I recall. They're not the issue.

And realize that the "resource issue" isn't confined to just what DataStage things are running on the server...
-craig

"You can never have too many knives" -- Logan Nine Fingers
Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus

Post by Palermo »

The support team reported that the cause was GSKit. The problem is now solved.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

GSKit? Not something that's ever been posted here before, can you (or anyone else) elaborate a bit? Thanks.
-craig

"You can never have too many knives" -- Logan Nine Fingers
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

The latest versions of IIS use the Global Security Kit (GSKit) for both encryption and SSL communication... by default.

It would be nice to learn more about the root cause and the solution. I could foresee issues if two versions of GSKit were installed on the server (two DS versions side by side with itag? or other IBM products using a different version of GSKit)...
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses"
Palermo
Participant
Posts: 24
Joined: Fri Dec 12, 2014 2:05 pm
Location: Belarus

Post by Palermo »

I don't know the details because I am a developer and I was not involved in solving the problem.