dsjob Command - Handling Error Scenario

rbk
Participant
Posts: 23
Joined: Wed Oct 23, 2013 1:10 am
Location: India

Post by rbk »

Dear All,
Another day, another request for suggestions and help.

I am trying to create a UNIX script to trigger the jobs. These scripts would then be scheduled to run as per the requirements. I have encountered some trouble with certain scenarios and would like the advice of the experts here.

Please note that I am basically testing out error scenarios.

Background:
1. All the jobs triggered through this script will be sequencers with restartability options enabled. (All four properties in Job Properties, plus exception handling, are also in place.)
2. A conscious decision has been made not to reset any jobs from the script, since that would remove the jobs' restartability. Where a reset is definitely needed, the Support team will do it.
3. Sequencers have parameter sets with different value files for each environment. The value file name is read from a text file in the script.

Scenario:
1. Execute the job with an invalid value file name. The job aborts as expected and returns code 3.
2. In Director, the job shows Aborted status, not Aborted/Restartable.
3. On triggering the script again after providing the correct value file name, the job does not start and dsjob exits with status code -2 (DSJE_BADSTATE).
4. I assume this happens because we are not resetting the job, so it cannot proceed with a successful run.
5. The same issue does not seem to occur if an underlying job fails for some reason; in that case the job goes into Aborted/Restartable status.

Kindly help me understand the behaviour in such a case. Also, is there any way we can handle such a scenario in the job?

Are there any other scenarios we need to handle in our script, such as return codes 255, 1002, etc.?

Thank you for all the help in advance.
Cheers,
RBK
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Your design description matches what I have here. Some details you didn't mention...

Do you have a terminator activity on a failed trigger link? I've found this to be necessary to clarify the restart checkpoints. My guess is that your job is not in Aborted/Restartable because you don't have one.

Error conditions in jobs will transmit back to your script process, but you need to explicitly handle them in the script. I'm just an adequate script guy, so get someone good at it to review your script. If that's you, a second pair of eyes is always a good idea.
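As an illustration of handling those conditions explicitly, here is a minimal sketch of a status-classification helper. It assumes the job is started with dsjob -run -jobstatus, so that dsjob waits for the job and its exit status reflects the job's finishing status (1 = OK, 2 = warnings, 3 = aborted, matching the code 3 reported above). The helper name classify_dsjob_rc is hypothetical, and the list deliberately covers only the codes discussed in this thread; 255 and 1002 are not mapped because their meaning is not confirmed here.

```shell
# Sketch: map the exit status of "dsjob -run -jobstatus ..." to a label.
# ASSUMPTION: with -jobstatus the exit status is the job's finishing status:
#   1 = finished OK, 2 = finished with warnings, 3 = aborted.
# dsjob's own errors are negative (DSJE_BADSTATE is -2), but the shell only
# sees the low 8 bits of an exit status, so -2 arrives as 254.
classify_dsjob_rc() {
  case "$1" in
    1)   echo "OK" ;;
    2)   echo "WARN" ;;
    3)   echo "ABORTED" ;;
    254) echo "BADSTATE" ;;
    *)   echo "UNKNOWN($1)" ;;
  esac
}
```

In the calling script, capture rc=$? straight after the dsjob call, then treat anything other than OK or WARN as a failure (for example, exit non-zero so the scheduler flags it). BADSTATE is the case where the job needs a reset before it can run again.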

Some error conditions, such as those coming out of the FTP Enterprise stage, are reported in Info entries in the Director log; the Fatal entries that follow lack those details. I've toyed with querying the log to get them, but haven't done much with it. However, I have trained our support staff to read the Director log, and they triage and handle system-level errors themselves very well.

I believe that defining triggers is key. Let me know how it goes with you.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872

Post by rbk »

Thank you so much for your response Franklin.

We have designed the sequencer so that we move to the next activity only on successful completion of the current one. Where certain Execute Command stages return specific values, we handle them with appropriate trigger values.

We have only an Exception Handler set up, which we expect to capture any kind of failure, and it is linked to a Terminator Activity.

Also, please note that the job fails to reach Aborted/Restartable status only when the value file name is not set up properly. In other words, I assume the job stays in Aborted status when none of the activities inside the sequencer have completed and no checkpoints have been set. When the job has started and a few activities have completed successfully, it goes into Aborted/Restartable status.

I hope this gives a clear idea of the existing framework. Please feel free to ask for more information. Kindly help me out with this issue.

Thank you,
Cheers,
RBK

Post by FranklinE »

I'm glad to try to help.

At which stage do you check the file name? Do you have a Failed trigger link for it? The only other explanation I can think of is that the stage itself is failing to set a checkpoint. Some stages have a check box to not set a checkpoint.

Looks like a small mystery. Good luck.

Post by rbk »

Hi Franklin,
Actually we are not picking up the value file within the job.
Instead, the script identifies the hostname and, based on it, captures the value file name.

Step 1:
Say for example
(Just representation)
if $varHostName is 'xyz' then varEnvId='DEV'
else if $varHostName is 'ABC' then varEnvId='QA'
else varEnvId='env'
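The representation above could be written as a POSIX shell case statement. This is a minimal sketch: the hostnames 'xyz' and 'ABC' and the fallback 'env' are taken from the representation, while the helper name get_env_id and the use of uname -n to read the host name are assumptions for illustration.

```shell
# Sketch of Step 1: map the host name to an environment id.
# 'xyz', 'ABC' and the fallback 'env' come from the representation above;
# get_env_id is a hypothetical helper name.
get_env_id() {
  case "$1" in
    xyz) echo 'DEV' ;;
    ABC) echo 'QA' ;;
    *)   echo 'env' ;;   # unknown host: falls through to the placeholder value
  esac
}

varEnvId=$(get_env_id "$(uname -n)")
echo "Environment id: $varEnvId"
```

The patterns are matched case-sensitively. On hosts where uname -n returns a fully qualified name, they would need to be widened (for example xyz*).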

Step 2:
We have a control file containing all the parameters in param=value format, including one entry for the parameter set (PS).
The placeholder for the PS is set as PS=ENVIND, which we replace with the value identified above:

sed "s/ENVIND/$varEnvId/g" controlfile

The result is passed to dsjob as the value file name:

dsjob .....-param PS=DEV.....

So when the placeholder in the control file is set as ENVIN instead of ENVIND, the sed command has nothing to replace, and dsjob runs with -param PS=ENVIN instead of -param PS=DEV.
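Because sed silently leaves an unmatched line unchanged, one way to catch this case is to check the substituted PS entry before dsjob is called. This is a minimal sketch: the helper name resolve_ps is hypothetical, and it assumes the PS entry is a line that starts with PS= in the control file, as described above.

```shell
# Sketch: substitute the ENVIND placeholder and confirm the PS entry took it.
# resolve_ps is a hypothetical helper; $1 = expected env id, $2 = control file text.
resolve_ps() {
  env_id=$1
  control_text=$2
  # Same substitution as the script, then extract the value of the PS= line.
  ps_value=$(printf '%s\n' "$control_text" | sed "s/ENVIND/$env_id/g" | sed -n 's/^PS=//p')
  if [ "$ps_value" != "$env_id" ]; then
    # Placeholder missing or mistyped (e.g. ENVIN): sed replaced nothing.
    echo "ERROR: PS resolved to '$ps_value', expected '$env_id'" >&2
    return 1
  fi
  printf '%s\n' "$ps_value"
}
```

A script could call it as ps_value=$(resolve_ps "$varEnvId" "$(cat controlfile)") || exit 1, so a bad placeholder stops the run before the sequence is ever started and left in Aborted status.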

The more I think about it, the more it feels like an invalid test scenario. Kindly share your thoughts, and any ways to get around this issue.
Cheers,
RBK

Post by FranklinE »

That's great information. I now believe your error is being caught by the runtime engine before the first executable code is read.

What you describe is an OS/environment error from the command line. I hope others will take a look and confirm or correct me.

System-level errors are rare, but I don't agree that creating errors at that level in a test is invalid. If it can happen there, it can happen in your production environment. If I had a dollar for every error that happened after someone said "oh, that can't/won't happen", I'd be retired by now.

Because it looks like it should be rare, I wouldn't personally provide error handling for it. Errors at that level can also be secondary effects from something more serious that won't be noticed otherwise (or until it creates a more widespread series of errors). In cases like this, I just expect to intervene with a manual job reset in Director after correcting the cause.

EDIT: The best way to avoid the abort status is to validate the parameter before sending the command line to the runtime engine. You did ask for suggestions on handling or avoiding the error, and assuming my previous conclusions are correct, that is what I would try next.
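For the invalid value file name case specifically, one pre-check is to confirm that the value file exists before calling dsjob. This is a sketch under a stated assumption: that parameter set value files are stored as <project directory>/ParameterSets/<parameter set name>/<value file name>, which should be verified against your own installation. The helper name value_file_exists and its arguments are hypothetical.

```shell
# Sketch: confirm a parameter set value file exists before calling dsjob.
# ASSUMPTION: value files live at <project_dir>/ParameterSets/<ps_name>/<value_file>.
value_file_exists() {
  project_dir=$1
  ps_name=$2
  value_file=$3
  path="$project_dir/ParameterSets/$ps_name/$value_file"
  if [ ! -f "$path" ]; then
    echo "ERROR: value file not found: $path" >&2
    return 1
  fi
  return 0
}
```

For example, value_file_exists "/path/to/Projects/MYPROJ" PS "$varEnvId" || exit 1 (the project path is a placeholder) would stop on PS=ENVIN without the job ever being started.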

We have similar validations in our script. It's generic, it sources a project-specific ssi file, and it contains basic "allowed value" tests on critical parameters. One is the syslevel, which defines the processing environment (DEV, TEST, PRD). My rule of thumb: if the list of allowed values is longer than five, I have a design flaw.
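An "allowed value" test of that kind can be a single case statement. This minimal sketch uses the DEV/TEST/PRD syslevel list from the post. The helper name validate_syslevel is hypothetical, and the list would be replaced with whatever values your own environments use (DEV and QA in the script described earlier).

```shell
# Sketch: allowed-value check on a critical parameter, run before dsjob.
# The DEV/TEST/PRD list is the syslevel example from the post; adjust as needed.
validate_syslevel() {
  case "$1" in
    DEV|TEST|PRD) return 0 ;;
    *)
      echo "ERROR: invalid syslevel '$1' (allowed: DEV TEST PRD)" >&2
      return 1
      ;;
  esac
}
```

Calling validate_syslevel "$varEnvId" || exit 1 just before the dsjob line means a value such as ENVIN or the 'env' fallback never reaches the runtime engine, so the job is not left in a state that needs a manual reset.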