Best way to call a Shell Script

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

DSFreddie
Participant
Posts: 130
Joined: Wed Nov 25, 2009 2:16 pm

Best way to call a Shell Script

Post by DSFreddie »

Hi All,

I know there are multiple ways to call a shell script in DataStage, such as:

1. Execute command activity
2. Before/After job Subroutine
3. Transformer stage, etc.

I am trying to figure out the best way in terms of overall performance. Can anyone shed some light on the ideal way to execute a shell script in DataStage? (Our platform is GRID enabled.)

Thanks
Freddie
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm not sure there's a "best" way, performance-wise or otherwise. Basically each option just shells out to the O/S and runs the script. The real differences are where the call happens and what the script needs to accomplish beyond succeeding or failing. #1 and #2 run once per job run, while #3 would run for every record through the job. #1 could give you conditional control of downstream decisions / processes, while #2 would just run before or after the job and maybe make it go boom. I think #3 would generally be a mistake, but simply saying "call a shell script" is a very wide-open topic, as I'm sure you knew. Care to narrow it down a bit?

Not anything I've worked with but I assume GRID would only affect the answer for #3. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

As Craig mentions, it depends on some specifics, but I use a specific standard for the first decision about "where": what level of error handling does it require?

The Exec stage is best for this. It handles return codes, it provides explicit tracing of parameters, and (most important) writes useful entries to the job log... well, not all log entries are equal, but at the job sequence level I am also afforded explicit error handling with checkpoints.

That last is critical here. If I can fix an underlying problem and rerun without further intervention, that is my first choice. The key point is that when a Px job abends, its parameters' values are embedded. Restarting at the Px job means that a badly valued parameter can't be fixed. I have to reset the job and the sequence to correct a bad parameter value. If I used a script to provide that value, and the abort can be handled at the script, I avoid extra work on the entire job.

Example:

UserVariables ==> Execute Script ==> Activity Stage Px
Checkpoints are active. A terminate stage is a second link for the script and Px job. Triggers are set to examine the critical values.

So, for a hypothetical situation, a script retrieves the name of a file using the Unix ls command. The Px job uses the file name. The Exec stage triggers cause an abort if the ls command finds no file. If you don't abort at the Exec stage, you can't restart at the Px job after "fixing" the file situation, because the name of the fixed file is different from the name of the file that was not found.
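A minimal sketch of the script side of that hypothetical, for anyone who wants something concrete. The function name, directory and file pattern are made up for illustration and are not from any real job. The only point is that the script returns non-zero when ls finds nothing, so the Execute Command trigger has a return code to test before the Px job ever starts.

```shell
#!/bin/sh
# Hypothetical helper: print the newest file in a directory that matches a pattern.
# Returns non-zero when nothing matches, so the calling sequence can abort at the
# script instead of inside the Px job.
find_input_file() {
    dir=$1
    pattern=$2
    # $pattern is left unquoted on purpose so the shell expands the glob here.
    # ls -t sorts newest first; its "No such file" message is discarded.
    file=$(ls -t "$dir"/$pattern 2>/dev/null | head -n 1)
    if [ -z "$file" ]; then
        echo "No file matching $pattern in $dir" >&2
        return 1
    fi
    echo "$file"
}

# Illustrative call (path and pattern are assumptions):
#   find_input_file /data/landing 'orders_*.dat' || exit 1
```

The file name goes to stdout and the error message to stderr, so a downstream activity that reads the command output only sees the name.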
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That "Exec" stage is the Execute Command stage, yes? Just wanted to make sure it wasn't that odd variant that is only available on a Windows installation that I don't quite remember the name of... ah, the Command stage. Never mind. :wink:
-craig
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I avoid using the correct name "Execute Command" because it sounds like I'm ordering someone's demise.

But yes, that's the one, officer. I saw it do the deed.
:lol:
Franklin Evans
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It probably doesn't matter all that much. In each case the executing process has to fork a child process to execute the shell to execute the shell script. Most of the "performance" (whatever that means) impact is in the creation and management of the child process, which is the same for all three approaches.
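A small sketch of that child-process step as seen from a plain shell. It is not DataStage-specific, and child_pid is a hypothetical helper name. Each call launches a new shell as a child, which reports a PID different from the parent's. That launch is the step all three approaches are assumed to share.

```shell
#!/bin/sh
# Hypothetical helper: start a child shell and have it print its own PID.
child_pid() {
    # Single quotes keep $$ from being expanded by the parent shell.
    sh -c 'echo $$'
}

echo "parent shell PID: $$"
echo "child shell PID:  $(child_pid)"
```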
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.