How to find out the processes forked by a DS job thru Unix?

synsog
Premium Member
Posts: 232
Joined: Sun Aug 01, 2010 11:01 pm
Location: Pune

Post by synsog »

Hi,

We have a project requirement to find all the processes forked by a DataStage job and to check whether any of them is using excessive CPU or I/O. This will be done with a shell script.

Please help me find a way to:

1. List all the processes created by running a DS job.
2. Find the CPU and I/O utilization of those processes. (I am using vmstat for CPU and iostat for I/O. Please suggest any more efficient ways to do this.)

We have to check these things via Unix only. No changes can be made to the jobs.

Thanks in advance.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

I tried to do this once. It is hard to do. You can set a parameter to show the PIDs in the log. You need to capture the processes along with their PIDs to trace them, and you need to capture the CPU utilization at the same time. There are public domain tools to do this. You then load all of this into separate tables and join across the timestamps. These are difficult joins because you are sampling the CPU information every 10 seconds, or whatever interval you choose. So a job starts up and maybe 80 or more processes kick off. You need to find the average CPU utilization for each process based on these samples.

10:00:00 Job1 starts
10:00:10 subprocess1 from job1 starts
10:10:00 subprocess1 from job1 ends
10:01:10 subprocess2 from job1 starts
10:08:00 subprocess2 from job1 ends
and so on

10:00:00 CPU 50%
10:01:00 CPU 90%
10:02:00 CPU 75%

Now average the CPU percents for each subprocess. Not easy.
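
A minimal sketch of the join described above, not a tested monitoring tool. It assumes two hypothetical text files: one with a line per subprocess (start time, end time, name) and one with a line per CPU sample (time, percent), with zero-padded HH:MM:SS times from the same day so that they compare correctly as strings. The function name avg_cpu_per_process and both file layouts are made up for illustration. If the samples are system-wide CPU figures, as in the timeline above, the result is the average system load while each subprocess was alive, not the CPU used by that subprocess itself.

```shell
#!/bin/sh
# avg_cpu_per_process INTERVALS SAMPLES   (hypothetical helper)
#   INTERVALS: lines of "start end name"   e.g. 10:00:10 10:10:00 subprocess1
#   SAMPLES:   lines of "time cpu_percent" e.g. 10:01:00 90
# Prints "name average" for each subprocess, or "name n/a" when no
# sample falls inside its lifetime.
avg_cpu_per_process() {
    awk '
        # First file on the command line is SAMPLES: remember every sample.
        NR == FNR { t[++n] = $1; cpu[n] = $2; next }
        # Second file is INTERVALS: average the samples inside [start, end].
        {
            sum = 0; cnt = 0
            for (i = 1; i <= n; i++)
                if (t[i] >= $1 && t[i] <= $2) { sum += cpu[i]; cnt++ }
            if (cnt > 0) printf "%s %.1f\n", $3, sum / cnt
            else         printf "%s n/a\n", $3
        }
    ' "$2" "$1"
}
```

With the subprocess times above and samples of 50, 90 and 75 percent at 10:00:00, 10:01:00 and 10:02:00, this prints 82.5 for subprocess1 and 75.0 for subprocess2.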
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use the Monitor in Director.
Double click on any component - its PID should be displayed amongst the other information.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by synsog »

Hi,

We need to do this via Unix only. We can't make any changes to the existing jobs or use the Director client.

We ran ps aux to get the top 10 CPU-consuming processes, and to find the running DS jobs we grepped for DSD.RUN. Now the problem is how to associate the job PIDs with the top 10 process PIDs that we got from ps aux.

It is possible that the job forked some processes and that one of them is consuming the CPU. So what we want is the list of processes forked by that particular job.

The commands I tried for finding the subprocesses are:

proctree
ps -fL

Please let me know if there is any other way for finding the subprocesses of a DS job in Unix.
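
One possible way to tie a top-CPU PID back to a job is to walk upward through the parent links until a process whose command line contains DSD.RUN is reached. The sketch below assumes that the job process shows DSD.RUN in its command line (as in the grep mentioned above) and that the parent links are intact; if DataStage detaches some processes from the job, the walk will not reach it. The function name job_ancestor is hypothetical.

```shell
#!/bin/sh
# job_ancestor PID   (hypothetical helper)
# Reads "pid ppid command..." lines on stdin and prints the PID of the
# nearest process at or above PID whose line matches DSD.RUN.
# Prints nothing when the chain ends without finding one.
job_ancestor() {
    awk -v start="$1" '
        { parent[$1] = $2; line[$1] = $0 }
        END {
            p = start
            # Stop on an unknown PID; the hop limit guards against cycles.
            for (hops = 0; hops <= NR && (p in parent); hops++) {
                if (line[p] ~ /DSD\.RUN/) { print p; exit }
                p = parent[p]
            }
        }
    '
}
```

A call could look like ps -e -o pid= -o ppid= -o args= | job_ancestor 12345, where 12345 stands for one of the PIDs from the top 10 list.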
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Wouldn't all of the subprocesses show their parent's PID, i.e. their PPID? Am I missing something or couldn't you just simply walk that chain of pids? :?
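
A rough sketch of walking that chain in a shell script, assuming the PID/PPID links really do lead back to the job process. It reads "pid ppid" pairs and prints every descendant of a given PID. The function name descendants is made up for illustration. Any process that has been re-parented (for example an orphan adopted by init) will not be found this way.

```shell
#!/bin/sh
# descendants ROOT_PID   (hypothetical helper)
# Reads "pid ppid" pairs on stdin and prints the PID of every child,
# grandchild, and so on, of ROOT_PID (ROOT_PID itself is not printed).
descendants() {
    awk -v root="$1" '
        { pid[NR] = $1; ppid[NR] = $2 }
        END {
            want[root] = 1
            # Keep sweeping the table until a pass adds nothing new.
            do {
                added = 0
                for (i = 1; i <= NR; i++)
                    if ((ppid[i] in want) && !(pid[i] in want)) {
                        want[pid[i]] = 1
                        print pid[i]
                        added = 1
                    }
            } while (added)
        }
    '
}
```

It could be fed from ps, for example ps -e -o pid= -o ppid= | descendants "$job_pid", where job_pid is a placeholder for the PID of the job process.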
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by kduke »

chulett wrote:Wouldn't all of the subprocesses show their parent's PID, i.e. their PPID? Am I missing something or couldn't you just simply walk that chain of pids? :?
I thought so too, but that is not how they implemented it. They phantom some, which means they have no parent. I think the section leaders are that way. I am pretty sure that means they now become the parent, not the original job process.
Mamu Kim

Post by ray.wurlod »

synsog wrote:We need to do this via unix only.
Why?

Why did you bother buying the DataStage tool?

Post by synsog »

chulett wrote:Wouldn't all of the subprocesses show their parent's PID, i.e. their PPID? Am I missing something or couldn't you just simply walk that chain of pids? :?
We tried using proctree <job_pid> and ps -fL <job_pid> to get the subprocesses created by the job process, but it's not working. It just shows two processes:

1. The running job
2. Osh Monitor for that job

Post by synsog »

ray.wurlod wrote:
synsog wrote:We need to do this via unix only.
Why?

Why did you bother buying the DataStage tool?
Ray,

We need a monitoring script in Unix that we will use to find the CPU and I/O usage of the running DS jobs, so that we can isolate the bad jobs (the ones causing high disk and CPU usage).

For this we just need a way to capture all the subprocess IDs created by a job process; then we are going to check the CPU and I/O usage, all in one script run.

Post by synsog »

kduke wrote:
chulett wrote:Wouldn't all of the subprocesses show their parent's PID, i.e. their PPID? Am I missing something or couldn't you just simply walk that chain of pids? :?
I thought so too, but that is not how they implemented it. They phantom some, which means they have no parent. I think the section leaders are that way. I am pretty sure that means they now become the parent, not the original job process.
Kim,

Is there any way to list the process IDs of the subprocesses created by the DS job? Or is it just a wild goose chase :( ? Please suggest what approach we should take if we really need to find the PIDs...

Post by ray.wurlod »

The Performance Monitoring tool within DataStage will give you all of the things you mentioned.

Post by kduke »

Not sure what Ray is talking about. Sounds like you need to try it.

My way is to turn on the parameter to show PIDs. Run one job. Get the PIDs from the log file. While the job is running, run ps -ef >outfile1.txt several times, incrementing the 1 to 2, then 3, and so on. Load the text files into a table or spreadsheet. Find all the PIDs and add their sizes together. That will give you the RAM used.
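
The snapshot-and-sum steps above could be scripted roughly as follows. This is only a sketch under some assumptions: ps -ef usually does not print a size column, so the snapshot asks ps for pid and vsz explicitly (vsz is the virtual size in 1024-byte units, which overstates actual RAM; many systems also offer an rss column), and the job's PIDs are assumed to have been saved one per line in a file. The function names take_snapshots and sum_vsz are hypothetical.

```shell
#!/bin/sh
# take_snapshots COUNT INTERVAL_SECONDS   (hypothetical helper)
# Writes outfile1.txt, outfile2.txt, ... each holding "pid vsz" pairs
# for every process on the machine at that moment.
take_snapshots() {
    i=1
    while [ "$i" -le "$1" ]; do
        ps -e -o pid= -o vsz= > "outfile$i.txt"
        if [ "$i" -lt "$1" ]; then
            sleep "$2"
        fi
        i=$((i + 1))
    done
}

# sum_vsz PIDFILE SNAPSHOT   (hypothetical helper)
# PIDFILE holds one PID per line (for example copied from the job log).
# Prints the total vsz, in KB, of those PIDs found in SNAPSHOT.
sum_vsz() {
    awk '
        NR == FNR { want[$1] = 1; next }   # first file: PIDs of interest
        ($1 in want) { total += $2 }       # second file: "pid vsz" pairs
        END { print total + 0 }
    ' "$1" "$2"
}
```

For example, take_snapshots 6 10 followed by sum_vsz job_pids.txt outfile1.txt, where job_pids.txt is a placeholder name for the saved PID list.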
Mamu Kim