How can I generate ETL stats on all jobs in a project?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
saikailas
Premium Member
Premium Member
Posts: 3
Joined: Tue Mar 07, 2006 10:16 pm
Location: New Jersey

How can I generate ETL stats on all jobs in a project?

Post by saikailas »

I was looking at DSXchange to find the approach to generate the report on datastage project with the run times of each sequencer, sub jobs and the stages in the individual jobs. The details provided in the discussion forums are very helpful and make the work easier.
I ran the dsjob - report on a parallel job and i dont see the information regarding dataset stages.

I wanted to know if I can use the ETL stats to generate the run time stats for all the jobs in a project on Version 8x ( parallel jobs) or if there is any other utility to accomplish this. Please let me know if any steps that I should follow.
Thanks

[* Note - Title made more descriptive - Andy *]
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Modify stage seems to block row counts. I have never seen an issue on datasets. The jobs need to be modified to get row counts for PX jobs. Vincent and others have modified it so it works fine. The issue is PX store row counts at the partition level. So you either need to aggregate all the partitions or modify ETL_ROW_HIST to have partition id as part of the key.

The Driver job can take a job name, sequence name, folder name or 'ALL' and will get row counts for one job or all jobs in a sequence, folder or just all jobs. Big project can run a hour or more. Very useful though.
Mamu Kim
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Someone with version 8 needs to respond to how it works in that version. I do not have v8 to try it.

If someone fixes these jobs and wants to post their fix for PX along with table changes then zip up your dsx and ddl and email it to me. I will post it on my web site. I will consider posting any other generic job or script someone might think is useful. Make sure your name is in the job properties and script comments. I do not want to take credit for another persons work.
Mamu Kim
sohasaid
Premium Member
Premium Member
Posts: 115
Joined: Tue May 20, 2008 3:02 am
Location: Cairo, Egypt

Post by sohasaid »

Saikailas & kduke,

would you please share any posts or topics which can guide me to generate such ETL status report for project jobs because i'm quite new to this & I need it urgently?

Regards.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Seems like you might not be aware that ETLStats is a pre-built package of jobs / scripts that Kim Duke create and donated to the community via his website. You'll probably need / want to tweak or extend it and you may have issues with 8.x jobs but at least it will give you a huge leg up with your 'urgent' need.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sohasaid
Premium Member
Premium Member
Posts: 115
Joined: Tue May 20, 2008 3:02 am
Location: Cairo, Egypt

Post by sohasaid »

Thanks chulett, you're right I've no previous idea about this feature in DataStage and it's urgent as the client needs such an ETL report and I need to know how to implement first, and how much time it will take.

Thanks again.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Do you have Metadata Workbench?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
saikailas
Premium Member
Premium Member
Posts: 3
Joined: Tue Mar 07, 2006 10:16 pm
Location: New Jersey

Post by saikailas »

Ray,
We do not have Metadata Workbench.
John Smith
Charter Member
Charter Member
Posts: 193
Joined: Tue Sep 05, 2006 8:01 pm
Location: Australia

Post by John Smith »

sohasaid wrote:Thanks chulett, you're right I've no previous idea about this feature in DataStage and it's urgent as the client needs such an ETL report and I need to know how to implement first, and how much time it will take.

Thanks again.
sorry to disappoint you but in this forum nothing is urgent as everyone is just a volunteer. How long is a piece of string? You have not specified your requirements and even then it depends on who is doing the job. An expert could do it in less time than a beginner. btw you should also have courtesy to start a new thread and not hijack someone else's thread.
DS consultant.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

XMETA (Repository) will be the right place to look for to generate such a report in 8x versions (Infosphere) especially if logging on to XMETA is switched on.
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Before deciding where all this should come from, it would help to know what exactly it is that you need in this 'ETL report'. Do you? Can you articulate it here for us? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Craig is correct there are lots of ways to get this information. Tell us what you need and why and maybe we can help you better.

XMETA is fine for the last job run but you need to setup Metadata Workbench to get a history of row counts. Do a search there are many topics about all this.
Mamu Kim
saikailas
Premium Member
Premium Member
Posts: 3
Joined: Tue Mar 07, 2006 10:16 pm
Location: New Jersey

Post by saikailas »

Thank you all very much for the inputs.

Details on the ETL Report that I need.
We have some Sequencer Jobs and each sequencer is calling subjobs and also sub sequencer jobs.
ETL report should generate the run times of the sequencers, subjobs and the stages/links in the subjobs at the project level.
I'm able to generate the xml, detail report on individual jobs through dsjob -report command, but wanted to generate this report at the project level. I can get the start and end times of the stages from the director/monitor, but taking into consideration the large number of jobs that we have, wanted to generate the report so that the jobs/stages/links running for long hours can be traced and can proceed with redesign for better performance.
dsedi
Participant
Posts: 220
Joined: Wed Jun 02, 2004 12:38 am

Post by dsedi »

saikailas wrote:Thank you all very much for the inputs.

Details on the ETL Report that I need.
We have some Sequencer Jobs and each sequencer is calling subjobs and also sub sequencer jobs.
ETL report should generate the run times of the sequencers, subjobs and the stages/links in the subjobs at the project level.
I'm able to generate the xml, detail report on individual jobs through dsjob -report command, but wanted to generate this report at the project level. I can get the start and end times of the stages from the director/monitor, but taking into consideration the large number of jobs that we have, wanted to generate the report so that the jobs/stages/links running for long hours can be traced and can proceed with redesign for better performance.
ETLstats do exactly all the above plus much more ....

We have done a simple job recently which creates a CSV which contains all jobs, catogroy, status start time and elapsed time.
the outline we have used is, query the universe in a server job for jobs and their category names and then pass them in a transformer through the separate routines which is having the following functions with proper error handling etc..
DSGetJobInfo(hJob1,DSJ.JOBSTATUS)
DSGetJobInfo(hJob1,DSJ.JOBSTARTTIMESTAMP)
DSGetJobInfo(hJob1,DSJ.JOBELAPSED)
Please note that we haven't done anything for stage/link level.

Hope this helps a little.
Accept that some days you're the pigeon and some days you're the statue.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I have built some nice tree controls using some of these jobs where you can drill down into a sequence and see all the jobs in that sequence and if there is another sequence drill down into it.

I think there is a a job dependency table. If not then I will be glad to add one. I like small sequences and to see how you started this sequence is difficult without this kind of table. There is a routine which will rebuild the job dependency relationship if your job properties is out of sync. I will post the code if anyone needs it.

There are also included several jobs which will list long running jobs. The reports are really nice and will email you their results if you get that part of EtlStats working.
Mamu Kim
Post Reply