Joblogging
Moderators: chulett, rschirm, roy
Hello,
For each DataStage job that retrieves data from a database, I want to write the start and end time, plus some additional information, to a database table.
For the start-time entry I use a stage variable RowCount: when RowCount = 1 I write the start of the job to the table.
For the end of the job I have looked through the DS functions but couldn't find anything appropriate.
Can someone help me with this challenge? Also, is there a more efficient way to determine the start of the job than the RowCount approach?
Kind Regards,
Jan
It is much easier to log the job runtime information after the job has finished; then you can use documented calls such as DSGetJobInfo() to return this information.
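For example, an after-job subroutine can query the job it runs inside through the DSJ.ME handle, so no attach step is needed. A minimal sketch (the routine name and log message are illustrative only):

```
* After-job subroutine sketch: report run times for the current job.
* DSJ.ME refers to the job this routine is attached to.
Subroutine LogJobTimes(InputArg, ErrorCode)
   ErrorCode = 0   ;* 0 = success; non-zero would abort the job
   JobName = DSGetJobInfo(DSJ.ME, DSJ.JOBNAME)
   JobStarted = DSGetJobInfo(DSJ.ME, DSJ.JOBSTARTTIMESTAMP)
   JobEnded = DSGetJobInfo(DSJ.ME, DSJ.JOBLASTTIMESTAMP)
   Call DSLogInfo("Job ":JobName:" ran from ":JobStarted:" to ":JobEnded, "LogJobTimes")
Return
```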
How to load the job log from a routine into a database table
Hello,
The solution you provided, logging the job details from a routine using the DSGetJobInfo() macro, is perfect. But how can we load this job-log data into a database table directly from the after-job routine?
ArndW wrote:It is much easier to log the job runtime information after the job has finished; then you can use the documented calls such as DSGetJobInfo() to return this information. ...
The easiest way is to have the after job routine create a text file with the information, then call up a DataStage job which loads that to a table.
Hi ArndW,
Thank you very much for your answer.
Our requirement is that after every DataStage job runs, a routine should fire that captures all the job-log details and loads them into three different control tables.
As I am a beginner in DataStage, can you please point me to a specific function or macro to use in the routine to call the DataStage job?
Thanks in advance for your help.
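To call a DataStage job from a routine, the documented job-control functions are DSAttachJob, DSRunJob, DSWaitForJob and DSDetachJob. A minimal sketch, assuming a loader job named "LoadJobLog" (substitute your own job name):

```
* Sketch: start a load job from within a routine.
* "LoadJobLog" is a placeholder job name, not a real job.
JobHandle = DSAttachJob("LoadJobLog", DSJ.ERRFATAL)
ErrCode = DSRunJob(JobHandle, DSJ.RUNNORMAL)   ;* request a normal run
ErrCode = DSWaitForJob(JobHandle)              ;* block until the job finishes
Status = DSGetJobInfo(JobHandle, DSJ.JOBSTATUS)
ErrCode = DSDetachJob(JobHandle)
```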
If you need to extract this information immediately after every run, you would have to either:
a) use an after-job subroutine that is fired off regardless of job status;
b) if you use an external scheduler, create your own "interlude" dsjob call which not only starts the job but also starts the collection of job-logging information;
c) have an independent process collect all the log information at a regular interval. If the same job instance is re-run within that interval, only the last run's information is kept, so some loss is possible;
d) modify the DSD.RUN internal call to also execute a logging routine (this is not recommended);
e) if you use sequences for all your job runs, add a call to the logging procedure to each sequence.
Once you have decided on your logging method, we can look at the easiest way of collecting this.
I should also have added that you can look at Kim's ETL utilities, which collect some of this information as well.
At present I am doing something similar for a large implementation. It involves quite a lot of coding to ensure that no logging information is lost; it collects logging information for 8 large systems and over 60 projects, with more than 30,000 jobs running per day. The data is stored in 4 normalized tables in a database and is used for a lot of statistical analysis. It is not an easy task to do correctly and comprehensively.
EtlStats is free. You can download it from my tips page or the other link below. You can call one job and update several tables with row counts and sequence start and stop times. It takes about 30 minutes to create all the tables and set it up to run on an ongoing basis. There is one job to call that gets job stats for all jobs in a project. I usually run it at a new site to take a snapshot of what happened today. I snapshot it a few more times to get a good sample of what runs when, and then I can predict, based on which job is running, how much longer the ETL has to run.
I then slowly modify each sequence to make sure the row stats are gathered at the end of each sequence. This gives me what I need without any more snapshotting. There is now a no-wait version of the job that gathers stats in the background so the next sequence can start sooner, so there is really no slowdown to the normal ETL run from waiting for job stats to complete. Lots of canned reports are included that show which jobs are running slower than normal, along with similar reports built on the collected data.
Mamu Kim
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
MetaStage automatically captures these statistics from DataStage job runs and stores them in its database. You can report on them using canned queries or you can create your own. (EtlStats is cheaper, of course.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Well, the original request included end time. That's precisely when the stats are captured for MetaStage. And automatically loaded into the MetaStage Directory, to be queried at leisure. Immediately is OK; they're available as soon as they've been imported.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
It was always easy, in a sequence, to call a routine activity after a job activity finished; the routine would get the job status and run information and append it to a flat file. The file would finally be uploaded to a database using another DataStage job.
This routine would get a job handle using DSAttachJob and
then use the following functions.
It would create a delimited record,
then write it to a sequential file in append mode,
and finally release the job handle.
I don't have the exact copy of this routine now, but I think you could build the routine from the details I have given. The argument for this routine is the job name (derived from an activity variable in the sequencer).
Code:
* hJob is the routine argument holding the name of the job to report on
JobHandle = DSAttachJob(hJob, DSJ.ERRFATAL)
JobName = DSGetJobInfo(JobHandle, DSJ.JOBNAME)
JobStarted = DSGetJobInfo(JobHandle, DSJ.JOBSTARTTIMESTAMP)
JobEnded = DSGetJobInfo(JobHandle, DSJ.JOBLASTTIMESTAMP)
JobStatus = DSGetJobInfo(JobHandle, DSJ.JOBSTATUS)
Code:
ReportText = "Fatal :":JobName:",Aborted,":JobStarted:",":JobEnded
Code:
PathName = DirName:File
OpenSeq PathName To F.Var Then
   Loop
      ReadSeq Dummy From F.Var Else Exit   ;* skip to end-of-file
   Repeat
   WriteSeqF ReportText To F.Var Else
      Call DSLogFatal("Cannot write to ":PathName, "MyRoutine")
   End
   CloseSeq F.Var
End Else
   Call DSLogFatal("Cannot open file ":PathName, "MyRoutine")
End
Code:
DET = DSDetachJob(JobHandle)
Ans = 0
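Pulled together, the fragments above form a routine along these lines (a sketch; hJob is the routine argument carrying the job name, and DirName/File would be further arguments or hard-coded paths):

```
* Routine argument: hJob - name of the job to report on,
* passed from an activity variable in the sequence.
JobHandle = DSAttachJob(hJob, DSJ.ERRFATAL)
JobName = DSGetJobInfo(JobHandle, DSJ.JOBNAME)
JobStarted = DSGetJobInfo(JobHandle, DSJ.JOBSTARTTIMESTAMP)
JobEnded = DSGetJobInfo(JobHandle, DSJ.JOBLASTTIMESTAMP)
JobStatus = DSGetJobInfo(JobHandle, DSJ.JOBSTATUS)

* Build the delimited record
ReportText = JobName:",":JobStatus:",":JobStarted:",":JobEnded

* Append the record to the flat file
PathName = DirName:File
OpenSeq PathName To F.Var Then
   Loop
      ReadSeq Dummy From F.Var Else Exit   ;* skip to end-of-file
   Repeat
   WriteSeqF ReportText To F.Var Else
      Call DSLogFatal("Cannot write to ":PathName, "MyRoutine")
   End
   CloseSeq F.Var
End Else
   Call DSLogFatal("Cannot open file ":PathName, "MyRoutine")
End

DET = DSDetachJob(JobHandle)
Ans = 0
```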