Joblogging
Moderators: chulett, rschirm, roy
Hello,
For each DataStage job that retrieves data from a database, I want to write the start and end time, plus some additional information, to a database table.
For the start-time entry I use a stage variable RowCount: when RowCount = 1 I write the start of the job to the table.
For the end of the job I have looked through the DS functions but couldn't find anything appropriate.
Can someone help me with this challenge? Also, is there a more efficient way to determine the start of the job than the RowCount approach?
Kind Regards,
Jan
It is much easier to log the job runtime information after the job has finished; then you can use documented calls such as DSGetJobInfo() to return this information.
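For example, an after-job subroutine can query the job it runs inside through the DSJ.ME handle, so no attach step is needed. A minimal sketch (the routine name and log message are illustrative only):

```
* After-job subroutine sketch: report run times for the current job.
* DSJ.ME refers to the job this routine is attached to.
Subroutine LogJobTimes(InputArg, ErrorCode)
   ErrorCode = 0   ;* 0 = success; non-zero would abort the job
   JobName = DSGetJobInfo(DSJ.ME, DSJ.JOBNAME)
   JobStarted = DSGetJobInfo(DSJ.ME, DSJ.JOBSTARTTIMESTAMP)
   JobEnded = DSGetJobInfo(DSJ.ME, DSJ.JOBLASTTIMESTAMP)
   Call DSLogInfo("Job ":JobName:" ran from ":JobStarted:" to ":JobEnded, "LogJobTimes")
Return
```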
How to load the job log from a routine into a database table
Hello,
The solution you provided, logging the job details from a routine using the DSGetJobInfo() macro, is perfect. But how can we load this job-log data into a database table directly from the after-job routine?
ArndW wrote:It is much easier to log the job runtime information after the job has finished; then you can use the documented calls such as DSGetJobInfo() to return this information. ...
The easiest way is to have the after job routine create a text file with the information, then call up a DataStage job which loads that to a table.
Hi ArndW,
Thank you very much for your answer.
Our requirement is that after every DataStage job runs, a routine should fire that captures all the job-log details and loads them into three different control tables.
As I am a beginner in DataStage, can you please point me to a specific function or macro to use in the routine to call the DataStage job?
Thanks in advance for your help.
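To call a DataStage job from a routine, the documented job-control functions are DSAttachJob, DSRunJob, DSWaitForJob and DSDetachJob. A minimal sketch, assuming a loader job named "LoadJobLog" (substitute your own job name):

```
* Sketch: start a load job from within a routine.
* "LoadJobLog" is a placeholder job name, not a real job.
JobHandle = DSAttachJob("LoadJobLog", DSJ.ERRFATAL)
ErrCode = DSRunJob(JobHandle, DSJ.RUNNORMAL)   ;* request a normal run
ErrCode = DSWaitForJob(JobHandle)              ;* block until the job finishes
Status = DSGetJobInfo(JobHandle, DSJ.JOBSTATUS)
ErrCode = DSDetachJob(JobHandle)
```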
If you need to extract this information immediately after every run, you would have to either:
a) use an after-job subroutine that is fired off regardless of job status;
b) if you use an external scheduler, create your own "interlude" dsjob call which not only starts the job but also starts the collection of job-logging information;
c) have an independent process collect all the log information at a regular interval. If the same job instance is re-run within that interval, only the last run's information is kept, so some loss is possible;
d) modify the DSD.RUN internal call to also execute a logging routine (this is not recommended);
e) if you use sequences for all your job runs, add a call to the logging procedure to each sequence.
Once you have decided on your logging method, we can look at the easiest way of collecting this.
I should also have added that you can look at Kim's ETL utilities, which collect some of this information as well.
At present I am doing something similar for a large implementation. It involves quite a lot of coding to ensure that no logging information is lost; it collects logging information for 8 large systems and over 60 projects, with more than 30,000 jobs running per day. The data is stored in 4 normalized tables in a database and is used for a lot of statistical analysis. It is not an easy task to do correctly and comprehensively.
EtlStats is free. You can download it from my tips page or the other link below. You can call one job and update several tables with row counts and sequence start and stop times. It takes about 30 minutes to create all the tables and set it up to run on an ongoing basis. There is one job to call that gets job stats for all jobs in a project. I usually run it at a new site to take a snapshot of what happened today. I snapshot it a few more times to get a good sample of what runs when, and then I can predict, based on which job is running, how much longer the ETL has to run.
I then slowly modify each sequence to make sure the row stats are gathered at the end of each sequence. This gives me what I need without any more snapshotting. There is now a no-wait version of the job that gathers stats in the background so the next sequence can start sooner, so there is really no slowdown to the normal ETL run from waiting for job stats to complete. Lots of canned reports are included that show which jobs are running slower than normal, along with similar reports built on the collected data.
Mamu Kim
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
MetaStage automatically captures these statistics from DataStage job runs and stores them in its database. You can report on them using canned queries or you can create your own. (EtlStats is cheaper, of course.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Well, the original request included end time. That's precisely when the stats are captured for MetaStage. And automatically loaded into the MetaStage Directory, to be queried at leisure. Immediately is OK; they're available as soon as they've been imported.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
It was always easy, in a sequence, to call a routine activity after a job activity finished; the routine would get the job status and run information and append it to a flat file. The file would finally be uploaded to a database using another DataStage job.
This routine would get a job handle using DSAttachJob and
then use the following functions.
It would create a delimited record,
then write it to a sequential file in append mode,
and finally release the job handle.
I don't have the exact copy of this routine now, but I think you could build the routine from the details I have given. The argument for this routine is the job name (derived from an activity variable in the sequencer).
Code:
* hJob is the routine argument holding the name of the job to report on
JobHandle = DSAttachJob(hJob, DSJ.ERRFATAL)
JobName = DSGetJobInfo(JobHandle, DSJ.JOBNAME)
JobStarted = DSGetJobInfo(JobHandle, DSJ.JOBSTARTTIMESTAMP)
JobEnded = DSGetJobInfo(JobHandle, DSJ.JOBLASTTIMESTAMP)
JobStatus = DSGetJobInfo(JobHandle, DSJ.JOBSTATUS)
Code:
ReportText = "Fatal :":JobName:",Aborted,":JobStarted:",":JobEnded
Code:
PathName = DirName:File
OpenSeq PathName To F.Var Then
   Loop
      ReadSeq Dummy From F.Var Else Exit   ;* skip to end-of-file
   Repeat
   WriteSeqF ReportText To F.Var Else
      Call DSLogFatal("Cannot write to ":PathName, "MyRoutine")
   End
   CloseSeq F.Var
End Else
   Call DSLogFatal("Cannot open file ":PathName, "MyRoutine")
End
Code:
DET = DSDetachJob(JobHandle)
Ans = 0
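Pulled together, the fragments above form a routine along these lines (a sketch; hJob is the routine argument carrying the job name, and DirName/File would be further arguments or hard-coded paths):

```
* Routine argument: hJob - name of the job to report on,
* passed from an activity variable in the sequence.
JobHandle = DSAttachJob(hJob, DSJ.ERRFATAL)
JobName = DSGetJobInfo(JobHandle, DSJ.JOBNAME)
JobStarted = DSGetJobInfo(JobHandle, DSJ.JOBSTARTTIMESTAMP)
JobEnded = DSGetJobInfo(JobHandle, DSJ.JOBLASTTIMESTAMP)
JobStatus = DSGetJobInfo(JobHandle, DSJ.JOBSTATUS)

* Build the delimited record
ReportText = JobName:",":JobStatus:",":JobStarted:",":JobEnded

* Append the record to the flat file
PathName = DirName:File
OpenSeq PathName To F.Var Then
   Loop
      ReadSeq Dummy From F.Var Else Exit   ;* skip to end-of-file
   Repeat
   WriteSeqF ReportText To F.Var Else
      Call DSLogFatal("Cannot write to ":PathName, "MyRoutine")
   End
   CloseSeq F.Var
End Else
   Call DSLogFatal("Cannot open file ":PathName, "MyRoutine")
End

DET = DSDetachJob(JobHandle)
Ans = 0
```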