Page 1 of 1

Log files for all job runs

Posted: Tue Sep 04, 2018 4:56 am
by charlie2k
Hi Gurus,

I am pretty new to the DS admin side of things. I am working to compile and gather all the stats that are available for job running on datastage, the intention being to be able to create a report to identify our long running jobs and their stages.

Can someone here point me in the right direction on how to start with it. I believe the log files will be a good place to start. Is that right? And where can I find these logs and how to read them? Any response is much appreciated :)

Thanks in advance!

Posted: Tue Sep 04, 2018 7:09 am
by chulett
Rather than coming at this via the job logs which can be intimidating to farm information from until one gets very familiar with them, I would suggest getting to know the Operations Console. An exact search here will turn up discussions, you can also search for the underlying database name: DSODB.

Posted: Tue Sep 04, 2018 8:09 am
by charlie2k
Thanks Craig! That's a great pointer!

Agreed on Operations Console being a good place to start with. But one caveat is that I need to be able to ship all of the data that I see in operations console to a separate reporting database built for a capturing the jobs and stages run stats data with reports running on top it. Which is why I am looking for the source of this data down below (if that means anything) :)

Posted: Tue Sep 04, 2018 8:19 am
by PaulVL
Oh if only you had a tool that could query a database, create a comma separated file, then sftp the file to that remote host or even connect directly to that target database and load it...

If you got your hands on that tool... man the possibilities...

Posted: Tue Sep 04, 2018 8:26 am
by chulett
:wink:

Posted: Tue Sep 04, 2018 10:41 am
by ray.wurlod
The logs are accessible, and exportable, from the Operations Console.

Posted: Wed Sep 05, 2018 12:49 am
by charlie2k
PaulVL wrote:Oh if only you had a tool that could query a database, create a comma separated file, then sftp the file to that remote host or even connect directly to that target database and load it...

If you got your hands on that tool... man the possibilities...
lol

Posted: Wed Sep 05, 2018 12:51 am
by charlie2k
ray.wurlod wrote:The logs are accessible, and exportable, from the Operations Console. ...
Thank you for the response! :)

Posted: Wed Sep 05, 2018 6:27 am
by chulett
Ray means manually, of course.

Posted: Thu Sep 06, 2018 12:08 am
by ray.wurlod
Actually, given your stated requirement, it could probably be resolved via a query or two against the DataStage Operations Database (DSODB).

Structure of DSODB is fully documented in this IBM developerWorks Technical Paper.

A couple of examples from that document

Posted: Thu Sep 06, 2018 12:13 am
by ray.wurlod

Code: Select all

-- List start and finish times, names, and statuses of all jobs that have
--    been run on host H, started after time YYYY-MM-DD HH:MM:SS, 
--    and have finished.
SELECT
      X.ProjectName, X.JobName,
      R.RunStartTimestamp, R.RunEndTimestamp, R.RunMinorStatus
FROM
      DSODB.JOBRUN AS R
JOIN  DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN  DSODB.HOST AS H    ON X.HOSTID = H.HOSTID
WHERE
      H.HostName = "H"
  AND R.RunMajorStatus = "FIN"
  AND R.RunStartTimestamp >= "YYYY-MM-DD HH:MM:SS"
ORDER BY R.RunStartTimestamp


-- List names of all jobs on host H that were running at time T and sort them
--     in descending order of total CPU usage for the whole run.
SELECT
      X.ProjectName, X.FolderPath, X.JobName,
      R.RunStartTimestamp, R.RunEndTimestamp, R.TotalCPU
FROM
      DSODB.JOBRUN AS R
JOIN  DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN  DSODB.HOST AS H    ON X.HOSTID = H.HOSTID
WHERE
      H.HostName = "H"
  AND R.RunStartTimestamp <= "YYYY-MM-DD HH:MM:SS"
  AND ( R.RunEndTimestamp >= "YYYY-MM-DD HH:MM:SS"
    OR R.RunEndTimestamp IS NULL )
ORDER BY R.TotalCPU DESC

Posted: Wed Sep 19, 2018 7:07 am
by charlie2k
Revisiting this post with everything that I managed to do so far.. and where I hit a wall :)

Thanks all for pointing to the DSODB. That was a great place to start mining all the job run information. Plus, its so nicely documented! Thumbs up to IBM!

Now, the caveat, our DSODB is not logging stage level info correctly. I know that there is a flag in one of the config files to turn on the link and stage level logging. Our dsadmin was kind enough to do it for a few test runs. But the data was all messed up. The start and end times for all stages in the job was the same (even though I know it isn't), so were the elapsed secs. Anyone else noticed this issue? Is this a bug?

Alternatively, we do see all the stage wise run information in the datastage director when we monitor a job run. Isn't that saved anywhere? May be i can think of pulling that data out. Does anyone know where (and if) that data is stored?

Thanks again in advance Gurus! :)

Posted: Thu Oct 25, 2018 1:57 am
by charlie2k
Giving back to the community :)

--Job Runs on DSODB

Code: Select all

SELECT 
	UPPER(J.PROJECTNAME) PROJECT_NAME,
	J.JOBTYPE JOB_TYPE, 
	UPPER(J.JOBNAME) JOB_NAME, 
	R.RUNID RUN_ID, 
	R.INVOCATIONID INVOCATION_ID, 
	R.RUNSTARTTIMESTAMP RUN_START_TIME, 
	R.RUNENDTIMESTAMP RUN_END_TIME, 
	TIMESTAMPDIFF(2, CHAR(TIMESTAMP(R.RUNENDTIMESTAMP) - TIMESTAMP(R.RUNSTARTTIMESTAMP))) DURATION, 
	REF1.NAME RUN_TYPE,
	REF2.NAME RUN_STATUS,
	REF3.NAME RUN_SUB_STATUS,
	R.NUMMESSAGESFATAL NO_OF_ERRORS,
	R.NUMMESSAGESWARNING NO_OF_WARNINGS, 
 	R.TOTALROWSCONSUMED ROWS_INPUT, 
	R.TOTALROWSPRODUCED ROWS_OUTPUT, 
	R.TOTALCPU CPU_TIME_MILLISECS
FROM DSODB.JOBRUN R
INNER JOIN DSODB.JOBEXEC J ON R.JOBID = J.JOBID
INNER JOIN DSODB.MASTERREF REF1 ON REF1.CODE = R.RUNTYPE AND REF1.ENUMERATION = 'RunType'
INNER JOIN DSODB.MASTERREF REF2 ON REF2.CODE = R.RUNMAJORSTATUS AND REF2.ENUMERATION = 'RunMajorStatus'
INNER JOIN DSODB.MASTERREF REF3 ON REF3.CODE = R.RUNMINORSTATUS AND REF3.ENUMERATION = 'RunMinorStatus'
;
--Job and Stage Runs on DSODB

Code: Select all

SELECT 
	R.RUNID RUN_ID, 
	UPPER(J.PROJECTNAME) PROJECT_NAME,
	J.JOBTYPE JOB_TYPE, 
	UPPER(J.JOBNAME) JOB_NAME, 
	R.INVOCATIONID INVOCATION_ID, 
	R.RUNSTARTTIMESTAMP JOB_START_TIME, 
	R.RUNENDTIMESTAMP JOB_END_TIME, 
	TIMESTAMPDIFF(2, CHAR(TIMESTAMP(R.RUNENDTIMESTAMP) - TIMESTAMP(R.RUNSTARTTIMESTAMP))) JOB_DURATION,
	R.TOTALCPU JOB_CPU_MILLISECS,
 	R.TOTALROWSCONSUMED JOB_ROWS_INPUT, 
	R.TOTALROWSPRODUCED JOB_ROWS_OUTPUT,
	REF1.NAME JOB_RUN_TYPE,
	REF2.NAME JOB_RUN_STATUS,
	REF3.NAME JOB_RUN_SUB_STATUS,
	R.NUMMESSAGESFATAL JOB_NO_OF_ERRORS,
	R.NUMMESSAGESWARNING JOB_NO_OF_WARNINGS, 
	UPPER(M.STAGENAME) STAGE_NAME,
	S.STAGESTARTTIMESTAMP STAGE_START_TIME,
	S.STAGEENDTIMESTAMP STAGE_END_TIME,
	TIMESTAMPDIFF(2, CHAR(TIMESTAMP(S.STAGEENDTIMESTAMP) - TIMESTAMP(S.STAGESTARTTIMESTAMP))) STAGE_DURATION,
	S.TOTALCPU STAGE_CPU_MILLISECS,
	REF4.NAME STAGE_RUN_STATUS
FROM DSODB.JOBRUN R
INNER JOIN DSODB.JOBEXEC J ON R.JOBID = J.JOBID
LEFT OUTER JOIN DSODB.JOBRUNSTAGE S ON R.RUNID = S.RUNID
LEFT OUTER JOIN DSODB.JOBSTAGE M ON S.STAGEID = M.STAGEID
INNER JOIN DSODB.MASTERREF REF1 ON REF1.CODE = R.RUNTYPE AND REF1.ENUMERATION = 'RunType'
INNER JOIN DSODB.MASTERREF REF2 ON REF2.CODE = R.RUNMAJORSTATUS AND REF2.ENUMERATION = 'RunMajorStatus'
INNER JOIN DSODB.MASTERREF REF3 ON REF3.CODE = R.RUNMINORSTATUS AND REF3.ENUMERATION = 'RunMinorStatus'
LEFT OUTER JOIN DSODB.MASTERREF REF4 ON REF4.CODE = S.STAGESTATUS AND REF4.ENUMERATION = 'StageStatus'
;
--Job-Stage Master on XMETA

Code: Select all

SELECT 
	UPPER(SUBSTRING(A.DSNAMESPACE_XMETA,DBMS_LOB.INSTR(A.DSNAMESPACE_XMETA,':',1)+1,50))||'-'||UPPER(A.NAME_XMETA)||'-'||UPPER(S.NAME_XMETA)||'-'||UPPER(S.INTERNALID_XMETA) JOB_STAGE_KEY
	,UPPER(SUBSTRING(A.DSNAMESPACE_XMETA,DBMS_LOB.INSTR(A.DSNAMESPACE_XMETA,':',1)+1,50)) PROJECT_NAME
	,A.DSNAMESPACE_XMETA
	,UPPER(A.NAME_XMETA) JOB_NAME 
	,UPPER(S.NAME_XMETA) STAGE_NAME
	,UPPER(S.INTERNALID_XMETA) STAGE_INTERNAL_ID
	,A.CATEGORY_XMETA JOB_PATH
	,CASE WHEN A.JOBTYPE_XMETA = 3 THEN 'SEQ'
		WHEN A.JOBTYPE_XMETA = 4 THEN 'PAR' ELSE 'NA' END JOB_TYPE
	,A.SHORTDESCRIPTION_XMETA JOB_DESC
	,S.STAGETYPECLASSNAME_XMETA STAGE_TYPE 
	,S.STAGETYPE_XMETA STAGE_SUB_TYPE
	,(TIMESTAMP('01/01/1970', '00:00:00') + (S.XMETA_MODIFICATION_TIMESTAMP_XMETA/ 1000) SECONDS) AS STAGE_LAST_MODIFIED
	,(TIMESTAMP('01/01/1970', '00:00:00') + (A.XMETA_MODIFICATION_TIMESTAMP_XMETA/ 1000) SECONDS) AS JOB_LAST_MODIFIED
FROM
	XMETA.DATASTAGEX_DSSTAGE S,
	XMETA.DATASTAGEX_DSJOBDEF A
WHERE
	S.CONTAINER_RID = A.XMETA_REPOS_OBJECT_ID_XMETA
;