Profile all Jobs end-to-end

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Profile all Jobs end-to-end

Post by djm »

Want to profile your end-to-end DataStage processing to see potential bottlenecks? Seeing significantly day-to-day variations in your end-to-end job completion time? Having to undertake performance impact testing in a test environment prior to deployment to your production environment without having a sound basis to compare your two environments?

Want a tool that may be able to assist you undertake these activities? If so then keep reading.

I previously posted a thread that contained DataStage objects that can be used to obtain the content of the Job Log. This thread takes that a step further. The content of the log can be rendered into a format suitable for processing with a tool that provides both a visual and a comparative means of presenting the information. This thread contains the
DataStage objects necessary to perform the rendering and contains a link to a location from where the Excel spreadsheet can be downloaded.

Aside from importing the DataStage objects within this thread, you are welcome to download the spreadsheet and use it. However, you may neither redistribute nor sell the spreadsheet nor may it be sold as part of any service that you or any company for which you work without explicit authorisation from me.

The DataStage objects and spreadsheet have been tested for DataStage 6.5 and Excel 2002. Any feedback on the success (or otherwise) for other version of DataStage and/or Excel is appreciated. Note that for versions on Excel prior to Excel 2002 will not display a "progress" window but will (should!) display progress messages within the Status Bar instead. This
is due to the absence of non-modal dialog windows in VBA prior to Excel 2002.

The Excel spreadsheet is available for downloading. The link also contains a couple of screenshots illustrate how I use this tool at a client site for profiling their DataStage job performance.

Two DataStage objects are contained in replies below:
  • JobLogAnalysis. Required over and above that from the original thread. This job interprets the Job Log and renders it into the format suitable for input into the spreadsheet.
  • DataStageJobLogContent, which is an alternative to that within the original post. It is almost identical to the original except that the hash files use the hash directory rather than the account mechanism for the location of the scratch hash files. The hash file directory is passed as a parameter to the instance of DataStageJobLogContent within JobLogAnalysis. Use whichever version complies with your environment's standards and, if necessary, drop (or ignore) the hash directory parameter to JobLogAnalysis.
The Excel spreadsheet is meant to be fairly self explanatory to use. The instant summary is:
  • Import as many instances of the Job log that you wish. Each imported file should be given a distinct reference value.
  • Click the "Process" button
There is a subtle difference to the environment comparison pivot table when only one analysis log as opposed to multiple analysis logs have been imported. When a single analysis log has been imported and processed, the environment comparison is listed in the order of longest duration job to shortest duration job. When multiple analysis logs are imported, the comparison are ordered by largest variation in job elapsed times between the environments. Note that assessing a multiple analysis importation can be a bit tricky to get one's head if there are multiple instances of a job since it will make the comparison between the shortest elapsed time in one environment verses the longest elapsed time in the other.

The only known caveat about the objects that follow is that JobLogAnalysis should not be run when other DataStage jobs are running. I think that it hangs until the completion of the running jobs before continuing. If your experience is different then please reply to this post and dispel my misconception.

Hopefully this profiler will be as useful for you as it has been for me.

If you encounter any issues with the objects or spreadsheet, please post a reply to this thread and I'll follow up on it in due course.

David
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

Following suggestion in next reply and deleted code posting. Replaced with links to code at own web site.
Last edited by djm on Tue Oct 25, 2005 4:33 am, edited 1 time in total.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Instead of posting this much code why not have a link to your web site. It is really hard to cut and paste of these types of posts. I have also tried to do these types of posts and people complained. Zip them up into one complete download on your site and add a readme.txt to use it. I think you will get more people trying to use it.
Mamu Kim
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Zip file to replace horrendous code posting.

Post by djm »

As per Kim's suggestion, listing the code was pretty horrendous. If you want the objects and spreadsheet in one, download the zip file.

Sorry Kim - it's late (nearly midnight). Amended the post.
Last edited by djm on Tue Oct 25, 2005 4:43 am, edited 1 time in total.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Ken Bland ... Kim Duke. My name is KIM.
Mamu Kim
Sunshine2323
Charter Member
Charter Member
Posts: 130
Joined: Mon Sep 06, 2004 3:05 am
Location: Dubai,UAE

Post by Sunshine2323 »

Hi D,

I have been successful in importing the log file created by using your code in the excel Template.
But when I am try to process it I am getting an error in the macro saying

Runtime Error '1004'

Unable to get RowFields Property of the Pivot Table Class

When I click Debug it highlights the following row,

'Otherwise sort by the biggest variation in time across projects ...
.RowFields("Job").AutoSort xlDescending, "Better than worst in " & _
"other projects by"
Warm Regards,
Amruta Bandekar

<b>If A equals success, then the formula is: A = X + Y + Z, X is work. Y is play. Z is keep your mouth shut. </b>
--Albert Einstein
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

Hmmm. Okay,
(a) which version of Excel are you using
(b) What did you use as your "reference" for the import? (I'm wondering if there is a character in the value that makes it stumble).

David.
Sunshine2323
Charter Member
Charter Member
Posts: 130
Joined: Mon Sep 06, 2004 3:05 am
Location: Dubai,UAE

Post by Sunshine2323 »

Hi David,
(a) which version of Excel are you using
Microsoft Excel 2003
(b) What did you use as your "reference" for the import? (I'm wondering if there is a character in the value that makes it stumble).
I have entered 1 as the reference number
Warm Regards,
Amruta Bandekar

<b>If A equals success, then the formula is: A = X + Y + Z, X is work. Y is play. Z is keep your mouth shut. </b>
--Albert Einstein
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Use file created by JobLogAnalysis as spreadsheet input

Post by djm »

Clarification: the input for the spreadsheet is the file that is created by the DataStage job JobLogAnalysis, not that from JobLogDump.

(Thanks to Sunshine2323 for the comms via messages).

David
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

A newer zip file has been added that now contains all of the required DataStage objects (in XML format, for import via DataStage manager) to simplify getting the spreadsheet and objects in one hit.

David

Image
RodBarnes
Charter Member
Charter Member
Posts: 182
Joined: Fri Mar 18, 2005 2:10 pm

Post by RodBarnes »

Ok, for us Windows guys who don't know much unix: Can you explain the purpose of the single_row_sc stage? All I see is that it is sourcing from /dev/null. I know what /dev/null is but don't understand its purpose as a source file here. Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

For Windows replace all /dev/null with .\NUL - there is a NUL file in every folder.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RodBarnes
Charter Member
Charter Member
Posts: 182
Joined: Fri Mar 18, 2005 2:10 pm

Post by RodBarnes »

ray.wurlod wrote:For Windows replace all /dev/null with .\NUL - there is a NUL file in every folder.
Thanks for the info but I already tried that. It is not able to open that as a source file. But I'll keep looking into it.

I thought if I understood what the job was attempting to do, there are likely other ways to get there. For example, if it is just generating a single row(?), this can be done from a transform without having to read from nul.
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

David,

First thing, your application rocks!
But a slight problem, I get a RT when I click "Process", I could go in and debug the code but I dont think it is a good idea.
I do get my output file with a lot of useful information.
But due to the RT I think I might be missing something.(May be not)

I am using
- Microsoft Excel 2003
- File from the file from JobLogAnalysis and not from JobLogDump job.
Can you suggest the solution please.

If I click debug it takes me to this line (Does that ring a bell)

Code: Select all

Set schedule_data_lo = ThisWorkbook.Names("schedule").RefersToRange
This is the RT error I get -

Image

P.S. I have made some changes/added funtionalities to pick up jobs from a particular category.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

Rod,

you are correct, the purpose of single_row_sc is to generate a single row. You have two choices:
(a) In the zip file there are three objects beginning with the name "singlerow". By default, "singlerow_20051003.xml" is the Unix one. Within DataStage, if you delete the object "SingleRow", take a copy of the object "singlerowwindows" and rename it to "SingleRow" that will give you the Windows version
(b) By all means use your own transform.


Narasimha,

you said you made same changes to pick up jobs from a particular categor. Can you try running the original spreadsheet with the original output generated by JobLogAnalysis and let me know whether or not that is successful.


David
(Previously known as D)

Be alturistic and donate your spare CPU cycles to research. http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1
Post Reply