DSGetStageInfo giving inaccurate results

thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

Jobs that work as expected on v8.1 have been migrated to v11.5.
As part of an after-job subroutine, the link counts are examined using:

DSGetLinkInfo(JobHandle,ThisStage,ThisLink,DSJ.LINKROWCOUNT)

Based on the naming convention of the links, a simple check is performed, which is essentially: sum of inputs = sum of outputs.

In v8.1 this works, but in v11.5 some jobs report incorrect numbers on links.
As an example, a link writing to a dataset might report 5000 records in the after-job subroutine, but if I open the dataset in DataSet Management the number of records is as expected, say 9500.
As a consequence, the reconciliation fails and the subroutine aborts the job.
If I manually run the same code from the subroutine after the job has completed, it also gives the wrong link counts.

I have raised a case with IBM and their feedback so far is that JobMonApp can suffer a lag on heavily loaded systems. This system however is new and only being used by me (for now) and is not at all heavily utilised when these failures occur.

I'd appreciate any suggestions about what could be going wrong whilst I await further feedback and suggestions from IBM.

Also if anyone knows the mechanism by which the link counts are captured and stored please can you explain the process; that might point to somewhere else I can look to try and identify the underlying issue.

All I have been able to see is that JobMonApp.log is being written to with details of the counts very frequently. Does another process examine this file and store the results elsewhere when the job completes?

In an effort to work around the problem during initial testing on v11.5, I added a sleep 10 to the subroutine and it hasn't happened since, but that's not a long-term solution.
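
For reference, one possible refinement of the fixed sleep is to retry the reconciliation a few times and only fail once the counts still disagree. This is only a sketch in DataStage BASIC, assuming the counts are late rather than lost. InStage, InLink, OutStage, OutLink, MaxTries, WaitSecs and the routine name "LinkCountCheck" are hypothetical; JobHandle is the variable from the call above, and ErrorCode is assumed to be the subroutine's standard error argument.

```basic
* Sketch: retry the reconciliation a few times before giving up.
* InStage/InLink and OutStage/OutLink are hypothetical names for one
* input link and one output link that should carry the same row count.
MaxTries = 6         ;* hypothetical limit on attempts
WaitSecs = 5         ;* hypothetical pause between attempts, in seconds
Tries = 0
Loop
   InCount  = DSGetLinkInfo(JobHandle, InStage, InLink, DSJ.LINKROWCOUNT)
   OutCount = DSGetLinkInfo(JobHandle, OutStage, OutLink, DSJ.LINKROWCOUNT)
   Tries += 1
Until InCount = OutCount Or Tries >= MaxTries Do
   Sleep WaitSecs
Repeat

If InCount <> OutCount Then
   Call DSLogWarn("Counts still differ after " : Tries : " attempts: in=" : InCount : " out=" : OutCount, "LinkCountCheck")
   ErrorCode = 1     ;* non-zero ErrorCode makes the after-job subroutine fail the job
End
```

This bounds the wait instead of always pausing for the full interval, but it remains a workaround: if the stored counts never catch up, the job still aborts after MaxTries attempts.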
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

thompsonp wrote: In an effort to work around the problem during initial testing on v11.5, I added a sleep 10 to the subroutine and it hasn't happened since, but that's not a long-term solution.

I know, right? The long-term solution might be a sleep 30. :wink:

Sorry, wish I had something more useful to add. Others may know the gory details of how it all works under the covers. All I seem to recall is that it gets them from one of the job-related hashed files in the project (or maybe XMETA now), but don't quote me on that.
-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Check that the 11.5 version is not returning a dynamic array of row counts, with one element per node.

If it is, you could apply a SUM() function to the dynamic array.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

With option DSJ.LINKROWCOUNT the count is returned as a single value.
If I change that to DSJ.INSTROWCOUNT I get a comma-separated list of counts, one per partition.
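
For anyone wanting to cross-check the two options against each other, a minimal sketch in DataStage BASIC might look like the following. It assumes DSJ.INSTROWCOUNT returns a plain comma-separated string as described above. JobHandle, ThisStage and ThisLink are the variables from the original call, and the routine name "LinkCountCheck" is hypothetical.

```basic
* Sketch: compare the single total with the sum of the per-partition counts.
LinkTotal  = DSGetLinkInfo(JobHandle, ThisStage, ThisLink, DSJ.LINKROWCOUNT)
InstCounts = DSGetLinkInfo(JobHandle, ThisStage, ThisLink, DSJ.INSTROWCOUNT)

* Add up the comma-separated per-partition values.
InstTotal = 0
NumParts = DCount(InstCounts, ",")
For PartNo = 1 To NumParts
   InstTotal += Field(InstCounts, ",", PartNo)
Next PartNo

* Log a warning when the two views of the same link disagree.
If InstTotal <> LinkTotal Then
   Call DSLogWarn(ThisStage : "." : ThisLink : " total=" : LinkTotal : " partitions=" : InstCounts, "LinkCountCheck")
End
```

Logging the raw per-partition list alongside the total may show whether whole partitions are missing from the count or whether every partition is reporting a partial figure.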

Does anyone know what these DataStage Basic functions are examining to get the results? Is it something I can check using some other mechanism?

I could see counts being sent to JobMonApp.log, but are the counts from there written elsewhere before the DS basic functions are able to retrieve them?

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Historically, the row counts were stored in the RT_STATUSnnn table for the job. There are separate records for each stage and link in that table, and the structure of the records has always been undocumented (and differs for job records, stage records and link records).

I don't know whether this storage mechanism is still the case in version 11.x.
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

The issue went away for a while but has now resurfaced in a test environment.
It's back with IBM support with all kinds of tracing and debugging enabled.