Aggregator showing different count for different runs

datastagedw
Participant
Posts: 53
Joined: Fri Mar 07, 2008 1:17 am

Post by datastagedw »

Hi All,

My job design is as follows:

dataset--->transformer---->aggregator---->ODBC connector

I have hash partitioning and perform a sort on the Aggregator. I have 4 key columns (not null) and one column whose sum I am calculating. Somehow I get a different row count coming out of the Aggregator stage every time I run the job. The job runs on 4 nodes. Could this be because the data is wrong, or is it due to partitioning issues? One of the key columns did have some null values and spaces, but we handle them in the Transformer before the Aggregator stage, so no nulls are passed on. However, I fear the cause may be special characters in that key column which have not been handled.
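
To make the special-character concern concrete, here is a minimal Python sketch (outside DataStage) that flags key values which look alike on screen but would form separate groups. The function name `suspicious_key_values` and the sample values are hypothetical, made up for illustration, and the checks are only a starting point rather than a complete list.

```python
import unicodedata

def suspicious_key_values(values):
    """Return {value: reasons} for key values with characters that are easy to miss."""
    flagged = {}
    for value in values:
        reasons = []
        # Leading or trailing whitespace (includes Unicode spaces such as U+00A0)
        if value != value.strip():
            reasons.append("leading/trailing whitespace")
        # Unicode categories starting with "C" cover control and format characters
        if any(unicodedata.category(ch).startswith("C") for ch in value):
            reasons.append("control or format character")
        # Space separators other than the plain ASCII space
        if any(ch != " " and unicodedata.category(ch) == "Zs" for ch in value):
            reasons.append("non-ASCII space")
        if reasons:
            flagged[value] = reasons
    return flagged

# Hypothetical key values: five strings that may all display as "ACME"
sample_keys = ["ACME", "ACME ", "ACME\u00a0", "AC\tME", "ACME\x00"]
flagged = suspicious_key_values(sample_keys)
for value, reasons in flagged.items():
    print(repr(value), reasons)
```

Printing with `repr` shows the hidden characters explicitly, which makes it easier to decide what the Transformer still needs to clean up.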

Any help is appreciated.
ETL DEVELOPER
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

It is in the nature of programs to do the same thing with the same data. If that is not happening, then usually some factor is affecting the outcome. In this case, are you 100% certain that your source dataset has the same contents and that every single parameter to the job is the same between runs? What about writing to a sequential file instead of the ODBC stage and doing a UNIX-level "diff" of two runs, so that you can simplify the problem?
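
Since a parallel job may write the same rows in a different order on each run, a plain "diff" can be noisy. As a rough sketch of an order-independent comparison, the Python below treats each run's output as a multiset of lines. The function name `compare_runs` and the sample rows are assumptions for illustration; with real output you would pass in the lines read from the two sequential files.

```python
from collections import Counter

def compare_runs(lines_a, lines_b):
    """Return the lines (with counts) that appear only in run A and only in run B."""
    count_a = Counter(line.rstrip("\n") for line in lines_a)
    count_b = Counter(line.rstrip("\n") for line in lines_b)
    # Counter subtraction keeps only positive counts, so row order is irrelevant
    return count_a - count_b, count_b - count_a

# Hypothetical output of two runs: key1|key2|key3|key4|sum
run1 = ["A|x|1|p|16\n", "B|y|2|q|10\n"]
run2 = ["B|y|2|q|10\n", "A|x|1|p|10\n", "A|x|1|p|6\n"]

only_in_run1, only_in_run2 = compare_runs(run1, run2)
print("only in run 1:", dict(only_in_run1))
print("only in run 2:", dict(only_in_run2))
```

In this made-up example the second run has split one group into two rows whose sums add up to the first run's total, which is the kind of pattern that would point towards partitioning rather than towards the data.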
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Are you hash partitioning on those '4 key columns(Not Null)'?
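
Why this matters can be sketched with a small simulation in Python. It is not DataStage code: the CRC32-based partitioner, the function names and the sample rows are all assumptions chosen for illustration. Each partition aggregates only the rows it holds, so if rows of one group land in several partitions, the stage emits several rows for that group and the output count depends on how the rows happened to be distributed.

```python
import zlib
from collections import defaultdict

def partition_by_hash(rows, key_indexes, num_partitions):
    """Send each row to a partition chosen by a hash of the selected key columns."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        key = "|".join(str(row[i]) for i in key_indexes)
        partitions[zlib.crc32(key.encode("utf-8")) % num_partitions].append(row)
    return partitions

def partition_round_robin(rows, num_partitions):
    """Deal rows out to partitions in turn, ignoring the key columns."""
    partitions = [[] for _ in range(num_partitions)]
    for n, row in enumerate(rows):
        partitions[n % num_partitions].append(row)
    return partitions

def aggregate_per_partition(partitions, group_indexes, value_index):
    """Sum the value column per group inside each partition, then collect all output rows."""
    output = []
    for part in partitions:
        sums = defaultdict(int)
        for row in part:
            sums[tuple(row[i] for i in group_indexes)] += row[value_index]
        output.extend(key + (total,) for key, total in sums.items())
    return output

# Hypothetical rows: four key columns followed by the column being summed
rows = [
    ("A", "x", 1, "p", 10),
    ("A", "x", 1, "p", 5),
    ("B", "y", 2, "q", 7),
    ("B", "y", 2, "q", 3),
    ("A", "x", 1, "p", 1),
]
group_keys = [0, 1, 2, 3]

hashed = aggregate_per_partition(partition_by_hash(rows, group_keys, 4), group_keys, 4)
dealt = aggregate_per_partition(partition_round_robin(rows, 4), group_keys, 4)
print(len(hashed), "rows when hash partitioned on all four keys")
print(len(dealt), "rows when partitioned without regard to the keys")
```

With hash partitioning on the same four columns that the Aggregator groups by, every row of a group lands in the same partition and the output has one row per group. With any scheme that ignores those columns, the groups are split and the row count goes up.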
-craig

"You can never have too many knives" -- Logan Nine Fingers
Raftsman
Premium Member
Posts: 335
Joined: Thu May 26, 2005 8:56 am
Location: Ottawa, Canada

Post by Raftsman »

If you have AUTO partitioning selected, would it not sort and partition before aggregating? I used to sort and partition the data prior to the Aggregator, and then I experimented with a few different ways. With AUTO, my totals were identical even with 13 million rows.
Jim Stewart