Sort and Filter gives different results run to run

ppp
Participant
Posts: 21
Joined: Mon Aug 31, 2009 11:53 am

Post by ppp »

I am performing a Sort and then Filter to remove duplicates.

However, I have noticed that the results change by one record from run to run.
In run 1, the Filter output after the Sort gives 15 rows (where keychange=1) and rejects 2 records (where keychange=0).

In run 2, 3 records are rejected (where keychange=0) and the Filter output is 14 records (where keychange=1).

Can you please help me understand why the result changes from run to run? Is there a particular partitioning method or environment variable I should be using?
Thank you
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

G'day PPP,
First thing that comes to mind is a partitioning error.

1) Are you partitioning going into the sort (or carrying it through as Same from a previous partitioning)?
2) Is the partitioning on the leading key(s) of the sort?

These are the two most common reasons for this error that I have seen.

3) Is it affected by NULLs?
(That is the number 3 reason.)

I'm sure others here will have other scenarios, but these are the most common ones I have seen before having to drill deeper.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
ppp
Participant
Posts: 21
Joined: Mon Aug 31, 2009 11:53 am

Post by ppp »

Thank you for the reply.

I am using the default partitioning for both Sort and Filter - Auto.
And there are no NULLs.
ssreeni3
Participant
Posts: 29
Joined: Fri May 18, 2012 1:35 am

Post by ssreeni3 »

Hi PPP,
Please post the input, output and reject records so that we can understand the problem better.
--------------------------------
Srini
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

ppp wrote:
I am using the default partitioning for both Sort and Filter - Auto.
And there are no NULLs.

"Well, there's your problem."

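You need to partition explicitly. I normally go with Hash on the first field of the sort, if it has a large enough domain. Hash partitioning sends every row with the same key value to the same partition, so duplicates end up next to each other in one sorted stream.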
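To illustrate why partitioning matters here, the following is a minimal Python sketch, not DataStage code. It assumes each partition is sorted independently and that the first row of each key group within a partition is flagged 1 (like keychange). The input rows, the function names and the random "key-agnostic" partitioner are all hypothetical; the random partitioner is only a stand-in for any partitioning that ignores the sort key and does not claim to reproduce what Auto actually does.

```python
import random
from collections import defaultdict


def key_change_flags(keys):
    """Sort one partition and flag the first row of each key group with 1, the rest with 0."""
    flagged = []
    previous = object()  # sentinel that never equals a real key
    for key in sorted(keys):
        flagged.append((key, 1 if key != previous else 0))
        previous = key
    return flagged


def sort_and_filter(keys, choose_partition, n_partitions=4):
    """Partition the rows, sort each partition independently, and count rows by flag.

    Returns (kept, rejected): rows flagged 1 and rows flagged 0.
    """
    partitions = defaultdict(list)
    for key in keys:
        partitions[choose_partition(key) % n_partitions].append(key)
    kept = rejected = 0
    for part in partitions.values():
        for _key, flag in key_change_flags(part):
            if flag == 1:
                kept += 1
            else:
                rejected += 1
    return kept, rejected


# Hypothetical input: 17 rows, 14 distinct keys (key 0 appears three times, key 5 twice).
ROWS = list(range(14)) + [0, 0, 5]


def hash_on_key(key):
    # Key-based partitioning: the same key always maps to the same partition.
    return hash(key)


def make_random_partitioner(seed):
    # Stand-in for any partitioning that ignores the key (an assumption, not Auto's real logic).
    rng = random.Random(seed)
    return lambda _key: rng.randrange(1_000_000)


if __name__ == "__main__":
    print("hash on key:", sort_and_filter(ROWS, hash_on_key))
    for seed in range(3):
        print("key-agnostic, run", seed, sort_and_filter(ROWS, make_random_partitioner(seed)))
```

With key-based partitioning the sketch always keeps 14 rows and rejects 3. With the key-agnostic partitioner, duplicates of one key can land in different partitions, where each is flagged as a first occurrence, so the kept count can rise above 14 and differ between runs.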
Andrew

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Actually the first thing that came to my mind was the question "are you processing the same data in each run?"
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.