Sort Stage Question


sriec12
Participant
Posts: 56
Joined: Mon Nov 01, 2010 5:34 pm


Post by sriec12 »

I have a small question about the Sort stage. I am trying to keep only unique records, and in the stage properties I have set Execution mode to Parallel.

My question is: does the stage really remove duplicates after sorting in parallel mode, or should I change it to Sequential?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It really removes duplicates. However, it sorts the data on each node separately, so to get the results you require, your data need to be partitioned on the first sort key (or on more of the sort keys, if the first has fewer distinct values than you have nodes).
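As a rough illustration of why the partitioning matters, here is a small Python simulation, not DataStage code. The function names, the two-node setup, and the sample rows are all made up for the sketch. Each "node" does its own unique sort, so duplicates only disappear globally when every row with the same key lands on the same node.

```python
def unique_sort_per_partition(partitions, key):
    """Sort each partition on its own and keep one row per key within it."""
    result = []
    for rows in partitions:
        seen = set()
        kept = []
        for row in sorted(rows, key=key):
            k = key(row)
            if k not in seen:
                seen.add(k)
                kept.append(row)
        result.append(kept)
    return result


def round_robin(rows, nodes):
    """Deal rows out to nodes in turn, ignoring the key."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def hash_partition(rows, nodes, key):
    """Send every row with the same key value to the same node."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[hash(key(row)) % nodes].append(row)
    return parts


# Hypothetical sample data: (key, payload)
rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e"), (1, "f")]
key = lambda row: row[0]

rr = unique_sort_per_partition(round_robin(rows, 2), key)
hp = unique_sort_per_partition(hash_partition(rows, 2, key), key)

# Keys that survive across all nodes, collected together
rr_keys = sorted(key(r) for part in rr for r in part)
hp_keys = sorted(key(r) for part in hp for r in part)

print(rr_keys)  # keys 1 and 2 appear twice: duplicates survived across nodes
print(hp_keys)  # each key appears once
```

With round-robin partitioning, copies of the same key end up on different nodes and each node keeps its own copy. With key-based (hash) partitioning, all copies meet on one node and only one survives.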
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

The Sort stage is used for sorting data. If you want to remove duplicate records, use the Remove Duplicates stage. You only have to make sure that all records with the same key land in the same partition.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

prasson_ibm wrote: Sort stage is used for sorting the data; if you want to remove duplicate records, use the "Remove Duplicates" stage.
Or you can sort and remove duplicates while sorting by setting the "Allow Duplicates" option to False in the Sort stage, i.e. a unique sort akin to a sort -u in UNIX. Of course, you have more control over which duplicates are removed using the RD stage, if you need that.
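To make the "more control" point concrete, here is a minimal Python sketch, not DataStage code. The function name, the `retain` parameter, and the sample rows are assumptions for illustration only. It mimics sorting on a key plus a secondary column and then choosing whether the first or the last row of each key group is kept, which is the kind of choice a plain unique sort does not give you.

```python
from itertools import groupby


def remove_duplicates(rows, key, order, retain="first"):
    """Keep one row per key: the first or last one after ordering within the key."""
    ordered = sorted(rows, key=lambda r: (key(r), order(r)))
    kept = []
    for _, group in groupby(ordered, key=key):
        group = list(group)
        kept.append(group[0] if retain == "first" else group[-1])
    return kept


# Hypothetical sample data: (customer_id, load_date)
rows = [
    (1, "2024-01-05"),
    (2, "2024-01-02"),
    (1, "2024-01-09"),
    (2, "2024-01-01"),
]
cust = lambda r: r[0]
date = lambda r: r[1]

oldest = remove_duplicates(rows, cust, date, retain="first")
newest = remove_duplicates(rows, cust, date, retain="last")

print(oldest)  # earliest row per customer
print(newest)  # latest row per customer
```

In a parallel job the same caveat as above still applies: the rows for one key have to be in the same partition before either approach sees them as duplicates.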
-craig

"You can never have too many knives" -- Logan Nine Fingers
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

If removing duplicates is your primary goal, you can use any of these stages: Sort, Remove Duplicates, Transformer, or Copy.
Thanx and Regards,
ETL User