Sort Stage Question

Posted: Fri May 03, 2013 1:15 pm
by sriec12
I have a small question about the Sort stage. I am trying to get only unique records, but in the stage properties I have set Execution mode to Parallel.

My question is: does it really remove duplicates after sorting, or should I change it to Sequential?

Posted: Fri May 03, 2013 2:40 pm
by ray.wurlod
It really removes duplicates. However, it sorts the data on each node separately, so to get the results you require, your data need to be partitioned on the first sort key (or on more keys, if the first key has fewer distinct values than you have nodes).
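
To illustrate why the partitioning matters, here is a rough Python simulation (not DataStage code). The function names and the two-node setup are made up for the example; it only assumes that each node sorts and de-duplicates its own partition independently.

```python
def round_robin(rows, nodes):
    """Deal rows out to nodes in turn, ignoring the key (hypothetical helper)."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def hash_partition(rows, nodes, key):
    """Send every row with the same key value to the same node (hypothetical helper)."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[hash(key(row)) % nodes].append(row)
    return parts


def unique_sort_per_node(parts, key):
    """Sort and de-duplicate each partition on its own, then collect the results."""
    out = []
    for part in parts:
        seen = set()
        for row in sorted(part, key=key):
            k = key(row)
            if k not in seen:  # keep only the first row per key on this node
                seen.add(k)
                out.append(row)
    return out


def first_column(row):
    return row[0]


rows = [(1, "a"), (1, "b"), (2, "c"), (2, "d")]

# Keys 1 and 2 each end up on both nodes, so neither node sees a duplicate.
without_key_partitioning = unique_sort_per_node(round_robin(rows, 2), first_column)

# Rows sharing a key meet on one node, so the duplicates are dropped.
with_key_partitioning = unique_sort_per_node(hash_partition(rows, 2, first_column), first_column)

print(len(without_key_partitioning))  # 4
print(len(with_key_partitioning))     # 2
```

With round-robin distribution all four rows survive; once the rows are partitioned on the key, only one row per key remains.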

Posted: Fri May 03, 2013 5:49 pm
by prasson_ibm
The Sort stage is used for sorting data; if you want to remove duplicate records, use the "Remove Duplicates" stage. You only have to make sure that all records with the same key land in the same partition.

Posted: Fri May 03, 2013 7:53 pm
by chulett
prasson_ibm wrote: The Sort stage is used for sorting data; if you want to remove duplicate records, use the "Remove Duplicates" stage.
Or you can sort and remove duplicates while sorting by setting the "Allow Duplicates" option to False in the Sort stage, i.e. a unique sort akin to a sort -u in UNIX. Of course, you have more control over which duplicates are removed using the RD stage, if you need that.
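
As a rough sketch of what "more control over which duplicates are removed" can mean, the Python below (not DataStage code) keeps either the first or the last row of each key group from already-sorted input. The `remove_duplicates` function and its `retain` parameter are hypothetical names for this illustration, and the sample rows are invented.

```python
from itertools import groupby


def remove_duplicates(sorted_rows, key, retain="first"):
    """Keep one row per key group; input must already be sorted on the key.

    `retain` is a hypothetical parameter: "first" keeps the first row of
    each group, anything else keeps the last.
    """
    kept = []
    for _, group in groupby(sorted_rows, key=key):
        group = list(group)
        kept.append(group[0] if retain == "first" else group[-1])
    return kept


def by_id(row):
    return row[0]


rows = [(1, "2013-05-03"), (2, "2013-05-02"), (1, "2013-05-01")]

# Sorting on (id, date) decides which row is first and last within each id.
sorted_rows = sorted(rows)

earliest = remove_duplicates(sorted_rows, by_id, retain="first")
latest = remove_duplicates(sorted_rows, by_id, retain="last")

print(earliest)  # [(1, '2013-05-01'), (2, '2013-05-02')]
print(latest)    # [(1, '2013-05-03'), (2, '2013-05-02')]
```

The secondary sort column is what makes the choice meaningful: sort on the key plus a date, then keep the last row per key to get the most recent record.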

Posted: Mon May 06, 2013 12:33 am
by chandra.shekhar@tcs.com
If removing duplicates is your primary goal, you can use any of these:
Sort stage, Remove Duplicates, Transformer, Copy.