remove duplicates using Transformer

neena
Participant
Posts: 90
Joined: Mon Mar 31, 2003 4:32 pm

Post by neena »

Hi,

I am trying to remove duplicates using only a Transformer. On the input tab of the Transformer I am using hash partitioning with Perform sort checked. I partition and sort on one key, sort ascending on a second key, and sort descending on another column.

Key1 (sorting, partitioning)
Key2 (sorting, ascending)
column1 (sorting, descending)

Then I checked the Stable and Unique check boxes on the tab, expecting the first record to be retained when there are duplicates, but I don't see any duplicate records being dropped.
Could anyone please explain how Stable and Unique work? The documentation says that if I check both Stable and Unique, the first duplicate record will be retained. Please let me know if I am missing anything, or if there are other posts regarding this.
Any help would be really appreciated.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Are all 3 keys supposed to identify the duplicates, or just the first and second keys?
neena
Participant
Posts: 90
Joined: Mon Mar 31, 2003 4:32 pm

Post by neena »

It's the first and second keys, both of them.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

But since the Unique comparison is done on all 3 sorted columns, records that differ in column1 are not treated as duplicates, so nothing gets dropped...
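
To illustrate the point outside DataStage, here is a minimal Python sketch. It is only an analogy for how a "unique" pass over sorted data behaves, not DataStage code; the sample rows and the `unique_on` helper are made up for illustration, and partitioning is not modelled.

```python
from itertools import groupby

# Hypothetical sample rows: (Key1, Key2, column1),
# already sorted by Key1 asc, Key2 asc, column1 desc.
rows = [
    ("A", 1, 30),
    ("A", 1, 20),
    ("A", 1, 10),
    ("B", 2, 5),
]

def unique_on(sorted_rows, key_indexes):
    """Keep the first row of each run of adjacent rows whose chosen columns match."""
    def key(row):
        return tuple(row[i] for i in key_indexes)
    return [next(group) for _, group in groupby(sorted_rows, key=key)]

# Comparing all three columns: every row is distinct, so nothing is dropped.
three_key_result = unique_on(rows, (0, 1, 2))

# Comparing only Key1 and Key2: the first row of each key pair is kept.
two_key_result = unique_on(rows, (0, 1))

print(three_key_result)
print(two_key_result)
```

With column1 in the comparison the three "A" rows all survive; with only Key1 and Key2 compared, a single "A" row remains.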
neena
Participant
Posts: 90
Joined: Mon Mar 31, 2003 4:32 pm

Post by neena »

Thank you very much, you are right. I tested with only Key1 and Key2 and it worked just fine, removing the duplicates. I guess I have to use a Remove Duplicates stage and retain the first record.
After the Transformer stage I will use the same partitioning in the Remove Duplicates stage and retain the first record. Please let me know if that's not the correct approach.
betterthanever
Participant
Posts: 152
Joined: Tue Jan 13, 2009 8:59 am

Post by betterthanever »

neena wrote: Thank you very much, you are right. I tested with only Key1 and Key2 and it worked just fine, removing the duplicates. I guess I have to use a Remove Duplicates stage and retain the first record.
After the Transformer stage I will use the same partitioning in the Remove Duplicates stage and retain the first record. Please let me know if that's not the correct approach.
By default, the Remove Duplicates stage inserts the sort operator again...
neena
Participant
Posts: 90
Joined: Mon Mar 31, 2003 4:32 pm

Post by neena »

The reason I was trying to avoid the Remove Duplicates stage is that this is existing code and I am trying to avoid adding stages.
What I did was this: in the Transformer I used hash partitioning and Perform sort, but did not check the Stable and Unique check boxes.

Key1 (sorting, partitioning)
Key2 (sorting, ascending)
column1 (sorting, descending)

The next stage after this Transformer is a Copy stage, so I used "Same" partitioning in the Copy stage, checked the Perform sort, Stable and Unique check boxes, and selected Key1 and Key2.

Key1 (sorting, ascending)
Key2 (sorting, ascending)

It worked fine, but please let me know if there are any downsides to doing this.
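
For anyone following along, the ordering logic of these two steps can be sketched in Python. This is only an analogy under stated assumptions: the rows are invented, column1 is assumed to be numeric, and nothing here models DataStage partitioning or its operators.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input rows: (Key1, Key2, column1)
rows = [
    ("B", 2, 5),
    ("A", 1, 10),
    ("A", 1, 30),
    ("B", 2, 7),
    ("A", 1, 20),
]

# Step 1 (Transformer input): sort by Key1 asc, Key2 asc, column1 desc.
# Negating column1 gives descending order; this assumes it is numeric.
step1 = sorted(rows, key=lambda r: (r[0], r[1], -r[2]))

# Step 2 (Copy input): stable sort on Key1 and Key2 only, then unique.
# Python's sorted() is stable, so the column1 order from step 1 is preserved
# within each (Key1, Key2) group.
pair_key = itemgetter(0, 1)
step2 = sorted(step1, key=pair_key)
deduped = [next(group) for _, group in groupby(step2, key=pair_key)]

print(deduped)
```

In this sketch the row kept for each Key1/Key2 pair is the one with the highest column1, which relies on the second sort being stable.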