Trap Duplicate record

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Trap Duplicate record

Post by prasson_ibm »

Hi
I have records like this:
Col1 Col2
1 a
1 b
1 c
2 d
2 e
2 f
I want to trap duplicate records in PX.
My Output1 should look like this:
Col1 Col2
1 b
1 c
2 e
2 f

and Output2 should look like this:
Col1 Col2
1 a
2 d
Kindly help me implement this in a parallel job.
Thanks in advance
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Have you undertaken a Search?

This question has been answered previously.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
talk2shaanc
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Re: Trap Duplicate record

Post by talk2shaanc »

Which of the two is the key column? I don't see a duplicate in your input if I assume that both Col1 and Col2 are part of the key.
Shantanu Choudhary
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

... on which basis you'd be safe enough to deduce that Col1 is the key for the purposes of identifying duplicates.
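A small Python sketch (plain Python, not DataStage; the helper `duplicate_keys` and its `key` argument are hypothetical names used only for illustration) shows why the choice of key matters for the sample data: with (Col1, Col2) as the key there are no duplicates at all, while with Col1 alone both key values repeat.

```python
from collections import Counter

# Sample rows from the question: (Col1, Col2)
ROWS = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]


def duplicate_keys(rows, key):
    """Return the set of key values that occur more than once.

    `key` is a function that extracts the key from a row
    (hypothetical helper, for illustration only).
    """
    counts = Counter(key(row) for row in rows)
    return {k for k, n in counts.items() if n > 1}


# Key = (Col1, Col2): every row is unique, so nothing is a duplicate.
print(duplicate_keys(ROWS, key=lambda r: r))     # set()
# Key = Col1 only: both key values repeat.
print(duplicate_keys(ROWS, key=lambda r: r[0]))  # {1, 2}
```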
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Re: Trap Duplicate record

Post by prasson_ibm »

Col1 is my key column. Based on this key, I want to send the first record of each group to Output2 and the rest of the records to Output1. I have done this using a server job, but I'm having problems doing it in a parallel job. :(
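The requirement can be sketched in Python (plain Python, not DataStage code; `split_first_and_rest` is a hypothetical name) by comparing each row's key with the previous row's key, in the spirit of the stage-variable comparison commonly used in server jobs. The sketch assumes the input is already sorted on Col1. In a parallel job the same logic also needs every row with a given Col1 value in the same partition (for example, hash partitioning on Col1) and sorted within that partition; otherwise each partition sees only part of a group.

```python
# Sample input from the question, already sorted on the key column Col1.
ROWS = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]


def split_first_and_rest(sorted_rows):
    """Route the first row of each Col1 group to output2, the rest to output1.

    Assumes the rows are sorted (or at least grouped) on Col1, so all rows
    with the same key are adjacent.
    """
    output1, output2 = [], []
    previous_key = object()  # sentinel that never equals a real key
    for row in sorted_rows:
        key = row[0]
        if key != previous_key:
            output2.append(row)  # first row seen for this key
        else:
            output1.append(row)  # repeated key
        previous_key = key
    return output1, output2


output1, output2 = split_first_and_rest(ROWS)
print(output1)  # [(1, 'b'), (1, 'c'), (2, 'e'), (2, 'f')]
print(output2)  # [(1, 'a'), (2, 'd')]
```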
mahadev.v
Participant
Posts: 111
Joined: Tue May 06, 2008 5:29 am
Location: Bangalore

Post by mahadev.v »

Are we supposed to guess the problem you are facing in parallel jobs?
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
gabrielac
Participant
Posts: 29
Joined: Mon Sep 26, 2005 3:39 pm

Post by gabrielac »

If I understand the problem correctly, I would split it into two jobs.
1. Create Output 2 using a Remove Duplicates stage.
2. Use Output 2 as the reference in a lookup, and create Output 1 from the leftover records.
HTH,
Gaby
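A Python sketch of this two-job approach (plain Python standing in for the DataStage stages; the function names are hypothetical). Two assumptions worth stating: the lookup in step 2 has to match on both Col1 and Col2, because a lookup on Col1 alone would find a match for every row; and (Col1, Col2) is assumed to be unique in the input.

```python
# Sample input from the question: (Col1, Col2)
ROWS = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]


def remove_duplicates_keep_first(rows):
    """Job 1: keep the first row seen for each Col1 value (like retaining the first duplicate)."""
    seen = set()
    kept = []
    for col1, col2 in rows:
        if col1 not in seen:
            seen.add(col1)
            kept.append((col1, col2))
    return kept


def leftover_rows(rows, reference):
    """Job 2: look up each row in `reference` on (Col1, Col2); unmatched rows are the leftovers."""
    lookup = set(reference)
    return [row for row in rows if row not in lookup]


output2 = remove_duplicates_keep_first(ROWS)
output1 = leftover_rows(ROWS, output2)
print(output2)  # [(1, 'a'), (2, 'd')]
print(output1)  # [(1, 'b'), (1, 'c'), (2, 'e'), (2, 'f')]
```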
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It can be done in one parallel job, and this has been explained (fully, I believe) in the past. In outline, the technique involves splitting the stream of sorted rows using a Copy stage, sending the key through an Aggregator stage that counts the rows for each key value, then bringing the streams back together using a Join stage. In the upstream Sort stage you have generated a key change column; you use this to identify the first row of each group and the other rows of each group, using a Switch, Filter or Transformer stage.
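A Python emulation of the single-job design described above, with each step labelled by the DataStage stage it stands in for. This is an illustrative sketch, not DataStage code; the field name `count` and the helper `sort_with_key_change` are assumptions. The `keyChange` field mimics the Sort stage's key change column, assumed here to be 1 on the first row of each key group and 0 on the others. For the split asked in this thread, the key change column alone decides the routing; the per-key count from the Aggregator travels with each joined row and could, for example, be used to separate keys that occur only once.

```python
from collections import Counter

# Sample input from the question: (Col1, Col2)
ROWS = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]


def sort_with_key_change(rows):
    """Sort stage: sort on Col1 (stable) and add keyChange = 1 on each group's first row."""
    out = []
    previous = object()  # sentinel that never equals a real key
    for col1, col2 in sorted(rows, key=lambda r: r[0]):
        out.append({"Col1": col1, "Col2": col2, "keyChange": int(col1 != previous)})
        previous = col1
    return out


# Sort stage with key change column.
sorted_rows = sort_with_key_change(ROWS)

# Copy stage: the same sorted stream feeds two links.
main_link = sorted_rows
count_link = [row["Col1"] for row in sorted_rows]

# Aggregator stage: count rows per key value.
counts = Counter(count_link)

# Join stage: bring the two streams back together on Col1.
joined = [dict(row, count=counts[row["Col1"]]) for row in main_link]

# Filter / Switch / Transformer stage: route on the key change column.
output2 = [(r["Col1"], r["Col2"]) for r in joined if r["keyChange"] == 1]
output1 = [(r["Col1"], r["Col2"]) for r in joined if r["keyChange"] == 0]

print(joined[0])  # {'Col1': 1, 'Col2': 'a', 'keyChange': 1, 'count': 3}
print(output2)    # [(1, 'a'), (2, 'd')]
print(output1)    # [(1, 'b'), (1, 'c'), (2, 'e'), (2, 'f')]
```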