Trap Duplicate record

prasson_ibm · Post by **prasson_ibm** » Thu Sep 11, 2008 11:37 pm

Hi,
I have records like this:
Col1 Col2
1 a
1 b
1 c
2 d
2 e
2 f
I want to trap duplicate records in PX.
My Output1 should be like this:
Col1 Col2
1 b
1 c
2 e
2 f

and Output2 should be like this:-
Col1 Col2
1 a
2 d
Kindly help me implement this in a parallel job.
Thanks in advance.

ray.wurlod · Post by **ray.wurlod** » Thu Sep 11, 2008 11:50 pm

Have you undertaken a Search?

This question has been answered previously.

talk2shaanc · Post by **talk2shaanc** » Fri Sep 12, 2008 1:26 am

Which of the two is the key column? I don't see a duplicate in your input if I assume that both Col1 and Col2 are part of the key.

ray.wurlod · Post by **ray.wurlod** » Fri Sep 12, 2008 2:24 am

... on which basis you'd be safe enough to deduce that Col1 is the key for the purposes of identifying duplicates.

prasson_ibm · Post by **prasson_ibm** » Fri Sep 12, 2008 6:14 am

Col1 is my key column. I want to send the first record of each repeated key to Output2 and the rest of the records to Output1, based on this key column. I have trapped this using a server stage, but in a parallel job I am having a problem.

mahadev.v · Post by **mahadev.v** » Fri Sep 12, 2008 6:34 am

Are we supposed to guess the problem you are facing in parallel jobs?

gabrielac · Post by **gabrielac** » Fri Sep 12, 2008 7:18 am

If I understood the problem correctly, I would divide it into two jobs.
1. Create Output 2 using Remove Duplicates.
2. Use Output 2 in a Lookup to create Output 1 from the leftover records.
HTH,
Gaby
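
For readers who want to see the logic of this two-step approach outside DataStage, here is a minimal Python sketch. It is not DataStage code; the function names `remove_duplicates` and `leftover_rows` are hypothetical, and it assumes rows are tuples that differ from one another as whole rows (an exact duplicate of a first row would also be dropped from Output 1).

```python
def remove_duplicates(rows, key_index=0):
    """Step 1: keep the first row seen for each key value (Output 2)."""
    seen = set()
    firsts = []
    for row in rows:
        if row[key_index] not in seen:
            seen.add(row[key_index])
            firsts.append(row)
    return firsts


def leftover_rows(rows, firsts):
    """Step 2: look each input row up in Output 2; unmatched rows are Output 1."""
    kept = set(firsts)  # assumption: rows are hashable tuples
    return [row for row in rows if row not in kept]


rows = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]
output2 = remove_duplicates(rows)
output1 = leftover_rows(rows, output2)
```

With the sample data from the first post, `output2` holds the first row of each Col1 group and `output1` holds the remaining rows.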

ray.wurlod · Post by **ray.wurlod** » Fri Sep 12, 2008 5:40 pm

It can be done with one parallel job, which has been explained (fully, I believe) in the past. In outline, the technique involves splitting the stream of sorted rows using a Copy stage, sending the key through an Aggregator stage that counts the rows per key, then bringing the streams back together using a Join stage. In the Sort stage upstream you have generated a key change column; you use this to identify the first row from each group and the other rows from each group, using a Switch, Filter or Transformer stage.
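
The outline above can be mimicked in plain Python to make the data flow concrete. This is only a sketch of the logic, not DataStage code: `split_duplicates` is a hypothetical name, and sending keys that occur only once to a separate third list is an assumption, since the thread does not say where such rows should go.

```python
from collections import Counter
from operator import itemgetter


def split_duplicates(rows, key=itemgetter(0)):
    # Sort stage: stable sort on the key, flagging the first row of each
    # group (the equivalent of a key change column).
    ordered = sorted(rows, key=key)
    flagged = []
    previous = object()  # sentinel that equals no real key value
    for row in ordered:
        current = key(row)
        flagged.append((row, current != previous))
        previous = current

    # Copy + Aggregator: count the rows for each key value.
    counts = Counter(key(row) for row in ordered)

    # Join + Filter: attach the count to each row and route it.
    output1, output2, singles = [], [], []
    for row, key_change in flagged:
        if counts[key(row)] == 1:
            singles.append(row)   # assumption: non-duplicated keys kept apart
        elif key_change:
            output2.append(row)   # first row of a duplicated group
        else:
            output1.append(row)   # remaining rows of a duplicated group
    return output1, output2, singles


rows = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]
output1, output2, singles = split_duplicates(rows)
```

For the sample data in the first post, `output2` receives the first row per Col1 value and `output1` the rest; `singles` stays empty because every key is repeated.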

DSXchange
