how to remove the duplicate records
Moderators: chulett, rschirm, roy
Hi,
How can I remove duplicate records in a job that reads from a Sequential File stage?
My flow is like this:
Sequential stage--->transformer--->sequential stage/Oracle OCI
Suppose I am using a flat file as the source and it has some duplicate records. I need to remove those duplicates in the Transformer stage and insert the clean records into the target file or target table.
I need your help. Please give me some idea of how to solve this problem. It is very urgent for me.
Regards
Prithivi
Use a UNIX level sort (or if you really want to, use a sort stage) to sort your input data - optionally the sort program can and will remove duplicate records for you.
If your data is sorted, then you can use a stage variable in a Transformer stage to compare the current record with the previously read one, and not pass it on to the subsequent stage when the two match.
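Outside DataStage, the compare-with-previous idea can be sketched in shell. This only illustrates the logic and is not Transformer syntax; the function name `dedupe_sorted` and the sample rows are made up for the example. In the job itself, the stage variable would hold the previous row and an output constraint would do the comparison.

```shell
# Sort the input, then pass a row on only when it differs from the
# previously read row (the role the stage variable plays in the job).
dedupe_sorted() {
  # Appending "" forces awk to compare the rows as strings, not numbers.
  LC_ALL=C sort | awk 'NR == 1 || ($0 "") != (prev "") { print } { prev = $0 }'
}

# Hypothetical sample rows; duplicates of B,2 and A,1 are dropped.
printf '%s\n' 'B,2' 'A,1' 'B,2' 'A,1' 'C,3' | dedupe_sorted
```

The comparison only works because the rows arrive sorted, so all copies of a record sit next to each other.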
ArndW wrote: Use a UNIX level sort (or if you really want to, use a sort stage) to sort your input data - optionally the sort program can and will remove duplicate records for you.
If your data is sorted, then you can use a stage variable in a Transformer stage to compare the current record with the previously read one, and not pass it on to the subsequent stage when the two match.
prithivi-- Can you explain briefly? I have used the Sort stage and am getting the data in sorted order. After that, how can I check for duplicate records through the stage variable?
I need more information about it.
Prithivi
You already have the information you need above. Have you tried it yourself, or do you want someone to do it for you?
Here is one solution:
Use a filter command (the sort command) in the Sequential File stage.
IN SEQFileStage-------->Xfm-------->OUT SEQFileStage
Open IN SEQFileStage, click on the Stage tab and check "Stage uses filter command". Now click on the Output tab and write your sort command in the Filter command box.
Your sort command should be:
sort -u <positions of sort keys>
You don't have to redirect it to a new file. It will read from stdin.
This filter command will dedupe your input file, and you can then write the resulting records to another file.
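To see what such a filter does before putting it in the stage, it can be tried on the command line. The sketch below is only a stand-in for the stage: the function names and sample rows are invented, and the comma delimiter in the second variant is an assumption about the file layout.

```shell
# Stand-in for the stage's filter command: rows arrive on stdin,
# deduplicated rows leave on stdout.
filter_full_row() { LC_ALL=C sort -u; }

# Variant: treat rows as duplicates when the first comma-separated
# field matches, whatever the rest of the row holds.
filter_by_first_field() { LC_ALL=C sort -u -t, -k1,1; }

# Hypothetical sample rows; the repeated 10,Smith row is written once.
printf '%s\n' '10,Smith' '20,Jones' '10,Smith' | filter_full_row
```

With a key given, `sort -u` keeps one row per key value, so rows that differ only outside the key are collapsed as well.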
Kris~
Hi,
Please refer to the below post for more answers
viewtopic.php?t=92746&highlight=duplicate
Hope this helps :)
Warm Regards,
Amruta Bandekar
If A equals success, then the formula is: A = X + Y + Z, X is work. Y is play. Z is keep your mouth shut.
--Albert Einstein
Sainath.Srinivasan wrote: Note: sort -u as such performs a full row comparison.
We can specify positions as well and dedupe accordingly.
Example on a fixed-width file: sort on two keys in priority order, the first from position 45 to 57 and the second from position 1 to 2.
Code:
sort -u +0.44 -0.57 +0.0 -0.3
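The `+POS -POS` form above is the older key notation, which some current sort builds may not accept by default. If I am translating it correctly, the same keys in `-k` notation would read `-k1.45,1.57 -k1.1,1.3`. A small sketch of the idea with shorter, made-up keys (columns 4 to 5 first, then columns 1 to 2); the function name and records are hypothetical, and the records are assumed to contain no blanks so the whole line counts as field 1:

```shell
# Dedupe fixed-width rows on two column ranges, in priority order:
# columns 4-5 first, then columns 1-2. Rows equal on both ranges
# are treated as duplicates, even if other columns differ.
dedupe_by_columns() {
  LC_ALL=C sort -u -k1.4,1.5 -k1.1,1.2
}

# Hypothetical fixed-width records; the two AA-01 rows collapse to one.
printf '%s\n' 'AA-01-x' 'AA-01-y' 'BB-01-z' 'AA-02-w' | dedupe_by_columns
```

Three rows come out, one for each distinct pair of key values.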