Removing duplicates with RCP



chanaka
Premium Member
Posts: 96
Joined: Tue Sep 15, 2009 4:06 am
Location: United States


Post by chanaka

Dear Experts,

I have a multi-instance DataStage PX job that uses RCP (Runtime Column Propagation) to populate an Oracle table from a dataset. We do not have any keys or columns defined. However, the dataset contains duplicate records (whole-record duplicates) due to source system issues. Is there any way to remove the duplicate records in the flow? As there is no column metadata in the job, it is not possible to use the Remove Duplicates or Sort stage. Any suggestion is very much appreciated.

Note: The datasets are created by a very complex job with multiple hierarchy stages and a very large number of columns, and it would be very difficult to maintain if we did the duplicate removal there. Hence we are looking for the least impactful approach.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Re: Removing duplicates with RCP

Post by asorrell

There is no way to remove duplicates without defining the columns used to identify them. You'll need to define those columns so the data can be sorted and the duplicates removed. RCP is easy for lift-and-shift jobs, but those operations require explicit column details.
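To illustrate the point, here is a minimal Python sketch (not DataStage code) of the sort-then-remove-duplicates pattern. The function name `remove_duplicates` and the sample rows are hypothetical. The key columns must be named before the records can be sorted and adjacent duplicates dropped. For whole-record duplicates the key would be every column, which is exactly the metadata an RCP-only job does not declare.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(records, key_columns):
    """Hypothetical helper: sort on key_columns, then keep one record per key.

    key_columns must be non-empty; without it there is nothing to sort or
    compare on, which mirrors why Sort/Remove Duplicates need column metadata.
    """
    key = itemgetter(*key_columns)
    # Sorting brings records with equal keys next to each other.
    ordered = sorted(records, key=key)
    # groupby groups consecutive equal keys; keep the first record of each group.
    return [next(group) for _, group in groupby(ordered, key=key)]


rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},  # whole-record duplicate of the first row
]

# Whole-record de-duplication: use every column as the key.
print(remove_duplicates(rows, list(rows[0].keys())))
```

Because `sorted` is stable, the first occurrence of each key among the input records is the one kept.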
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020