Removing duplicates with RCP

Post by **chanaka** » Thu May 27, 2021 10:48 am

Dear Experts,

I have a multi-instance DataStage PX job that uses RCP to populate an Oracle table from a DataSet. We do not have any keys or columns defined. However, the DataSet contains duplicate records (whole-record duplicates) due to source system issues. Is there any way for us to remove the duplicate records in the flow? As there is no column metadata in the job, it is not possible to use a Remove Duplicates or Sort stage. Any suggestions are very much appreciated.

Note: The DataSets are created by a very complex multiple-Hierarchy-stage job with a very large number of columns, and it would be very difficult to maintain if we did the duplicate removal there. Hence we are trying to do this in the least impactful way.

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Fri May 28, 2021 11:48 am

There is no way to remove duplicates without defining the columns on which duplicates are identified. You'll need to define those columns so the data can be sorted and the duplicates removed. RCP is easy for lift and shift, but column details are required for those operations.
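
To illustrate why the key columns matter, here is a minimal Python sketch of the sort-then-remove-duplicates pattern described above. It is not DataStage code; the function name `remove_duplicates`, the `key_columns` parameter, and the sample rows are all hypothetical and only show that both the sort and the duplicate comparison are driven by explicitly named columns.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(records, key_columns):
    """Sort records on key_columns, then keep the first record of each key group."""
    # The key function can only be built from named columns.
    key = itemgetter(*key_columns)
    # Sorting brings records with identical key values next to each other.
    ordered = sorted(records, key=key)
    # groupby collapses each run of equal keys; keep one record per run.
    return [next(group) for _, group in groupby(ordered, key=key)]


# Hypothetical sample data containing one whole-record duplicate.
rows = [
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
]

# For whole-record duplicates, every column is listed as a key.
deduped = remove_duplicates(rows, ["id", "name"])
```

For whole-record duplicates, every column has to be named as a key, which is the metadata that a pure RCP job does not carry.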
