Removing duplicates with RCP
Posted: Thu May 27, 2021 10:48 am
Dear Experts,
I have a multi-instance DataStage PX job that uses RCP to populate an Oracle table from a DataSet. We do not have any keys or columns defined. However, the DataSet contains duplicate records (whole-record duplicates) due to source system issues. Is there any way for us to remove the duplicate records in the flow? As there is no column metadata in the job, it is not possible to use the Remove Duplicates or Sort stage. Any suggestions are very much appreciated.
Note: The DataSets are created by a very complex job with multiple hierarchy stages and a very large number of columns, and it would be very difficult to maintain if we removed the duplicates there. Hence we are trying to do this in the least impactful way.