Page 1 of 1

Duplicate records to Error

Posted: Thu Nov 01, 2018 11:04 pm
by Gius
Hello,
I 'm reading a file, where the fields , Person Identifier and the Item Identifier, are keys.
I would like to send to Error (for example an output file), all the duplicates records and just keep one (max load date). But don't know really how to do that , could you help me please ?
Thank You -

Posted: Fri Nov 02, 2018 5:32 am
by chulett
:?: Okay, what have you tried and what issues are you facing?

There is the Remove Duplicates stage which can retain one record per "key" checked for duplicates as long as they are sorted properly. Don't know if it supports any kind of a reject link or not, didn't see it during a quick check of the docs. Failing that... perhaps a mechanism to take the deduped result and join it back to the original source to get your "all the duplicates" for your error file?

Let's see what other folks think who may actually have had to do something like this.

Posted: Fri Nov 02, 2018 6:41 am
by rumu
In a similar kind of requirement, I used Sort stage. Sort your data based on

Person Identifier, the Item Identifier and load date(descending). Use Create key change column, this will assign 1 for the first record in the group and for remaining 0. Use transformer stage to output 2 links one for Records with 1 and other for records with 0.The last link will be your error output.

Posted: Fri Nov 02, 2018 7:28 am
by chulett
That's another approach but may not satisfy the "all duplicates" requirement... which I assume also includes the one to keep. So, if there is only one occurrence of a key, there's no error. However, if there are 12 occurrences, than I'm thinking they want one specific one (max load date) to be kept but all 12 should go to the error file rather than 11. Really depends on what "all the duplicate records" means in the OP. TBH, I may be over-thinking things. :wink:

Gius, can you clarify please?

Posted: Wed Nov 07, 2018 8:38 am
by Gius
yes thank you, the sort with create key change column worked fine.
Another solution was to do a Loop Condition within the Transform.

The requirement was if 12 records dup, 11 to error and 1 (max Date) to output.
If I had to send the 12 to Error, I think I would do a join after , between the unique key of the 11 Error rec with all records.