Gius
Participant
Joined: 09 Mar 2015
Posts: 23
Points: 321
DataStage® Release: 11x
Job Type: Parallel
OS: Windows
Additional info: Send duplicate records (for a key) to Error
Hello,
I'm reading a file where two fields, Person Identifier and Item Identifier, are the keys.
I would like to send all the duplicate records to Error (for example, an output file) and keep just one (the one with the max load date), but I don't really know how to do that. Could you help me please?
Thank you
chulett
 since January 2006
Group memberships: Premium Members, Inner Circle, Server to Parallel Transition Group
Joined: 12 Nov 2002
Posts: 42878
Location: Denver, CO
Points: 221062
Okay, what have you tried and what issues are you facing?
There is the Remove Duplicates stage which can retain one record per "key" checked for duplicates, as long as they are sorted properly. Don't know if it supports any kind of a reject link or not, didn't see it during a quick check of the docs. Failing that... perhaps a mechanism to take the deduped result and join it back to the original source to get your "all the duplicates" for your error file?
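Outside of DataStage, the join-back idea I'm picturing looks roughly like this (a plain-Python sketch, untested; person_id, item_id and load_date are made-up field names standing in for your two keys and the load date):

from collections import Counter

def dedupe_and_collect_dupes(rows):
    # Keep one record per (person_id, item_id) key: the one with the max load_date.
    best = {}
    for row in rows:
        key = (row["person_id"], row["item_id"])
        if key not in best or row["load_date"] > best[key]["load_date"]:
            best[key] = row

    # "Join back" against the original source: any key that occurs more than
    # once means every record carrying that key goes to the error output.
    counts = Counter((r["person_id"], r["item_id"]) for r in rows)
    errors = [r for r in rows if counts[(r["person_id"], r["item_id"])] > 1]

    return list(best.values()), errors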
Let's see what other folks think who may actually have had to do something like this.
_________________ -craig
The Old Ones were, the Old Ones are, and the Old Ones shall be. Not in the spaces we know, but between them. They walk serene and primal, undimensioned and to us unseen.
rumu
Participant
Joined: 06 Jun 2005
Posts: 282
Points: 2830
For a similar kind of requirement, I used the Sort stage. Sort your data on Person Identifier, Item Identifier and load date (descending). Use Create Key Change Column; this will assign 1 to the first record in each group and 0 to the remaining records. Then use a Transformer stage with two output links: one for records with 1 and the other for records with 0. That second link will be your error output.
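If it helps to see the logic outside of DataStage, here is a rough plain-Python equivalent of that Sort + key change + Transformer flow (just a sketch; person_id, item_id and load_date are example column names):

from itertools import groupby

def route(rows):
    # Equivalent of the Sort stage: key columns ascending, load date descending.
    # Two stable sorts give ascending keys with descending load_date within each key.
    rows = sorted(rows, key=lambda r: r["load_date"], reverse=True)
    rows = sorted(rows, key=lambda r: (r["person_id"], r["item_id"]))

    output, errors = [], []
    for _, group in groupby(rows, key=lambda r: (r["person_id"], r["item_id"])):
        for i, row in enumerate(group):
            # The key change column would be 1 here when i == 0 and 0 otherwise;
            # the Transformer tests that value on its two output links.
            if i == 0:
                output.append(row)   # the max-load-date record is kept
            else:
                errors.append(row)   # the remaining duplicates go to Error
    return output, errors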
_________________ Rumu
IT Consultant
chulett
 since January 2006
Group memberships: Premium Members, Inner Circle, Server to Parallel Transition Group
Joined: 12 Nov 2002
Posts: 42878
Location: Denver, CO
Points: 221062
That's another approach but may not satisfy the "all duplicates" requirement... which I assume also includes the one to keep. So, if there is only one occurrence of a key, there's no error. However, if there are 12 occurrences, then I'm thinking they want one specific one (max load date) to be kept but all 12 should go to the error file rather than 11. Really depends on what "all the duplicate records" means in the OP. TBH, I may be over-thinking things.
Gius, can you clarify please?
_________________ -craig
The Old Ones were, the Old Ones are, and the Old Ones shall be. Not in the spaces we know, but between them. They walk serene and primal, undimensioned and to us unseen.
Gius
Participant
Joined: 09 Mar 2015
Posts: 23
Points: 321
Yes, thank you, the Sort with Create Key Change Column worked fine.
Another solution was to do a Loop Condition within the Transformer.
The requirement was: if there are 12 duplicate records, 11 go to Error and 1 (the max date) to the output.
If I had to send all 12 to Error, I think I would do a join afterwards between the unique keys of the 11 Error records and all the records.
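Roughly, the idea of that join would be something like this (plain-Python sketch, not DataStage, with example field names):

def all_occurrences_to_error(all_rows, error_rows):
    # Take the keys of the records already routed to Error and join them back
    # to the full input, so every occurrence of a duplicated key (all 12)
    # lands in the error file.
    dup_keys = {(r["person_id"], r["item_id"]) for r in error_rows}
    return [r for r in all_rows if (r["person_id"], r["item_id"]) in dup_keys]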