Removed duplicate strings in a column

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Harini
Premium Member
Premium Member
Posts: 80
Joined: Tue Mar 16, 2010 1:32 am

Removed duplicate strings in a column

Post by Harini »

Hi All,

Am trying to concatenate a error description column, based on key column.

Say my data is like this.

ID, Number, Error Code
100, MD350, Invalid Amount
101, MD310, Invalid Account
102, MD350, Invalid Code
103, MD400, Invalid Transaction
104, MD350, Invalid Code

I wanted combine the error code for all the transaction which has same number. Eg:

My output would look like:

100, MD350, Invalid Amount, Invalid Code, Invalid Code

I am able to achieve this using Sort, Key change and comparing previous and current values. The problem now i face is to remove duplicates there.

Say if i want to remove the 'Invalid Code' which is repeating twice there. Is there a way we can look for a string which is repeating in a column and remove it?

Thanks.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

There are several ways to do this, but I'd keep it simple and use the INDEX() function before adding a new error code to the string holding the error codes. If "INDEX(ConcatenatedString,In.ErrorCode,1)=0 THEN {add} ELSE {don't add}".
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

you can use count function on the previous string.
if count(Previous String, "Current String") > 0 then don't concatenate else concatenate.
Harini
Premium Member
Premium Member
Posts: 80
Joined: Tue Mar 16, 2010 1:32 am

Post by Harini »

Premium membership expired :(

Actually, the way, i would have to identify multiple errors would be in the same variable. Eg: If a record is going through multiple lookups, i check for the derived values which is null in the transformer in the same variable and am populating it.
mesuku
Participant
Posts: 0
Joined: Sun Jun 22, 2014 7:57 am

Post by mesuku »

You can get this output by deleting duplicates ( Second & Third Columns in source) and then concatinate the error code coumn data using sort and transformer stage


Thanks
Suresh
shepli
Participant
Posts: 79
Joined: Fri Dec 17, 2004 9:56 am

Post by shepli »

1) Sort the data with the Number and (then) ID as key
2) In a transformer, use a Stage Variable, e.g. MySV, to hold the "ID, Number, Error Code" values from the first row (no output).
2) Go thru the dataset row by row,
If the Nunber in a current row is the same as the previous row, concatenate the Error Code to the Stage Variable, no output (Note: you may not want to use comma as delimiter for the Error Code because it may created multiple columns);
If the Number is different from the previous row, output the value held in the Stage Variable and reset the Stage Variable to the "ID, Number, Error Code" values from the currentt row.
3) You need to have a way to handle the last row.

Hope the above helps.
Post Reply