Hi All,
Am trying to concatenate a error description column, based on key column.
Say my data is like this.
ID, Number, Error Code
100, MD350, Invalid Amount
101, MD310, Invalid Account
102, MD350, Invalid Code
103, MD400, Invalid Transaction
104, MD350, Invalid Code
I wanted combine the error code for all the transaction which has same number. Eg:
My output would look like:
100, MD350, Invalid Amount, Invalid Code, Invalid Code
I am able to achieve this using Sort, Key change and comparing previous and current values. The problem now i face is to remove duplicates there.
Say if i want to remove the 'Invalid Code' which is repeating twice there. Is there a way we can look for a string which is repeating in a column and remove it?
Thanks.
Removed duplicate strings in a column
Moderators: chulett, rschirm, roy
There are several ways to do this, but I'd keep it simple and use the INDEX() function before adding a new error code to the string holding the error codes. If "INDEX(ConcatenatedString,In.ErrorCode,1)=0 THEN {add} ELSE {don't add}".
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
1) Sort the data with the Number and (then) ID as key
2) In a transformer, use a Stage Variable, e.g. MySV, to hold the "ID, Number, Error Code" values from the first row (no output).
2) Go thru the dataset row by row,
If the Nunber in a current row is the same as the previous row, concatenate the Error Code to the Stage Variable, no output (Note: you may not want to use comma as delimiter for the Error Code because it may created multiple columns);
If the Number is different from the previous row, output the value held in the Stage Variable and reset the Stage Variable to the "ID, Number, Error Code" values from the currentt row.
3) You need to have a way to handle the last row.
Hope the above helps.
2) In a transformer, use a Stage Variable, e.g. MySV, to hold the "ID, Number, Error Code" values from the first row (no output).
2) Go thru the dataset row by row,
If the Nunber in a current row is the same as the previous row, concatenate the Error Code to the Stage Variable, no output (Note: you may not want to use comma as delimiter for the Error Code because it may created multiple columns);
If the Number is different from the previous row, output the value held in the Stage Variable and reset the Stage Variable to the "ID, Number, Error Code" values from the currentt row.
3) You need to have a way to handle the last row.
Hope the above helps.