Merging multiple rows into single row

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
gayathrisivakumar
Premium Member
Premium Member
Posts: 60
Joined: Thu Dec 17, 2009 1:36 am

Merging multiple rows into single row

Post by gayathrisivakumar »

Hi,'

I have a requirement to merge multiple rows into one row. I tried using stage variables but it seems that logic is going wrong somewhere.

My Input Data
ColA ColB ColC
1 A 100
1 B 200
1 C 300

Desired Output
1,A,100,B,200,C,300

How can I achieve this?

I tried comparing Currkey = PrevKey and appending values to stage variables, but some of the records give desired output and others not.

Can anybody please help.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

You are using the correct approach. Did you hash partition and sort on your key? (The column that holds the "1" in the example)

If you are, can you show some example failures or your stage variable logic?
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
gayathrisivakumar
Premium Member
Premium Member
Posts: 60
Joined: Thu Dec 17, 2009 1:36 am

Post by gayathrisivakumar »

Thanks for the quick reply.

My job design is like this:

seqfile ->sort ->transformer->remove duplictes->transformer->seqfile

This logic works fine

when

colA colB colC
1 A 10
1 B 20
2 C 30
2 D 40

Output comes as 1,A,10,B,20
2,c,30,D,40

But it fails when
colA colB colC
1 A 10
1 B 20
2 C 30
2 D 40
3 E 50

Output is coming as 1,A,10,B,20
2,C,30,D,40
3,A,C,10,B,20,E,50.

I am doing hash partitioning based on colA.Inside the stage variable if CurrKey = PrevKey then concatenate input columns into the stage variable.

Please let me know where am I going wrong.

Regards,
Gayathri
srinivas.nettalam
Participant
Posts: 134
Joined: Tue Jun 15, 2010 2:10 am
Location: Bangalore

Post by srinivas.nettalam »

Just to make sure nothing wrong in your derivation.Enable KeyChange Column in sort and write your derivation like

Code: Select all

svCurrent = If KeyChange=1 Then ColB:',':ColC Else svPrev:', ':ColB:',':ColC
svPrev=svCurrent
N.Srinivas
India.
gayathrisivakumar
Premium Member
Premium Member
Posts: 60
Joined: Thu Dec 17, 2009 1:36 am

Post by gayathrisivakumar »

Yes I am doing the same derivation. The logic goes wrong from the third change in key. Anybody has any insight?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you run this on a single node, does it run correctly?
-craig

"You can never have too many knives" -- Logan Nine Fingers
gayathrisivakumar
Premium Member
Premium Member
Posts: 60
Joined: Thu Dec 17, 2009 1:36 am

Post by gayathrisivakumar »

when I try to run it on single node the results are completely different.

1,A,C,10,B,20,C,30,D,40,E,50.
boxtoby
Premium Member
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

When I have done this sort of thing in the past I have had success but doing the following:

Sort the data by colA,hash partition by colA and set the key change indicator on.

I use three stage vars as follows:

svLastRowInGroup:
LastRowInGroup(colA)

(LastRowInGroup() function is set to 1 for true, zero for false)

svDataOut
if svLastRowInGroup and keyChange then colB:",":colC else if svLastRowInGroup then svDataOut:",":colB:",":colC else ""

(This logic checks single row records as well)

svData
if keyChange then colB:",":colC else svData:",":colB:",":colC

The transformer has a constraint of svLastRowInGroup
Bob Oxtoby
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

gayathrisivakumar wrote:when I try to run it on single node the results are completely different.
Then your logic is flawed. You'd need to supply the details of your job design to allow anyone to help without continuing to guess at this point.

What field(s) did you specify as the change keys in the Sort stage? Is the KeyChange column being set correctly? Post your exact stage variables and derivations and the output link constraint you are using.

ps. You don't need "current" and "prev" variables when the Sort stage is setting the KeyChange column.
Last edited by chulett on Wed Aug 28, 2013 7:15 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Was going to get to the 'LastRowInGroup' part as something needs to push out the last 'group' when taking this approach but wanted to check the version you are running as it was included starting with 8.5.
-craig

"You can never have too many knives" -- Logan Nine Fingers
boxtoby
Premium Member
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

ChangeKey would be colA

I wouldn't have thought the logic was that difficult to follow.....
Bob Oxtoby
gayathrisivakumar
Premium Member
Premium Member
Posts: 60
Joined: Thu Dec 17, 2009 1:36 am

Post by gayathrisivakumar »

Thanks all for your help.

Craig I am using Datastage 8.7. I couldn't read your whole post as I am yet to renew my membership.

I rebuilt the whole job again based on your suggestions and now it is working. :)

Since I was busy I dint get time to debug my existing code.
Post Reply