Grouping in transformer

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

manoj_23sakthi
Participant
Posts: 47
Joined: Tue Feb 23, 2010 12:16 am
Location: CHENNAI


Post by manoj_23sakthi »

Hi,

Could you please help me solve this?

Within a group (rows with the same Key), if the Value changes, I have to pass the whole group to Link 2. If the Value doesn't change, pass the group to Link 1.

Input:
-------
Key | Value
A|01
A|01
A|01
B|01
B|01
B|02
C|01
D|02
D|03

Output:
-------

Link 1:
A|01
A|01
A|01
C|01

Link 2:
B|01
B|01
B|02
D|02
D|03

I tried to achieve this with Transformer looping using the last row in the group, but I am not able to get it to work.
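For reference, the idea behind a single-pass attempt can be sketched outside DataStage. The Python below only illustrates the logic and is not Transformer syntax: it assumes the input is already sorted (or at least grouped) by Key, buffers each group, and decides the link only once the group's last row has been read. The function name `route_by_buffering` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def route_by_buffering(rows):
    """Route whole groups in one pass over rows grouped by Key.

    Each group is cached; after its last row has been read, the group goes
    to link1 if every Value is the same, otherwise to link2.
    """
    link1, link2 = [], []
    for _key, group in groupby(rows, key=itemgetter(0)):
        buffered = list(group)                     # cache the whole group
        values = {value for _, value in buffered}  # distinct Values in the group
        target = link1 if len(values) == 1 else link2
        target.extend(buffered)                    # flush the group to one link
    return link1, link2


rows = [("A", "01"), ("A", "01"), ("A", "01"),
        ("B", "01"), ("B", "01"), ("B", "02"),
        ("C", "01"), ("D", "02"), ("D", "03")]
link1, link2 = route_by_buffering(rows)
print(link1)  # [('A', '01'), ('A', '01'), ('A', '01'), ('C', '01')]
print(link2)  # [('B', '01'), ('B', '01'), ('B', '02'), ('D', '02'), ('D', '03')]
```

The awkward part is the buffering: no row of a group can be written until the whole group has been seen, and `groupby` only merges adjacent rows, so unsorted input would split a Key into several groups.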
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Since you need to process the entire group before you know what link any of them should go down, I'd suggest a "fork join" design where you evaluate the groups for the number of distinct values and then use that as a lookup for the main data flow. If that Key has 1 distinct value, then route all in that group to Link 1. More than 1 distinct value? Link 2.
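A minimal sketch of that fork join, written in Python only to show the data flow. The function name `fork_join_route` and the (Key, Value) tuple layout are illustrative assumptions, not DataStage objects. One branch builds a lookup holding the number of distinct Values per Key, and the main branch looks up each row's Key and routes on that count.

```python
from collections import defaultdict


def fork_join_route(rows):
    # Lookup branch: collect the distinct Values seen for each Key,
    # then keep one row per Key with its distinct-Value count.
    distinct = defaultdict(set)
    for key, value in rows:
        distinct[key].add(value)
    lookup = {key: len(values) for key, values in distinct.items()}

    # Main branch: each row picks up its Key's count and is routed on it.
    link1, link2 = [], []
    for key, value in rows:
        if lookup[key] == 1:
            link1.append((key, value))  # one distinct Value in the group
        else:
            link2.append((key, value))  # the Value changes within the group
    return lookup, link1, link2


rows = [("A", "01"), ("A", "01"), ("A", "01"),
        ("B", "01"), ("B", "01"), ("B", "02"),
        ("C", "01"), ("D", "02"), ("D", "03")]
lookup, link1, link2 = fork_join_route(rows)
print(lookup)  # {'A': 1, 'B': 2, 'C': 1, 'D': 2}
```

Because the count is keyed on Key, the main flow does not need to be read in group order here; in a parallel job the same shape is typically built with a Copy stage feeding an aggregation branch that is joined back to the main link on Key.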
-craig

"You can never have too many knives" -- Logan Nine Fingers
manoj_23sakthi
Participant
Posts: 47
Joined: Tue Feb 23, 2010 12:16 am
Location: CHENNAI

Post by manoj_23sakthi »

Yes.
Even if I have more than one distinct value, I have to send it to Link 1.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, according to your text and samples more than one distinct value in the "Value" column for a Key makes it go to Link 2. So in your example, A has three values but only one distinct value -> Link 1. B has three values but two distinct values -> Link 2.

The fork join will handle all that for you. The lookup should have one row per Key value with a count of distinct Values that Key contains.
-craig

manoj_23sakthi
Participant
Posts: 47
Joined: Tue Feb 23, 2010 12:16 am
Location: CHENNAI

Post by manoj_23sakthi »

Hi,
Whether the rows are distinct or duplicates, if a Key has the same Value throughout its group, I have to send that group to Link 1; otherwise, send the group to Link 2.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I think there's some confusion around what the word "distinct" means. Regardless, as noted twice now, a fork join design will get you your desired outcome.
-craig

manoj_23sakthi
Participant
Posts: 47
Joined: Tue Feb 23, 2010 12:16 am
Location: CHENNAI

Post by manoj_23sakthi »

By distinct I mean unique within the group.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use a fork join design, as already indicated. Search DSXchange for details about how to do fork join designs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

From the source, add a Copy stage with two outputs, output1 and output2.
From output2 --> Remove Duplicates stage (hash partitioned and sorted on Key, sorted only on Value).
Then from the Remove Duplicates stage --> Aggregator stage (partitioning: Same) --> group on Key --> count rows.
Then join output1 from the Copy stage to the output of the Aggregator on Key.
Then filter with constraint count = 1 to Link 1 (same values) and count > 1 to Link 2 (different values).
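The stages above can be traced with a small Python sketch in which each function stands in for one stage. The names `copy_stage`, `remove_duplicates_stage`, `aggregator_stage`, `join_stage` and `filter_stage` are hypothetical, not DataStage APIs, and the rows are assumed to be (Key, Value) pairs. Sorting inside `remove_duplicates_stage` stands in for the hash partition and sort on that link.

```python
from collections import Counter


def copy_stage(rows):
    # Copy stage: the same rows go down two outputs.
    return list(rows), list(rows)


def remove_duplicates_stage(rows):
    # Remove Duplicates on (Key, Value): after sorting, duplicates are
    # adjacent, so keep a row only if it differs from the previous one.
    deduped = []
    for row in sorted(rows):
        if not deduped or deduped[-1] != row:
            deduped.append(row)
    return deduped


def aggregator_stage(rows):
    # Aggregator grouped on Key, counting rows. After de-duplication
    # this is the number of distinct Values per Key.
    return Counter(key for key, _ in rows)


def join_stage(rows, counts):
    # Inner join on Key: each original row picks up its group's count.
    return [(key, value, counts[key]) for key, value in rows if key in counts]


def filter_stage(joined):
    # Filter: count = 1 -> Link 1, count > 1 -> Link 2.
    link1 = [(k, v) for k, v, n in joined if n == 1]
    link2 = [(k, v) for k, v, n in joined if n > 1]
    return link1, link2


rows = [("A", "01"), ("A", "01"), ("A", "01"),
        ("B", "01"), ("B", "01"), ("B", "02"),
        ("C", "01"), ("D", "02"), ("D", "03")]
output1, output2 = copy_stage(rows)
counts = aggregator_stage(remove_duplicates_stage(output2))
link1, link2 = filter_stage(join_stage(output1, counts))
```

De-duplicating on (Key, Value) before counting is what turns a plain row count into a distinct-Value count; counting rows on the raw data would give A a count of 3 and wrongly send it to Link 2.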