grouping

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dspxlearn
Premium Member
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

grouping

Post by dspxlearn »

Hi all,

I have a requirement where, i have some groups coming up from the source and i have to capture the number of groups and pass the count further. I believe we cannot do this in aggregator because it will count the no of records for each group.
Ex:

Col1:- 10,10,10,10
col1:- 20,20
Col1:- 30,30,30

I want the value as 3 as there are 3 groups.But when we use aggregator by using 'group by column' as col1 and using 'type of calucation as = row count' i will get 4,2,3.

So, how do i achieve this.
Thanks and Regards!!
dspxlearn
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The sample data you have listed is somewhat unclear - you have 3 rows with differing number of columns. A "group" in this context is normally a set of records where one column contains the same value. Is it "Col1:-" in this case? Or have you displayed only values for "Col1"?
Could you explain just a bit in detail what your data values are and what your "groups" are; I think you are not trying to do something overly complex and it can be done in an aggregator stage or even with a transform stage.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Two Aggregator stages - one to form the groups and the next to count them?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Or with the same Aggregator stage, with additional Dummy column with a value '1' and sum it up.
Or with the sort stage with couple of columns defined as Key and use KeyValueChange option to differentiate a group.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
dspxlearn
Premium Member
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

Hi thanks for your replies..

ArndW,
My data is like this..

Col1
10
10
10
20
20
30
30
30
30

I should get the total no. of groups for this column. I mean in this case i should get 3.

ray.wurlod,
Do you mean to say that, i should do groupwise counting in first one and total groups counting in the second aggregator?? I this case in the second aggregator if i use the 'grouping column' as col1 and use 'type of calucation' as 'row count', the job is aborting as the single aggregator will not do grouping and calucation on the same column.

kumar_s,
i did the same thing but it needed 2 additional transformers which i want to reduce. So, i looking for other way around. I will try your second suggestion using sort stage...
Thanks and Regards!!
dspxlearn
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If that is all you need then a simple transform stage and not an aggregator is sufficient for you.
Create a stage variable "GroupCount" with a derivation of "IF In.Col1=LastGroup THEN GroupCount += 1 ELSE GroupCount" and an initial value of 0
Creat another stage variable after that called "LastGroup" with a derivation of "In.Col1".

Output the value of GroupCount with each row.
mpouet
Participant
Posts: 34
Joined: Mon Oct 18, 2004 10:23 am
Location: France

Post by mpouet »

Hi,

I think I would sort and "Remove Duplicates" (partitioned) to keep the distinct group values, and then count (sequential) the number of groups with @inrownum or an aggregator...

Matthieu.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I just noticed that I posted a SERVER response in the EE forum. Ignore my previous post!

Matthieu's response is a good approach in a PX job.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

You can accomplish with single sort stage with keyValueChange option. Assigning the col1 as key. (With the same as hash partitioned)
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
mpouet
Participant
Posts: 34
Joined: Mon Oct 18, 2004 10:23 am
Location: France

Post by mpouet »

Hi,

If your are working with a "fixed width column" flat file you can also use this unix command line :
cut -cx-y file | sort -u | wc -l > tempfile

x : starting character
y : ending character
file : your file's name (better with the path)
tempfile : file to store the number of groups (better with the path)

This is not a Datastage solution, but it works well. It can be a good solution if you don't have any transformation.

Matthieu.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Delimeted file shouldnt stop you from this aproach.
cut -d 'x' n file | sort -u | wc -l > tempfile.
Where x is the delimeter used. And n is the required nth column.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
mpouet
Participant
Posts: 34
Joined: Mon Oct 18, 2004 10:23 am
Location: France

Post by mpouet »

Right !

Matthieu
Post Reply