grouping
Moderators: chulett, rschirm, roy
grouping
Hi all,
I have a requirement where, i have some groups coming up from the source and i have to capture the number of groups and pass the count further. I believe we cannot do this in aggregator because it will count the no of records for each group.
Ex:
Col1:- 10,10,10,10
col1:- 20,20
Col1:- 30,30,30
I want the value as 3 as there are 3 groups.But when we use aggregator by using 'group by column' as col1 and using 'type of calucation as = row count' i will get 4,2,3.
So, how do i achieve this.
I have a requirement where, i have some groups coming up from the source and i have to capture the number of groups and pass the count further. I believe we cannot do this in aggregator because it will count the no of records for each group.
Ex:
Col1:- 10,10,10,10
col1:- 20,20
Col1:- 30,30,30
I want the value as 3 as there are 3 groups.But when we use aggregator by using 'group by column' as col1 and using 'type of calucation as = row count' i will get 4,2,3.
So, how do i achieve this.
Thanks and Regards!!
dspxlearn
dspxlearn
The sample data you have listed is somewhat unclear - you have 3 rows with differing number of columns. A "group" in this context is normally a set of records where one column contains the same value. Is it "Col1:-" in this case? Or have you displayed only values for "Col1"?
Could you explain just a bit in detail what your data values are and what your "groups" are; I think you are not trying to do something overly complex and it can be done in an aggregator stage or even with a transform stage.
Could you explain just a bit in detail what your data values are and what your "groups" are; I think you are not trying to do something overly complex and it can be done in an aggregator stage or even with a transform stage.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Or with the same Aggregator stage, with additional Dummy column with a value '1' and sum it up.
Or with the sort stage with couple of columns defined as Key and use KeyValueChange option to differentiate a group.
Or with the sort stage with couple of columns defined as Key and use KeyValueChange option to differentiate a group.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Hi thanks for your replies..
ArndW,
My data is like this..
Col1
10
10
10
20
20
30
30
30
30
I should get the total no. of groups for this column. I mean in this case i should get 3.
ray.wurlod,
Do you mean to say that, i should do groupwise counting in first one and total groups counting in the second aggregator?? I this case in the second aggregator if i use the 'grouping column' as col1 and use 'type of calucation' as 'row count', the job is aborting as the single aggregator will not do grouping and calucation on the same column.
kumar_s,
i did the same thing but it needed 2 additional transformers which i want to reduce. So, i looking for other way around. I will try your second suggestion using sort stage...
ArndW,
My data is like this..
Col1
10
10
10
20
20
30
30
30
30
I should get the total no. of groups for this column. I mean in this case i should get 3.
ray.wurlod,
Do you mean to say that, i should do groupwise counting in first one and total groups counting in the second aggregator?? I this case in the second aggregator if i use the 'grouping column' as col1 and use 'type of calucation' as 'row count', the job is aborting as the single aggregator will not do grouping and calucation on the same column.
kumar_s,
i did the same thing but it needed 2 additional transformers which i want to reduce. So, i looking for other way around. I will try your second suggestion using sort stage...
Thanks and Regards!!
dspxlearn
dspxlearn
If that is all you need then a simple transform stage and not an aggregator is sufficient for you.
Create a stage variable "GroupCount" with a derivation of "IF In.Col1=LastGroup THEN GroupCount += 1 ELSE GroupCount" and an initial value of 0
Creat another stage variable after that called "LastGroup" with a derivation of "In.Col1".
Output the value of GroupCount with each row.
Create a stage variable "GroupCount" with a derivation of "IF In.Col1=LastGroup THEN GroupCount += 1 ELSE GroupCount" and an initial value of 0
Creat another stage variable after that called "LastGroup" with a derivation of "In.Col1".
Output the value of GroupCount with each row.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
I just noticed that I posted a SERVER response in the EE forum. Ignore my previous post!
Matthieu's response is a good approach in a PX job.
Matthieu's response is a good approach in a PX job.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
Hi,
If your are working with a "fixed width column" flat file you can also use this unix command line :
cut -cx-y file | sort -u | wc -l > tempfile
x : starting character
y : ending character
file : your file's name (better with the path)
tempfile : file to store the number of groups (better with the path)
This is not a Datastage solution, but it works well. It can be a good solution if you don't have any transformation.
Matthieu.
If your are working with a "fixed width column" flat file you can also use this unix command line :
cut -cx-y file | sort -u | wc -l > tempfile
x : starting character
y : ending character
file : your file's name (better with the path)
tempfile : file to store the number of groups (better with the path)
This is not a Datastage solution, but it works well. It can be a good solution if you don't have any transformation.
Matthieu.