In my job I have to sum data in 120 columns, grouping by 10 other columns. The total number of rows to aggregate is about 5-6 million. All rows are sorted and partitioned in a Sort stage before aggregation. But aggregation performance is still very low at the Aggregator stage: only 2000-3000 rows/sec.
I tried 5-node and 8-node configuration files, but this didn't significantly affect performance. Strangely, we see only 20-30% CPU usage while this job is running.
Without the Aggregator stage the job performs excellently: reading from datasets, sorting, joining, filtering, output to file, etc. are all very fast.
Are there any project parameters or other settings that would improve aggregation performance?
How to improve aggregator performance?
Ignore rows/sec as a metric of performance, because it is meaningless on the output of an Aggregator stage; the clock is running during all of the wait time while rows are coming in.
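To make that concrete, here is a small arithmetic sketch. It assumes the displayed rate is simply output rows divided by the time elapsed since the stage started; the function names and all the numbers are hypothetical, chosen only for illustration, and are not measurements from the job in question.

```python
def apparent_rows_per_sec(rows_out, wait_seconds, emit_seconds):
    """Rate as a monitor would show it if the clock also runs while the stage waits for input."""
    return rows_out / (wait_seconds + emit_seconds)


def emit_rows_per_sec(rows_out, emit_seconds):
    """Rate measured only over the time the stage is actually writing output."""
    return rows_out / emit_seconds


# Hypothetical figures: 100,000 output rows, 45 s waiting on upstream, 5 s emitting.
apparent = apparent_rows_per_sec(100_000, wait_seconds=45.0, emit_seconds=5.0)
actual = emit_rows_per_sec(100_000, emit_seconds=5.0)

print(apparent)  # 2000.0
print(actual)    # 20000.0
```

Under these made-up numbers the stage looks ten times slower than it is, which is why total elapsed time for the job is the more useful thing to compare.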
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Please don't use SMS/text style words as this is not a mobile phone.
sridinesh2009 wrote:
in aggregator stage use this option... ur performance may increase
METHOD=Hash
As the OP said, he has 5-6 million records to aggregate, and the Hash method is only meant to be used when the number of records is small. If you set the method to Hash, it starts throwing warnings when the number of records reaches the 16K mark. There are also other implications when the hash table grows beyond a certain size.
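For anyone unsure why the two methods behave differently, here is a rough Python sketch of the general idea. It is only a conceptual illustration, not how the Aggregator stage is actually implemented; the function names and sample rows are made up. The hash approach keeps one accumulator per distinct key in memory until the input ends, while the sort approach relies on the input already being sorted on the key and holds only the current group.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows, key, value):
    """Hash-style: one running total per distinct key stays in memory until the end."""
    totals = {}
    for row in rows:
        k = row[key]
        totals[k] = totals.get(k, 0) + row[value]
    return totals


def sort_aggregate(sorted_rows, key, value):
    """Sort-style: input must already be sorted on `key`; one group is summed at a time."""
    for k, group in groupby(sorted_rows, key=itemgetter(key)):
        yield k, sum(r[value] for r in group)


# Hypothetical sample data, already sorted on "region".
rows = [
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 5},
    {"region": "west", "amount": 7},
]

hash_result = hash_aggregate(rows, "region", "amount")
sort_result = dict(sort_aggregate(rows, "region", "amount"))
```

Both give the same totals here; the difference is that memory use in the first function grows with the number of distinct groups, whereas the second can emit each group as soon as the key changes.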
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.