DSXchange: DataStage and IBM Websphere Data Integration Forum
Author Message
perspicax



Group memberships:
Premium Members

Joined: 07 Dec 2017
Posts: 21
Location: USA
Points: 202

Posted: Mon Aug 20, 2018 5:14 pm

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
Is pipelining possible when the Aggregator stage is used? For instance, I have a job using an Aggregator that sums ColD, grouping by three input columns (ColA, ColB, ColC). Before the Aggregator starts emitting rows, it waits for all the input rows to be read.

Is there a way to make the Aggregator stage produce output before all the input data has been read?
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42944
Location: Denver, CO
Points: 221471

Posted: Mon Aug 20, 2018 7:02 pm

Sort the data to support the aggregation and then assert in the stage that it is sorted. Be forewarned: it will bust you if you lie. ;)

_________________
-craig

The Old Ones were, the Old Ones are, and the Old Ones shall be. Not in the spaces we know, but between them. They walk serene and primal, undimensioned and to us unseen.
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54514
Location: Sydney, Australia
Points: 295617

Posted: Mon Aug 20, 2018 9:13 pm

What Craig said. The reason this "pipelines" the data is that the Aggregator stage does not have to build a table in memory to accumulate all the results. It can process a single grouping key ...
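
To make the difference concrete, here is a minimal Python sketch of the two strategies. It is not DataStage code and not how the Aggregator stage is implemented; the function names (`hash_aggregate`, `sorted_aggregate`, `logged`) and the sample rows are made up for illustration. The sorted version can emit a group as soon as the grouping key changes, and it fails if the input turns out not to be sorted, which is the "it will bust you" behaviour in spirit.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Hash-style aggregation: every row is read before anything is emitted."""
    totals = {}
    for a, b, c, d in rows:
        totals[(a, b, c)] = totals.get((a, b, c), 0) + d
    # Only after the input is exhausted can results flow downstream.
    for key, total in totals.items():
        yield (*key, total)


def sorted_aggregate(rows):
    """Sort-style aggregation: holds one group at a time, emits on key change."""
    previous = None
    for key, group in groupby(rows, key=itemgetter(0, 1, 2)):
        if previous is not None and key < previous:
            # The caller claimed the input was sorted, but it is not.
            raise ValueError(f"input not sorted on grouping keys at {key}")
        previous = key
        yield (*key, sum(row[3] for row in group))


def logged(rows, log):
    """Hypothetical helper: records each row as it is pulled from the source."""
    for row in rows:
        log.append(row)
        yield row


# Sample rows (ColA, ColB, ColC, ColD), already sorted on the grouping keys.
rows = [("x", 1, 1, 10), ("x", 1, 1, 5), ("y", 2, 2, 7), ("y", 2, 2, 1)]

sorted_log, hash_log = [], []
first_sorted = next(sorted_aggregate(logged(rows, sorted_log)))
first_hashed = next(hash_aggregate(logged(rows, hash_log)))

print(first_sorted, "emitted after reading", len(sorted_log), "rows")
print(first_hashed, "emitted after reading", len(hash_log), "rows")
```

In this sketch the sorted version produces its first result after reading three of the four rows (it has to see the first row of the next group to know the current one is finished), while the hash version reads all four before emitting anything.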

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
perspicax




Posted: Tue Aug 21, 2018 1:58 pm

Thanks, it worked. Earlier, the sort order differed from the order of the grouping key columns. I selected partition type 'Same', though I am not sure what it means. 'Auto' and 'Entire' didn't work. What is the recommended partition type?
chulett


Posted: Tue Aug 21, 2018 6:40 pm

I'm not sure there's a "recommended" one so much as a default one, but others can address that. The partitioning types are something you need to become very familiar with unless you only ever run your jobs on a single node. The moment you open that up to more than one node, how the data is partitioned can make or break things. Simplest case, the job runs longer than it should. Worst case, your output data is wrong or incomplete. Or the job goes boom. ;)

Perhaps this will help.

ray.wurlod


Posted: Tue Aug 21, 2018 9:39 pm

Same is recommended iff the upstream stage:
- is executing in parallel mode
- is executing on the same nodes
- is partitioned using the correct algorithm

In the case of an Aggregator stage I would typi ...
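
As an illustration of why the partitioning algorithm matters for an aggregation, the following Python sketch simulates two nodes. It is a toy model, not DataStage behaviour: `partition`, `aggregate_per_partition`, the byte-sum "hash" and the sample rows are all assumptions made for the example. When rows are dealt out round-robin, the same grouping key lands on both nodes and each node emits its own partial total. When rows are routed by key, each group is complete on exactly one node.

```python
from collections import defaultdict


def partition(rows, nodes, chooser):
    """Distribute rows over `nodes` partitions; `chooser` picks the target."""
    parts = [[] for _ in range(nodes)]
    for index, row in enumerate(rows):
        parts[chooser(index, row) % nodes].append(row)
    return parts


def aggregate_per_partition(parts):
    """Each node sums its own rows by key, unaware of the other nodes."""
    output = []
    for part in parts:
        totals = defaultdict(int)
        for key, value in part:
            totals[key] += value
        output.extend(sorted(totals.items()))
    return output


rows = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("a", 5)]

# Round-robin: the target depends on row position, not on the key.
round_robin = partition(rows, 2, lambda index, row: index)

# Key-based: a deterministic toy hash of the key (sum of its bytes).
by_key = partition(rows, 2, lambda index, row: sum(row[0].encode()))

rr_result = aggregate_per_partition(round_robin)
key_result = aggregate_per_partition(by_key)

print("round-robin:", rr_result)
print("key-based:  ", key_result)
```

With these sample rows the round-robin run yields four output rows (two partial totals per key), while the key-based run yields one row per key. Keeping the upstream partitioning with Same only gives the second outcome if the upstream data was already partitioned on the grouping keys.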

perspicax




Posted: Mon Aug 27, 2018 10:41 am

Yes, the job is executing on 6 nodes, with all the stages running on the same nodes. In this example, the upstream stage is a Transformer stage. The input stage is an Oracle stage where partitioned read is enabled (with the default rowid range).

The grouping is performed on Year (somewhat redundant, because we only process the current year and the target inserts go into the current-year partition only), month, id1, id2 ... id10. That forms the grain of the target table. So Year is the first grouping key, and the sort order matches the grouping key order.

I did get a significant performance improvement with 'Same' partitioning, aggregating 26 M rows in less than 4 min. But when I specified 'Hash' or 'Modulus', I did not see the pipelining and the job took a long time to complete.
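
The thread does not establish why Hash or Modulus lost the pipelining here. One general property of parallel data flows that may be relevant is that repartitioning can interleave rows arriving from several upstream partitions, so data that was sorted within each upstream partition is no longer guaranteed to be sorted within each downstream partition. The Python sketch below simulates that effect only; `repartition`, the alternating arrival order and the sample data are assumptions for illustration, not a description of the DataStage engine.

```python
def repartition(upstream_parts, nodes, key_to_node):
    """Route rows to downstream nodes, taking one row from each upstream
    partition in turn (an assumed arrival order for this simulation)."""
    downstream = [[] for _ in range(nodes)]
    iterators = [iter(part) for part in upstream_parts]
    while iterators:
        for iterator in list(iterators):
            try:
                row = next(iterator)
            except StopIteration:
                iterators.remove(iterator)
                continue
            downstream[key_to_node(row[0]) % nodes].append(row)
    return downstream


def is_sorted(part):
    """True if the rows of one partition are in ascending key order."""
    keys = [row[0] for row in part]
    return keys == sorted(keys)


# Two upstream partitions, each sorted on the integer key.
upstream = [
    [(1, 10), (6, 60), (7, 70)],
    [(2, 20), (3, 30), (5, 50)],
]

# Modulus-style routing on the key.
downstream = repartition(upstream, 2, lambda key: key)

for node, part in enumerate(downstream):
    print("node", node, part, "sorted" if is_sorted(part) else "NOT sorted")

# Sorting again after the repartition restores the order on every node.
resorted = [sorted(part) for part in downstream]
```

In this simulation the node receiving the odd keys ends up with keys in the order 1, 3, 7, 5, so a sort-based aggregation on that node would need the data sorted again after the repartition.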
ray.wurlod


Posted: Mon Aug 27, 2018 7:58 pm

How is the upstream Transformer stage partitioned?

perspicax




Posted: Mon Aug 27, 2018 10:15 pm

The Transformer is set to Auto. So if 'Hash' or 'Modulus' is selected in the Aggregator stage, should the same partition type be set in the Transformer? I will change the settings and report back.

Thanks
ray.wurlod


Posted: Tue Aug 28, 2018 2:09 am

That would depend on what partitioning is used upstream of the Transformer stage. Your answer suggests an unfamiliarity with what partitioning does and achieves. Is there any explicit partitioning u ...

chulett


Posted: Tue Aug 28, 2018 6:26 am

chulett wrote:
Perhaps this will help.

Did you click on the included link? Partitioning is core, key knowledge required for success using the tool. IMHO.

All times are GMT - 6 Hours