Pipeline Parallelism

bks_prasad
Participant
Posts: 10
Joined: Thu Mar 18, 2004 12:08 am

Pipeline Parallelism

Post by bks_prasad »

Hi All,
How can pipeline parallelism be defined explicitly? Is there any way to control pipeline parallelism explicitly, as there is for partition parallelism?


Thanks in Advance
Regards
Prasad
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A simple explanation of pipeline parallelism is the ability for a downstream stage to begin processing a row as soon as an upstream stage has finished processing that row (rather than processing one row completely through the job before beginning the next row).

Pipeline parallelism is managed in parallel jobs automatically.

In server jobs you have the choice of employing or not employing row buffering, or of using an IPC (inter process communication) stage, or using a passive stage type. In each case, the idea is to introduce a process boundary, so that multiple processes can process the rows, and to provide some kind of buffering mechanism so that the rows can be passed between the processes.
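As a rough illustration of that idea outside DataStage, here is a minimal Python sketch in which two stages run concurrently and pass rows through a bounded buffer. Everything in it is an assumption made for illustration: the names `upstream`, `downstream` and `run_pipeline` are hypothetical, the transformations are arbitrary, and threads stand in for the separate processes that a server job would use.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-data marker


def upstream(rows, out_q):
    """First stage: transform each row and hand it on immediately."""
    for row in rows:
        out_q.put(row * 10)  # blocks if the buffer is full
    out_q.put(SENTINEL)


def downstream(in_q, results):
    """Second stage: start work on a row as soon as it arrives."""
    while True:
        row = in_q.get()  # blocks until the upstream stage supplies a row
        if row is SENTINEL:
            break
        results.append(row + 1)


def run_pipeline(rows, buffer_size=4):
    """Run both stages concurrently with a bounded buffer between them."""
    buf = queue.Queue(maxsize=buffer_size)
    results = []
    producer = threading.Thread(target=upstream, args=(rows, buf))
    consumer = threading.Thread(target=downstream, args=(buf, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(5)))  # [1, 11, 21, 31, 41]
```

The bounded queue plays the role of the row buffer: the downstream stage does not wait for the upstream stage to finish all rows, only for the next row to become available.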

The best place to look is Chapter 2 of the Server Job Developer's Guide, where these concepts are discussed in detail. A brief summary of what pipeline and partition parallelism are is in Chapter 2 of the Parallel Job Developer's Guide.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bks_prasad
Participant
Posts: 10
Joined: Thu Mar 18, 2004 12:08 am

Re: Pipeline Parallelism

Post by bks_prasad »

Hi Ray,
Thank you very much for your response; I now have a clear idea
of pipeline parallelism.
But I have a question about partition parallelism.
Suppose I choose the "Round Robin" partition method and
set the node pool and resource constraints to a
specific pool, say "pool1", which contains one processing node.
In this scenario, how many partitions will the data be split into?
I am using the Oracle Enterprise stage.

Thanks in Advance

Prasad
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That's easy. If the pool contains only one processing node, there is only one partition, so no partitioning of the data will take place. The round robin algorithm will place the first row onto node 1 (of 1), then the second row onto node 1 (of 1), and so on.
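To make the arithmetic concrete, here is a small Python sketch of round robin distribution. It is only an illustration of the general algorithm, not DataStage's implementation, and the function name `round_robin_partition` is hypothetical.

```python
def round_robin_partition(rows, node_count):
    """Deal rows out to node_count partitions in turn (illustrative only)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    partitions = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        # Row 0 goes to partition 0, row 1 to partition 1, and so on,
        # wrapping around after the last partition.
        partitions[index % node_count].append(row)
    return partitions


if __name__ == "__main__":
    rows = ["r1", "r2", "r3", "r4", "r5", "r6"]
    print(round_robin_partition(rows, 1))  # [['r1', 'r2', 'r3', 'r4', 'r5', 'r6']]
    print(round_robin_partition(rows, 3))  # [['r1', 'r4'], ['r2', 'r5'], ['r3', 'r6']]
```

With a node count of 1, every row index modulo 1 is 0, so all rows land in the single partition, which matches the "node 1 (of 1)" behaviour described above.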