Parallel extender file configuration

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ashok
Participant
Posts: 43
Joined: Tue Jun 22, 2004 3:04 pm

Parallel extender file configuration

Post by ashok »

Hi,
I need help regarding an issue placed in front of me, the company is using ds/390 on mainframes, they want to move to server/Clint approach, by using data stage enterprise edition, they have to transfer more than 200 million records in each job, previously their data is demoralized and now they want to normalize this data and load in to db2 UDB, to use parallel extenders how many nodes they need to have in configure file to handle this kind of data with minimum number of stages in each job.
I would appreciate the pros and cons of the above, and some guidance on which approach is better: pipeline parallelism, partition parallelism, or both.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

One configuration node is capable of handling 200 million rows. You didn't specify what the time window requirement was, but even with one configuration node (or, indeed, one server job), you should be able to get through this amount of data in a single-digit number of hours. The actual time would, of course, depend on hardware as well as DataStage design; I am also assuming from your description that there is minimal transformation to be performed.
You can certainly create benchmark jobs to give you some feel for what can be done.
Using partition parallelism, you split the stream of data into N streams, where N is the number of configuration nodes. PX may re-partition data on the fly, if the partitioning of a downstream stage is different from that of the upstream stage. This can be particularly useful when loading DB2, because knowledge of how DB2's partitioning works exists within the DataStage engineering group. This is why you can choose DB2 as a partitioning algorithm.
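For reference, the number of nodes is set in the parallel configuration file pointed to by the APT_CONFIG_FILE environment variable. A minimal sketch of a two-node configuration is shown below; the host name and resource paths are placeholder values, not details from this thread, and would need to match your own system.

```
{
    node "node1" {
        fastname "etlhost"          /* placeholder server host name */
        pools ""                    /* default node pool */
        resource disk "/ds/data1" {pools ""}
        resource scratchdisk "/ds/scratch1" {pools ""}
    }
    node "node2" {
        fastname "etlhost"
        pools ""
        resource disk "/ds/data2" {pools ""}
        resource scratchdisk "/ds/scratch2" {pools ""}
    }
}
```

Adding further node entries increases the degree of partition parallelism without any change to the job design, which is why benchmarking with different configuration files is a reasonable way to size the system.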
Pipeline parallelism (row buffering) will not help in jobs that have no, or a minimal number of, active stages.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.