Parallel extender file configuration

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ashok
Participant
Posts: 43
Joined: Tue Jun 22, 2004 3:04 pm

Parallel extender file configuration

Post by ashok »

Hi,
I need help regarding an issue placed in front of me, the company is using ds/390 on mainframes, they want to move to server/Clint approach, by using data stage enterprise edition, they have to transfer more than 200 million records in each job, previously their data is demoralized and now they want to normalize this data and load in to db2 UDB, to use parallel extenders how many nodes they need to have in configure file to handle this kind of data with minimum number of stages in each job.
I would appreciate the pros and cons of the above, and some guidance on which approach is better: pipeline parallelism, partition parallelism, or both.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

One configuration node is capable of handling 200 million rows. You didn't specify what the time window requirement was, but even with one configuration node (or, indeed, one server job), you should be able to get through this amount of data in a single-digit number of hours. The actual time would, of course, depend on hardware as well as DataStage design; I am also assuming from your description that there is minimal transformation to be performed.
You can certainly create benchmark jobs to give you some feel for what can be done.
Using partition parallelism, you split the stream of data into N streams, where N is the number of configuration nodes. PX may re-partition data on the fly, if the partitioning of a downstream stage is different from that of the upstream stage. This can be particularly useful when loading DB2, because knowledge of how DB2's partitioning works exists within the DataStage engineering group. This is why you can choose DB2 as a partitioning algorithm.
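For reference, the number of nodes is set in the parallel configuration file pointed to by the APT_CONFIG_FILE environment variable. A minimal sketch of a two-node configuration is shown below; the host name and resource paths are placeholder values, not details from this thread, and would need to match your own system.

```
{
    node "node1" {
        fastname "etlhost"          /* placeholder server host name */
        pools ""                    /* default node pool */
        resource disk "/ds/data1" {pools ""}
        resource scratchdisk "/ds/scratch1" {pools ""}
    }
    node "node2" {
        fastname "etlhost"
        pools ""
        resource disk "/ds/data2" {pools ""}
        resource scratchdisk "/ds/scratch2" {pools ""}
    }
}
```

Adding further node entries increases the degree of partition parallelism without any change to the job design, which is why benchmarking with different configuration files is a reasonable way to size the system.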
Pipeline parallelism (row buffering) will not help in jobs that have no, or a minimal number of, active stages.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.