What is Parallel Extender?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.



What is Parallel Extender?

Post by girishoak »

Hi,

I am new to DataStage. I have come across "Parallel Extender" many times in this forum. What is Parallel Extender? What is its use, and what are its benefits?

Thanks
Girish Oak

Post by ray.wurlod »

DataStage is three ETL products in one.

Server jobs (with which you are becoming involved) generate BASIC and execute in the DataStage Engine environment.

Parallel Extender is the result of Ascential's acquisition of Torrent Systems and their Orchestrate technology. It allows really huge volumes of data to be processed by cloning execution in parallel across N "processing nodes", which can be CPUs in a symmetric multiprocessor (SMP) system, or even a group of machines in a massively parallel processing (MPP) environment.
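The "cloning execution across N processing nodes" idea can be sketched in plain Python (an illustrative example, not DataStage/Orchestrate code): the same transform logic runs once per "node", each copy on its own partition of the data.

```python
# Conceptual sketch of partitioned parallelism (illustrative Python,
# not DataStage/Orchestrate syntax): one copy of the transform runs
# per "node", each on its own slice of the rows.
from concurrent.futures import ThreadPoolExecutor

def transform(row):
    # Stand-in for whatever one stage does to one record.
    return row * 2

def run_partition(partition):
    return [transform(row) for row in partition]

data = list(range(1000))
nodes = 4  # analogous to the node count in a PX configuration file
partitions = [data[i::nodes] for i in range(nodes)]  # round-robin partitioning

with ThreadPoolExecutor(max_workers=nodes) as pool:
    results = list(pool.map(run_partition, partitions))

total = sum(len(part) for part in results)
print(total)  # all 1000 rows processed, 250 per node
```

In PX the partitioning method (round-robin, hash, range, etc.) is chosen per link; the sketch above uses round-robin purely for illustration.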

The mainframe edition of DataStage generates COBOL source code, and job control language (JCL) scripts for compiling and running that code on a mainframe.

Check out Ascential's web site (http://www.ascentialsoftware.com) for longer descriptions, benchmark results, and so on.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Re: What is Parallel Extender?

Post by Teej »

girishoak wrote: What is Parallel Extender? What is its use and benefit?
It is the answer to the question:

"What is something that is pretty cool, if still a bit raw, and would make my resume so attractive to others?"

Or perhaps using Steve Jobs' quote:

"The World's Fastest Personal ETL Tool"

Seriously, Ray does a nice job covering PX. One thing to note: although Ascential is bringing Server and PX together over multiple releases, they still differ significantly in several core areas. Instead of using the Transformer stage for practically everything, the work is broken down into multiple stages in PX, especially in 7.0. Instead of doing a lookup inside a Transformer, you use the Lookup stage. Merge and Join are active stages.
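The Lookup stage mentioned above can be sketched conceptually in Python (an illustration of the semantics, not PX syntax; the table and field names are made up): each stream row is enriched from a reference table held in memory.

```python
# Conceptual sketch of a Lookup stage (illustrative Python, not PX syntax):
# enrich each stream row from a small in-memory reference table.
reference = {101: "Widgets", 102: "Gadgets"}  # hypothetical key -> description

stream = [
    {"order": 1, "product_id": 101},
    {"order": 2, "product_id": 102},
    {"order": 3, "product_id": 999},  # no match in the reference table
]

enriched = [
    {**row, "description": reference.get(row["product_id"], "UNKNOWN")}
    for row in stream
]
print(enriched[0]["description"])  # Widgets
```

A Join, by contrast, matches two full input streams on a key (typically after sorting and partitioning on that key) rather than probing a small reference table per row.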

The hardest thing about PX is the optimization of your job. Instead of doing whatever you want, you will be obsessed with figuring out how to squeeze another 20% of performance out of your job. This is the reason why you, and everyone else, should be looking forward to 7.0.1 -- it's our fault. Really. ;-) We discovered a serious performance problem with large datasets on Lookup stages, so we reported it to Ascential. After a lot of mangling and tumbling across the ground beating each other up, we were both able to identify a few things that will be improved in 7.0.1, making the Lookup stage much faster if you use Varchar() fields and if you have large datasets, among other things.

Performance-wise, the boost is dramatic. Case in point (an actual job writing from an Oracle table straight to a fixed-length flat file for a feed program):

Server: Oracle [bulk load] -> Transform -> flat file: averages about 3,000 rows per second; a 25-30 minute run.

Parallel: Oracle -> Transform -> flat file: averages about 45,000 rows per second; a 1 1/2 - 2 1/2 minute run. I could probably get this part a lot faster if Oracle weren't so busy at that time and we had faster SAN disks. I know a pure Oracle -> Peek stage run will produce over 300,000 rows per second. And that is on 6.0.1. I'm still waiting for 7.0.1 to see how much better the performance will be.
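A quick back-of-the-envelope check shows those two figures describe roughly the same data volume (rounded numbers taken from the post, not a benchmark):

```python
# Sanity check on the quoted throughput figures.
parallel_rows = 45_000 * 90                   # ~45,000 rows/s for ~1.5 minutes
server_minutes = parallel_rows / 3_000 / 60   # same volume at ~3,000 rows/s
print(parallel_rows, round(server_minutes, 1))  # 4050000 22.5
```

About four million rows either way: roughly 22 minutes at server-job speed, in line with the quoted 25-30 minute runs.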

It's fast. It's tricky to keep fast. It requires people who really understand their system resources and are willing to test different solutions. It requires an even more careful design process to minimize repeated processing and repeated use of the database (especially update tasks!). It requires a good system administrator to optimize the scratch space and configuration files for that extra 2-5% boost in performance.
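The configuration file mentioned above is the PX node map that drives the parallelism. A minimal two-node example looks roughly like this (the hostname and paths are placeholders; real files are tuned per machine):

```
{
  node "node1" {
    fastname "etl-server"
    pools ""
    resource disk "/data/datasets" { pools "" }
    resource scratchdisk "/data/scratch" { pools "" }
  }
  node "node2" {
    fastname "etl-server"
    pools ""
    resource disk "/data/datasets" { pools "" }
    resource scratchdisk "/data/scratch" { pools "" }
  }
}
```

Adding nodes increases parallelism without recompiling the job, which is why the scratch-disk layout in this file matters so much for performance.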

It also requires patience from upper management, who may have been oversold the benefits without knowing anything about the shortcomings (i.e. crappy workers = crappy design = crappy jobs = crappy performance, no matter what PX promises).

As Ken Bland said in a different post -- it's a tool that does not replace due diligence in your design. In fact, it increases the relevance of due diligence in your design.

-T.J.

P.S. Apologies for the grammar and spelling that may have gotten by me on this post.
Developer of DataStage Parallel Engine (Orchestrate).