Parallel Extender usage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

sathyanveshi
Participant
Posts: 66
Joined: Tue Dec 07, 2004 12:48 pm

Parallel Extender usage

Post by sathyanveshi »

Hi,

I need to load an Oracle table of 20 million rows into another Oracle table. My server is a 4-CPU machine. How would using Parallel Extender (Enterprise Edition) be advantageous? I need to understand how the rows are processed so that I can make the case for using Enterprise Edition.

Cheers,
Mohan
nkumar_home
Participant
Posts: 19
Joined: Fri Apr 02, 2004 10:13 am

Post by nkumar_home »

PX is very fast, but how fast depends on how many processors your PX config file is set to use. It tends to take over the majority of CPU utilization on those processors. If you are running PX with just 4 processors, design your batch so that relatively few other jobs run while the PX jobs are running. Have you tried partitioned/parallel inserts directly in SQL? You can put the SQL in the "before SQL" portion of the DataStage job.
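A quick illustration of the config file point: the degree of parallelism comes from the number of logical nodes defined in the configuration file pointed to by $APT_CONFIG_FILE. A minimal sketch of a 4-node configuration on a single SMP host might look something like this (the fastname and resource paths are placeholders, not recommendations):

    {
        node "node0"
        {
            fastname "etl_host"
            pools ""
            resource disk "/data/datasets/node0" {pools ""}
            resource scratchdisk "/scratch/node0" {pools ""}
        }
        node "node1"
        {
            fastname "etl_host"
            pools ""
            resource disk "/data/datasets/node1" {pools ""}
            resource scratchdisk "/scratch/node1" {pools ""}
        }
    }

node2 and node3 would follow the same pattern with their own disk and scratchdisk directories. Broadly, the node count sets the default degree of data partitioning, so on an SMP box it is the main knob for how many CPUs the engine will try to keep busy.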

We have found PX very useful for insert/update jobs involving far more than 20 million rows. We have about 20 PX jobs out of 300 DataStage jobs in total, but we have the luxury of an HP Superdome with 24 processors in production.
sathyanveshi
Participant
Posts: 66
Joined: Tue Dec 07, 2004 12:48 pm

Re: Parallel Extender usage

Post by sathyanveshi »

Hi,

Thanks for the response. I need to know the approximate time taken to load, say, 10 GB of data using Parallel Extender with 8 CPUs. I also need to know the technical details of how Parallel Extender executes a job: what does it do when it has to run a job that loads 20 million rows of data using 8 CPUs? Can you please answer the above questions?

Cheers,
Mohan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You cannot estimate time without knowing how large the rows are and whether there are other factors influencing throughput, such as network bandwidth, the number of Oracle listeners, parallelism within the query, other locking operations on the target table, and so on.

This is a "how long is a piece of string?" question.

Once you've run it, you'll be able to answer the question, but only for the exact conditions under which that run was performed.

Some technical details can be found in the Parallel Job Developer's Guide. The engine's execution model is covered only in the Orchestrate documentation, which is no longer available.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

It also depends on what you want to do with the data between the read and load.

It also depends on how much memory you have.

It also depends on how fast your disk I/O is.

It also depends on how fast your network I/O is.

PX will use whatever is provided to it. But if we gave you hard numbers, they would either raise your expectations unduly high or set them too low, depending on what you have and what you are willing to invest in.
sathyanveshi
Participant
Posts: 66
Joined: Tue Dec 07, 2004 12:48 pm

Post by sathyanveshi »

Hi,

We are doing a table-to-table load: a direct load from a source table to a target table, with no transformations in between. The target table has no database constraints such as primary key or not null. The only thing is that we are doing a table-to-table load instead of table to sequential file to table. Our production environment is a Sun Solaris 15K server with 8 CPUs and 16 GB of memory, a 548 SAN, and 1 Gbit fibre optic.

Given the above, can you tell me the average time taken to load 5 GB of data?

Have a nice day...

Cheers,
Mohan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No.

There are too many variables, such as row size, that you haven't revealed.

You won't do it in seconds, but it won't take days either.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sathyanveshi
Participant
Posts: 66
Joined: Tue Dec 07, 2004 12:48 pm

Post by sathyanveshi »

Hi Ray and T42,

Let me know the variables/parameters that you need and I shall provide them. I need some approximate estimates.

Cheers,
Mohan
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Also, where are the databases?

To be honest, if both tables are in the same schema on the same server, and that server is separate from the PX server, it would definitely be advantageous to handle the table-to-table load with a simple SQL statement.
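As a rough sketch only (the table names are placeholders, the degree of 8 is purely illustrative, and whether direct-path and parallel DML are appropriate depends on your Oracle setup), such a statement might look like:

    -- parallel DML must be enabled at the session level for a parallel insert
    ALTER SESSION ENABLE PARALLEL DML;

    -- APPEND requests a direct-path insert; the PARALLEL hints ask for a
    -- parallel read of the source and a parallel write of the target
    INSERT /*+ APPEND PARALLEL(tgt, 8) */ INTO target_table tgt
    SELECT /*+ PARALLEL(src, 8) */ *
    FROM   source_table src;

    COMMIT;

Run inside the database, nothing has to cross the network to the ETL server and back.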

Just throw together a job and run the test yourself. Calculate the number of bytes being moved, find out what kind of network and disks you have and their I/O rates, and observe the actual rates using a monitoring tool provided by your operating system (or a third party).
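For the "number of bytes" part, one rough way to get it from the data dictionary (assuming optimizer statistics are reasonably current; the table name is a placeholder) is:

    -- approximate data volume from optimizer statistics
    SELECT table_name,
           num_rows,
           avg_row_len,
           ROUND(num_rows * avg_row_len / 1024 / 1024) AS approx_mb
    FROM   user_tables
    WHERE  table_name = 'SOURCE_TABLE';

As a purely illustrative back-of-envelope calculation: if the 20 million rows averaged 250 bytes each, that is roughly 5 GB; at a sustained 50 MB/s end to end, that is on the order of 100 seconds of pure data movement before any database, network or ETL overhead. The numbers only mean anything once you have measured your own rates.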

Not only will you learn a lot about your system, you will also know whether or not PX is well tuned for your system. We can't do that for you without being contracted.

Sorry, I'm not available.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I'm fully booked through to March.

What you're seeking is, as our answers so far have made clear, a very difficult task requiring expert analysis, and it is not something you would ordinarily expect to get for free. Would someone (some expert) volunteer to tune your database configuration?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.