TOAD vs DataStage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

TOAD vs DataStage

Post by arpitchopra »

Hi everyone,

I am facing a very strange problem and would be extremely grateful if any of you could help me with the solution.

I have a query that pivots the data up, aggregates it (and then merges it back with the same data set) and finally performs a join. The two tables involved contain 30 million (main table) and 2.5 million (join) records respectively. The query takes close to 15 minutes to run in TOAD and returns 17 million records.
However, when I paste the same query into the OCI stage and run the DS job, it takes more than 3 hours to run!

Why does this happen?
And more importantly, WHAT can I do to improve this performance?
3 hours for this query is unacceptable. If this remains the trend, the client will have to wait a month to get his data :(
Arpit Chopra
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Post your job design. Have you done any tests to determine the bottleneck?

Comparing Toad to DataStage is an 'apples and oranges' discussion.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Also, consider that TOAD returned one screen full of records in 15 minutes. I doubt that you scrolled through all 17 million records.

Mike
Last edited by Mike on Fri Nov 12, 2010 8:27 am, edited 1 time in total.
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Post by arpitchopra »

Job design!
I told you already:

Oracle OCI ---> Transformer -----> Oracle OCI

The only task it performs is to load data into the target table using the query in the source.
Arpit Chopra
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Post by arpitchopra »

Oh, to clear that doubt: I ran the query with a "create table" on top of it, so I wasn't just viewing the first 500 records.
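Something along these lines, where the table and column names are placeholders rather than the actual query:

    CREATE TABLE tmp_timing_check AS
    SELECT t.cust_id, SUM(t.amt) AS total_amt   -- stand-in for the real pivot/aggregate/join
    FROM   main_table t
    JOIN   lookup_table l ON l.cust_id = t.cust_id
    GROUP  BY t.cust_id;

That forces Oracle to materialize all 17 million rows rather than just the first screenful.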
Arpit Chopra
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

If that's all you're doing, you're probably better off leaving the data on the database server. Why drag 17 million rows of data across the network just to push them back across the network again?

If your transformer stage is doing transformation work that can't be accomplished on the database server, then traversing the network is your cost of doing those transformations.
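For example, a purely database-side load of this shape (table and column names are placeholders for the real query) never moves the 17 million rows off the server at all:

    INSERT /*+ APPEND */ INTO target_table (cust_id, total_amt)
    SELECT t.cust_id, SUM(t.amt)                -- the real pivot/aggregate/join goes here
    FROM   main_table t
    JOIN   lookup_table l ON l.cust_id = t.cust_id
    GROUP  BY t.cust_id;
    COMMIT;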

Mike
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Post by arpitchopra »

Mike,
That is exactly what I want to do, but through DataStage. This is one of the steps in the entire ETL design being developed using DS. How can I achieve what you mentioned (running the query on the DB itself instead of dragging the data across) using a DS server job?
Arpit Chopra
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

arpitchopra wrote: Job design!
I told you already:

Oracle OCI ---> Transformer -----> Oracle OCI
And where exactly did you already post this information? :?

There are many sides to this equation and how long the select allegedly takes is only a part of it. Add a constraint to your transformer and set it to @FALSE. That will shut off the load aspect and we'll see how long just the select part takes.
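(In other words, set the output link's constraint expression in the Transformer to @FALSE: no rows will reach the target OCI stage, so the run time reflects only the select and fetch side.)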

While we're at it, what settings are you using in both stages? In particular, the Array Size on both ends plus the Transaction Size and Update Action of the target. All of those will affect "performance".
-craig

"You can never have too many knives" -- Logan Nine Fingers
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Post by arpitchopra »

Source
Array Size: 10000
Prefetch Mem Setting: 1024


Target
Array size: 1
Transaction size: 0
Rows per transaction: 0

3 months into DS and I'm looking at these numbers for the 1st time. Pardon my naivety :(

But again, I'm not performing any task in the transformer. Isn't there a feature in DS similar to "Pushdown Automation" in Informatica, with which I could perform all these actions on the DB itself without transferring 17 million rows over the network?
Arpit Chopra
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Your target array size is killing you. Did you try the @FALSE suggestion? I'd still be curious what your 'Update Action' is but in the meantime bump up your target array size to something larger so you are not sending one record at a time across the network to the database. Perhaps 1000 to start with.
arpitchopra wrote: But again, I'm not performing any task in the transformer. Isn't there a feature in DS similar to "Pushdown Automation" in Informatica, with which I could perform all these actions on the DB itself without transferring 17 million rows over the network?
Not in the Server product and not in your version. I've heard PX 8.5 may have that option, not sure if that is the case or not.
-craig

"You can never have too many knives" -- Logan Nine Fingers
HariK
Participant
Posts: 68
Joined: Thu May 17, 2007 1:50 am

Post by HariK »

You can replace the transformer with an IPC stage if there are no derivations involved, or set the inter-process row buffering option in Job Properties.

"Pushdown Automation" --> A workaround would be to create a job that reads some constant from the dual table and loads it to a file. In the before- or after-SQL of the source stage, use the query you used in TOAD. But you would never know how many records were processed.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

IPC is no magic bullet and you'll still end up with a transformer in the job. Best to learn what those OCI settings mean and control them properly.

http://it.toolbox.com/blogs/infosphere/ ... e-85-42312

Reason #10 is the 'Balanced Optimizer', the equivalent of your 'pushdown automation'.
-craig

"You can never have too many knives" -- Logan Nine Fingers
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Post by arpitchopra »

Thanks a lot guys. I increased the target and source array size and memory prefetch values and that did the trick. The job runs in 30 mins or so now.

Thanks again for your help.
Arpit Chopra
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Post by arpitchopra »

Hey guys,

First of all, apologies for re-opening a resolved thread, but I had to do it (or else reference this thread from a new post), since I'm getting a very similar (actually the same) problem in another one of my jobs.

Here is the job design:

Oracle OCI ------> Transformer --------> Oracle OCI

The source has a custom SQL which returns the complete result set in 18 mins (I checked this by adding the @FALSE constraint in the transformer). The source stats are given below:

Array Size: 20,000 (What is the upper limit on this size?)
Prefetch memory setting: 10,000

The transformer just populates one column with a job parameter's value.

The target performs a truncate/insert load on another table. The target stats are given below:

Array Size: 20,000
Transaction Size: 10,000
Rows per transaction: 10,000

When I run the entire job, it takes 55 mins (triple the time it takes just to select the data from the source).
Why such strange behavior? Why, DS, why... Why do you do this to me!?
Arpit Chopra
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Because you - YOU - are setting your array sizes ridiculously high. The "sweet spot" is usually low thousands, even just hundreds, depending on your row sizes.
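As a rough worked example (the 500 bytes per row is purely an assumed figure for illustration): an array size of 20,000 means roughly 20,000 x 500 = 10 MB buffered per fetch or insert on each end of the job, whereas an array size of 1,000 keeps that near 0.5 MB.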
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.