TOAD vs DataStage
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 35
- Joined: Mon Nov 01, 2010 2:48 am
- Location: New Delhi
TOAD vs DataStage
Hi everyone,
I am facing a very strange problem and would be extremely grateful if any of you could help me with the solution.
I have a query that pivots, aggregates (and then merges with the same data set), and finally joins. The 2 tables involved contain 30 million (main table) and 2.5 million (join table) records respectively. This query takes close to 15 minutes to run in TOAD and returns 17 million records.
However, when I paste the same query into the OCI stage and run the DS job, it takes more than 3 hours to run!
Why does this happen?
And more importantly, WHAT can I do to improve this performance?
3 hours for this query is unacceptable. If this trend continues, the client will have to wait a month to get his data.
Arpit Chopra
Also, consider that TOAD returned one screen full of records in 15 minutes. I doubt that you scrolled through all 17 million records.
Mike
Last edited by Mike on Fri Nov 12, 2010 8:27 am, edited 1 time in total.
If that's all you're doing, you're probably better off leaving the data on the database server. Why drag 17 million rows of data across the network just to push them back across the network again?
If your transformer stage is doing transformation work that can't be accomplished on the database server, then traversing the network is your cost of doing those transformations.
Mike
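Mike's point can be sketched outside DataStage. A toy illustration using Python's sqlite3 standing in for Oracle (table and column names are invented for the example): joining on the database server means only the joined result crosses the wire, while the client-side equivalent has to transfer both full tables first.

```python
# Toy illustration: let the database do the join instead of pulling both
# tables to the client and joining there. sqlite3 stands in for Oracle;
# tables and names are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_tbl (id INTEGER, val TEXT);
CREATE TABLE lkp_tbl  (id INTEGER, descr TEXT);
INSERT INTO main_tbl VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO lkp_tbl  VALUES (1, 'one'), (3, 'three');
""")

# Server-side join: only the joined result crosses the "network".
joined = conn.execute("""
    SELECT m.id, m.val, l.descr
    FROM main_tbl m JOIN lkp_tbl l ON m.id = l.id
""").fetchall()

# Client-side equivalent: both full tables are transferred first,
# then joined in application code.
main_rows = conn.execute("SELECT id, val FROM main_tbl").fetchall()
lkp = dict(conn.execute("SELECT id, descr FROM lkp_tbl").fetchall())
client_joined = [(i, v, lkp[i]) for i, v in main_rows if i in lkp]

assert sorted(joined) == sorted(client_joined)
print(joined)  # [(1, 'a', 'one'), (3, 'c', 'three')]
```

With 3-row tables the difference is invisible; with a 30-million-row main table it is the whole ballgame.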
arpitchopra wrote: "Job design! I told you already: Oracle OCI ---> Transformer -----> Oracle OCI"
And where exactly did you already post this information?
There are many sides to this equation and how long the select allegedly takes is only a part of it. Add a constraint to your transformer and set it to @FALSE. That will shut off the load aspect and we'll see how long just the select part takes.
While we're at it, what settings are you using in both stages? In particular, the Array Size on both ends plus the Transaction Size and Update Action of the target. All of those will affect "performance".
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Source
Array Size: 10000
Prefetch Mem Setting: 1024
Target
Array size: 1
Transaction size: 0
Rows per transaction: 0
3 months into DS and I'm looking at these numbers for the 1st time. Pardon my naivety.
But again, I'm not performing any task in the transformer. Isn't there a feature in DS similar to "Pushdown Automation" in Informatica, with which I could perform all these actions on the DB itself without transferring 17 million rows over the network?
Arpit Chopra
Your target array size is killing you. Did you try the @FALSE suggestion? I'd still be curious what your 'Update Action' is but in the meantime bump up your target array size to something larger so you are not sending one record at a time across the network to the database. Perhaps 1000 to start with.
arpitchopra wrote: "But again, I'm not performing any task in the transformer. Isn't there a feature in DS which is similar to 'Pushdown Automation' in Informatica? Using which I can actually perform all these actions on the DB itself without transferring 17 mil rows over the n/w..."
Not in the Server product and not in your version. I've heard PX 8.5 may have that option, not sure if that is the case or not.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
You can replace the transformer with an IPC stage if there are no derivations involved, or set the inter-process row buffering property in Job Properties.
"Pushdown Automation" --> A workaround would be to create a job which reads some constant from the dual table and loads it to a file. In the before- or after-SQL of the source stage, use the query you ran in TOAD. But you would never know how many records were processed.
"Pushdown Automation" --> Work around would be to create a job which reads some constant from dual table and load to a file. In the before or after SQL of source stage use the query you used on toad. But you would never know how many records are proccessed.
IPC is no magic bullet and you'll still end up with a transformer in the job. Best to learn what those OCI settings mean and control them properly.
http://it.toolbox.com/blogs/infosphere/ ... e-85-42312
Reason #10 is the 'Balanced Optimizer', the equivalent of your 'pushdown automation'.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Hey guys,
First of all, apologies for re-opening a resolved thread, but I had to (or else reference this thread in a new post) since I'm getting a very similar (actually the same) problem in another one of my jobs.
Here is the job design:
Oracle OCI ------> Transformer --------> Oracle OCI
The source has a custom SQL which returns the complete result set in 18 mins (I checked this by adding the @FALSE constraint in the transformer). The source stats are given below:
Array Size: 20,000 (What is the upper limit on this size ?)
Prefetch memory setting: 10,000
The transformer assigns a parameter's value to one of the columns.
The target performs a truncate/insert load on another table. The target stats are given below:
Array Size: 20,000
Transaction Size: 10,000
Rows per transaction: 10,000
When I run the entire job, it takes 55 mins (triple the time taken just to select the data from the source).
Why such strange behavior ?? Why, DS why... Why do you do this to me !?
Arpit Chopra
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Because you - YOU - are setting your array sizes ridiculously high. The "sweet spot" is usually low thousands, even just hundreds, depending on your row sizes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
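One hedged way to turn that "sweet spot" advice into a number: size the array from a memory budget and your average row width, rather than picking a big round number. The budget, row widths, and upper cap below are illustrative assumptions, not DataStage defaults.

```python
# Back-of-the-envelope sizing: the fetch/insert buffer is roughly
# array_size * bytes_per_row, so derive the array size from a memory
# budget. Budget and cap are illustrative assumptions only.
def suggest_array_size(avg_row_bytes: int,
                       buffer_budget_bytes: int = 256 * 1024) -> int:
    size = buffer_budget_bytes // avg_row_bytes
    # Clamp to a sane range; the upper cap is a conservative,
    # illustrative bound, not an OCI-documented limit.
    return max(1, min(size, 32767))

# A 500-byte row with a 256 KB budget lands in the "low hundreds to
# low thousands" range; a narrow 50-byte row can go higher.
print(suggest_array_size(500))  # 524
print(suggest_array_size(50))   # 5242
```

Wider rows mean smaller arrays: the point is that array size should scale inversely with row size, not be set to 20,000 regardless.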