Job Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

logic
Participant
Posts: 115
Joined: Thu Feb 24, 2005 10:48 am

Job Performance

Post by logic »

Hi,
Trying to select 28M rows from an Oracle table using the DRS stage with Oracle connectivity.

The SQL has joins, subqueries, SUM, MAX, and so on. The SQL runs for about 5-6 minutes on the DB for the select. The same SQL, when run through a DS job, processes at about 156 rows per second.

Here is what I have tried so far:
The job has to be designed to read from a table and write to another table on the same DB. To identify the bottleneck, I removed the Transformer and the write stage. With read and write, the processing speed was about 120-150 rps. Just selecting and dumping into a file (tried both hashed and sequential) processes at around 150-160 rps. There is no major CPU utilization, no major processes running on DS or the DB, and not much wait activity either. Reads and writes for all processes are happening almost evenly.
If the SQL selects the data in 5-6 minutes on the DB, shouldn't the DS job with array size 1 be faster at selecting and writing to a file?

I will try partitioning the table on the DB or parallel processing in the server job. But before I go that route, have I missed something? I find it rather hard to digest that just selecting and writing should take so much longer than running the select SQL directly on the DB.

TIA
Last edited by logic on Wed Apr 23, 2008 12:49 pm, edited 1 time in total.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Forget rows per second. The elapsed seconds include the time spent waiting for the Oracle query to deliver the first row as well as the time taken to process all rows. Further, it will vary depending on row size and a number of other factors. Total execution time should be your metric.
The job log (the time difference between two events) will indicate how long Oracle took to deliver the first row.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

A row of data has to come out of the database, traverse the network to the DS process and then go back across the network to the database.

In addition, just running a query and timing how long it takes to get the first row returned is NO indication of how long it will take to get to the LAST row. That's the true measure for an expected runtime.

If you are moving results from one table to another within the same instance, there is no faster way than to stay within Oracle and use parallel DML with direct-path loading. Everything else involves third-party processes (network, DS processes, loading process, etc.) and by definition HAS to be slower.
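
A minimal sketch of that approach, run directly in Oracle (the table names and the degree of parallelism are placeholders, not from this thread):

    -- Direct-path, parallel insert done entirely inside Oracle.
    -- 'source_table', 'target_table' and the degree of 4 are illustrative.
    ALTER SESSION ENABLE PARALLEL DML;

    INSERT /*+ APPEND PARALLEL(tgt, 4) */ INTO target_table tgt
    SELECT /*+ PARALLEL(src, 4) */ src.*
    FROM   source_table src;

    COMMIT;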
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
logic
Participant
Posts: 115
Joined: Thu Feb 24, 2005 10:48 am

Post by logic »

kcbland wrote:A row of data has to come out of the database, traverse the network to the DS process and then go back across the network to the database.

In addition, just running a query and timing how long it takes to get the first row returned is NO indication of how long it will take to get to the LAST row. That's the true measure for an expected runtime.

If you are moving results from one table to another within the same instance, there is no faster way than to stay within Oracle and use parallel DML with direct-path loading. Everything else involves third-party processes (network, DS processes, loading process, etc.) and by definition HAS to be slower.
Thanks Ray and Ken.

Even if I disregard the rps (I agree it's not a good measure of performance), the job takes around 6 hours to extract and dump into a file. I removed the table insert part and the transformers to see if loading to a file would be faster, but I am still getting the same job performance. From the log, the select SQL does complete in approximately the same time as the SQL run directly on the DB. The overall job still takes about 6 hours to select and dump into a file.

Thanks,
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

OK, now what is the CPU utilization on the DS server? Do you have spare CPUs? How about partitioning the query and running as many job instances (one per partition) as you have spare CPUs? How about a bulk load of the extracted data?
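
One common way to partition the extract (the table, column, and parameter names here are illustrative; #PART_NUM# assumes the usual DataStage #parameter# substitution in user-defined SQL):

    -- User-defined SELECT for a multiple-instance extract job.
    -- #PART_NUM# is a job parameter set to 0, 1, 2 or 3 for four instances.
    SELECT t.order_id, t.amount, t.status
    FROM   orders t
    WHERE  MOD(t.order_id, 4) = #PART_NUM#

Each instance then extracts roughly a quarter of the rows, and the resulting files can be bulk loaded afterwards.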
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
logic
Participant
Posts: 115
Joined: Thu Feb 24, 2005 10:48 am

Post by logic »

kcbland wrote:Ok, now what is the cpu utilization on the DS server? Do you have spare cpus? How about partitioning the query and running as many job instances as partitions as spare cpus? How about a bulk load o ...
Thanks, Ken. I do have 4 CPUs and am trying to break the select query into 4 subsets to process them in parallel using multiple-instance jobs.

I did not try the bulk load because, after removing the insert part, dumping straight to a file still takes 6 hours, so I thought the insert might not be the problem.

Thanks,
dinakaran_s
Participant
Posts: 22
Joined: Wed Jul 02, 2008 7:01 am
Location: London

Post by dinakaran_s »

For this problem, please split the data based on a period or a key value and extract each subset from the database separately. When you fetch a large volume of data from Oracle with minimal filtering, the initial rows are returned very quickly, but completing the extract takes a very long time, because Oracle has to execute the query and gather all the data into a temporary location before the full result set can be delivered.
So I suggest you split the data based on some key and pass each key as input to the DataStage job. This can be done using a simple routine.
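
A sketch of that period-based split (the table, the date column, and the #PERIOD_START#/#PERIOD_END# job parameters below are illustrative placeholders, not names from this thread):

    -- User-defined SELECT for the extract job; each run covers one period.
    -- 'orders', 'trans_date' and the parameter names are illustrative only.
    SELECT t.order_id, t.amount, t.trans_date
    FROM   orders t
    WHERE  t.trans_date >= TO_DATE('#PERIOD_START#', 'YYYY-MM-DD')
    AND    t.trans_date <  TO_DATE('#PERIOD_END#', 'YYYY-MM-DD')

A controlling sequence or routine would then invoke the job once per period with the appropriate parameter values.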


Let me know if you still have the same problem.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Tune your query. As Ken notes, the time to the first record is NO indication of the overall time it will take. Check your explain plan for nested loops and hint to 'use_hash' joins if needed. Also consider 'full' and 'parallel' hints and 'alter session' statements before the SQL to get the maximum extract speed from Oracle.
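
For illustration only (the hint names are standard Oracle, but the tables, aliases, and degree of parallelism below are placeholders):

    -- Encourage full scans, parallel query and a hash join instead of nested loops.
    ALTER SESSION ENABLE PARALLEL QUERY;

    SELECT /*+ FULL(o) FULL(c) PARALLEL(o, 4) USE_HASH(c) */
           c.customer_name, SUM(o.amount) AS total_amount
    FROM   orders o, customers c
    WHERE  o.customer_id = c.customer_id
    GROUP  BY c.customer_name;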
-craig

"You can never have too many knives" -- Logan Nine Fingers
dinakaran_s
Participant
Posts: 22
Joined: Wed Jul 02, 2008 7:01 am
Location: London

Increase the array size in the Oracle OCI stage

Post by dinakaran_s »

Hi,

Try increasing the array size in the OCI stage to 10,000. This will help improve the performance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why that particular number?

If 10,000 is good, why not 10 million? That should be 1,000 times better, shouldn't it?

Do you understand precisely what this Array Size parameter does?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.