Factors to decide the run mode of Oracle connector

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Lavanya B
Participant
Posts: 20
Joined: Mon Oct 30, 2006 12:32 am

Factors to decide the run mode of Oracle connector

Post by Lavanya B »

Hi,

We are doing performance testing
We are reading data from an oracle table in the oracle connector stage.
The query used is
select col1,col2,col3
from table1
where col2='Z'
The query returns 50 million records.
We are reading data from another oracle table
select col1,col2,col3
from table2
where col2='1'
The query returns 41 million records.
We are joining the data returned from these two tables using join stage.
We are doing performance testing.
For extracting data from table1,when we set the option "Enable Partitioned reads" to No, the stage runs on one node and the time taken to read is around 2 min.
But When we set the option "Enable Partitioned reads" to Yes, the stage runs on multiple nodes(we are using 10 node config file) and the time taken to read is around 10 min.
The table is not partitioned in the db.
For extracting data from table2, the "Enable Partitioned reads" to Yes fetches the data faster. Elapsed time is 30 sec.
What are the factors to be considered to set the "Enable Partitioned reads" to Yes or No?
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Perhaps Datastage undergoing overhead based on Partition read method.
Partitioned read works well for the Partitioned data.
Have you tried the Rowid round robin method?
Ideally this should do a simple partitioning.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Lavanya B
Participant
Posts: 20
Joined: Mon Oct 30, 2006 12:32 am

Post by Lavanya B »

Yes we have tried with Rowid round robin method but no luck. It is taking almost the same time as rowid range method.
We have another job which is extracting the data from table1 and processing is done in transformer and aggregator stages and there is no join/lookup stage.
In this job the elapsed time of reading the data from table1 is less when "Enable Partitioned reads" is set to Yes and more when "Enable Partitioned reads" to No.
So our question is-
Is it because of the join stage immediately after the oracle connector that the read from the table is taking time when "Enable Partitioned reads" to Yes? As the join stage pre sorts the data from both the links.
The data in table1 is 100 million rows. Table is not partitioned. The query in the job fetches 50 million rows.
The data in table2 is 50 million rows.Table is not partitioned. The query fetches 40 million rows.
mouthou
Participant
Posts: 208
Joined: Sun Jul 04, 2004 11:57 pm

Re: Factors to decide the run mode of Oracle connector

Post by mouthou »

It is understood that you are trying to do some performance testing and that too using Join stage in DataStage.

Is there any difficulty in joining the 2 queries using DB join itself? And what is the difficulty in creating table indexes and partitions which is much worth than dealing with DataStage features.
Lavanya B
Participant
Posts: 20
Joined: Mon Oct 30, 2006 12:32 am

Post by Lavanya B »

We understand that performance can be improved by indexing and partitioning on database side. But we want to know from DataStage end what all can be done to improve the performance.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, don't use the Join stage as it's meant for heterogeneous data. If those two tables are in the same instance, simply join them in the source.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply