Factors to decide the run mode of Oracle connector

Lavanya B · Post by **Lavanya B** » Mon Apr 08, 2019 12:48 pm

Hi,

We are doing performance testing
We are reading data from an oracle table in the oracle connector stage.
The query used is
select col1,col2,col3
from table1
where col2='Z'
The query returns 50 million records.
We are reading data from another oracle table
select col1,col2,col3
from table2
where col2='1'
The query returns 41 million records.
We are joining the data returned from these two tables using join stage.
We are doing performance testing.
For extracting data from table1,when we set the option "Enable Partitioned reads" to No, the stage runs on one node and the time taken to read is around 2 min.
But When we set the option "Enable Partitioned reads" to Yes, the stage runs on multiple nodes(we are using 10 node config file) and the time taken to read is around 10 min.
The table is not partitioned in the db.
For extracting data from table2, the "Enable Partitioned reads" to Yes fetches the data faster. Elapsed time is 30 sec.
What are the factors to be considered to set the "Enable Partitioned reads" to Yes or No?

kumar_s · Post by **kumar_s** » Tue Apr 09, 2019 12:05 am

Perhaps Datastage undergoing overhead based on Partition read method.
Partitioned read works well for the Partitioned data.
Have you tried the Rowid round robin method?
Ideally this should do a simple partitioning.

Lavanya B · Post by **Lavanya B** » Wed Apr 10, 2019 12:23 pm

Yes we have tried with Rowid round robin method but no luck. It is taking almost the same time as rowid range method.
We have another job which is extracting the data from table1 and processing is done in transformer and aggregator stages and there is no join/lookup stage.
In this job the elapsed time of reading the data from table1 is less when "Enable Partitioned reads" is set to Yes and more when "Enable Partitioned reads" to No.
So our question is-
Is it because of the join stage immediately after the oracle connector that the read from the table is taking time when "Enable Partitioned reads" to Yes? As the join stage pre sorts the data from both the links.
The data in table1 is 100 million rows. Table is not partitioned. The query in the job fetches 50 million rows.
The data in table2 is 50 million rows.Table is not partitioned. The query fetches 40 million rows.

mouthou · Post by **mouthou** » Wed Apr 10, 2019 2:54 pm

It is understood that you are trying to do some performance testing and that too using Join stage in DataStage.

Is there any difficulty in joining the 2 queries using DB join itself? And what is the difficulty in creating table indexes and partitions which is much worth than dealing with DataStage features.

Lavanya B · Post by **Lavanya B** » Wed Apr 10, 2019 5:29 pm

We understand that performance can be improved by indexing and partitioning on database side. But we want to know from DataStage end what all can be done to improve the performance.

chulett · Post by **chulett** » Wed Apr 10, 2019 7:38 pm

As noted, don't use the Join stage as it's meant for heterogeneous data. If those two tables are in the same instance, simply join them in the source.

DSXchange

Factors to decide the run mode of Oracle connector

Factors to decide the run mode of Oracle connector

Re: Factors to decide the run mode of Oracle connector