Hi,
We are doing performance testing
We are reading data from an oracle table in the oracle connector stage.
The query used is
select col1,col2,col3
from table1
where col2='Z'
The query returns 50 million records.
We are reading data from another oracle table
select col1,col2,col3
from table2
where col2='1'
The query returns 41 million records.
We are joining the data returned from these two tables using join stage.
We are doing performance testing.
For extracting data from table1,when we set the option "Enable Partitioned reads" to No, the stage runs on one node and the time taken to read is around 2 min.
But When we set the option "Enable Partitioned reads" to Yes, the stage runs on multiple nodes(we are using 10 node config file) and the time taken to read is around 10 min.
The table is not partitioned in the db.
For extracting data from table2, the "Enable Partitioned reads" to Yes fetches the data faster. Elapsed time is 30 sec.
What are the factors to be considered to set the "Enable Partitioned reads" to Yes or No?
Factors to decide the run mode of Oracle connector
Moderators: chulett, rschirm, roy
Perhaps Datastage undergoing overhead based on Partition read method.
Partitioned read works well for the Partitioned data.
Have you tried the Rowid round robin method?
Ideally this should do a simple partitioning.
Partitioned read works well for the Partitioned data.
Have you tried the Rowid round robin method?
Ideally this should do a simple partitioning.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Yes we have tried with Rowid round robin method but no luck. It is taking almost the same time as rowid range method.
We have another job which is extracting the data from table1 and processing is done in transformer and aggregator stages and there is no join/lookup stage.
In this job the elapsed time of reading the data from table1 is less when "Enable Partitioned reads" is set to Yes and more when "Enable Partitioned reads" to No.
So our question is-
Is it because of the join stage immediately after the oracle connector that the read from the table is taking time when "Enable Partitioned reads" to Yes? As the join stage pre sorts the data from both the links.
The data in table1 is 100 million rows. Table is not partitioned. The query in the job fetches 50 million rows.
The data in table2 is 50 million rows.Table is not partitioned. The query fetches 40 million rows.
We have another job which is extracting the data from table1 and processing is done in transformer and aggregator stages and there is no join/lookup stage.
In this job the elapsed time of reading the data from table1 is less when "Enable Partitioned reads" is set to Yes and more when "Enable Partitioned reads" to No.
So our question is-
Is it because of the join stage immediately after the oracle connector that the read from the table is taking time when "Enable Partitioned reads" to Yes? As the join stage pre sorts the data from both the links.
The data in table1 is 100 million rows. Table is not partitioned. The query in the job fetches 50 million rows.
The data in table2 is 50 million rows.Table is not partitioned. The query fetches 40 million rows.
Re: Factors to decide the run mode of Oracle connector
It is understood that you are trying to do some performance testing and that too using Join stage in DataStage.
Is there any difficulty in joining the 2 queries using DB join itself? And what is the difficulty in creating table indexes and partitions which is much worth than dealing with DataStage features.
Is there any difficulty in joining the 2 queries using DB join itself? And what is the difficulty in creating table indexes and partitions which is much worth than dealing with DataStage features.