DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
Lavanya B
Participant



Joined: 30 Oct 2006
Posts: 20

Points: 260

Post Posted: Mon Apr 08, 2019 12:48 pm Reply with quote    Back to top    

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
Hi,

We are doing performance testing
We are reading data from an oracle table in the oracle connector stage.
The query used is
select col1,col2,col3
from table1
where col2='Z'
The query returns 50 million records.
We are reading data from another oracle table
select col1,col2,col3
from table2
where col2='1'
The query returns 41 million records.
We are joining the data returned from these two tables using join stage.
We are doing performance testing.
For extracting data from table1,when we set the option "Enable Partitioned reads" to No, the stage runs on one node and the time taken to read is around 2 min.
But When we set the option "Enable Partitioned reads" to Yes, the stage runs on multiple nodes(we are using 10 node config file) and the time taken to read is around 10 min.
The table is not partitioned in the db.
For extracting data from table2, the "Enable Partitioned reads" to Yes fetches the data faster. Elapsed time is 30 sec.
What are the factors to be considered to set the "Enable Partitioned reads" to Yes or No?
kumar_s

Premium Poster


since February 2006

Group memberships:
Premium Members, Heartland Usergroup

Joined: 16 Jun 2005
Posts: 5244

Points: 26374

Post Posted: Tue Apr 09, 2019 12:05 am Reply with quote    Back to top    

Perhaps Datastage undergoing overhead based on Partition read method. Partitioned read works well for the Partitioned data. Have you tried the Rowid round robin method? Ideally this should do ...

_________________
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Rate this response:  
Not yet rated
Lavanya B
Participant



Joined: 30 Oct 2006
Posts: 20

Points: 260

Post Posted: Wed Apr 10, 2019 12:23 pm Reply with quote    Back to top    

Yes we have tried with Rowid round robin method but no luck. It is taking almost the same time as rowid range method.
We have another job which is extracting the data from table1 and processing is done in transformer and aggregator stages and there is no join/lookup stage.
In this job the elapsed time of reading the data from table1 is less when "Enable Partitioned reads" is set to Yes and more when "Enable Partitioned reads" to No.
So our question is-
Is it because of the join stage immediately after the oracle connector that the read from the table is taking time when "Enable Partitioned reads" to Yes? As the join stage pre sorts the data from both the links.
The data in table1 is 100 million rows. Table is not partitioned. The query in the job fetches 50 million rows.
The data in table2 is 50 million rows.Table is not partitioned. The query fetches 40 million rows.
Rate this response:  
Not yet rated
mouthou
Participant



Joined: 04 Jul 2004
Posts: 208

Points: 1991

Post Posted: Wed Apr 10, 2019 2:54 pm Reply with quote    Back to top    

It is understood that you are trying to do some performance testing and that too using Join stage in DataStage.

Is there any difficulty in joining the 2 queries using DB join itself? And what is the difficulty in creating table indexes and partitions which is much worth than dealing with DataStage features.
Rate this response:  
Not yet rated
Lavanya B
Participant



Joined: 30 Oct 2006
Posts: 20

Points: 260

Post Posted: Wed Apr 10, 2019 5:29 pm Reply with quote    Back to top    

We understand that performance can be improved by indexing and partitioning on database side. But we want to know from DataStage end what all can be done to improve the performance.
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42964
Location: Denver, CO
Points: 221565

Post Posted: Wed Apr 10, 2019 7:38 pm Reply with quote    Back to top    

As noted, don't use the Join stage as it's meant for heterogeneous data. If those two tables are in the same instance, simply join them in the source.

_________________
-craig

Now I've got that feeling once again, I can't explain; you would not understand
this is now who I am. I have become comfortably numb.
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours