Oracle Enterprise - Reading in parallel

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
rickrambo
Participant
Posts: 21
Joined: Wed Jul 27, 2005 11:27 am

Oracle Enterprise - Reading in parallel

Post by rickrambo »

How do I READ ( Not write) in parallel using Oracle Enterprise stage.?

The "Execution Mode" tab is defaulted to Sequential and is greyed. How would I change this to Parallel?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You don't (read in parallel) unless the Oracle table is a partitioned table.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rickrambo
Participant
Posts: 21
Joined: Wed Jul 27, 2005 11:27 am

Post by rickrambo »

Oracle table is partitinoned.

Now, how do I specify this table to be read in parallel? Are you suggesting that partitioned table will be read in parallel even though the "Executeion Mode" is sequential? If that is the case.....

Can I not read a partitioned table sequentially?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can read a partitioned table sequentially. What you can not do is read a non-partitioned table in parallel.

Look for the Partition Table property.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

Ray,
do you mean that although the execution mode of the oracle enterprise stage is sequential, if the partiotion table property is specified then the job will read the table in parallel?

even after defining the partition table, The "Execution Mode" tab is defaulted to Sequential and is greyed
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

Ray,
I think your statement is incorrect.
When you have an Oracle stage in a parallel job there is an option for 'partition table'. By using this you can read in parallel.

How does it work: you give a tablename to this parameter. DS will look for the min and max rowid for this table and will add this in the where clause. So if the table would have min rowid 0 and max rowid 40 and you run with 4 nodes it will run 4 times the query with adding
'where rowid between 0 and 10', between 10 and 20',....

This has nothin to do with the table fysically being partitioned or not.

side-effects:
-if you do anything like sort or aggregate in your job you will need to repartition in the following stages
-it's based on the full table, if you have an extra where clause you could get very unbalanced results
-watch your explain plan, because of the rowid it will become a full tablescan on that table.
kirankota79
Premium Member
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

for four nodes ...does it reduce the time read by 4 times if we use the partition table property? i want to write to a dataset and want to use the same dataset to reload back to the same table
hello105
Participant
Posts: 9
Joined: Mon Sep 03, 2007 9:53 pm
Location: ShangHai,China

Post by hello105 »

You are right.But if the sql has a where clause,when you use the partiton,the result will have repeated records.Maybe it is a bug.
jasper wrote:Ray,
I think your statement is incorrect.
When you have an Oracle stage in a parallel job there is an option for 'partition table'. By using this you can read in parallel.

How does it work: you give a tablename to this parameter. DS will look for the min and max rowid for this table and will add this in the where clause. So if the table would have min rowid 0 and max rowid 40 and you run with 4 nodes it will run 4 times the query with adding
'where rowid between 0 and 10', between 10 and 20',....

This has nothin to do with the table fysically being partitioned or not.

side-effects:
-if you do anything like sort or aggregate in your job you will need to repartition in the following stages
-it's based on the full table, if you have an extra where clause you could get very unbalanced results
-watch your explain plan, because of the rowid it will become a full tablescan on that table.
jreddy
Premium Member
Premium Member
Posts: 202
Joined: Tue Feb 03, 2004 5:09 pm

Post by jreddy »

Yes hello105, you are right.
When i set the partition table option while using a user-defined SQL query that joins multiple tables, I end up with duplicates.
The data read becomes faster, but to avoid the duplicates if we add a remove dups stage, it slows down again negating the speed that i achieve with Partition Table option..

Is this really a bug or is there something that has to be set to avoid generating duplicates.
UPS
Premium Member
Premium Member
Posts: 56
Joined: Tue Oct 10, 2006 12:18 pm
Location: New Jersey

Reading an ORACLE non-partitioned table in parallel

Post by UPS »

I do not know if this will help but I use the modulus statement in the 'WHERE' clause to simulate a parallel read. If I want 4 degrees of parallelism I create 4 Oracle stages and use the mod function in the 'WHERE' clause. However, you should use it with a numerical key.
A hack I know but hey, it reads a non-partitioned table in parallel.
Example of 4 degrees of parallelism :
ORACLE_STAGE_1
WHERE mod( key_field, 4) = 0
ORACLE_STAGE_2
WHERE mod( key_field, 4) = 1
ORACLE_STAGE_3
WHERE mod( key_field, 4) = 2
ORACLE_STAGE_4
WHERE mod( key_field, 4) = 3
Post Reply