Datastage - Oracle Connector - Partitioned table-dump score

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
tehavele
Premium Member
Premium Member
Posts: 79
Joined: Wed Nov 12, 2008 5:41 am
Location: Melbourne

Datastage - Oracle Connector - Partitioned table-dump score

Post by tehavele »

Hi Guys,

I am trying to understand below job score.

Environment Details -
DataStage - 9.1.2
Oracle -- 11gR2
OS - Oracle Solaris
Config File - 4 node
Oracle connector Partition Type - Oracle Connector
Write Mode - Insert
Target table is partitioned by Range Method

Job Detail --
DataSet ------ > Copy ------> Oracle Connector Stage (Update Mode)
And here is the job Score

Code: Select all

main_program: This step has 3 datasets:
ds0: {/home/hdsprod/etl_home/interface/inbound/meter_data/interval_reading_updates.ds
      [pp] eSame=>eCollectAny
      op0[4p] (parallel APT_CombinedOperatorController:dsIntervalReadingUpdates)}
ds1: {op0[4p] (parallel APT_CombinedOperatorController:cpStub)
      [pp] eOther(PXBridgePart {})#>eCollectAny
      op1[36p] (parallel buffer(0))}
ds2: {op1[36p] (parallel buffer(0))
      [pp] eSame=>eCollectAny
      op2[36p] (parallel oraIntervalReading)}
It has 3 operators:
op0[4p] {(parallel APT_CombinedOperatorController:
      (dsIntervalReadingUpdates)
      (cpStub)
    ) on nodes (
      node1[op0,p0]
      node2[op0,p1]
      node3[op0,p2]
      node4[op0,p3]
    )}
op1[36p] {(parallel buffer(0))
    on nodes (
      node1[op1,p0]
      node2[op1,p1]
      node3[op1,p2]
      node4[op1,p3]
      node4[op1,p4]
      node4[op1,p5]
      node4[op1,p6]
      node4[op1,p7]
      node4[op1,p8]
      node4[op1,p9]
      node4[op1,p10]
      node4[op1,p11]
      node4[op1,p12]
      node4[op1,p13]
      node4[op1,p14]
      node4[op1,p15]
      node4[op1,p16]
      node4[op1,p17]
      node4[op1,p18]
      node4[op1,p19]
      node4[op1,p20]
      node4[op1,p21]
      node4[op1,p22]
      node4[op1,p23]
      node4[op1,p24]
      node4[op1,p25]
      node4[op1,p26]
      node4[op1,p27]
      node4[op1,p28]
      node4[op1,p29]
      node4[op1,p30]
      node4[op1,p31]
      node4[op1,p32]
      node4[op1,p33]
      node4[op1,p34]
      node4[op1,p35]
    )}
op2[36p] {(parallel oraIntervalReading)
    on nodes (
      node1[op2,p0]
      node2[op2,p1]
      node3[op2,p2]
      node4[op2,p3]
      node4[op2,p4]
      node4[op2,p5]
      node4[op2,p6]
      node4[op2,p7]
      node4[op2,p8]
      node4[op2,p9]
      node4[op2,p10]
      node4[op2,p11]
      node4[op2,p12]
      node4[op2,p13]
      node4[op2,p14]
      node4[op2,p15]
      node4[op2,p16]
      node4[op2,p17]
      node4[op2,p18]
      node4[op2,p19]
      node4[op2,p20]
      node4[op2,p21]
      node4[op2,p22]
      node4[op2,p23]
      node4[op2,p24]
      node4[op2,p25]
      node4[op2,p26]
      node4[op2,p27]
      node4[op2,p28]
      node4[op2,p29]
      node4[op2,p30]
      node4[op2,p31]
      node4[op2,p32]
      node4[op2,p33]
      node4[op2,p34]
      node4[op2,p35]
    )}
It runs 76 processes on 4 nodes.
My doubt is why have most of the oracle connector processes got created on node 4 only ? I would have liked these 35 oracle table partitions to be equally partitioned on 4 datastage nodes.

When I run this in Hash or RoundRobin partition data gets equally partitioned on all 4 nodes.

Any help would be greatly appreciated.
Tejas
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

When using Oracle Partition Connector type, it is a good idea to configure the the number of nodes to match the number of partitions in the table.

In the case described above, number of nodes(4) is fewer than the number of partitions in the table(36), so the connector added 32 additional nodes to the end of the node list. Definition for each node that is added matches the definition of the last node in the original list, which was Node4, consequently most of the processes got concentrated on Node4.

See third bullet point of this link for details.
tehavele
Premium Member
Premium Member
Posts: 79
Joined: Wed Nov 12, 2008 5:41 am
Location: Melbourne

Post by tehavele »

Thanks rkashyap for your reply. From above reply it is pretty much clear now.
Tejas
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

You are welcome. Large no of partitions (36) on the table may have adverse impact on performance. See following snippet from Tuning the Oracle Connector performance
If the stage is configured to run in parallel execution mode in fewer player processes than there are partitions in the table, the connector will automatically increase the number of player processes at runtime to match the number of table partitions. For example, if the table has 50 partitions, there will be 50 player processes running in parallel, each fetching rows from one dedicated table partition. The overhead of starting up this many player processes and establishing that many connections to the database may outweigh the benefits of performing the partitioned reads.
tehavele
Premium Member
Premium Member
Posts: 79
Joined: Wed Nov 12, 2008 5:41 am
Location: Melbourne

Post by tehavele »

rkashyap wrote:Large no of partitions (36) on the table may have adverse impact on performance.
I am experiencing adverse impact of such partitioning. I am planning to change it to same partitioning ans see how it performs. Will share the results once done.
Tejas
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

Patch JR55886 is now available to balance compute node utilization when Oracle partitions out number compute nodes.
Post Reply