Datastage - Oracle Connector - Partitioned table-dump score

tehavele · Post by **tehavele** » Thu Jul 16, 2015 7:35 am

Hi Guys,

I am trying to understand below job score.

Environment Details -
DataStage - 9.1.2
Oracle -- 11gR2
OS - Oracle Solaris
Config File - 4 node
Oracle connector Partition Type - Oracle Connector
Write Mode - Insert
Target table is partitioned by Range Method

Job Detail --

DataSet ------ > Copy ------> Oracle Connector Stage (Update Mode)

And here is the job Score

Code: Select all

main_program: This step has 3 datasets:
ds0: {/home/hdsprod/etl_home/interface/inbound/meter_data/interval_reading_updates.ds
      [pp] eSame=>eCollectAny
      op0[4p] (parallel APT_CombinedOperatorController:dsIntervalReadingUpdates)}
ds1: {op0[4p] (parallel APT_CombinedOperatorController:cpStub)
      [pp] eOther(PXBridgePart {})#>eCollectAny
      op1[36p] (parallel buffer(0))}
ds2: {op1[36p] (parallel buffer(0))
      [pp] eSame=>eCollectAny
      op2[36p] (parallel oraIntervalReading)}
It has 3 operators:
op0[4p] {(parallel APT_CombinedOperatorController:
      (dsIntervalReadingUpdates)
      (cpStub)
    ) on nodes (
      node1[op0,p0]
      node2[op0,p1]
      node3[op0,p2]
      node4[op0,p3]
    )}
op1[36p] {(parallel buffer(0))
    on nodes (
      node1[op1,p0]
      node2[op1,p1]
      node3[op1,p2]
      node4[op1,p3]
      node4[op1,p4]
      node4[op1,p5]
      node4[op1,p6]
      node4[op1,p7]
      node4[op1,p8]
      node4[op1,p9]
      node4[op1,p10]
      node4[op1,p11]
      node4[op1,p12]
      node4[op1,p13]
      node4[op1,p14]
      node4[op1,p15]
      node4[op1,p16]
      node4[op1,p17]
      node4[op1,p18]
      node4[op1,p19]
      node4[op1,p20]
      node4[op1,p21]
      node4[op1,p22]
      node4[op1,p23]
      node4[op1,p24]
      node4[op1,p25]
      node4[op1,p26]
      node4[op1,p27]
      node4[op1,p28]
      node4[op1,p29]
      node4[op1,p30]
      node4[op1,p31]
      node4[op1,p32]
      node4[op1,p33]
      node4[op1,p34]
      node4[op1,p35]
    )}
op2[36p] {(parallel oraIntervalReading)
    on nodes (
      node1[op2,p0]
      node2[op2,p1]
      node3[op2,p2]
      node4[op2,p3]
      node4[op2,p4]
      node4[op2,p5]
      node4[op2,p6]
      node4[op2,p7]
      node4[op2,p8]
      node4[op2,p9]
      node4[op2,p10]
      node4[op2,p11]
      node4[op2,p12]
      node4[op2,p13]
      node4[op2,p14]
      node4[op2,p15]
      node4[op2,p16]
      node4[op2,p17]
      node4[op2,p18]
      node4[op2,p19]
      node4[op2,p20]
      node4[op2,p21]
      node4[op2,p22]
      node4[op2,p23]
      node4[op2,p24]
      node4[op2,p25]
      node4[op2,p26]
      node4[op2,p27]
      node4[op2,p28]
      node4[op2,p29]
      node4[op2,p30]
      node4[op2,p31]
      node4[op2,p32]
      node4[op2,p33]
      node4[op2,p34]
      node4[op2,p35]
    )}
It runs 76 processes on 4 nodes.

My doubt is why have most of the oracle connector processes got created on node 4 only ? I would have liked these 35 oracle table partitions to be equally partitioned on 4 datastage nodes.

When I run this in Hash or RoundRobin partition data gets equally partitioned on all 4 nodes.

Any help would be greatly appreciated.

rkashyap · Post by **rkashyap** » Thu Jul 16, 2015 8:09 pm

When using Oracle Partition Connector type, it is a good idea to configure the the number of nodes to match the number of partitions in the table.

In the case described above, number of nodes(4) is fewer than the number of partitions in the table(36), so the connector added 32 additional nodes to the end of the node list. Definition for each node that is added matches the definition of the last node in the original list, which was Node4, consequently most of the processes got concentrated on Node4.

See third bullet point of this link for details.

tehavele · Post by **tehavele** » Fri Jul 17, 2015 2:55 am

Thanks rkashyap for your reply. From above reply it is pretty much clear now.

rkashyap · Post by **rkashyap** » Fri Jul 17, 2015 11:45 am

You are welcome. Large no of partitions (36) on the table may have adverse impact on performance. See following snippet from Tuning the Oracle Connector performance

If the stage is configured to run in parallel execution mode in fewer player processes than there are partitions in the table, the connector will automatically increase the number of player processes at runtime to match the number of table partitions. For example, if the table has 50 partitions, there will be 50 player processes running in parallel, each fetching rows from one dedicated table partition. The overhead of starting up this many player processes and establishing that many connections to the database may outweigh the benefits of performing the partitioned reads.

tehavele · Post by **tehavele** » Tue Jul 21, 2015 12:52 am

rkashyap wrote:Large no of partitions (36) on the table may have adverse impact on performance.

I am experiencing adverse impact of such partitioning. I am planning to change it to same partitioning ans see how it performs. Will share the results once done.

rkashyap · Post by **rkashyap** » Wed Sep 28, 2016 12:47 pm

Patch JR55886 is now available to balance compute node utilization when Oracle partitions out number compute nodes.