Performance Boost

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Performance Boost

Post by Teej »

This is true for Tru64 implementation of DataStage 6.0.1 PX with the latest patch using data pulled from Oracle (you know, varchar, numbers, et cetera). Whether this can be true or not for other platforms is up to you.

We have discovered that Lookup stage are extremely slow in handling matches, while using Full Outer Join with a Transform filtering results (if you use the reject link) would produce 10-20x the performance (analysis still going on right now) over a simple Lookup stage. Naturally, if you do not need a reject link, you should use full inner join.

This was our major stumbling block in seeing how viable PX can be -- for some jobs, we're shifting 700+ million rows of data from a single table with 5+ million rows of 'lookup' data.

So if you guys have very large schemas, or is planning to do so -- keep in mind the dramatic difference in performance for Joins compared to Lookup.

This free advice have been brought to you by the letter "D", "S", and the number "42". :D

-T.J.
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Performance Boost

Post by bigpoppa »

Teej,

Thanks for your post. How are your lookups and joins holding up in 7.0?

- BP
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: Performance Boost

Post by Teej »

bigpoppa wrote:Teej,

Thanks for your post. How are your lookups and joins holding up in 7.0?

- BP
We still have not moved to 7.0 -- we're too busy ensuring that 6.0.1 is Production-certified. We've been in close contact with Ascential (which is pretty good), but thus far, the lookup fixes that we had are not doing well thus far. Hopefully this would be fixed sooner rather than later.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Lookup table fits into memory

Post by bigpoppa »

Just to point out:

PX lookups are only performant if the lookup table fits comfortably into memory. On SMPs, use the 'entire' partition on your lookup to generate one copy of the table that is entirely shared in memory with all paritions. On MPPs, create a partitioned (typically hashed) to create partitions of the lookup that fully fit into non-shared memory segments.

If your lookup table is huge and will not fit into memory, you are better off using the join stage.

T.J. : What lookup enhancements did you ask Ascential for?

Thanks,
BP
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: Lookup table fits into memory

Post by Teej »

bigpoppa wrote:Just to point out:

PX lookups are only performant if the lookup table fits comfortably into memory. On SMPs, use the 'entire' partition on your lookup to generate one copy of the table that is entirely shared in memory with all paritions. On MPPs, create a partitioned (typically hashed) to create partitions of the lookup that fully fit into non-shared memory segments.

If your lookup table is huge and will not fit into memory, you are better off using the join stage.

T.J. : What lookup enhancements did you ask Ascential for?
Enhancements? I meant bug fixes. :LOL: Seriously, there are a number of known performance issues with Tru64, including at least a known bug, that deals with memory management. We're working with Ascential (including flying in one of the Senior Director) to help maximize the performance capabilities for our servers. The performance was so bad in the past that we practically restored to doing Oracle-Oracle jobs instead of utilizing DataStage.

Once the dust settles, I'll be able to gossip. ;-)

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Post Reply