Sequence Generation Anamoly

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Sequence Generation Anamoly

Post by jerome_rajan »

Hi,
We're facing a peculiar situation where a sequence generator stage that generates surrogate keys from an Oracle sequence object (with a cache of 1000) generates the right values in 3 out of the 4 nodes. But the 4th node seems to be generating values way before the CURRVAL of the sequence object.

What could the reason be?
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

That sounds odd. If you change to a 3-node configuration, does one node also show incorrectly generated keys?
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

We were initially getting a mix of correct (current) keys and old ones. As part of the troubleshooting, put a node constraint once on each of the 4 nodes and narrowed down on node 4 that was always coming up with old keys.

I presumed that it was probably because of the caching of keys on that node but somehow it's probably not making sense.
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

ArndW wrote:That sounds odd. If you change to a 3-node configuration, does one node also show incorrectly generated keys? ...
To answer your question, one node (the same one in every run) always has the older set of values. All nodes are on the same machine
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

Sounds like one of the gap feature that we support. You will need to explicitly request that it always pull from the largest value if you wish to avoid generating using gaps in the surrogate key.

More details here:

http://www.ibm.com/support/knowledgecen ... _Keys.html
To generate keys in sequence from the highest value that was last used, set the Generate Key from Last Highest Value property to Yes. Any gaps in the key range are ignored.
Developer of DataStage Parallel Engine (Orchestrate).
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

The key source is an Oracle sequence object while the gap feature you mention probably applies only to state files. Thanks for the response though.

After several rounds of debugging,we narrowed down on the possible cause. Posting this for posterity -
The Oracle Database is an Exadata machine that runs in a multinode clustered environment. The sequence object has a high cache value (1000, in our case). Sequence objects in a RAC DB work in a way such that a cache(or block) of values get allotted to each node and each node works with this set till it reaches a point where it needs to bump up.

In our case, 2/4 of the DataStage nodes were creating a session with say node 1 of the database and the other 2 DS nodes were creating sessions with node 2 of the DB. Data skews probably caused only one of the nodes to get large chunks of data while the other node progressed very slowly. Over a period of time, the cache in one of the nodes had progressed so much that the differences became very noticeable.

To confirm this hypothesis, we disabled caching and enabled ordering in the sequence object which ensured that all nodes in the RAC DB are always in sync.
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Thanks for updating the status and drilling down to the root cause.
Post Reply