Surrogate Key Suggestions from the experts

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
paulhill20
Participant
Posts: 11
Joined: Tue Jun 22, 2004 1:06 pm

Surrogate Key Suggestions from the experts

Post by paulhill20 »

All:
I am a DataStage novice so please provide as detailed feedback as you see fit and have time for; I am researching this but figured I would ask you, the expert DataStage users/developers, for your suggestions.
I have a Custom Stage written in C++/Oracle Pro*C/C++ that fetches a sequence value (NEXTVAL) from the database; my research on its usage suggests that its only function is to generate a surrogate key.
The stage takes an input data stream, the Oracle information (sequence name, schema, password, sid) and simply tacks on the nextval; if you picture it, it is one icon.
Having said that, I am looking to replace that one icon with a more "maintenace friendly solution". Could you please provide suggestions on possible approaches to replace the one icon with another icon that performs the same function (getting the nextval from Oracle in a datastream providing the sequence name).
Please keep in mind that because the existing stage is an Orchestrate based routine it comes with all the apparent "parallelism" that is part of the Parallel Extender and that I am looking to maintain the performance levels if possible.
Thank you,
Stephen de Vries
Mat01
Premium Member
Premium Member
Posts: 50
Joined: Wed Jun 02, 2004 11:12 am
Location: Montreal, Canada

Post by Mat01 »

One solution is to modify your sequence to add one step to get the value and one to assign it to a job. What I have is a sequence like the following:

1. Job to get DB Key and write to a file --> 2. Routine to get the value from UNIX (cat) --> 3. Job for your surrogate key job (Surrogate key stage using your key value as starting point).

It is a little more complex Datastage wise but probably simpler to maintain than a stored proc.

Regards,

Mat
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

You do realize that you can actually declare NEXTVAL on the output SQL within the Oracle Enterprise stage?

You also do realize that we have a Surrogate Key Generator stage that can be used in conjunction with reading the maximum value from the database (and merged into each record via lookup, and calculated in a transformer stage)?

There are quite a few options here that can be done, that it is really unnecessary to use a custom stage for generating surrogate keys.
paulhill20
Participant
Posts: 11
Joined: Tue Jun 22, 2004 1:06 pm

Post by paulhill20 »

This was all developed under 6.0 probably two years ago (or more) and when Ascential DataStage was first brought into the enterprise.

[quote="T42"]You do realize that you can actually declare NEXTVAL on the output SQL within the Oracle Enterprise stage?

The Surrogate Key generator is only a new feature of 7 or 7.5...

You also do realize that we have a Surrogate Key Generator stage that can be used in conjunction with reading the maximum value from the database (and merged into each record via lookup, and calculated in a transformer stage)?

I think that the custom stage approach works because it is truly parallel, where a stored procedure stage is serial (due to the fact that it is a server stage). You are correct - there are many approaches; I tried to frame my question so that, in the context of this particular environment, I could get suggestions on a specific way to replace this one icon that is in the middle of the stream.

It is my belief that I can also use the nextval notation in the sql loader control file (from my knowledge in 9i and above this is usable even when parallel loading) so it may be that I could do that as well and simply get rid of the stage...

Thanks for your suggestions.
There are quite a few options here that can be done, that it is really unnecessary to use a custom stage for generating surrogate keys.[/quote]
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Actually, it's not the stored procedure stage I'm referring to.

insert into [table] (field)
values (field.NEXTVAL)

It may not be truly parallel, but it's all within the Oracle framework.

Surrogate Key Stage is a 7.x feature. It only simplify the same capability that can be found within the Column Generator which is available since 6.x.

The replacement of the one icon solution would be as follow:

Code: Select all

    [Oracle]
        |
        V
---->[Lookup]------>[Surr. Key Gen.]------>[Transformer]----
Get the maximum value from Oracle, merge it using a dummy key within Lookup to the main stream. Use Surrogate Key Generator stage to generate baseline value for each record. Within Transformer, add the baseline value with the maximum value, and you got yourself the keys.

However, the risk remains that the Oracle database will have new records between the time you pull the maximum value and the time you add these records to the table. If you can not guarantee that this will never happen, then using .NEXTVAL as I initially suggest will be your best option.
Post Reply