Include only selected columns in surrogate key stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sreesuku2
Participant
Posts: 45
Joined: Tue Oct 22, 2013 11:45 am

Include only selected columns in surrogate key stage

Post by sreesuku2 »

Hi Team,

I am selecting a 4 columns (ID,Code, Name, Time) from my SQL to load in to my target table.
I have to generate a surrogate key only based on 2 specific columns. (Here code and Name). But I need all 4 columns to be part of my output
Is there a way I can specify that ?

Source Table

Code: Select all

ID	Code	Name	Time
1	ABC	XYZ	Jan
2	ABC	XYZ	Feb
3	PQR	MNO	Nov
4	PQR	MNO	Dec
Output from Surrogate Key stage

Code: Select all

Old ID	New SK	Code	 Name
1	        111	 ABC	XYZ
2	        111	 ABC	XYZ
3	        222	 PQR	MNO
4	        222	 PQR	MNO
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A true surrogate key is not based on any columns; it guarantees uniqueness and conveys no other meaning of any kind.
Therefore your problem is not one to be resolved using a Surrogate Key Generator stage. You need some mechanism to map each distinct combination of your key values to some kind of unique value. The solution will depend primarily on whether there is a small or large number of distinct values of (CODE, NAME).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm going to go out on a limb here and say that I think they understand the concept of a surrogate key but that their post wasn't worded properly. I think rather than based on two specific columns, they really meant for a grouping of two specific columns - i.e. the business key fields. As in generate a new one when that group changes.

If that's true, you wouldn't use the stage for that but would instead use the function equivalent NextSurrogateKey( ) in a transformer. That would allow all of the columns to "be part of the output" but with proper group change detection via stage variables, you would be able to assign a new surrogate key value whenever the group changes.

If that's not true, please come back and clarify exactly what it is you need to do.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply