Update SK Files with last surrogate Key values

rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Hi All,

We have a problem: our SK state files got deleted. To recover, we created a job that takes the maximum surrogate key as a parameter value and uses a Surrogate Key Generator stage that creates and updates the state file with this parameter value.
It seems the file is updated, but when we re-ran the jobs, the SK values in the tables looked strange. A few records were created with values from max+1 onwards, but others started from 1 and from 1001.
For example, if my table has 100 inserts, then 50 get inserted with values from max SK+1 onwards, but 25 are from 1 to 25 and the remaining 25 from 1001 to 1025.
Why is this happening?
We used NextSurrogateKey() to generate surrogate keys in the jobs, reading the same state file that we updated with the max value.
Any help is appreciated.
Thanks.
Rumu
IT Consultant
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

There are quite a few nuances to the surrogate key generator stage. It actually includes some interesting flexibility. You would have to spend some time experimenting to learn all of the nuances.

I have a generic utility job that I use to sync a state file with the maximum key value for a table that is specified by a job parameter.

For it to behave the way that you seem to expect, do not use the Create and Update action. Simply use the Update action.

The trick is that you need an empty state file to begin with.

To that end, I use a before-job ExecSH routine to execute:

Code: Select all

cat /dev/null > mystatefile
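As a sketch, the same truncate can be wrapped with a quick size check so you know the state file really is empty before the job runs (the path and messages here are illustrative stand-ins, not part of Mike's actual routine):

```shell
# Illustrative sketch only: truncate the state file, then verify it is
# really 0 bytes. STATEFILE is a stand-in path; substitute your own.
STATEFILE=/tmp/mystatefile
cat /dev/null > "$STATEFILE"
if [ "$(wc -c < "$STATEFILE")" -eq 0 ]; then
    echo "state file is empty, OK to run"
else
    echo "state file is not empty, aborting" >&2
    exit 1
fi
```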
Mike
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Thanks Mike. I tried starting with an empty SK file and using the Update action in the SK Generator stage, but the problem is still there.
We are running with a 4-node config file, so I am running my fix job on 4 nodes as well.
But when this state file is used (after the update), the keys start from three different ranges:
1
1000
and the original max SK values.
Rumu
IT Consultant
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmmm.... shouldn't you be running the "fix" job on a single node?
-craig

"You can never have too many knives" -- Logan Nine Fingers
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Hi Chulett,

I tried running this job on a single node first. Then, when I ran the actual load job using the state file, I found the same strange behaviour in the SK values.
Rumu
IT Consultant
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Then you did not successfully start with an empty state file.

It will have a length of 0 when it is empty. After your update it should have a length of 16.

Use an ls -l mystatefile to check file size.

The state file is a binary file. It will have no record or field delimiters. Each record will be made up of two bigint fields (8 bytes each). Each range will be represented with a 16 byte record: 16 bytes = 1 range, 32 bytes = 2 ranges, 48 bytes = 3 ranges, etc.

Use an od -x mystatefile command to view the contents of the state file in hex. Alternatively, you could create a DataStage job with just the Surrogate Key Generator stage set to view and see the contents in the job log.
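As a quick illustration of that layout, you can fabricate a one-range file and inspect it with the commands above (this is a sketch only: `/tmp/mystatefile` is a stand-in path, the value 100 is arbitrary, and little-endian byte order is assumed; the real state file's encoding may differ by platform):

```shell
# Illustrative only: build a one-range state file (two little-endian
# bigints, 8 bytes each = 16 bytes total) holding the value 100 twice,
# then inspect it as described above.
printf '\144\000\000\000\000\000\000\000\144\000\000\000\000\000\000\000' > /tmp/mystatefile
ls -l /tmp/mystatefile        # size column shows 16
od -x /tmp/mystatefile        # raw hex dump, as suggested above
od -An -td8 /tmp/mystatefile  # decoded as two 64-bit integers: 100 100
```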

Now that you know how to view the contents of the state file, you can try different things out to learn the nuances.

Mike
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

The Surrogate Key Generator stage results in a sequential operator, so the number of processing nodes shouldn't matter. Partitioning is not needed or wanted.

I still use my default parallel configuration, but all 3 stages in my generic utility job are set to run in sequential mode.

Mike
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

From what I recall, these "nuances" are explained in fairly complete detail in the documentation, yes?
-craig

"You can never have too many knives" -- Logan Nine Fingers
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Hi Mike,

When I updated the SK file, its size was 16 bytes.

The main issue is that the SKs in the tables look like this:

1
2
3
1001
1002
1003
2001
2002
2003

So when I pass the max SK, it passes 2003, and after the update, when I re-run this table with 6 new rows, the SKs for the new rows are:

2004
2005
2006
1
2
3
We are running on 4 nodes.
Please suggest.
Rumu
IT Consultant
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

I can't really say what your issue might be without actually looking over your shoulder. But in your example, the originally updated 16-byte state file should show

Code: Select all

2004 2004
when viewed with the stage in view mode (if I remember that correctly).

Running on 4 nodes, I would expect you to end up with 4 ranges after usage unless you happen to have some data skew causing empty partitions. The ranges would depend on block size, but none should start with a value less than 2004.

Every node will get its own block of key values.

What properties do you have on the surrogate key tab in your transformer stage?

@Craig, I have not actually looked at the documentation, so I can't say whether everything I've seen is covered there.

Mike