Surrogate Key Perfromance

rajan.n · Post by **rajan.n** » Thu Dec 17, 2009 1:57 pm

Hi All,
I have a job running for 28 minutes. Just wanted to check if some one can help me in tuning my job.

Job design looks like this :

Dataset - Lkp(with DB2) - Xfm - DB2

This job was taking less than a minute , after I had included the Surrogate key in the Transformer this takes 28 minutes.

Surrogate Key generation options:

1st Job) Created a job with
DB2stage SurrogateKeyGeneratorstage with following options:
SourceTYpe : File
KeySOurceUpdateAction: Create and Update
ExecutionMode: Sequential
This job has loaded :22365500 records. (keys have holes )

2nd Job) This is the one takes 28 minutes.
Initial value :0 ( It has to take the values from File)
In Blocks of = 1
Execution Mode : Parallel

The max value from the database is '573793419' , Its supposed to start from '573793420', but it starts from some30237,30283,30318 ( took 3 values as sample.) these values are not existing in the table.

Can some one throw some light , how to get my job run faster ?

ray.wurlod · Post by **ray.wurlod** » Thu Dec 17, 2009 2:10 pm

Use a big block size. Don't be afraid to use 100000 or even 1000000. A block size of 1 has to read the state file for every record processed.

Set up a previous job to initialize the state file to the start value you want.

rajan.n · Post by **rajan.n** » Fri Dec 18, 2009 6:58 am

Thanks Ray , and Yes I can use the big number, but it wont generate sequence number as I am on 2 node structure.

Any more settings can be done to get it perform well ?

chulett · Post by **chulett** » Fri Dec 18, 2009 8:01 am

What does this mean?

rajan.n wrote:but it wont generate sequence number as I am on 2 node structure.

rajan.n · Post by **rajan.n** » Fri Dec 18, 2009 10:16 am

chulett wrote:What does this mean?

rajan.n wrote:but it wont generate sequence number as I am on 2 node structure.

Lets say If I give 1000 in the block, each node will be reserving the 1000 numbers at a time and if 1st node starts from 1,2,3... and the second node will starts from 1001,1002,1003... and so on. Please correct me If i am wrong , this is what its producing the records like.

Actually I am looking if there is any way of tuning the job by still having the block as 1.

ray.wurlod · Post by **ray.wurlod** » Fri Dec 18, 2009 3:07 pm

Surrogate keys by definition provide UNIQUENESS. Holes in the sequence are irrelevant.

With a block size of 1000 and starting at 1001, I would envisage (I have not tested it) that node 0 will generate 1001, 1002, ..., 2000, 3001, 3002, ... and node 1 will generate 2001, 2002, ..., 3000, 4001, 4002, ...

Each node makes a separate call to the state file to get (a block of) key values.