Surrogate Key Perfromance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
rajan.n
Premium Member
Premium Member
Posts: 96
Joined: Mon Oct 09, 2006 7:47 am

Surrogate Key Perfromance

Post by rajan.n »

Hi All,
I have a job running for 28 minutes. Just wanted to check if some one can help me in tuning my job.

Job design looks like this :


Dataset - Lkp(with DB2) - Xfm - DB2

This job was taking less than a minute , after I had included the Surrogate key in the Transformer this takes 28 minutes.

Surrogate Key generation options:

1st Job) Created a job with
DB2stage SurrogateKeyGeneratorstage with following options:
SourceTYpe : File
KeySOurceUpdateAction: Create and Update
ExecutionMode: Sequential
This job has loaded :22365500 records. (keys have holes )

2nd Job) This is the one takes 28 minutes.
Initial value :0 ( It has to take the values from File)
In Blocks of = 1
Execution Mode : Parallel

The max value from the database is '573793419' , Its supposed to start from '573793420', but it starts from some30237,30283,30318 ( took 3 values as sample.) these values are not existing in the table.

Can some one throw some light , how to get my job run faster ?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use a big block size. Don't be afraid to use 100000 or even 1000000. A block size of 1 has to read the state file for every record processed.

Set up a previous job to initialize the state file to the start value you want.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rajan.n
Premium Member
Premium Member
Posts: 96
Joined: Mon Oct 09, 2006 7:47 am

Post by rajan.n »

Thanks Ray , and Yes I can use the big number, but it wont generate sequence number as I am on 2 node structure.

Any more settings can be done to get it perform well ?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What does this mean? :?
rajan.n wrote:but it wont generate sequence number as I am on 2 node structure.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rajan.n
Premium Member
Premium Member
Posts: 96
Joined: Mon Oct 09, 2006 7:47 am

Post by rajan.n »

chulett wrote:What does this mean? :?
rajan.n wrote:but it wont generate sequence number as I am on 2 node structure.
Lets say If I give 1000 in the block, each node will be reserving the 1000 numbers at a time and if 1st node starts from 1,2,3... and the second node will starts from 1001,1002,1003... and so on. Please correct me If i am wrong , this is what its producing the records like.

Actually I am looking if there is any way of tuning the job by still having the block as 1.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Surrogate keys by definition provide UNIQUENESS. Holes in the sequence are irrelevant.

With a block size of 1000 and starting at 1001, I would envisage (I have not tested it) that node 0 will generate 1001, 1002, ..., 2000, 3001, 3002, ... and node 1 will generate 2001, 2002, ..., 3000, 4001, 4002, ...

Each node makes a separate call to the state file to get (a block of) key values.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply