Generate sequence number using parallel job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
rish0201
Participant
Posts: 2
Joined: Sun Jun 18, 2017 11:04 am

Generate sequence number using parallel job

Post by rish0201 »

I want to generate sequence numbers starting from 1 every time the job run.
Number should be generated in sequence without gaps in 4 node job.

Thing i have already tried
1) (@inrownum - 1) * @numpartitions + @partitionnum + 1
The above formula only works in 2 node.

2) Column generator - It creates gaps between numbers in 4 nodes and the numbers are not in sequence.

3) Surrogate Key - It takes the last highest value and starts from there. I want it to start from 1 for each job run. Is there any way to reset the key ?

Any suggestions is welcome.

Thanks in Advance.
Software Engineer
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In what way does the given expression "not work" with four nodes?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rish0201
Participant
Posts: 2
Joined: Sun Jun 18, 2017 11:04 am

Post by rish0201 »

In 4 nodes the numbers are not generated in sequence. It leaves gaps in between.
Software Engineer
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Did you use round robin partitioning method?

Mike
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Generate sequence number using parallel job

Post by chulett »

rish0201 wrote:I want it to start from 1 for each job run.
Then it would seem you do not have a need for any kind of surrogate key 'generator' but simply need a counter. At its most simple, you could use a stage variable for that, initial value of zero, increment for each record... as long as you handled the partitioning properly.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Neelabh
Participant
Posts: 3
Joined: Wed Jun 14, 2017 2:40 am

Re: Generate sequence number using parallel job

Post by Neelabh »

As inferred by @Mike as well this method requires data to be evenly distributed across all partition hence it works well with Round Robin partitioning. If not it will generate sequence with "gaps" (i.e. what you are observing).

Use other methods as suggested by craig.

1) Use Stage variable.

2) If source is seq. file stage generate row number.

3) If database use row_number function.

4) Enable transformer stage to run on a single node and use @INROWNUM
- Neelabh Barve

- We aspire! Universe Conspires!!
boxtoby
Premium Member
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi,

You weren't far out with your original idea.

Here's what I use: (@PARTITIONNUM+1)*10 : @OUTROWNUM

Use @OUTROWNUM rather than @INROWNUM to aviod problems with funnel stages etc.

Hope that helps!

Cheers,
Bob.
Bob Oxtoby
Post Reply