How to implement control break in PX?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
PilotBaha
Premium Member
Premium Member
Posts: 202
Joined: Mon Jan 12, 2004 8:05 pm

How to implement control break in PX?

Post by PilotBaha »

Here's my problem:
- I have data entered in target database with the following structure :
KEY1 KEY2 KEY3
XXXX YYYYY 1
XXXX YYYYY 2

Think of it key1 as employee number and key2 division number or something similar; basically they do not change. On the other hand whenever there is a new entry for this division and employee the KEY3 goes up by 1.

I have data coming in that will have XXXX and YYYYY as keys. The challange is to get the numbers in KEY3 increased by 1 all the time only for the combination of XXXX YYYYs

Sounds simple, at first, but considering I will have more than one data coming in with XXXX and YYYY (i.e. the same data in KEY1 KEY2) the sequence will have to be incremented more than once.

I can capture the maximum value for the combination of XXXX / YYYY with SELECT MAX(KEY3) from ABCtable group by KEY1, KEY2. That's not the issue.

The biggest problem I have is how do I keep the value of KEY3 increase everytime I have more than one data coming in from the stream.

The second biggest problem is how do I make sure the sort sequence doesn't get out of sequence; since this is parallel processing. Would it help to chance the Execution mode to "Sequential"?

I hope I didn't make it sound the situation more complicated than it already is..

thanks..



I have new data coming in from a stream
Earthbound misfit I..
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Have you investigated the Surrogate Key Generator stage?

Otherwise you'd design it like you would a server job, substituting a Data Set stage for the Hashed File stage. And you would have to restrict execution to sequential mode, and guarantee that the Data Set is updated before the next row is processed. (Or take the overhead of a SELECT MAX... for every row processed.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PilotBaha
Premium Member
Premium Member
Posts: 202
Joined: Mon Jan 12, 2004 8:05 pm

Post by PilotBaha »

No I haven't.. I also did some playing around and noticed that I can used RowProcCompareWith in BASIC transformer stage. If I can dictate the SK Stage to start the key numbers from a result of a lookup, then that would be a bomb.. :)

Thanks Ray..
Earthbound misfit I..
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

There is a FAQ on implementing a counter in a parallel transformer using the @PARTITIONNUM and @PARTITIONCOUNT variables.

Consider this design: sort by KEY1 and KEY2. In a lookup stage retrieve the KEY1, KEY2 and MAX(KEY3) from the database to return MAX_KEY3. This should be a very efficient lookup in a parallel job as the SQL will only be run once. In a transformer set up a counter variables as per the FAQ. Add some additional logic, if the KEY1 and KEY2 values change reset the counter to MAX_KEY3, otherwise increment it as per the FAQ.

This should give you an incrementing key for each key combination that is unique between partitions. There may be some missing values in the key sequence if one partition has more rows then the others.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The partition number and partition count are also available in the Increment property of the Surrogate Key Generator and Column Generator (and, I believe, the Row Generator) stages.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PilotBaha
Premium Member
Premium Member
Posts: 202
Joined: Mon Jan 12, 2004 8:05 pm

Post by PilotBaha »

vmcburney wrote: Consider this design: sort by KEY1 and KEY2. In a lookup stage retrieve the KEY1, KEY2 and MAX(KEY3) from the database to return MAX_KEY3. This should be a very efficient lookup in a parallel job as the SQL will only be run once.
I am already capturing the maximum value for each key column like this.
vmcburney wrote: In a transformer set up a counter variables as per the FAQ. Add some additional logic, if the KEY1 and KEY2 values change reset the counter to MAX_KEY3, otherwise increment it as per the FAQ.
Here's the problem? How do i capture the change in the KEY1 value? (For the sake of argument let's assume I have only one key). I was able to capture the change in the key value using the RowProcPreviousValue using the BASIC Transformer. Your FAQ solution doesn't allow this because it uses the PX transformer.
vmcburney wrote: This should give you an incrementing key for each key combination that is unique between partitions. There may be some missing values in the key sequence if one partition has more rows then the others.
I think we are not at the same page as far as my requirement is concerned. I have no problems establishing an ever increasing number for each data in the stream either using your method or using a BASIC transformer. My challange is to be able to reset the counter every time I have a control break. Using a stage variable doesn't allow me to reset the value. (If there is a way, I am not aware of it)

To help you visiualize the issue let me give you the following sample data:

Code: Select all

KEY       MAXVALUE
------     ------------
AAAA      33
AAAA      33
AAAA      33
AAAA      33
BBBB      67
CCCC      88
CCCC      88
What i want to have is

Code: Select all

KEY       SEQNUMBER
AAAA        34
AAAA        35
AAAA        36
AAAA        37
BBBB        68
CCCC        89
CCCC        90
thanks..
Earthbound misfit I..
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

If you partition and sort by KEY1 and KEY2 then all records belonging to a particular KEY1 / KEY2 value will be on the same partition in a sorted order. This means you can use the transformer stage variables to keep the previous value of KEY1/KEY2 and compare it to the incoming row.

Code: Select all

svNewKey       input.KEY1 : '|' : input.KEY2
svKeyChange  svNewKey <> svLastKey or svLastKey = ""
svLastKey       input.KEY1 : '|' : input.KEY2
svCounter      if KeyChange then input.MAX_KEY3 else svCounter + @NUMPARTITIONS
PilotBaha
Premium Member
Premium Member
Posts: 202
Joined: Mon Jan 12, 2004 8:05 pm

Post by PilotBaha »

vmcburney wrote:If you partition and sort by KEY1 and KEY2 then all records belonging to a particular KEY1 / KEY2 value will be on the same partition in a sorted order.
Done..

vmcburney wrote: This means you can use the transformer stage variables to keep the previous value of KEY1/KEY2 and compare it to the incoming row.

Code: Select all

svNewKey       input.KEY1 : '|' : input.KEY2
svKeyChange  svNewKey <> svLastKey or svLastKey = ""
svLastKey       input.KEY1 : '|' : input.KEY2
svCounter      if KeyChange then input.MAX_KEY3 else svCounter + @NUMPARTITIONS
In this case both svLastKey and svNewKey have the same values. I am not sure if LastKey will hold a different value than the NewKey..

As a mattar of fact in the middle of the reply i tested this and I found to be that both NewKey and LastKey return the same values.

Am I missing something?

thanks..
Earthbound misfit I..
PilotBaha
Premium Member
Premium Member
Posts: 202
Joined: Mon Jan 12, 2004 8:05 pm

Post by PilotBaha »

Opps placed the SVs in wrong sequence.. Now it's working..

Thanks for all the help..
Earthbound misfit I..
Post Reply