Get Max Value using Transaformer stage.

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
bi_fujitsu
Premium Member
Premium Member
Posts: 46
Joined: Tue Mar 20, 2007 3:30 am
Location: India

Get Max Value using Transaformer stage.

Post by bi_fujitsu »

Hello,

I am trying to get max value through transformer stage. I am using stage variable, which will be updated for each incoming row...Finally i will get the max value in my stage variable...but the question is Suppose my job is like this:-

Oracle Stage----------------->Transformer-------------->Sequential File.

I need only one record with max value in my sequential file...If we don't provide any constraint to the tranformer output link, it will pass all the record to the sequential file and here i need only one record(The Last record)

Can any one please let me know..How to get this logic..What will be the constraint do i need to put in the transformer output link.

Thanks in Advance

BI
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Re: Get Max Value using Transaformer stage.

Post by sud »

First of all, you should use an aggregate stage instead of a transformer.

Ok, even if you stick to the transformer, in the transformer input, sort the data in descending order for the column for which you want maximum. Make the processing in the transformer sequential and put a constraint of inrownum = 1. That will give you what you want.

You don't need a stage variable at all. The way you are doing you have to know when you processed the last row which means you have to calculate the row count first.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? Why not just select only the max record in Oracle?
-craig

"You can never have too many knives" -- Logan Nine Fingers
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

chulett wrote::? Why not just select only the max record in Oracle?
Oooo yaaaa, was I smoking again ? :D :D
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
juliank
Premium Member
Premium Member
Posts: 9
Joined: Tue Oct 30, 2007 12:52 pm

Post by juliank »

chulett wrote::? Why not just select only the max record in Oracle?

Great question - at least for me. I was looking for posts on 'max' and found this one.

I am new to DataStage. One of the questions I am pondering over is where to have most of the logic? Keep most in Oracle or have most in one place for streamlined maintenance and execution, .i.e. in DataStage with sources and targets only in Oracle. After all you can do most things in DataStage. Right now I am on a project where I am migrating a prior (complete and in production) implementation using Oracle OWB to DataStage. This question came up when, at least for now, I am experiencing slower execution times for corresponding modules in DataStage as compared to everything in Oracle. Perhaps it is a matter of tuning?
bi_fujitsu
Premium Member
Premium Member
Posts: 46
Joined: Tue Mar 20, 2007 3:30 am
Location: India

Post by bi_fujitsu »

Hi,

I can't go with the SQL in Oracle because I am generating that Sequence number in the transformer stage only using stage variable(Incrementing it by 1) and putting that information into the file only.

I am generating this sequence in the transformer, I can't go for the sorting the incoming data in the transformer too..

I can use the Aggregator stage but I don't want to use it...As i have looked into some Posts stating that we can impliment this scenrio in the transformar stage also by updating the stage variable.

Thanks
BI
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

This is sounding like one of those "requirement gathering days". Start with a nice breakfast, then a great lunch. Then by afternoon things look so cool ! Then a coffee break and it's time to pack up when you ask one last clarification and bingo! You figure, whatever your client has talked about so far is not what he actually means !

So, Mr. BI, you want to know the maximum for a field whose value is nothing but a surrogate id that you only generate. Which, if I have ever smoked, will mean, in a job you are generating surrogate key column and in the same job want to note the last key(max) generated - right?
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
bi_fujitsu
Premium Member
Premium Member
Posts: 46
Joined: Tue Mar 20, 2007 3:30 am
Location: India

Post by bi_fujitsu »

Hi Sud,

Exactly... i want the same thing to implement in my job ....:-)
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

bi_fujitsu wrote:Hi Sud,

Exactly... i want the same thing to implement in my job ....:-)
8) 8)

Shakespeare wrote : A sentence is but a cheveril glove to a good wit: how quickly the wrong side may be turned outward!

Please don't loose your patience. Read on -

To generate a sequence, you either generate from 0 or 1 or from some known number say n. Now you increment only by 1 for each record. Hence, the max value for this generated number after processing all records will be nothing but n + x, where x = total row count. Since you are writing only one record to the file, you want to write only the value n + x to the field for the single record that you want to write. So, all we need to do is figure out the number x which will be given by the derivation count(1) in the SQL itself.

Now, probably you are writing the actual records with the generated number separately as well. This is also not a problem since you can get the rowcount in a separate column within the query itself. The important point is that there is no way but to figure out the row count - no matter how you do it - whether in the query/ or with aggregator (not through transformer - since the transformer does not give you a clue about which record is the last record). So, what's it gonna be :?: You decide for yourself.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
bi_fujitsu
Premium Member
Premium Member
Posts: 46
Joined: Tue Mar 20, 2007 3:30 am
Location: India

Post by bi_fujitsu »

We can get the Max value by updating the stage variable after processing all the rows coming from the source but the problem is now we need to find out the last record and pass it to the output link. I am looking for the constraints, Which will pass only last record to the output link.

Can we do it through the transformer stage or Do i need to include other stage like..Aggregator,Head or Tail stage after the transformer to implement it.
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

You caught the right point, there is just no way that you can figure which is the last record in the transformer (without which you cannot write a constraint) unless you pre-calculate row count either through an aggregator or through the SQL.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
fc_user
Premium Member
Premium Member
Posts: 8
Joined: Tue Feb 26, 2008 5:28 pm

Post by fc_user »

Thanks for your all the valuable suggestions.
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

In fact the best way to implement this is to define a column in the SQL as rownum like:

select rownum as ID,.....

and that will be your surrogate key generator, now either additionally define a SQL field as ocunt(1) which will be your row count itself or use an aggregator to get the row count. This decision depends completely on the SQL you have (and this goes on to answer your earlier question, where do we put which functionality in ETL. That question is the moot point in ETL - how do I get most performance wih the allowed resources. So, the nature of your query will determine if an aggregator is better or SQL strategy is better).
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
bi_fujitsu
Premium Member
Premium Member
Posts: 46
Joined: Tue Mar 20, 2007 3:30 am
Location: India

Post by bi_fujitsu »

Thanks Sud......Appreciate your valuable input....
Post Reply