Get Max Value using Transaformer stage.
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 46
- Joined: Tue Mar 20, 2007 3:30 am
- Location: India
Get Max Value using Transaformer stage.
Hello,
I am trying to get max value through transformer stage. I am using stage variable, which will be updated for each incoming row...Finally i will get the max value in my stage variable...but the question is Suppose my job is like this:-
Oracle Stage----------------->Transformer-------------->Sequential File.
I need only one record with max value in my sequential file...If we don't provide any constraint to the tranformer output link, it will pass all the record to the sequential file and here i need only one record(The Last record)
Can any one please let me know..How to get this logic..What will be the constraint do i need to put in the transformer output link.
Thanks in Advance
BI
I am trying to get max value through transformer stage. I am using stage variable, which will be updated for each incoming row...Finally i will get the max value in my stage variable...but the question is Suppose my job is like this:-
Oracle Stage----------------->Transformer-------------->Sequential File.
I need only one record with max value in my sequential file...If we don't provide any constraint to the tranformer output link, it will pass all the record to the sequential file and here i need only one record(The Last record)
Can any one please let me know..How to get this logic..What will be the constraint do i need to put in the transformer output link.
Thanks in Advance
BI
Re: Get Max Value using Transaformer stage.
First of all, you should use an aggregate stage instead of a transformer.
Ok, even if you stick to the transformer, in the transformer input, sort the data in descending order for the column for which you want maximum. Make the processing in the transformer sequential and put a constraint of inrownum = 1. That will give you what you want.
You don't need a stage variable at all. The way you are doing you have to know when you processed the last row which means you have to calculate the row count first.
Ok, even if you stick to the transformer, in the transformer input, sort the data in descending order for the column for which you want maximum. Make the processing in the transformer sequential and put a constraint of inrownum = 1. That will give you what you want.
You don't need a stage variable at all. The way you are doing you have to know when you processed the last row which means you have to calculate the row count first.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
chulett wrote: Why not just select only the max record in Oracle?
Great question - at least for me. I was looking for posts on 'max' and found this one.
I am new to DataStage. One of the questions I am pondering over is where to have most of the logic? Keep most in Oracle or have most in one place for streamlined maintenance and execution, .i.e. in DataStage with sources and targets only in Oracle. After all you can do most things in DataStage. Right now I am on a project where I am migrating a prior (complete and in production) implementation using Oracle OWB to DataStage. This question came up when, at least for now, I am experiencing slower execution times for corresponding modules in DataStage as compared to everything in Oracle. Perhaps it is a matter of tuning?
-
- Premium Member
- Posts: 46
- Joined: Tue Mar 20, 2007 3:30 am
- Location: India
Hi,
I can't go with the SQL in Oracle because I am generating that Sequence number in the transformer stage only using stage variable(Incrementing it by 1) and putting that information into the file only.
I am generating this sequence in the transformer, I can't go for the sorting the incoming data in the transformer too..
I can use the Aggregator stage but I don't want to use it...As i have looked into some Posts stating that we can impliment this scenrio in the transformar stage also by updating the stage variable.
Thanks
BI
I can't go with the SQL in Oracle because I am generating that Sequence number in the transformer stage only using stage variable(Incrementing it by 1) and putting that information into the file only.
I am generating this sequence in the transformer, I can't go for the sorting the incoming data in the transformer too..
I can use the Aggregator stage but I don't want to use it...As i have looked into some Posts stating that we can impliment this scenrio in the transformar stage also by updating the stage variable.
Thanks
BI
This is sounding like one of those "requirement gathering days". Start with a nice breakfast, then a great lunch. Then by afternoon things look so cool ! Then a coffee break and it's time to pack up when you ask one last clarification and bingo! You figure, whatever your client has talked about so far is not what he actually means !
So, Mr. BI, you want to know the maximum for a field whose value is nothing but a surrogate id that you only generate. Which, if I have ever smoked, will mean, in a job you are generating surrogate key column and in the same job want to note the last key(max) generated - right?
So, Mr. BI, you want to know the maximum for a field whose value is nothing but a surrogate id that you only generate. Which, if I have ever smoked, will mean, in a job you are generating surrogate key column and in the same job want to note the last key(max) generated - right?
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
-
- Premium Member
- Posts: 46
- Joined: Tue Mar 20, 2007 3:30 am
- Location: India
bi_fujitsu wrote:Hi Sud,
Exactly... i want the same thing to implement in my job ....
Shakespeare wrote : A sentence is but a cheveril glove to a good wit: how quickly the wrong side may be turned outward!
Please don't loose your patience. Read on -
To generate a sequence, you either generate from 0 or 1 or from some known number say n. Now you increment only by 1 for each record. Hence, the max value for this generated number after processing all records will be nothing but n + x, where x = total row count. Since you are writing only one record to the file, you want to write only the value n + x to the field for the single record that you want to write. So, all we need to do is figure out the number x which will be given by the derivation count(1) in the SQL itself.
Now, probably you are writing the actual records with the generated number separately as well. This is also not a problem since you can get the rowcount in a separate column within the query itself. The important point is that there is no way but to figure out the row count - no matter how you do it - whether in the query/ or with aggregator (not through transformer - since the transformer does not give you a clue about which record is the last record). So, what's it gonna be You decide for yourself.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
-
- Premium Member
- Posts: 46
- Joined: Tue Mar 20, 2007 3:30 am
- Location: India
We can get the Max value by updating the stage variable after processing all the rows coming from the source but the problem is now we need to find out the last record and pass it to the output link. I am looking for the constraints, Which will pass only last record to the output link.
Can we do it through the transformer stage or Do i need to include other stage like..Aggregator,Head or Tail stage after the transformer to implement it.
Can we do it through the transformer stage or Do i need to include other stage like..Aggregator,Head or Tail stage after the transformer to implement it.
You caught the right point, there is just no way that you can figure which is the last record in the transformer (without which you cannot write a constraint) unless you pre-calculate row count either through an aggregator or through the SQL.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
In fact the best way to implement this is to define a column in the SQL as rownum like:
select rownum as ID,.....
and that will be your surrogate key generator, now either additionally define a SQL field as ocunt(1) which will be your row count itself or use an aggregator to get the row count. This decision depends completely on the SQL you have (and this goes on to answer your earlier question, where do we put which functionality in ETL. That question is the moot point in ETL - how do I get most performance wih the allowed resources. So, the nature of your query will determine if an aggregator is better or SQL strategy is better).
select rownum as ID,.....
and that will be your surrogate key generator, now either additionally define a SQL field as ocunt(1) which will be your row count itself or use an aggregator to get the row count. This decision depends completely on the SQL you have (and this goes on to answer your earlier question, where do we put which functionality in ETL. That question is the moot point in ETL - how do I get most performance wih the allowed resources. So, the nature of your query will determine if an aggregator is better or SQL strategy is better).
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
-
- Premium Member
- Posts: 46
- Joined: Tue Mar 20, 2007 3:30 am
- Location: India