Compare input column with Loop cache value in transformer
Moderators: chulett, rschirm, roy
Hi
I want to use looping in a transformer stage to aggregate data and store a total in the input record.
But I need to compare a date held in the input record with one in each of the cached records I am looping through in order to calculate the sum. Under certain conditions a record from the cache will be ignored.
Would storing the input value in a stage variable be a way round it, or is there a simpler solution?
Thanks
I'm not certain what you mean. DataStage doesn't really have a concept of "cached" records. Can you explain what you mean by cached records and how you plan to accomplish it?
I had assumed you meant that you were looping through an array of numbers contained in a single record. In that case they would be stored in a loop variable.
asorrell wrote: DataStage doesn't really have a concept of "cached" records.
I'm assuming they are referring to the transformer cache introduced in 8.5 along with looping and the SaveInputRecord() and GetInputRecord() functions. Yes?
-craig
"You can never have too many knives" -- Logan Nine Fingers
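For background: roughly speaking, SaveInputRecord() queues the current input row in the transformer's cache, and GetInputRecord() later replays the queued rows during loop iterations. Here is a rough analogy of that per-group caching pattern, sketched in plain Python (this is not DataStage code; the data and names are purely illustrative):

```python
from itertools import groupby
from operator import itemgetter

# Illustrative rows only (key1 groups arriving sorted, as a transformer
# hash-partitioned and sorted on key1 would see them).
records = [
    {"key1": 123, "key2": 111, "value": 500},
    {"key1": 123, "key2": 113, "value": 300},
    {"key1": 125, "key2": 121, "value": 500},
]

totals = {}
for key1, group in groupby(records, key=itemgetter("key1")):
    cache = list(group)        # analogous to SaveInputRecord() per row
    total = 0
    for rec in cache:          # analogous to loop iterations replaying
        total += rec["value"]  # the cache via GetInputRecord()
    totals[key1] = total

print(totals)  # {123: 800, 125: 500}
```

The point of the cache is that the transformer can hold back a group of rows and only emit output once the whole group has been seen.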
Just to give a bit more background
I have the following data, sorted by key1, then date. To my view it is made for the transformer looping process.
key1 key2 date value
123 111 1/1/12 500
123 113 1/2/12 300
123 115 1/3/12 600
123 117 1/4/12 100
125 121 1/1/12 500
125 133 1/2/12 300
126 165 1/3/12 600
126 167 1/4/12 100
Key1 defines a set of records, and for each key2 record the whole set must be processed in the loop.
For each key2 record I must sum the values, but only where the date of the loop record is greater than the date of the key2 record being processed.
The output should be something like this:
key1 key2 date value Sum
123 111 1/1/12 500 1000
123 113 1/2/12 300 700
123 115 1/3/12 600 100
123 117 1/4/12 100 0
125 121 1/1/12 500 300
125 133 1/2/12 300 0
126 165 1/3/12 600 100
126 167 1/4/12 100 0
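For clarity, the rule behind the expected output above can be stated directly: for each record, sum the values of the records in the same key1 group whose date is strictly greater. A brute-force sketch in Python (illustrative only, not DataStage code; dates are assumed to be m/d/yy):

```python
from datetime import date

# (key1, key2, date, value) -- the sample data from the post,
# with dates read as m/d/yy
rows = [
    (123, 111, date(2012, 1, 1), 500),
    (123, 113, date(2012, 1, 2), 300),
    (123, 115, date(2012, 1, 3), 600),
    (123, 117, date(2012, 1, 4), 100),
    (125, 121, date(2012, 1, 1), 500),
    (125, 133, date(2012, 1, 2), 300),
    (126, 165, date(2012, 1, 3), 600),
    (126, 167, date(2012, 1, 4), 100),
]

def sum_later(rows):
    """For each row, total the values of rows in the same key1 group
    that have a strictly greater date."""
    return [
        (k1, k2, d, v,
         sum(v2 for g1, _, d2, v2 in rows if g1 == k1 and d2 > d))
        for k1, k2, d, v in rows
    ]

for row in sum_later(rows):
    print(row)
```

Note this brute force is quadratic within each group, which is why a scheme that makes a single ordered pass over the data scales much better.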
An alternative was to produce a cartesian join of the data on itself, filter to the records that were to be summed, aggregate, and then use the result as a lookup to update the records.
This worked fine in the test version; however, there are 8 million records and it ran out of temp DB space.
That will again be an issue with the looping solution, as it will be doing a lot of processing.
So I have another question: I presume that SaveInputRecord() saves the whole incoming record. In some cases these key1 sets can run to 2,000 records, so it would presumably make sense to reduce the incoming data to just the columns needed for the processing, rather than carry the whole record with data only used further down the process, and then merge the results back in afterwards.
Just so I can understand how looping in a transformer works: is there any document that describes this process exactly? There are a number of examples around, but none seem to explain the processing logic.
Sorry for the long post, but I thought I had better explain exactly what I am trying to do.
[Note - added code blocks to retain formatting in example - Andy]
Craig - Thanks for reminding me about the Transformer cache, nice link also, very helpful.
Per request - some looping examples:
http://newtonapples.com/datastage-trans ... looping-1/
http://newtonapples.com/datastage-trans ... looping-2/
Several examples in IBM doc:
http://pic.dhe.ibm.com/infocenter/iisin ... oop_1.html
Last edited by asorrell on Sat Dec 07, 2013 10:33 am, edited 1 time in total.
Ok - after sitting here looking at your code for a while, I finally figured out what your example was doing. I don't think you need looping, caching, or any of that.
Sort the data by key1, then date descending. Hash-partition on key1.
Create these stage variables in this order:
- svSum, initialized to 0
- svRunningTotal, initialized to 0
- svKey1, initialized to ""
- svCurrentDate, initialized to ""

svSum: If inputlink.key1 <> svKey1 Then 0 Else
   If inputlink.key1 = svKey1 and inputlink.date <> svCurrentDate Then svRunningTotal Else svSum
svRunningTotal: If inputlink.key1 <> svKey1 Then inputlink.value Else inputlink.value + svRunningTotal
svKey1: inputlink.key1
svCurrentDate: inputlink.date

Then just output svSum as your "Sum" value in your output record.
To summarize:
1) Reset the running total and the output sum when key1 changes.
2) Update the output sum from the running total only when the date changes.
I believe that will work and will be much faster. It also requires no caching or special buffering, so it should be able to process 8 million records no problem. Let me know if that works; it was sort of fun to puzzle out.
Note: If this works, please mark the topic solved as a "work-around", since it technically doesn't use looping to fix the issue...
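As a sanity check, the stage-variable logic above can be simulated outside DataStage. A Python sketch (illustrative only) that walks the sample data pre-sorted by key1 and date descending, applying the same reset/carry rules in the same evaluation order:

```python
from datetime import date

# Sample data pre-sorted by key1, then date DESCENDING
rows = [
    (123, 117, date(2012, 1, 4), 100),
    (123, 115, date(2012, 1, 3), 600),
    (123, 113, date(2012, 1, 2), 300),
    (123, 111, date(2012, 1, 1), 500),
    (125, 133, date(2012, 1, 2), 300),
    (125, 121, date(2012, 1, 1), 500),
    (126, 167, date(2012, 1, 4), 100),
    (126, 165, date(2012, 1, 3), 600),
]

sv_sum, sv_running, sv_key1, sv_date = 0, 0, None, None
sums = {}
for key1, key2, d, value in rows:
    # svSum: reset on key1 change; pick up the running total on a
    # date change; otherwise hold its previous value
    if key1 != sv_key1:
        sv_sum = 0
    elif d != sv_date:
        sv_sum = sv_running
    # svRunningTotal: reset to this value on key1 change, else accumulate
    sv_running = value if key1 != sv_key1 else value + sv_running
    sv_key1, sv_date = key1, d   # svKey1 / svCurrentDate
    sums[key2] = sv_sum          # output svSum as "Sum"

print(sums)
```

The evaluation order matters: sv_sum reads the old sv_running before it is updated, mirroring the top-to-bottom stage-variable ordering. The date check also handles ties correctly, since rows sharing a date keep the sum frozen rather than counting each other.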