Recursive Fold! Tricky



rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN


Post by rameshrr3 »

I have a bunch of rows grouped by a key, and multiple such bunches.

The requirement:

For a given grouping key, start the first group at the lowest date and end it at the row whose date is the highest value less than 365 days from that start - store these rows in the first bucket.
Next, start with the row that has the first date more than 365 days out, iterate up to 365 days further, and bucket those rows into the second grouping,
and so on and so forth.
I'm not sure if transformer looping would help me achieve this. Modulo groups calculated from the earliest date of the group's range would not work, because the start date of iteration n+1 resets beyond an absolute 365-day cutoff - though that could still make a crude solution.
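To make the rule concrete, here is a rough Python sketch (names are made up; rows are assumed to arrive as (key, date) pairs). The last row also shows where a fixed 365-day modulo grid from the minimum date would land differently:

def fold_into_buckets(rows, window_days=365):
    """Bucket 1 opens at each key's earliest date; a new bucket opens at
    the first row more than window_days after the current bucket's start,
    and the window resets to that row's date."""
    from itertools import groupby
    from operator import itemgetter
    out = []
    for key, group in groupby(sorted(rows), key=itemgetter(0)):
        bucket, start = 0, None
        for _, d in group:
            if start is None or (d - start).days > window_days:
                bucket += 1            # open a new fold anchored at this row
                start = d
            out.append((key, d, bucket))
    return out

from datetime import date
rows = [("A", date(2020, 1, 1)),    # day 0   -> bucket 1
        ("A", date(2020, 6, 1)),    # day 152 -> bucket 1
        ("A", date(2021, 2, 4)),    # day 400 -> opens bucket 2, window resets
        ("A", date(2022, 1, 20))]   # day 750 overall, but only 350 days past
                                    # the bucket-2 anchor -> stays in bucket 2
print(fold_into_buckets(rows))      # a fixed grid (750 // 365) would have
                                    # pushed the last row into a third bucket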

The only thing I have in mind here is to run the first iteration through a transformer and write a 'reject' file that overwrites the input file next time via an after-job command, then continue looping through the residual records until none are left. This of course calls for a sequencer with a loop, which is doable, but is there any way of doing it via transformer looping itself?
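As a sketch of that loop (Python again, names hypothetical): one pass keeps each key's rows within 365 days of that key's minimum date and rejects the rest, and the rejects become the next pass's input until nothing is left:

from datetime import date

def one_pass(rows, window_days=365):
    """One transformer pass: keep rows within window_days of each key's
    minimum date; reject the rest for the next iteration."""
    min_by_key = {}
    for key, d in rows:
        min_by_key[key] = min(min_by_key.get(key, d), d)
    kept = [(k, d) for k, d in rows if (d - min_by_key[k]).days <= window_days]
    rejects = [(k, d) for k, d in rows if (d - min_by_key[k]).days > window_days]
    return kept, rejects

def run_until_empty(rows):
    passes = []
    while rows:
        kept, rows = one_pass(rows)    # rejects overwrite the input, as the
        passes.append(kept)            # after-job command would do with the file
    return passes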

Alternatively, I thought an array of all dates in a group would help me create recursive folds: pivot all dates into a variable-length vector via a Combine Records stage, then Split Vector and Column Export the vector column into a delimited string list, where the Field() function and Dcount() would help me achieve this. But that's just a thought - do people in the forum have any better ideas to run with?
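Roughly what I mean, in Python (field and dcount here are stand-in analogues of the BASIC Field() and Dcount() functions, and the delimited string is what the pivoted Column Export would give me):

from datetime import date

def dcount(s, delim=","):          # rough analogue of BASIC Dcount()
    return len(s.split(delim)) if s else 0

def field(s, delim, n):            # rough analogue of BASIC Field(), 1-based
    return s.split(delim)[n - 1]

# One pivoted record per key: the group's dates as a sorted, comma-delimited
# string. Walk the list once to find the positions where a new fold opens.
def fold_starts(date_list, window_days=365):
    starts, anchor = [], None
    for i in range(1, dcount(date_list) + 1):
        d = date.fromisoformat(field(date_list, ",", i))
        if anchor is None or (d - anchor).days > window_days:
            anchor = d
            starts.append(i)       # position where a new 365-day fold opens
    return starts

print(fold_starts("2020-01-01,2020-06-01,2021-02-04,2022-01-20"))  # [1, 3]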
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

do you need to handle leap years?

That aside, transformer loops create a record per loop iteration. They're handy, but you already have all your records as you want them (right?) and are just grouping them, so looping may not be a clean solution - you can toss the extra records, but generating rows only to throw them away is unnecessary work best avoided.

I am thinking you just need stage variables?
something like

sv1 = initial date for the key
sv2 = (date math to get this row minus sv1, in days)
sv3 = (integer division of sv2 by 365, plus 1) so days 0 to 364 give 1, day 365 gives 2, etc... group by THESE values (and the key, of course).
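A quick Python sketch of that arithmetic (just the math the stage variables would carry; names hypothetical). Note it is a fixed 365-day grid anchored at the group minimum, which is what the original post says does not quite match the reset-on-overflow requirement:

from datetime import date

def fixed_window_bucket(d, group_min, window_days=365):
    """Bucket by integer division of the day offset from the group's
    earliest date; +1 so buckets start at 1."""
    return (d - group_min).days // window_days + 1

g0 = date(2020, 1, 1)
print(fixed_window_bucket(date(2020, 12, 30), g0))  # offset 364 -> bucket 1
print(fixed_window_bucket(date(2021, 1, 1), g0))    # offset 366 -> bucket 2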

Would that work? It seems likely you could also work on your extract to get the initial date and do the date math in the SQL, so you just have the grouping value available from the extract. That may or may not be cleaner/easier than doing it in DataStage, depending on the complexity involved.
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

I did not have time to work through a generic algorithm, though a sequencer loop processing a constantly shrinking file would have worked, logically speaking.

The business only wants to load 3 years of data at a time, so I created a job that runs 5 iterations and writes an overflow bucket if it does not complete within 5 cascading transformer stages, the residue from one feeding the next. So far even 3 transformers are sufficient, since no residue reaches the 4th cascade.

Stage variables persist a group sequence and the minimum interaction date, and another calculates the relative offset of subsequent dates from the locally cached minimum (local to that transformer in the cascade) for the iteration. Dates exceeding the 365-day cutoff are pushed forward to the next cascade, which repeats the logic on its residual input. Dates in each iteration cluster are pivoted vertically and written to an output record with other vertically pivoted attributes, the pivot done on the grouping key with the start date of the iteration. No leap-year considerations are in effect, since the beginning and end of a year's fold are not coincident with a business or Gregorian calendar date.
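Roughly, each cascade stage behaves like this Python sketch (names hypothetical): a pass keeps each key's rows within 365 days of its stage-local minimum, pivots them into one output record, and sends the rest downstream; whatever survives all stages is the overflow bucket:

from datetime import date
from itertools import groupby
from operator import itemgetter

def transformer_pass(rows, window_days=365):
    """One cascade stage: keep each key's rows within window_days of the
    stage-local minimum date and pivot them into one record; later rows
    become the residue fed to the next stage."""
    kept, residue = [], []
    for key, grp in groupby(sorted(rows), key=itemgetter(0)):
        grp = list(grp)
        local_min = grp[0][1]          # stage-variable analogue: cached minimum
        in_window = [d for _, d in grp if (d - local_min).days <= window_days]
        residue += [(key, d) for _, d in grp if (d - local_min).days > window_days]
        kept.append((key, local_min, in_window))   # pivoted output record
    return kept, residue

def cascade(rows, stages=5):
    buckets, residue = [], rows
    for _ in range(stages):
        if not residue:
            break
        out, residue = transformer_pass(residue)
        buckets.append(out)
    return buckets, residue            # residue = overflow bucket if nonempty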