Vertical Pivot Dynamic Array Size

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

irshanuddin
Participant
Posts: 44
Joined: Wed May 27, 2009 3:01 pm

Vertical Pivot Dynamic Array Size

Post by irshanuddin »

So we have some data coming in that we need to vertically pivot and write to a sequential file. I would like to find out if I can make the file only as wide as the widest record, which means the array size for the Pivot stage would need to be dynamic.
I think I can work out what that size should be by using an Aggregator stage to count the maximum number of records per grouping, but is there a way to pass this number to the Pivot stage on the fly?
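For illustration only (this is plain Python rather than anything in DataStage, and the sample rows are invented), the counting step I have in mind would be something like:

# Hypothetical illustration of the aggregation step: count rows per key
# (what an Aggregator grouping on the key would do) and take the maximum --
# that maximum is the array size the Pivot stage would need.
from collections import Counter

rows = [
    ("K1", "car",   "VIN001"),
    ("K1", "truck", "VIN002"),
    ("K2", "van",   "VIN003"),
]

counts = Counter(key for key, *_ in rows)
max_group_size = max(counts.values())
print(max_group_size)  # 2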

Thanks!
irshanuddin
Participant
Posts: 44
Joined: Wed May 27, 2009 3:01 pm

Post by irshanuddin »

Anyone?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Others will need to chime in here but I'm not aware of any way to make that Array Size in the stage 'dynamic', so I'm thinking you'll need to handle whatever passes for the maximum number of records to pivot to columns. I wonder if transformer looping would be a better option here?

And then it seems to me you'll only be able to handle the "file only as wide as the widest record" part by outputting the file as a single long string with an appropriately built set of delimited values in it.
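Purely as a sketch of that idea (plain Python with invented names, not a DataStage implementation): each pivoted record becomes one delimited string, so every line is only as wide as the values it actually has.

# Hypothetical sketch of the "single long string" output: one column holding
# the whole delimited record, so line width varies per key.
groups = {
    "K1": ["car", "VIN001", "truck", "VIN002"],
    "K2": ["van", "VIN003"],
}

with open("pivoted.txt", "w") as out:
    for key, values in groups.items():
        out.write(",".join([key] + values) + "\n")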
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Or a Column Export stage, with its output appropriately right-trimmed of trailing delimiter characters.
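As a rough illustration of the trimming (plain Python, not the stage itself): if the exported row carries empty trailing columns, the surplus delimiters are simply stripped from the right.

# Hypothetical example: right-trim the trailing delimiters left behind by
# empty columns after a Column Export-style concatenation.
exported = "K2,van,VIN003,,,"   # row padded out to the full column count
trimmed = exported.rstrip(",")  # drop the delimiters of the empty columns
print(trimmed)                  # K2,van,VIN003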
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

As per chulett's suggestion, this can be done relatively easily with a transformer and looping.

You will however need to be able to provide the number of columns required to the output (and any column names, if required). Using that number of columns you just build up a delimited list of values for each key and output it when required. If you output to an external target stage you can use a simple awk command to generate any number of differently structured files at the same time.
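To show why the column count is needed up front, here is a small sketch (plain Python with made-up column names, not DataStage derivations): once you know the maximum group size you can generate the repeated header names in one go.

# Hypothetical sketch: given the maximum group size, generate the repeated
# column names that the widest record will need.
max_group_size = 3  # assumed to be known (e.g. a job parameter) before the run
header = ",".join(
    f"vehicle{i},vehicle_vin{i}" for i in range(1, max_group_size + 1)
)
print(header)
# vehicle1,vehicle_vin1,vehicle2,vehicle_vin2,vehicle3,vehicle_vin3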
irshanuddin
Participant
Posts: 44
Joined: Wed May 27, 2009 3:01 pm

Post by irshanuddin »

This is how I ended up resolving this issue based on Craig's input. (Thanks!)

I created the looping in the transformer manually. I still need to familiarize myself with the built-in looping functionality, so for now I went old school and built the loop using stage variables.
I created a variable that increments each time the same key comes through, and kept appending hardcoded column-header names with the counter tacked on the end, giving a field with values such as vehicle1,vehicle_vin1 etc. I stored this in another variable and appended to it as long as the key stayed the same, so if the same key came through again the next value for this field would be vehicle1,vehicle_vin1,vehicle2,vehicle_vin2 and so on.
I did the same thing again for the actual values of these records.
So the output of this transformer is two columns: one with all the headers for each key and one with all the values for each key.
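The same logic expressed as a small sketch (plain Python rather than stage-variable derivations, with invented sample rows): a per-key counter drives both the header string and the value string.

# Hypothetical sketch of the stage-variable logic: on input sorted by key,
# keep a counter per key and build up matching header and value strings.
rows = [  # assumed sorted by key, as the transformer input would be
    ("K1", "car",   "VIN001"),
    ("K1", "truck", "VIN002"),
    ("K2", "van",   "VIN003"),
]

pivoted = {}  # key -> (header string, value string)
prev_key, counter = None, 0
for key, vehicle, vin in rows:
    counter = counter + 1 if key == prev_key else 1
    hdr, vals = pivoted.get(key, ("", ""))
    sep = "," if hdr else ""
    pivoted[key] = (
        hdr + sep + f"vehicle{counter},vehicle_vin{counter}",
        vals + sep + f"{vehicle},{vin}",
    )
    prev_key = key

for key, (hdr, vals) in pivoted.items():
    print(key, hdr, vals)
# K1 vehicle1,vehicle_vin1,vehicle2,vehicle_vin2 car,VIN001,truck,VIN002
# K2 vehicle1,vehicle_vin1 van,VIN003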

After all the records have been cycled through, I created a fork design: one stream sends the data values through as-is, the other keeps only the largest header.

The header stream sorts on a static field and takes a Max on the counter and a Max on the header (with the Preserve Type property, since this is a varchar field).
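Roughly what that header stream is doing, sketched in plain Python (invented records; each tuple here is key, counter, header):

# Hypothetical sketch of the "max header" stream: keep the header belonging
# to the record with the largest counter, i.e. the widest record.
records = [
    ("K1", 2, "vehicle1,vehicle_vin1,vehicle2,vehicle_vin2"),
    ("K2", 1, "vehicle1,vehicle_vin1"),
]
widest_header = max(records, key=lambda r: r[1])[2]
print(widest_header)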

After this, I created some dummy fields to make both streams look like the same input and passed both to a Funnel stage.
I also added a field called sort to both streams, assigning 1 on the header stream and a higher number on the data stream, so that the funnel always picks the header as the first record.
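In effect (again just a plain Python sketch with made-up strings), the sort field forces the header line out ahead of the data lines:

# Hypothetical sketch of the funnel/sort trick: header records get sort=1,
# data records a higher value, so the header is always written first.
header_stream = [(1, "vehicle1,vehicle_vin1,vehicle2,vehicle_vin2")]
data_stream   = [(2, "car,VIN001,truck,VIN002"), (2, "van,VIN003")]

with open("final_output.txt", "w") as out:
    for _, line in sorted(header_stream + data_stream, key=lambda r: r[0]):
        out.write(line + "\n")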

This is then output to a sequential file.
The only property I am unable to provide in the output is text qualifiers for the fields, since that would lump the entire max header into one field and all the data values into one field.

I was unable to use the Column Import stage to do the split because the maximum number of columns will vary.

I hope this helps someone.
Thanks for everyone's help.