Multiple records based on value in column

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
esivaprasad
Participant
Posts: 135
Joined: Tue Dec 09, 2008 10:35 am

Multiple records based on value in column

Post by esivaprasad »

Hi ,

we are using datastage 8.1.
my requirement is :

Input is like this
customer|Count
A|4
B|1
C|2
Based on customer count value , need to generate that many output columns for the same customer.

output required as
Column1 | Column2
A|1
A|1
A|1
A|1
B|1
C|1
C|1
please help me how to get output as requested format.

Thank you.
Siva
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... first thing I would have suggested was 'transformer looping' but I'm pretty sure it's not available in your ancient version. What is the source - a file, a database table or something you are willing to load into a table? Thinking there's a SQL solution available if so. Or perhaps someone else has a clever solution... perhaps something written in C++ by UCDI? Command line filter Perl shenanigans? :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
esivaprasad
Participant
Posts: 135
Joined: Tue Dec 09, 2008 10:35 am

Post by esivaprasad »

My source and target are both CSV file format.
Tranformer looping option not available in 8.1 version.


Thank you.
Siva
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Right. Ancient. :wink:

Still curious if a database is an option, if that's where your skills are - perhaps even an Oracle external table. I'd also be curious what you've tried so far.
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Im open to C if you need it, but I would personally attack this one in excel if at all possible (to the point that I might even consider elimination of datastage entirely depending on the circumstances).

Another idea.. .what is the largest # that can appear? If its less than 10, you can just split out a transformer and manufacture up to that many rows simply. A little fragile if the # can get big in the future, but sometimes, you know it won't...?

I have found that the VB transformer using a VB routine is pretty much equally fast to C for simple tasks. C is marginally faster on encrypt or bit manipulation or hard math, but string processing, its a wash.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Unless I'm misremembering how things work (or just confused) I think you meant:

- The Filter stage limits the output using the condition COUNT <= ROWNUM

Yes?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Thomas.B
Participant
Posts: 63
Joined: Thu Apr 09, 2015 6:40 am
Location: France - Nantes

Post by Thomas.B »

We need to send the record to the output unless the row number is higher than the number of rows to generate, in my head, my logic is right.

Anyway, Siva will test it if he decides that is the right solution.
BI Consultant
DSXConsult
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ah... guess I had the two values backwards in my head. D'oh.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

Another option could be leveraging Unix ... invoke following command in either External Source stage or using Filter option of Sequential file stage

Code: Select all

awk 'BEGIN {FS="|";RS="\n"};  (NR> 1) {for(i=0; i<$2; ++i) {print $1}}' <inputfile>
Post Reply