Adding a column during concatenation

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not really a DataStage question, as I'm not looking for a job solution but a UNIX one.

I have 16M records in 51 identically formatted files that need to be concatenated for processing. I'd like to add a single letter as a trailing pipe-delimited column to each record during the process, if possible. I know which files need which letter, so don't worry about that. I'm just wondering if there's some way to keep the speed of a straight command-line 'cat *.xxx > file' operation and add a column to the end of each record at the same time.

Thanks!
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Look into sed, Craig. Something like:


sed -e 's/$/your character/g' infile 
You can run this on the files either before concatenating them or after.
As for speed, you will have to test it. sed is pretty fast, but I don't know how it will do with that many records.
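Since each group of files needs its own letter, one way to package the substitution is a small shell function that takes the letter first and the files after it. This is a minimal sketch: append_col and the group_m_/group_n_ file names are hypothetical. The g flag is dropped because $ matches only once per line, so 's/$/|M/' and 's/$/|M/g' behave the same.

```shell
# append_col (hypothetical helper): print every line of the given files with
# "|LETTER" appended, so a record "abc|def" comes out as "abc|def|M".
# Usage: append_col LETTER FILE...
# LETTER must not contain "/" or "&", which sed treats specially here.
append_col() {
    letter=$1
    shift
    # "$" anchors at end of line and matches once per line, so no g flag.
    # "\$" keeps the shell from expanding it; sed receives s/$/|LETTER/.
    sed -e "s/\$/|$letter/" "$@"
}

# Hypothetical usage: one call per group of files, all appended to one output.
#   append_col M group_m_*.xxx >> fileout
#   append_col N group_n_*.xxx >> fileout
```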
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.

Post by chulett »

Thanks, running some timing tests now. I can pipe cat through sed:


cat *.xxx | sed -e 's/$/|M' > fileout
Two birds, one stone. :D
-craig


Post by chulett »

This does work, but it adds some overhead: cat'ing 2.6M records went from 23 seconds to 1 minute 23 seconds. I can live with that, but I'm still curious whether something less costly could be done.
-craig
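Another option, not tried in this thread, is a single awk pass that reads all the files and picks each file's letter from its name as it goes, so the per-file letters come from one command instead of one sed call per group. A sketch under assumptions: append_by_file and the letters.map lookup file (one "filename letter" pair per line) are hypothetical, and it has not been timed against the sed forms here, so it may or may not be faster.

```shell
# append_by_file (hypothetical helper): concatenate data files in one awk pass,
# appending "|LETTER" to each line, where LETTER is looked up by file name.
# Usage: append_by_file MAPFILE FILE...     e.g. append_by_file letters.map *.xxx > fileout
#   MAPFILE (hypothetical format) holds "name letter" pairs, one per line,
#   with each name spelled exactly as it is passed on the command line.
#   MAPFILE must not be empty, or the NR == FNR test below misfires.
#   A data file missing from MAPFILE gets an empty trailing column.
append_by_file() {
    map=$1
    shift
    awk '
        # While reading MAPFILE (the first input), record each file letter.
        NR == FNR { letter[$1] = $2; next }
        # For every data line, FILENAME names the file currently being read.
        { print $0 "|" letter[FILENAME] }
    ' "$map" "$@"
}
```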

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

How about using the wildcard as stdin for sed (so they work one at a time) and append-redirecting the output into your file? You could also parallelize the sed operations with a bit more scripting.
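The parallel variant Ray mentions could look roughly like this: one background sed per file, each writing a numbered part file, then the parts are joined in argument order. This is a minimal sketch; parallel_append and the scratch directory name are hypothetical.

```shell
# parallel_append (hypothetical helper): start one background sed per input
# file, each writing its own numbered part file, wait for all of them, then
# concatenate the parts in argument order so OUTFILE matches "cat FILE...".
# Usage: parallel_append LETTER OUTFILE FILE...
parallel_append() {
    letter=$1
    out=$2
    shift 2
    # Scratch directory for the part files (name chosen for illustration).
    workdir=${TMPDIR:-/tmp}/parallel_append.$$
    mkdir "$workdir" || return 1
    n=0
    for f in "$@"; do
        n=$((n + 1))
        sed -e "s/\$/|$letter/" "$f" > "$workdir/part.$n" &
    done
    wait            # blocks until every background sed has exited
    : > "$out"      # start from an empty output file
    i=1
    while [ "$i" -le "$n" ]; do
        cat "$workdir/part.$i" >> "$out"
        i=$((i + 1))
    done
    rm -rf "$workdir"
}
```

As written it starts every sed at once (51 processes here), needs scratch space about the size of the output, and does not check each sed's exit status; batching the loop would be a natural refinement. Whether it beats a single sed depends on CPU count versus disk throughput, so it needs the same kind of timing test.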
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Syncsort available?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle

Post by chulett »

kcbland wrote: Syncsort available?
Sadly, no. :cry:
-craig


Post by chulett »

ray.wurlod wrote: How about using the wildcard as stdin for sed (so they work one at a time) and append-redirecting the output into your file?
Err... meaning... this?


sed -e 's/$/|M' *.xxx >> fileout
I'll give it a shot.

Edited to add: WTH?

The substitution syntax that worked fine in the other form won't parse in this one, in spite of my cribbing it directly from the man pages.

sed: Function s/$/|M cannot be parsed

It doesn't seem to matter what I put between the quotes; the dollar sign and the pipe are not the issue here. :?
-craig


Post by DSguru2B »

Try this:


sed -e 's/$/|M/g' *.xxx >> fileout

Post by chulett »

:oops: When I transcribed the syntax I was using into my earlier post, I left off the trailing slash, which is why the command could no longer be parsed. Now both forms are working, and here are some timing tests for anyone interested:


sed -e 's/$/|M/' *.xxx >> fileout          # 55 sec avg

cat *.xxx | sed -e 's/$/|M/' >> fileout    # 77 sec avg
These timings are for 1.7M records. I'll go with the former, though either would be 'fine' in the long run. :D
-craig
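To rerun a comparison like the one above repeatably, note that both commands use ">>", so each run appends to whatever an earlier run left behind. A minimal sketch of a harness, assuming hypothetical names (compare_forms, fileout.sed, fileout.cat) and a date that supports +%s (GNU and BSD do; it is not POSIX). Whole-second resolution is enough for runs of around a minute.

```shell
# compare_forms (hypothetical helper): time the two commands from the test
# above on the same files and confirm they write identical output.
# Usage: compare_forms LETTER FILE...
# Writes fileout.sed and fileout.cat (hypothetical names) in the current
# directory; returns 0 only if the two files are byte-identical.
compare_forms() {
    letter=$1
    shift
    # ">>" appends, so empty both outputs first; otherwise a second run
    # would also carry the first run's records.
    : > fileout.sed
    : > fileout.cat
    t0=$(date +%s)      # whole seconds since the epoch
    sed -e "s/\$/|$letter/" "$@" >> fileout.sed
    t1=$(date +%s)
    cat "$@" | sed -e "s/\$/|$letter/" >> fileout.cat
    t2=$(date +%s)
    echo "sed -e ... FILE...       : $((t1 - t0)) sec"
    echo "cat FILE... | sed -e ... : $((t2 - t1)) sec"
    cmp -s fileout.sed fileout.cat
}
```

For example, compare_forms M *.xxx prints both elapsed times and exits non-zero if the two outputs differ.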
