Adding a column during concatenation
Not really a DataStage question as I don't want a job solution, but a UNIX one.
I have 16M records in 51 identically formatted files that need to be concatenated for processing. I'd like to add a single letter as a trailing pipe-delimited column to each record during the process, if possible. I know which files need which letter, so don't worry about that. I'm just wondering if there's some way to keep the speed of a straight command-line 'cat *.xxx > file' operation while adding a column to the end of each record.
Thanks!
-craig
"You can never have too many knives" -- Logan Nine Fingers
Look into sed, Craig. Something like
Code: Select all
sed -e 's/$/your character/g' infile
You can run this command on the files either before or after concatenating them.
As for the speed, you will have to test it out. Sed is pretty fast, but I don't know how it will do with 51 x 2M records.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Thanks - running some timing tests. I can pipe cat through sed:
Code: Select all
cat *.xxx | sed -e 's/$/|M/' > fileout
Two birds, one stone. :D
-craig
"You can never have too many knives" -- Logan Nine Fingers
How about using the wildcard as stdin for sed (so they work one at a time) and append-redirecting the output into your file? You could also parallelize the sed operations with a bit more scripting.
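Since different files need different letters, the one-file-at-a-time approach could be sketched roughly as below. This is only an illustration: the sample files, the `east`/`west` names and the name-to-letter rule are all hypothetical stand-ins for whatever mapping you already have.

```shell
#!/bin/sh
# Illustration only: sample data and the name-to-letter rule are hypothetical.
workdir=$(mktemp -d)
printf '1|alpha\n2|beta\n' > "$workdir/east.xxx"
printf '3|gamma\n' > "$workdir/west.xxx"

out="$workdir/fileout"
: > "$out"    # start with an empty output file

for f in "$workdir"/*.xxx; do
    # Hypothetical rule: choose the letter from the file name.
    case $(basename "$f") in
        east*) letter=E ;;
        west*) letter=W ;;
        *)     letter=U ;;
    esac
    # Append "|<letter>" to every record, then append the result to the output.
    sed -e "s/\$/|$letter/" "$f" >> "$out"
done

cat "$out"
```

Each file gets its own sed invocation, so the letter can vary per file while the output still ends up in a single file. Running the sed commands in parallel would need separate temporary outputs that are concatenated afterwards, which this sketch does not attempt.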
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote: How about using the wildcard as stdin for sed (so they work one at a time) and append-redirecting the output into your file?
Err... meaning... this?
Code: Select all
sed -e 's/$/|M' *.xxx >> fileout
Edited to add: WTH?
The substitution syntax that worked fine in the other form won't parse in this one, in spite of me cribbing it directly from the man pages.
sed: Function s/$/|M cannot be parsed
It doesn't seem to matter what I put between the quotes; the dollar sign and the pipe are not the issue here.
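For reference, sed's `s` command expects the delimiter three times: before the pattern, between pattern and replacement, and after the replacement. A small sketch on a made-up sample record, comparing the complete form with one that lacks the closing delimiter (the wording of the error differs between sed implementations, so only the exit status is inspected):

```shell
#!/bin/sh
# Complete form: pattern "$" (end of line), replacement "|M", closing slash.
good=$(printf 'a|b\n' | sed -e 's/$/|M/')
echo "$good"

# Same script without the closing slash. The error text varies between
# sed implementations, so this only records whether sed accepted it.
if printf 'a|b\n' | sed -e 's/$/|M' >/dev/null 2>&1; then
    bad_status=accepted
else
    bad_status=rejected
fi
echo "closing slash omitted: $bad_status"
```

Inside the single quotes the shell leaves `$` and `|` alone, and in sed's basic regular expressions `|` is an ordinary character, so neither needs escaping here.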
-craig
"You can never have too many knives" -- Logan Nine Fingers
Try this:
Code: Select all
sed -e 's/$/|M/g' *.xxx >> fileout
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
When I transcribed the syntax I had been using into the post, I left off the trailing slash, which is why it could no longer be parsed. So now both are working, and here are some timing tests for anyone interested:
Code: Select all
sed -e 's/$/|M/' *.xxx >> fileout          # 55 sec avg
cat *.xxx | sed -e 's/$/|M/' >> fileout    # 77 sec avg
This was for 1.7M records. I'll go with the former, though either would be 'fine' in the long run. :D
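For anyone wanting to repeat the comparison on their own data, a minimal sketch of a timing harness follows. The generated `part*.xxx` files and the output names are hypothetical, and `date +%s` only resolves whole seconds, so tiny inputs like these will show 0; the real files would be needed for meaningful numbers.

```shell
#!/bin/sh
# Illustration only: small generated files stand in for the real *.xxx inputs.
workdir=$(mktemp -d)
for n in 1 2 3; do
    awk -v n="$n" 'BEGIN { for (i = 1; i <= 1000; i++) print n "|" i }' \
        > "$workdir/part$n.xxx"
done
cd "$workdir" || exit 1

# Variant 1: sed reads the files itself.
start=$(date +%s)
sed -e 's/$/|M/' *.xxx > direct.out
t_direct=$(( $(date +%s) - start ))

# Variant 2: cat concatenates, sed reads the pipe.
start=$(date +%s)
cat *.xxx | sed -e 's/$/|M/' > piped.out
t_piped=$(( $(date +%s) - start ))

echo "direct: ${t_direct}s  piped: ${t_piped}s"

# Both variants should produce byte-identical output.
if cmp -s direct.out piped.out; then
    echo "outputs match"
fi
```

Writing each variant to its own file with `>` (rather than appending with `>>`) keeps the runs independent and lets `cmp` confirm that the two forms produce the same result.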
-craig
"You can never have too many knives" -- Logan Nine Fingers