How to read comma delimited files with missing column values

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
k.prashanthi
Participant
Posts: 18
Joined: Thu Dec 08, 2016 11:42 am

How to read comma delimited files with missing column values

Post by k.prashanthi »

Hello,

I have a new requirement to work with comma delimited files with no quotes for blank values but I have never come across with this kind of files to work with. For example:

D,0,SB,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
D,1,IN,,0,0,0,0,90000000,0,0,0,0,0,0,0,0,0,0
D,4,FEDOUT,,0,0,0,0,90000000,0,0,0,0,0,0,0
D,6,EB,,0,0,0,0,0,0
D,7,DEP,LOC,0,0,0,0,90000000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,90000000

How to read this kind of files??

TIA.
Last edited by k.prashanthi on Wed Jan 24, 2018 2:10 pm, edited 2 times in total.
PaulVL
Premium Member
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Does the Sequential File stage with nullable columns not work for you?
k.prashanthi
Participant
Posts: 18
Joined: Thu Dec 08, 2016 11:42 am

Post by k.prashanthi »

Hi Paul,

No. It did not work. Any other ideas?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Specifics, please. What exactly "did not work" about it?
-craig

"You can never have too many knives" -- Logan Nine Fingers
BillB
Premium Member
Premium Member
Posts: 48
Joined: Tue Nov 13, 2007 6:44 pm

Post by BillB »

The server Sequential File stage will read a CSV file like your example. Set 'Outputs-Format-Missing columns action' to 'Map empty string', and 'Outputs-Columns-Incomplete Column' to 'Replace'.

If you need your CSV file to be an input to a parallel job, consider writing a pre-processor server job that reads your example CSV and writes a second CSV that becomes the parallel job input.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Right, was going to get there. :wink:

Another option, a Server Shared Container in your PX job that just handles the sequential file read.
-craig

"You can never have too many knives" -- Logan Nine Fingers
k.prashanthi
Participant
Posts: 18
Joined: Thu Dec 08, 2016 11:42 am

Post by k.prashanthi »

BillB,

That would work but we don't have server job as we have Data Quality license only. Any other ideas on this?
PaulVL
Premium Member
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

if all else fails, you could modify the input file.

https://askubuntu.com/questions/20414/f ... g-commands

Command:
sed -i 's/original/new/g' file.txt


SO something like this:
sed -i 's/,,/,"",/g' file.txt
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

And the sed command that Paul suggested could even be a Filter in the Sequential File stage.
Or executed via a before-job subroutine.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply