Reading pipe delimited; but some columns missing

karumudi7 · Post by **karumudi7** » Fri Oct 04, 2013 2:57 am

Hi,

I have a pipe delimited file with 3 columns. But for some records at the end there are some missing fields and pipes (delimiter) is also not availble.
How can I read that file without losing any records!
Sample File:

Code: Select all

 Name|Email|Country
 ABC|abc@gmail.com|USA
 DEF|
 GHXY||INDIA

Here the problem is with second record, while reading the file I am getting only 2 records. Second one is rejected with error "Email" column not found...

Need to load these 3 records into the table, if data is not available NULL will be loaded for the purticular column.

Thanks!

ArndW · Post by **ArndW** » Fri Oct 04, 2013 3:28 am

This was easier in Server, but there should be a way to do this in parallel jobs. Is this sequential file defined with variable field/record length and with a <cr><lf> or <lf> line terminator?

ArndW · Post by **ArndW** » Fri Oct 04, 2013 3:41 am

I experimented a bit and while there is no equivalent to the server "missing columns" mode, you can put in the following filter condition:

Code: Select all

awk 'BEGIN {FS = OFS = "|"} {$3 = $3; print}'

This will pad each input line to make exactly 3 pipe delimited fields; then you need to handle the empty strings appropriately in your job.

karumudi7 · Post by **karumudi7** » Fri Oct 04, 2013 4:05 am

ArndW wrote:I experimented a bit and while there is no equivalent to the server "missing columns" mode, you can put in the following filter condition:
Code: Select all
awk 'BEGIN {FS = OFS = "|"}  ...[/quote]

Unable to read, Premium content.  :(

arunkumarmm · Post by **arunkumarmm** » Fri Oct 04, 2013 4:37 am

Never mind about your column delimiter. Read the whole record as one single column and split it up inside the transformer using field.

ray.wurlod · Post by **ray.wurlod** » Fri Oct 04, 2013 5:15 am

Use a server Sequential File stage in a server Shared Container and make use of its excellent missing columns functionality.

Or, if it's a small to medium sized file, use a server job entirely.

rsomiset · Post by **rsomiset** » Sun Oct 06, 2013 10:17 am

Did you try to read the whole record as 1 single column in source stage and then split using Field function in a transformer stage using delimiter(pipe in this case)?