Page 1 of 1

Problem in reading .csv file

Posted: Thu Nov 09, 2017 4:25 pm
by dnat
Hi,

I have a .csv format file and it comes in the following way

For ex it has 4 fields

a,345,cd,678
b,"6,878",cd,"9,870"
c,"7,989",fg,"7,880"
d,567,cd,989

The second and the fourth field are numeric fields and if it is more than 3 digit number the it comes with a comma but within double quotes.
When i try view data, i am not able to view the 2nd and 3rd record. The issue is because how the final delimiter string is stored.
If i give Final delimiter as end then 1st and 4th record are viewable. If make Final delimiter as \n then 2nd and 3rd record are viewable. Any idea how to handle it?

Posted: Thu Nov 09, 2017 5:16 pm
by chulett
Not seeing off the top of my head how this is related to the final delimiter. However, I'm curious how did you define the metadata? To me they should all be defined as strings and then the 2nd and 4th fields will need to be converted inside the job post-read...

Posted: Thu Nov 09, 2017 5:28 pm
by dnat
Yes, they are all defined as Strings. And i will be converting those fields as required inside the job.

But now the problem is with reading the file itself. While i view data in sequential file stage the 2nd and 3rd record get dropped if i specify the Final delimiter as "end" or "none". If i specify the Final delimiter as "\n" then 1st and 4th record get dropped.

The issue is definitely in the source file as they said that we will be sending the file with whatever formats they already have. But is there a way to handle this?

Posted: Fri Nov 10, 2017 10:01 am
by UCDI
several things you may be able to do...

- you could not use numeric commas (123456 is written 123456 not 123,456)
- you could use something other than , for the fields (tab, |, #, whatever)
- you could parse the entire line by hand

the first 2 options may mean writing a pre-processor to convert the files. This is probably a simple line or 2 of unix magic commands or could be a short C program or a datastage transformer (which becomes the 3rd option).

Posted: Fri Nov 10, 2017 5:12 pm
by ray.wurlod
Try using a Server Sequential File stage (perhaps in a Server Shared Container). This stage is much more flexible about how it handles multiple formats.

Posted: Mon Nov 13, 2017 12:05 pm
by dnat
Thanks for all your inputs. I added the "Record delimiter string" to DOS format and was able to read the file as it was.