Problem in reading .csv file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Problem in reading .csv file

Post by dnat »

Hi,

I have a .csv format file and it comes in the following way

For ex it has 4 fields

a,345,cd,678
b,"6,878",cd,"9,870"
c,"7,989",fg,"7,880"
d,567,cd,989

The second and the fourth fields are numeric, and whenever a value has more than three digits it arrives with a grouping comma, enclosed in double quotes.
When I try View Data, I am not able to see the 2nd and 3rd records. The issue seems to be with how the final delimiter is specified.
If I set Final delimiter to "end", the 1st and 4th records are viewable. If I set Final delimiter to "\n", the 2nd and 3rd records are viewable. Any idea how to handle this?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not seeing off the top of my head how this is related to the final delimiter. However, I'm curious: how did you define the metadata? To me they should all be defined as strings, and then the 2nd and 4th fields will need to be converted inside the job post-read...
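
To illustrate the kind of post-read conversion meant here, a minimal sketch in C (not DataStage syntax, just the strip-and-convert logic; the helper name to_number is made up for the example):

#include <stdio.h>
#include <stdlib.h>

/* Strip the double quotes and grouping commas from a field such as
   "6,878" and convert what is left to a number. Illustrative only. */
static long to_number(const char *field)
{
    char buf[64];
    size_t n = 0;

    for (const char *p = field; *p != '\0' && n < sizeof(buf) - 1; ++p) {
        if (*p != '"' && *p != ',')   /* drop quoting and grouping characters */
            buf[n++] = *p;
    }
    buf[n] = '\0';
    return strtol(buf, NULL, 10);
}

int main(void)
{
    printf("%ld\n", to_number("\"6,878\""));   /* prints 6878 */
    printf("%ld\n", to_number("345"));         /* prints 345  */
    return 0;
}

Inside the job itself this would typically be a Transformer derivation doing the equivalent strip-and-convert on those two columns.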
-craig

"You can never have too many knives" -- Logan Nine Fingers
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

Yes, they are all defined as strings, and I will be converting those fields as required inside the job.

But right now the problem is with reading the file itself. When I view data in the Sequential File stage, the 2nd and 3rd records get dropped if I specify the Final delimiter as "end" or "none". If I specify the Final delimiter as "\n", then the 1st and 4th records get dropped.

The issue is definitely in the source file, but the senders have said they will be sending the file in whatever format they already have. Is there a way to handle this on our side?
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Several things you may be able to do...

- you could drop the grouping commas from the numbers (123456 is written 123456, not 123,456)
- you could use something other than , as the field delimiter (tab, |, #, whatever)
- you could parse the entire line by hand

The first two options probably mean writing a pre-processor to convert the files. That could be a line or two of Unix magic, a short C program, or a DataStage Transformer (which becomes the third option); see the sketch below.
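
For the pre-processor route, here is a minimal sketch in C that drops the double quotes and the grouping commas inside them, so every field becomes a plain value. It assumes fields never contain embedded newlines or escaped double quotes:

#include <stdio.h>

/* Copy stdin to stdout, dropping double quotes and any commas that
   appear inside them, so "6,878" becomes 6878.
   Assumes no embedded newlines or escaped quotes inside fields. */
int main(void)
{
    int c;
    int in_quotes = 0;

    while ((c = getchar()) != EOF) {
        if (c == '"') {
            in_quotes = !in_quotes;   /* toggle state; drop the quote itself */
            continue;
        }
        if (c == ',' && in_quotes)    /* grouping comma, not a field delimiter */
            continue;
        putchar(c);
    }
    return 0;
}

Compile it and run something like strip_commas < input.csv > clean.csv before the job picks the file up; an equivalent sed or awk one-liner would do the same job.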
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Try using a Server Sequential File stage (perhaps in a Server Shared Container). This stage is much more flexible about how it handles multiple formats.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

Thanks for all your inputs. I added the "Record delimiter string" format property, set it to DOS format, and was able to read the file as it was.
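
For anyone who lands on the same symptom: different records dropping for different Final delimiter settings suggests DOS-style (or mixed) CR/LF record delimiters in the file. A quick way to check, sketched in C if od -c is not handy:

#include <stdio.h>

/* Count DOS (CR/LF) versus Unix (LF-only) line endings in a file. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "rb");
    if (fp == NULL) {
        perror(argv[1]);
        return 1;
    }

    long crlf = 0, lf = 0;
    int prev = 0, c;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            if (prev == '\r')
                ++crlf;   /* DOS-style ending */
            else
                ++lf;     /* Unix-style ending */
        }
        prev = c;
    }
    fclose(fp);
    printf("CR/LF lines: %ld, LF-only lines: %ld\n", crlf, lf);
    return 0;
}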