Reading Double quotes in CSV file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
atulgoel
Participant
Posts: 84
Joined: Tue Feb 03, 2009 1:09 am
Location: Bangalore, India

Reading Double quotes in CSV file

Post by atulgoel »

Hi,

I have a data like below in CSV file:

1,N,"AAAAA L,BBBBBB R","CCCCCC DDDDD"" AAA FFF","NO 55""",FSFSFSD CVCVC

1 Column: 1
2 Column: N
3 Column: AAAAA L,BBBBBB R
4 Column: CCCCCC DDDDD" AAA FFF
5 Column: NO 55"
6 Column: FSFSFSD CVCVC

Can anyone suggest how to read the this csv file in datastage. I have tried reading using sequential file stage(Delimited as comma and Quotes as None), its not able to read 4 column.
Atul
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Unless something has changed that I'm not aware of, the advice I've seen here is to use a Server job and its sequential file stage, that or use it in a Server Shared Container in your PX job. For whatever reason, it can handle those embedded quotes much more better than the Parallel version can.
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

any chance the csv was passed through a program that changes real double quotes for something that looks like double quotes? MS products are notorious for this. Does not look/sound like the issue here but it is an issue at times.
atulgoel
Participant
Posts: 84
Joined: Tue Feb 03, 2009 1:09 am
Location: Bangalore, India

Post by atulgoel »

I am able to read it properly using server sequential file stage but the issue is I have many files to read. In server job I cannot pass schema file name. I have to hardcode the column names in server sequential file stage. So i cannot use single job to read multiple csv file.

Can you suggest some other way to achieve and read csv file by passing schema file (RCP)
Atul
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not that I am aware of. Others may, however. While you wait you could always open a support case, see if they have any Words of Wisdom.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

You can leverage Unix's capabilities to meet this requirement as given below:

In Seq file stage's properties
  • 1. Set value of FILTER to

    Code: Select all

    nawk -F\" 'BEGIN{OFS=FS;} {for(i=1;i<=NF;i=i+2){gsub(/,/,"~",$i);} print $0;}'
    This command will replace the "delimiter commas" with "~" (assuming ~ does not occur in data), while commas within quotes will be ignored.

    2. Set value of "Delimiter" to ~ (replacee delimiter).
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

... except they are not having issues with commas but rather embedded double quotes inside one of the double-quoted string data fields.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Re: Reading Double quotes in CSV file

Post by rkashyap »

atulgoel wrote:I have tried reading using sequential file stage(Delimited as comma and Quotes as None), its not able to read 4 column.
Craig, You have a point about embedded double quotes ... though with solution mentioned above, all columns are read, as embedded double quotes and commas etc no longer trip DataStage.
Post Reply