Page 1 of 1

Reading Double quotes in CSV file

Posted: Fri Sep 08, 2017 3:35 am
by atulgoel
Hi,

I have a data like below in CSV file:

1,N,"AAAAA L,BBBBBB R","CCCCCC DDDDD"" AAA FFF","NO 55""",FSFSFSD CVCVC

1 Column: 1
2 Column: N
3 Column: AAAAA L,BBBBBB R
4 Column: CCCCCC DDDDD" AAA FFF
5 Column: NO 55"
6 Column: FSFSFSD CVCVC

Can anyone suggest how to read the this csv file in datastage. I have tried reading using sequential file stage(Delimited as comma and Quotes as None), its not able to read 4 column.

Posted: Fri Sep 08, 2017 7:12 am
by chulett
Unless something has changed that I'm not aware of, the advice I've seen here is to use a Server job and its sequential file stage, that or use it in a Server Shared Container in your PX job. For whatever reason, it can handle those embedded quotes much more better than the Parallel version can.

Posted: Fri Sep 08, 2017 8:04 am
by UCDI
any chance the csv was passed through a program that changes real double quotes for something that looks like double quotes? MS products are notorious for this. Does not look/sound like the issue here but it is an issue at times.

Posted: Fri Sep 08, 2017 9:53 am
by atulgoel
I am able to read it properly using server sequential file stage but the issue is I have many files to read. In server job I cannot pass schema file name. I have to hardcode the column names in server sequential file stage. So i cannot use single job to read multiple csv file.

Can you suggest some other way to achieve and read csv file by passing schema file (RCP)

Posted: Fri Sep 08, 2017 11:45 am
by chulett
Not that I am aware of. Others may, however. While you wait you could always open a support case, see if they have any Words of Wisdom.

Posted: Fri Sep 08, 2017 12:48 pm
by rkashyap
You can leverage Unix's capabilities to meet this requirement as given below:

In Seq file stage's properties
  • 1. Set value of FILTER to

    Code: Select all

    nawk -F\" 'BEGIN{OFS=FS;} {for(i=1;i<=NF;i=i+2){gsub(/,/,"~",$i);} print $0;}'
    This command will replace the "delimiter commas" with "~" (assuming ~ does not occur in data), while commas within quotes will be ignored.

    2. Set value of "Delimiter" to ~ (replacee delimiter).

Posted: Fri Sep 08, 2017 12:56 pm
by chulett
... except they are not having issues with commas but rather embedded double quotes inside one of the double-quoted string data fields.

Re: Reading Double quotes in CSV file

Posted: Fri Sep 08, 2017 2:08 pm
by rkashyap
atulgoel wrote:I have tried reading using sequential file stage(Delimited as comma and Quotes as None), its not able to read 4 column.
Craig, You have a point about embedded double quotes ... though with solution mentioned above, all columns are read, as embedded double quotes and commas etc no longer trip DataStage.