DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
atulgoel
Participant



Joined: 03 Feb 2009
Posts: 73
Location: Bangalore, India
Points: 820

Post Posted: Fri Sep 08, 2017 3:35 am Reply with quote    Back to top    

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
Hi,

I have a data like below in CSV file:

1,N,"AAAAA L,BBBBBB R","CCCCCC DDDDD"" AAA FFF","NO 55""",FSFSFSD CVCVC

1 Column: 1
2 Column: N
3 Column: AAAAA L,BBBBBB R
4 Column: CCCCCC DDDDD" AAA FFF
5 Column: NO 55"
6 Column: FSFSFSD CVCVC

Can anyone suggest how to read the this csv file in datastage. I have tried reading using sequential file stage(Delimited as comma and Quotes as None), its not able to read 4 column.

_________________
Atul
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42215
Location: Denver, CO
Points: 216749

Post Posted: Fri Sep 08, 2017 7:12 am Reply with quote    Back to top    

Unless something has changed that I'm not aware of, the advice I've seen here is to use a Server job and its sequential file stage, that or use it in a Server Shared Container in your PX job. For whatever reason, it can handle those embedded quotes much more better than the Parallel version can.

_________________
-craig

Watch out where the huskies go and don't you eat that yellow snow
Rate this response:  
UCDI



Group memberships:
Premium Members

Joined: 21 Mar 2016
Posts: 267

Points: 2686

Post Posted: Fri Sep 08, 2017 8:04 am Reply with quote    Back to top    

any chance the csv was passed through a program that changes real double quotes for something that looks like double quotes? MS products are notorious for this. Does not look/sound like the issue here but it is an issue at times.
Rate this response:  
atulgoel
Participant



Joined: 03 Feb 2009
Posts: 73
Location: Bangalore, India
Points: 820

Post Posted: Fri Sep 08, 2017 9:53 am Reply with quote    Back to top    

I am able to read it properly using server sequential file stage but the issue is I have many files to read. In server job I cannot pass schema file name. I have to hardcode the column names in server sequential file stage. So i cannot use single job to read multiple csv file.

Can you suggest some other way to achieve and read csv file by passing schema file (RCP)

_________________
Atul
Rate this response:  
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42215
Location: Denver, CO
Points: 216749

Post Posted: Fri Sep 08, 2017 11:45 am Reply with quote    Back to top    

Not that I am aware of. Others may, however. While you wait you could always open a support case, see if they have any Words of Wisdom.

_________________
-craig

Watch out where the huskies go and don't you eat that yellow snow
Rate this response:  
rkashyap



Group memberships:
Premium Members

Joined: 02 Dec 2011
Posts: 498
Location: Richmond VA
Points: 4518

Post Posted: Fri Sep 08, 2017 12:48 pm Reply with quote    Back to top    

You can leverage Unix's capabilities to meet this requirement as given below:

In Seq file stage's properties
    1. Set value of FILTER to
    Code:
    nawk -F\" 'BEGIN{OFS=FS;} {for(i=1;i<=NF;i=i+2){gsub(/,/,"~",$i);} print $0;}'
    This command will replace the "delimiter commas" with "~" (assuming ~ does not occur in data), while commas within quotes will be ignored.

    2. Set value of "Delimiter" to ~ (replacee delimiter).
Rate this response:  
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42215
Location: Denver, CO
Points: 216749

Post Posted: Fri Sep 08, 2017 12:56 pm Reply with quote    Back to top    

... except they are not having issues with commas but rather embedded double quotes inside one of the double-quoted string data fields.

_________________
-craig

Watch out where the huskies go and don't you eat that yellow snow
Rate this response:  
rkashyap



Group memberships:
Premium Members

Joined: 02 Dec 2011
Posts: 498
Location: Richmond VA
Points: 4518

Post Posted: Fri Sep 08, 2017 2:08 pm Reply with quote    Back to top    

atulgoel wrote:
I have tried reading using sequential file stage(Delimited as comma and Quotes as None), its not able to read 4 column.

Craig, You have a point about embedded double quotes ... though with solution mentioned above, all columns are read, as embedded double quotes and commas etc no longer trip DataStage.
Rate this response:  
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours