Page 1 of 2

How to process a sequential file containing quotes.

Posted: Thu Jul 31, 2008 10:03 am
by hiteshm
I need to load a comma separated sequential file where each column is wrapped in a pair of quotes e.g.

"Field 1", "Field 2", "Field 3"

But in some columns there are quotes within the data, e.g.

"Field 1", "New ""Field" 2"", "Field 3"

When the above csv example is loaded into Excel, it correctly parses the data as below:-

Field 1 New "Field 2" Field 3

Within the DataStage sequential stage I have specified the delimiter as commas and quote = double.

But DataStage does not seem to be able to handle to above scenario and rejects the row.

I've thought of filtering the file using an awk command for example, but shouldn't DataStage be able to handle this?

Posted: Thu Jul 31, 2008 12:30 pm
by Raftsman
I have had this happen and in order to get it working, I substituted CSV for TXT and tab delimited. I found that the comma delimited sometime give me a headache.

Re: How to process a sequential file containing quotes.

Posted: Thu Jul 31, 2008 3:30 pm
by csri
Not sure if there is a better way but the following approach should help:

1. Use .csv or .txt file as the input file
2. set delimiter as "," (comma) and quote charater as "000" (no character) in the input sequential file stage
3. using a transformer remove the first and last quotes for each field using expression:
Right(Left(DSLink.Str, Len(DSLink.Str)-1),Len(DSLink.Str)-2)

In this example DSLink is the input link and Str is the filed. It worked for me in my test.

Hope this helps. :wink:

Posted: Thu Jul 31, 2008 4:31 pm
by bcarlson
This sure seems like something DataStage should be able to do automatically. It is a common way to handle quotes in databases. And if a Microsuck tool can do it, then Datastage should be able to do it with it's eyes closed.

Brad.

Posted: Thu Jul 31, 2008 5:56 pm
by chulett
I believe it works in Server but not in PX for some reason. Search for topics on the subject by Phil Hibbs and his problems with a rusty 6" nail.

Posted: Thu Jul 31, 2008 6:22 pm
by crouse
I don't believe it's ever worked in Server version either. Still doesn't on 7.5.2

Posted: Thu Jul 31, 2008 7:36 pm
by chulett
Hmmm... maybe it was the fact that he had a Server job that could produce the file but no PX job could read it. [shrug]

Posted: Fri Aug 01, 2008 4:18 am
by hiteshm
Thank to everyone for their comments.

With regards to csri suggestion, what would happen if the column contained commas?

eg,

Address
--------------
"5 My street, Big town"


I am still surprised why DataStage cannot automatically handle these situations!!!

Posted: Fri Aug 01, 2008 4:34 am
by hiteshm
Hi,

Also just to add to my previous comments,

if the data contains quotes, then the csv file will be saved with double double quotes. Has shown is my first post.

Correct me if I'm wrong, but using crsi suggestions, then double double quotes would be loaded, instead of the original a single double quote.

(I hope the above makes sense!).

Posted: Fri Aug 01, 2008 8:05 am
by csri
I tried writting to a sequential file and I have set the quote character to "000" on the target seqeuntial file too.

I had two fields in the input record:
input record: "A",""AA""""
output record: A,"AA"""

If you have a comma within the data then this approach doesn't work. :(

Posted: Fri Aug 01, 2008 9:21 am
by hiteshm
I have found a possible solution, by running the following awk script on the filter option of the sequential file stage :-

awk 'BEGIN{FS=",";OFS=","}{gsub("\"\"","",$2)}1'

This script basically strips out the quotes within column 2.

Though this has following drawbacks :-

- I am unable to use a reject link from the sequential stage since the table scehma is not defined. The actual error is 'sfrejects: A schema must be set in APT_FileExportOperator before use.'

- The above script only processes column 2, therefore for each column the gsub function would need to be specified.

p.s. thank you Shankar for the script!

Posted: Mon Aug 04, 2008 8:55 am
by csri
If your second field value is ""New' Field2"" What will be your output? Does your solution handle if there are two double quotes at the begining or at the end of the filed value as in the example.

Were you able to reslve the answer to schema def issue here?

Posted: Wed Aug 13, 2008 6:14 pm
by girish312
hiteshm wrote:I have found a possible solution, by running the following awk script on the filter option of the sequential file stage :-

awk 'BEGIN{FS=",";OFS=","}{gsub("""","",$2)}1'

This script basically strips out the quotes within column 2.

Though this has following drawbacks :-

- I am unable to use a reject link from the sequential stage since the table scehma is not defined. The actual error is 'sfrejects: A schema must be set in APT_FileExportOperator before use.'

- The above script only processes column 2, therefore for each column the gsub function would need to be specified.

p.s. thank you Shankar for the script!

Posted: Wed Aug 13, 2008 8:38 pm
by chulett
:? Interesting first post.

girish312 - you need to include your questions in the body of the post, no-one can see your subject until they reply. And most won't even notice it, I'd wager.
girish312 wrote:Were you able to reslve the answer to schema def issue here?

Posted: Tue Dec 31, 2013 2:22 pm
by sivanandha
HI, though its an old post - i found a solution and thought worth sharing.
Am in DS 8.5 (hope this works with 8.x)

Set the Sequential Stage format to comma separated without quotes.
next got to each column in that stage, double clicking that field will take you to column metadata.
for other fields (those which dont have issues) add field level quote=double
for fields having this issue set field level quote=none && String Type Character Set=ASCII


this worked great for me