How to process a sequential file containing quotes.
Moderators: chulett, rschirm, roy
How to process a sequential file containing quotes.
I need to load a comma separated sequential file where each column is wrapped in a pair of quotes e.g.
"Field 1", "Field 2", "Field 3"
But in some columns there are quotes within the data, e.g.
"Field 1", "New ""Field" 2"", "Field 3"
When the above csv example is loaded into Excel, it correctly parses the data as below:-
Field 1 New "Field 2" Field 3
Within the DataStage sequential stage I have specified the delimiter as commas and quote = double.
But DataStage does not seem to be able to handle to above scenario and rejects the row.
I've thought of filtering the file using an awk command for example, but shouldn't DataStage be able to handle this?
"Field 1", "Field 2", "Field 3"
But in some columns there are quotes within the data, e.g.
"Field 1", "New ""Field" 2"", "Field 3"
When the above csv example is loaded into Excel, it correctly parses the data as below:-
Field 1 New "Field 2" Field 3
Within the DataStage sequential stage I have specified the delimiter as commas and quote = double.
But DataStage does not seem to be able to handle to above scenario and rejects the row.
I've thought of filtering the file using an awk command for example, but shouldn't DataStage be able to handle this?
Re: How to process a sequential file containing quotes.
Not sure if there is a better way but the following approach should help:
1. Use .csv or .txt file as the input file
2. set delimiter as "," (comma) and quote charater as "000" (no character) in the input sequential file stage
3. using a transformer remove the first and last quotes for each field using expression:
Right(Left(DSLink.Str, Len(DSLink.Str)-1),Len(DSLink.Str)-2)
In this example DSLink is the input link and Str is the filed. It worked for me in my test.
Hope this helps.
1. Use .csv or .txt file as the input file
2. set delimiter as "," (comma) and quote charater as "000" (no character) in the input sequential file stage
3. using a transformer remove the first and last quotes for each field using expression:
Right(Left(DSLink.Str, Len(DSLink.Str)-1),Len(DSLink.Str)-2)
In this example DSLink is the input link and Str is the filed. It worked for me in my test.
Hope this helps.
This sure seems like something DataStage should be able to do automatically. It is a common way to handle quotes in databases. And if a Microsuck tool can do it, then Datastage should be able to do it with it's eyes closed.
Brad.
Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
Hi,
Also just to add to my previous comments,
if the data contains quotes, then the csv file will be saved with double double quotes. Has shown is my first post.
Correct me if I'm wrong, but using crsi suggestions, then double double quotes would be loaded, instead of the original a single double quote.
(I hope the above makes sense!).
Also just to add to my previous comments,
if the data contains quotes, then the csv file will be saved with double double quotes. Has shown is my first post.
Correct me if I'm wrong, but using crsi suggestions, then double double quotes would be loaded, instead of the original a single double quote.
(I hope the above makes sense!).
I have found a possible solution, by running the following awk script on the filter option of the sequential file stage :-
awk 'BEGIN{FS=",";OFS=","}{gsub("\"\"","",$2)}1'
This script basically strips out the quotes within column 2.
Though this has following drawbacks :-
- I am unable to use a reject link from the sequential stage since the table scehma is not defined. The actual error is 'sfrejects: A schema must be set in APT_FileExportOperator before use.'
- The above script only processes column 2, therefore for each column the gsub function would need to be specified.
p.s. thank you Shankar for the script!
awk 'BEGIN{FS=",";OFS=","}{gsub("\"\"","",$2)}1'
This script basically strips out the quotes within column 2.
Though this has following drawbacks :-
- I am unable to use a reject link from the sequential stage since the table scehma is not defined. The actual error is 'sfrejects: A schema must be set in APT_FileExportOperator before use.'
- The above script only processes column 2, therefore for each column the gsub function would need to be specified.
p.s. thank you Shankar for the script!
Were you able to reslve the answer to schema def issue here?
hiteshm wrote:I have found a possible solution, by running the following awk script on the filter option of the sequential file stage :-
awk 'BEGIN{FS=",";OFS=","}{gsub("""","",$2)}1'
This script basically strips out the quotes within column 2.
Though this has following drawbacks :-
- I am unable to use a reject link from the sequential stage since the table scehma is not defined. The actual error is 'sfrejects: A schema must be set in APT_FileExportOperator before use.'
- The above script only processes column 2, therefore for each column the gsub function would need to be specified.
p.s. thank you Shankar for the script!
Interesting first post.
girish312 - you need to include your questions in the body of the post, no-one can see your subject until they reply. And most won't even notice it, I'd wager.
girish312 - you need to include your questions in the body of the post, no-one can see your subject until they reply. And most won't even notice it, I'd wager.
girish312 wrote:Were you able to reslve the answer to schema def issue here?
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 11
- Joined: Sat Mar 12, 2011 2:31 am
- Location: united states
HI, though its an old post - i found a solution and thought worth sharing.
Am in DS 8.5 (hope this works with 8.x)
Set the Sequential Stage format to comma separated without quotes.
next got to each column in that stage, double clicking that field will take you to column metadata.
for other fields (those which dont have issues) add field level quote=double
for fields having this issue set field level quote=none && String Type Character Set=ASCII
this worked great for me
Am in DS 8.5 (hope this works with 8.x)
Set the Sequential Stage format to comma separated without quotes.
next got to each column in that stage, double clicking that field will take you to column metadata.
for other fields (those which dont have issues) add field level quote=double
for fields having this issue set field level quote=none && String Type Character Set=ASCII
this worked great for me
Thanks!
Sivanandha
Sivanandha