Problem with handling quote character

Archive of postings to DataStageUsers@Oliver.com. This forum intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

Locked
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Problem with handling quote character

Post by admin »

Hello people,

I have an input sequential file coming from an external source. For
simplicity, I have created a simple file which I am trying to highlight the
problem.

Example:
1,"Paul Ko","abc street"
2,"Paul"Ko","abc street"
3,"Paul,Ko","abc street"

To read the file, I have to use the sequential file stage, set the delimiter
to comma and set the quote character to double quote.

When I do that, read warning will result on the second row. What datastage
will do is to remove the quote character and become "PaulKo" instead. This
is not what I wish to happen.


I tried another way which is to set the delimited to comma but set the quote
character to 000 in the sequential file stage. Then I can use the trim
function to remove the first and the last quote. Unfortunately, this will
result in error on the third row as the name "Paul,Ko" has an extra comma
within it.

Any suggestion.


Paul Ko
TelstraClear - Data Warehouse Consultant
<b>PLEASE READ</b>
Do not contact admin unless you have technical support or account questions. Do not send email or Private Messages about discussion topics to ADMIN. Contact the webmaster concerning abusive or offensive posts.
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Problem with handling quote character

Post by admin »

You really shouldn't be putting address information into a CSV file as addresses frequently contain both commas and quotes embedded into the middle of fields. The best solution is to get your supplier to change to another delimiter such as the pipe | character.

Your biggest problem is the mix of numeric and text fields which means some fields have commas and quotes for delimiters and some just have commas. If you read the row in as a single big chunk to text you will have a problem with the mix between text fields with surrounding quotes with numbers and columns in them and geniune numeric fields.

ie
3,"Paul Ko","1, 21 abc street"

In this case you would parse the comma in 3," as a delimiter but not the comma in "1, 21 abc. How do you tell the difference?

You will make your parsing easier if you can split the incoming records into numeric fields and text fields, this depends on the order in which field appear. If the non string fields appear at the start and end of the record you can split them off with the FIELD command assuming the comma is a safe delimiter. You can then parse the remainder of the row which consists of quote delimited text fields by assuming that "," is the delimiter and not just ,. It is highly unlikely that you would get "," inside a text field.

regards
Vincent

-----Original Message-----
From: David Barham [mailto:David.Barham@Anglocoal.com.au]
Sent: Fri 23/01/2004 10:27 AM
To: 'datastage-users@oliver.com'
Cc:
Subject: RE: Problem with handling quote character



I'm not quite sure how you expect DataStage to apply the logic to deal with
this. What is generating the CSV file? It contravenes any conventions for
CSV files. Tools that generate CSV files should do something special with
embedded delimiters.

If you really have to deal with that sort of input, you might need to read
the entire record as a single field and then write your own function to
extract the individual fields. Or, use your own function to "tidy" the line
then the FIELD function to extract the fields.


-----Original Message-----
From: Paul Ko [mailto:Paul.Ko@team.telstraclear.co.nz]
Sent: Friday, 23 January 2004 9:16 AM
To: datastage-users@oliver.com
Subject: Problem with handling quote character

Hello people,

I have an input sequential file coming from an external source. For
simplicity, I have created a simple file which I am trying to highlight the
problem.

Example:
1,"Paul Ko","abc street"
2,"Paul"Ko","abc street"
3,"Paul,Ko","abc street"

To read the file, I have to use the sequential file stage, set the delimiter
to comma and set the quote character to double quote.

When I do that, read warning will result on the second row. What datastage
will do is to remove the quote character and become "PaulKo" instead. This
is not what I wish to happen.


I tried another way which is to set the delimited to comma but set the quote
character to 000 in the sequential file stage. Then I can use the trim
function to remove the first and the last quote. Unfortunately, this will
result in error on the third row as the name "Paul,Ko" has an extra comma
within it.

Any suggestion.


Paul Ko
TelstraClear - Data Warehouse Consultant




**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

This footnote also confirms that this email message has been swept by
MIMEsweeper for the presence of computer viruses.

www.mimesweeper.com
**********************************************************************
<b>PLEASE READ</b>
Do not contact admin unless you have technical support or account questions. Do not send email or Private Messages about discussion topics to ADMIN. Contact the webmaster concerning abusive or offensive posts.
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Problem with handling quote character

Post by admin »

Paul,

Can the software creating the file use a pipe | as the delimiter? Can the sequential file stage recognize a pipe as the delimiter? I have very rarely seen a pipe in data. If you have these perhaps some other esoteric character may work.


Regards,
Jack

-----Original Message-----
From: Paul Ko [mailto:Paul.Ko@team.telstraclear.co.nz]
Sent: Thursday, January 22, 2004 6:16 PM
To: datastage-users@oliver.com
Subject: Problem with handling quote character

Hello people,

I have an input sequential file coming from an external source. For
simplicity, I have created a simple file which I am trying to highlight the
problem.

Example:
1,"Paul Ko","abc street"
2,"Paul"Ko","abc street"
3,"Paul,Ko","abc street"

To read the file, I have to use the sequential file stage, set the delimiter
to comma and set the quote character to double quote.

When I do that, read warning will result on the second row. What datastage
will do is to remove the quote character and become "PaulKo" instead. This
is not what I wish to happen.


I tried another way which is to set the delimited to comma but set the quote
character to 000 in the sequential file stage. Then I can use the trim
function to remove the first and the last quote. Unfortunately, this will
result in error on the third row as the name "Paul,Ko" has an extra comma
within it.

Any suggestion.


Paul Ko
TelstraClear - Data Warehouse Consultant



_______________________________________________
datastage-users mailing list
datastage-users@oliver.com
http://www.oliver.com/mailman/listinfo/datastage-users
<b>PLEASE READ</b>
Do not contact admin unless you have technical support or account questions. Do not send email or Private Messages about discussion topics to ADMIN. Contact the webmaster concerning abusive or offensive posts.
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Problem with handling quote character

Post by admin »

Hi, Paul

The easiest way is to re-generate this sequential file using pipe or
ampersand ' & ', or ' ^ ' as delimeter.

By the way, this issue happened before while I worked in TelstraClear, and I
remember that the ETL team had all agreed to use ' ^ ' as delimeter for all
generated sequential files.

xiong

-----Original Message-----
From: Paul Ko [mailto:Paul.Ko@team.telstraclear.co.nz]
Sent: 2004?1?23? 12:16
To: datastage-users@oliver.com
Subject: Problem with handling quote character


Hello people,

I have an input sequential file coming from an external source. For
simplicity, I have created a simple file which I am trying to highlight the
problem.

Example:
1,"Paul Ko","abc street"
2,"Paul"Ko","abc street"
3,"Paul,Ko","abc street"

To read the file, I have to use the sequential file stage, set the delimiter
to comma and set the quote character to double quote.

When I do that, read warning will result on the second row. What datastage
will do is to remove the quote character and become "PaulKo" instead. This
is not what I wish to happen.


I tried another way which is to set the delimited to comma but set the quote
character to 000 in the sequential file stage. Then I can use the trim
function to remove the first and the last quote. Unfortunately, this will
result in error on the third row as the name "Paul,Ko" has an extra comma
within it.

Any suggestion.


Paul Ko
TelstraClear - Data Warehouse Consultant


_______________________________________________
datastage-users mailing list
datastage-users@oliver.com
http://www.oliver.com/mailman/listinfo/datastage-users
<b>PLEASE READ</b>
Do not contact admin unless you have technical support or account questions. Do not send email or Private Messages about discussion topics to ADMIN. Contact the webmaster concerning abusive or offensive posts.
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Problem with handling quote character

Post by admin »

Hello People,

Thank you for all your valuable input. The scenario is a little bit odd
here in that the file is sent from a bank. The names that is given in the
file is signed and agreeded in the customer contract, so the names cannot be
changed.

Anyway, there are only two records in the file that are in exception to the
normal CSV format and we just have to deal with them manually.

Thank you for all your suggestions.

Paul Ko
TelstraClear - Data Warehouse Consultant


-----Original Message-----
From: Xiong Li [mailto:xj.liang@paradise.net.nz]
Sent: Sunday, 25 January, 2004 10:19
To: datastage-users@oliver.com
Subject: RE: Problem with handling quote character


Hi, Paul

The easiest way is to re-generate this sequential file using pipe or
ampersand ' & ', or ' ^ ' as delimeter.

By the way, this issue happened before while I worked in TelstraClear, and I
remember that the ETL team had all agreed to use ' ^ ' as delimeter for all
generated sequential files.

xiong

-----Original Message-----
From: Paul Ko [mailto:Paul.Ko@team.telstraclear.co.nz]
Sent: 2004?1?23? 12:16
To: datastage-users@oliver.com
Subject: Problem with handling quote character


Hello people,

I have an input sequential file coming from an external source. For
simplicity, I have created a simple file which I am trying to highlight the
problem.

Example:
1,"Paul Ko","abc street"
2,"Paul"Ko","abc street"
3,"Paul,Ko","abc street"

To read the file, I have to use the sequential file stage, set the delimiter
to comma and set the quote character to double quote.

When I do that, read warning will result on the second row. What datastage
will do is to remove the quote character and become "PaulKo" instead. This
is not what I wish to happen.


I tried another way which is to set the delimited to comma but set the quote
character to 000 in the sequential file stage. Then I can use the trim
function to remove the first and the last quote. Unfortunately, this will
result in error on the third row as the name "Paul,Ko" has an extra comma
within it.

Any suggestion.


Paul Ko
TelstraClear - Data Warehouse Consultant


_______________________________________________
datastage-users mailing list
datastage-users@oliver.com
http://www.oliver.com/mailman/listinfo/datastage-users
_______________________________________________
datastage-users mailing list
datastage-users@oliver.com
http://www.oliver.com/mailman/listinfo/datastage-users
<b>PLEASE READ</b>
Do not contact admin unless you have technical support or account questions. Do not send email or Private Messages about discussion topics to ADMIN. Contact the webmaster concerning abusive or offensive posts.
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Problem with handling quote character

Post by admin »

That figures. The question "which bank" pops to mind, but you don't have to
answer that. :)

It's not an option to tell the bank that their CSV format data is INVALID
and ask them to fix it???

-----Original Message-----
From: datastage-users-bounces@oliver.com
[mailto:datastage-users-bounces@oliver.com]On Behalf Of Paul Ko
Sent: Tuesday, 27 January 2004 6:25 AM
To: datastage-users@oliver.com
Subject: RE: Problem with handling quote character

Hello People,

Thank you for all your valuable input. The scenario is a little bit odd
here in that the file is sent from a bank. The names that is given in the
file is signed and agreeded in the customer contract, so the names cannot be
changed.

Anyway, there are only two records in the file that are in exception to the
normal CSV format and we just have to deal with them manually.

Thank you for all your suggestions.

Paul Ko
TelstraClear - Data Warehouse Consultant


-----Original Message-----
From: Xiong Li [mailto:xj.liang@paradise.net.nz]
Sent: Sunday, 25 January, 2004 10:19
To: datastage-users@oliver.com
Subject: RE: Problem with handling quote character


Hi, Paul

The easiest way is to re-generate this sequential file using pipe or
ampersand ' & ', or ' ^ ' as delimeter.

By the way, this issue happened before while I worked in TelstraClear, and I
remember that the ETL team had all agreed to use ' ^ ' as delimeter for all
generated sequential files.

xiong

-----Original Message-----
From: Paul Ko [mailto:Paul.Ko@team.telstraclear.co.nz]
Sent: 2004?1?23? 12:16
To: datastage-users@oliver.com
Subject: Problem with handling quote character


Hello people,

I have an input sequential file coming from an external source. For
simplicity, I have created a simple file which I am trying to highlight the
problem.

Example:
1,"Paul Ko","abc street"
2,"Paul"Ko","abc street"
3,"Paul,Ko","abc street"

To read the file, I have to use the sequential file stage, set the delimiter
to comma and set the quote character to double quote.

When I do that, read warning will result on the second row. What datastage
will do is to remove the quote character and become "PaulKo" instead. This
is not what I wish to happen.


I tried another way which is to set the delimited to comma but set the quote
character to 000 in the sequential file stage. Then I can use the trim
function to remove the first and the last quote. Unfortunately, this will
result in error on the third row as the name "Paul,Ko" has an extra comma
within it.

Any suggestion.


Paul Ko
TelstraClear - Data Warehouse Consultant


_______________________________________________
datastage-users mailing list
datastage-users@oliver.com
http://www.oliver.com/mailman/listinfo/datastage-users
_______________________________________________
datastage-users mailing list
datastage-users@oliver.com
http://www.oliver.com/mailman/listinfo/datastage-users


_______________________________________________
datastage-users mailing list
datastage-users@oliver.com
http://www.oliver.com/mailman/listinfo/datastage-users
<b>PLEASE READ</b>
Do not contact admin unless you have technical support or account questions. Do not send email or Private Messages about discussion topics to ADMIN. Contact the webmaster concerning abusive or offensive posts.
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Problem with handling quote character

Post by admin »

I'm prepared to bet that there's nothing in that contract that allows them to break the industry-standard formatting rules for delimited (CSV) files, by including quote characters in strings enclosed in the same quote character. If I'm right, you've probably got an argument to force them to do things right.

----- Original Message -----
From: Paul Ko
Date: Tue, 27 Jan 2004 09:24:34 +1300
To: datastage-users@oliver.com
Subject: RE: Problem with handling quote character

> Hello People,
>
> Thank you for all your valuable input. The scenario is a little bit odd
> here in that the file is sent from a bank. The names that is given in the
> file is signed and agreeded in the customer contract, so the names cannot be
> changed.
>
> Anyway, there are only two records in the file that are in exception to the
> normal CSV format and we just have to deal with them manually.
>
> Thank you for all your suggestions.
>
> Paul Ko
> TelstraClear - Data Warehouse Consultant
>
>

_______________________________________________
datastage-users mailing list
datastage-users@oliver.com
http://www.oliver.com/mailman/listinfo/datastage-users
<b>PLEASE READ</b>
Do not contact admin unless you have technical support or account questions. Do not send email or Private Messages about discussion topics to ADMIN. Contact the webmaster concerning abusive or offensive posts.
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Problem with handling quote character

Post by admin »

Ray,

I think you probably misunderstand. I suspect the contract itself has
nothing to do with the data warehouse or csv files itself, rather the
contract is between the bank and the customer, and the contract information
is to be stored in the data warehouse or repository. That is all. So really
the problem is the specification of the extract, in that the extract cannot
be a csv file by definition of the REAL world data.

Phil.

Phil Walker
+64 21 336294
phil.walker@infocusp.co.nz
infocusp limited
\\ PO Box 77032, Auckland New Zealand

-----Original Message-----
From: datastage-users-bounces@oliver.com
[mailto:datastage-users-bounces@oliver.com]On Behalf Of Ray Wurlod
Sent: Wednesday, January 28, 2004 8:49 AM
To: DataStage Users Discussion List
Subject: RE: Problem with handling quote character

I'm prepared to bet that there's nothing in that contract that allows them
to break the industry-standard formatting rules for delimited (CSV) files,
by including quote characters in strings enclosed in the same quote
character. If I'm right, you've probably got an argument to force them to
do things right.

----- Original Message -----
From: Paul Ko
Date: Tue, 27 Jan 2004 09:24:34 +1300
To: datastage-users@oliver.com
Subject: RE: Problem with handling quote character

> Hello People,
>
> Thank you for all your valuable input. The scenario is a little bit odd
> here in that the file is sent from a bank. The names that is given in the
> file is signed and agreeded in the customer contract, so the names cannot
be
> changed.
>
> Anyway, there are only two records in the file that are in exception to
the
> normal CSV format and we just have to deal with them manually.
>
> Thank you for all your suggestions.
>
> Paul Ko
> TelstraClear - Data Warehouse Consultant
>
>

_______________________________________________
datastage-users mailing list
datastage-users@oliver.com
http://www.oliver.com/mailman/listinfo/datastage-users


_______________________________________________
datastage-users mailing list
datastage-users@oliver.com
http://www.oliver.com/mailman/listinfo/datastage-users
<b>PLEASE READ</b>
Do not contact admin unless you have technical support or account questions. Do not send email or Private Messages about discussion topics to ADMIN. Contact the webmaster concerning abusive or offensive posts.
Locked