Multi-Byte Character Shifts Data and Next Field Is Not Recog


Mike: Premium Member; Posts: 1021; Joined: Sun Mar 03, 2002 6:01 pm; Location: Tampa, FL


Post by Mike » Wed Oct 30, 2013 12:57 pm

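It appears that your data is arriving encoded in UTF-8, in which the byte sequence E2 80 9D is the three-byte encoding of the RIGHT DOUBLE QUOTATION MARK character (U+201D). A field definition that counts bytes sees three bytes where you expected one character, so everything after it, including the next field, is pushed out of position.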
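As a quick check of that byte sequence, here is a minimal Python sketch (Python is only used for illustration; it is not part of the DataStage job) that decodes E2 80 9D and reports what it is:

```python
import unicodedata

# The three bytes reported in the data, written as hex.
raw = bytes.fromhex("E2809D")

# Decode them as UTF-8: three bytes collapse into one character.
ch = raw.decode("utf-8")

byte_count = len(raw)              # number of bytes on the wire
char_count = len(ch)               # number of characters after decoding
code_point = f"U+{ord(ch):04X}"    # Unicode code point
char_name = unicodedata.name(ch)   # official Unicode character name

print(byte_count, char_count, code_point, char_name)
# prints: 3 1 U+201D RIGHT DOUBLE QUOTATION MARK
```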

UTF-8 characters can be from 1 to 4 bytes in length.
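To illustrate the variable width and how it shifts a fixed-width record, here is a small Python sketch. The record layout (a 5-character name field followed by a 3-character code field) is made up for the example and is not taken from your job:

```python
# One sample character for each possible UTF-8 width.
samples = ["A", "\u00e9", "\u201d", "\U0001F600"]
widths = [len(s.encode("utf-8")) for s in samples]
print(widths)  # [1, 2, 3, 4]

# Hypothetical record: 5-character name field, then 3-character code field.
# The name contains one RIGHT DOUBLE QUOTATION MARK (3 bytes in UTF-8).
record = ("Jo\u201d  " + "ABC").encode("utf-8")

# Slicing by BYTE offsets, as a byte-oriented fixed-width reader would:
name_by_bytes = record[0:5]   # the quote's 3 bytes use up the padding
code_by_bytes = record[5:8]   # starts 2 bytes too early

# Slicing by CHARACTER offsets after decoding:
text = record.decode("utf-8")
name_by_chars = text[0:5]
code_by_chars = text[5:8]

print(code_by_bytes)  # b'  A'  (shifted)
print(code_by_chars)  # ABC    (correct)
```

The two extra bytes from the quotation mark are exactly the amount by which the second field moves.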

You will need to revisit your interface contract with the source data provider to work out a method for dealing with variable length characters.

Perhaps you could agree to use only 1-byte ASCII characters, or agree to expand character fields by a factor of 4 to accommodate the worst-case size of UTF-8 characters.
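Either agreement can be checked mechanically before the data reaches the job. The following Python sketch shows one way to do it; the function names are hypothetical helpers written for this post, not DataStage functions:

```python
def is_ascii_only(data: bytes) -> bool:
    """True if every byte is a 1-byte ASCII character (value below 0x80)."""
    return all(b < 0x80 for b in data)


def worst_case_utf8_bytes(char_count: int) -> int:
    """Byte width needed to hold char_count characters at 4 bytes each."""
    return 4 * char_count


def fits_expanded_field(text: str, char_count: int) -> bool:
    """True if text fits, in bytes, in a field sized for the worst case."""
    return len(text.encode("utf-8")) <= worst_case_utf8_bytes(char_count)


# Option 1: reject anything that is not plain ASCII.
print(is_ascii_only(b"plain text"))            # True
print(is_ascii_only(bytes.fromhex("E2809D")))  # False

# Option 2: a 10-character field becomes a 40-byte field.
print(worst_case_utf8_bytes(10))               # 40
```

The first option keeps the existing field widths but restricts the content; the second keeps the content but costs up to four times the storage per character field.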

Mike
