Multi-Byte Character Shifts Data and Next Field Is Not Recog


Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

It would appear that you are receiving your data encoded in the UTF-8 character set, in which the byte sequence E2 80 9D is the three-byte encoding of the RIGHT DOUBLE QUOTATION MARK character (U+201D).

UTF-8 characters can be from 1 to 4 bytes in length.
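
To see this for yourself, here is a small Python sketch (Python is only used for illustration here, it is not part of the DataStage job). It decodes the three bytes mentioned above and prints the encoded length of a few sample characters. The sample characters are my own picks, not taken from your data.

```python
# Decode the three bytes discussed above as UTF-8.
raw = bytes.fromhex("E2809D")
ch = raw.decode("utf-8")

# One character on screen, three bytes on disk.
print(f"U+{ord(ch):04X}: {len(ch)} character, {len(raw)} bytes")

# Illustrative samples (assumed, not from the original data),
# one for each possible UTF-8 encoded length.
samples = ["A", "\u00e9", "\u201d", "\U0001F600"]
byte_lengths = [len(s.encode("utf-8")) for s in samples]
print(byte_lengths)
```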

You will need to revisit your interface contract with the source data provider to work out a method for dealing with variable-length characters.

Perhaps you could agree to use only 1-byte ASCII characters, or agree to expand character fields by a factor of 4 to accommodate the worst-case size of UTF-8 characters.
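
As a rough illustration of why the next field goes missing, here is a Python sketch with a made-up two-field record. The field names, the widths (`NAME_WIDTH`, `CODE_WIDTH`) and the sample values are all assumptions on my part, since the actual record layout is not shown in this thread. It compares splitting the record by byte offsets with splitting it by character offsets, and shows the worst-case byte width from the "factor of 4" suggestion.

```python
# Hypothetical fixed-width layout: widths are counted in characters.
NAME_WIDTH = 10
CODE_WIDTH = 3

def build_record(name: str, code: str) -> bytes:
    """Pad each field to its character width, then encode the record as UTF-8."""
    return (name.ljust(NAME_WIDTH) + code.ljust(CODE_WIDTH)).encode("utf-8")

def split_by_bytes(record: bytes):
    """Split on byte offsets, as a reader that assumes 1 byte per character would."""
    return record[:NAME_WIDTH], record[NAME_WIDTH:NAME_WIDTH + CODE_WIDTH]

def split_by_chars(record: bytes):
    """Decode first, then split on character offsets."""
    text = record.decode("utf-8")
    return text[:NAME_WIDTH], text[NAME_WIDTH:NAME_WIDTH + CODE_WIDTH]

# Sample value (assumed) containing two 3-byte quotation marks.
record = build_record("say \u201dhi\u201d", "ABC")

# 13 characters, but 17 bytes: each quotation mark adds 2 extra bytes.
print(len(record.decode("utf-8")), len(record))

# Byte offsets cut through the second quotation mark, so the code field is wrong.
print(split_by_bytes(record))

# Character offsets recover both fields.
print(split_by_chars(record))

# Worst case for a byte-based layout: 4 bytes per character.
worst_case_name_bytes = NAME_WIDTH * 4
print(worst_case_name_bytes)
```

The other option from above, restricting the feed to 1-byte ASCII, could be checked on the provider side with something like `str.isascii()` before the file is written.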

Mike