ANSI to UTF-8

senthil_tcs
Premium Member
Posts: 40
Joined: Tue Oct 14, 2008 3:30 pm
Location: London

Post by senthil_tcs »

Hello,

I want to convert a sequential file from ANSI to UTF-8.
I have set the NLS map to UTF-8 at the project level, and also at the stage level just to make sure. The record delimiter is set to UNIX newline.

The file is not being created in UTF-8. When I download the test file and open it in EditPlus or TextPad, the encoding shows as ANSI.

This looks very strange, and I am not sure if we are missing something. I have managed to reproduce the issue using a sample job with a Row Generator and a Sequential File stage. Any help is much appreciated. DataStage version 8.5, OS: AIX.

Thanks,
Senthil
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Have you considered that perhaps your "download" step is affecting the file format? What does it test as if you check it directly on the UNIX server using 'file'?

And you certainly don't need a DataStage job to convert the file; iconv from the command line would do it. You'd have to clarify what 'ANSI' means, though, as that is a generic Windows term. Probably 'Windows-1252'.
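
For illustration only, here is a minimal Python sketch of the same kind of conversion done outside DataStage. It assumes 'ANSI' really does mean Windows-1252 (Python's `cp1252` codec). The function names and the sample data are hypothetical and not part of the original job.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Re-encode a text file from src_encoding (assumed cp1252) to UTF-8."""
    # newline="" keeps the line endings exactly as they are in the source.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


def demo():
    """Write a small cp1252 file, convert it, and return the UTF-8 bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "ansi.csv")   # hypothetical file names
        dst = os.path.join(tmp, "utf8.csv")
        with open(src, "wb") as f:
            f.write("id,name\n1,café\n".encode("cp1252"))
        convert_to_utf8(src, dst)
        with open(dst, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo())
```

If the source contains only 7-bit characters, the output bytes will be identical to the input bytes, so the conversion will appear to have done nothing.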
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by senthil_tcs »

Thanks for your response. I tried a sample job because the original job, which transforms XML to a CSV file, is quite complex. For the sample job I created the file in Windows with the encoding set to 'ANSI' and moved it to AIX using an FTP client in binary mode. On AIX the input format shows as 'ASCII TEXT', and the output is also 'ASCII TEXT'. If the output file is in UTF-8, it shows as 'data or International Language text' on AIX. I have also tested the job with a sample UTF-8 file as the source, and the target is created in UTF-8.
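
To make the two reported results easier to interpret, here is a rough Python sketch of how an encoding sniffer might label a file's bytes. This is an assumption for illustration, not how the AIX `file` command is actually implemented, and the labels and the function name `classify_bytes` are hypothetical.

```python
def classify_bytes(data: bytes) -> str:
    """Label a byte string roughly the way an encoding sniffer might."""
    if all(b < 0x80 for b in data):
        # Every byte is 7-bit, so the content is valid ASCII, valid
        # Windows-1252 and valid UTF-8 all at once.
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # High bytes that do not form valid UTF-8 sequences.
        return "other 8-bit"
    return "utf-8"


if __name__ == "__main__":
    print(classify_bytes(b"id,name\n1,cafe\n"))
    print(classify_bytes("id,name\n1,café\n".encode("utf-8")))
    print(classify_bytes("id,name\n1,café\n".encode("cp1252")))
```

A sniffer of this kind can only report UTF-8 when at least one multi-byte sequence is present, which matches the behaviour described above.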
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Does this indicate that your problem is resolved? If so, please mark this thread as Resolved, to assist future searchers.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by senthil_tcs »

The problem is not resolved; I was responding to chulett's question. The issue is that I am unable to create a UTF-8 CSV file in the target. I have even tested this with a sample job with a Row Generator and a Sequential File stage. Is there anything I am missing? UTF-8 is set at the project level and the job level. I even tried setting it in the Sequential File stage, but the result is still the same.

Any light on this issue is much appreciated.

Thanks,
Senthil Kumar

Post by ray.wurlod »

Can you explicitly set the extended property "Unicode" for each character string?

Post by senthil_tcs »

Thanks, it's still the same. I have explicitly set the extended property across all the stages, and the output is still 'ASCII TEXT'. I assume the setting given in the stage will override any setting given in the job, project or uvconfig. At the stage, job and project level, NLS is set to UTF-8. I ask because when I checked uvconfig, the following values were set. I am particularly interested in NLSDEFSEQMAP, as its description says '# with sequential file input/output to a file or # device that has no explicit map associated with # it. Can be overridden by a SET.SEQ.MAP command'.

I am not sure if the issue is because of this.

Please share your thoughts or if you have any other suggestion please advise.

NLSDEFSEQMAP ISO8859-1

NLSMODE = 1
NLSREADELSE = 1
NLSWRITEELSE = 1
NLSDEFSOCKMAP = NONE
NLSDEFFILEMAP = ISO8859-1
NLSDEFDIRMAP = ISO8859-1+MARKS
NLSNEWFILEMAP = NONE
NLSNEWDIRMAP = ISO8859-1
NLSDEFPTRMAP = ISO8859-1
NLSDEFTERMMAP = ISO8859-1
NLSDEFDEVMAP = ISO8859-1
NLSDEFGCIMAP = NONE
NLSDEFSRVMAP = MS1252-CS
NLSDEFSEQMAP = ISO8859-1
NLSOSMAP = ISO8859-1+MARKS
NLSLCMODE = 1
NLSDEFUSERLC = US-ENGLISH
NLSDEFSRVLC = US-ENGLISH

Thanks,
Senthil
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Just to clarify, 7-bit ASCII characters are a subset of UTF-8, so they are encoded exactly the same and there is no conversion. If your entire "ANSI" source file consists of 7-bit ASCII characters then you will see no difference. Try including some character in your source file that you know will convert to a 2+ byte UTF-8 character.
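
A small Python sketch of that point, assuming "ANSI" means Windows-1252 (`cp1252` in Python). The sample strings and the helper name `encoded_pair` are made up for illustration.

```python
def encoded_pair(text):
    """Return the same text encoded as Windows-1252 and as UTF-8."""
    return text.encode("cp1252"), text.encode("utf-8")


ascii_text = "id,name\n1,Smith\n"      # 7-bit ASCII characters only
accented_text = "id,name\n1,Müller\n"  # contains one non-ASCII character

if __name__ == "__main__":
    ansi, utf8 = encoded_pair(ascii_text)
    # Pure ASCII: both encodings produce byte-for-byte identical output.
    print("ascii identical:", ansi == utf8)

    ansi, utf8 = encoded_pair(accented_text)
    # 'ü' is one byte in Windows-1252 but two bytes in UTF-8.
    print("accented identical:", ansi == utf8)
    print("byte lengths:", len(ansi), len(utf8))
```

Only the second file gives an editor or a file-type checker anything to distinguish the two encodings by.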

Mike

Post by senthil_tcs »

Hello Mike,
Thanks, I will check on this and get back to you.

Thanks,
Senthil

Post by senthil_tcs »

The file is created as UTF-8 if I pass some UTF-8 special characters. I am still not clear why the same does not happen when we pass normal characters, which are also valid UTF-8 characters. Any thoughts?

Thanks,
Senthil