ANSI to UTF-8
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 40
- Joined: Tue Oct 14, 2008 3:30 pm
- Location: London
Hello,
I want to convert a sequential file from ANSI to UTF-8 format.
I have set the NLS map to UTF-8 at the project level, and also at the stage level just to make sure. The record delimiter is set to UNIX newline.
The file is not getting created in UTF-8 format. When I download the test file and open it in the EditPlus or TextPad editor, the file encoding shows as ANSI.
This looks very strange, and I am not sure whether we are missing something. I have managed to replicate the issue using a sample job with a Row Generator and a Sequential File stage. Any help is much appreciated. DataStage version 8.5, OS: AIX.
Thanks,
Senthil
Have you considered that perhaps your "download" step is affecting the file format? What does it test as if you check it directly on the UNIX server using 'file'?
And you certainly don't need a DataStage job to convert the file; iconv from the command line would do it, but you'd have to clarify what 'ANSI' means, as that is a generic Windows term. Probably 'Windows-1252'.
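For anyone without iconv to hand, the same conversion can be sketched in Python. This is only an illustration, not a DataStage feature: the helper names `reencode` and `convert_file` are made up here, and `cp1252` (Windows-1252) is an assumption about what 'ANSI' means for this file.

```python
def reencode(data: bytes, src_encoding: str = "cp1252",
             dst_encoding: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and re-encode them.

    cp1252 is an assumed meaning of 'ANSI'; pass another codec name
    (for example "iso8859-1") if the file was written differently.
    """
    return data.decode(src_encoding).encode(dst_encoding)


def convert_file(src_path: str, dst_path: str,
                 src_encoding: str = "cp1252",
                 dst_encoding: str = "utf-8") -> None:
    """Re-encode a text file line by line, leaving line endings untouched."""
    # newline="" disables newline translation on both read and write.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)
```

A call such as `convert_file("input.csv", "output.csv")` (hypothetical file names) writes the UTF-8 copy next to the original. A byte that is undefined in the assumed source encoding raises `UnicodeDecodeError` rather than being silently replaced.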
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 40
- Joined: Tue Oct 14, 2008 3:30 pm
- Location: London
Thanks for your response. I tried a sample job because the original job, which transforms XML to a CSV file, is quite complex. For the sample job I created the source file on Windows with the encoding set to 'ANSI' and moved it to AIX using an FTP client in binary mode. On AIX the file shows as 'ASCII TEXT', and the output also shows as 'ASCII TEXT'; if the output file were in UTF-8 it would show as 'data or International Language text'. I have also tested the job with a sample UTF-8 file as the source, and the target is created in UTF-8.
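A byte-level check can take the guesswork out of what an editor or a file-type tool reports. The following is a rough sketch only: `classify_bytes` is a hypothetical helper, its labels are invented for illustration, and it is not a description of how the AIX `file` command decides.

```python
def classify_bytes(data: bytes) -> str:
    """Return a coarse label for the encoding of a byte string."""
    if not data:
        return "empty"
    # Every byte below 0x80 means pure 7-bit ASCII.
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # High bytes that do not form valid UTF-8 sequences,
        # for example a single-byte code page.
        return "other"
    return "utf-8"
```

Reading the file in binary mode, for example `classify_bytes(open(path, "rb").read())` with `path` pointing at the file on the server, avoids any influence from the transfer or the editor.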
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Premium Member
- Posts: 40
- Joined: Tue Oct 14, 2008 3:30 pm
- Location: London
The problem is not resolved; I have responded to chulett's question. The issue is that I am unable to create a UTF-8 CSV file in the target. I have even tested this with a sample job using a Row Generator and a Sequential File stage. Is there anything I am missing? UTF-8 is set at the project level and the job level. I even tried setting the same in the Sequential File stage, but the result is still the same.
Any light on this issue is much appreciated.
Thanks,
Senthil Kumar
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Premium Member
- Posts: 40
- Joined: Tue Oct 14, 2008 3:30 pm
- Location: London
Thanks, it's still the same. I have explicitly set the extended property across all the stages, but the output is still 'ASCII TEXT'. I assume a setting given in the stage will override any setting given in the job, project, or uvconfig. In the stage, job, and project the NLS map is set to UTF-8. The reason I mention this is that, when I checked uvconfig, the following values were set. I am keen on NLSDEFSEQMAP, as its comment says '# with sequential file input/output to a file or # device that has no explicit map associated with # it. Can be overridden by a SET.SEQ.MAP command'.
I am not sure whether this is the cause of the issue.
Please share your thoughts, or if you have any other suggestions, please advise.
NLSDEFSEQMAP ISO8859-1
NLSMODE = 1
NLSREADELSE = 1
NLSWRITEELSE = 1
NLSDEFSOCKMAP = NONE
NLSDEFFILEMAP = ISO8859-1
NLSDEFDIRMAP = ISO8859-1+MARKS
NLSNEWFILEMAP = NONE
NLSNEWDIRMAP = ISO8859-1
NLSDEFPTRMAP = ISO8859-1
NLSDEFTERMMAP = ISO8859-1
NLSDEFDEVMAP = ISO8859-1
NLSDEFGCIMAP = NONE
NLSDEFSRVMAP = MS1252-CS
NLSDEFSEQMAP = ISO8859-1
NLSOSMAP = ISO8859-1+MARKS
NLSLCMODE = 1
NLSDEFUSERLC = US-ENGLISH
NLSDEFSRVLC = US-ENGLISH
Thanks,
Senthil
Just to clarify, 7-bit ASCII characters are a subset of UTF-8, so they are encoded exactly the same and there is no conversion. If your entire "ANSI" source file consists of 7-bit ASCII characters then you will see no difference. Try including some character in your source file that you know will convert to a 2+ byte UTF-8 character.
Mike
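The point about 7-bit ASCII can be checked with a few lines of Python. This is a small sketch under one assumption, namely that the 'ANSI' source is Windows-1252 (`cp1252`); `utf8_growth` is a made-up helper name.

```python
def utf8_growth(text: str, source_encoding: str = "cp1252") -> int:
    """Return how many bytes the text gains when stored as UTF-8
    instead of the (assumed) single-byte source encoding."""
    return len(text.encode("utf-8")) - len(text.encode(source_encoding))


# Pure ASCII: both encodings produce exactly the same bytes.
ascii_row = "id,name\n1,Smith\n"
same_bytes = ascii_row.encode("cp1252") == ascii_row.encode("utf-8")

# One accented letter: a single byte in cp1252, two bytes in UTF-8.
accented_cp1252 = "caf\u00e9".encode("cp1252")  # ends in byte 0xE9
accented_utf8 = "caf\u00e9".encode("utf-8")     # ends in bytes 0xC3 0xA9
```

A growth of zero means the converted file is byte-for-byte identical to the source, so any tool that inspects the bytes has nothing to distinguish the two encodings by.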