
Junk characters in Sequential File stage viewer

Posted: Fri Dec 14, 2018 8:36 am
by rumu
Hi All,

I am reading a COBOL EBCDIC file with the CFF (Complex Flat File) stage and loading it into a Sequential File stage. Two fields are defined in the CFF stage as PIC X(2) and PIC X(1); the Record Layout shows them as CHARACTER 2 and CHARACTER 1 respectively.

I mapped those two fields directly to the Sequential File stage with the data types CHAR(2) and CHAR(1).
In the DataStage file viewer, some values of the 2-character field are shown as:

Code:

?|
The second character looks like |, but I cannot copy it: when I paste, only ? appears.
I used the StringToRaw function to display the raw value and got the following:

Code:

{1a 18}
Are these CAN and line feed in hex? How do I remove them? When I view the file in Unix, nothing is shown.
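The two bytes can be checked against the ASCII control-character table: 0x1A is SUB (substitute) and 0x18 is CAN (cancel), while a line feed would be 0x0A. The Python sketch below (the names `C0_NAMES` and `describe_bytes` are made up for illustration) labels each byte of a raw value:

```python
# Standard ASCII C0 control-character abbreviations, indexed by byte value 0x00-0x1F.
C0_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def describe_bytes(raw: bytes) -> list:
    """Label each byte: control characters by ASCII abbreviation, others as their character."""
    labels = []
    for b in raw:
        if b < 0x20:
            labels.append(f"0x{b:02x} {C0_NAMES[b]}")
        elif b == 0x7F:
            labels.append("0x7f DEL")
        else:
            labels.append(f"0x{b:02x} {chr(b)!r}")
    return labels


# The StringToRaw output {1a 18} from the post.
print(describe_bytes(bytes.fromhex("1a18")))  # ['0x1a SUB', '0x18 CAN']
```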

When I output the column in Unix, it has two distinct values:

RE and blank.

I used the following command to display them with hexdump:

Code:

-bash-4.2$ cat RDTDP.txt|cut -d'|' -f1|sort|uniq|hexdump
0000000 181a 1a0a 0a1a 4552 000a
0000009
How can I convert these foreign characters to spaces?
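When reading that dump, note that plain `hexdump` with no options prints 16-bit words in the host's byte order, so on a little-endian (x86) machine each byte pair appears swapped; `hexdump -C` or `od -c` shows the bytes in file order. The Python sketch below rebuilds the byte stream from the quoted output, assuming a little-endian host (the helper name `words_to_bytes` is hypothetical):

```python
import struct

# The hexdump output quoted in the post: an offset column, then 16-bit words.
HEXDUMP_LINE = "0000000 181a 1a0a 0a1a 4552 000a"
TOTAL_BYTES = 0x9  # taken from the final offset line "0000009"


def words_to_bytes(line: str, total: int) -> bytes:
    """Rebuild file bytes from default hexdump words, assuming a little-endian host."""
    words = [int(w, 16) for w in line.split()[1:]]  # skip the offset column
    data = b"".join(struct.pack("<H", w) for w in words)
    return data[:total]  # drop the zero byte hexdump uses to pad an odd-length tail


raw = words_to_bytes(HEXDUMP_LINE, TOTAL_BYTES)
print(raw)                    # the bytes in file order
print(raw.split(b"\n")[:-1])  # one entry per line that sort | uniq printed
```

Read this way, `sort | uniq` printed three lines: `\x1a\x18`, `\x1a\x1a` and `RE`. The first two consist only of control characters, which is presumably why both look blank in the terminal.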

Posted: Mon Dec 17, 2018 1:06 am
by ray.wurlod
What character set are you using? Could these be double-byte representations of Unicode characters?

Posted: Mon Dec 17, 2018 6:47 am
by rumu
Hi Ray,

The NLS map is set to the project default (UTF-8).

I used the following derivation in the Transformer, and those characters no longer appear:
Trim(Trim(DSLink3.RDT_ADDL_SEG_KEY_PROD,char(24)),char(26))
I used 24 as the decimal value of hex 18 and 26 as the decimal value of hex 1A.
Is that approach OK?
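The decimal values check out: 0x18 = 1×16 + 8 = 24 and 0x1A = 1×16 + 10 = 26. Trimming does shorten the value, though, instead of turning the characters into spaces. The Python sketch below is not DataStage code; `strip_controls` only roughly mirrors the leading/trailing removal of the nested `Trim()` calls, and both function names are hypothetical. It contrasts stripping with translating the two characters to spaces:

```python
# Byte values from the thread: hex 18 = decimal 24 (CAN), hex 1A = decimal 26 (SUB).
assert 0x18 == 24 and 0x1A == 26

CAN, SUB = chr(24), chr(26)

# Translation table that maps both control characters to a space.
TO_SPACE = str.maketrans({CAN: " ", SUB: " "})


def strip_controls(value: str) -> str:
    """Remove leading/trailing CAN, then SUB, in the same order as the nested Trim() calls."""
    return value.strip(CAN).strip(SUB)


def controls_to_space(value: str) -> str:
    """Replace every CAN/SUB with a space, so a 2-character value stays 2 characters long."""
    return value.translate(TO_SPACE)


for sample in ["\x1a\x18", "\x1a\x1a", "RE"]:
    print(repr(strip_controls(sample)), repr(controls_to_space(sample)))
```

Translating keeps the CHAR(2) width explicit rather than leaving it to however the stage pads a shorter value.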

Posted: Tue Dec 18, 2018 12:46 am
by ray.wurlod
Who knows? You've condemned what might be valid characters to be "junk". I'd examine that assumption pretty closely.
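One way to examine that assumption is to work back to the source bytes. The thread does not say which EBCDIC code page the file uses, so IBM code page 037 is only a guess here, with Python's `cp037` codec standing in for the CFF stage's conversion. In that code page, U+001A comes from EBCDIC 0x3F (the EBCDIC SUB control) and U+0018 from EBCDIC 0x18 (CAN). The helper name below is hypothetical:

```python
# Assumption: the file is IBM code page 037 EBCDIC; Python's "cp037" codec
# stands in for whatever conversion the CFF stage actually performed.
CODEC = "cp037"


def ebcdic_source_bytes(converted: str, codec: str = CODEC) -> bytes:
    """Return the EBCDIC bytes that this codec decodes to the given characters."""
    return converted.encode(codec)


# The converted value reported by StringToRaw: U+001A U+0018.
source = ebcdic_source_bytes("\x1a\x18")
print(source.hex())                        # hex of the assumed EBCDIC source bytes
print(source.decode(CODEC) == "\x1a\x18")  # confirm the mapping round-trips
```

Under this assumption the source bytes would be 0x3F 0x18, which are not printable text in EBCDIC either; comparing them with a hex dump of the original file would confirm or rule that out before the values are discarded.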

Posted: Tue Dec 18, 2018 8:38 am
by rumu
Hi Ray,

I used the StringToRaw function to check the values. How can I tell whether it is a double-byte character?
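Under a UTF-8 map, a multi-byte character never contains a byte below 0x80: the lead byte of a multi-byte sequence is in the range 0xC2-0xF4 and every continuation byte is in 0x80-0xBF. Bytes such as 0x1A and 0x18 are therefore complete one-byte characters on their own. A minimal Python sketch (the helper name is made up) that reports how many bytes each decoded character occupies:

```python
def utf8_char_widths(raw: bytes) -> list:
    """Decode raw bytes as UTF-8 and pair each code point with its encoded byte length."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if the bytes are not valid UTF-8
    return [(f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


print(utf8_char_widths(bytes.fromhex("1a18")))    # the StringToRaw value from the thread
print(utf8_char_widths("\u00e9".encode("utf-8")))  # a genuine two-byte character, for contrast
```

For `{1a 18}` this reports two separate one-byte characters, U+001A and U+0018, whereas `é` (U+00E9) takes two bytes.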