Removing Control and Non-ASCII characters

Posted: Wed Oct 10, 2007 4:49 pm
by Amit Jaiswal
Hi All,
Following is the code being used in Ab Initio:
out.pd_into_ovrdrft_cd :: string_replace(re_replace(in.pd_into_od_cd, "[\001-\037|\177-\377]", " "), char_string(0), " " );

My understanding is that, within the column pd_into_od_cd, every occurrence of a character between \001 and \037 (control characters) or between \177 and \377 (DEL and the non-ASCII, non-printable characters) is replaced by a single space, and any remaining char_string(0) (NUL) character is then replaced by a space as well.
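For readers who do not know Ab Initio, the intent described above can be sketched in Python. This is only an illustration of the logic, not Ab Initio or DataStage code, and the name scrub is hypothetical. One assumption is labeled in the comments: the "|" is left out of the character class, because in most regular-expression dialects a pipe inside [...] is a literal character rather than an "or", so the original pattern may also replace pipe characters.

```python
import re

# Bytes 0x01-0x1F (octal \001-\037) and 0x7F-0xFF (octal \177-\377).
# Assumption: the "|" from the original pattern is omitted on purpose, since
# inside [...] it would normally match a literal pipe character.
UNWANTED = re.compile(rb"[\x01-\x1f\x7f-\xff]")


def scrub(value: bytes) -> bytes:
    """Replace every unwanted byte, and then every NUL byte, with one space."""
    cleaned = UNWANTED.sub(b" ", value)      # mirrors the re_replace(...) step
    return cleaned.replace(b"\x00", b" ")    # mirrors string_replace(..., char_string(0), " ")


print(scrub(b"A\x01B\x00C\xffD|E"))  # prints b'A B C D|E'
```

Each unwanted byte becomes its own space, so the length of the value does not change.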

Can we use the same code in a Convert function? Or is there another way of doing this?

Thanks in advance.
-Amit

Posted: Wed Oct 10, 2007 5:25 pm
by ArndW
The Ab Initio code looks like a direct equivalent of the UNIX tr command, which you can use as a filter in the source sequential stage. The syntax of the DataStage CONVERT function is very different: you explicitly specify two strings, and each character in one string is converted to the character at the same position in the other string. So you would need to explicitly list the characters you wish to convert in one string and then have another string of the same length filled with spaces.
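The position-by-position mapping described above can be illustrated with a small Python sketch. It only emulates the idea; the function name convert and its same-length check are assumptions made for this illustration, not the DataStage implementation.

```python
def convert(from_chars: str, to_chars: str, text: str) -> str:
    """Map each character of from_chars to the character at the same
    position in to_chars, everywhere it occurs in text."""
    if len(from_chars) != len(to_chars):
        # Assumption for this sketch only: both lists must be the same length.
        raise ValueError("from_chars and to_chars must have the same length")
    return text.translate(str.maketrans(from_chars, to_chars))


print(convert("abc", "xyz", "aabbcc"))    # prints xxyyzz
print(convert("\t\n", "  ", "one\ttwo\n"))  # tab and newline each become a space
```

Because the mapping is purely positional, replacing many different characters with a space means repeating the space once for every character in the first string.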

Posted: Wed Oct 10, 2007 5:27 pm
by ray.wurlod
Convert() does not support ranges or octal representation, but you can set up a stage variable containing a string of all the characters to be converted, and a single Convert() function can then be used to replace them with " " from a stage variable containing the same number of space characters as there are unwanted characters in the first string.

Convert(svUnwantedChars, svSpaces, InLink.TheString)
Or you could use a UNIX command such as tr (perhaps in an External Filter stage), which can handle character ranges and octal escapes like those in your example.

Posted: Thu Oct 11, 2007 5:03 pm
by katz
Just thinking aloud, but could this translation be done using a custom NLS map?

katz

Posted: Sat Oct 13, 2007 7:58 am
by Amit Jaiswal
Hi,

I am sorry, Ray, but since I am not a premium member of this forum I am not able to see your complete response here. I am still struggling with this. Can anyone tell me how to find and replace non-ASCII/control characters in a string?
Thanks in advance.

-Amit

Posted: Sat Oct 13, 2007 8:07 am
by chulett
He said pretty much the same thing Arnd did. Investigate the Convert function, and use multiple Char() functions concatenated together to build the list of characters to change.
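As a rough illustration of building that list, here is a Python sketch in which chr() stands in for Char(). The variable names are hypothetical, and the ranges are taken from the original question (1 to 31 and 127 to 255, that is, octal \001-\037 and \177-\377).

```python
# Every code point in the two unwanted ranges, concatenated into one string.
unwanted_chars = "".join(
    chr(code) for code in list(range(1, 32)) + list(range(127, 256))
)

# One space per unwanted character, so both strings have the same length.
spaces = " " * len(unwanted_chars)

# Position-by-position translation table, in the spirit of Convert().
table = str.maketrans(unwanted_chars, spaces)


def replace_unwanted(text: str) -> str:
    """Replace each unwanted character in text with a single space."""
    return text.translate(table)


print(len(unwanted_chars))                 # prints 160
print(replace_unwanted("a\tb\x7fc\xe9"))   # tab, DEL and 0xE9 each become a space
```

The two strings only need to be built once, which is why keeping them in stage variables was suggested earlier in the thread.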

Posted: Sat Oct 13, 2007 3:56 pm
by us1aslam1us
If I remember correctly, one of the users, 'michaeld' or 'bcarlson', wrote a generic C function that does this and posted it here. Do a search in the forum.

Posted: Fri Jun 01, 2018 12:27 pm
by olgc
The character can be detected with the function index(strg, char(8206), 1) for Unicode 0x200E, but it seems convert(strg, char(8206), '') doesn't recognise char(8206), so it doesn't work as expected.

Posted: Fri Jun 01, 2018 12:38 pm
by chulett
So... are you looking for help with that or simply providing help 11 years after the fact? :wink:

Posted: Fri Jun 01, 2018 1:11 pm
by olgc
Another function works: ereplace(strg, char(8206), '')
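For reference, code point 8206 is U+200E, the LEFT-TO-RIGHT MARK. The detect-then-remove approach from the last few posts can be sketched in Python as below. This is an illustration only, with hypothetical names, and it says nothing about how DataStage handles the character internally. Note also that Ray's earlier Convert() example passes the input string as the third argument, which may be related to the convert call above not behaving as expected.

```python
LRM = chr(8206)  # U+200E LEFT-TO-RIGHT MARK, the same value as char(8206)


def strip_lrm(text: str) -> str:
    """Remove every LEFT-TO-RIGHT MARK from text."""
    return text.replace(LRM, "")


sample = "ab" + LRM + "cd"
print(sample.find(LRM))    # prints 2 (zero-based position of the mark)
print(strip_lrm(sample))   # prints abcd
```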