Removing Control and Non-ASCII characters

Amit Jaiswal
Premium Member
Posts: 38
Joined: Fri Apr 22, 2005 6:07 am

Removing Control and Non-ASCII characters

Post by Amit Jaiswal »

Hi All,
Following is the code being used in Ab Initio:
out.pd_into_ovrdrft_cd :: string_replace(re_replace(in.pd_into_od_cd, "[\001-\037|\177-\377]", " "), char_string(0), " " );

My understanding is that, within the column pd_into_od_cd, every character with a value between \001 and \037 (control characters) or between \177 and \377 (DEL and non-ASCII or non-printable characters) is replaced by a single space, and any NUL character (char_string(0)) is then replaced by a space as well.

Can we use the same expression in a Convert function? Or is there another way of doing this?
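To make the intended behaviour concrete, here is a minimal Python sketch of what the Ab Initio expression appears to do. It is not DataStage or Ab Initio code; the helper name `clean_field` is hypothetical, and it assumes the field has been decoded so that each byte maps to one code point (for example Latin-1). Note that in most regular expression dialects a `|` inside square brackets is a literal pipe rather than "or", so the sketch also replaces `|`; whether Ab Initio treats it the same way is an assumption.

```python
import re

# Same character class as the Ab Initio pattern, written with hex escapes:
# \001-\037 -> \x01-\x1f, \177-\377 -> \x7f-\xff, plus the literal '|'.
_UNWANTED = re.compile(r"[\x01-\x1f|\x7f-\xff]")


def clean_field(value: str) -> str:
    """Replace each unwanted character with one space, then each NUL with a space."""
    replaced = _UNWANTED.sub(" ", value)   # inner re_replace(...)
    return replaced.replace("\x00", " ")   # outer string_replace(..., char_string(0), " ")


if __name__ == "__main__":
    print(repr(clean_field("AB\x07C\xe9D\x00")))
```

Each unwanted character becomes its own space, so the string length is preserved and runs of unwanted characters are not collapsed.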

Thanks in advance.
-Amit
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The Ab Initio code looks like a direct interface to the UNIX tr command, which you can use as a filter in the source Sequential File stage. The syntax of the DataStage CONVERT function is very different: you explicitly specify two strings, and each character in the first string is converted to the character at the same position in the second string. So you would need to explicitly list the characters you wish to convert in one string and then supply another string of the same length filled with spaces.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Convert() does not support ranges or octal representation, but you can set up a stage variable containing a string of all the characters to be converted, and a single Convert() function can then be used to replace them with " " from a stage variable containing the same number of space characters as there are unwanted characters in the first string.

Convert(svUnwantedChars, svSpaces, InLink.TheString)

Or you could use a UNIX command such as tr (perhaps in an External Filter stage), which can handle character ranges and octal escapes as in your example.
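The two-string approach described above can be illustrated with a short Python sketch. This is not DataStage code: `build_unwanted` and `convert` are hypothetical helpers that only imitate the positional mapping, and the sketch handles just the equal-length case discussed in this thread. In a Transformer the same list would be built by concatenating Char() calls into a stage variable.

```python
def build_unwanted() -> str:
    """List every character to convert: codes 1-31 and 127-255."""
    codes = list(range(1, 32)) + list(range(127, 256))
    return "".join(chr(code) for code in codes)


def convert(from_chars: str, to_chars: str, text: str) -> str:
    """Map each character of from_chars to the character at the same
    position in to_chars (str.maketrans raises ValueError if the two
    strings differ in length)."""
    return text.translate(str.maketrans(from_chars, to_chars))


# Stand-ins for the stage variables svUnwantedChars and svSpaces.
sv_unwanted = build_unwanted()
sv_spaces = " " * len(sv_unwanted)

if __name__ == "__main__":
    print(repr(convert(sv_unwanted, sv_spaces, "A\x01B\x7fC")))
```

Building both strings once and reusing them for every row is the point of holding them in stage variables.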
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
katz
Charter Member
Posts: 52
Joined: Thu Jan 20, 2005 8:13 am

Post by katz »

Just thinking aloud, but could this translation be done using a custom NLS map?

katz
Amit Jaiswal
Premium Member
Posts: 38
Joined: Fri Apr 22, 2005 6:07 am

Post by Amit Jaiswal »

Hi,

I am sorry Ray, since I am not a premium member of this forum I am not able to see your complete response here. I am still struggling with this. Can anyone tell me how to find and replace non-ASCII/control characters in the string?
Thanks in advance.

-Amit
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

He said pretty much the same thing Arnd did. Investigate the Convert function, and use multiple Char() functions concatenated together to build the list of characters to change.
-craig

"You can never have too many knives" -- Logan Nine Fingers
us1aslam1us
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

If I remember correctly, one of the users, 'michaeld' or 'bcarlson', has written a generic C function that does this and posted it here. Do a search in the forum.
I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison(1847-1931)
olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Post by olgc »

The character can be detected with the function index(strg, char(8206), 1) for Unicode 0x200E, but it seems convert(strg, char(8206), '') doesn't recognise char(8206), so it doesn't work as expected.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... are you looking for help with that or simply providing help 11 years after the fact? :wink:
-craig
olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Post by olgc »

Another function works: ereplace(strg, char(8206), '')
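For readers who want to check the character involved, here is a small Python sketch of the same find-and-remove step. It is only an analogy to the index() and ereplace() calls above, not DataStage code, and it does not explain why convert() fails; the names `LRM` and `strip_lrm` are hypothetical.

```python
# Code point 8206 is U+200E, the LEFT-TO-RIGHT MARK.
LRM = chr(8206)


def strip_lrm(text: str) -> str:
    """Remove every U+200E from text, like ereplace(strg, char(8206), '')."""
    return text.replace(LRM, "")


if __name__ == "__main__":
    sample = "abc" + LRM + "def"
    print(sample.find(LRM))          # 0-based position of the mark
    print(repr(strip_lrm(sample)))
```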