Retyping to any of the known classes

Posted: Tue Dec 23, 2014 10:21 pm
by hitmanthesilentassasin
Hi,

I am trying to remove dots following initials and combine them. After combining the initials, I want to retype the token to the appropriate class, but I am not able to do so.

Below is an example of what I am trying to achieve.

Code:

Input string:
1. abc xyz Pty L.T.D
2. J.O.H.N CAFE
The output pattern I am looking to achieve is:

Code:

1. abc xyz pty LTD -> ++WO
2. JOHN CAFE -> FW
I have already tried retyping the token as shown below, but nothing seems to work.

Code:

1. RETYPE [1] & "JOHN"
2. RETYPE [1] * "JOHN"
3. RETYPE [1] + "JOHN"
Thanks for your help!!

Posted: Wed Dec 24, 2014 8:15 am
by rjdickson
Hi,

Is the output pattern significant, or just how you want to deal with the data?

The most direct way is to test for the specific words toward the top of the rule set:

*&="L" | . | &="T" | . | &="D"
RETYPE [1] O "LTD" "LTD"
RETYPE [2] 0
RETYPE [3] 0
RETYPE [4] 0
RETYPE [5] 0
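
As a rough illustration of what the rule above does, here is a small Python model (not QualityStage code). It assumes each token is a (value, class) pair, that a period is its own token with the hypothetical class ".", that unclassified words carry "+", and that "Pty" is classed W as in the desired output pattern. The function name `collapse_ltd` is made up for this sketch. As I read the rule, the first token is retyped to class O with the value "LTD" and the remaining four tokens are retyped to the null class 0 so they drop out of later pattern matching.

```python
# Illustrative model only; names and token classes are assumptions.
NULL_CLASS = "0"  # null class: token is ignored by later patterns


def collapse_ltd(tokens):
    """Find the run L . T . D and retype it the way the rule describes."""
    target = ["L", ".", "T", ".", "D"]
    out = list(tokens)
    n = len(target)
    for i in range(len(out) - n + 1):
        window = [value.upper() for value, _ in out[i:i + n]]
        if window == target:
            # First token becomes class O with the value "LTD".
            out[i] = ("LTD", "O")
            # The other four tokens are nulled out.
            for j in range(i + 1, i + n):
                out[j] = (out[j][0], NULL_CLASS)
            break
    return out


def pattern(tokens):
    """Class string of the tokens that are still visible to patterns."""
    return "".join(cls for _, cls in tokens if cls != NULL_CLASS)


tokens = [("abc", "+"), ("xyz", "+"), ("Pty", "W"),
          ("L", "+"), (".", "."), ("T", "+"), (".", "."), ("D", "+")]
print(pattern(collapse_ltd(tokens)))
```

The drawback is visible in the sketch: the target sequence is hard-coded, so every abbreviation needs its own rule.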

Note that the xxNAME rule sets already do this.

For JOHN, the same routine in xxNAME (the Periods subroutine) also outputs 'JOHN'.

I hope this helps!

Posted: Sat Dec 27, 2014 4:37 am
by hitmanthesilentassasin
Thanks for the reply, rjdickson.

I wanted to classify the tokens into the appropriate classes after suppressing noise. The dot separator was just an example. If I can classify the tokens into the appropriate classes, they will be handled by the patterns that follow; otherwise I would have to handle each word individually in the pattern.

Posted: Sat Dec 27, 2014 8:41 am
by chulett
hitmanthesilentassasin wrote: the dot separator was just an example
And yet in your post all you mentioned was "I am trying to remove dots following initials and combine them" so that's why you got the answer you did. Always best to lead with a full example / explanation of what you need so you don't have people waste time spinning up an answer that doesn't really help you.

So what exactly falls into your "noise" category?

Posted: Sat Dec 27, 2014 7:38 pm
by hitmanthesilentassasin
Craig, in the example I provided, the dot separators are the noise. My question was about retyping the cleansed tokens to the classes defined in the classification table, without having to retype each word to a specific class.
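
To make the idea concrete, here is a minimal Python sketch of "cleanse first, then let a table decide the class". It is a model of the logic only, not QualityStage syntax. The `CLASSIFICATION` dictionary stands in for a classification table, its entries are taken from the desired output patterns in the first post, and the default class "+" for unknown words is an assumption.

```python
import re

# Hypothetical stand-in for a classification table: token -> class.
CLASSIFICATION = {"PTY": "W", "LTD": "O", "JOHN": "F", "CAFE": "W"}
DEFAULT_CLASS = "+"  # assumed class for words not found in the table

# Single letters separated by dots, e.g. "L.T.D" or "J.O.H.N."
DOTTED_INITIALS = re.compile(r"^(?:[A-Za-z]\.)+[A-Za-z]\.?$")


def cleanse(word):
    """Drop the dots from dotted initials; leave other words unchanged."""
    if DOTTED_INITIALS.match(word):
        return word.replace(".", "")
    return word


def classify(text):
    """Return (token, class) pairs after cleansing each word."""
    result = []
    for word in text.split():
        token = cleanse(word)
        cls = CLASSIFICATION.get(token.upper(), DEFAULT_CLASS)
        result.append((token, cls))
    return result


def pattern(pairs):
    """Concatenate the classes into a pattern string."""
    return "".join(cls for _, cls in pairs)


print(pattern(classify("abc xyz Pty L.T.D")))
print(pattern(classify("J.O.H.N CAFE")))
```

With this shape, supporting a new word means adding a row to the table rather than writing another rule.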

Thanks for your help!!

Posted: Sun Dec 28, 2014 10:08 pm
by hitmanthesilentassasin
I managed to find the URL below, which describes exactly what I was looking for. All I need to do is maintain a separate list of classifications in the reference table.

http://www-01.ibm.com/support/knowledge ... okens.html

Posted: Mon Dec 29, 2014 9:08 am
by chulett
Excellent.