Adding Tokens

Posted: Sun Jun 15, 2014 6:05 pm
by U
A new week, a new problem. We have some dirty data in which space characters are missing. For example, "ACME CHEMICALS PTYLTD" has no space between "PTY" and "LTD".

As parsed by the AUNAME rule set, this string has three classification tokens (+W+).

Is there some way in PAL to force this string to have four tokens, namely "ACME CHEMICALS PTY LTD", which would lead to a pattern of +WWO ?

We tried

Code:

+ | W | + = "PTYLTD"
COPY "PTY LTD" temp
RETYPE [3] & temp temp
and neither PTY nor LTD remained available.

We tried

Code:

+ | W | + = "PTYLTD"
RETYPE [3] W "PTY" "PTY"
RETYPE [4] O "LTD" "LTD"
but in this case PTY was kept and LTD was dropped.

I should probably add that there are worse examples that need to be solved, for example "SINGAPOREPTELTD" or "AUSTRALIAPTYLTD" which we'd like to split into three tokens.

And, ideally, we'd like to use standardized abbreviations of tokens like "LDT" (which should be "LTD" in this context).

Thank you for your time.

Posted: Sun Jun 15, 2014 7:10 pm
by stuartjvnorton
Take a look at CONVERT_R and CONVERT_S.

CONVERT_R should work for just "PTYLTD" because it would be a small set of known terms, but for some of the others you're alluding to, you would need a couple of goes at CONVERT_S to shave off one suffix at a time.

See pp. 35 and 38 of the Pattern Action Reference.
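
To make the "shave off a suffix at a time" idea concrete, here is a rough sketch of the logic in plain Python. It is not PAL and does not reproduce how CONVERT_S behaves internally; the suffix table and the function name split_crushed are hypothetical, made up purely for illustration.

```python
# Illustration only: plain Python, not PAL. The suffix table is hypothetical.
# Keys are suffixes to look for; values are the standardized form to emit.
KNOWN_SUFFIXES = {
    "LTD": "LTD",
    "LDT": "LTD",      # misspelling, standardized to LTD
    "LIMITED": "LTD",
    "PTY": "PTY",
    "PTE": "PTE",
}


def split_crushed(token, suffixes=KNOWN_SUFFIXES, max_passes=3):
    """Peel known suffixes off the right end of a token, one per pass."""
    tail = []
    for _ in range(max_passes):
        # Try longer suffixes first so LIMITED is preferred over shorter ones.
        for suffix in sorted(suffixes, key=len, reverse=True):
            # Require something to remain on the left, so a bare "PTY" is kept whole.
            if token.endswith(suffix) and len(token) > len(suffix):
                tail.insert(0, suffixes[suffix])
                token = token[: -len(suffix)]
                break
        else:
            break  # no suffix matched on this pass, so stop
    return [token] + tail


print(split_crushed("SINGAPOREPTELTD"))
print(split_crushed("PTYLDT"))
```

Blind suffix stripping like this will also split legitimate words (for example, anything ending in "PTY"), which is one reason a small table of known crushed terms, as suggested for CONVERT_R, is the safer choice when the set of bad values is limited.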

Posted: Sun Jun 15, 2014 10:17 pm
by ray.wurlod
Note, too, that if you're going to process using CONVERT_R to produce additional tokens, then you should reset at least the right margin. To be safe, reset both margins.

Posted: Sun Jun 15, 2014 10:23 pm
by U
Great advice from you both. Many thanks, we have a working solution.
For future searchers, here's what we did.

Code:

*& = "AUSTPTY", "AUSTRALIAPTY", "PTYLIMIT", "PTYLTD", "PYLTD"
CALL CrushedTokens

...

\SUB CrushedTokens

0*+ = "AUSTPTY"
CONVERT_R [1] @NMCRUSHED.TBL
SET_L_MARGIN OPERAND [BEGIN_TOKEN]
SET_R_MARGIN OPERAND [END_TOKEN]

0*+ = "AUSTRALIAPTY"
CONVERT_R [1] @NMCRUSHED.TBL
SET_L_MARGIN OPERAND [BEGIN_TOKEN]
SET_R_MARGIN OPERAND [END_TOKEN]

0*+ = "PTYLTD"
CONVERT_R [1] @NMCRUSHED.TBL
SET_L_MARGIN OPERAND [BEGIN_TOKEN]
SET_R_MARGIN OPERAND [END_TOKEN]

0*+ = "PTYLIMIT"     ; picks up PTYLIMITED
CONVERT_R [1] @NMCRUSHED.TBL
SET_L_MARGIN OPERAND [BEGIN_TOKEN]
SET_R_MARGIN OPERAND [END_TOKEN]

0*+ = "PYLTD"
CONVERT_R [1] @NMCRUSHED.TBL
SET_L_MARGIN OPERAND [BEGIN_TOKEN]
SET_R_MARGIN OPERAND [END_TOKEN]

\END_SUB
Content of NMCRUSHED.TBL follows.

Code:

;;QualityStage v8.0
\FORMAT\ SORT=Y
AUSTPTY "AUST PTY"
AUSTRALIAPTY "AUSTRALIA PTY"
PTYLIMIT "PTY LIMIT"
PTYLTD "PTY LTD"
PYLTD "PTY LTD"