Adding Tokens

Infosphere's Quality Product


U
Participant
Posts: 230
Joined: Tue Apr 17, 2007 8:23 pm
Location: Singapore


Post by U »

A new week, a new problem. We have some dirty data in which space characters are missing. For example, "ACME CHEMICALS PTYLTD" has no space between "PTY" and "LTD".

As parsed by the AUNAME rule set, this string has three classified tokens (pattern +W+).

Is there some way in PAL to force this string to have four tokens, namely "ACME CHEMICALS PTY LTD", which would lead to a pattern of +WWO ?

We tried

Code: Select all

+ | W | + = "PTYLTD"
COPY "PTY LTD" temp
RETYPE [3] & temp temp
and neither PTY nor LTD remained available.

We tried

Code: Select all

+ | W | + = "PTYLTD"
RETYPE [3] W "PTY" "PTY"
RETYPE [4] O "LTD" "LTD"
but in this case PTY was kept and LTD was dropped.

I should probably add that there are worse examples that need to be solved, for example "SINGAPOREPTELTD" or "AUSTRALIAPTYLTD" which we'd like to split into three tokens.

And, ideally, we'd like to use standardized abbreviations of tokens like "LDT" (which should be "LTD" in this context).

Thank you for your time.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Take a look at CONVERT_R and CONVERT_S.

CONVERT_R should work for just "PTYLTD" because that would be a small set of known terms, but for some of the others you're alluding to you would need a couple of goes at CONVERT_S to shave off one suffix at a time.

See pp. 35 and 38 of the Pattern Action Reference.
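
To make the "shave off one suffix at a time" idea concrete, here is a minimal sketch in plain Python. It is not PAL and does not attempt to reproduce the exact behaviour of CONVERT_S; the suffix list and the function name `shave_suffixes` are assumptions made up for illustration.

```python
# Illustration only: plain Python, not PAL. The suffix list is an assumption.
KNOWN_SUFFIXES = ["LIMITED", "LTD", "PTY", "PTE"]


def shave_suffixes(token, suffixes=KNOWN_SUFFIXES):
    """Peel known suffixes off the right-hand end of a crushed token.

    Each pass of the outer loop removes at most one suffix, so a token
    such as SINGAPOREPTELTD needs two passes.
    """
    parts = []
    remaining = token
    while True:
        for suffix in suffixes:
            # Only strip when something is left in front of the suffix.
            if remaining.endswith(suffix) and len(remaining) > len(suffix):
                parts.insert(0, suffix)
                remaining = remaining[: -len(suffix)]
                break
        else:
            # No suffix matched on this pass, so stop shaving.
            break
    parts.insert(0, remaining)
    return parts
```

For example, `shave_suffixes("SINGAPOREPTELTD")` gives `["SINGAPORE", "PTE", "LTD"]`. Note that a blind suffix match would also split a genuine word that happens to end in one of these letter groups, which is one reason a table of known terms is the safer choice when the set is small.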
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Note, too, that if you're going to process using CONVERT_R to produce additional tokens, then you should reset at least the right margin. To be safe, reset both margins.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
U
Participant
Posts: 230
Joined: Tue Apr 17, 2007 8:23 pm
Location: Singapore

Post by U »

Great advice from you both. Many thanks, we have a working solution.
For future searchers, here's what we did.

Code: Select all

*& = "AUSTPTY", "AUSTRALIAPTY", "PTYLIMIT", "PTYLTD", "PYLTD"
CALL CrushedTokens

...

\SUB CrushedTokens

0*+ = "AUSTPTY"
CONVERT_R [1] @NMCRUSHED.TBL
SET_L_MARGIN OPERAND [BEGIN_TOKEN]
SET_R_MARGIN OPERAND [END_TOKEN]

0*+ = "AUSTRALIAPTY"
CONVERT_R [1] @NMCRUSHED.TBL
SET_L_MARGIN OPERAND [BEGIN_TOKEN]
SET_R_MARGIN OPERAND [END_TOKEN]

0*+ = "PTYLTD"
CONVERT_R [1] @NMCRUSHED.TBL
SET_L_MARGIN OPERAND [BEGIN_TOKEN]
SET_R_MARGIN OPERAND [END_TOKEN]

0*+ = "PTYLIMIT"     ; picks up PTYLIMITED
CONVERT_R [1] @NMCRUSHED.TBL
SET_L_MARGIN OPERAND [BEGIN_TOKEN]
SET_R_MARGIN OPERAND [END_TOKEN]

0*+ = "PYLTD"
CONVERT_R [1] @NMCRUSHED.TBL
SET_L_MARGIN OPERAND [BEGIN_TOKEN]
SET_R_MARGIN OPERAND [END_TOKEN]

\END_SUB
Content of the NMCRUSHED.TBL follows.
;;QualityStage v8.0

Code: Select all

\FORMAT\ SORT=Y
AUSTPTY "AUST PTY"
AUSTRALIAPTY "AUSTRALIA PTY"
PTYLIMIT "PTY LIMIT"
PTYLTD "PTY LTD"
PYLTD "PTY LTD"
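
For readers who want to see the effect of the table without a QualityStage installation, the following is a rough Python sketch of the same idea: look each token up in a table of known crushed terms and re-split the replacement into separate tokens. It is not PAL, it uses exact matching only (so it does not model however the rule set picks up PTYLIMITED), and the names `CRUSHED` and `expand_crushed` are made up for this illustration.

```python
# Illustration only: plain Python, not PAL.
# Same entries as the NMCRUSHED.TBL content posted above.
CRUSHED = {
    "AUSTPTY": "AUST PTY",
    "AUSTRALIAPTY": "AUSTRALIA PTY",
    "PTYLIMIT": "PTY LIMIT",
    "PTYLTD": "PTY LTD",
    "PYLTD": "PTY LTD",  # also corrects the misspelling
}


def expand_crushed(name, table=CRUSHED):
    """Replace known crushed tokens and return the resulting token list."""
    tokens = []
    for token in name.split():
        # Unknown tokens pass through unchanged; known ones are replaced
        # and then split again so each word becomes its own token.
        tokens.extend(table.get(token, token).split())
    return tokens
```

With this sketch, `expand_crushed("ACME CHEMICALS PTYLTD")` returns `["ACME", "CHEMICALS", "PTY", "LTD"]`, i.e. four tokens instead of three.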