We have a problem standardising addresses like 123 JOHN ST ST LEONARDS with a modified rule set in which ST is standardised to STREET.
QualityStage parses the address as "123", "JOHN STREET", "ST", which is not what we want.
Short of using a PREP rule set, is there a smart way to parse this address?
Thank you for your time.
ST and other multi-use tokens
This is probably a combination of things:
- You are parsing with the locality at the end. Normally that works OK, but localities that start with classified terms such as St will cause problems.
- Did you update Multiple_Semantics when you set ST to be standardised to STREET?
- The default rule in the Streets subroutine takes everything up to the last known Street Type in the address, which here includes the St from St Leonards.
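To see why the locality loses its St, here is a rough Python model of that "up to the last Street Type" rule. It is a sketch only, not QualityStage code: the STREET_TYPES set and the greedy_parse function are hypothetical, and a real rule set does far more classification than this.

```python
# Hypothetical subset of a street-type classification (not the real AU rule set).
STREET_TYPES = {"ST", "RD", "AVE", "PDE"}


def greedy_parse(address: str) -> dict:
    """Split an address by taking everything up to the LAST street-type token.

    Rough model of the behaviour described above; not QualityStage code.
    """
    tokens = address.split()
    house = tokens[0] if tokens and tokens[0].isdigit() else ""
    rest = tokens[1:] if house else tokens
    # Positions of every token classified as a street type.
    type_positions = [i for i, t in enumerate(rest) if t in STREET_TYPES]
    if not type_positions:
        return {"house": house, "street_name": "", "street_type": "",
                "locality": " ".join(rest)}
    last = type_positions[-1]  # greedy: the last street type wins
    return {
        "house": house,
        "street_name": " ".join(rest[:last]),
        "street_type": rest[last],
        "locality": " ".join(rest[last + 1:]),
    }


print(greedy_parse("123 JOHN ST ST LEONARDS"))
# {'house': '123', 'street_name': 'JOHN ST', 'street_type': 'ST', 'locality': 'LEONARDS'}
```

Because the second ST is the last street-type token, it is taken as the Street Type and the locality is left as just LEONARDS.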
As for the STREET vs ST issue, I think it depends on what you did in Multiple_Semantics. It looks as though both were converted to STREET, but the last one was then converted back by the following action:
COPY_A [3] {StreetType}
To do this without AUPREP, you'd probably need a list of the localities that start with St, plus those that start with other Street Types or with any other classified word. Localities that start with the directionals used for Street Type Direction could be even worse.
Check for those before, or at the start of, Multiple_Semantics. For example:
? | M =T= "ST" | & = @ST_LOCALITIES.TBL | $
RETYPE [2] + "SAINT" "SAINT"
etc.
Doable, but not overly nice...
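The same idea can be sketched in Python, again as a hypothetical model rather than anything QualityStage runs: retype_saint roughly plays the role of the pattern and RETYPE above, and ST_LOCALITIES stands in for the ST_LOCALITIES.TBL lookup (its contents here are illustrative only). The check runs before the street-type parse, so a trailing ST that belongs to a known locality is no longer a candidate Street Type.

```python
# Hypothetical classifications, for illustration only.
STREET_TYPES = {"ST", "RD", "AVE", "PDE"}
# Stand-in for ST_LOCALITIES.TBL: words that follow "ST" (Saint) in a locality.
ST_LOCALITIES = {"LEONARDS", "KILDA", "IVES", "MARYS"}


def retype_saint(tokens: list) -> list:
    """If the address ends in ST <known locality word>, retype that ST as SAINT."""
    if len(tokens) >= 2 and tokens[-2] == "ST" and tokens[-1] in ST_LOCALITIES:
        return tokens[:-2] + ["SAINT", tokens[-1]]
    return tokens


def parse(address: str) -> dict:
    """Run the locality check first, then the greedy last-street-type split."""
    tokens = retype_saint(address.split())
    house = tokens[0] if tokens and tokens[0].isdigit() else ""
    rest = tokens[1:] if house else tokens
    type_positions = [i for i, t in enumerate(rest) if t in STREET_TYPES]
    if not type_positions:
        return {"house": house, "street_name": "", "street_type": "",
                "locality": " ".join(rest)}
    last = type_positions[-1]
    return {
        "house": house,
        "street_name": " ".join(rest[:last]),
        "street_type": rest[last],
        "locality": " ".join(rest[last + 1:]),
    }


print(parse("123 JOHN ST ST LEONARDS"))
# {'house': '123', 'street_name': 'JOHN', 'street_type': 'ST', 'locality': 'SAINT LEONARDS'}
```

Like the pattern above, this only fires when ST is followed by a single-word locality at the very end of the address; multi-word localities or other street types and directionals at the start of a locality each need their own table and check, which is why the approach gets messy.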