Rule set understanding

Posted: Mon Nov 02, 2015 6:46 am
by vamsi_4a6
I am trying to understand the rule set and am stuck on the following questions.
USNAME rule set:
Classification:
;;QualityStage v8.0
\FORMAT\ SORT=Y
;-------------------------------------------------------------------------------
; USNAME Classification Table
;-------------------------------------------------------------------------------
; Classification Legend
;-------------------------------------------------------------------------------
; A - Abbreviations (Misspellings)
; C - Common Words
; F - First Names
; G - Individual Name Generations
; I - Initials
; L - Last Name Prefixes
; O - Organization Name Suffixes
; P - Individual Name Prefixes
; Q - Additional Name Qualifiers
; S - Individual Name Suffixes
; W - Organization Name Words
; Z - Delimiters
;-------------------------------------------------------------------------------
; Table Sort Order: 51-51 Ascending, 26-50 Ascending, 1-25 Ascending
;-------------------------------------------------------------------------------
;END ENDOWMENT W
AN AN C
AND AND C
AS AS C
AT AT C
BY BY C
FOR FOR C
FROM FROM C
IN IN C
OF OF C
ON ON C
OR OR C
THE THE C
TO TO C
WITH WITH C
AARON AARON F
ABBEY ABBEY F
ABBIE ABBIE F
ABBY ABBY F
ABDUL ABDUL F
ABE ABE F
ABEL ABEL F
ABIGAIL ABIGAIL F
ABRAHAM ABRAHAM F
ABRAM ABRAM F
ADA ADA F
ADAH ADAH F

Question 1: What is the meaning of the lines below in the classification table?

ABIGAIL ABIGAIL F
ABRAHAM ABRAHAM F

Question 2: The Dictionary File code is not clear. It would help if someone could explain each block.

Example:
;-------------------------------------------------------------------------------
; Business Intelligence Fields
;-------------------------------------------------------------------------------
NameType C 1 S NameType ;0001-0001

;-------------------------------------------------------------------------------
; Matching Fields
;-------------------------------------------------------------------------------
MatchFirstName C 25 S MatchFirstName ;0203-0227
------------------------------------------------------------------------------

;-------------------------------------------------------------------------------
; Reporting Fields
;-------------------------------------------------------------------------------
UnhandledPattern C 30 S UnhandledPattern ;0420-0449
UnhandledData C 100 S UnhandledData ;0450-0549
InputPattern C 30 S InputPattern ;0550-0579
ExceptionData C 25 S ExceptionData ;0580-0604
UserOverrideFlag C 2 S UserOverrideFlag ;0605-0606

;-------------------------------------------------------------------------------
; USNAME Dictionary File
;-------------------------------------------------------------------------------
; Total Dictionary Length = 606
;-------------------------------------------------------------------------------
; Business Intelligence Fields
;-------------------------------------------------------------------------------
NameType C 1 S NameType ;0001-0001
GenderCode C 1 S GenderCode ;0002-0002
NamePrefix C 20 S NamePrefix ;0003-0022
FirstName C 25 S FirstName ;0023-0047
MiddleName C 25 S MiddleName ;0048-0072
PrimaryName C 50 S PrimaryName ;0073-0122
NameGeneration C 10 S NameGeneration ;0123-0132
NameSuffix C 20 S NameSuffix ;0133-0152
AdditionalName C 50 S AdditionalName ;0153-0202
;-------------------------------------------------------------------------------
; Matching Fields
;-------------------------------------------------------------------------------
MatchFirstName C 25 S MatchFirstName ;0203-0227
MatchFirstNameNYSIIS C 8 X MatchFirstNameNYSIIS ;0228-0235
MatchFirstNameRVSNDX C 4 Z MatchFirstNameRVSNDX ;0236-0239
MatchPrimaryName C 50 S MatchPrimaryName ;0240-0289
MatchPrimaryNameHashKey C 10 S MatchPrimaryNameHashKey ;0290-0299
MatchPrimaryNamePackKey C 20 S MatchPrimaryNamePackKey ;0300-0319
NumofMatchPrimaryWords C 1 S NumofMatchPrimaryWords ;0320-0320
MatchPrimaryWord1 C 15 S MatchPrimaryWord1 ;0321-0335
MatchPrimaryWord2 C 15 S MatchPrimaryWord2 ;0336-0350
MatchPrimaryWord3 C 15 S MatchPrimaryWord3 ;0351-0365
MatchPrimaryWord4 C 15 S MatchPrimaryWord4 ;0366-0380
MatchPrimaryWord5 C 15 S MatchPrimaryWord5 ;0381-0395
MatchPrimaryWord1NYSIIS C 8 X MatchPrimaryWord1NYSIIS ;0396-0403
MatchPrimaryWord1RVSNDX C 4 Z MatchPrimaryWord1RVSNDX ;0404-0407
MatchPrimaryWord2NYSIIS C 8 X MatchPrimaryWord2NYSIIS ;0408-0415
MatchPrimaryWord2RVSNDX C 4 Z MatchPrimaryWord2RVSNDX ;0416-0419
;-------------------------------------------------------------------------------
; Reporting Fields
;-------------------------------------------------------------------------------
UnhandledPattern C 30 S UnhandledPattern ;0420-0449
UnhandledData C 100 S UnhandledData ;0450-0549
InputPattern C 30 S InputPattern ;0550-0579
ExceptionData C 25 S ExceptionData ;0580-0604
UserOverrideFlag C 2 S UserOverrideFlag ;0605-0606

Posted: Mon Nov 02, 2015 9:50 am
by rjdickson
Hi,

Answer to question 1:
Please see http://www-01.ibm.com/support/knowledge ... _file.html

Answer to question 2:
Please see http://www-01.ibm.com/support/knowledge ... _file.html


I hope this helps!
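
For question 1, a rough way to read the classification rows: as far as I understand the format, each row is a token, the standard value it is replaced with, and a one-letter class from the legend (the sort-order comment in the table points the same way: columns 1-25, 26-50 and 51). The linked documentation is the authority here. The Python below is only an illustrative sketch; the function names are made up for this post, and it assumes single-word tokens and standard values as in the excerpt.

```python
# Legend copied from the header of the USNAME classification table in the question.
CLASS_LEGEND = {
    "A": "Abbreviations (Misspellings)",
    "C": "Common Words",
    "F": "First Names",
    "G": "Individual Name Generations",
    "I": "Initials",
    "L": "Last Name Prefixes",
    "O": "Organization Name Suffixes",
    "P": "Individual Name Prefixes",
    "Q": "Additional Name Qualifiers",
    "S": "Individual Name Suffixes",
    "W": "Organization Name Words",
    "Z": "Delimiters",
}


def parse_classification_line(line):
    """Return (token, standard_value, class_code), or None for non-data lines.

    Assumption: fields are whitespace-separated and contain no spaces,
    which holds for the rows quoted in this thread.
    """
    stripped = line.strip()
    # Skip blanks, ';' comments and the '\FORMAT\' header line.
    if not stripped or stripped.startswith(";") or stripped.startswith("\\"):
        return None
    parts = stripped.split()
    if len(parts) < 3:
        return None
    return parts[0], parts[1], parts[2]


def describe(line):
    """Turn one classification row into a readable sentence."""
    parsed = parse_classification_line(line)
    if parsed is None:
        return None
    token, standard, code = parsed
    meaning = CLASS_LEGEND.get(code, "unknown class")
    return f"{token} -> standard value {standard}, class {code} ({meaning})"


if __name__ == "__main__":
    for row in ["ABIGAIL ABIGAIL F", "ABRAHAM ABRAHAM F", ";END ENDOWMENT W"]:
        print(describe(row))
```

Read this way, `ABIGAIL ABIGAIL F` says that the input token ABIGAIL is kept as ABIGAIL and is classified as a first name, while `;END ENDOWMENT W` is commented out and ignored.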

Posted: Mon Nov 02, 2015 9:55 am
by vamsi_4a6
Thanks for the input. In the Dictionary file, what is meant by business intelligence, matching, and reporting fields?

Posted: Mon Nov 02, 2015 11:55 am
by rjdickson
Hi,

Those headings are just a textual description ('documentation') of the intent for those output columns.

Robert

Posted: Mon Nov 02, 2015 4:35 pm
by ray.wurlod
The Business Intelligence fields are typically derived from the input and are likely to be transferred through to final output.

The Matching fields are more likely to be used to drive Matching and are unlikely to be transferred through to final output.

The five Reporting fields give insight into how well the Standardization is working, and are typically used to tune the Rule Set so that it handles the data more accurately.
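
To make the layout of the Dictionary file a bit more concrete, here is a small Python sketch that reads a few of the lines quoted above, tags each field with the comment heading it sits under, and checks that the declared length agrees with the `;start-end` column comment. The column meanings used here (name, type, length, a fourth code that I believe is a missing-value identifier, description) are my reading of the format, so treat them as assumptions and check the documentation. The function and variable names are invented for this example.

```python
import re

# A handful of lines copied from the USNAME dictionary file posted above.
SAMPLE = """\
; Business Intelligence Fields
NameType C 1 S NameType ;0001-0001
GenderCode C 1 S GenderCode ;0002-0002
; Matching Fields
MatchFirstName C 25 S MatchFirstName ;0203-0227
MatchFirstNameNYSIIS C 8 X MatchFirstNameNYSIIS ;0228-0235
; Reporting Fields
UnhandledPattern C 30 S UnhandledPattern ;0420-0449
UserOverrideFlag C 2 S UserOverrideFlag ;0605-0606
"""

# name, type, length, fourth-column code, description, then ';start-end'.
FIELD_RE = re.compile(
    r"^(\w+)\s+([A-Z])\s+(\d+)\s+(\S)\s+(\S+)\s*;(\d{4})-(\d{4})\s*$"
)
# Comment headings such as '; Matching Fields'.
SECTION_RE = re.compile(r"^;\s*(.+ Fields)\s*$")


def parse_dictionary(text):
    """Return a list of field dicts, each tagged with its section heading."""
    fields = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        heading = SECTION_RE.match(line)
        if heading:
            section = heading.group(1)
            continue
        m = FIELD_RE.match(line)
        if not m:
            continue  # separator lines, blanks and other comments
        name, ftype, length, code, desc, start, end = m.groups()
        fields.append({
            "section": section,
            "name": name,
            "type": ftype,
            "length": int(length),
            "code": code,  # assumed to be the missing-value identifier
            "description": desc,
            "start": int(start),
            "end": int(end),
        })
    return fields


def check_widths(fields):
    """Names of fields whose length disagrees with the ;start-end comment."""
    return [f["name"] for f in fields
            if f["end"] - f["start"] + 1 != f["length"]]


if __name__ == "__main__":
    parsed = parse_dictionary(SAMPLE)
    for f in parsed:
        print(f["section"], f["name"], f["length"], f["start"], f["end"])
    print("mismatches:", check_widths(parsed))
```

The section headings are plain comments, so nothing in the sketch (or, per Robert's reply, in the product) depends on them; they only group the output columns for the reader. The last field ends at column 606, which agrees with the "Total Dictionary Length = 606" comment in the file.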