Rule set understanding

Infosphere's Quality Product


vamsi_4a6
Participant
Posts: 95
Joined: Wed Jun 04, 2014 12:06 am


Post by vamsi_4a6 »

I am trying to understand the rule set and am stuck on the following questions.
USNAME rule set:
classification:
;;QualityStage v8.0
\FORMAT\ SORT=Y
;-------------------------------------------------------------------------------
; USNAME Classification Table
;-------------------------------------------------------------------------------
; Classification Legend
;-------------------------------------------------------------------------------
; A - Abbreviations (Misspellings)
; C - Common Words
; F - First Names
; G - Individual Name Generations
; I - Initials
; L - Last Name Prefixes
; O - Organization Name Suffixes
; P - Individual Name Prefixes
; Q - Additional Name Qualifiers
; S - Individual Name Suffixes
; W - Organization Name Words
; Z - Delimiters
;-------------------------------------------------------------------------------
; Table Sort Order: 51-51 Ascending, 26-50 Ascending, 1-25 Ascending
;-------------------------------------------------------------------------------
;END ENDOWMENT W
AN AN C
AND AND C
AS AS C
AT AT C
BY BY C
FOR FOR C
FROM FROM C
IN IN C
OF OF C
ON ON C
OR OR C
THE THE C
TO TO C
WITH WITH C
AARON AARON F
ABBEY ABBEY F
ABBIE ABBIE F
ABBY ABBY F
ABDUL ABDUL F
ABE ABE F
ABEL ABEL F
ABIGAIL ABIGAIL F
ABRAHAM ABRAHAM F
ABRAM ABRAM F
ADA ADA F
ADAH ADAH F

Doubt 1: What is the meaning of the following entries in the classification table?

ABIGAIL ABIGAIL F
ABRAHAM ABRAHAM F
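A hedged reading of Doubt 1, assuming the usual column layout of a QualityStage classification table (input token, standardized value, class letter from the legend; any further columns, such as a threshold, are ignored here): ABIGAIL ABIGAIL F means the input word ABIGAIL is standardized to ABIGAIL and tagged with class F (First Names). The Python sketch below parses entries that way; parse_classification, ClassEntry and LEGEND are hypothetical names for illustration, not part of QualityStage.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ClassEntry:
    token: str     # the word as it appears in the input
    standard: str  # the standardized form written out for that token
    cls: str       # one-letter class from the legend, e.g. "F" (First Names)


def parse_classification(text: str) -> Dict[str, ClassEntry]:
    """Build a token -> entry lookup from classification-table lines.

    Assumed layout (not an official parser): token, standard value, class.
    """
    table: Dict[str, ClassEntry] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, ';' comments and '\...\' directives such as \FORMAT\ SORT=Y
        if not line or line.startswith(";") or line.startswith("\\"):
            continue
        parts = line.split(";", 1)[0].split()  # drop any trailing comment
        if len(parts) < 3:
            continue
        token, standard, cls = parts[:3]  # further columns, if any, are ignored
        table[token] = ClassEntry(token, standard, cls)
    return table


# Subset of the legend from the USNAME classification header
LEGEND = {"C": "Common Words", "F": "First Names", "W": "Organization Name Words"}

sample = """\
;;QualityStage v8.0
\\FORMAT\\ SORT=Y
;END ENDOWMENT W
AND AND C
ABIGAIL ABIGAIL F
ABRAHAM ABRAHAM F
"""

table = parse_classification(sample)
for token in ("ABIGAIL", "AND"):
    e = table[token]
    print(f"{token}: standard form {e.standard}, class {e.cls} ({LEGEND[e.cls]})")
```

Under these assumptions the commented-out ;END ENDOWMENT W line is skipped like any other ';' comment, so only AND, ABIGAIL and ABRAHAM end up in the lookup.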

Doubt 2: The Dictionary File code is not clear. It would help if someone could explain the code for each block.

Example:
;-------------------------------------------------------------------------------
; Business Intelligence Fields
;-------------------------------------------------------------------------------
NameType C 1 S NameType ;0001-0001

;-------------------------------------------------------------------------------
; Matching Fields
;-------------------------------------------------------------------------------
MatchFirstName C 25 S MatchFirstName ;0203-0227
------------------------------------------------------------------------------

;-------------------------------------------------------------------------------
; Reporting Fields
;-------------------------------------------------------------------------------
UnhandledPattern C 30 S UnhandledPattern ;0420-0449
UnhandledData C 100 S UnhandledData ;0450-0549
InputPattern C 30 S InputPattern ;0550-0579
ExceptionData C 25 S ExceptionData ;0580-0604
UserOverrideFlag C 2 S UserOverrideFlag ;0605-0606

;-------------------------------------------------------------------------------
; USNAME Dictionary File
;-------------------------------------------------------------------------------
; Total Dictionary Length = 606
;-------------------------------------------------------------------------------
; Business Intelligence Fields
;-------------------------------------------------------------------------------
NameType C 1 S NameType ;0001-0001
GenderCode C 1 S GenderCode ;0002-0002
NamePrefix C 20 S NamePrefix ;0003-0022
FirstName C 25 S FirstName ;0023-0047
MiddleName C 25 S MiddleName ;0048-0072
PrimaryName C 50 S PrimaryName ;0073-0122
NameGeneration C 10 S NameGeneration ;0123-0132
NameSuffix C 20 S NameSuffix ;0133-0152
AdditionalName C 50 S AdditionalName ;0153-0202
;-------------------------------------------------------------------------------
; Matching Fields
;-------------------------------------------------------------------------------
MatchFirstName C 25 S MatchFirstName ;0203-0227
MatchFirstNameNYSIIS C 8 X MatchFirstNameNYSIIS ;0228-0235
MatchFirstNameRVSNDX C 4 Z MatchFirstNameRVSNDX ;0236-0239
MatchPrimaryName C 50 S MatchPrimaryName ;0240-0289
MatchPrimaryNameHashKey C 10 S MatchPrimaryNameHashKey ;0290-0299
MatchPrimaryNamePackKey C 20 S MatchPrimaryNamePackKey ;0300-0319
NumofMatchPrimaryWords C 1 S NumofMatchPrimaryWords ;0320-0320
MatchPrimaryWord1 C 15 S MatchPrimaryWord1 ;0321-0335
MatchPrimaryWord2 C 15 S MatchPrimaryWord2 ;0336-0350
MatchPrimaryWord3 C 15 S MatchPrimaryWord3 ;0351-0365
MatchPrimaryWord4 C 15 S MatchPrimaryWord4 ;0366-0380
MatchPrimaryWord5 C 15 S MatchPrimaryWord5 ;0381-0395
MatchPrimaryWord1NYSIIS C 8 X MatchPrimaryWord1NYSIIS ;0396-0403
MatchPrimaryWord1RVSNDX C 4 Z MatchPrimaryWord1RVSNDX ;0404-0407
MatchPrimaryWord2NYSIIS C 8 X MatchPrimaryWord2NYSIIS ;0408-0415
MatchPrimaryWord2RVSNDX C 4 Z MatchPrimaryWord2RVSNDX ;0416-0419
;-------------------------------------------------------------------------------
; Reporting Fields
;-------------------------------------------------------------------------------
UnhandledPattern C 30 S UnhandledPattern ;0420-0449
UnhandledData C 100 S UnhandledData ;0450-0549
InputPattern C 30 S InputPattern ;0550-0579
ExceptionData C 25 S ExceptionData ;0580-0604
UserOverrideFlag C 2 S UserOverrideFlag ;0605-0606
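To make the layout of each dictionary line concrete, here is a hypothetical Python sketch. It assumes each non-comment line is NAME TYPE LENGTH MISSING-VALUE-CODE LABEL followed by a ;start-end position comment, as in the listing above, and checks that the positions are contiguous and agree with the lengths. It does not interpret the type (C) or the missing-value codes (S, X, Z); their exact meanings should be taken from IBM's dictionary file documentation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DictField:
    name: str              # output column name
    ftype: str             # field type column, e.g. "C"
    length: int            # field length in characters
    missing: str           # missing-value code column (S, X, Z here); not interpreted
    label: str             # fifth column; in USNAME it repeats the field name
    start: Optional[int]   # start position from the ";start-end" comment
    end: Optional[int]     # end position from the ";start-end" comment


# Assumed layout: NAME TYPE LENGTH MISSING LABEL [;start-end]
FIELD_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\S+)\s*(?:;(\d+)-(\d+))?\s*$")


def parse_dictionary(text: str) -> List[DictField]:
    fields = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank lines, dashed rules and section-header comments
        m = FIELD_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognized dictionary line: {line!r}")
        name, ftype, length, missing, label, start, end = m.groups()
        fields.append(DictField(name, ftype, int(length), missing, label,
                                int(start) if start else None,
                                int(end) if end else None))
    return fields


def check_layout(fields: List[DictField]) -> int:
    """Verify each field starts right after the previous one and that its
    ;start-end comment matches its length. Returns the total record length."""
    pos = 1
    for f in fields:
        expected_end = pos + f.length - 1
        if f.start is not None and (f.start, f.end) != (pos, expected_end):
            raise ValueError(f"{f.name}: comment says {f.start}-{f.end}, "
                             f"lengths imply {pos}-{expected_end}")
        pos = expected_end + 1
    return pos - 1


sample = """\
; Business Intelligence Fields
NameType C 1 S NameType ;0001-0001
GenderCode C 1 S GenderCode ;0002-0002
NamePrefix C 20 S NamePrefix ;0003-0022
FirstName C 25 S FirstName ;0023-0047
MiddleName C 25 S MiddleName ;0048-0072
PrimaryName C 50 S PrimaryName ;0073-0122
NameGeneration C 10 S NameGeneration ;0123-0132
NameSuffix C 20 S NameSuffix ;0133-0152
AdditionalName C 50 S AdditionalName ;0153-0202
"""

fields = parse_dictionary(sample)
print(len(fields), check_layout(fields))  # 9 202
```

Run over the complete USNAME dictionary listed above, the same check ends at position 606, which agrees with the "Total Dictionary Length = 606" comment.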
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA

Post by rjdickson »

Hi,

Answer to question 1:
Please see http://www-01.ibm.com/support/knowledge ... _file.html

Answer to question 2:
Please see http://www-01.ibm.com/support/knowledge ... _file.html


I hope this helps!
Regards,
Robert
vamsi_4a6
Participant
Posts: 95
Joined: Wed Jun 04, 2014 12:06 am

Post by vamsi_4a6 »

Thanks for the input. In the Dictionary File, what is meant by the Business Intelligence, Matching, and Reporting fields?
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA

Post by rjdickson »

Hi,

They are just a textual description (documentation) of the intent of those output columns.

Regards,
Robert
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The Business Intelligence fields are typically derived from the input and are likely to be transferred through to final output.

The Matching fields are more likely to be used to drive Matching and are unlikely to be transferred through to final output.
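Among the Matching fields are phonetic keys such as MatchFirstNameNYSIIS (8 characters) and MatchFirstNameRVSNDX (4 characters). Assuming RVSNDX stands for a reverse Soundex code, meaning classic American Soundex computed on the word spelled backwards, this Python sketch shows why such a key can help matching: plain Soundex keeps the first letter, so names that differ only in their first letter get different codes, while the reversed code can still agree. It is a generic Soundex written for illustration, not QualityStage's own implementation, which may differ in details.

```python
# Classic American Soundex digit groups; vowels, Y, H and W have no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # ...but its digit still blocks an equal neighbour
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal digits
        code = CODES.get(ch, "")        # vowels and Y map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")         # pad short codes with zeros


def reverse_soundex(word: str) -> str:
    # Assumption: "reverse Soundex" = Soundex of the word spelled backwards
    return soundex(word[::-1])


print(soundex("CATHY"), soundex("KATHY"))                  # C300 K300
print(reverse_soundex("CATHY"), reverse_soundex("KATHY"))  # Y320 Y320
```

Storing both a forward-style key (NYSIIS) and a reverse key gives the Match stage two chances to bring together names that a single phonetic key would separate.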

The five Reporting fields give insight as to how well the Standardization is working, and are typically used to tune the Rule Set to make it work in a more accurately targeted fashion.
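Since the section headings are ordinary ';' comments, the three groups exist for the reader only. A hypothetical Python sketch that groups field names under the nearest heading ending in "Fields" (a heuristic of this sketch, not a QualityStage rule) and then picks columns the way described above:

```python
from typing import Dict, List


def group_by_section(text: str) -> Dict[str, List[str]]:
    """Group dictionary field names under the most recent '; ... Fields' comment."""
    groups: Dict[str, List[str]] = {}
    current = "(no section)"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";"):
            comment = line.lstrip(";").strip()
            if comment.endswith("Fields"):  # heuristic: treat as a section heading
                current = comment
            continue                        # every ';' line is a comment either way
        groups.setdefault(current, []).append(line.split()[0])
    return groups


sample = """\
;-------------------------------------------------------------------------------
; Business Intelligence Fields
;-------------------------------------------------------------------------------
NameType C 1 S NameType ;0001-0001
FirstName C 25 S FirstName ;0023-0047
PrimaryName C 50 S PrimaryName ;0073-0122
;-------------------------------------------------------------------------------
; Matching Fields
;-------------------------------------------------------------------------------
MatchFirstName C 25 S MatchFirstName ;0203-0227
MatchFirstNameNYSIIS C 8 X MatchFirstNameNYSIIS ;0228-0235
;-------------------------------------------------------------------------------
; Reporting Fields
;-------------------------------------------------------------------------------
UnhandledPattern C 30 S UnhandledPattern ;0420-0449
UnhandledData C 100 S UnhandledData ;0450-0549
"""

groups = group_by_section(sample)
output_columns = groups["Business Intelligence Fields"]  # typically carried to final output
match_keys = groups["Matching Fields"]                   # typically drive the Match stage
tuning_columns = groups["Reporting Fields"]              # typically reviewed when tuning
print(output_columns, match_keys, tuning_columns)
```

A non-empty UnhandledPattern in the output is the usual sign that the rule set did not recognize an input pattern and may need an override or a new pattern rule.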
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.