
Cutoff values for QualityStage match

Posted: Fri Jan 27, 2012 10:17 am
by marcelo_almeida
I implemented a match process in QualityStage v8.7 and have run into a problem, described below:

When processing a source of 2900 records, three records belonging to one person, all with an identical national identity number, received the values 'MP', 'DA' and 'DA' in the qsMatchType field and, respectively, the values '36.54', '23.2' and '28.53' in the qsMatchWeight field.
In this case I used the following cutoff values: Match=20 and Clerical=10
This behavior is correct and expected.

However, I decided to examine this particular person and put a filter on the source to process only that person, which reduced my source from 2900 records to 3. With this filter, the three records were categorized as residuals, even though they belong to the same person.
I changed the cutoff values to Match=99 and Clerical=0 and they still came out as residuals.

So I expanded my source, adding 26 more different people.
With the original cutoff values (Match=20 and Clerical=10), those three records were still residuals.
With the cutoff values Match=99 and Clerical=0, I get these values respectively: qsMatchType ('MP', 'CP' and 'CP') and qsMatchWeight ('13.55', '6.75' and '9.47').
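To make the cutoff behavior above concrete, here is a minimal sketch of how a pair weight could be bucketed by the two cutoffs. It is an illustration only, not QualityStage's actual logic: the function name `classify`, the bucket labels, and the inclusive `>=` comparisons are assumptions, and it assumes the two non-master weights (6.75 and 9.47) were the same under both cutoff settings in the 29-person run.

```python
def classify(weight, match_cutoff, clerical_cutoff):
    """Bucket a pair weight using the two cutoffs (hypothetical helper).

    Assumption: a weight at or above the match cutoff is a match, a weight
    at or above the clerical cutoff is a clerical pair, anything lower is
    a nonmatch (the record ends up as a residual).
    """
    if weight >= match_cutoff:
        return "match"
    if weight >= clerical_cutoff:
        return "clerical"
    return "nonmatch"


# Weights reported for the two non-master records in the 29-person run.
weights = [6.75, 9.47]

# Original cutoffs: both weights fall below Clerical=10.
original = [classify(w, match_cutoff=20, clerical_cutoff=10) for w in weights]

# Widened cutoffs: both weights land between Clerical=0 and Match=99.
widened = [classify(w, match_cutoff=99, clerical_cutoff=0) for w in weights]

print(original)
print(widened)
```

Under these assumptions the same weights fall out as nonmatches with Match=20/Clerical=10 and as clerical pairs with Match=99/Clerical=0, which is consistent with the residual and 'CP' results described above.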

In my case, I process a different number of records every day; one day the volume is large and another day it is small.
How should I set the cutoff values if qsMatchWeight is influenced by the number of records in the source?

Thank you very much

Posted: Sat Jan 28, 2012 9:13 am
by rjdickson
Hi Marcelo,

This sounds like normal, expected behavior if you are regenerating the frequency files every time. Try generating the frequency file with the 'full volume', and then use that with just the three records and you should see the exact same results.

By the way, you are using the Match Designer, right? You get a lot more visibility into the matches, and a lot more control over what you see...

Posted: Mon Jan 30, 2012 6:09 am
by marcelo_almeida
Hi Robert,

Yes, I am using the Match Designer.
I tried what you suggested and it worked well.
How often do you think I need to update the frequency file? And must I always do this using the full source?

And what if I start a service that does not have the full source, only those 3 records? How would I make this work?

Thank you very much

Posted: Mon Jan 30, 2012 9:35 am
by rjdickson
Hi Marcelo,

The common practice is to update the full volume frequency file either every 'n' months, or after a lot of 'new' data has been added to the source. 'n' is a bit subjective, but many companies are ok with 3-6 months. 'new' generally means matching data that you have not seen before. If, for example, you are dealing with Brazil exclusively, and then add the US, you will want to regenerate your frequencies.

For your service, the common practice is to ALWAYS use the full volume frequencies. The common practice, for exactly the reason you identified, is to not use the Match Frequency stage in the service job.

Posted: Mon Jan 30, 2012 1:19 pm
by marcelo_almeida
Hi Robert,

Thanks for your help!

Now I understand what the procedure for updating the reference file should be, and I will do as you advised.

However, I do not understand why it does not work with only 3 records.

But okay, this is enough for my purpose.

Would you have any additional material on good practices that you could share with me by email? I would love to study more on this subject.

Thank you very much!

Posted: Wed Feb 01, 2012 5:06 am
by rjdickson
Hi Marcelo,

The reason the scores are different with just 3 records is that the frequency of occurrence affects the score.
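As a rough illustration of that point, the sketch below uses a simplified Fellegi-Sunter-style agreement weight, log2(m/u), where u is approximated by how often the agreeing value occurs in the data the frequencies were built from. This is not QualityStage's actual computation: the helper names, the m probability of 0.9, and the sample identity numbers are all assumptions made for the example.

```python
import math
from collections import Counter


def build_frequencies(values):
    """Relative frequency of each value (stands in for a frequency file)."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def agreement_weight(value, frequencies, m_prob=0.9):
    """Simplified agreement weight log2(m/u).

    Assumption: u (the chance that two unrelated records agree on this
    value) is approximated by the value's relative frequency.
    """
    return math.log2(m_prob / frequencies[value])


# Hypothetical identity numbers: 3 records for one person among 2900 records.
full_source = ["ID-A"] * 3 + [f"ID-{i}" for i in range(2897)]
# The same three records after filtering the source down to that person.
filtered_source = ["ID-A"] * 3

full_freq = build_frequencies(full_source)
filtered_freq = build_frequencies(filtered_source)

# Frequencies from the full volume: the value is rare, so agreement scores high.
weight_full = agreement_weight("ID-A", full_freq)
# Frequencies regenerated from 3 records: the value is "common", so it scores low.
weight_filtered = agreement_weight("ID-A", filtered_freq)

print(round(weight_full, 2), round(weight_filtered, 2))
```

In this toy setup the weight is about 9.8 with full-volume frequencies and slightly negative with frequencies rebuilt from the three records alone, because in the filtered source the identity number appears in every record and so carries no discriminating power. Scoring the three records against `full_freq` instead reproduces the full-volume weight, which is the idea behind reusing the full volume frequency file.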

Some references:
http://publib.boulder.ibm.com/infocente ... 3%68%22%20

QualityStage Redbook: http://www.redbooks.ibm.com/abstracts/sg247546.html