Warning in Reference Match During Blocking

Posted: Fri Oct 25, 2013 1:15 pm
by Madhumitha_Raghunathan
Hi All,

I have designed a Reference Match job in which the data input will contain around 10k to 100k records. My reference set currently contains around 3 million records and is expected to grow.

The match is being done on the first and last names and the blocking is on the NYSIIS values of the first and last name. But I am getting the following warning:

The number of records in the reference block with key NYSIIS_LAST_NAME(HAL) exceeds the maximum number specified of 10000.
All records in the block will be treated as residuals


The IBM site (http://www-01.ibm.com/support/docview.w ... wg21409638) suggests that we don't update the overflow values (which can also have a maximum of only 40k) but instead limit the number of records with invalid or default values.
However, I am already filtering out all the invalid values well before they reach matching.

Is there any other way to prevent these records from becoming residuals or impacting the matching process? I would be grateful if anyone could point me in the right direction.

Posted: Fri Oct 25, 2013 1:58 pm
by ray.wurlod
You need to reduce the number of records per block. Usually the way to do this is to add one or more additional blocking columns.

Think about the implications. If you have 10,000 records in a block, then you have up to 10,000 x 10,000 (100 million) pairwise comparisons that have to be made, just for that block.
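
To make the effect of an extra blocking column concrete, here is a rough sketch in Python. It is not QualityStage code, only an illustration of the arithmetic: the column names (nysiis_last, nysiis_first), the code values, and the toy records are all made up, and it assumes every data record in a block is compared with every reference record in the same block.

```python
from collections import Counter

def block_sizes(records, key_columns):
    """Count how many records fall into each block for the given key columns."""
    return Counter(tuple(rec[col] for col in key_columns) for rec in records)

def comparisons(data_sizes, ref_sizes):
    """Total pairs: for each block key, data count times reference count."""
    return sum(n * ref_sizes.get(key, 0) for key, n in data_sizes.items())

# Hypothetical toy records; the codes are placeholders, not real NYSIIS output.
reference = [
    {"nysiis_last": "HAL", "nysiis_first": "JAN"},
    {"nysiis_last": "HAL", "nysiis_first": "JAN"},
    {"nysiis_last": "HAL", "nysiis_first": "MAR"},
    {"nysiis_last": "HAL", "nysiis_first": "MAR"},
    {"nysiis_last": "HAL", "nysiis_first": "PAT"},
    {"nysiis_last": "HAL", "nysiis_first": "PAT"},
]
data = [
    {"nysiis_last": "HAL", "nysiis_first": "JAN"},
    {"nysiis_last": "HAL", "nysiis_first": "MAR"},
    {"nysiis_last": "HAL", "nysiis_first": "PAT"},
]

# Blocking on last name only: one large block.
coarse_cols = ["nysiis_last"]
coarse = comparisons(block_sizes(data, coarse_cols),
                     block_sizes(reference, coarse_cols))

# Adding a second blocking column splits that block into smaller ones.
fine_cols = ["nysiis_last", "nysiis_first"]
fine = comparisons(block_sizes(data, fine_cols),
                   block_sizes(reference, fine_cols))

largest_coarse = max(block_sizes(reference, coarse_cols).values())
largest_fine = max(block_sizes(reference, fine_cols).values())

print("last name only:", coarse, "comparisons, largest reference block", largest_coarse)
print("last + first:  ", fine, "comparisons, largest reference block", largest_fine)
```

The same counting run against a sample of the real reference data should show which candidate blocking columns bring the largest block under the 10000 limit. The trade-off is that a tighter block can miss true matches whose extra blocking value differs, which is why a second pass with a different blocking key is commonly used.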