Warning in Reference Match During Blocking

Infosphere's Quality Product

Moderators: chulett, rschirm

Madhumitha_Raghunathan
Premium Member
Posts: 59
Joined: Fri Apr 22, 2011 8:02 am

Post by Madhumitha_Raghunathan »

Hi All,

I have designed a Reference Match job where the data source will contain around 10k to 100k records. My reference set currently contains around 3 million records and is expected to grow.

The match is being done on the first and last names and the blocking is on the NYSIIS values of the first and last name. But I am getting the following warning:

The number of records in the reference block with key NYSIIS_LAST_NAME(HAL) exceeds the maximum number specified of 10000.
All records in the block will be treated as residuals


The IBM site (http://www-01.ibm.com/support/docview.w ... wg21409638) suggests that we don't update the Overflow values (which can have a maximum of only 40k anyway) and instead limit the number of records with invalid or default values.
But I am already filtering out all the invalid values well before the data reaches matching.

Is there any other way to prevent these records from becoming residuals or otherwise affecting the matching process? I would be grateful if anyone could point me in the right direction.
Thanks,
Madhumitha
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You need to reduce the number of records per block. Usually the way to do this is to add one or more additional blocking columns.

Think about the implications. If you have 10,000 records in a block, then you have up to 10,000 x 10,000 (100 million) pairwise comparisons to make, just for that block.
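
To make the effect of an additional blocking column concrete, here is a minimal Python sketch. It is not QualityStage code: the record tuples, the first-initial column, and the function names are all made up for illustration. It counts the data-to-reference comparisons and the largest reference block, first when blocking on one column and then on two.

```python
from collections import Counter

# Hypothetical records: (NYSIIS of last name, first initial).
# The second column stands in for any extra blocking column.
reference = [("HAL", "J")] * 6 + [("HAL", "M")] * 3 + [("SNAT", "J")] * 2
data = [("HAL", "J")] * 2 + [("HAL", "M")] * 1


def block_comparisons(data_rows, ref_rows, key):
    """Total data-to-reference comparisons when blocking on key(row)."""
    data_counts = Counter(key(r) for r in data_rows)
    ref_counts = Counter(key(r) for r in ref_rows)
    # Each data record is compared with every reference record in its block.
    return sum(n * ref_counts[k] for k, n in data_counts.items())


def largest_reference_block(ref_rows, key):
    """Size of the biggest reference block when blocking on key(row)."""
    return max(Counter(key(r) for r in ref_rows).values())


def by_last(row):
    """Block on the last-name code only."""
    return row[0]


def by_last_and_initial(row):
    """Block on the last-name code plus the extra column."""
    return row


one_column = block_comparisons(data, reference, by_last)
two_columns = block_comparisons(data, reference, by_last_and_initial)

print(one_column, largest_reference_block(reference, by_last))               # 27 9
print(two_columns, largest_reference_block(reference, by_last_and_initial))  # 15 6
```

The same counting approach, run against the real reference set, would show which block keys exceed the 10,000 limit and whether a candidate extra column brings them under it.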
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.