Hi All,
I have designed a Reference Match job in which the data source will contain around 10k to 100k records, while my reference set currently contains around 3 million records and is expected to grow.
The match is done on first and last names, and blocking is on the NYSIIS values of the first and last names. However, I am getting the following warning:
The number of records in the reference block with key NYSIIS_LAST_NAME(HAL) exceeds the maximum number specified of 10000.
All records in the block will be treated as residuals
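As a rough illustration of what this warning describes (a sketch, not QualityStage's actual implementation), the Python below groups reference records by a blocking key and sets aside every record in any block larger than a size limit as a residual. The `partition_blocks` helper and the `nysiis_last` field are hypothetical. The key "HAL" is taken from the warning; "JAN" is a placeholder, not a computed NYSIIS code.

```python
from collections import defaultdict

# Hypothetical limit mirroring "the maximum number specified of 10000" in the warning.
MAX_BLOCK_SIZE = 10_000


def partition_blocks(reference, key_fields, max_block_size=MAX_BLOCK_SIZE):
    """Group reference records by blocking key; oversized blocks become residuals."""
    blocks = defaultdict(list)
    for rec in reference:
        key = tuple(rec[f] for f in key_fields)
        blocks[key].append(rec)

    kept, residuals = {}, []
    for key, recs in blocks.items():
        if len(recs) > max_block_size:
            # The whole block is skipped for matching, not just the overflow.
            residuals.extend(recs)
        else:
            kept[key] = recs
    return kept, residuals


# Synthetic reference data: one oversized block and one normal block.
reference = (
    [{"id": i, "nysiis_last": "HAL"} for i in range(12_000)]
    + [{"id": 12_000 + i, "nysiis_last": "JAN"} for i in range(500)]
)
kept, residuals = partition_blocks(reference, ["nysiis_last"])
print(sorted(kept), len(residuals))  # [('JAN',)] 12000
```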
The IBM technote at http://www-01.ibm.com/support/docview.w ... wg21409638 suggests that we don't change the overflow values (which can only go up to 40k anyway), but instead limit the number of records with invalid or default values.
But I already filter out all invalid values well before the data reaches the match stage.
Is there any other way to prevent these records from becoming residuals or affecting the matching process? I would be grateful if anyone could point me in the right direction.
Warning in Reference Match During Blocking
- Premium Member
- Posts: 59
- Joined: Fri Apr 22, 2011 8:02 am
Thanks,
Madhumitha
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
You need to reduce the number of records per block. Usually the way to do this is to add one or more additional blocking columns.
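A minimal sketch of why an extra blocking column helps, assuming records already carry precomputed blocking keys. The `block_sizes` helper, the field names `nysiis_last`/`nysiis_first`, and the `F0` to `F5` values are hypothetical placeholders, not real NYSIIS output. Splitting one oversized last-name block by a second column divides it into several smaller blocks.

```python
from collections import Counter


def block_sizes(records, key_fields):
    """Count records per blocking key built from the given columns."""
    return Counter(tuple(r[f] for f in key_fields) for r in records)


# Synthetic reference: 12,000 records share one last-name key,
# spread evenly over 6 placeholder first-name keys.
reference = [
    {"nysiis_last": "HAL", "nysiis_first": f"F{i % 6}"}
    for i in range(12_000)
]

one_col = block_sizes(reference, ["nysiis_last"])
two_col = block_sizes(reference, ["nysiis_last", "nysiis_first"])
print(max(one_col.values()), max(two_col.values()))  # 12000 2000
```

The trade-off is that record pairs whose second key differs are no longer compared within that block, so the extra column should be one that true matches reliably agree on.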
Think about the implications. If you have 10,000 records in a block, then you have 10,000 x 10,000 (100 million) pairwise comparisons that have to be made, just for that block.
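The arithmetic can be sketched in Python, assuming every data record is compared with every reference record that shares its block key (the `pairwise_comparisons` helper and the sample keys are hypothetical). One block of 10,000 x 10,000 gives 100 million comparisons; spreading the same records over five sub-blocks of 2,000 x 2,000 brings the total down to 20 million.

```python
from collections import Counter


def pairwise_comparisons(data_keys, ref_keys):
    """Total comparisons when each data record meets every reference record in its block."""
    data_counts = Counter(data_keys)
    ref_counts = Counter(ref_keys)
    # Counter returns 0 for keys missing from the reference side.
    return sum(n * ref_counts[k] for k, n in data_counts.items())


# One large block on the last-name key alone.
print(pairwise_comparisons(["HAL"] * 10_000, ["HAL"] * 10_000))  # 100000000

# Same records split by a hypothetical second blocking column into 5 sub-blocks.
data = [("HAL", f"F{i % 5}") for i in range(10_000)]
ref = [("HAL", f"F{i % 5}") for i in range(10_000)]
print(pairwise_comparisons(data, ref))  # 20000000
```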
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.