Match specification-Blocking Columns

Infosphere's Quality Product


BuddingDev
Premium Member
Posts: 43
Joined: Wed Feb 08, 2012 8:12 pm
Location: United States


Post by BuddingDev »

Hi,

Can I use two different columns in one matching block?
If so, can I set up a match pass with two blocking columns where agreement on either one of them is enough?

Is it feasible to create a subset based on blocking on col1 or col2?

Example: col1 and col2 are the blocking columns for pass 1.

          col1  col2
Record 1: a     b
Record 2: a     b
Record 3: a
Record 4:       b

In the example above, all four records should end up in one pool, since each of them shares at least one blocking column value with the others, according to my hypothesis.

Please let me know whether this is doable in an Unduplicate Match stage using the Dependent match type.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

You can have more than one blocking field per pass, but records only land in the same block if they agree on all of the blocking fields specified for that pass. If you want either field 1 or field 2 to qualify a record, you would need to put them in separate passes.
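To make the difference concrete, here is a minimal Python sketch of how blocks form. It is not QualityStage code: `build_blocks` is a hypothetical helper, the four records are the ones from the question, and treating an empty string as an ordinary key value is an assumption made only for illustration.

```python
from collections import defaultdict

# The four records from the question; "" stands for a missing value.
records = {
    1: {"col1": "a", "col2": "b"},
    2: {"col1": "a", "col2": "b"},
    3: {"col1": "a", "col2": ""},
    4: {"col1": "", "col2": "b"},
}

def build_blocks(records, blocking_columns):
    """Group record ids by the combined value of ALL blocking columns."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        # Assumption: an empty value is treated like any other key value here.
        key = tuple(rec[c] for c in blocking_columns)
        blocks[key].append(rec_id)
    return dict(blocks)

both = build_blocks(records, ["col1", "col2"])   # one pass, two blocking columns
only_col1 = build_blocks(records, ["col1"])      # a pass blocking on col1 only
only_col2 = build_blocks(records, ["col2"])      # a pass blocking on col2 only

print(both)
print(only_col1)
print(only_col2)
```

With both columns in one pass, only records 1 and 2 share a block, while 3 and 4 each sit alone. Blocking on col1 alone brings 1, 2 and 3 together, and blocking on col2 alone brings 1, 2 and 4 together, which is why the "either column" behaviour needs two passes.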

As to whether it's doable as Unduplicate Dependent, it would be possible if:
- pass 1 uses column 1 to block, and records 1, 2 and 3 all exceed the match cutoff
- record 1 or 2 is set as the master for the block
- pass 2 uses column 2 to block, so record 4 is blocked with that master (record 1 or 2) and is added to its group if it exceeds the pass 2 match cutoff.

So yes, it is possible, but it depends on the matching fields and cutoffs, which you haven't told us anything about.
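The steps above can be sketched as a small two-pass simulation in Python. This is only an illustration under stated assumptions, not the stage's actual algorithm: `score`, `run_pass`, the cutoff of 1, the rule for picking a master, and skipping records whose blocking value is empty are all hypothetical choices.

```python
from collections import defaultdict

records = {
    1: {"col1": "a", "col2": "b"},
    2: {"col1": "a", "col2": "b"},
    3: {"col1": "a", "col2": ""},
    4: {"col1": "", "col2": "b"},
}

def score(a, b):
    # Hypothetical weight: +1 for each column that is populated and equal.
    return sum(1 for c in ("col1", "col2") if a[c] and a[c] == b[c])

def run_pass(records, candidates, blocking_column, cutoff, groups):
    """Run one pass; return the ids still available to the next pass."""
    blocks = defaultdict(list)
    for rec_id in candidates:
        value = records[rec_id][blocking_column]
        if value:  # assumption: a missing blocking value joins no block
            blocks[value].append(rec_id)
    duplicates = set()
    for ids in blocks.values():
        # Assumption: reuse a master from an earlier pass, else take the first id.
        master = next((i for i in ids if i in groups), ids[0])
        for other in ids:
            if other != master and score(records[master], records[other]) >= cutoff:
                groups[master].add(other)
                duplicates.add(other)
    # Masters and unmatched records carry on; duplicates are taken out.
    return [r for r in candidates if r not in duplicates]

groups = defaultdict(set)
remaining = run_pass(records, [1, 2, 3, 4], "col1", cutoff=1, groups=groups)
remaining = run_pass(records, remaining, "col2", cutoff=1, groups=groups)
print(dict(groups), remaining)
```

In this sketch, pass 1 groups records 2 and 3 under master 1, and pass 2 then adds record 4 to the same master, so all four records end up in one group. With a higher cutoff or different matching fields, record 3 or 4 could fall out, which is the dependency on cutoffs mentioned above.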
BuddingDev
Premium Member
Posts: 43
Joined: Wed Feb 08, 2012 8:12 pm
Location: United States

Post by BuddingDev »

I appreciate your quick response. I am now applying it and working through the cutoff values and so on.

However, I have one related question on the same problem. Can somebody tell me what happens when the value of one of the matching columns is missing?


For example, suppose col1 and col2 are the blocking columns for pass 1, and the matching columns are col1, col2, col3 and col4.

          col1  col2  col3  col4
Record 1: a     b     c     d

Suppose col4 is missing only from record 2. What would the weight be for the col4 comparison?
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA

Post by rjdickson »

By default, a populated-to-unpopulated or unpopulated-to-unpopulated comparison scores 0. You can change that in the overrides, if desired.

You can see this in the Match Designer if you select the two records, right-click and select 'Compare Weights'.
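As a rough illustration of that default, here is a Python sketch of a per-column weight where a missing value on either side contributes 0. The agreement and disagreement weights use the standard probabilistic-matching form log2(m/u) and log2((1-m)/(1-u)); the m and u values, the helper name `column_weight`, and the contents of the second record are assumptions for the example, not values taken from any real match specification.

```python
import math

def column_weight(a, b, m_prob, u_prob):
    """Weight for one column comparison; "" stands for a missing value."""
    if not a or not b:
        return 0.0  # either side unpopulated: contributes nothing by default
    if a == b:
        return math.log2(m_prob / u_prob)          # agreement weight (positive)
    return math.log2((1 - m_prob) / (1 - u_prob))  # disagreement weight (negative)

M_PROB, U_PROB = 0.9, 0.01  # hypothetical probabilities, same for every column

record_1 = {"col1": "a", "col2": "b", "col3": "c", "col4": "d"}
# Hypothetical second record: same values, but col4 is missing.
record_2 = {"col1": "a", "col2": "b", "col3": "c", "col4": ""}

weights = {
    col: column_weight(record_1[col], record_2[col], M_PROB, U_PROB)
    for col in record_1
}
total = sum(weights.values())
print(weights, total)
```

Under these assumptions col4 neither helps nor hurts the composite weight, so whether the pair clears the cutoff depends entirely on the other three columns.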
Regards,
Robert