Many to Many relationship matching between 2 files

saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Many to Many relationship matching between 2 files

Post by saadmirza »

Hi All,
Can I do a many-to-many match in QualityStage using the Match stage?

Regards,
Saad
JamasE
Participant
Posts: 32
Joined: Sun Aug 31, 2003 5:52 pm

Re: Many to Many relationship matching between 2 files

Post by JamasE »

saadmirza wrote: Can I do a many to many match in Qualitystage using Match stage...?
Look at GEOMATCH DUPLICATE.
This allows many records on File A to match many records on File B, as long as the "duplicate" records on File B score above the duplicate cut-off. So simply leave the duplicate cut-off at 0, or make it the same as the match cut-off.

(GEOMATCH is the one-to-many option, and GEOMATCH MULTIPLE is many-to-many where the duplicates on File B must have the same weight.)

Cheers,
Jamas
saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Thanks,
Can you also please elaborate on exactly what the difference is between Geomatch Multiple and Geomatch Duplicate?

Which one is suitable for me if I need many-to-many matching in QS?

Thanks again for your reply.

Saad
JamasE
Participant
Posts: 32
Joined: Sun Aug 31, 2003 5:52 pm

Post by JamasE »

saadmirza wrote: Which one is suitable for me if I need many-to-many matching in QS?
Given FileA and FileB, records A1, A2, B1, B2, B3, say we get the following record pairs and weights:

A1,B1 20 XB ('the' matched record)
A1,B2 20 DB (duplicate on B)
A1,B3 15 DB (duplicate on B)
A2,B1 15 DA (duplicate on A)

With plain Geomatch and the cut-off at 20, only A1,B1 will be accepted. With plain Geomatch and the cut-off at 15, both A1,B1 and A2,B1 are accepted.

Geomatch Multiple with the cut-off at 15 (duplicate cut-off the same) will accept A1,B1, A1,B2 and A2,B1 as links, because A1,B1 and A1,B2 have the same weight.

Geomatch Duplicate with the cut-off at 15 (duplicate cut-off the same) will accept all four pairs as matches.
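
The three outcomes above can be reproduced with a small Python sketch. This only models the acceptance rules as described in this post; it is not QualityStage code, and the function name `accepted_pairs` and the mode strings are made up for illustration. It assumes each File A record takes its first highest-weighted File B record as "the" match.

```python
from collections import defaultdict

# Record pairs and weights from the example above.
PAIRS = [("A1", "B1", 20), ("A1", "B2", 20), ("A1", "B3", 15), ("A2", "B1", 15)]


def accepted_pairs(pairs, mode, match_cutoff, dup_cutoff=None):
    """Hypothetical model of the three modes; not a QualityStage API.

    mode is "geomatch", "multiple" or "duplicate" (illustrative names only).
    """
    if mode not in ("geomatch", "multiple", "duplicate"):
        raise ValueError(f"unknown mode: {mode}")
    if dup_cutoff is None:
        dup_cutoff = match_cutoff  # "duplicate cut-off the same"

    # Group candidate File B records under each File A record.
    by_a = defaultdict(list)
    for a, b, weight in pairs:
        by_a[a].append((b, weight))

    result = []
    for a, candidates in by_a.items():
        best_weight = max(weight for _, weight in candidates)
        if best_weight < match_cutoff:
            continue  # this File A record has no match at all
        # Assumption: ties for the best weight go to the first candidate.
        best_b = next(b for b, weight in candidates if weight == best_weight)
        for b, weight in candidates:
            if b == best_b:
                keep = True                    # 'the' matched record
            elif mode == "multiple":
                keep = weight == best_weight   # duplicates need the same weight
            elif mode == "duplicate":
                keep = weight >= dup_cutoff    # duplicates need only the cut-off
            else:
                keep = False                   # plain geomatch: no B duplicates
            if keep:
                result.append((a, b))
    return result


for mode, cutoff in [("geomatch", 20), ("geomatch", 15),
                     ("multiple", 15), ("duplicate", 15)]:
    print(mode, cutoff, accepted_pairs(PAIRS, mode, cutoff))
```

Under these assumptions, only the "duplicate" mode keeps A1,B3, which is the pair that distinguishes Geomatch Duplicate from Geomatch Multiple in the example.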

I think you want Geomatch Duplicate for your project to allow any many-to-many matching above your main cut-off.

(Section 12, page 12 in your QS User Guide gives this distinction.)

Cheers,
Jamas
saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Thanks a lot for the detailed explanation.
Just a small clarification:
If I use Geomatch Duplicate, does it really not matter which file is File A and which is File B? That is, I don't have to care which is my reference file and which is my data file?

Anyway, thanks again.

Saad
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It matters for performance. File A should be the larger file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Thanks Ray,
Yes, it does matter for performance, but does it really matter for functionality? I am still not convinced, even after lots of testing.

Please suggest..

Thanks,
SM
JamasE
Participant
Posts: 32
Joined: Sun Aug 31, 2003 5:52 pm

Post by JamasE »

saadmirza wrote: Yes, it does matter for performance, but does it really matter for functionality? I am still not convinced, even after lots of testing.
I can't think of a major reason why it should matter, but then again the order of the files in a typical one-to-one match does matter functionally (*).

Perhaps if one file is expected to be nearly one-to-many, it's best to have that as File B (the geocode file). To be clearer: in one of our projects we were using many-to-many between File A and File B, but in most instances we expected many-to-one to be the norm, and that dictated which was File B. (And, agreeing with Ray, the larger file was File A.)

Cheers,
Jamas

(*) If you are matching Fields A1 and A2 to Fields B1 and B2, especially outside of an array, it does matter to the frequency table which is A and which is B.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

saadmirza wrote:Thanks Ray,
Yes, it does matter for performance, but does it really matter for functionality? I am still not convinced, even after lots of testing.

Please suggest..

Thanks,
SM
No, it's like a fuzzy inner join. Inner joins are commutative. 8)
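
As a rough illustration of the "fuzzy inner join" point, here is a minimal Python sketch. It assumes a symmetric similarity score (token Jaccard), which is a stand-in and not how QualityStage computes weights; the function names and sample addresses are invented. Note that Jamas's footnote above suggests real match weights can depend on which file is A, so this only shows the idealised symmetric case.

```python
def jaccard(left, right):
    """Symmetric token-overlap score between two strings (illustrative only)."""
    left_tokens = set(left.lower().split())
    right_tokens = set(right.lower().split())
    union = left_tokens | right_tokens
    return len(left_tokens & right_tokens) / len(union) if union else 0.0


def fuzzy_join(first, second, cutoff):
    """Keep every (first, second) pair scoring at or above the cut-off."""
    return {(x, y) for x in first for y in second if jaccard(x, y) >= cutoff}


# Invented sample data, not from the thread.
file_a = ["12 high street", "12 high st", "4 mill lane"]
file_b = ["12 high street", "high street 12", "4 mill road"]

a_to_b = fuzzy_join(file_a, file_b, 0.5)
b_to_a = fuzzy_join(file_b, file_a, 0.5)

# With a symmetric score, swapping the files only mirrors each pair.
print(a_to_b == {(a, b) for (b, a) in b_to_a})
```

If the score were not symmetric (for example, if it used frequency information taken from only one of the files), the two result sets could differ.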