Many to Many relationship matching between 2 files

Posted: Wed Jul 13, 2005 6:52 am
by saadmirza
Hi All,
Can I do a many-to-many match in QualityStage using the Match stage?

Regards,
Saad

Re: Many to Many relationship matching between 2 files

Posted: Wed Jul 13, 2005 2:39 pm
by JamasE
saadmirza wrote: Can I do a many-to-many match in QualityStage using the Match stage?
Look at GEOMATCH DUPLICATE.
This allows many records on File A to match many records on File B, as long as the "duplicate" records on File B score above the duplicate cut-off. So simply leave the duplicate cut-off at 0, or make it the same as the match cut-off.

(GEOMATCH is the one-to-many option, and GEOMATCH MULTIPLE is many-to-many where the duplicates on File B have to have the same weight.)

Cheers,
Jamas

Posted: Thu Jul 14, 2005 4:11 am
by saadmirza
Thanks,
Could you also please elaborate on what exactly the difference is between Geomatch Multiple and Geomatch Duplicate?

Which one is suitable for me if I need many-to-many matching in QS?

Thanks again for your reply.

Saad

Posted: Thu Jul 14, 2005 2:48 pm
by JamasE
saadmirza wrote: Which one is suitable for me if I need many-to-many matching in QS?
Given File A and File B, with records A1, A2, B1, B2, B3, say we get the following record pairs and weights:

Pair    Weight  Type
A1,B1   20      XB ('the' matched record)
A1,B2   20      DB (duplicate on B)
A1,B3   15      DB (duplicate on B)
A2,B1   15      DA (duplicate on A)

With plain Geomatch and the cut-off at 20, only A1,B1 is accepted. With plain Geomatch and the cut-off at 15, A1,B1 and A2,B1 are both accepted.

Geomatch Multiple with the cut-off at 15 (duplicate cut-off the same) accepts A1,B1, A1,B2 and A2,B1 as links. A1,B2 is accepted because it has the same weight as A1,B1; A1,B3 is not, because its weight is lower.

Geomatch Duplicate with the cut-off at 15 (duplicate cut-off the same) accepts all four pairs as matches.

I think you want Geomatch Duplicate for your project, since it allows any many-to-many match above your main cut-off.

(Section 12, page 12 in your QS User Guide gives this distinction.)
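
The acceptance rules described above can be sketched as a small toy model. This is only an illustration of the behaviour described in this post, not QualityStage's actual algorithm: the function name `accepted_pairs`, the mode strings and the tie-breaking rule (first pair listed wins) are all assumptions made for the example.

```python
from collections import defaultdict

# Candidate pairs and weights from the example above: (File A record, File B record, weight).
PAIRS = [
    ("A1", "B1", 20),
    ("A1", "B2", 20),
    ("A1", "B3", 15),
    ("A2", "B1", 15),
]

def accepted_pairs(pairs, mode, match_cutoff, dup_cutoff=None):
    """Toy model of the three modes (hypothetical, not the product's code).

    mode is "geomatch", "multiple" or "duplicate".
    dup_cutoff defaults to match_cutoff, as in the example.
    """
    if dup_cutoff is None:
        dup_cutoff = match_cutoff

    # Group candidate File B records under each File A record.
    by_a = defaultdict(list)
    for a, b, weight in pairs:
        by_a[a].append((b, weight))

    accepted = []
    for a, candidates in by_a.items():
        # Best File B candidate for this File A record; max() keeps the
        # first one listed when weights tie (an assumption of this sketch).
        best_b, best_w = max(candidates, key=lambda c: c[1])
        if best_w < match_cutoff:
            continue
        accepted.append((a, best_b))

        for b, weight in candidates:
            if b == best_b or weight < dup_cutoff:
                continue
            if mode == "multiple" and weight == best_w:
                accepted.append((a, b))  # same weight as the best match
            elif mode == "duplicate":
                accepted.append((a, b))  # anything above the duplicate cut-off
    return accepted

for mode, cutoff in [("geomatch", 20), ("geomatch", 15),
                     ("multiple", 15), ("duplicate", 15)]:
    print(mode, cutoff, accepted_pairs(PAIRS, mode, cutoff))
```

Under these assumptions the four calls reproduce the four outcomes listed above: one pair, two pairs, three pairs and all four pairs respectively.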

Cheers,
Jamas

Posted: Fri Jul 15, 2005 7:18 am
by saadmirza
Thanks a lot for the detailed explanation.
Just a small clarification:
If I use Geomatch Duplicate, does it really not matter which file is File A and which is File B? Do I not have to care which is my reference file and which is my data file?

Anyway, thanks again.

Saad

Posted: Fri Jul 15, 2005 6:31 pm
by ray.wurlod
It matters for performance. File A should be the larger file.

Posted: Sun Jul 17, 2005 10:46 am
by saadmirza
Thanks Ray,
Yes, it does matter for performance, but does it really matter for functionality? I am still not convinced, even after lots of testing.

Please advise.

Thanks,
SM

Posted: Sun Jul 17, 2005 2:41 pm
by JamasE
saadmirza wrote: Yes, it does matter for performance, but does it really matter for functionality? I am still not convinced, even after lots of testing.
I can't think of a major reason why it should matter, but then again the order of the files in a typical one-to-one match does matter functionally (*).

Perhaps if one file is expected to be nearly one-to-many, it's best to have that as File B (the geocode file). To be clearer: in one of our projects we were using many-to-many between File A and File B, but in most instances we expected many-to-one to be the norm, and that dictated which file was File B. (And, agreeing with Ray, the larger file was File A.)

Cheers,
Jamas

(*) If you are matching Fields A1 and A2 to Fields B1 and B2, especially outside of an array, it does matter to the frequency table which is A and which is B.

Posted: Sun Jul 17, 2005 11:46 pm
by ray.wurlod
saadmirza wrote: Yes, it does matter for performance, but does it really matter for functionality? I am still not convinced, even after lots of testing.
No, it's like a fuzzy inner join. Inner joins are commutative.
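
The "fuzzy inner join" analogy can be illustrated with a minimal sketch. It assumes a symmetric similarity score (a token-set Jaccard measure, chosen only for the example) and keeps every pair at or above the cut-off; the function names and sample addresses are hypothetical. It does not model QualityStage weights or the frequency-table effect mentioned earlier in the thread, so it shows only why a join of this shape gives the same pairs whichever file comes first.

```python
def tokens(text):
    """Lower-cased set of whitespace-separated tokens."""
    return set(text.lower().split())

def jaccard(x, y):
    """Symmetric similarity: shared tokens divided by all tokens."""
    tx, ty = tokens(x), tokens(y)
    union = tx | ty
    return len(tx & ty) / len(union) if union else 0.0

def fuzzy_inner_join(left, right, cutoff):
    """Keep every (left, right) pair whose similarity reaches the cut-off."""
    return {(l, r) for l in left for r in right if jaccard(l, r) >= cutoff}

# Hypothetical sample data for the illustration.
FILE_A = ["12 high street", "12 high st", "7 mill lane"]
FILE_B = ["12 high street", "high street 12", "7 mill road"]

ab = fuzzy_inner_join(FILE_A, FILE_B, 0.5)
ba = fuzzy_inner_join(FILE_B, FILE_A, 0.5)

# Swapping the inputs yields the same pairs, just written the other way round.
swapped_back = {(a, b) for b, a in ba}
print(ab == swapped_back)
```

The equality holds here because the score itself is symmetric and no pair is discarded in favour of a "best" match; a matcher that treats the two files differently would not necessarily behave this way.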