records that should match but don't

tcat2000 · Post by **tcat2000** » Sat Jul 30, 2005 3:33 pm

hi,

i ran my customer file through QualityStage and undup on customer id. the output is perfect with the exception of 1 group of customer ids. It turned out that this group contains 5000 records and is the largest group in my file. The second largest group contains 1200 records and it was grouped okay. My m-prob and u-prob are set at the defaults of .9 and .01, with cutoffs of 0.

what am i doing wrong?... how can i get the largest group together? I tried adjusting the m-prob & u-prob but still got the same results. I have isolated the problem group and run numerous variations, and yet the records remain separated.

I am still new at QS... and have only been using it for 2 weeks to learn its potential. Any help would be greatly appreciated.

Thanks
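For reference, here is a rough sketch of how m-prob and u-prob turn into match weights under the standard Fellegi-Sunter formulation (log base 2). It is an assumption that QualityStage applies exactly this formula to this job; the function names are made up for illustration and are not part of the product.

```python
import math

def agreement_weight(m, u):
    # Weight added when the column agrees: log2(m / u)
    return math.log2(m / u)

def disagreement_weight(m, u):
    # Weight added when the column disagrees: log2((1 - m) / (1 - u))
    return math.log2((1 - m) / (1 - u))

# Defaults quoted in the post above
m_prob, u_prob = 0.9, 0.01
print(round(agreement_weight(m_prob, u_prob), 2))     # 6.49
print(round(disagreement_weight(m_prob, u_prob), 2))  # -3.31
```

If this holds, records that agree on the single matching column score well above a cutoff of 0 for any setting with m-prob greater than u-prob, which would fit the observation that adjusting the probabilities made no difference.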

ray.wurlod · Post by **ray.wurlod** » Sat Jul 30, 2005 5:33 pm

What are the aggregate weights for the XA entry and the first and last DA entries for this difficult group (block of records)?

If you can see that they're all in the same block (I'm assuming this), why do you claim that they don't match? Surely being in the same block is the definition of a match when the cutoff is 0.

tcat2000 · Post by **tcat2000** » Sat Jul 30, 2005 8:57 pm

i'm sorry to say that i haven't gotten to the aggregate weights for XA and DA entries yet... i couldn't find anything in the manual relating to it.

On my extract file, i constructed a column with the following command:
MOVE @SET8

It created a column called DF and i believe that is the group identifier... however there is a different number for every record in the group.
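A quick way to check the symptom described above is to count how many distinct DF values each customer id ends up with. This is only a sketch: it assumes the extract has been converted to CSV with a header row, and the column name `CUSTID` is hypothetical (`DF` is the column name from the post). The inline sample data is invented purely to show the output shape.

```python
import csv
import io
from collections import defaultdict

def set_ids_per_block(rows, block_col="CUSTID", set_col="DF"):
    """Map each blocking value to (record count, number of distinct set ids)."""
    counts = defaultdict(int)
    set_ids = defaultdict(set)
    for row in rows:
        key = row[block_col]
        counts[key] += 1
        set_ids[key].add(row[set_col])
    return {key: (counts[key], len(set_ids[key])) for key in counts}

# Invented sample: C1 is grouped correctly, C2 is split into three sets
sample = io.StringIO("CUSTID,DF\nC1,1\nC1,1\nC2,2\nC2,3\nC2,4\n")
summary = set_ids_per_block(csv.DictReader(sample))
for key, (n_records, n_sets) in summary.items():
    flag = "" if n_sets == 1 else "  <- split group"
    print(key, n_records, n_sets, flag)
```

A customer id with more than one distinct set id is a group that did not come together; to run this on a real file, pass `csv.DictReader(open(path, newline=""))` instead of the sample.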

ray.wurlod · Post by **ray.wurlod** » Sun Jul 31, 2005 2:46 am

OK, that suggests that they're not in the same block. You also need to MOVE @WGT to get the agreement weights. You can select this from the same drop down list where you found @SET8.
You'll also need to advise which columns you used for blocking (forming the blocks of potential duplicates). Ideally these should be or include fuzzy choices, such as NYSIIS of Main Name.

tcat2000 · Post by **tcat2000** » Sun Jul 31, 2005 10:22 am

i just discovered something weird... my test file has 5326 records. i am blocking on customer id and matching on the same column. the file does not have name and address info... just account numbers, ssn & customer id. this is a test file.

this file should have only 1 grouping, since every record has the same customer id across the file. if i run the full file through undup, i get 5326 different group identifiers. however, if i reduce the number of records to 1500, 2500 or 5000, i get 1 group identifier each time, which is what i want.

It looks like this has something to do with the number of records in a grouping... if so, how can i increase that number? i read somewhere in the manual about increasing the buffer size in the advanced run options. I set mine to 30000 and yet nothing changed.

Thanks for the help