records that should match but don't

tcat2000
Participant
Posts: 6
Joined: Wed Jun 01, 2005 1:35 pm

Post by tcat2000 »

hi,

I ran my customer file through QualityStage and ran an undup on customer id. The output is perfect with the exception of 1 group of customer ids. It turned out that this group contains 5000 records and is the largest group in my file. The second largest group contains 1200 records and it was grouped okay. My m-prob and u-prob are set at the defaults of .9 and .01, with cutoffs of 0.

What am I doing wrong? How can I get the largest group together? I tried adjusting the m-prob and u-prob but still got the same results. I have isolated the problem group and run numerous variations, and yet the records remain separated.

I am still new at QS... and have only been using it for 2 weeks to learn its potential. Any help would be greatly appreciated.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What are the aggregate weights for the XA entry and the first and last DA entries for this difficult group (block of records)?

If you can see that they're all in the same block (I'm assuming this), why do you claim that they don't match? Surely being in the same block is the definition of a match when the cutoff is 0.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
tcat2000
Participant
Posts: 6
Joined: Wed Jun 01, 2005 1:35 pm

Post by tcat2000 »

I'm sorry to say that I haven't gotten to the aggregate weights for the XA and DA entries yet... I couldn't find anything in the manual relating to them.

On my extract file, I constructed a column with the following command:
MOVE @SET8

It created a column called DF, and I believed that was the group identifier for the group... however, there is a different number for every record in the group.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

OK, that suggests that they're not in the same block. You also need to MOVE @WGT to get the agreement weights. You can select this from the same drop down list where you found @SET8.
You'll also need to advise which columns you used for blocking (forming the blocks of potential duplicates). Ideally these should be or include fuzzy choices, such as NYSIIS of Main Name.
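
For reference, the agreement weights mentioned above follow from the m-prob and u-prob settings. Below is a minimal sketch, assuming the standard Fellegi-Sunter base-2 log formulas apply here; the function names are made up for illustration and are not QualityStage calls.

```python
import math

def agreement_weight(m: float, u: float) -> float:
    """Weight added when a column agrees (assumed formula: log2(m / u))."""
    return math.log2(m / u)

def disagreement_weight(m: float, u: float) -> float:
    """Weight added when a column disagrees (assumed formula: log2((1 - m) / (1 - u)))."""
    return math.log2((1.0 - m) / (1.0 - u))

# Default settings quoted earlier in the thread.
m_prob, u_prob = 0.9, 0.01

print("agreement:   ", round(agreement_weight(m_prob, u_prob), 2))
print("disagreement:", round(disagreement_weight(m_prob, u_prob), 2))
```

Under this assumption, agreement on a single match column gives a positive weight and disagreement a negative one, so with both cutoffs at 0 a pair that agrees on customer id would score above the cutoff. That points away from the probabilities and toward blocking as the place to look.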
tcat2000
Participant
Posts: 6
Joined: Wed Jun 01, 2005 1:35 pm

Post by tcat2000 »

I just discovered something weird... my test file has 5326 records. I am blocking on customer id and matching on the same column. The file does not have name and address info, just account numbers, ssn and customer id. This is a test file.

This file should have only 1 grouping, since every record has the same customer id. If I run the whole file through undup, I get 5326 different group identifiers. However, if I reduce the number of records to 1500, 2500 or 5000, I get 1 group identifier, which is what I want.

It looks like this has something to do with the number of records in a grouping. If so, how can I increase that number? I read somewhere in the manual about increasing the buffer size in the advanced run options. I set mine to 30000 and still nothing changed.

Thanks for the help
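
One way to check the block-size theory outside the tool is to count how many records share each blocking key and how many pairwise comparisons that implies. The sketch below uses made-up records and hypothetical helper names (`block_sizes`, `pair_count`); the key values are invented, only the counts 5326 and 1200 come from this thread. It does not reproduce anything QualityStage does internally.

```python
from collections import Counter

def block_sizes(records, key):
    """Count how many records fall into each block for the given blocking column."""
    return Counter(r[key] for r in records)

def pair_count(n):
    """Number of distinct record pairs inside a block of n records."""
    return n * (n - 1) // 2

# Hypothetical stand-in data: one block of 5326 records and one of 1200.
records = [{"customer_id": "C001"}] * 5326 + [{"customer_id": "C002"}] * 1200

# Largest blocks first, with the number of candidate pairs each one implies.
for block, size in block_sizes(records, "customer_id").most_common():
    print(block, size, pair_count(size))
```

Running the same count on the real extract, and comparing it with the record counts at which the grouping starts to fail, would narrow down where the limit sits.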