DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
srds2



Group memberships:
Premium Members

Joined: 29 Nov 2011
Posts: 66

Points: 821

Post Posted: Wed Jul 31, 2013 11:34 am Reply with quote    Back to top    

DataStage® Release: 8x
Job Type: Parallel
OS: Unix
Hi, I am using Reference Match (May to One Multiple) and having the same issue with my CHAR comparison. Can anyone help me to understand how this special handling works

when we have a column x being used as a "Blocking"column in match pass 1 and the same column is being used as "Match Command" in another match pass (Variable Special handling is applicable for only matching columns or both matching and blocking column?)

My requirement is If both the columns are populated then the value must be same to be a match, If either one of the columns is missing then also it should be a match so can I use this "Critical Missing OK" special handling for both Data and Reference Columns?

Thanks a lot in advance for your help.
rjdickson
Participant



Joined: 16 Jun 2003
Posts: 378
Location: Chicago, USA
Points: 2531

Post Posted: Wed Jul 31, 2013 11:37 am Reply with quote    Back to top    

Hi,

The 'Critical Missing Ok' only applies to the matching commands, not to the blocking. In your case, the records would not end up in the same block because one of them is blank. Do not block on the column, and only use a match command with a 'Critical Missing OK'.

_________________
Regards,
Robert
Rate this response:  
Not yet rated
srds2



Group memberships:
Premium Members

Joined: 29 Nov 2011
Posts: 66

Points: 821

Post Posted: Wed Jul 31, 2013 2:12 pm Reply with quote    Back to top    

Thanks Robert for the information.

If I set Special Handling Action as "Critical Missing Ok" then It would be matched even if the value is missing on either Data/Reference side correct? (Just want to confirm whether it applies for only Data columns/Reference Column or Both because when I select this Action under Special handling Actions, besides it shows that "Data Columns" Radio Button as selected. when I select Reference columns Radio Button I dont see these Missing Actions but only NOFREQ and CONCAT)

Thank you very much for your hep.
Rate this response:  
Not yet rated
rjdickson
Participant



Joined: 16 Jun 2003
Posts: 378
Location: Chicago, USA
Points: 2531

Post Posted: Wed Jul 31, 2013 4:16 pm Reply with quote    Back to top    

That is correct. You select the 'Data' side only, and the 'missing ok' context is either data or reference (either one or both can be missing).

For example:

Code:
DatCol RefCol Result
------ ------ -------------
ABC    ABC     Matched
ABC    ABD     NOT Matched
ABC            Matched
       ABC     Matched
               Matched


I hope this helps.

_________________
Regards,
Robert
Rate this response:  
srds2



Group memberships:
Premium Members

Joined: 29 Nov 2011
Posts: 66

Points: 821

Post Posted: Thu Aug 01, 2013 10:17 am Reply with quote    Back to top    

Thanks a lot Robert for the information. It really helped to understand how this special handling works.

One last question regarding this match comparisons: I am using NAME_UNCERT and CHAR comparisions in my match passes, I was under the impression that by default (With no extra special handling)CHAR and UNCERT functions would work the same way(Blank to value and Blank to Blank - Positive match). Can you please let me know if my assumption is wrong as per the below example using the default comparisions with no special handling.

Code:
DatCol RefCol Result
------ ------ -------------
ABC    ABC     Matched
ABC    ABD     NOT Matched
ABC            Matched
       ABC     Matched
               Matched

If these comparisions doesnt match on Blank to value or Blank to Blank then as per my requirement do I need to add all the columns which I am using for match comparisions CHAR and NAME_UNCERT under Special Handling to have "Critical Missing OK"?

and does the order of match commands important to get proper matches. I know we need to add Blocking columns as per their priority. Can you please let me know if that rule applies for Match commands as well?

Thanks for your help in understanding these concepts to resolve my issue.
Rate this response:  
Not yet rated
rjdickson
Participant



Joined: 16 Jun 2003
Posts: 378
Location: Chicago, USA
Points: 2531

Post Posted: Thu Aug 01, 2013 11:47 am Reply with quote    Back to top    

Hi,

Match Commands (any of them) do not work the same way. This is by design. By default, anything compared to a space receives a 0 score because it is neither a match nor a mismatch - it is missing data. This is not the same as forcing a 'Not Matched' condition.

Critical Missing OK
Let's assume you have 15 columns that you are using to match. That's 15 different match commands. Let's further assume that 'DateOfBirth' is one of those columns, and is different in two records . Finally, assume that the other 14 columns are exact matches. Only the DateOfBirth is different. Your business rules may be strict enough to say 'NEVER match when there is a different date of birth.' In this case, 'Critical Missing OK' can be used to keep the records apart even though there are 14 other columns that match exactly. The input and result would look something like this:

Code:
DatCol   RefCol   Result
-------- -------- -------------
19910528 19910528 Matched
19910528 19910529 NOT Matched
19910528          Matched
         19910528 Matched
                  Matched


Match Commands
Contrast with the result from using a Match Command. Remember, this is only one of the 15 columns you have selected for the match.
Let's assume you have used CHAR:

Code:
DatCol   RefCol   Resuling weight for this column
-------- -------- -------------------------------
19910528 19910528 Maximum agreement weight 
19910528 19910529 Maximum disagreement weight
19910528          Zero
         19910528 Zero
                  Zero


So, this one field being different will not keep the records from matching - remember that there are 14 other fields that match exactly (in our example).

Remember, this can all be modified with overrides.

I hope this helps.

_________________
Regards,
Robert
Rate this response:  
rjdickson
Participant



Joined: 16 Jun 2003
Posts: 378
Location: Chicago, USA
Points: 2531

Post Posted: Thu Aug 01, 2013 11:59 am Reply with quote    Back to top    

Also, to answer your other questions:
No, blocking order within a pass does not matter.
No, match command order does not matter either. Comparison to the cutoffs does not happen until all weights are computed.

_________________
Regards,
Robert
Rate this response:  
Not yet rated
srds2



Group memberships:
Premium Members

Joined: 29 Nov 2011
Posts: 66

Points: 821

Post Posted: Fri Aug 02, 2013 8:17 am Reply with quote    Back to top    

Thanks a lot Robert for the information.

So, we can either add those match commands in the specail hadling with "Critical Missing OK" or Add "Weight Overrides" to match Blank values correct? If yes then I would use Special handling rather than doing weight overrides which I felt a bit complex to work with.

I was trying to test how this Weight overrides work for my requirement (Matching Balnks) but couldn't understand how to determine the Agreement Weight/Missing Weight override values. I have gone through few posts on DsXchange and other documentation, based on my understanding this missing weight override value we specify should be greater than the Match cutoff value in order to get the Blanks matched. does it mean we need to first specify the match cut off value before doing weight overrides?

Can you please suggest which method is better for my requirement (Matching blank to Value and Blank to blank)
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42834
Location: Denver, CO
Points: 220807

Post Posted: Fri Aug 02, 2013 9:20 am Reply with quote    Back to top    

Not sure how I missed this... well past time to get this into your own post.

_________________
-craig

Your brain is full of spiders, you've got garlic in your soul, Mr. Grinch!
I wouldn't touch you With a thirty-nine-and-a-half foot pole!
Rate this response:  
Not yet rated
srds2



Group memberships:
Premium Members

Joined: 29 Nov 2011
Posts: 66

Points: 821

Post Posted: Mon Aug 19, 2013 9:30 am Reply with quote    Back to top    

Thanks a lot Robert for the information.

So, we can either add those match commands in the specail hadling with "Critical Missing OK" or Add "Weight Overrides" to match Blank values correct? If yes then I would use Special handling rather than doing weight overrides which I felt a bit complex to work with.

I was trying to test how this Weight overrides work for my requirement (Matching Balnks) but couldn't understand how to determine the Agreement Weight/Missing Weight override values. I have gone through few posts on DsXchange and other documentation, based on my understanding this missing weight override value we specify should be greater than the Match cutoff value in order to get the Blanks matched. does it mean we need to first specify the match cut off value before doing weight overrides?

Can you please suggest which method is better for my requirement (Matching blank to Value and Blank to blank)

Thank you very much for sharing information.
Rate this response:  
Not yet rated
stuartjvnorton
Participant



Joined: 19 Apr 2007
Posts: 518
Location: Melbourne
Points: 3853

Post Posted: Mon Aug 19, 2013 6:36 pm Reply with quote    Back to top    

Without looking at your data and the rest of your match strategy, it's hard to say definitively. Probabilistic matching is not a paint-by-numbers exercise (and why it's so much fun Wink.

Critical Missing OK doesn't mean it automatically matches if it is blank. What it says is "If this field doesn't match, then throw out the whole record, regardless of how high it scores from the other fields in the match pass. However, if it is missing, then don't automatically throw the record out. Still give it a chance to match based on the other fields"

This is not something you would always use. If the data is highly trusted and vital to the overall match then you would, but normally you would just use the disagreement weight as a penalty, or add a weight override if you want a heavier penalty.

Remember, the overall score is the sum of all of the individual field scores. If that field is missing and the record is still good enough to get an overall match then it will. If it isn't and you want it to, then you need to fiddle with the scores.


Using Weight Overrides is about giving things a nudge here and there to put the records you want over the threshold and keeping the ones you don't want under.

Raising or lowering the threshold makes it harder or easier overall.

As a very rough guide:
If you're being too picky overall, lower the threshold.
If you want a record to get through even though a specific field is missing (while not being more lenient in general), use a weight override to give it a bit of a boost.

Which option you choose depends on the behaviour you want, how trustworthy the sources are, the other fields in your match pass and what scores they are contributing to the overall match score.
Rate this response:  
srds2



Group memberships:
Premium Members

Joined: 29 Nov 2011
Posts: 66

Points: 821

Post Posted: Thu Aug 22, 2013 7:32 am Reply with quote    Back to top    

Thanks a lot for the information.

Can you please share me some suggestions on how can I know that what should be the Cut Off value and how much I should increase/decrease the Agreement/Disagreement weights in order to match on blanks.

Any information to better understand and modify these values would be helpful.

Thank you very much for your help
Rate this response:  
Not yet rated
stuartjvnorton
Participant



Joined: 19 Apr 2007
Posts: 518
Location: Melbourne
Points: 3853

Post Posted: Thu Aug 22, 2013 6:22 pm Reply with quote    Back to top    

Sure. Can I have your data? Wink

It's all related to the data, the number if fields and how the scores are clumped. A threshold for 1 pass could be 20 and 50 for another ass with more match fields. It's on a case by case basis and may even change over time as your data does.
Therein lies the art.
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54431
Location: Sydney, Australia
Points: 295208

Post Posted: Thu Aug 22, 2013 8:48 pm Reply with quote    Back to top    

Using the Match Designer test utility you can view a histogram of matches and slide the cutoffs along to investigate their effect. In association with doing that, you need to identify row pairs in wh ...

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Canberra, Sydney and Melbourne (especially seeking good business analysts)
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours