two-source probabilistic matching in real time
Posted: Thu Jan 08, 2015 10:58 am
Is it possible to perform two-source probabilistic matching in real time with:
- data source: real-time request, one record at a time, via web service call to a job we deploy as an ISD application
- reference source: Oracle (akin to an Oracle sparse lookup, knowing that the reference records can be inserted and updated in real time by external processes)
Q1. The documentation suggests that all the reference data must first be standardized, have its frequency distribution generated, and so on. That seems like a lot of overhead to incur for every request that needs to be matched in real time. Is that understanding correct? I would hope not.
Q2. I had gathered from the 8.7 docs that all the match stage inputs must be persistent data sets. Again, I hope I misunderstood, as that does not really make sense. The 11.3.1 docs say database stages can be inputs. Have the match input requirements been loosened between 8.7 and 11.3.1?
The latter could be easily tested, whereas Q1 seems pretty involved. At this point it would help to clarify what is actually possible in real time when the reference data in Oracle is constantly changing.