Smart Data Quality Tool

Posted: Thu Mar 15, 2007 9:57 am
by Alexander
Hi,

Every tool has limitations. The most important one, in my opinion, is that these tools don't learn from the mistakes they correct, so we always have to stay on top of the problem to resolve new issues.

A data quality tool should learn from the input data so that it can solve the problem by itself. How? The input would be pairs of wrong form and correct form for each item to be corrected: address, name, etc.
And when a new situation appears, say a new address with the street name in the wrong position, it would solve it using what it has learned so far.

Is there any tool with this behavior?

Thanks

Posted: Thu Mar 15, 2007 3:30 pm
by kduke
Ray will sell you an RMM stage along with this great idea.

Posted: Thu Mar 15, 2007 5:13 pm
by DSguru2B
O yea, the good ol' RMM stage. :wink:

Posted: Thu Mar 15, 2007 7:15 pm
by ray.wurlod
Because QualityStage uses a pattern matching technique, it does not really require the heuristic you postulated. It can, through pre-processor rule sets, detect misplaced data, including column overlapping, via pattern recognition techniques.

These can be refined over time, and rules changed or overridden, but this is not automated in the current version; it requires human assistance, particularly in those grey areas where the probability of a hit is too high to count as a certain non-match but not high enough to count as a certain match. QualityStage incorporates 24 probabilistic (rather than deterministic) matching algorithms; the user investigating or searching for potential matches can choose any combination of them, as suited to the data.
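
To make the grey area concrete, here is a minimal sketch of two-threshold probabilistic matching in the general style of classic record-linkage models. It is not QualityStage's own algorithm: the field names, the m/u probabilities and both cutoffs are invented for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "name": (0.95, 0.05),
    "street": (0.90, 0.10),
    "city": (0.98, 0.30),
}

UPPER = 4.0   # assumed cutoff: at or above this weight, call it a match
LOWER = -2.0  # assumed cutoff: at or below this weight, call it a non-match


def match_weight(rec_a, rec_b):
    """Sum log2 agreement or disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(rec_a, rec_b):
    """Return 'match', 'non-match', or 'clerical review' for the grey area."""
    weight = match_weight(rec_a, rec_b)
    if weight >= UPPER:
        return "match"
    if weight <= LOWER:
        return "non-match"
    return "clerical review"
```

A pair that agrees on name and city but not on street scores between the two cutoffs here, so it would be routed to a person rather than decided automatically.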

I have seen other tools, but QualityStage is at the front of the pack with daylight to second place, in my opinion.

Posted: Fri Mar 16, 2007 5:17 am
by Alexander
If we look at the .pat file that is used to correct an address, we find a long list of patterns to correct every known case. And when a result is not quite right, because another character is misplaced, the only way to correct it is to add another .pat rule. Until exhaustion!!!

If the data quality tool gathered information from all normalizations and built some form of knowledge base, over time it would start to recognize new errors and adapt dynamically with no human intervention.
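
As a rough illustration of the knowledge base idea, the sketch below remembers token-level fixes from (wrong form, correct form) pairs and reuses them on input it has not seen. It is a deliberately simple stand-in, not a neural approach and not taken from any product; `CorrectionMemory` and the sample addresses are made up for this example.

```python
from collections import Counter, defaultdict


class CorrectionMemory:
    """Token-level knowledge base built from (wrong, correct) example pairs."""

    def __init__(self):
        # wrong token -> counts of the corrected tokens seen for it
        self.seen = defaultdict(Counter)

    def learn(self, wrong, correct):
        wrong_tokens = wrong.lower().split()
        correct_tokens = correct.lower().split()
        # Assumption: only pairs with the same token count are aligned,
        # position by position; other pairs are skipped in this sketch.
        if len(wrong_tokens) != len(correct_tokens):
            return
        for w, c in zip(wrong_tokens, correct_tokens):
            if w != c:
                self.seen[w][c] += 1

    def correct(self, text):
        out = []
        for token in text.lower().split():
            if token in self.seen:
                # Use the correction seen most often for this token.
                out.append(self.seen[token].most_common(1)[0][0])
            else:
                out.append(token)
        return " ".join(out)


memory = CorrectionMemory()
memory.learn("12 main st", "12 main street")
memory.learn("7 oak st", "7 oak street")
memory.learn("3 hill rd", "3 hill road")

print(memory.correct("45 elm st"))  # prints: 45 elm street
```

This only generalizes known token fixes to new addresses. It does nothing for tokens in the wrong position, which is the harder case raised in this thread.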

Just because a PAT gives us unlimited resources to solve every situation doesn't mean it answers my question, which is about a smart, learning data quality tool.

The only way I can see is to code our own neural-network-based algorithm, which would learn over time.
Is there any work in this field?
Where can I look for more information?

Thanks

Posted: Fri Mar 16, 2007 6:04 am
by ray.wurlod
I know of no work in the field of neural net or any other "automatic" mechanism to do what you suggest. Maybe there's a PhD in it?