Smart Data Quality Tool

This forum is in support of all issues about Data Quality regarding DataStage and other strategies.


Alexander
Participant
Posts: 17
Joined: Fri May 12, 2006 10:10 am
Location: Europe


Post by Alexander »

Hi,

Every tool has limitations. The most important, in my opinion, is that tools don't learn from the mistakes they correct, so we have to stay on top of the problem and resolve each new issue ourselves.

A data quality tool should learn from its input data and solve the problem by itself. How? The input would be the wrong form and the correct form of each kind of item to correct: address, name, etc.
Then, when a new situation appears, such as a new address with the street name in the wrong position, the tool would solve it using its best know-how.

Is there any tool with this behavior?

Thanks
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Ray will sell you an RMM stage along with this great idea.
Mamu Kim
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Oh yeah, the good ol' RMM stage. :wink:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Because QualityStage uses a pattern matching technique, it does not really require the heuristic you postulated. It can, through pre-processor rule sets, detect misplaced data, including column overlapping, via pattern recognition techniques.
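As a rough illustration of pattern-based detection of misplaced data, the Python sketch below checks each column value against a regular expression describing what that column is expected to hold and, when a value fails, reports which other column's pattern it fits instead. The column names and regular expressions are hypothetical, and this is not QualityStage rule-set syntax; it only shows the general idea.

```python
import re

# Hypothetical expectations per column (illustrative only, not QualityStage syntax).
EXPECTED = {
    "house_no": re.compile(r"^\d+[A-Z]?$"),   # e.g. 12 or 12A
    "street": re.compile(r"^[A-Z][A-Z ]*$"),  # letters and spaces only
    "post_code": re.compile(r"^\d{5}$"),      # five digits
}


def find_misplaced(record):
    """Return (column, value, better_column) for each value that fails its own column's pattern.

    better_column is the first other column whose pattern the value does fit, or None.
    """
    issues = []
    for col, raw in record.items():
        value = raw.strip().upper()
        if EXPECTED[col].match(value):
            continue  # value looks like what this column should hold
        fits = [other for other, rx in EXPECTED.items()
                if other != col and rx.match(value)]
        issues.append((col, value, fits[0] if fits else None))
    return issues


record = {"house_no": "MAIN ST", "street": "12", "post_code": "75201"}
print(find_misplaced(record))
# [('house_no', 'MAIN ST', 'street'), ('street', '12', 'house_no')]
```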

These can be refined over time, and rules can be changed or overridden, but this is not automated in the current version. It requires human assistance, particularly in the grey areas where the probability of a match is too high to classify a pair as a certain non-match but not high enough to classify it confidently as a match. QualityStage incorporates 24 probabilistic (rather than deterministic) matching algorithms; the user investigating or searching for potential matches can choose any combination of them to suit the data.
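The grey area can be made concrete with a classic Fellegi-Sunter style record-linkage score, sketched below in Python. This is a generic textbook technique, not QualityStage's implementation, and the per-field m/u probabilities and the two cutoffs are made-up values for illustration. Each field adds a positive weight log2(m/u) when it agrees and a negative weight log2((1-m)/(1-u)) when it disagrees; the total falls into a match zone, a non-match zone, or a clerical-review zone in between.

```python
import math

# Hypothetical per-field probabilities, not values from any real product:
#   m = P(field agrees | records are a true match)
#   u = P(field agrees | records are not a match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "given": (0.90, 0.05),
    "street": (0.85, 0.02),
    "post_code": (0.95, 0.10),
}


def composite_weight(rec_a, rec_b):
    """Sum Fellegi-Sunter style log2 weights over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)  # agreement: positive weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement: negative weight
    return total


def classify(weight, low=0.0, high=10.0):
    """Two illustrative cutoffs split the weight scale into three zones."""
    if weight >= high:
        return "match"
    if weight <= low:
        return "non-match"
    return "clerical review"  # the grey area that needs a human


a = {"surname": "SMITH", "given": "JOHN", "street": "MAIN ST", "post_code": "75201"}
b = {"surname": "SMITH", "given": "JON", "street": "MAIN STREET", "post_code": "75201"}
print(classify(composite_weight(a, b)))  # clerical review
```

Real matchers compare fields with fuzzy comparators (phonetic codes, edit distance and so on) rather than the exact equality used here, which is why JOHN/JON and MAIN ST/MAIN STREET count as disagreements in this toy.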

I have seen other tools, but QualityStage is at the front of the pack with daylight to second place, in my opinion.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Alexander
Participant
Posts: 17
Joined: Fri May 12, 2006 10:10 am
Location: Europe

Post by Alexander »

If we look at the .pat file that is used to correct an address, we find a long list of patterns covering every known case. When the result is still not quite right because another character is misplaced, the only way to fix it is to add yet another pattern rule. And so on, until exhaustion!
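The exhaustion problem is easy to see in a toy version of a pattern rule table. The Python sketch below reduces each address token to a class letter, joins the letters into a pattern string, and looks that string up in a table of hand-written rules; any pattern nobody has written a rule for simply falls through. The class letters (N, T, W, X), the street-type list and the rule table are hypothetical simplifications, not the actual .pat syntax or QualityStage's token classes.

```python
# Hypothetical street-type keywords and class letters for this sketch.
STREET_TYPES = {"ST", "AVE", "RD", "BLVD"}


def token_class(tok):
    """Map a token to a class letter: N digits, T street type, W word, X other."""
    if tok.isdigit():
        return "N"
    if tok in STREET_TYPES:
        return "T"
    if tok.isalpha():
        return "W"
    return "X"


# Hand-written rules: exact class pattern -> token positions of
# (house number, street name, street type).
RULES = {
    "NWT": (0, 1, 2),  # 12 MAIN ST
    "WTN": (2, 0, 1),  # MAIN ST 12
}


def standardize(address):
    tokens = address.upper().split()
    pattern = "".join(token_class(t) for t in tokens)
    rule = RULES.get(pattern)
    if rule is None:
        return None  # unknown pattern: someone has to write another rule
    house, name, stype = rule
    return {"house_no": tokens[house], "street_name": tokens[name],
            "street_type": tokens[stype]}


print(standardize("Main St 12"))      # {'house_no': '12', 'street_name': 'MAIN', 'street_type': 'ST'}
print(standardize("12 Elm Oak Ave"))  # None: pattern NWWT has no rule yet
```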

If the data quality tool gathered information from all normalizations and built some form of knowledge base, over time it would start to recognize new errors and adapt dynamically, with no human intervention.
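One modest step toward that idea is to derive rules from examples instead of writing them by hand, using exactly the wrong-form/correct-form pairs described in the first post. The hypothetical Python sketch below learns, for each token-class pattern, the reordering that turns a wrong form into its correct form, then applies it to unseen addresses with the same pattern. It only handles pure reorderings and cannot generalize to a pattern it has never seen, so it illustrates learning from corrections rather than solving the problem.

```python
# Hypothetical street-type keywords used to classify tokens.
STREET_TYPES = {"ST", "AVE", "RD", "BLVD"}


def signature(tokens):
    """Class pattern of a token list: N digits, T street type, W word, X other."""
    def cls(tok):
        if tok.isdigit():
            return "N"
        if tok in STREET_TYPES:
            return "T"
        return "W" if tok.isalpha() else "X"
    return "".join(cls(t) for t in tokens)


class RuleLearner:
    """Learns one token reordering per class pattern from (wrong, correct) pairs."""

    def __init__(self):
        self.rules = {}  # class pattern -> source positions in corrected order

    def learn(self, wrong, correct):
        src, dst = wrong.upper().split(), correct.upper().split()
        if sorted(src) != sorted(dst):
            return False  # this sketch only learns pure reorderings
        used, order = set(), []
        for tok in dst:
            # first source position holding this token that is not yet used
            idx = next(i for i, t in enumerate(src) if t == tok and i not in used)
            used.add(idx)
            order.append(idx)
        self.rules[signature(src)] = order
        return True

    def correct(self, text):
        tokens = text.upper().split()
        order = self.rules.get(signature(tokens))
        if order is None:
            return None  # pattern never seen in any correction
        return " ".join(tokens[i] for i in order)


learner = RuleLearner()
learner.learn("Main St 12", "12 Main St")
print(learner.correct("Oak Ave 7"))  # 7 OAK AVE
print(learner.correct("Oak 7 Ave"))  # None: pattern WNT was never learned
```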

The fact that a PAT file gives us unlimited resources to handle every situation doesn't answer my question, which is about a smart, self-learning data quality tool.

The only way I can see is to code our own neural-network-based algorithm that learns over time.
Is there any work in this field?
Where can I look for more information?
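For the neural direction, the smallest building block is a single-layer perceptron that labels each address token from a few simple features and updates its weights every time a corrected example shows it was wrong, so it keeps learning as corrections arrive. The Python sketch below is a hypothetical toy: the labels, features, street-type list and training examples are all assumptions, and a production system would need far more data and a richer model.

```python
from collections import defaultdict

# Hypothetical label set, street-type list and features for this toy model.
LABELS = ("HOUSE_NO", "STREET_NAME", "STREET_TYPE")
STREET_TYPES = {"ST", "AVE", "RD", "BLVD"}


def features(tokens, i):
    """Shape and position features for token i (no word identity)."""
    tok = tokens[i]
    return {
        "bias": 1.0,
        "is_digit": float(tok.isdigit()),
        "is_alpha": float(tok.isalpha()),
        "is_street_type": float(tok in STREET_TYPES),
        "is_first": float(i == 0),
        "is_last": float(i == len(tokens) - 1),
        "prev_is_digit": float(i > 0 and tokens[i - 1].isdigit()),
    }


class Perceptron:
    """Multiclass perceptron: one weight vector per label, updated on mistakes."""

    def __init__(self):
        self.w = {label: defaultdict(float) for label in LABELS}

    def score(self, feats, label):
        return sum(self.w[label][f] * v for f, v in feats.items())

    def predict(self, feats):
        return max(LABELS, key=lambda label: self.score(feats, label))

    def update(self, feats, gold):
        """Learn from one corrected example; return True if it was already right."""
        guess = self.predict(feats)
        if guess != gold:
            for f, v in feats.items():
                self.w[gold][f] += v   # pull the correct label up
                self.w[guess][f] -= v  # push the wrong guess down
        return guess == gold

    def tag(self, text):
        tokens = text.upper().split()
        return [(t, self.predict(features(tokens, i))) for i, t in enumerate(tokens)]


def train(model, examples, max_epochs=20):
    """Repeat passes over the examples until one pass makes no mistakes."""
    for _ in range(max_epochs):
        mistakes = 0
        for text, labels in examples:
            tokens = text.upper().split()
            for i, gold in enumerate(labels):
                if not model.update(features(tokens, i), gold):
                    mistakes += 1
        if mistakes == 0:
            break
    return model


H, N, T = LABELS
EXAMPLES = [
    ("12 Main St", [H, N, T]),
    ("Main St 12", [N, T, H]),
    ("7 Oak Ave", [H, N, T]),
    ("Elm Rd 40", [N, T, H]),
]
model = train(Perceptron(), EXAMPLES)
print(model.tag("Pine Blvd 9"))
# [('PINE', 'STREET_NAME'), ('BLVD', 'STREET_TYPE'), ('9', 'HOUSE_NO')]
```

Because the features describe token shape and position rather than the words themselves, the trained model can label street names it has never seen, such as PINE; it still only knows the layouts present in its examples.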

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I know of no work in the field of neural nets or any other "automatic" mechanism to do what you suggest. Maybe there's a PhD in it?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.