Hello,
I've done a column analysis using Information Analyzer. Now I'm trying to review the analysis in Frequency Distribution -> 'Domain & Completeness'. Since I have millions of records, it's hard to go through every entry. Is there a way to write a condition for each column and get the review done automatically? For example, if the city name contains numeric values, set the status to Invalid, otherwise Valid.
Example:
Atlanta - Valid
Atalata12345 - Invalid
atla - Default
Review column analysis results in 'Domain & Completeness'
-
- Participant
- Posts: 527
- Joined: Thu Apr 19, 2007 1:25 am
- Location: Melbourne
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
Have a look at data quality (DQ) rules in Information Analyzer; I would recommend version 8.7 rollup 1. They let you create the type of rule you are looking for: the rule is defined as a multi-criteria statement and it produces data quality metrics. For data with millions of rows you will find DQ rules more effective than manual checking. You can then bind these city rules to different instances of city columns.
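To make the idea concrete, here is a rough sketch of the rule logic in plain Python. It is not Information Analyzer rule syntax, only an illustration of a multi-criteria check applied to a frequency distribution. The names `KNOWN_CITIES`, `classify_city` and `summarize` are made up for this example, and the reference list of cities is an assumption; the original question does not say how a value like "atla" should be told apart from a valid one.

```python
import re
from collections import Counter

# Hypothetical reference list of accepted city names (an assumption for this sketch).
KNOWN_CITIES = {"atlanta", "melbourne", "sydney"}


def classify_city(value):
    """Classify one distinct city value as 'Invalid', 'Valid' or 'Default'."""
    text = value.strip()
    # Criterion 1: any digit in the name makes the value invalid.
    if re.search(r"\d", text):
        return "Invalid"
    # Criterion 2: a name found in the reference list is valid.
    if text.lower() in KNOWN_CITIES:
        return "Valid"
    # Anything else keeps the default status and is left for manual review.
    return "Default"


def summarize(freq):
    """Roll up row counts per status.

    freq maps each distinct value to its row count, like a frequency distribution.
    """
    totals = Counter()
    for value, count in freq.items():
        totals[classify_city(value)] += count
    return dict(totals)


if __name__ == "__main__":
    sample = {"Atlanta": 10, "Atalata12345": 2, "atla": 1}
    for city in sample:
        print(city, "-", classify_city(city))
    print(summarize(sample))
```

Working on the distinct values and their counts rather than on every row is what keeps this practical for millions of records: the rule runs once per distinct value and the counts give the quality metrics.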
Have a look at this article on pre-built address rules:
Using pre-built rule definitions with IBM InfoSphere Information Analyzer
You will also find QualityStage more suitable for cleansing these fields, for example standardising Atalata into Atlanta.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: