Hi Team
I need help with a data quality process/model in the data warehouse.
Kindly advise how to do it based on completeness, accuracy, validity and uniqueness.
I need advice on how to design a process to cater for data quality in DataStage.
Regards
Zandile
Data quality checks in datawarehouse/ETL
Moderators: chulett, rschirm, roy
Zandile Moloi
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Welcome aboard.
This is an enormous topic! Entire books have been written about how to do it.
Begin by understanding that data quality is one component of an overall governance program. The business must specify the rules that determine how data need to be in order to be fit for purpose.
From that, the business should be recording the policies, rules and a glossary of business terms that will control their information. Information is a very valuable asset, and it belongs to the business. If they're not prepared to own it, walk away.
To gain an understanding of what the data look like, the data need to be profiled. The heavy lifting can be done with a tool like Information Analyzer, but its results need to be reviewed by a business analyst, possibly with input from subject matter experts and data architect types. Information Analyzer can profile for structure, classification, format, etc., and also for compliance with data rules. In the longer term, regular monitoring of compliance with data rules is a very convenient mechanism for scoring the overall data quality in the organization.
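To make the "scoring" idea concrete: a compliance score is just the percentage of records passing each data rule. A minimal sketch in Python (this is not Information Analyzer; the records and the two rules here are hypothetical examples covering the completeness and validity dimensions):

```python
import re

# Hypothetical sample records to be profiled
records = [
    {"customer_id": "C001", "email": "meg@example.com"},
    {"customer_id": None,   "email": "dr@example.com"},
    {"customer_id": "C003", "email": "not-an-email"},
]

# Illustrative data rules, named by the dimension they check
rules = {
    "completeness: customer_id present":
        lambda r: r["customer_id"] is not None,
    "validity: email matches pattern":
        lambda r: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"] or "") is not None,
}

def compliance_scores(records, rules):
    """Percentage of records passing each rule."""
    return {
        name: 100.0 * sum(check(r) for r in records) / len(records)
        for name, check in rules.items()
    }

print(compliance_scores(records, rules))
```

Tracking these percentages over time (per rule, per source) is what turns one-off profiling into ongoing data quality monitoring.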
It's only once all of the above has been completed (at least for the pilot project - don't try to boil the ocean) that DataStage and QualityStage developers should become involved. Techniques available include:
- standardizing data (e.g. MEG = MARGARET, DR = DOCTOR)
- matching data, if required
- applying data rules (Data Rules stage --> pass/fail and details)
- capturing exceptions (Exceptions stage)
- managing exceptions through the Data Quality Console
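The techniques above are implemented with DataStage/QualityStage stages in practice, but the flow itself is simple. A hedged, tool-independent sketch in Python (the standardization table, rule, and function names are all illustrative, not QualityStage APIs):

```python
# Illustrative standardization table, as in the MEG/DR example above
STANDARDIZATION = {"MEG": "MARGARET", "DR": "DOCTOR"}

def standardize(record):
    """Replace known variant values with their standard forms."""
    return {k: STANDARDIZATION.get(v, v) if isinstance(v, str) else v
            for k, v in record.items()}

def apply_rules(record, rules):
    """Return the names of the rules this record fails."""
    return [name for name, check in rules.items() if not check(record)]

def process(records, rules):
    """Standardize, apply data rules, and route failures to exceptions."""
    passed, exceptions = [], []
    for rec in records:
        rec = standardize(rec)
        failures = apply_rules(rec, rules)
        if failures:
            # Captured exceptions would feed a review console in practice
            exceptions.append({"record": rec, "failed_rules": failures})
        else:
            passed.append(rec)
    return passed, exceptions

# Hypothetical usage: one completeness rule, two records
rules = {"name present": lambda r: bool(r.get("name"))}
records = [{"name": "MEG"}, {"name": ""}]
passed, exceptions = process(records, rules)
```

After this run, the first record carries the standardized name MARGARET, while the empty-name record sits in the exceptions list with the rule it failed, ready for someone to manage through a review process.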
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.