etl/data mapping

Moderators: chulett, rschirm

Post Reply
troywinston
Participant
Posts: 42
Joined: Sun Jul 03, 2005 4:22 pm
Location: NY

etl/data mapping

Post by troywinston »

Hey I am fairly new to datastage and I am specifically interested in ETL/data migration- data mapping, been doing some research but not getting specific information. I have broken down the transformer into 4 processes, specifically I would like to know as a data mapper using datastage -transformation stage, just in a nutshell what is involved in
1. data validation process
2. data cleaning process
3. data integration process
4. data aggregation process

troy :?
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Is this homework or a job interview question?
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Lots of grey areas and overlaps - one of the dangers of applying labels.
  • Validation is ensuring that values are valid; this might include data type and range checking, might include "exists on file" checking too.

    Cleaning is applying algorithms to data to ensure that they comply with business rules, and might include data type conversions, replacement of missing values, and so on.

    Integration is guaranteeing that data from disparate sources is rendered into a form that allows "apples to be compared to apples", and might include standard units of measurement, currency exchange, and so on.

    Aggregation is any form of summarising data; in general you aren't interested in the individual transactions but, rather, total daily sales.
There's a lot more to ETL than that, but I hope the question is adequately answered. May I refer you to The Data Warehouse Lifecycle Toolkit by Ralph Kimball and others, where you will find the answers you seek.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Post by peternolan9 »

Troy,
you can get a lot more information from the Ralphs latest book on the ETL toolkit. Unfortunately and general ETL discussion is, by necessity, not tool specific.

Also, if you check out my company web site www.instantbi.com you will see we have released a 'mapping spreadsheet' that will help you understand some basics of mappings from staging area to target......look under downloads->beta software->IDW MG Beta...

There is also a presentation about it as well.....

We have also now released a beta for software that generates ETL directly from the mapping spreadsheet (though not in DataStage).

I believe this invention of being able to do all mappings in a spreadsheet will revolutionise ETL development because we all need to write the mappings somewhere and we usually do that in a spreadsheet...then we code it into a tool like DS.

We are currently doing a large and complex EDW and we are generating all ETL code directly from our mapping spreadsheet.....not bad!!
Best Regards
Peter Nolan
www.peternolan.com
Post Reply