What is the basic difference between Kimball and Bill Inmon?

chandra
Participant
Posts: 88
Joined: Sun Apr 02, 2006 6:50 pm
Location: India

Post by chandra »

Could anyone explain the basic difference between Kimball and Bill Inmon?
chandra ,
Hyd
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

What's the basic difference between an orange and an apple? What do you want, color, taste, nutritional value, texture, method of consumption?

Kimball and Inmon both write books, give lectures, and teach about data integration. Their methods are complementary in certain areas, conflicting in others. There's no one-line answer. You're best served by reading their websites and books, attending webinars, and signing up for newsletters.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chandra
Participant
Posts: 88
Joined: Sun Apr 02, 2006 6:50 pm
Location: India

Post by chandra »

The basic difference is:
Kimball: the DW is made up of a combination of data marts. Inmon: the DW is part of BI, and the data marts are part of the DW.
chandra ,
Hyd
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Well, no, you've pretty much got it completely wrong. Kimball argues against stovepipe data marts and against "backing into" an architecture. Kimball and Inmon both believe in a well-architected data integration environment for analytical processing.

Kimball espouses his "Bus" architecture, and Inmon has his "CIF" (Corporate Information Factory). There are data acquisition, storage, and presentation layers in both. The methods for data modeling differ, and I believe Inmon has no pure ETL strategy or framework book, while Kimball just recently published one.

You really need to read their materials.
Kenneth Bland

kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Buzz, Buzz, Buzz. Lots of buzzwords. There is value in these books, but why is it that the people who can quote from these books build terrible DataStage jobs? Simple solutions to complex problems are sometimes very beautiful. I think a lot of developers get lost in these books. Practical solutions that are clean, solid and repeatable are best. I think these are all evolving. What we think is valuable in design theory today may be worthless in 5 years. It is important to study and know the terminology of our industry, but don't get lost in it.

If your ETL is solid and runs fast and delivers what the end users need as far as information then you have succeeded on some level. If end users do not use the data warehouse and your company receives no benefit from your work then how can you call that a success? These books should push your designs to the next level and you can deliver more information to the end user.

By the way Ken, can you explain "Bus" versus "CIF" without using buzzwords? I double dare you.


I can see the value of an ODS. I can see the value of a staging table or file, but do you have to land the data every time you change something? Too much staging slows down the overall processing time. What value does that add? I can see the value of surrogate keys. If you have a source table with keys 1,2,3,4 then why add a surrogate key? What value does that add? Maybe later you need to merge another source into this same table. At that point adding a surrogate key makes sense. I do see the discipline of doing it the same way always, but if someone has already built it without surrogate keys then do I have to rebuild it correctly? I can live with it not being perfect. The customer probably never knows the difference. All they know is you complain about all the existing jobs and you want to rewrite everything. Be practical. Build it right the first time, which means know what these guys recommend and try to implement it, but don't get upset when other developers create it wrong. Fix it when you can. Live with what you have to and be happy. You know you do better work because you studied these books and have your own opinion about what works and what you don't like.
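To make the merge case concrete, here is a minimal Python sketch of what happens when a second source reuses natural keys 1 and 2. The class name, the source labels "A" and "B", and the key ranges are all made up for illustration; nothing here is a DataStage feature or taken from either author's books.

```python
from itertools import count


class SurrogateKeyMap:
    """Hypothetical helper: hands out one warehouse key per (source, natural key) pair."""

    def __init__(self):
        self._next_key = count(1)
        self._keys = {}

    def key_for(self, source_system, natural_key):
        pair = (source_system, natural_key)
        if pair not in self._keys:
            # First time this pair is seen: allocate the next surrogate key.
            self._keys[pair] = next(self._next_key)
        return self._keys[pair]


keys = SurrogateKeyMap()

# Source A arrives first with natural keys 1..4.
dim_a = {nk: keys.key_for("A", nk) for nk in (1, 2, 3, 4)}

# Source B is merged later and reuses natural keys 1 and 2 for different rows.
dim_b = {nk: keys.key_for("B", nk) for nk in (1, 2)}
```

With only source A the surrogate keys add nothing over the natural keys, which is the point of the question above. Once source B shows up, the natural keys collide and the surrogate keys are what keep the rows apart.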

Even Ray makes mistakes sometimes. Why worry about being perfect? Try to build something that is useful. We should all know basic terms like star schema, snowflake, normalize, fact table, dimension table, measures, OLAP, datamart and, just to make Ray happy, know the difference between sequence and sequencer. Please.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

... and get the apostrophes correct!!! :wink:

Slightly more seriously - not that I'm flippant about misuse of apostrophes - common sense ought to prevail in ETL design, and Kim has summed it up fairly well. It is an unfortunate fact that common sense is not nearly as common as once thought. It is another unfortunate fact that some consultants believe that obfuscation is a mechanism for setting higher charges.

It is a good thing to be able to quote from recognized authorities. It is an even better thing to be able to leaven the advice contained in their works with sensible and wise decisions about your particular design requirements. ETL is definitely not a "one size fits all" scenario.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

kduke wrote: By the way Ken, can you explain "Bus" versus "CIF" without using buzzwords? I double dare you.
No way. 8) Both my responses said the same thing: read their books and figure out the differences and whether the authors have contributed anything to you.

Solutions are "right-sized" according to the skills and needs of the customer. It doesn't make sense to build something a company can't manage or maintain; likewise, a simplistic solution may not be good enough for a company that has the skills to do the full-out OLAP solution.

When asked what's the difference between Kimball and Inmon, it's like the question: Boxers or briefs? The solution depends on the, ahem, :oops: , needs of the owner. Some guys go commando and wear nothing, that's another solution. Others try the boxerbriefs hybrid. Then there's the thong. The point is you have to understand the issues and the potential solutions, and then decide on the courses of action.
Kenneth Bland

kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

:lol: I understand what you mean, but I was not talking about dumbing down my solutions. I don't think my solutions are simplistic. I do think my solutions are streamlined and clean in their design. I don't like over-engineered or overly complicated solutions like Ray was talking about. There is a way to deliver a complicated solution without the jobs or processes being ugly or each job or process being too complex.

You have shared before on how difficult it was to understand some developers' jobs or solutions. Why? I think the same is true for Kimball and Inmon. We have all gleaned valuable information from these books. I do think these guys have created more buzzwords and double talk or technobabble. These buzzwords do create nice shortcuts when talking about solutions. But I doubt both parties always agree on what solution is being created, because one thinks a buzzword means one thing and the other has a different idea of its meaning.

I would like a clean definition of the difference between Kimball and Inmon as it applies to ETL or delivering a solution. What do you think we DSX developers would gain from knowing or understanding these books better? There are a lot of books and articles. What table designs or ETL design changes would be made if we understood more? I think most of us do the important parts and understand at least SCD type 1 and 2, surrogate keys, star schemas and staging. Beyond that, what can we improve on?
Mamu Kim
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

kduke wrote: I would like a clean definition of the difference between Kimball and Inmon as it applies to ETL or delivering a solution. What do you think we DSX developers would gain from knowing or understanding these books better? There are a lot of books and articles. What table designs or ETL design changes would be made if we understood more? I think most of us do the important parts and understand at least SCD type 1 and 2, surrogate keys, star schemas and staging. Beyond that, what can we improve on?
As for ETL, Inmon hasn't really articulated an ETL framework; Kimball at least offers an ETL course. So on the ETL side there is really no way to say where Inmon and Kimball differ.

The biggest improvement, in my opinion, would be for folks to understand where the line is drawn between high-performance high-volume techniques and low-performance low-volume techniques. I just read a book on "temporal data warehouses" and the author clearly never loaded hundreds of millions of rows of data, because of the methods his triggers used (on each insert, a select count(*) against the table being loaded, plus update logic, kind of like SCD type 2 processing). There are things you can get away with in a data model and ETL because your volumes are low and your hardware is fast. When the situation is the reverse, you really have to nail the ETL and data model.
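A rough Python sketch of why the per-insert check hurts at volume. This is not the book's trigger code; the function names and the 1,000-row input are assumptions chosen to show the shape of the cost, with an in-memory list standing in for the target table.

```python
def load_row_by_row(table, incoming):
    """Mimics a per-insert trigger: every new row rescans the whole target table."""
    rows_scanned = 0
    for key, value in incoming:
        rows_scanned += len(table)  # cost of the count(*)-style existence check
        exists = sum(1 for k, _ in table if k == key) > 0
        if not exists:
            table.append((key, value))
    return rows_scanned


def load_with_lookup(table, incoming):
    """Builds the key lookup once, then checks each incoming row against it."""
    seen = {k for k, _ in table}
    for key, value in incoming:
        if key not in seen:
            table.append((key, value))
            seen.add(key)
    return len(table)


incoming = [(i, "x") for i in range(1000)]

slow_target = []
rows_scanned = load_row_by_row(slow_target, incoming)

fast_target = []
load_with_lookup(fast_target, incoming)
```

Both loads end with the same 1,000 rows, but the row-by-row version touches 499,500 existing rows to get there, and that number grows with the square of the volume. At low volumes nobody notices; at hundreds of millions of rows it is the whole problem.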

Both Kimball and Inmon believe in a centralized, data-managed environment. In a left-to-right diagram, you have staging (PSA), storage (EDW), and presentation (datamarts). How you manage each layer is different. In storage, Kimball says go for the lowest grain, in a star schema, and present the data in summarized grains still within the star schema. Inmon says store the data in a more "normalized" model, but present it in a star schema.
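As a small illustration of "lowest grain in storage, summarized grain in presentation", here is a Python sketch. The fact layout, the integer date keys, and the sample amounts are invented for the example and are not taken from either author.

```python
from collections import defaultdict

# Atomic-grain fact rows, one per sale line:
# (date_key, product_key, store_key, amount)
fact_sales = [
    (20060401, 10, 1, 25.0),
    (20060401, 10, 2, 15.0),
    (20060402, 11, 1, 40.0),
    (20060515, 10, 1, 10.0),
]


def summarize_by_month_product(rows):
    """Roll the atomic fact up to a (month, product) grain for presentation."""
    totals = defaultdict(float)
    for date_key, product_key, _store_key, amount in rows:
        month_key = date_key // 100  # 20060401 -> 200604
        totals[(month_key, product_key)] += amount
    return dict(totals)


monthly_sales = summarize_by_month_product(fact_sales)
```

The summary can always be rebuilt from the atomic rows, but not the other way around, which is the usual argument for storing the lowest grain. In an Inmon-style build the rows feeding this rollup would come out of a normalized model instead of an atomic star, while the presentation result would look much the same.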

Both now position the "ODS" as a separate function, not in the pipeline between staging and storage, but more as an end point of a stream after staging. Placing the ODS on the critical path for the analytical solution tends to complicate things. Since ODSs are now really positioned as real-time, volatile (no history or SCD type 2), and normalized, they gum up the works on daily refresh cycles for analytical work that requires non-volatile data (history of every row or SCD type 2). It's hard to manage SCD when the ODS only has the most current row; you end up trans-logging the ODS to keep track of the insert-update-update-update of a row between batch refreshes of the EDW, or keeping an audit table to track them.
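The trans-log idea can be sketched in Python like this. It is only an illustration under assumed names: the HistoryRow fields, the (key, value, timestamp) event layout, and the sample events are hypothetical, and a real EDW load would do this in the database or the ETL tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoryRow:
    natural_key: int
    value: str
    effective_from: str
    effective_to: Optional[str]  # None marks the current row


def apply_change_log(history, change_log):
    """Apply (natural_key, value, changed_at) events in time order, SCD type 2 style."""
    current = {row.natural_key: row for row in history if row.effective_to is None}
    for natural_key, value, changed_at in sorted(change_log, key=lambda e: e[2]):
        open_row = current.get(natural_key)
        if open_row is not None:
            if open_row.value == value:
                continue  # nothing actually changed, keep the open row
            open_row.effective_to = changed_at  # close the old version
        new_row = HistoryRow(natural_key, value, changed_at, None)
        history.append(new_row)
        current[natural_key] = new_row
    return history


# One insert and two updates to the same row between two batch refreshes.
change_log = [
    (1, "Hyderabad", "2006-04-02T09:00"),
    (1, "Dallas", "2006-04-02T13:00"),
    (1, "Sydney", "2006-04-02T18:00"),
]
edw_history = apply_change_log([], change_log)
```

At refresh time the ODS itself would only show "Sydney". The log is what lets the EDW keep all three versions, each with its own effective range.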
Kenneth Bland

kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Excellent, thank you.
Mamu Kim
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Take a peek here:
http://www.inmoncif.com/library/cif/

This is my favorite diagram, because it shows the elements in a fully matured environment. I really like showing folks this diagram, because it settles all disputes over where the ODS is positioned.

Each element in the diagram serves a purpose. Staging is about keeping original data in original form. EDW is the "hub" where the transformed data is stored for the long haul. The data marts, feeds, and such radiate like spokes from the hub.

Kimball says all of this is good as well. The differences come in the modeling, naming conventions, and some of the loading practices. This is a really cool picture. :lol:
Kenneth Bland
