Normalized vs Denormalized

djames
Participant
Posts: 2
Joined: Wed Dec 11, 2002 8:21 am
Location: USA

Normalized vs Denormalized

Post by djames »

Can any case be made for a DSS structure based on a normalized ER model?

Everything I've seen or read supports the need for a denormalized dimensional model, especially where query flexibility and optimization are the objectives.

In addition, 'time' has been a common dimension in the star schemas that I've seen. Does anyone have any experience with the use of User Defined Functions to replace the need for a 'time' dimension?
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Relational vs multidimensional OLAP

Post by datastage »

I have an anonymous white paper that discusses when to use multidimensional or relational OLAP. This is probably a different slant than the original normalized vs. denormalized question, but I thought it might be useful or could make the discussion more dynamic.

Here is a paraphrase of when it says a multidimensional OLAP approach is a good fit versus a relational OLAP approach.

Use MULTIDIMENSIONAL when the requirements call for:
1) computation-intensive applications with what-if scenarios and models,
2) fairly static dimensions,
3) read-write capability,
4) a rich and complex relationship between the dimensions,
5) cross-dimensional and row-level calculations,
6) powerful calculation, statistical, and financial functions, and
7) a database size that is acceptable for the business function.

Use RELATIONAL when the requirements call for:
1) data-intensive applications with significant data-browsing needs,
2) fairly dynamic dimensionality and changing granularity,
3) read-only capability with minimal write,
4) minimal cross-dimensional and row-level calculations, and
5) large database size, simple relationships between the dimensions, and on-the-fly dimensional views.

I think MULTIDIMENSIONAL is usually the logical choice, because items 1 and 5 in its favor are common requirements. I am curious, though, how much item 2 on each list affects the decision, and whether it is a valid reason to still use the RELATIONAL approach.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Normalized vs Denormalized

Post by ray.wurlod »

I am a major fan of the dimensional model, implemented as a star schema (and must admit to a bias as an exponent of IBM's Red Brick Warehouse - if only they knew what a great product they have!).
The time dimension is vital, as a typical DW or DM is a series of snapshots of operational data over time. To answer Dennis's question about user-defined functions replacing the time dimension: the function would need some information in the data from which to derive the time dimension values, yet that information is not really a "fact" (a measure of the business), so it would have to be implemented as a degenerate dimension.
Given the cross-dimensional indexing capabilities in Red Brick Warehouse, any solution using this product would almost certainly employ an actual time dimension table.
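
For example, here is a rough sketch in generic ANSI-style SQL (not Red Brick syntax; dim_date, fact_sales, and their columns are purely illustrative names) of a calendar-grain time dimension and the kind of query a report would run against it, rather than deriving those attributes from a function at query time:

-- Illustrative only: a calendar-grain time dimension, one row per day.
CREATE TABLE dim_date (
    date_key      INTEGER     NOT NULL PRIMARY KEY,  -- e.g. 20030310
    calendar_date DATE        NOT NULL,
    day_of_week   VARCHAR(9)  NOT NULL,
    fiscal_month  INTEGER     NOT NULL,
    fiscal_year   INTEGER     NOT NULL,
    is_holiday    CHAR(1)     NOT NULL               -- 'Y' or 'N'
);

-- Fact rows carry only the key; every calendar attribute
-- comes from the dimension at query time.
SELECT d.fiscal_year,
       d.fiscal_month,
       SUM(f.sales_amount) AS total_sales
FROM   fact_sales f
JOIN   dim_date   d ON d.date_key = f.date_key
WHERE  d.is_holiday = 'N'
GROUP  BY d.fiscal_year, d.fiscal_month;

Attributes like holiday flags or fiscal periods are awkward to compute in a user-defined function but trivial to store once in a dimension row, which is one more argument for a real time dimension table.
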
To address Dennis's other question with rather less rigour than the previous poster: it is my strongly held belief (and Kimball's and Inmon's) that a normalized ER model is a disaster in a DW, primarily because of the difficulty of navigating it.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

There's another reason that any kind of reporting model needs to be denormalized, and that is performance.

A denormalized, aggregated, shrunken, or collapsed structure "pre-joins" the data. If you have 30 variations of the same report navigating the same tables over and over, conducting the same joins over and over, you never realize performance advantages that could have been seized in the middle of the night during ETL processing. By denormalizing the presentation model, while maintaining the more normalized storage model, you arrive at the typical warehouse implementation. The presentation model allows you to keep users out of the storage model, which makes everyone's life easier. It's also fairly easy to load the presentation model, as it's usually denormalization and summarization, something easily handled with SQL scripts, whilst the storage model requires the robustness of the ETL toolsets.
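
As a rough illustration of the "pre-join" idea, in generic SQL (all of the table and column names here are made up for the example), the nightly batch might materialize one flat presentation table so the thirty report variations read it directly instead of repeating the same joins:

-- Illustrative only: pre-join and summarize the normalized storage
-- tables into a flat presentation table during the batch window.
INSERT INTO pres_daily_sales_by_region
SELECT o.order_date,
       c.region,
       p.product_category,
       SUM(l.quantity)                AS units,
       SUM(l.quantity * l.unit_price) AS revenue
FROM   order_line l
JOIN   orders     o ON o.order_id    = l.order_id
JOIN   customer   c ON c.customer_id = o.customer_id
JOIN   product    p ON p.product_id  = l.product_id
GROUP  BY o.order_date, c.region, p.product_category;

The storage model keeps its more normalized shape; only the presentation copy is flattened, which is why a plain SQL script is usually enough for this step while the ETL tool does the heavy lifting upstream.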

Aggregate awareness is a major performance enhancement in query and/or reporting tools precisely because aggregates are summarized, pre-joined, or both. The ability to aggregate a 3NF model and stay within a 3NF model is a mighty accomplishment, kind of like a screen door on a submarine.
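
To make the aggregate point concrete, here is a minimal generic-SQL sketch (hypothetical table names, and not any particular tool's aggregate mechanism): the summary is computed once, and monthly reports read it instead of re-aggregating the daily-grain fact.

-- Illustrative only: roll the daily-grain fact up to month and product.
CREATE TABLE agg_sales_month_product AS
SELECT d.fiscal_year,
       d.fiscal_month,
       f.product_key,
       SUM(f.sales_amount) AS sales_amount
FROM   fact_sales f
JOIN   dim_date   d ON d.date_key = f.date_key
GROUP  BY d.fiscal_year, d.fiscal_month, f.product_key;

-- An aggregate-aware tool (or a hand-pointed report) answers
-- monthly questions from the small summary table...
SELECT fiscal_year, fiscal_month, SUM(sales_amount) AS sales_amount
FROM   agg_sales_month_product
GROUP  BY fiscal_year, fiscal_month;
-- ...instead of re-joining and re-summing every daily fact row.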

Good luck!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Ah, yes, aggregates. One of the amazing things in the Red Brick Warehouse product is a mechanism that:
(a) re-writes user queries to use pre-computed aggregate tables
(b) gathers statistics on queries that do, or could, use them
(c) reports to the DBA which pre-computed aggregate tables are being used, which are not, and which would be useful to have but aren't present.
Further, the bulk loader can automatically maintain the pre-computed aggregate tables as well.
It makes such a difference when a tool is purpose-written. :D
srinivasb
Participant
Posts: 42
Joined: Mon Mar 10, 2003 10:52 pm
Location: UK

Normalized data warehouse

Post by srinivasb »

Hi,

In one discussion, the presenters were mentioning "NORMALIZED" data warehouses...

I could not get more info on that topic. Could Ken or Ray share their knowledge?

Thanks in advance.

Regards
Srinivas.B
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Dimensional vs 3NF

Post by kcbland »

A normalized warehouse would follow the rules of normalization. A dimensional model follows the rules of the star schema. They are not necessarily mutually exclusive.

I recommend you read The Data Warehouse Lifecycle Toolkit by Ralph Kimball. You should also visit www.BillInmon.com and www.RalphKimball.com. Between these two guys you cover data warehousing pretty much in total.

On Inmon's website you can get a picture of the CIF (Corporate Information Factory), which encompasses most of the typical pieces of a data warehouse project. In my experience (and each client is different), you have three main data structures in your warehouse (a bare-bones sketch of the mart layer follows the list):

(1) A persistent/semi-persistent staging area with a data staging model
(2) An enterprise data warehouse using Inmon's methods
(3) Star schema data marts, with integrated facts, conformed dimensions, and aggregate tables.
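
To make (3) concrete, here is a bare-bones sketch of a star schema mart in generic SQL (dim_customer, fact_sales, and all of their columns are purely illustrative):

-- Illustrative only: a conformed dimension with denormalized attributes.
CREATE TABLE dim_customer (
    customer_key  INTEGER      NOT NULL PRIMARY KEY,  -- surrogate key
    customer_id   VARCHAR(20)  NOT NULL,              -- natural/business key
    customer_name VARCHAR(100) NOT NULL,
    region        VARCHAR(40)  NOT NULL               -- flattened into the dimension
);

-- Illustrative only: the fact table holds surrogate keys plus additive measures.
CREATE TABLE fact_sales (
    date_key      INTEGER        NOT NULL,  -- joins to dim_date
    customer_key  INTEGER        NOT NULL,  -- joins to dim_customer
    product_key   INTEGER        NOT NULL,  -- joins to dim_product
    quantity      INTEGER        NOT NULL,
    sales_amount  DECIMAL(12,2)  NOT NULL
);

In the normalized EDW layer (2), an attribute like region would typically live in its own table referenced from customer; in the mart it is denormalized into the dimension so that queries only ever join fact to dimension.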

I didn't invent any of this; I just summarized Inmon and Kimball into a few short sentences.

Good luck!
-Ken