Dezember 2006

DWH Modeling Rule #2: Build a generic, data-driven core

Almost any Data Warehouse architecture can be divided into five sections:

  1. Raw data from external data sources
  2. Staging area
  3. Consolidated and enriched raw data
  4. Multidimensional data (dimension tables, fact tables, etc.)
  5. Data marts

When we take a look at the data flow in the warehouse process we can find the most client-specific or application-specific requirements at the transitions 1->2, 2->3, and 4->5. In most cases, the transition 2->3 is particularly specific, complex, and elaborate.

On the other hand, the transition 3->4 is a good candidate for a generic approach. The complete process can be defined by a set of meta data and a set of procedures, which, based on the meta data, can dynamically build and execute the required SQL statements. The transition 3->4 can still be very complex and elaborate, but it is by far not as application-specific as the other transitions.

Sections 3 and 4 form what I call the Warehouse Core. This is the place where you can normally find some sort of star- or snowflake-schemas. The warehouse process of the core is primarily made up of the following steps:

  • Aggregations
  • Management of historical changes
  • Management of structural changes

One might say, that these operations can be very application-specific. This is absolutely right, but, compared to the other sections, they can easily be customized and configured by enhancing the star-schema and the meta data a little bit.

This post shall just set the stage for the upcoming detailed description of the enhanced star-schema and the associated meta data , which are both part of the Data Warehouse Framework I have developed over the past years. With the help of this framework I have succesfully implemented a number of Data Warehouse solutions, many of them containing irregular and ragged hierarchies and non-additive measures.


Data Warehouse
Data Modeling

Comments (1)

Permalink

The newest crucial question: Which BI-Tool is able to aggregate this correctly? I say: “Maybe ONE!”

Thanks to the valuable advice from two real OLAP gurus we now know, that MS Analysis Services 2005 is able to deliver correct aggregations for diamond-shaped hierarchies (please refer to my post from 2006/11/25 and the according comments). I consider this as a real quantum leap.

Now, let’s put it up another notch by adding another real-world requirement: Non-leaf members with data.

When there are non-leaf members with data, the aggregation can be done in one of the following three ways:

  1. Distinct sum over the descendants of the non-leaf member
  2. Distinct sum over the descendants of the non-leaf member and the non-leaf member itself
  3. Data value of the non-leaf member itself

The newest crucial question: Which BI-Tool is able to aggregate distinct sums over many-to-many dimensions with non-leaf members with data?

BI-Tools

Comments (0)

Permalink