August 19, 2013

Dimensional Modeling – The What and the Why

Post by: Cory Cundy

You might have heard of dimensional modeling and wondered what it is. It’s a great question and the answer may be simpler than you think.

Dimensional modeling is a database design technique that involves restructuring data from one or more source systems into a common data warehouse to support reporting and analytics. This design will typically be much different than a standard OLTP (Online Transactional Processing) database structure that is used with transactional applications. Typically, an OLTP design is normalized and has little to no redundant data which is good for data storage and the ability to handle a high volume of transactions. A dimensional model, on the other hand, will typically be more de-normalized and contain more redundant data to provide a variety of reporting benefits. OLTP systems are commonly used to source dimensionally modeled data warehouses.

What are the benefits of a dimensional model within a data warehouse? First, it provides a more simplified structure so that it is more intuitive for business users to write queries against. There are usually fewer tables that are more de-normalized than an OLTP database. Fewer tables for users to join will simplify the process of writing queries. Typically, several tables from an OLTP will be restructured into a single table in a dimensional model. It is also common for a data warehouse to contain data from multiple source systems which allows users to query data from across an organization.

Another goal of a dimensional model is query performance. Since there are fewer tables, there will be fewer joins which yields less processing and higher performance compared to writing the same query against the OLTP database. In addition, database technologies, such as SQL Server, have an optimization for these types of queries called the Star Join Query Optimization. Star joins are based off of joining a fact table to multiple dimensions, which we will discuss next.

What is a dimensional model made up of? The core of the dimension model is built from fact tables and dimension tables. Dimension tables contain the data related to business entities such as customers, products and employees. In addition to business entities, there is typically a dimension for dates. For example, the customer dimension would contain all information related to a customer. This might include their name, address, birth date, and gender along with other demographic information. This data may be stored across several tables in an OLTP database.

There are several types of fact tables, but a transactional fact table is probably the most common and contains records about events that occur at a point in time. For example, when you purchase an item at a retail store, it is an event that occurs and the data about the event would get written to a fact table, such as FactSales. A fact table contains keys (or pointers) and measures. The keys point to the related dimensions. When a retail transaction occurs, there is a customer, a product, a cashier (employee) and the date and time the transaction takes place, among other entities, that describe a sale. The fact record points to the corresponding dimension tables. This concept of a fact table pointing to other dimensions around it is known as a star schema. Measures are the numeric values that are also captured during the transaction that can be measured and aggregated such as quantity, price and tax. Fact tables are modeled after business processes, such as the sales process in the case, and described by dimensions.

As an example, this dimensional model would allow us to write a query to determine the sales quantity and dollars we had in December of last year for customers living in Green Bay, WI for a specific product. This is done by joining the FactSales table to the customer, product and date dimensions. Essentially, you can sum or aggregate the data in the fact table based on any of the columns in any of the related dimensions to get the information you are interested in.

Another benefit of dimensional modeling is the ability to re-use dimensions as new business processes are modeled. This provides the ability to have one version of the truth. For example, the marketing department may need to understand their advertising better. The product dimension could be re-used for this new model along with any new dimensions that might be needed. Re-using dimensions is much better than trying to keep multiple similar dimensions in sync with each other.

One last benefit of creating dimensional models is that they are typically the type of models that are best suited for OLAP (Online Analytical Processing) databases, such as SQL Server Analysis Services. OLAP databases contain data cubes which provide users the ability to slice and dice the data in an ad-hoc fashion without knowing how to write queries.

In summary, dimensional models are made up of fact tables based on business processes and dimension tables that describe them. They are designed to be easily joined together to provide a more intuitive data structure for writing queries that have good performance. Other benefits include the ability to re-use dimensions across an organization and these models are the right type for use with OLAP database technologies.

Relevant Insights

Are You Falling into the Cloud Conversation Gap?

Most conversations about migrating services to the cloud focus on migration of application workloads and data. The vision is often...

How Aggregation Tables Improve Performance for Power BI Reports

In this blog, Marcus Radue, Solution Architect, offers high-level guidance in the advantages of aggregation tables in Power BI. For a...

Getting Started with Azure DevOps – Views

Like most Microsoft products, there are many ways to look at or perform the same activities. Azure DevOps is no...
X