SQL Server Performance Tuning : Slowly Changing Facts?

This post is a follow-up to the previous post Update Fact Tables? Recently I have seen people try to convert SQL Server transactional fact tables into quasi-slowly changing transactional fact tables. These quasi-transactional database  fact tables will most likely have something like an EffectiveDate, ExpiryDate and maybe even an IsCurrent indicator columns. Below is a generic representation:

clip_image002

In many cases this done because the measure or fact values change over time and the analyst want not only the most recent values but the historical values as well. While something like the above diagram does address those issues, it does so at a very large cost, which will make the quasi-transactional fact unable preform into the future.

Unless you are told by the source system which fact rows have changed, all the source rows will have to be compared against the latest row versions in the SQL Server Database quasi-Transactional Fact Table. In other words, whether in SSIS (please don’t use SSIS to do this) or on the SQL Server Database Engine, both data sets of fact rows will have to be filter, sorted, joined by some key, and then compared on a column by column basis to find the changes (or you could use Hashbytes, never us Checksum !!!). The new row versions are then inserted and the old row versions have to be updated with new expiry dates.

While this might perform well on a small scale, let’s say with only a few thousand rows, imagine what would happen if the tables were increased to millions, billions, or even trillions of rows. Once you run out of memory things will get very slow very quickly!!

The correct solution is to change the Quasi-Transactional Fact Table into a true Snapshot Fact Table. In my experience too many data warehouse developers don’t understand the difference between these two very important types of fact tables.

If you have not already done so please read or re-read The Data Warehouse Lifecycle Toolkit. It should be required reading for anyone before they sit down and design their SQL Server data warehouse database. I will not rehearse what that book says because it does such a good job, but I will use one very important chart:

image

Notice that both Transactional and Snapshot Fact Tables are insert only tables.

<rant> Accumulating Snapshot Fact Tables are very rare and I have only used them once in my 10 years of experience. In my experience if you are justifying the update of your fact table by calling it an Accumulating Snapshot you are most likely doing something wrong. </rant>

Now with this knowledge we can go back and convert the original quasi-transactional fact table to a true Snapshot Table.

clip_image004

Some might protest that now that all of the source fact rows are now just inserted every day that the fact table will become huge. To that I say great! You want fact tables with lots of rows. The more rows the better! This is because there are very neat ways of managing large tables like Table Partitioning (see: Table Partitioning and Columnstore Indexes).

Advertisements

12 Responses to “SQL Server Performance Tuning : Slowly Changing Facts?”

  1. Bill Anton Says:

    Nice post Garrett! I’ve seen cubes with some crazy calculated measures as a result of not using a periodic snapshot fact table.

  2. manowar Says:

    Hi Garrett. I’ve also had to face that challenge to keep track of changes in a fact table and the Snapshot Fact table can be sometimes just to expensive in term of space used. That’s why I’ve define an “new” fact table type named “Temporal Snapshot Fact Table.” If you would like to look at it, you can find the slide here: http://www.slideshare.net/davidemauri/temporal-snapshot-fact-tables

  3. Table Partitioning with Fact Tables | Garrett Edmondson Says:

    […] Microsoft Business Intelligence and Data Warehousing « Slowly Changing Facts? […]

  4. SQL Server Performance Tuning: Update Fact Tables? | Garrett Edmondson Says:

    […] Business Intelligence and Data Warehousing « PASS Business Analytics Conference SQL Server Performance Tuning : Slowly Changing Facts? […]

  5. landrews5807 Says:

    Why do you say “never use checksum”?

  6. Nate West Says:

    Since when are accumulating snapshots rare? Business processes that track state of something over time happen in many areas. Utility service work is one such example (request placed > work scheduled > tech en route > tech on site > work complete > work approved). Why should we track all 5 million historical service orders with new rows every single day? Another example would be a manufacturing process, where you track each machine that a coil of steel travels through and the time spent for at that machine. You’ll end up with horrible subqueries if you just store transactions by machine.

    If your only verticals are retail & financial, sure, you probably won’t see to much of the accumulating snapshot. But when you are dealing with data that is not point-in-time events (HR, Services, Maintenance, Manufacturing) and what matters is measuring the time spent during each stage of the process, the accumulating snapshot is the correct choice.

    • Garrett Edmondson Says:

      Good points. The prevalence of accumulating snapshot fact tables may vary from vertical to vertical. I personally have only used them in manufacturing (oil & gas equipment) processes. They work very well in that context, however too many times I have seen fact tables were historical measure values are being constantly updated. Not only does this not scale very well (ETL performance) is also makes historical reporting inconsistent. A better solution is get the source system to stop making the retroactive historical measure updates and produce adjustment records instead. Or have the data warehouse generate adjustment records.

      https://garrettedmondson.wordpress.com/2013/04/07/update-fact-tables/

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: