Archive for April, 2013

SQL Server Performance Tuning: Slowly Changing Facts?

April 21, 2013

This post is a follow-up to the previous post Update Fact Tables? Recently I have seen people try to convert SQL Server transactional fact tables into quasi-slowly-changing transactional fact tables. These quasi-transactional fact tables will most likely have something like EffectiveDate and ExpiryDate columns, and maybe even an IsCurrent indicator column. Below is a generic representation:

[Diagram: a quasi-transactional fact table with surrogate keys, measures, and EffectiveDate, ExpiryDate, and IsCurrent columns]
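In case the diagram does not come through, here is a minimal sketch of what such a quasi-slowly-changing fact table typically looks like. All table and column names are illustrative, not taken from any real source system:

-- Illustrative only: a transactional fact turned into a quasi-slowly-changing fact
CREATE TABLE dbo.FactSalesQuasiSCD
(
    SalesTransactionID  bigint   NOT NULL,  -- source transaction identifier
    DateKey             int      NOT NULL,  -- surrogate key to the date dimension
    ProductKey          int      NOT NULL,  -- surrogate key to the product dimension
    SalesAmount         money    NOT NULL,  -- measure that changes over time
    Quantity            int      NOT NULL,  -- measure that changes over time
    EffectiveDate       datetime NOT NULL,  -- when this row version became current
    ExpiryDate          datetime NULL,      -- when this row version was superseded
    IsCurrent           bit      NOT NULL   -- 1 = latest version of the transaction
);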

In many cases this is done because the measure or fact values change over time and the analysts want not only the most recent values but the historical values as well. While something like the above diagram does address those issues, it does so at a very large cost, one that will make the quasi-transactional fact table unable to perform into the future.

Unless the source system tells you which fact rows have changed, all the source rows will have to be compared against the latest row versions in the SQL Server quasi-transactional fact table. In other words, whether in SSIS (please don’t use SSIS to do this) or on the SQL Server Database Engine, both sets of fact rows will have to be filtered, sorted, joined on some key, and then compared column by column to find the changes (or you could use HASHBYTES; never use CHECKSUM!!!). The new row versions are then inserted and the old row versions have to be updated with new expiry dates.
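To make that cost concrete, here is a rough sketch of the change-detection step this design forces on every load. The staging table and column names are made up for illustration; the point is the full join plus the HASHBYTES comparison (HASHBYTES rather than CHECKSUM, because CHECKSUM produces far too many collisions):

-- Sketch: find source rows whose measures differ from the current fact row version
SELECT  s.SalesTransactionID,
        s.SalesAmount,
        s.Quantity
FROM    staging.SalesTransactions AS s
JOIN    dbo.FactSalesQuasiSCD     AS f
        ON  f.SalesTransactionID = s.SalesTransactionID
        AND f.IsCurrent = 1
WHERE   HASHBYTES('SHA2_256', CONCAT(s.SalesAmount, '|', s.Quantity))
     <> HASHBYTES('SHA2_256', CONCAT(f.SalesAmount, '|', f.Quantity));
-- The rows returned then drive an INSERT of the new versions and an UPDATE
-- of the old versions' ExpiryDate and IsCurrent values.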

While this might perform well on a small scale, let’s say with only a few thousand rows, imagine what would happen if the tables were increased to millions, billions, or even trillions of rows. Once you run out of memory things will get very slow very quickly!!

The correct solution is to change the Quasi-Transactional Fact Table into a true Snapshot Fact Table. In my experience too many data warehouse developers don’t understand the difference between these two very important types of fact tables.

If you have not already done so, please read or re-read The Data Warehouse Lifecycle Toolkit. It should be required reading for anyone before they sit down and design their SQL Server data warehouse database. I will not rehash what that book says because it does such a good job, but I will use one very important chart:

[Chart: comparison of the Transactional, Periodic Snapshot, and Accumulating Snapshot fact table types]

Notice that both Transactional and Snapshot Fact Tables are insert-only tables.

<rant> Accumulating Snapshot Fact Tables are very rare, and I have only used them once in my 10 years of experience. If you are justifying the update of your fact table by calling it an Accumulating Snapshot, you are most likely doing something wrong. </rant>

Now with this knowledge we can go back and convert the original quasi-transactional fact table to a true Snapshot Fact Table.

[Diagram: the same fact table redesigned as a Snapshot Fact Table, keyed by a snapshot date instead of EffectiveDate/ExpiryDate/IsCurrent columns]
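Using the same illustrative columns as before, the snapshot version might look something like this: the change-tracking columns disappear, a snapshot date is added to the grain, and the daily load becomes a plain insert of everything from the source:

-- Illustrative snapshot fact: insert-only, no change detection required
CREATE TABLE dbo.FactSalesSnapshot
(
    SnapshotDateKey     int    NOT NULL,  -- the date this picture of the source was taken
    SalesTransactionID  bigint NOT NULL,
    DateKey             int    NOT NULL,
    ProductKey          int    NOT NULL,
    SalesAmount         money  NOT NULL,
    Quantity            int    NOT NULL
);

-- Daily load: insert the full set of source rows stamped with today's snapshot key
DECLARE @SnapshotDateKey int = 20130421;

INSERT INTO dbo.FactSalesSnapshot
        (SnapshotDateKey, SalesTransactionID, DateKey, ProductKey, SalesAmount, Quantity)
SELECT  @SnapshotDateKey, s.SalesTransactionID, s.DateKey, s.ProductKey, s.SalesAmount, s.Quantity
FROM    staging.SalesTransactions AS s;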

Some might protest that, now that all of the source fact rows are simply inserted every day, the fact table will become huge. To that I say: great! You want fact tables with lots of rows. The more rows the better! This is because there are very neat ways of managing large tables, like Table Partitioning (see: Table Partitioning and Columnstore Indexes).
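As a rough sketch of what that management looks like, the snapshot fact above could be partitioned on its snapshot date key. The boundary values and the single filegroup here are placeholders, not a recommendation:

-- Sketch: monthly partitions on the yyyymmdd snapshot date key
CREATE PARTITION FUNCTION pfSnapshotDate (int)
AS RANGE RIGHT FOR VALUES (20130101, 20130201, 20130301, 20130401, 20130501);

CREATE PARTITION SCHEME psSnapshotDate
AS PARTITION pfSnapshotDate ALL TO ([PRIMARY]);  -- one filegroup kept simple for the sketch

-- Creating the snapshot fact on the partition scheme lets old snapshots be
-- switched out, compressed, or archived without touching the newest data.
CREATE TABLE dbo.FactSalesSnapshotPartitioned
(
    SnapshotDateKey     int    NOT NULL,
    SalesTransactionID  bigint NOT NULL,
    DateKey             int    NOT NULL,
    ProductKey          int    NOT NULL,
    SalesAmount         money  NOT NULL,
    Quantity            int    NOT NULL
) ON psSnapshotDate (SnapshotDateKey);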

SQL Server Database Performance Tuning: Update Fact Tables?

April 7, 2013

I have been asked on several occasions whether rows/values in a SQL Server database fact table should be updated. The answer to this question in almost all cases is NO! Fact tables should not be updated. There are several really good reasons why fact measure values should almost never be updated.

1. Consistent Reporting: If I run a report today which reports on last week’s measure values I should get the exact same measure values if I run the same report tomorrow. The end users will not be happy if their historical numbers keep changing every day that they run the report!

2. Fact Table Processing: Performing updates on tables can be a very costly operation on the SQL Server data warehouse database server! Fact tables can and should be very large. They can contain hundreds of millions, billions, or trillions of rows, and they should have only one index on each surrogate key column (unless you are fortunate enough to have SQL Server 2012, in which case just create one Columnstore Index which includes all the surrogate keys; a rough sketch of both indexing approaches follows this list). All of that makes updating anything in a database fact table very costly, and it should be done with great care and thought.
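Both indexing approaches, sketched against an illustrative dbo.FactSales table (the name and columns are mine, not from any specific warehouse):

-- Pre-2012: one nonclustered index per surrogate key column
CREATE NONCLUSTERED INDEX IX_FactSales_DateKey    ON dbo.FactSales (DateKey);
CREATE NONCLUSTERED INDEX IX_FactSales_ProductKey ON dbo.FactSales (ProductKey);

-- SQL Server 2012: a single nonclustered columnstore index covering the surrogate
-- keys (and the measures, if desired). Note that in 2012 this index makes the
-- table read-only, which is one more reason updates do not belong on fact tables.
CREATE NONCLUSTERED COLUMNSTORE INDEX CSI_FactSales
    ON dbo.FactSales (DateKey, ProductKey, Measure);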

But I also realize that we live in the real world. And in the real world, sometimes fact transaction measure values change (even though they shouldn’t!!). I won’t name names, but there are plenty of source systems which allow fact transaction measure values to be updated. So what can you do? You need to record those changing facts, but you also want to preserve reporting consistency, and you don’t want to exponentially increase the time it takes to process the transactional fact table.

Well there are a few options:

1. Convert the transactional fact table to a snapshot fact table (more on that later)

2. Create the ‘Adjustment’ facts in the ETL process and insert them into the Fact table

The second option works especially well if the source system actually tells you when the fact was last updated, i.e. a LastUpdatedDate. Personally, I like to create another TransactionType dimension, which will hold a member representing the ‘normal’ non-adjusted facts and a member for the ‘adjustment’ facts that have to be created in the ETL. In the end the ERD would look something like the following:

[ERD: the fact table joined to its existing dimensions plus the new TransactionType dimension]
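That dimension is tiny; a minimal sketch (names are illustrative) is just two members:

-- Illustrative TransactionType dimension with the two members described above
CREATE TABLE dbo.DimTransactionType
(
    TransactionTypeKey  int         NOT NULL PRIMARY KEY,
    TransactionTypeName varchar(20) NOT NULL
);

INSERT INTO dbo.DimTransactionType (TransactionTypeKey, TransactionTypeName)
VALUES (1, 'normal'),
       (2, 'adjustment');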

An example scenario, using a flattened fact table for simplicity:

Let’s say that the fact table starts on 01/01/2013 with only 3 transactions (one for each product):

DateKey   DimProduct  DimTransType  Measure
20130101  ProdA       normal        100
20130101  ProdB       normal        100
20130101  ProdC       normal        100

However, on 01/02/2013 it was realized that the data entered on 01/01/2013 for ProdA was incorrect and the real value should have been 75. The ETL should hopefully be able to identify the LastUpdatedDate of the transaction and calculate the difference between the historical ‘normal’ value and the new target value. That difference should then be inserted into the fact table as an ‘adjustment’ transaction.

DateKey   DimProduct  DimTransType  Measure
20130101  ProdA       normal        100
20130101  ProdB       normal        100
20130101  ProdC       normal        100
20130101  ProdA       adjustment    -25

Now normal + adjustment still sums to the correct value of 75, but report consistency can still be preserved because the report or cube can choose to see the fact values with or without the adjustment records. This will also greatly reduce the ETL fact load processing requirements as opposed to just updating the facts in place. Note: having a good table partitioning strategy is critical for large fact tables.

I also like preserving the adjustment date value as well, as that allows the business to see how far back transactions are being corrected. In many cases I have seen managers shocked by the frequency of these ‘rare’ corrections.

DateKey   AdjDateKey  DimProduct  DimTransType  Measure
20130101  20130101    ProdA       normal        100
20130101  20130101    ProdB       normal        100
20130101  20130101    ProdC       normal        100
20130101  20130103    ProdA       adjustment    -25
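Finally, here is a rough sketch of how the adjustment rows above could be generated in the ETL. It assumes the flattened grain from the example (one row per DateKey and product), an illustrative staging table that exposes LastUpdatedDate, and the TransactionType keys from the sketch earlier; a real implementation would of course depend on the actual grain and source system:

-- Sketch: for facts the source says changed since the last load, insert an 'adjustment'
-- row equal to (new source value) minus (everything already sitting in the fact table)
DECLARE @LastLoadDate datetime = '20130102';
DECLARE @AdjDateKey   int      = 20130103;  -- the date the adjustment is loaded

INSERT INTO dbo.FactSales (DateKey, AdjDateKey, ProductKey, TransactionTypeKey, Measure)
SELECT  f.DateKey,
        @AdjDateKey,
        f.ProductKey,
        2,                              -- 'adjustment' member of DimTransactionType
        s.Measure - SUM(f.Measure)      -- e.g. 75 - 100 = -25 for ProdA above
FROM    staging.SalesTransactions AS s
JOIN    dbo.FactSales             AS f
        ON  f.DateKey    = s.DateKey
        AND f.ProductKey = s.ProductKey
WHERE   s.LastUpdatedDate > @LastLoadDate
GROUP BY f.DateKey, f.ProductKey, s.Measure
HAVING  s.Measure - SUM(f.Measure) <> 0;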