SQL Server Database Performance Tuning: Update Fact Tables?

I have been asked on several occasions if rows/values in a SQL Server database fact table should be updated. The answer to this question in almost all cases is NO! Fact tables should not be updated. There are several really good reasons why facts measure values should almost never be updated.

1. Consistent Reporting: If I run a report today which reports on last week’s measure values I should get the exact same measure values if I run the same report tomorrow. The end users will not be happy if their historical numbers keep changing every day that they run the report!

2. Fact Table Processing: Preforming updates on tables can be a very costly operation on the SQL Server data warehouse database server! Fact tables can and should be very large. They can contain 100’s millions, billions, or trillions of rows and they should only have one index on each surrogate key column as well (useless you are fortunate enough to have SQL Server 2012 then just create one Columnstore Index which includes all the surrogate keys). All that makes updating anything in a database fact table very costly and should be done with great care and thought.

But I also realize that we also live in the real world. And in the real world sometimes fact transaction measure values change (even though they shouldn’t!!). I won’t name names but there are plenty of source systems which allow fact transaction measure values to be updated. So what can you do? You need to record those changing facts, but you also want to preserve reporting consistency and you don’t what to exponentially increase the time it takes to process the transactional fact table.

Well there are a few options:

1. Covert the transactional fact table to a snapshot fact table (more on that later)

2. Create the ‘Adjustment’ facts in the ETL process and insert them into the Fact table

The second option works especial well if the source systems actually tells you when the fact was last updated i.e. LastUpdatedDate. Personally, I like to create another TransactionType dimension which will hold a member representing the ‘normal’ non-adjusted facts and a member for the ‘adjustment’ facts that have to be created in the ETL. In the end the ERD would look something like the following:

clip_image002

An example scenario using a flatten fact table for simplicity:

Let say that the fact table starts at 01/01/2013 with only 3 transactions (one for each product)

DateKey DimProduct DimTransType Measure
20130101 ProdA normal 100
20130101 ProdB normal 100
20130101 ProdC normal 100

However on 01/02/2013 it was realized that the data entered on 01/01/2013 for ProdA was incorrect and the real value should have been 75. The ETL should hopefully be able to identify the LastUpdatedDate of the transaction and calculate the difference between the historical ‘normal’ value and the new target value. That difference should then be inserted in the fact table as an ‘adjustment’ transaction.

DateKey DimProduct DimTransType Measure
20130101 ProdA normal 100
20130101 ProdB normal 100
20130101 ProdC normal 100
20130101 ProdA adjustment -25

Now normal + adjustment still sums to the correct value of 75 but report consistency can still be preserved because the report or cube can choose to see the fact values with or without the adjustment records. This will also greatly reduce the ETL fact load processing requirements as oppose to just updating the facts in place. Note: having a good table partitioning strategy is critical for large fact tables.

I also like preserving the adjustment date value as well, as that will allow the business can see how far back transactions are being corrected. In many cases I have seen managers shocked by the frequency of these ‘rare’ corrections Smile

DateKey AdjDateKey DimProduct DimTransType Measure
20130101 20130101 ProdA normal 100
20130101 20130101 ProdB normal 100
20130101 20130101 ProdC normal 100
20130101 20130103 ProdA adjustment -25
Advertisements

Tags:

4 Responses to “SQL Server Database Performance Tuning: Update Fact Tables?”

  1. Bernard Says:

    Great post, but how do you handle dimension changes? For example an invoice was incorrectly assigned a customer and subsequently changed the next day.

    • Garrett Edmondson Says:

      Great Question, for a type 1 SCD (slowly changing dimension) simple updates would work fine (because there is no history) but for type 2 SCD things might get more complicated. Hopefully it is very rare and can be handled on an ad hoc basis. The first rule is to make sure that the source systems get corrected first! then figure out the best way to fix the SCD 2 dimension data.

      The worst case scenario involves re-keying a fact table because a new SCD 2 row version was created and the fact surrogate key values now have to be updated. If this becomes a frequent occurrence then you need to start sounding some alarms and think about changing the dimension to SCD 1 if possible or tell the source system developers to get their act together 🙂

  2. Slowly Changing Facts? | Garrett Edmondson Says:

    […] Microsoft Business Intelligence and Data Warehousing « Update Fact Tables? […]

  3. Table Partitioning with Fact Tables | Garrett Edmondson Says:

    […] I won’t go into all the details but partition switching allows inserts and deletes to a partitioned fact tables, no matter how large, to occur almost instantly. Updates can occur orders of magnitude faster as well (but I don’t recommend updating fact tables). […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: