This post is a follow-up to the previous post Update Fact Tables? Recently I have seen people try to convert SQL Server transactional fact tables into quasi-slowly changing transactional fact tables. These quasi-transactional database fact tables will most likely have something like an EffectiveDate, ExpiryDate and maybe even an IsCurrent indicator columns. Below is a generic representation:
In many cases this done because the measure or fact values change over time and the analyst want not only the most recent values but the historical values as well. While something like the above diagram does address those issues, it does so at a very large cost, which will make the quasi-transactional fact unable preform into the future.
Unless you are told by the source system which fact rows have changed, all the source rows will have to be compared against the latest row versions in the SQL Server Database quasi-Transactional Fact Table. In other words, whether in SSIS (please don’t use SSIS to do this) or on the SQL Server Database Engine, both data sets of fact rows will have to be filter, sorted, joined by some key, and then compared on a column by column basis to find the changes (or you could use Hashbytes, never us Checksum !!!). The new row versions are then inserted and the old row versions have to be updated with new expiry dates.
While this might perform well on a small scale, let’s say with only a few thousand rows, imagine what would happen if the tables were increased to millions, billions, or even trillions of rows. Once you run out of memory things will get very slow very quickly!!
The correct solution is to change the Quasi-Transactional Fact Table into a true Snapshot Fact Table. In my experience too many data warehouse developers don’t understand the difference between these two very important types of fact tables.
If you have not already done so please read or re-read The Data Warehouse Lifecycle Toolkit. It should be required reading for anyone before they sit down and design their SQL Server data warehouse database. I will not rehearse what that book says because it does such a good job, but I will use one very important chart:
Notice that both Transactional and Snapshot Fact Tables are insert only tables.
<rant> Accumulating Snapshot Fact Tables are very rare and I have only used them once in my 10 years of experience. In my experience if you are justifying the update of your fact table by calling it an Accumulating Snapshot you are most likely doing something wrong. </rant>
Now with this knowledge we can go back and convert the original quasi-transactional fact table to a true Snapshot Table.
Some might protest that now that all of the source fact rows are now just inserted every day that the fact table will become huge. To that I say great! You want fact tables with lots of rows. The more rows the better! This is because there are very neat ways of managing large tables like Table Partitioning (see: Table Partitioning and Columnstore Indexes).