I have been asked on several occasions if rows/values in a fact table should be updated. The answer to this question in almost all cases is NO! Facts should not be updated. There a several really good reasons why facts values should never be updated.
1. Consistent Reporting: If I run a report today which reports on last week’s values I should get the exact same values if I run the same report tomorrow. The end users will not be happy if their historical numbers keep changing every day that they run the report!
2. Fact Table Processing: Preforming updates on tables can be a very costly operation on the data warehouse server! Fact tables can and should be very large. They can contain 100’s millions, billions, or trillions of rows and they should have one index on each surrogate key column as well (useless you are fortunate enough to have SQL Server 2012 then just create one Columnstore Index which includes all the surrogate keys). All that makes updating anything in a fact table very costly and should be done with great care and thought.
But I also realize that we also live in the real world. And in the real world sometimes fact transaction values change (even though they shouldn’t!!). I won’t name names but there are plenty of source systems which allow fact transaction values to be updated. So what can you do? You need to record those changing facts, but you also want to preserve reporting consistency and you don’t what to exponentially increase the time it takes to process the transactional fact table.
Well there are a few options:
1. Covert the transactional fact table to a snapshot fact table (more on that later)
2. Create the ‘Adjustment’ facts in the ETL process and insert them into the Fact table
The second option works especial well if the source systems actually tells you when the fact was last updated i.e. LastUpdatedDate. Personally, I like to create another TransactionType dimension which will hold a member representing the ‘normal’ non-adjusted facts and a member for the ‘adjustment’ facts that have to be created in the ETL. In the end the ERD would look something like the following:
An example scenario using a flatten fact table for simplicity:
Let say that the fact table starts at 01/01/2013 with only 3 transactions (one for each product)
However on 01/02/2013 it was realized that the data entered on 01/01/2013 for ProdA was incorrect and the real value should have been 75. The ETL should hopefully be able to identify the LastUpdatedDate of the transaction and calculate the difference between the historical ‘normal’ value and the new target value. That difference should then be inserted in the fact table as an ‘adjustment’ transaction.
Now normal + adjustment still sums to the correct value of 75 but report consistency can still be preserved because the report or cube can choose to see the fact values with or without the adjustment records. This will also greatly reduce the ETL fact load processing requirements as oppose to just updating the facts in place. Note: having a good table partitioning strategy is critical for large fact tables.
I also like preserving the adjustment date value as well, as that will allow the business can see how far back transactions are being corrected. In many cases I have seen managers shocked by the frequency of these ‘rare’ corrections