Archive for May, 2013

SQL Server Data Warehouse Database Performance: L1 L2 and L3 caches

May 19, 2013

I recently upgraded the OS on my laptop to windows 8 and there are things that I like and things that I don’t like about it. I think that most of the things that don’t like have more to do with some of the new navigation features and should become second nature over time.

One of my favorite surprises in win8 is the new Task Manager. When I open it up, I am presented with a much cleaner interface that also provides more information than the previous versions of Task Manager. One these new goodies is now you can see the sizes of the L1, L2, and L3 caches:

clip_image001

These CPU caches act like stepping stones for data as it travels from main memory (RAM) to the CPU and the closer the cache is to the CPU the faster the data can be processed by the CPU. For example L1 and L2 caches are orders of magnitude faster than the L3 cache and as you can tell from the above image the closer the cache is to the CPU (L1) the smaller the cache size is.

If you have an application that you want to process faster on the SQL Server 2012 CPU then a good way to do it would be to have the application divide up its instructions and data into batches that would ideally fit in the L1 or L2 caches. Then magically your application would run faster (there is more to it than just that but I am keeping things simple).

Well that is great but what do L1, L2, and L3 caches have to with SQL Server databases and data warehousing? Well in a word ALOT! The new SQL Server 2012 Columnstore Indexes not only compress and store data (tables and indexes) in a column-wise format but this columnar data is also broken down into “batches” that just so happen to match the sizes of the L1 and L2 caches on the SQL Server 2012 Data Warehouse Database Server. This means that all things being equal, the same data set will actually process faster on the SQL Server 2012 Data Warehouse Database Server CPU using Columnstore Index’s data “batches” than had the same data been processed without Columnstore Indexes.

And guess what, you can actually see this happen in SQL Server Management Studio SSMS when you run query on a tables that has a SQL Server 2012  Columnstore Index and view the Execution Plan.

Example:

clip_image003

The table dbo.CSI_100m_Fact has a SQL Server 2012 Columnstore Index and at execution time the query uses Actual Execution Mode = BATCH for the SQL Server 2012 Columnstore Index Scan operator which mean that it is doing its L1 and L2 magic behind the scenes and your queries now run much faster!

Note: If you create a columnstore index on your SQL Server 2012 database and your SQL 2012 data warehouse database query does not use the columnstore index then that means that your data model is wrong (fact and dimension tables) or your query does not match typical data warehouse workloads. See: Columnstore Indexes on MSDN.

SQL Server Database Performance Tuning: Table Partitioning with Fact Tables

May 5, 2013

 Table partitioning for SQL Server database fact tables is one of the best ways to improve processing for fact data loads during the ETL process and it also can dramatically improve query response time as well. SQL Server Table partitioning should be used on all but the most trivial database fact tables (and even then I would still use it). However, despite its great advantages and the fact that it has been around since SQL Server 2005, I still find it under-utilized.

Database table partitioning compliments Kimball Style Transaction or Snapshot Fact tables very nicely (See Slowly Changing Facts).  In SQL Server 2012 the number of possible partitions per table has even been increase to 15,000! This great news because that means if you partition your database fact table by days it could hold more than 40 years’ worth of data without changing the partition scheme !!! And if the database fact table is loaded nightly with a single day’s data then SQL Server database table partition switching becomes a breeze.

I won’t go into all the details but  SQL Server database table partition switching allows inserts and deletes to a partitioned database fact tables, no matter how large, to occur almost instantly. Updates can occur orders of magnitude faster as well (but I don’t recommend updating fact tables).

Example:

I am not going to go through all the steps. See Script for details

1.      Create two tables and populate them with data:

1 .     dbo.TablePatitionDemoFact – holds historical fact data

2.      dbo.TablePatitionDemoFact_switch  –  simulates new rows to be loaded into fact table from nightly ETL process

 

2.      selectmax([OrderDateKey])asTablePartitionDemoFact

3.      fromTablePartitionDemoFact

4.      unionselectcount(1)fromTablePartitionDemoFact

5.       

6.       

7.      selectdistinct[OrderDateKey]  asTablePartitionDemoFact_switch

8.      fromTest.dbo.TablePartitionDemoFact_switch

9.      unionselectcount(1)fromTablePartitionDemoFact

 

TablePartitionDemoFact:

      60398 – rows

      20040731 – max transaction date

TablePartitionDemoFact_switch:

     60398 – rows

     20040801 – max transaction date

 

Both tables have about 60,000 rows but the _switch table has rows that need to be inserted in the fact.

2. To switch the rows  by simply using the Alter Table… switch command

ALTERTABLE[dbo].[TablePartitionDemoFact_switch]SWITCHPARTITION 2

        TO[dbo].[TablePartitionDemoFact]  PARTITION (2)

 

Rerun the queries from before and you will see that rows have now been switched to the fact table in < 1 Second!

 

TablePartitionDemoFact:

    120,796 – row count

    20040801

 

TablePartitionDemoFact_switch:

    0 

 

The process can easy be reversed if the data needs to be modified or archive/deleted by simply reversing the tables is the switch command.

 

ALTERTABLE[dbo].[TablePartitionDemoFact]SWITCHPARTITION 2

        TO[dbo].[TablePartitionDemoFact_switch]  PARTITION (2)

 

TablePartitionDemoFact:

   60398 – row count

   20040731

TablePartitionDemoFact_switch:

   60398 – row count

   20040801

 

Note: Both database table schemas must be identical. This includes indexes as well! During the switch process no table or indexes updates are required. There is just a simple pointer/metadata operation and the magic happens almost instantly!

 

For the purpose of this demo I didn’t create any indexes just to make things simpler but now you might understand why SQL Server 2012 Columnstore Indexes  were not updateable! Because the assumption was that any decently sized  database fact table is already partitioned, so therefore no need to make columnstore indexes updateable.

 

The best practice is to bulk load the data, in parallel if possible, into the database switch/stage table, using minimal logged transactions. Create the required SQL Server indexes and then simply switch the partitions. It is that simple. This process will perform well on fact tables with billions or even trillions rows!

 Script

Bret Ozar has put together a wonderful list of the most import Table Partitioning Resources

I personally like using SQL Server Partition Management tool from Code Plex