ColumnStore Indexes are the best thing to be added to SQL Server for Data Warehousing since … well ever. One of the greatest benefits of ColumnStore Indexes is an incredible increase in performance. According to SQL Server Columnstore Index FAQ ColumnStore Indexes now rebranded as xVelocity can increase query performance by a factor of 10x to 100x.
From my own experience these performance gains are not unrealistic but the exact performance gains are dependent on the data cardinality (number of distinct values) and data types in the columns. The lower the data cardinality (few distinct values) the greater the performance gains.
Here is the claim from SQL Server Columnstore Index FAQ:
“When data is stored in column-wise fashion, the data can often be compressed more effectively than when stored in row-wise fashion. Typically there is more redundancy within a column than within a row, which usually means the data can be compressed to a greater degree. When data is more compressed, less IO is required to fetch the data into memory. In addition, a larger fraction of the data can reside in a given size of memory. Reducing IO can significantly speed up query response time. Retaining more of your working set of data in memory will speed up response time for subsequent queries that access the same data.”
Let’s test it out.
Create a fact table
select top 100000000 a.*
from [AdventureWorksDWDenali].[dbo].[FactInternetSales] as a,
[AdventureWorksDWDenali].[dbo].[FactInternetSales] as b ;
I created a 100 million row fact table based on the 2012 AdventureWorks FactInternetSales table. It might not be very real world but it is still useful for demonstrating the capabilities of Columnstore Indexes.
Create a Columnstore Index on all columns
CREATE NONCLUSTERED COLUMNSTORE INDEX [NonClusteredColumnStoreIndex-20120723-215255] ON [dbo].[100millionFact]
)WITH (DROP_EXISTING = OFF) ON [PRIMARY]
Execute Simple Aggregate Query
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT [SalesTerritoryKey], sum([SalesAmount])
group by [SalesTerritoryKey]
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
Notice that I used dropcleanbuffers to remove anything from cache and I enabled to IO and TIME stats so that we can see the important query stats.
Query without Columnstore
2,040,817 logical reads * 8K per page / 1024 = 15,943 MB of data scanned into memory and then aggregate by the CPU in > 2 minute.
Query with Columnstore
Now run the the same query without the IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX hint.
1020 logical reads * 8k per page / 1024 = 7.9 MB of data scanned into memory and then aggregated in < 1 second.
In the simplistic example above, the throughput requirements for the aggregate query dropped from about 16 GB to just 8 MB and CPU processing time from over 2 minutes to under 1 second by implementing a Columnstore Indexe. Columnstore Indexes represent a tremendous performance increase for data warehousing because the most expensive, complex, and important part of data warehouse is getting the maximum throughput from storage to the CPU.