SQL Server Database Performance Tuning: SQL Server Columnstore Index Throughput

SQL Server 2012 columnstore indexes are the best thing to be added to SQL Server for data warehousing since, well, ever. One of the greatest benefits of columnstore indexes is an incredible increase in performance. According to the SQL Server Columnstore Index FAQ, columnstore indexes (part of the technology now branded xVelocity) can increase SQL query performance by a factor of 10x to 100x.

In my own experience these performance gains are not unrealistic, but the exact gains depend on the data cardinality (the number of distinct values) and the data types of the columns. The lower the cardinality (fewer distinct values), the greater the gains.

Here is the claim from the SQL Server Columnstore Index FAQ:

“When data is stored in column-wise fashion, the data can often be compressed more effectively than when stored in row-wise fashion. Typically there is more redundancy within a column than within a row, which usually means the data can be compressed to a greater degree. When data is more compressed, less IO is required to fetch the data into memory. In addition, a larger fraction of the data can reside in a given size of memory. Reducing IO can significantly speed up query response time. Retaining more of your working set of data in memory will speed up response time for subsequent queries that access the same data.”

Let’s test it out.

Scenario

Create a fact table

SELECT TOP 100000000 a.*
INTO [100millionFact]
FROM [AdventureWorksDWDenali].[dbo].[FactInternetSales] AS a
CROSS JOIN [AdventureWorksDWDenali].[dbo].[FactInternetSales] AS b;
GO

I created a 100 million row fact table by cross joining the 2012 AdventureWorks FactInternetSales table with itself. It might not be very real world, but it is still useful for demonstrating the capabilities of columnstore indexes.
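To confirm the load and see how much space the rowstore copy occupies, a quick check with the standard sp_spaceused system procedure (the table name matches the script above) can be run:

```sql
-- Report row count and reserved/data/index space for the new fact table
EXEC sp_spaceused N'[dbo].[100millionFact]';
```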

Create a Columnstore Index on all columns

CREATE NONCLUSTERED COLUMNSTORE INDEX [NonClusteredColumnStoreIndex-20120723-215255]
ON [dbo].[100millionFact]
(
    [ProductKey],
    -- ... the remaining fact table columns (including [SalesTerritoryKey]
    -- and [SalesAmount], which the test query below depends on) were
    -- elided from this listing ...
    [ShipDate]
) WITH (DROP_EXISTING = OFF) ON [PRIMARY]
GO
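As a rough sanity check on how well the data compressed, the sys.column_store_segments catalog view reports the on-disk size of every column segment. A minimal sketch, summing across the whole database (so filter further if more than one columnstore index exists):

```sql
-- Total on-disk size of all columnstore segments in the current database, in MB
SELECT SUM(on_disk_size) / 1048576.0 AS columnstore_mb
FROM sys.column_store_segments;
```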

Execute Simple Aggregate Query

DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON;
GO
SET STATISTICS TIME ON;
GO

SELECT [SalesTerritoryKey], SUM([SalesAmount])
FROM [Test].[dbo].[100millionFact]
GROUP BY [SalesTerritoryKey]
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
GO
SET STATISTICS IO OFF;
GO
SET STATISTICS TIME OFF;
GO

Notice that I used DBCC DROPCLEANBUFFERS to remove anything from cache, and I enabled the IO and TIME statistics so that we can see the important query stats.
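One caveat worth noting: DBCC DROPCLEANBUFFERS only evicts clean pages from the buffer pool, so for a truly cold-cache test it is common practice to flush any dirty pages to disk first with a CHECKPOINT:

```sql
CHECKPOINT;            -- write dirty pages to disk so they become clean
GO
DBCC DROPCLEANBUFFERS; -- then evict all clean pages from the buffer pool
GO
```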

Query without Columnstore

[Screenshot: STATISTICS IO and TIME output for the query without the columnstore index]

2,040,817 logical reads * 8 KB per page / 1024 = 15,943 MB of data scanned into memory and then aggregated by the CPU in more than 2 minutes.

Query with SQL Server 2012 Columnstore Indexes

Now run the same SQL query without the IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX hint.
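That is, the identical statement with only the hint removed:

```sql
SELECT [SalesTerritoryKey], SUM([SalesAmount])
FROM [Test].[dbo].[100millionFact]
GROUP BY [SalesTerritoryKey];
GO
```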

[Screenshot: STATISTICS IO and TIME output for the query using the columnstore index]

1,020 logical reads * 8 KB per page / 1024 = 7.9 MB of data scanned into memory and then aggregated in less than 1 second.
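The logical-reads-to-megabytes conversion used above is simply reads multiplied by the 8 KB page size, which you can verify directly:

```sql
-- Convert logical reads to MB: each logical read is one 8 KB page
SELECT CAST(2040817 * 8 / 1024.0 AS decimal(10, 2)) AS rowstore_mb,     -- 15943.88
       CAST(1020    * 8 / 1024.0 AS decimal(10, 2)) AS columnstore_mb; -- 7.97
```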

Summary

In the simplistic example above, implementing a SQL Server 2012 columnstore index dropped the throughput requirement for the aggregate SQL query from about 16 GB to just 8 MB, and the CPU processing time from over 2 minutes to under 1 second. Columnstore indexes represent a tremendous performance increase for SQL Server data warehousing databases, because the most expensive, complex, and important part of a SQL Server data warehouse server is getting the maximum throughput from storage to the CPU.


