Columnstore: Gooooooal!!!

June 22, 2014

In the spirit of the World Cup, I am pleased to announce that the current leader of the TPC-H Performance Results is… Microsoft SQL Server 2014 for the 1 TB, 3 TB, and 10 TB Data Warehouse.

How did SQL Server manage this virtual sweep over all of its competitors? Well you can download all the code and configurations that were used by each winning system. For example the 10 TB system’s file are located here TPC-H Result Highlights.

Of course there was a lot of hardware and software configurations that went into each of these systems but the thing that pushed SQL Server above the crowd was the Clustered Columnstore Index. See code below for the “LINEITEM” table which in the TPC-H standard is similar to a fact table. (Unfortunately, TPC has no true official star schema standard)

– Create Clustered ColumnStore Index on LINEITEM on all columns



create clustered columnstore index L_SHIPDATE_IDX on LINEITEM with(drop_existing = on,maxdop = 8)on DATESCHEME(L_SHIPDATE)



While is there a lot that can be learned from the documentation of these systems. (Joe Chang has an excellent post “TPC-H Benchmarks on SQL Server 2014 with Columnstore”)

There might be a few hidden surprises.

1.Hard Disk Drives are not dead

Contrary to popular belief, Solid State Drives will not solve your performance/scalability issues, especially for scan centric Data Warehouse Work Loads (don’t let any hardware vendor tell you different). There is a reason why the vast majority of the FastTrack Data Warehouse Appliances use HDDs. Therefore, it should come as no surprise that the three award winning systems in the TPC-H standard all use good old spinning disks.

Storage Subsystem Disk Drives(Qty 56:146GB 6G SAS 15K):

(Qty 472: 300GB 6G SAS 15K):



While that is a hell of a lot of disks, keep in mind that these systems are also graded on cost and energy consumption as well and even with those criteria HDDs still win for DW workloads.

2. Table Partitioning

“Horizontal partitioning is used on LINEITEM and ORDERS tables and the partitioning columns are L_SHIPDATE and O_ORDERDATE. The partition granularity is by week.”

Even with the substantial benefit of Clustered Columnstore segment elimination, careful table partitioning, can still have a significant effect on performance. Keep in mind that correct table partitioning requires in depth knowledge of query predicate values. With TPC-H this easy to do since there are only a few dozen queries, in the real world this will require a lot more work.

Total Row with Grouping Sets

May 26, 2014

In a previous post I mentioned that under certain situations (CPU intensive calculations like log() and exp()) Columnstore Indexes can out preform SSAS MOLAP.

However, creating results sets in T-SQL maybe a little more complicated than with MDX. In MDX for example dimension attributes can have a “All” member. This “All” member can be used to create results sets that aggregate fact data by a dimension attribute that then have a “Total” or ”All” row appended to the bottom of the attributes. This “All” represents the aggregate of all the members from the dimension(s).

In the simple example below I have aggregated the LINEITEM fact table from the TPC-H standard  by SUPPKEY and also Appended a “Total” or “All” member row.

SUPPKEY Quantity
630 16026
7697 15288
8776 16089
3751 15406
Total 62809

However, getting the Total row to appear on the bottom is not as strait forward as it is with MDX. While there are several ways to produce the Total grouping/row in T-SQL because I would also like the make the best or most efficient use of the Columnstore Index that is on the table as well.

The options:

1.   Common Table Expression

with CTE  (L_SUPPKEY,Quantity)as


       select [L_SUPPKEY]

          ,sum([L_QUANTITY])as Quantity

       from [dbo].[LINEITEMs1]

       where [L_SUPPKEY] in(3751,8776,630,7697)

       groupby [L_SUPPKEY]


 select cast(L_SUPPKEY aschar)as SUPPKEY, Quantity

 from CTE

 union all


 from CTE

While this might seem like a good option if you check out the execution plan you see that query actually scans the fact table 2 TIMES which will at least double the CPU time.

table scan

2.Table Variables

declare @tbl astable (SUPPKEY char(20), Quantity int)


insertinto @tbl

select [L_SUPPKEY],sum([L_QUANTITY])as Quantity

from [dbo].[LINEITEMs1]

where [L_SUPPKEY] in(3751,8776,630,7697)

groupby [L_SUPPKEY]


select SUPPKEY, Quantity

from @tbl



from @tbl

Even though the result set is small this is a terrible option  because inserting into a table variable forces the query into “row mode” which is much slower than “batch mode

row mode

3. Temporary Tables

select cast([L_SUPPKEY] aschar)as SUPPKEY,sum([L_QUANTITY])as Quantity

into #temp

from [dbo].[LINEITEMs1]

where [L_SUPPKEY] in (3751,8776,630,7697)

groupby [L_SUPPKEY]


select SUPPKEY,Quantity

from #temp

union all

select ‘Total’,sum(Quantity)

from #temp

While this is pretty good option it seems a little excessive to write the small result set all the way back down to disk (TempDB) just pull it back again for the Total row calculation. That is a unnecessary use of IO. Why go all the way back to disk when it is not required.

4. Grouping Sets 

select isnull(cast([L_SUPPKEY] aschar),‘Total’)as SUPPKEY

       ,sum([L_QUANTITY])as Quantity

from [dbo].[LINEITEMs1]

where [L_SUPPKEY] in(3751,8776,630,7697)

group by grouping sets(([L_SUPPKEY]),())

This is the best option because it has none of the limitations from the the above options.

SQL Saturday Houston

April 29, 2014

If you live the Houston area I will speaking at  SQL Saturday Houston #308 on May 10th.


I will be giving the following 2 presentations:

Data Warehouse ETL

“OLTP versus DW ETL Strategies (Availability Groups/Mirroring, Change Data Capture, SSIS, FlatFiles). SSIS Synchronous & Asynchronous transforms. Demo SSIS performance. ETL versus ELT. Demo FileTable.”

Measuring Data Warehouse Performance

“DW Hardware Architecture Comparisons (SAN, DAS). Throughput (MB/sec) versus IOPS. Demo SQLIO – block size, sequential & random reads/writes, IO Requests. Query IO/Time Statistics. Demo Columstore Index Segment Elimination. Demo Extended Events.”

Of course many of theses topics will be familiar to you if you have read much of my BLOG or checked out my Youtube Channel. While I do like presenting and meeting new people, I recorded the videos for those who can’t make it to my presentations.

If you can make it to SQL Saturday don’t hesitate to say hi  and bring your hardest DW questions! Winking smile

Columnstore vs. SSAS

March 23, 2014


As part of a project that I have been engaged on recently, I have had the opportunity to test the performance difference between SQL Server 2014 Clustered Columnstore Indexes and SQL Server Analysis Services Multi-dimensional (MOLAP). To test the performance differences between the two technologies, I converted the three most CPU intensive MDX queries to T-SQL queries (Clustered Columnstore). Both sets of queries were then executed from cache, with a variety of predicate values, on identical hardware. Aggregate performance in seconds shown below:


It must be emphasized that the performance of Columnstore Indexes is dependent on many variables, including but not limited to calculation logic, number of distinct values, distribution of values, predicated values, and hardware IO throughput. In this case the queries involve the following calculations: exp(), sum(), and then log().

After testing the initial set of three queries, I then tested the performance of over 20 queries divided into 3 groups. These series of tests included several levels of aggregation with different predicate values as well.



Even though Columnstore Indexes do not create pre-aggregates, like in SSAS multi-dimensional (MOLAP), under the right circumstances they can match or even out preform SSAS multi-dimensional. The performance increase is largely due to the highly efficient nature of the Columnstore Indexes being processed in “Batch” mode.

Financial Data Warehouse Video

February 21, 2014

Over the last few years that I have been blogging my  Dimensional Modeling Financial Data in SSAS and Dimensional Modeling Financial Data in SSAS pt2 posts have  by far been my most popular blog posts.

Since both of these posts are about how to setup a Financial (General Ledger/Chart of Accounts) Data Warehouse, I decided to create a Financial Data Warehouse Video which not only provides a fuller explanation of how to build a Financial Data Warehouse using the MSSQL toolset but also demonstrates why such a data warehouse is valuable.

Financial Data Warehouse

Out of all the different types of Data Warehouses that I have built in my career, Financial (GL/Chart of Accounts) Data Warehouses  provide the most business value, are the quickest to implement, and have the lowest risk.

FileTable and BULK Insert

January 21, 2014

In my video entitled Data Warehouse 12 ELT and FlatFiles  starting at the about the 10 minute, I demonstrate how to manage flat files with SQL Server FileTables.

While FileTables are great for managing flat files, I  failed to include and example of how to BULK insert data from a FileTable into a regular SQL Server table.

This is a very important step for data warehousing and unfortunately, as of the writing of this post, it is not documented at all! and in fact if you to simply preform a BULK insert of the data from the FileTable into a normal SQL table you will get the following cryptic error:

Msg 4861, Level 16, State 1, Line 1
Cannot bulk load because the file “\\Garrett-hp\sql2012\FileTableDB\ETLFileTable_Dir\TestTable.txt” could not be opened. Operating system error code 50(The request is not supported.).

Not very helpful  and there is very little about it on the WWW Sad smile , hence the origin of this post.

There are several work-a-rounds to this issue none of which are perfect however my preferred method is to simply map the FileTable directory to  a drive on the OS and then magically BULK insert works!

see screen shot below of a working example:


select * from [dbo].[ETLFileTable];
truncate table  FileTableDB.[dbo].[TestLoadTable];
BULK INSERT FileTableDB.[dbo].[TestLoadTable]
 FROM 'Z:test.txt' /*mapped drive*/
select * from FileTableDB.[dbo].[TestLoadTable];

Additionally, I would caution against using SSIS just to do the insert, especially if SSIS is installed on another server!?

The FileTable data already in the  SQL Server filegroup on the DW server would have to leave the server, traverse the network, be processed by SSIS, then return over the network again, just to be inserted into a table on the DW!!!

Avoid all that and just do a BULK insert. For those who are interested SSIS in FastMode simply does a BULK insert:

Jamie Thompson: SSIS: Destination Adapter Comparison

“Where OLE DB Destination uses an access mode of OPENROWSET, OLE DB Destination with FastLoad uses OPENROWSET with FastLoad.

FastLoad means that the adapter uses BULK INSERT for the insertion.”

Master Data Services

December 6, 2013


Master Data Services (MDS) is the SQL Server solution for master data management. Master data management (MDM) describes the efforts made by an organization to discover and define non-transactional lists of data, with the goal of compiling maintainable master lists. An MDM project generally includes an evaluation and restructuring of internal business processes along with the implementation of MDM technology. The result of a successful MDM solution is reliable, centralized data that can be analyzed, resulting in better business decisions.

ELT (ETL) Videos

September 27, 2013

There are not any BLOG posts so far this month because I have been concentrating on producing detailed YouTube videos that discuss some of the most important concepts in Data Warehousing.

the latest series of videos deal with ELT (ETL)






Performance Tuning SQL Server: Indexing

August 15, 2013

Many times I am asked about how to create indexes on a data warehouse SQL Server Database so that queries execute much faster. Unfortunately, unless they are able to upgrade to SQL Server 2012 in order to use ColumnStore Indexes, there is usually very little that I can do to speed up the queries. This is usually do to the fact that the Data Warehouse Server was not architected for data warehouse workloads to begin with and therefore is physically constrained on its throughput (MB/sec).

So I have created this little demonstration to illustrate the point:

I created a 10 million row fact table based on the good old AdventureWorks DB and setup a very simple scan centric SQL Server data warehouse query:






   whereSalesTerritoryKey= 1

        orSalesTerritoryKey= 4

        orSalesTerritoryKey= 6

        orSalesTerritoryKey= 7



With logical reads 208337 that means that 1,627 MB of data was returned from disk to satisfy the query. And magically, if you check the size of the table in SQL Server Management Studio it is also reads 1,627 MB. This is due to the fact that there were no indexes to read from and so the SQL Server had to scan the entire table:


Now indexes can be created to reduce the amount of data that needs to be scanned by only scanning the index data.









Note: The best practice is to actually create a separate index for each fact table column, but just to keep things simple I only created an index for the columns that will be queried.

Now if you run the same query again there the message box reads logical reads 36812 or 287 MB. Creating indexes drops by over 500% in the amount of data required for this query. This reduction in data being read is primarily due to the fact that the data is being read from the indexes, which are much smaller, and therefore less time consuming than read all the data directly from the table. See Index Scans:


Index Sizes:


[ix_NoCompression_10m_SalesAmount] – 27,176 * 8 / 1024 = 212 MB/sec

[ix_NoCompression_10m_SalesTerritoryKey] – 22,276 * 8 / 1024 = 174 MB/sec


However, the limitation to this approach is that the data being returned will never be smaller than indexes that are created on the fact table. Real world fact tables have billions of rows and so those indexes will not all fit into memory therefore the data must be retrieved directly from disk which will effect query performance negatively, especially if the data warehouse server has not been specifically tuned to deliver good throughput (MB/sec).


In fact even if query patterns (something which you cannot control) change, the amount of data required by the queries will remain about the same. In the following example I commented out 2 predicate values (SalesTerritory). One would think that now the data required would drop by about ½ as well.






  whereSalesTerritoryKey= 1

        orSalesTerritoryKey= 4

        –or SalesTerritoryKey = 6

        –or SalesTerritoryKey = 7


However the logical reads only dropped from 36,812 (287 MB) to 33612 (262 MB) or only about 25 MB less data required by removing ½ of the predicate values. Even if the number of predicates values were too drop to 1, the logical reads are still 30128 or 235 MB. This is due to the fact that no matter how many predicate values are used the index for that predicate still has to be scanned. Remember Data Warehouse queries require a lot of historical data and so are scan centric. While scanning the index might not seem all that bad for this small 10 million row fact table imagine what would happen if you increased the number rows of rows by a factor of 100 or even 1,000 that would be potentially 10s of billions of rows and 100s of GB for the indexes!


Some of you might be tempted to say that an index could be created for each predicate value (SalesTerritory) while that is true it would be very quickly become unmanageable and in most cases it is nearly impossible to predict all the ways that the users will query the data for historical analysis.


The real answer is to correctly architect the data warehouse server ahead of time in order to deliver the query results in a predictable and acceptable period of time. However, with that said there are a few things that you can do to increase the ability data warehouse server to deliver higher MB/second than it normal could under normal conditions.


Page Compression works by cramming more data on to each database page which means that fewer pages need to be read from disk.








Note the indexes new sizes:


ix_NoCompression_10m_SalesAmount 14279 = 11 MB

ix_NoCompression_10m_SalesTerritoryKey 14350 = 11 MB


Due to using Page Compression the indexes are significantly smaller and so it is no surprise that if the same query is run again even less data needs to be scanned from the indexes:


Logical Reads = 20475 or 159 MB


Page compression reduces the data required from 287 MB to 159 MB or a saving of about 55%. In the real world I see about a 33% savings.


And just in case you were wondering, if you upgrade to SQL Server 2012 Columnstore Indexes the data required drops from 287 MB to an incredible 1.06 MB !!!




Complex indexing strategies for Data Warehouse workloads provide no additional benefit. The Data Warehouse Server needs to be carefully architected upfront to deliver the required MB/second throughput so that the users are happy with the performance of their queries.


Performance Tuning SQL Server: Hardware Architectures Throughput Analysis

July 21, 2013

Over the last few years I have performed many SQL Server Data Warehouse Database Performance Tuning Sessions with a variety of customers in many different environments. This sessions usually consist of a weeklong engagement in which I lead an in-depth software (SQL Server) and hardware tuning classes. The results of which are put into a highly detailed roadmap and list recommendations for improved SQL Server Data Warehouse performance.

As part of this testing, I use SQLIO to benchmark the hardware performance capabilities of the SQL Server Data Warehouse database server that I am testing (see: Data Warehouse Storage Throughput (MB/sec) ). There are of course many other types of tests and analyses that I preform as well as part of this week long process but I am going to stick with SQLIO for now.

Since I have performed these tests many times I would like to share some of the results that I have seen from various customer SQL Server Data Warehouse server environments.

Note: Even though these are actual aggregate figures from actual customers, given the complexity of each environment, your results may vary. This is meant to give you an idea what kind of performance to expect.

QSDW: QuickStart Data Warehouse 1000, This is a Dell Microsoft SQL Data Warehouse Appliance that is built to Microsoft FastTrack Reference Architecture Best Practices such as parallelism.

DAS – SSD: Direct Attach Storage with Solid State Drives. SSDs not architected for Data Warehouse specific workloads.

SAN – SSD/HDD: Tier 1 SAN that uses a combination of Solid State Drives (SSD) and Hard Disk Drives (HDD). Not architected for Data Warehouse specific workloads.

SAN – HDD: Tier 2 SAN that uses only Hard Disks Drives (HDD). Not architected for Data Warehouse specific workloads.

Laptop: My personal Laptop with one 7.2 K HDD


As can be determined from the graph the Dell QuickStart Data Warehouse out preforms even Solid
State Devices and both types of SANs. This due primarily to the fact that the QSDW is specifically tuned to data warehouse workloads which are typically characterized by sequential read/writes in block sizes greater than 64KB. In fact had I increased the block sizes for the tests, the gaps in performance would have increased.


Ensuring that the Data Warehouse server is specifically tuned for Data Warehouse workloads can have a dramatic impact on performance. In fact this tuning can even be more import than the type of storage technology selected e.g. SAN, SSD, or HDD.

Note: The above graph shows server hardware performance not SQL Server Database Engine performance. The tool SQLIO has nothing to do with SQL Server even though the name might suggest otherwise


Get every new post delivered to your Inbox.

Join 49 other followers