Data Loading performance considerations with Clustered Columnstore indexes

March 16, 2015

The SQL CAT resently released an excellent article called “Data Loading performance considerations with Clustered Columnstore indexes“.  It is useful even for those who, like me, have been working with the Clustered Columnstore Indexes for several years now. The Overview of Loading Data into Columnstore Indexes section mentions that an index REGORG can effect batch size. In my experience, inconsistent  performance can sometimes be caused by a batch being too small i.e. less than the 102,400 row minimum, and so the tuple mover may not be triggered in time leaving some data operations in row mode instead of the more efficient batch mode. For small to medium size fact tables that maybe not always exceed the 102,400 row batch medium it maybe necessary to execute an index REORG which immediately will trigger the tuple mover after the normal data loading process is complete. This disadvantage to his approach is that the batch sizes over time may not be optimal because they were forced to be smaller than the 102,400 row minimum. If performance starts to suffer then implementing a periodic index REDBUILD routine will rebuild the index/table with batches closer to the 1 million rows/batch goal.

Additionally, I still, from time to time, see a lot of confusion on the distinction between an Operational Data Store (ODS) and a Data Warehouse. James Serra recently wrote an excellent BLOG about Operational Data Stores that captures their most important  functions. In my opinion one of the easiest and most effective ways to create an ODS is to simply setup an asynchronous SQL Server AllwaysOn Availability Group secondary.  Several  of the Availability Group database could form the core of an ODS but would most likely be a poor substitute for a Data Warehouse.

Mastering SQL Server 2014 Data Mining

January 24, 2015


I recently had the good fortune to collaborate with Amarpreet Bassan and Debarchan Sarkar from Microsoft in writing Mastering SQL Server 2014 Data Mining. This book is unique in that it covers all the major Microsoft tools for Data Warehousing, Data Mining, and Machine Learning. Each one of the tools is a major subject area and many books could be written about each one. For example just in the area of Data Warehousing there are multiple books devoted to each Data Warehousing subcategory (Microsoft BI Stack) i.e. SSIS (ETL), Dimensional Modeling (The Kimball Group), SSAS (OLAP), SSRS (Reporting), MDS (Master Data Management) etc…

Most books devoted to these topics are aimed to guide the beginner step-by-step through the process of building solutions with each of these tools. However, this book only devotes one chapter to these topics. In other words this book is not, for the most part, aimed at the absolute DW beginner. The table of contents is a good guide to determining the topics that the book chooses to focus on.

Note: I would highly recommend getting the digital version of the book as it has full color graphics which the print version does not. I would also highly recommend downloading the book’s code since this is not a step-by-step book and it would be very time consuming to reproduce every example from scratch.


  1. Identifying, Staging, and Understanding Data
  2. Data Model Preparation and Deployment
  3. Tools of the Trade
  4. Preparing the Data
  5. Classification Models
  6. Segmentation and Association Models
  7. Sequence and Regression Models
  8. Data Mining Using Excel and Big Data
  9. Tuning the Models
  10. Troubleshooting


Below is a summary of the things from each chapter that I think are interesting and/or unique.

Chapter 1 – Identifying, Staging, and Understanding Data

I like that several Adventure works based dimensional data loading scripts are provided in order to give the reader a taste of what a real world ETL process might look like. I wish I would had access to something like this over a decade ago when I was just starting with DW.

Chapter 2 – Data Model Preparation and Deployment

This chapter makes references to several excellent outside sources that provide much more in-depth treatment of the topics discussed. Example:

Chapter 3 – Tools of the Trade

While this a brief introduction to the BI Stack. If you are not already familiar with it I would highly recommend pick up a more beginner oriented book. I do like that data warehousing is included in a data mining book because  in the real world I see too much separation between traditional IT/DW and Data Science/Data Mining/Statistics professionals. I believe that having both work together will only improve the final result.

Chapter 4 – Preparing the Data

I think that is it great that this chapter dives into the details of extracting data from Oracle and IBM. In the real world most data warehouses or data mining solutions do not extract all of their data from just one source (SQL Server).

Chapter 5 – Chapter 7

While these chapters provide a good over view of the various SQL Server Data Mining Algorithms. I really like that they also include techniques for tuning the algorithms in order improve their accuracy.

Chapter 8 – Data Mining Using Excel and Big Data

While the Data Mining plugin for excel has been around for a number years now this chapter as provides an in-depth discussion of the newer data mining tools as well, namely HDInsight and Azure Machine Learning. I would also highly recommend Predictive Analytics with Microsoft Azure Machine Learning: Build and Deploy Actionable Solutions in Minutes for those who are just getting started with AzureML.

Chapter 9 – Tuning the Models

This perhaps my favorite chapter because it uses real world data. No AdventureWorks insight!!! Plus the dataset (Housing Affordability Data System (HADS)) is also very interesting because I am looking at purchasing a house in the near future. I have spent many hours with the dataset making additional data mining discoveries than those covered in the book.




APS AU 3 Documentation

December 15, 2014

Microsoft just released the APS AU 3 Documentation. This is the first time that Microsoft has shared detailed information about APS/PDW with the public. In the good old days (like when I was working on PDW v1 implementations) this documentation was not shared with the public. Hence the reason for the absence of PDW/APS information on my BLOG even though I have been working with it for several years.

The documentation servers not only as the “bible” for those who already have an APS/PDW but is also an excellent introduction to Massive Parallel Processing (MPP) systems as well. See example below:


The willingness of Microsoft to release this information signals that the product is maturing.

SQL Server 2014 Fast Track

November 25, 2014

Over the past several months the major hardware vendors have been hard at work developing detailed documentation around how to build an optimal data warehouse using SQL Server 2014. My friend and MSFT APS (PDW) TSP James Serra (you get a prize for knowing what all that means) has already blogged about it here. But I would like to add a few of my own observations as well.


HP DL380 Gen8 and HP LE PCIe Workload Accelerator 28TB/45 TB Data Warehouse Fast Track Reference Architecture

HP DL580 Gen 8 and HP LE PCIe Workload Accelerator 90TB Microsoft SQL Server Data Warehouse Fast Track Reference Architecture

Generally speaking, I have been impressed over that last few years with HP dedication in the DW space. This is based on having worked closely with them in past with FastTrack and PDW/APS. I like that they are making use of JBOD storage which also used in the APS (aka PDW)


HP sql server 2014


JBOD is a much more efficient approach and still preforms the same as the classic storage array/enclosure and storage controllers/processors but usually at a lower cost.

I am also happy to see that they stayed with the strategy of spreading database files from multiple file groups over the LUNs/mount points.

hp sql server 2014 2

This is that same basic setup that I cover in my DW videos: Data Warehouse 5 Parallelism.

I was much less impressed with the EMC FTDW solution it smells of EMC trying to force fit it’s VNX 5600 storage platform into the DW world. Most of the document feels like a VNX OLTP sales pitch that stresses things that are not major considerations for a DW solution.


p.9 “It allows the storage system to provide flash-drive class performance to data with a high locality of reference, which increases IOPS without placing all the data onto flash drives”. This sounds great for an OLTP solution (random reads/writes) but not for DW (sequential reads/writes).

“Multicore Fully Automated Storage Tiering (FAST) Cache reduces the load on the hard drive (HDD) by absorbing I/O bursts forms applications…” Again sounds good for OLTP but not applicable for DW. How do I/O bursts work for a DW? How would auto-tiering handle large analytical/aggregate queries i.e. last year sales aggregated by quarter? Where is the I/O hot spot there?

However, if EMC is you corporate standard you may have little choice in the matter and so choosing their FastTrack solution maybe your best option.

In summary

Not much has changed in the SQL Server 2014 FastTrack. Other than Clustered ColumStore Indexes are updatable and give world class performance. I think that JBOD, much to some SAN vendors dismay, is the future of at least DW if not OLTP storage.

Tabular and NUMA

July 30, 2014

Recently a great white paper Entitled “An Analysis Services Case Study: Using Tabular Models in a Large-scale Commercial Solution” was released which provides a lot of information that would be helpful when on is faced with evaluating the various Microsoft BI tools (SSAS Tabular, SSAS Multi-dimensional, and ColumnStore Indexes).

I will not reproduce the whole white paper on my BLOG but below are some quotes from the white paper that I found particularly interesting:

ColumnStore Indexes:

“Unpredictable Query speed. In SQL 2012, specific query plans can really take advantage of columnstores (which are an absolute need for the volume of data of inContact). However, as soon as the query became complex (for example, mixing SUM with DISTINCT COUNT in the same query) SQL Server stopped using the columnstore and reverted to row mode, leading to poor performance. It looked nearly impossible to produce a general-purpose query engine that gave enough freedom to users in expressing their query while retaining efficiency.”

While I too had problems with SQL Server 2012 Non-Clustered ColumnStore Indexes and the query optimizer picking the much slower “row mode” instead of “batch mode”, after using SQL Server 2014 Clustered ColumnStore Indexes for almost a year, I have not seen any major issues related to “batch mode”. As a matter of fact, I have even observed about a 15% – 30% performance improvement of Clustered ColumnStore Indexes over the non-clustered variety.

The section that by far was the most surprising to was the section entitled “The complexity and ubiquity of NUMA ” as it reminded me of something that I ran across a few years ago when I was testing the then brand new SSAS Tabular with NUMA servers.

AS the article clearly states:

“Most of the servers available on the market with enough expansion to handle terabytes of RAM are based on the Non Uniform Memory Access (NUMA) architecture. This is a big issue because Tabular is not NUMA aware.”

In other words any enterprise hardware will make use of NUMA nodes and so putting a none NUMA app on this kind of hardware would be the wrong approach. What then follows in the article is a good explanation of what NUMA is and why it is important. The final two paragraphs really hit home for me why SSAS Tabular, without a significant work around, is not enterprise worthy:

“As you can see, programmers need to optimize the software to run efficiently on NUMA nodes. You can’t simply use any software on NUMA and expect good performance. The problems of memory access speeds are compounded when the software relies heavily on RAM.

Unfortunately, not only is Tabular not NUMA aware, it also uses RAM as primary storage. Thus, the marriage between Tabular and NUMA is a not an easy one.”

While this is a bit of an understatement, it is now easy to find out if your server has multiple NUMA nodes by simply opening up Task Manager to the CPU tab. If your server has multiple NUMA nodes then you will see CPU usage for each node. See example below of an 8 NUMA node server:

numa nodes


The solution to the Tabular NUMA node problem is (assuming that it is possible) to break the data up into separate distributions according to the number of NUMA nodes on the server and then create an equal number of VMs, each with an instance of SSAS Tabular. See below:

“· Use Hyper-V VMs to set node affinity. Tabular might not be NUMA aware, but it runs great if you install a tabular instance on VM that is affinitized to a particular node. This approach has an added benefit; you don’t need to adjust memory configuration settings.

· Distinct count performs better in a Tabular implementation. This is a well-known fact about Tabular, but if you are new to the technology, it’s worth remembering. This is especially true if the alternative you are evaluating is SSAS Multidimensional.”

Columnstore: Gooooooal!!!

June 22, 2014

In the spirit of the World Cup, I am pleased to announce that the current leader of the TPC-H Performance Results is… Microsoft SQL Server 2014 for the 1 TB, 3 TB, and 10 TB Data Warehouse.

How did SQL Server manage this virtual sweep over all of its competitors? Well you can download all the code and configurations that were used by each winning system. For example the 10 TB system’s file are located here TPC-H Result Highlights.

Of course there was a lot of hardware and software configurations that went into each of these systems but the thing that pushed SQL Server above the crowd was the Clustered Columnstore Index. See code below for the “LINEITEM” table which in the TPC-H standard is similar to a fact table. (Unfortunately, TPC has no true official star schema standard)

— Create Clustered ColumnStore Index on LINEITEM on all columns



create clustered columnstore index L_SHIPDATE_IDX on LINEITEM with(drop_existing = on,maxdop = 8)on DATESCHEME(L_SHIPDATE)



While is there a lot that can be learned from the documentation of these systems. (Joe Chang has an excellent post “TPC-H Benchmarks on SQL Server 2014 with Columnstore”)

There might be a few hidden surprises.

1.Hard Disk Drives are not dead

Contrary to popular belief, Solid State Drives will not solve your performance/scalability issues, especially for scan centric Data Warehouse Work Loads (don’t let any hardware vendor tell you different). There is a reason why the vast majority of the FastTrack Data Warehouse Appliances use HDDs. Therefore, it should come as no surprise that the three award winning systems in the TPC-H standard all use good old spinning disks.

Storage Subsystem Disk Drives(Qty 56:146GB 6G SAS 15K):

(Qty 472: 300GB 6G SAS 15K):



While that is a hell of a lot of disks, keep in mind that these systems are also graded on cost and energy consumption as well and even with those criteria HDDs still win for DW workloads.

2. Table Partitioning

“Horizontal partitioning is used on LINEITEM and ORDERS tables and the partitioning columns are L_SHIPDATE and O_ORDERDATE. The partition granularity is by week.”

Even with the substantial benefit of Clustered Columnstore segment elimination, careful table partitioning, can still have a significant effect on performance. Keep in mind that correct table partitioning requires in depth knowledge of query predicate values. With TPC-H this easy to do since there are only a few dozen queries, in the real world this will require a lot more work.

Total Row with Grouping Sets

May 26, 2014

In a previous post I mentioned that under certain situations (CPU intensive calculations like log() and exp()) Columnstore Indexes can out preform SSAS MOLAP.

However, creating results sets in T-SQL maybe a little more complicated than with MDX. In MDX for example dimension attributes can have a “All” member. This “All” member can be used to create results sets that aggregate fact data by a dimension attribute that then have a “Total” or ”All” row appended to the bottom of the attributes. This “All” represents the aggregate of all the members from the dimension(s).

In the simple example below I have aggregated the LINEITEM fact table from the TPC-H standard  by SUPPKEY and also Appended a “Total” or “All” member row.

SUPPKEY Quantity
630 16026
7697 15288
8776 16089
3751 15406
Total 62809

However, getting the Total row to appear on the bottom is not as strait forward as it is with MDX. While there are several ways to produce the Total grouping/row in T-SQL because I would also like the make the best or most efficient use of the Columnstore Index that is on the table as well.

The options:

1.   Common Table Expression

with CTE  (L_SUPPKEY,Quantity)as


       select [L_SUPPKEY]

          ,sum([L_QUANTITY])as Quantity

       from [dbo].[LINEITEMs1]

       where [L_SUPPKEY] in(3751,8776,630,7697)

       groupby [L_SUPPKEY]


 select cast(L_SUPPKEY aschar)as SUPPKEY, Quantity

 from CTE

 union all


 from CTE

While this might seem like a good option if you check out the execution plan you see that query actually scans the fact table 2 TIMES which will at least double the CPU time.

table scan

2.Table Variables

declare @tbl astable (SUPPKEY char(20), Quantity int)


insertinto @tbl

select [L_SUPPKEY],sum([L_QUANTITY])as Quantity

from [dbo].[LINEITEMs1]

where [L_SUPPKEY] in(3751,8776,630,7697)

groupby [L_SUPPKEY]


select SUPPKEY, Quantity

from @tbl



from @tbl

Even though the result set is small this is a terrible option  because inserting into a table variable forces the query into “row mode” which is much slower than “batch mode

row mode

3. Temporary Tables

select cast([L_SUPPKEY] aschar)as SUPPKEY,sum([L_QUANTITY])as Quantity

into #temp

from [dbo].[LINEITEMs1]

where [L_SUPPKEY] in (3751,8776,630,7697)

groupby [L_SUPPKEY]


select SUPPKEY,Quantity

from #temp

union all

select ‘Total’,sum(Quantity)

from #temp

While this is pretty good option it seems a little excessive to write the small result set all the way back down to disk (TempDB) just pull it back again for the Total row calculation. That is a unnecessary use of IO. Why go all the way back to disk when it is not required.

4. Grouping Sets 

select isnull(cast([L_SUPPKEY] aschar),‘Total’)as SUPPKEY

       ,sum([L_QUANTITY])as Quantity

from [dbo].[LINEITEMs1]

where [L_SUPPKEY] in(3751,8776,630,7697)

group by grouping sets(([L_SUPPKEY]),())

This is the best option because it has none of the limitations from the the above options.

SQL Saturday Houston

April 29, 2014

If you live the Houston area I will speaking at  SQL Saturday Houston #308 on May 10th.


I will be giving the following 2 presentations:

Data Warehouse ETL

“OLTP versus DW ETL Strategies (Availability Groups/Mirroring, Change Data Capture, SSIS, FlatFiles). SSIS Synchronous & Asynchronous transforms. Demo SSIS performance. ETL versus ELT. Demo FileTable.”

Measuring Data Warehouse Performance

“DW Hardware Architecture Comparisons (SAN, DAS). Throughput (MB/sec) versus IOPS. Demo SQLIO – block size, sequential & random reads/writes, IO Requests. Query IO/Time Statistics. Demo Columstore Index Segment Elimination. Demo Extended Events.”

Of course many of theses topics will be familiar to you if you have read much of my BLOG or checked out my Youtube Channel. While I do like presenting and meeting new people, I recorded the videos for those who can’t make it to my presentations.

If you can make it to SQL Saturday don’t hesitate to say hi  and bring your hardest DW questions! Winking smile

Columnstore vs. SSAS

March 23, 2014


As part of a project that I have been engaged on recently, I have had the opportunity to test the performance difference between SQL Server 2014 Clustered Columnstore Indexes and SQL Server Analysis Services Multi-dimensional (MOLAP). To test the performance differences between the two technologies, I converted the three most CPU intensive MDX queries to T-SQL queries (Clustered Columnstore). Both sets of queries were then executed from cache, with a variety of predicate values, on identical hardware. Aggregate performance in seconds shown below:


It must be emphasized that the performance of Columnstore Indexes is dependent on many variables, including but not limited to calculation logic, number of distinct values, distribution of values, predicated values, and hardware IO throughput. In this case the queries involve the following calculations: exp(), sum(), and then log().

After testing the initial set of three queries, I then tested the performance of over 20 queries divided into 3 groups. These series of tests included several levels of aggregation with different predicate values as well.



Even though Columnstore Indexes do not create pre-aggregates, like in SSAS multi-dimensional (MOLAP), under the right circumstances they can match or even out preform SSAS multi-dimensional. The performance increase is largely due to the highly efficient nature of the Columnstore Indexes being processed in “Batch” mode.

Financial Data Warehouse Video

February 21, 2014

Over the last few years that I have been blogging my  Dimensional Modeling Financial Data in SSAS and Dimensional Modeling Financial Data in SSAS pt2 posts have  by far been my most popular blog posts.

Since both of these posts are about how to setup a Financial (General Ledger/Chart of Accounts) Data Warehouse, I decided to create a Financial Data Warehouse Video which not only provides a fuller explanation of how to build a Financial Data Warehouse using the MSSQL toolset but also demonstrates why such a data warehouse is valuable.

Financial Data Warehouse

Out of all the different types of Data Warehouses that I have built in my career, Financial (GL/Chart of Accounts) Data Warehouses  provide the most business value, are the quickest to implement, and have the lowest risk.


Get every new post delivered to your Inbox.

Join 63 other followers