Tabular and NUMA

July 30, 2014

Recently a great white paper Entitled “An Analysis Services Case Study: Using Tabular Models in a Large-scale Commercial Solution” was released which provides a lot of information that would be helpful when on is faced with evaluating the various Microsoft BI tools (SSAS Tabular, SSAS Multi-dimensional, and ColumnStore Indexes).

I will not reproduce the whole white paper on my BLOG but below are some quotes from the white paper that I found particularly interesting:

ColumnStore Indexes:

“Unpredictable Query speed. In SQL 2012, specific query plans can really take advantage of columnstores (which are an absolute need for the volume of data of inContact). However, as soon as the query became complex (for example, mixing SUM with DISTINCT COUNT in the same query) SQL Server stopped using the columnstore and reverted to row mode, leading to poor performance. It looked nearly impossible to produce a general-purpose query engine that gave enough freedom to users in expressing their query while retaining efficiency.”

While I too had problems with SQL Server 2012 Non-Clustered ColumnStore Indexes and the query optimizer picking the much slower “row mode” instead of “batch mode”, after using SQL Server 2014 Clustered ColumnStore Indexes for almost a year, I have not seen any major issues related to “batch mode”. As a matter of fact, I have even observed about a 15% – 30% performance improvement of Clustered ColumnStore Indexes over the non-clustered variety.

The section that by far was the most surprising to was the section entitled “The complexity and ubiquity of NUMA ” as it reminded me of something that I ran across a few years ago when I was testing the then brand new SSAS Tabular with NUMA servers.

AS the article clearly states:

“Most of the servers available on the market with enough expansion to handle terabytes of RAM are based on the Non Uniform Memory Access (NUMA) architecture. This is a big issue because Tabular is not NUMA aware.”

In other words any enterprise hardware will make use of NUMA nodes and so putting a none NUMA app on this kind of hardware would be the wrong approach. What then follows in the article is a good explanation of what NUMA is and why it is important. The final two paragraphs really hit home for me why SSAS Tabular, without a significant work around, is not enterprise worthy:

“As you can see, programmers need to optimize the software to run efficiently on NUMA nodes. You can’t simply use any software on NUMA and expect good performance. The problems of memory access speeds are compounded when the software relies heavily on RAM.

Unfortunately, not only is Tabular not NUMA aware, it also uses RAM as primary storage. Thus, the marriage between Tabular and NUMA is a not an easy one.”

While this is a bit of an understatement, it is now easy to find out if your server has multiple NUMA nodes by simply opening up Task Manager to the CPU tab. If your server has multiple NUMA nodes then you will see CPU usage for each node. See example below of an 8 NUMA node server:

numa nodes


The solution to the Tabular NUMA node problem is (assuming that it is possible) to break the data up into separate distributions according to the number of NUMA nodes on the server and then create an equal number of VMs, each with an instance of SSAS Tabular. See below:

“· Use Hyper-V VMs to set node affinity. Tabular might not be NUMA aware, but it runs great if you install a tabular instance on VM that is affinitized to a particular node. This approach has an added benefit; you don’t need to adjust memory configuration settings.

· Distinct count performs better in a Tabular implementation. This is a well-known fact about Tabular, but if you are new to the technology, it’s worth remembering. This is especially true if the alternative you are evaluating is SSAS Multidimensional.”


Columnstore: Gooooooal!!!

June 22, 2014

In the spirit of the World Cup, I am pleased to announce that the current leader of the TPC-H Performance Results is… Microsoft SQL Server 2014 for the 1 TB, 3 TB, and 10 TB Data Warehouse.

How did SQL Server manage this virtual sweep over all of its competitors? Well you can download all the code and configurations that were used by each winning system. For example the 10 TB system’s file are located here TPC-H Result Highlights.

Of course there was a lot of hardware and software configurations that went into each of these systems but the thing that pushed SQL Server above the crowd was the Clustered Columnstore Index. See code below for the “LINEITEM” table which in the TPC-H standard is similar to a fact table. (Unfortunately, TPC has no true official star schema standard)

— Create Clustered ColumnStore Index on LINEITEM on all columns



create clustered columnstore index L_SHIPDATE_IDX on LINEITEM with(drop_existing = on,maxdop = 8)on DATESCHEME(L_SHIPDATE)



While is there a lot that can be learned from the documentation of these systems. (Joe Chang has an excellent post “TPC-H Benchmarks on SQL Server 2014 with Columnstore”)

There might be a few hidden surprises.

1.Hard Disk Drives are not dead

Contrary to popular belief, Solid State Drives will not solve your performance/scalability issues, especially for scan centric Data Warehouse Work Loads (don’t let any hardware vendor tell you different). There is a reason why the vast majority of the FastTrack Data Warehouse Appliances use HDDs. Therefore, it should come as no surprise that the three award winning systems in the TPC-H standard all use good old spinning disks.

Storage Subsystem Disk Drives(Qty 56:146GB 6G SAS 15K):

(Qty 472: 300GB 6G SAS 15K):



While that is a hell of a lot of disks, keep in mind that these systems are also graded on cost and energy consumption as well and even with those criteria HDDs still win for DW workloads.

2. Table Partitioning

“Horizontal partitioning is used on LINEITEM and ORDERS tables and the partitioning columns are L_SHIPDATE and O_ORDERDATE. The partition granularity is by week.”

Even with the substantial benefit of Clustered Columnstore segment elimination, careful table partitioning, can still have a significant effect on performance. Keep in mind that correct table partitioning requires in depth knowledge of query predicate values. With TPC-H this easy to do since there are only a few dozen queries, in the real world this will require a lot more work.

Total Row with Grouping Sets

May 26, 2014

In a previous post I mentioned that under certain situations (CPU intensive calculations like log() and exp()) Columnstore Indexes can out preform SSAS MOLAP.

However, creating results sets in T-SQL maybe a little more complicated than with MDX. In MDX for example dimension attributes can have a “All” member. This “All” member can be used to create results sets that aggregate fact data by a dimension attribute that then have a “Total” or ”All” row appended to the bottom of the attributes. This “All” represents the aggregate of all the members from the dimension(s).

In the simple example below I have aggregated the LINEITEM fact table from the TPC-H standard  by SUPPKEY and also Appended a “Total” or “All” member row.

SUPPKEY Quantity
630 16026
7697 15288
8776 16089
3751 15406
Total 62809

However, getting the Total row to appear on the bottom is not as strait forward as it is with MDX. While there are several ways to produce the Total grouping/row in T-SQL because I would also like the make the best or most efficient use of the Columnstore Index that is on the table as well.

The options:

1.   Common Table Expression

with CTE  (L_SUPPKEY,Quantity)as


       select [L_SUPPKEY]

          ,sum([L_QUANTITY])as Quantity

       from [dbo].[LINEITEMs1]

       where [L_SUPPKEY] in(3751,8776,630,7697)

       groupby [L_SUPPKEY]


 select cast(L_SUPPKEY aschar)as SUPPKEY, Quantity

 from CTE

 union all


 from CTE

While this might seem like a good option if you check out the execution plan you see that query actually scans the fact table 2 TIMES which will at least double the CPU time.

table scan

2.Table Variables

declare @tbl astable (SUPPKEY char(20), Quantity int)


insertinto @tbl

select [L_SUPPKEY],sum([L_QUANTITY])as Quantity

from [dbo].[LINEITEMs1]

where [L_SUPPKEY] in(3751,8776,630,7697)

groupby [L_SUPPKEY]


select SUPPKEY, Quantity

from @tbl



from @tbl

Even though the result set is small this is a terrible option  because inserting into a table variable forces the query into “row mode” which is much slower than “batch mode

row mode

3. Temporary Tables

select cast([L_SUPPKEY] aschar)as SUPPKEY,sum([L_QUANTITY])as Quantity

into #temp

from [dbo].[LINEITEMs1]

where [L_SUPPKEY] in (3751,8776,630,7697)

groupby [L_SUPPKEY]


select SUPPKEY,Quantity

from #temp

union all

select ‘Total’,sum(Quantity)

from #temp

While this is pretty good option it seems a little excessive to write the small result set all the way back down to disk (TempDB) just pull it back again for the Total row calculation. That is a unnecessary use of IO. Why go all the way back to disk when it is not required.

4. Grouping Sets 

select isnull(cast([L_SUPPKEY] aschar),‘Total’)as SUPPKEY

       ,sum([L_QUANTITY])as Quantity

from [dbo].[LINEITEMs1]

where [L_SUPPKEY] in(3751,8776,630,7697)

group by grouping sets(([L_SUPPKEY]),())

This is the best option because it has none of the limitations from the the above options.

SQL Saturday Houston

April 29, 2014

If you live the Houston area I will speaking at  SQL Saturday Houston #308 on May 10th.


I will be giving the following 2 presentations:

Data Warehouse ETL

“OLTP versus DW ETL Strategies (Availability Groups/Mirroring, Change Data Capture, SSIS, FlatFiles). SSIS Synchronous & Asynchronous transforms. Demo SSIS performance. ETL versus ELT. Demo FileTable.”

Measuring Data Warehouse Performance

“DW Hardware Architecture Comparisons (SAN, DAS). Throughput (MB/sec) versus IOPS. Demo SQLIO – block size, sequential & random reads/writes, IO Requests. Query IO/Time Statistics. Demo Columstore Index Segment Elimination. Demo Extended Events.”

Of course many of theses topics will be familiar to you if you have read much of my BLOG or checked out my Youtube Channel. While I do like presenting and meeting new people, I recorded the videos for those who can’t make it to my presentations.

If you can make it to SQL Saturday don’t hesitate to say hi  and bring your hardest DW questions! Winking smile

Columnstore vs. SSAS

March 23, 2014


As part of a project that I have been engaged on recently, I have had the opportunity to test the performance difference between SQL Server 2014 Clustered Columnstore Indexes and SQL Server Analysis Services Multi-dimensional (MOLAP). To test the performance differences between the two technologies, I converted the three most CPU intensive MDX queries to T-SQL queries (Clustered Columnstore). Both sets of queries were then executed from cache, with a variety of predicate values, on identical hardware. Aggregate performance in seconds shown below:


It must be emphasized that the performance of Columnstore Indexes is dependent on many variables, including but not limited to calculation logic, number of distinct values, distribution of values, predicated values, and hardware IO throughput. In this case the queries involve the following calculations: exp(), sum(), and then log().

After testing the initial set of three queries, I then tested the performance of over 20 queries divided into 3 groups. These series of tests included several levels of aggregation with different predicate values as well.



Even though Columnstore Indexes do not create pre-aggregates, like in SSAS multi-dimensional (MOLAP), under the right circumstances they can match or even out preform SSAS multi-dimensional. The performance increase is largely due to the highly efficient nature of the Columnstore Indexes being processed in “Batch” mode.

Financial Data Warehouse Video

February 21, 2014

Over the last few years that I have been blogging my  Dimensional Modeling Financial Data in SSAS and Dimensional Modeling Financial Data in SSAS pt2 posts have  by far been my most popular blog posts.

Since both of these posts are about how to setup a Financial (General Ledger/Chart of Accounts) Data Warehouse, I decided to create a Financial Data Warehouse Video which not only provides a fuller explanation of how to build a Financial Data Warehouse using the MSSQL toolset but also demonstrates why such a data warehouse is valuable.

Financial Data Warehouse

Out of all the different types of Data Warehouses that I have built in my career, Financial (GL/Chart of Accounts) Data Warehouses  provide the most business value, are the quickest to implement, and have the lowest risk.

FileTable and BULK Insert

January 21, 2014

In my video entitled Data Warehouse 12 ELT and FlatFiles  starting at the about the 10 minute, I demonstrate how to manage flat files with SQL Server FileTables.

While FileTables are great for managing flat files, I  failed to include and example of how to BULK insert data from a FileTable into a regular SQL Server table.

This is a very important step for data warehousing and unfortunately, as of the writing of this post, it is not documented at all! and in fact if you to simply preform a BULK insert of the data from the FileTable into a normal SQL table you will get the following cryptic error:

Msg 4861, Level 16, State 1, Line 1
Cannot bulk load because the file “\\Garrett-hp\sql2012\FileTableDB\ETLFileTable_Dir\TestTable.txt” could not be opened. Operating system error code 50(The request is not supported.).

Not very helpful  and there is very little about it on the WWW Sad smile , hence the origin of this post.

There are several work-a-rounds to this issue none of which are perfect however my preferred method is to simply map the FileTable directory to  a drive on the OS and then magically BULK insert works!

see screen shot below of a working example:


select * from [dbo].[ETLFileTable];
truncate table  FileTableDB.[dbo].[TestLoadTable];
BULK INSERT FileTableDB.[dbo].[TestLoadTable]
 FROM 'Z:test.txt' /*mapped drive*/
select * from FileTableDB.[dbo].[TestLoadTable];

Additionally, I would caution against using SSIS just to do the insert, especially if SSIS is installed on another server!?

The FileTable data already in the  SQL Server filegroup on the DW server would have to leave the server, traverse the network, be processed by SSIS, then return over the network again, just to be inserted into a table on the DW!!!

Avoid all that and just do a BULK insert. For those who are interested SSIS in FastMode simply does a BULK insert:

Jamie Thompson: SSIS: Destination Adapter Comparison

“Where OLE DB Destination uses an access mode of OPENROWSET, OLE DB Destination with FastLoad uses OPENROWSET with FastLoad.

FastLoad means that the adapter uses BULK INSERT for the insertion.”

Master Data Services

December 6, 2013


Master Data Services (MDS) is the SQL Server solution for master data management. Master data management (MDM) describes the efforts made by an organization to discover and define non-transactional lists of data, with the goal of compiling maintainable master lists. An MDM project generally includes an evaluation and restructuring of internal business processes along with the implementation of MDM technology. The result of a successful MDM solution is reliable, centralized data that can be analyzed, resulting in better business decisions.

ELT (ETL) Videos

September 27, 2013

There are not any BLOG posts so far this month because I have been concentrating on producing detailed YouTube videos that discuss some of the most important concepts in Data Warehousing.

the latest series of videos deal with ELT (ETL)






Performance Tuning SQL Server: Indexing

August 15, 2013

Many times I am asked about how to create indexes on a data warehouse SQL Server Database so that queries execute much faster. Unfortunately, unless they are able to upgrade to SQL Server 2012 in order to use ColumnStore Indexes, there is usually very little that I can do to speed up the queries. This is usually do to the fact that the Data Warehouse Server was not architected for data warehouse workloads to begin with and therefore is physically constrained on its throughput (MB/sec).

So I have created this little demonstration to illustrate the point:

I created a 10 million row fact table based on the good old AdventureWorks DB and setup a very simple scan centric SQL Server data warehouse query:






   whereSalesTerritoryKey= 1

        orSalesTerritoryKey= 4

        orSalesTerritoryKey= 6

        orSalesTerritoryKey= 7



With logical reads 208337 that means that 1,627 MB of data was returned from disk to satisfy the query. And magically, if you check the size of the table in SQL Server Management Studio it is also reads 1,627 MB. This is due to the fact that there were no indexes to read from and so the SQL Server had to scan the entire table:


Now indexes can be created to reduce the amount of data that needs to be scanned by only scanning the index data.









Note: The best practice is to actually create a separate index for each fact table column, but just to keep things simple I only created an index for the columns that will be queried.

Now if you run the same query again there the message box reads logical reads 36812 or 287 MB. Creating indexes drops by over 500% in the amount of data required for this query. This reduction in data being read is primarily due to the fact that the data is being read from the indexes, which are much smaller, and therefore less time consuming than read all the data directly from the table. See Index Scans:


Index Sizes:


[ix_NoCompression_10m_SalesAmount] – 27,176 * 8 / 1024 = 212 MB/sec

[ix_NoCompression_10m_SalesTerritoryKey] – 22,276 * 8 / 1024 = 174 MB/sec


However, the limitation to this approach is that the data being returned will never be smaller than indexes that are created on the fact table. Real world fact tables have billions of rows and so those indexes will not all fit into memory therefore the data must be retrieved directly from disk which will effect query performance negatively, especially if the data warehouse server has not been specifically tuned to deliver good throughput (MB/sec).


In fact even if query patterns (something which you cannot control) change, the amount of data required by the queries will remain about the same. In the following example I commented out 2 predicate values (SalesTerritory). One would think that now the data required would drop by about ½ as well.






  whereSalesTerritoryKey= 1

        orSalesTerritoryKey= 4

        –or SalesTerritoryKey = 6

        –or SalesTerritoryKey = 7


However the logical reads only dropped from 36,812 (287 MB) to 33612 (262 MB) or only about 25 MB less data required by removing ½ of the predicate values. Even if the number of predicates values were too drop to 1, the logical reads are still 30128 or 235 MB. This is due to the fact that no matter how many predicate values are used the index for that predicate still has to be scanned. Remember Data Warehouse queries require a lot of historical data and so are scan centric. While scanning the index might not seem all that bad for this small 10 million row fact table imagine what would happen if you increased the number rows of rows by a factor of 100 or even 1,000 that would be potentially 10s of billions of rows and 100s of GB for the indexes!


Some of you might be tempted to say that an index could be created for each predicate value (SalesTerritory) while that is true it would be very quickly become unmanageable and in most cases it is nearly impossible to predict all the ways that the users will query the data for historical analysis.


The real answer is to correctly architect the data warehouse server ahead of time in order to deliver the query results in a predictable and acceptable period of time. However, with that said there are a few things that you can do to increase the ability data warehouse server to deliver higher MB/second than it normal could under normal conditions.


Page Compression works by cramming more data on to each database page which means that fewer pages need to be read from disk.








Note the indexes new sizes:


ix_NoCompression_10m_SalesAmount 14279 = 11 MB

ix_NoCompression_10m_SalesTerritoryKey 14350 = 11 MB


Due to using Page Compression the indexes are significantly smaller and so it is no surprise that if the same query is run again even less data needs to be scanned from the indexes:


Logical Reads = 20475 or 159 MB


Page compression reduces the data required from 287 MB to 159 MB or a saving of about 55%. In the real world I see about a 33% savings.


And just in case you were wondering, if you upgrade to SQL Server 2012 Columnstore Indexes the data required drops from 287 MB to an incredible 1.06 MB !!!




Complex indexing strategies for Data Warehouse workloads provide no additional benefit. The Data Warehouse Server needs to be carefully architected upfront to deliver the required MB/second throughput so that the users are happy with the performance of their queries.