Archive for June, 2014

Columnstore: Gooooooal!!!

June 22, 2014

In the spirit of the World Cup, I am pleased to announce that the current leader of the TPC-H Performance Results is… Microsoft SQL Server 2014 for the 1 TB, 3 TB, and 10 TB Data Warehouse.

How did SQL Server manage this virtual sweep over all of its competitors? Well you can download all the code and configurations that were used by each winning system. For example the 10 TB system’s file are located here TPC-H Result Highlights.

Of course there was a lot of hardware and software configurations that went into each of these systems but the thing that pushed SQL Server above the crowd was the Clustered Columnstore Index. See code below for the “LINEITEM” table which in the TPC-H standard is similar to a fact table. (Unfortunately, TPC has no true official star schema standard)

— Create Clustered ColumnStore Index on LINEITEM on all columns

print ‘Start COLUMNSTORE L_SHIPDATE_IDX’

SELECT SYSDATETIME() AS‘SYSDATETIME’

create clustered columnstore index L_SHIPDATE_IDX on LINEITEM with(drop_existing = on,maxdop = 8)on DATESCHEME(L_SHIPDATE)

print ‘Finish COLUMNSTORE L_SHIPDATE_IDX’

SELECT SYSDATETIME() AS ‘SYSDATETIME’

While is there a lot that can be learned from the documentation of these systems. (Joe Chang has an excellent post “TPC-H Benchmarks on SQL Server 2014 with Columnstore”)

There might be a few hidden surprises.

1.Hard Disk Drives are not dead

Contrary to popular belief, Solid State Drives will not solve your performance/scalability issues, especially for scan centric Data Warehouse Work Loads (don’t let any hardware vendor tell you different). There is a reason why the vast majority of the FastTrack Data Warehouse Appliances use HDDs. Therefore, it should come as no surprise that the three award winning systems in the TPC-H standard all use good old spinning disks.

Storage Subsystem Disk Drives(Qty 56:146GB 6G SAS 15K):

(Qty 472: 300GB 6G SAS 15K):

528

 

While that is a hell of a lot of disks, keep in mind that these systems are also graded on cost and energy consumption as well and even with those criteria HDDs still win for DW workloads.

2. Table Partitioning

“Horizontal partitioning is used on LINEITEM and ORDERS tables and the partitioning columns are L_SHIPDATE and O_ORDERDATE. The partition granularity is by week.”

Even with the substantial benefit of Clustered Columnstore segment elimination, careful table partitioning, can still have a significant effect on performance. Keep in mind that correct table partitioning requires in depth knowledge of query predicate values. With TPC-H this easy to do since there are only a few dozen queries, in the real world this will require a lot more work.