Posts Tagged ‘Fast Track’

Great BLOG posts part 2

October 21, 2012

some great resources have been made available with the release of SQL Server 2012 which provide a deep understanding to some the most important SQL Server data warehousing features.

Inside SQL Server 2012’s ColumnStore Index :

This talk will describe how the new ColumnStore index technology in SQL Server 2012 makes queries go faster. It will cover details of the storage and execution model down to how this model interacts with modern CPUs to deliver significant performance benefits on Data Warehouse queries.

This is an excellent presentation that explains why my results for measuring Columnstore Index Throughput in a previous post was so impressive.

DW Design With The Product Team

This talk will explain the patterns that we recommend to customers when designing data warehouses. This talk will help people learning about DW for the first time and also give insight for those who wish to learn more.

This a great introduction to Data Warehousing done the right way from a couple of the most knowledgeable people in the industry.

Fast Track: improving performance through correct LUN Mapping and Storage Enclosure configuration

Reference architectures and appliances give us greatly balanced configurations which help to speed up Data Warehouse deployments and vendors give very clear guidance on how to set it up for better performance. However, the old know wisdom “Trust but verify” is still true and may help your setup look way better.

I particularly like the following graphics which illustrate a problem that I have run across several times in the past:

Bad Mapping:

Good Mapping:

PDW How-To: Avoid ShuffleMove and PartitionMove Operations 

Reducing shufflemoves in PDW is one the most important ways  to get the best prossible perfromance out of PDW. When used correctly, CTAS  can help unleash the full potential of an MPP architecture.

Advertisements

Performance Tuning: Data Warehouse Storage Throughput (MB/sec)

August 1, 2012

SQL Server Data Warehouse Workload Optimization

There are significant differences in how to achieve optimal resource balance for a SQL Server Data Warehouse system versus accomplishing the same optimization for lookup based online transactional system (OLTP). OTLP systems are heavily indexed in order to support high concurrency (many users) with low latency retrieval of a small number of rows from datasets that often contain very little historical data. This workload produces significant disk head movement which can be measured in random I/O reads and writes (IOPS). Data warehouse analytical queries, however, have relatively low concurrency (fewer users) and involve much larger historical datasets that benefit greatly from sequential reads and writes (disk scans). As a result, Data Warehouse systems are best measured by analyzing total throughput (sequential reads and writes) in MB/s rather than the more traditional focus on operations per second (IOPS) for OLTP Systems.

SQL Server Data Warehouse Balanced Architecture

Due to the fact the data warehouse scan centric (sequential reads/writes) workload is best measured in MB/sec, data warehouse hardware components need be architected in such way as to allow the maximum storage system data throughput (MB/sec) to be able to be consumed by the processors. Having a balanced Data Warehouse Architecture means that the storage system has been optimized for scan centric (sequential reads/writes measured in MB/sec) operations and that the processors can receive and process the data delivered by the storage at the same MB/sec rate.

Storage

In the SQL Server Data Warehouse the throughput of the storage is the most critical, complex, and costly portion of the system. Because Data Warehouse Analytical queries are too large to fit in memory, the storage must be able to quickly locate and deliver the data from disk where it can then be aggregated by the processors. In most data warehouse systems it is the storage that is the bottle neck and so careful monitoring and measuring of its performance is essential.

The best tool for measuring storage performance is a free Microsoft tool call SQLIO. It can test the performance of storage under a great variety scenarios and produces very detailed results which can then be charted out overtime. SQLIO should be used once the storage hardware is installed to validate that it has been configured properly and during regularly scheduled maintenances cycles to ensure that the storage is preforming properly.

Laptop Throughput (MB/sec)

As an example I used SQLIO to generate the performance statistics of my personal laptop. I also used David K Lee’s excellent  Storage Analysis Website to produce the following chart which summarizes the SQLIO results very nicely.

image

As you can tell from the graph my puny laptop is actually pretty well configured for Data Warehouse workloads.  The sequential reads/writers preform much better than random reads/writes.

In fact, I would guess that my laptop might preform better than SANs that cost 100x more simply because my laptop has dedicated Direct Attach Storage (DAS) that does have to fight with other resources in a huge (OLTP/IOPs centric) SAN that every application shares.

Conclusion:

If your SQL Server Data Warehouse is preforming poorly download SQLIO and see if a laptop can get better performance. If so then you might have a problem with your storage Winking smile

Additional Resources:

Fast Track Data Warehouse Reference Guide for SQL Server 2012 – The Bible for all things DW. It is a must read !!!

Dell Reference Configuration –  This Dell ”starter” Fast Track gets 4,700+ MBs/sec!!!

image

Also see: SQL Server OLTP vs. Data Warehouse Performance Tuning