Archive for October, 2012

Great BLOG posts part 2

October 21, 2012

Some great resources have been made available with the release of SQL Server 2012 that provide a deep understanding of some of the most important SQL Server data warehousing features.

Inside SQL Server 2012’s ColumnStore Index:

This talk will describe how the new ColumnStore index technology in SQL Server 2012 makes queries go faster. It will cover details of the storage and execution model down to how this model interacts with modern CPUs to deliver significant performance benefits on Data Warehouse queries.

This is an excellent presentation that explains why my results for measuring ColumnStore Index throughput in a previous post were so impressive.

DW Design With The Product Team

This talk will explain the patterns that we recommend to customers when designing data warehouses. This talk will help people learning about DW for the first time and also give insight for those who wish to learn more.

This is a great introduction to Data Warehousing done the right way from a couple of the most knowledgeable people in the industry.

Fast Track: improving performance through correct LUN Mapping and Storage Enclosure configuration

Reference architectures and appliances give us well-balanced configurations that help speed up Data Warehouse deployments, and vendors give very clear guidance on how to set them up for better performance. However, the old adage “Trust but verify” still applies and may make your setup perform considerably better.

I particularly like the following graphics which illustrate a problem that I have run across several times in the past:

Bad Mapping:

Good Mapping:

PDW How-To: Avoid ShuffleMove and PartitionMove Operations 

Reducing ShuffleMoves in PDW is one of the most important ways to get the best possible performance out of PDW. When used correctly, CTAS (CREATE TABLE AS SELECT) can help unleash the full potential of an MPP architecture, as sketched below.
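
For example, here is a minimal sketch (the table and column names are hypothetical) of using CTAS in PDW to re-distribute a fact table on its join key, so that subsequent joins on that key are distribution-compatible and do not require a ShuffleMove:

-- Re-distribute an existing fact table on the join key using CTAS
CREATE TABLE [dbo].[FactSales_ByProduct]
WITH (DISTRIBUTION = HASH([ProductKey]))   -- hash-distribute on the join key
AS
SELECT *
FROM [dbo].[FactSales];                    -- assumed existing fact table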


SQL Server Database Performance Tuning: Page Compression

October 14, 2012

Page Compression has been around since SQL Server 2008, and until ColumnStore Indexes were introduced in SQL Server 2012, it was one of the most important features that you could use to improve the performance and scalability of your SQL Server Data Warehouse database.

It still amazes me that in 2012 there are SQL Server data warehouse databases that do not use Page Compression. If you are not lucky enough to be able to upgrade your SQL Server data warehouse to SQL Server 2012 and take advantage of ColumnStore Indexes, then Page Compression is the next best thing.

The only situation where I might not recommend using Page Compression is where the SQL Server data warehouse is not I/O bound; in other words, where the SQL Server Data Warehouse server has more than enough throughput (MB/sec) to handle the queries. (I find this to be a very rare situation. If you don’t know what your SQL Server Data Warehouse server throughput is, then chances are good that I/O is a bottleneck.)
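
If you want a rough sense of how much data SQL Server is actually reading from disk, one simple (and admittedly coarse) check is the sys.dm_io_virtual_file_stats DMV. The query below is a minimal sketch that reports cumulative read volume per database file since the instance started; sampling it twice and dividing the delta by the elapsed seconds gives an approximate MB/sec figure:

-- Cumulative read volume per database file since SQL Server last started
SELECT  DB_NAME(vfs.database_id)          AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.num_of_bytes_read / 1048576.0 AS mb_read,
        vfs.io_stall_read_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id
ORDER BY vfs.num_of_bytes_read DESC;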

Even if you are not currently experiencing an I/O bottleneck, I would bet that in the very near future you will! A SQL Server data warehouse database never seems to be static or shrinking; it is always growing! Keeping up with that data growth while still maintaining good performance is one of the hardest challenges of data warehousing.

SQL Server Page Compression is a quick and easy way to tune SQL Server data warehouse database performance because it has the effect of reducing server throughput requirements (MB/sec). This is done by literally moving less data (because it is compressed) from storage to the processors. But SQL Server Page Compression is not magic, so there is a cost associated with it. Page Compression does create a little more CPU overhead because the data going to and from storage has to be compressed and decompressed, respectively. In my experience it increases CPU utilization by about 1%-3%, but in most cases that is a lot better than having to go out and buy more storage components (e.g. RAID arrays, HBAs, a new SAN, Fibre Channel switches).
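
If you want to gauge that trade-off on your own tables before committing, SQL Server 2008 and later ship with the sp_estimate_data_compression_savings procedure. The call below is a minimal sketch, assuming a fact table named dbo.FactInternetSales:

-- Estimate how much space PAGE compression would save on a fact table
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'FactInternetSales',
    @index_id         = NULL,   -- all indexes
    @partition_number = NULL,   -- all partitions
    @data_compression = N'PAGE';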

Additionally, the performance boost that you get will depend on how closely you are able to follow SQL Server data warehouse database best practices. For example (see the table definition sketch after this list):

  1. Star Schema (Fact and Dimension Tables)
  2. Integer Surrogate Keys for Fact Tables
  3. Use short data types in Fact Tables – No strings (char/varchar)
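
Here is a minimal sketch of a fact table that follows these practices; the table and column names are hypothetical, not part of the AdventureWorks sample used in the test below:

-- Star-schema fact table: integer surrogate keys, short data types, no strings
CREATE TABLE [dbo].[FactSales]
(
    [DateKey]           int      NOT NULL,  -- integer surrogate key to DimDate
    [ProductKey]        int      NOT NULL,  -- integer surrogate key to DimProduct
    [SalesTerritoryKey] smallint NOT NULL,  -- short integer key to DimSalesTerritory
    [OrderQuantity]     smallint NOT NULL,  -- short numeric measure
    [SalesAmount]       money    NOT NULL   -- numeric measure
)
WITH (DATA_COMPRESSION = PAGE);             -- page-compress the fact table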

So let’s prove it!

Create Test Tables

-- Build two identical 10-million-row test tables from the AdventureWorks
-- FactInternetSales table (the self cross join just inflates the row count)
select top 10000000 a.*
into PageCompression_10m
from [dbo].[FactInternetSales] a, [dbo].[FactInternetSales] b

select top 10000000 a.*
into NoCompression_10m
from [dbo].[FactInternetSales] a, [dbo].[FactInternetSales] b

-- Page-compress one of the two tables
ALTER TABLE [dbo].[PageCompression_10m] REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE)
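
As an optional sanity check, you can compare the on-disk size of the two tables with sp_spaceused before running the test queries:

-- Compare the on-disk size of the compressed and uncompressed tables
EXEC sp_spaceused N'dbo.NoCompression_10m';
EXEC sp_spaceused N'dbo.PageCompression_10m';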

Query Test Tables

-- Clear the buffer pool so the scan has to read from storage (do not run on production)
DBCC dropcleanbuffers;
GO
SET STATISTICS IO ON;
GO
SET STATISTICS TIME ON;
GO
SELECT [SalesTerritoryKey], sum([SalesAmount])
FROM [dbo].[NoCompression_10m]
group by [SalesTerritoryKey]
Option (MAXDOP 8);
GO
SET STATISTICS IO OFF;
GO
SET STATISTICS TIME OFF;
GO

/**
Table ‘NoCompression_10m’. Scan count 9, logical reads 204082, physical reads 0, read-ahead reads 204082, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0

SQL Server Execution Times:
CPU time = 8016 ms,  elapsed time = 14522 ms.
**/

DBCC dropcleanbuffers;
GO
SET STATISTICS IO ON;
GO
SET STATISTICS TIME ON;
GO

SELECT [SalesTerritoryKey], sum([SalesAmount])
FROM [dbo].[PageCompression_10m]
group by [SalesTerritoryKey]
Option (MAXDOP 8);
GO
SET STATISTICS IO OFF;
GO
SET STATISTICS TIME OFF;
GO

/**
Table ‘PageCompression_10m’. Scan count 9, logical reads 21412, physical reads 0, read-ahead reads 21412, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 5664 ms,  elapsed time = 1569 ms.
**/

Non-compressed Table:

204,082 logical reads * 8 KB / 1024 = 1,594 MB

Page Compressed Table:

21,412 logical reads * 8 KB / 1024 = 167 MB

Conclusion

Page Compression increases SQL Server data warehouse performance by reducing the amount of data traveling between storage and memory. It is a nearly free way to reduce SQL Server data warehouse server throughput requirements (MB/sec) and should be enabled for all fact tables and possibly large dimension tables.

See: Data Compression: Strategy, Capacity Planning and Best Practices

SQL Saturday # 163

October 7, 2012

I will be presenting on Data Warehousing and PDW on Oct. 13th at SQL Saturday #163 in Dallas.

Session:

Scaling SQL Server to HUNDREDS of Terabytes!

This session provides an introduction to the Microsoft Appliance and Reference Architectures and shows how these appliance architectures can scale to hundreds of terabytes. The presentation will focus on the Fast Track Reference Architecture and will also contain real-world performance metrics and lessons learned directly from someone who actually developed a Parallel Data Warehouse (PDW) BI solution.

Session Level: Intermediate

Garrett Edmondson, MCITP, has 10+ years of experience working with the full Microsoft BI Stack. For the past year he has been working on Microsoft’s Parallel Data Warehouse (PDW) Appliance, which allows SQL Server to scale to hundreds of terabytes.

Contact URL: https://garrettedmondson.wordpress.com/

The North Texas SQL Server User Group (NTSSUG), in partnership with Microsoft, is proud to present SQLSaturday #163 – BI Edition, a dedicated Business Intelligence event for SQL Server professionals.

SQLSaturday #163 – BI Edition is a free training event for technical professionals wanting to learn about SQL Server, Business Intelligence, and its related technologies.

This event will be held Oct 13, 2012, 08:00 AM – 05:30 PM CST at the Microsoft Campus, 7000 State Hwy 161, Bldg LC1, Irving, TX 75039. Admittance to this event is FREE.

Our main focus for this event will be on SQL Server 2012 Business Intelligence topics – Data Quality Services, Power View, PowerPivot, BigData (Hadoop/Hive), Dashboard/Reporting, Parallel Data Warehouse, etc. The goal of this event is to help SQL Server professionals get a jump start on SQL Server BI and related technologies from the ground up. There will be advanced-level sessions as well for those seasoned BI folks. Registration is now open, so please register soon, as seating is limited, and let friends and colleagues know about the event.

Breakfast/Lunch: Attendees are responsible for their own breakfast/lunch. There are tons of restaurants in a 1 mile radius. If the organizers decide to provide food, they will keep you informed closer to the event. We want to keep this event as simple as possible.