SQL Server Database Performance Tuning: Why is my DW Query so SLLLOW?!

I get asked this question a lot and in most instances the answer is… the query is returning a lot of data.

Most SQL Server Data Warehouse and/or analytical queries return a lot of data because they aggregate transactions (facts) from a low level of granularity (day) up to a higher level of granularity (month). The higher the target level of granularity, the greater the number of transactions that need to be aggregated.

In other words, to find out what my total sales for this year are, I have to aggregate all of the sales transactions for this year. That could mean millions, billions, or even TRILLIONS of sales transactions need to be aggregated to produce the Sales Grand Total for the year.

From a hardware perspective, all of your transactions are sitting in your storage/disks (DAS/NAS). At SQL query execution time, all of that data needs to be moved from storage to the processors to be aggregated.

The “pipe” that connects storage to the CPUs needs to be wide enough to flood the processors with data so that the CPUs/users are not sitting idle waiting for data! This “pipe” width, or throughput, is measured in MB/sec.

There are standard tools to measure your SQL Server Data Warehouse server’s throughput in MB/sec, as explained in a previous post.
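As a very rough illustration of what those tools do (this sketch is my own and is no substitute for a dedicated benchmarking tool such as DiskSpd, which bypasses caching effects), sequential-read throughput can be estimated by timing how fast a large file streams off disk:

```python
import time

def measure_read_throughput(path, chunk_mb=8):
    """Rough sequential-read throughput estimate in MB/sec.

    Caution: OS file caching can inflate the result; dedicated
    storage benchmarks give far more reliable numbers.
    """
    chunk = chunk_mb * 1024 * 1024
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total_bytes += len(data)
    elapsed = time.perf_counter() - start
    return (total_bytes / (1024 * 1024)) / elapsed
```

Point it at a file much larger than RAM to get a number that approximates what the storage pipe can actually deliver.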

Once you know the SQL Server Data Warehouse server throughput in MB/sec you then need to measure the amount of data returned by your SQL query. By comparing the throughput (MB/sec) to the amount of data returned, it should then not be so surprising why your SQL queries are taking so long!

Example:

  • DW MB/sec = 1 MB/sec
  • SQL Query data = 100 MB

Minimum SQL query response time = 100 seconds

If you need to reduce the SQL query response time by half, to 50 seconds, then you need to double the SQL Server Data Warehouse’s throughput to 2 MB/sec (or reduce the amount of data returned: think SQL Server 2012 columnstore indexes and page compression).
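The back-of-the-envelope math above can be captured in a tiny helper (the function name is mine, purely for illustration):

```python
def min_response_seconds(query_data_mb, throughput_mb_per_sec):
    """Lower bound on query response time: data cannot reach the
    CPUs faster than the storage pipe can deliver it."""
    return query_data_mb / throughput_mb_per_sec

# 100 MB of data through a 1 MB/sec pipe: at least 100 seconds
print(min_response_seconds(100, 1))  # 100.0
# Doubling throughput to 2 MB/sec halves the floor to 50 seconds
print(min_response_seconds(100, 2))  # 50.0
```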

That is great, but how do I measure the amount of data that is returned by a SQL query? Glad you asked!

Create and populate a test table (using the AdventureWorks database):

SELECT TOP 10000000 a.*
INTO test10m
FROM [dbo].[FactInternetSales] a
CROSS JOIN [dbo].[FactInternetSales] b;

Very Simple Query Example:

DBCC dropcleanbuffers;
GO
SET STATISTICS IO ON;
GO
SET STATISTICS TIME ON;
GO
SELECT [SalesTerritoryKey], SUM([SalesAmount])
FROM [dbo].[test10m]
GROUP BY [SalesTerritoryKey]
OPTION (MAXDOP 8);
GO
SET STATISTICS IO OFF;
GO
SET STATISTICS TIME OFF;
GO

Message Results:

Table ‘test10m’. Scan count 9, logical reads 204082, physical reads 0, read-ahead reads 204082, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 8016 ms,  elapsed time = 14522 ms.

Follow this simple formula:

Logical reads * 8 KB (one database page) / 1024 = size in MB

204,082 * 8 / 1024 = 1,594 MB
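In code, the conversion from the STATISTICS IO logical-read count to megabytes (each SQL Server page is 8 KB) looks like this (the helper name is illustrative):

```python
PAGE_SIZE_KB = 8  # a SQL Server database page is 8 KB

def logical_reads_to_mb(logical_reads):
    """Convert a STATISTICS IO logical-read count to megabytes."""
    return logical_reads * PAGE_SIZE_KB / 1024

print(round(logical_reads_to_mb(204082)))  # 1594 MB, matching the table size
```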

Amazingly, if you check the size of the table in SQL Server Management Studio, it reads 1,594 MB as well! In other words, this SQL query reads all of the data in the database table.

[Image: table size shown in SQL Server Management Studio]

Summary:

SQL Server Data Warehouse database servers are built to move large amounts of data from storage to the CPUs and then aggregate that data. If you know the throughput of the pipe that connects storage to the processors (MB/sec) and the amount of data read by the target SQL query, then you can start planning how to tune a scalable and predictable SQL Server Data Warehouse database. Now you can accurately set expectations upfront with the users.

Note: there are many other factors that affect SQL Server database performance; this is just one of the most fundamental. No amount of SQL query tuning or index creation will ever make a SQL query perform faster than the SQL Server Data Warehouse server’s throughput (MB/sec) allows.

Also see: Performance Tuning a SQL Server Data Warehouse


6 Responses to “SQL Server Database Performance Tuning: Why is my DW Query so SLLLOW?!”

  1. Ralph D. Wilson II Says:

So, is there any merit to performing what might be termed pre-aggregation? For instance, if your initial level of granularity is at a raw transaction level and you know that you can expect to need the aggregations at day, week, month, quarter, and year level then, at least for historical (i.e., say, last year and older) data, might it be worthwhile to perform those aggregations one time into a set of fact tables so that ad hoc and reporting queries that are analyzing day over day, week over week, etc., don’t have to constantly aggregate and re-aggregate the results?

  2. Garrett Edmondson Says:

    To handle pre-aggregations I would highly recommend using SSAS multi-dimensional cubes. Cubes are much easier to develop and maintain than aggregate tables, provided that the Data Warehouse is architected correctly (facts/dimensions). They also have many other features that can be leveraged by the users i.e. Pivot Tables for ad-hoc analysis. In SQL Server 2012 the paradigm shifts to Columnstore Indexes plus SSAS tabular cubes.

    • Ralph D. Wilson II Says:

      As you may have guessed, I am not well versed in SSAS. I was partially basing my question on my experience with a rather massive data warehouse on Oracle. Apparently, they had run into some processing issues and had created a set of tables to handle the aggregates at various time intervals which they then used in doing their analysis. I kind of thought there had to be a better way, though. 😉

      I know that it posed an interesting problem for _me_ because, at the time, I was developing an application that would allow the marketing staff to define queries, pretty much in their own terminology, that could be saved and/or executed to refine the deployment of product based upon historical information. Because of the _sets_ of tables, I had to handle slight variations in column names and datatypes as well as handling non-trivial variations in the JOINs..

  3. PAGEIOLATCH_XX is evil « Garrett Edmondson Says:

[…] If SQL Server has to wait a long time on the data then that means that it is trying to suck a lot of data through a tiny “straw” (the connection between SQL Server and its precious data). In other words there is too little throughput; see Why is my DW Query so SLLLOW?! […]

