There are not any BLOG posts so far this month because I have been concentrating on producing detailed YouTube videos that discuss some of the most important concepts in Data Warehousing.
The latest series of videos deals with ELT (ETL).
Many times I am asked how to create indexes on a SQL Server data warehouse database so that queries execute much faster. Unfortunately, unless they are able to upgrade to SQL Server 2012 in order to use Columnstore Indexes, there is usually very little that I can do to speed up the queries. This is usually due to the fact that the data warehouse server was not architected for data warehouse workloads to begin with and is therefore physically constrained in its throughput (MB/sec).
So I have created this little demonstration to illustrate the point:
I created a 10 million row fact table based on the good old AdventureWorks DB and set up a very simple scan-centric SQL Server data warehouse query:
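A minimal sketch of such a query, assuming a fact table named dbo.NoCompression_10m (inferred from the index names used later in this post) and illustrative SalesTerritoryKey values:

-- Turn on IO statistics so the logical reads can be read from the Messages tab
SET STATISTICS IO ON;

-- Scan-centric aggregation over the 10 million row fact table
SELECT SalesTerritoryKey,
       SUM(SalesAmount) AS TotalSalesAmount
FROM dbo.NoCompression_10m
WHERE SalesTerritoryKey = 4
   or SalesTerritoryKey = 6
   or SalesTerritoryKey = 7
GROUP BY SalesTerritoryKey;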
With logical reads 208337, that means 1,627 MB of data was read to satisfy the query (208,337 pages × 8 KB per page ÷ 1,024 = 1,627 MB). And magically, if you check the size of the table in SQL Server Management Studio, it also reads 1,627 MB. This is because there were no indexes to read from, so SQL Server had to scan the entire table:
Now indexes can be created to reduce the amount of data that needs to be scanned by only scanning the index data.
Note: The best practice is to actually create a separate index for each fact table column, but to keep things simple I only created indexes for the columns that will be queried.
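A sketch of the two single-column indexes used in the rest of this demo, with the same assumed table name:

-- One single-column nonclustered index per queried column
CREATE NONCLUSTERED INDEX ix_NoCompression_10m_SalesAmount
    ON dbo.NoCompression_10m (SalesAmount);

CREATE NONCLUSTERED INDEX ix_NoCompression_10m_SalesTerritoryKey
    ON dbo.NoCompression_10m (SalesTerritoryKey);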
Now if you run the same query again, the messages tab reads logical reads 36812, or 287 MB. Creating the indexes reduced the amount of data required for this query by more than a factor of five (from 1,627 MB to 287 MB). This reduction is primarily due to the fact that the data is being read from the indexes, which are much smaller, and therefore less time consuming to read than all the data directly from the table. See the index scans:
[ix_NoCompression_10m_SalesAmount] – 27,176 * 8 / 1024 = 212 MB
[ix_NoCompression_10m_SalesTerritoryKey] – 22,276 * 8 / 1024 = 174 MB
However, the limitation of this approach is that the data being read will never be smaller than the indexes that are created on the fact table. Real-world fact tables have billions of rows, so those indexes will not all fit into memory; the data must therefore be retrieved directly from disk, which will affect query performance negatively, especially if the data warehouse server has not been specifically tuned to deliver good throughput (MB/sec).
In fact, even if query patterns (something which you cannot control) change, the amount of data required by the queries will remain about the same. In the following example I commented out 2 of the predicate values (SalesTerritoryKey). One would think that the data required would now drop by about half as well.
-- or SalesTerritoryKey = 6
-- or SalesTerritoryKey = 7
However, the logical reads only dropped from 36,812 (287 MB) to 33,612 (262 MB), or only about 25 MB less data required by removing half of the predicate values. Even if the number of predicate values were to drop to 1, the logical reads are still 30,128, or 235 MB. This is because no matter how many predicate values are used, the index for that predicate still has to be scanned. Remember, data warehouse queries require a lot of historical data and so are scan centric. While scanning the index might not seem all that bad for this small 10 million row fact table, imagine what would happen if you increased the number of rows by a factor of 100 or even 1,000: that would be potentially 10s of billions of rows and 100s of GB for the indexes!
Some of you might be tempted to say that an index could be created for each predicate value (SalesTerritoryKey). While that is true, it would very quickly become unmanageable, and in most cases it is nearly impossible to predict all the ways that users will query the data for historical analysis.
The real answer is to correctly architect the data warehouse server ahead of time in order to deliver query results in a predictable and acceptable period of time. With that said, there are a few things you can do to increase the data warehouse server's ability to deliver higher MB/sec than it otherwise could.
Page Compression works by cramming more data onto each database page, which means that fewer pages need to be read from disk.
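One way to apply it to the two demo indexes is an index rebuild with page compression; a sketch, using the same assumed names:

-- Rebuild the existing indexes with page compression enabled
ALTER INDEX ix_NoCompression_10m_SalesAmount
    ON dbo.NoCompression_10m REBUILD WITH (DATA_COMPRESSION = PAGE);

ALTER INDEX ix_NoCompression_10m_SalesTerritoryKey
    ON dbo.NoCompression_10m REBUILD WITH (DATA_COMPRESSION = PAGE);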
Note the indexes' new sizes:
ix_NoCompression_10m_SalesAmount – 14,279 * 8 / 1024 = 112 MB
ix_NoCompression_10m_SalesTerritoryKey – 14,350 * 8 / 1024 = 112 MB
Due to using Page Compression the indexes are significantly smaller and so it is no surprise that if the same query is run again even less data needs to be scanned from the indexes:
Logical Reads = 20475 or 159 MB
Page compression reduces the data required from 287 MB to 159 MB, a saving of about 45%. In the real world I see about a 33% savings.
And just in case you were wondering, if you upgrade to SQL Server 2012 Columnstore Indexes the data required drops from 287 MB to an incredible 1.06 MB !!!
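For completeness, on SQL Server 2012 that would just be a nonclustered columnstore index over the queried columns; a sketch with the same assumed table name:

-- SQL Server 2012 nonclustered (read-only) columnstore index
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_NoCompression_10m
    ON dbo.NoCompression_10m (SalesTerritoryKey, SalesAmount);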
Complex indexing strategies for Data Warehouse workloads provide no additional benefit. The Data Warehouse Server needs to be carefully architected upfront to deliver the required MB/second throughput so that the users are happy with the performance of their queries.
Over the last few years I have performed many SQL Server Data Warehouse database performance tuning sessions with a variety of customers in many different environments. These sessions usually consist of a weeklong engagement in which I lead in-depth software (SQL Server) and hardware tuning classes. The results are put into a highly detailed roadmap and list of recommendations for improved SQL Server Data Warehouse performance.
As part of this testing, I use SQLIO to benchmark the hardware performance capabilities of the SQL Server Data Warehouse database server that I am testing (see: Data Warehouse Storage Throughput (MB/sec)). There are of course many other types of tests and analyses that I perform as part of this weeklong process, but I am going to stick with SQLIO for now.
Since I have performed these tests many times I would like to share some of the results that I have seen from various customer SQL Server Data Warehouse server environments.
Note: Even though these are actual aggregate figures from actual customers, given the complexity of each environment, your results may vary. This is meant to give you an idea of what kind of performance to expect.
DAS – SSD: Direct Attach Storage with Solid State Drives. SSDs not architected for Data Warehouse specific workloads.
SAN – SSD/HDD: Tier 1 SAN that uses a combination of Solid State Drives (SSD) and Hard Disk Drives (HDD). Not architected for Data Warehouse specific workloads.
SAN – HDD: Tier 2 SAN that uses only Hard Disks Drives (HDD). Not architected for Data Warehouse specific workloads.
Laptop: My personal laptop with one 7,200 RPM HDD
As can be seen from the graph, the Dell QuickStart Data Warehouse outperforms even the Solid State Drives and both types of SANs. This is due primarily to the fact that the QSDW is specifically tuned for data warehouse workloads, which are typically characterized by sequential reads/writes in block sizes greater than 64 KB. In fact, had I increased the block sizes for the tests, the gaps in performance would have increased.
Ensuring that the data warehouse server is specifically tuned for data warehouse workloads can have a dramatic impact on performance. In fact, this tuning can be even more important than the type of storage technology selected, e.g. SAN, SSD, or HDD.
Note: The above graph shows server hardware performance, not SQL Server Database Engine performance. The SQLIO tool has nothing to do with SQL Server, even though the name might suggest otherwise.
I am experimenting with creating a series of Data Warehousing videos. Let me know if these videos are helpful and if you would like to see more.
The videos are loosely based on my PASS and SQL Saturday presentations but go into more detail.
One of the most important architectural principles built into Dell's latest generation of data warehouse appliances, called Quickstart Data Warehouse (QSDW), is parallelism. The following post will give an overview of how parallelism can be implemented, using QSDW as a reference.
Breaking a large data warehouse workload into smaller pieces that can then be worked on in parallel is one of the best ways of scaling SQL Server. This parallel processing technique is exemplified in Non-Uniform Memory Access (NUMA).
Even though the NUMA architecture is used to create parallelism between main memory (RAM) and the server's CPU cores, the same principle can be applied to the entire SQL Server Data Warehouse server as well.
However, this parallelism is not present in most OLTP systems because those systems rely on a few LUNs/disk groups, each with many disks, to achieve maximum IOPS. While this approach is good for OLTP systems, it is not the proper architecture for a SQL Server Data Warehouse server, which seeks to maximize MB/sec, not IOPS.
The best approach, therefore, is to break up the disks into several separate disk groups or LUNs and place a SQL Server database file on each LUN. When this is done, the SQL Server Database Engine can break the read and write operations into smaller chunks according to the number of LUNs that were created, thus achieving parallelism. The image below shows how 9 LUNs can be created with a SQL Server database file placed on each LUN.
Because the data for the entire filegroup/database is broken out to multiple LUNs by placing a SQL Server database file on each LUN, all database operations will be parallelized. This has the added effect of increasing MB/sec by a factor roughly equal to the number of LUNs, in this case eight.
This architecture does create parallelism at the disk/database level; however, it is not yet aligned to the number of CPU cores present on the data warehouse server. To accomplish this next level of parallelism, instead of creating one filegroup for the database, the number of filegroups has to align to the number of CPU cores.
The illustration below demonstrates what the configuration may look like on a server with eight cores.
Physically, each LUN is implemented on Windows Server as a mounted volume, and each mounted volume will have eight database files, one database file for each of the eight filegroups.
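A trimmed sketch of what such a layout might look like in T-SQL, showing only two of the filegroups and two of the LUN mount points (all names and paths are illustrative):

-- Two filegroups x two mounted volumes shown; the full layout would repeat this
-- pattern for all eight filegroups across all data LUNs
CREATE DATABASE DW
ON PRIMARY
    ( NAME = DW_primary,  FILENAME = 'C:\MountPoints\LUN01\DW_primary.mdf' ),
FILEGROUP FG01
    ( NAME = DW_FG01_F01, FILENAME = 'C:\MountPoints\LUN01\DW_FG01_F01.ndf' ),
    ( NAME = DW_FG01_F02, FILENAME = 'C:\MountPoints\LUN02\DW_FG01_F02.ndf' ),
FILEGROUP FG02
    ( NAME = DW_FG02_F01, FILENAME = 'C:\MountPoints\LUN01\DW_FG02_F01.ndf' ),
    ( NAME = DW_FG02_F02, FILENAME = 'C:\MountPoints\LUN02\DW_FG02_F02.ndf' )
LOG ON
    ( NAME = DW_log,      FILENAME = 'C:\MountPoints\LOG\DW_log.ldf' );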
Only now can the SQL Server data warehouse workload be parallelized across the entire server, from the disks to each CPU core. While this approach might seem like a lot of work, the performance benefits are tremendous and lead to much more reliable and consistent performance under a wide variety of data warehouse workload scenarios.
The parallelism techniques described above are built directly into Dell Quickstart Data Warehouse Appliance (QSDW), thus making it a highly tuned SQL Server Data Warehouse solution that can save a customer significant time and money.
I felt a little ambitious and actually recorded a series of 3 videos at MSBI Academy which cover all the content of this post. The videos also give a little bit more background information about DQS for those who have not used it yet.
Data Quality Services (DQS) is a great addition to SQL Server 2012. There were many times in the past where I had wished there was a tool which allowed Data Stewards to clean, validate, and perform matching or de-duplication on the data before it went into the data warehouse. In my opinion there are only two kinds of companies: those that don't recognize that they have data quality issues and those that realize that they have data quality issues! Unfortunately, most organizations don't find out that they have data quality issues until they attempt a data warehouse project, and then those issues become all too apparent.
I am not going to go into all the functional details of DQS. (See above link for great training materials on the DQS team website) However, I want to focus on one of the most important features of DQS that seems to be misunderstood.
The ability to create Composite Domains for data correction or validation is one of the most critical features of DQS and would be required in virtually any DQS solution. However, creating Composite Domain rules in the DQS client UI can be a little tricky if you don’t know what to look for. (At least it was for me hence the reason for this post)
The following example uses the AdventureWorks Sample that is available on the DQS Team website. I suggest that you watch their training videos so that you have a good understanding of what I will describe in the next section.
You want to create Composite Domain rules that populate Title based on the values in Marital Status and Gender, such that:
when Gender = 'M' (male)
- then Title = 'Mr.', or
when Marital Status = 'S' (single) and Gender = 'F' (female)
- then Title = 'Ms.', or
when Marital Status = 'M' (married) and Gender = 'F' (female)
- then Title = 'Mrs.'
To accomplish this, first you need to create all the individual domains and then create the 'Title Enrichment' Composite Domain. Next, the Title Enrichment Composite Domain needs a few rules to correctly populate the Title domain/column.
For the Single Female Title rule, the first part of the rule, the test condition, needs to be created. This is similar to the "IF" of the "IF…THEN…" syntax.
The following example shows: if Marital Status = 'S' then Title = 'Ms.'
Everything looks good at this point. Now all that is needed is to add an additional condition to the ‘IF’ to test when Gender = ‘F’. However, when you click on the “Adds new condition to the selected clause” button, a new clause is created for the Marital Status domain.
This is not what is needed, because the Gender domain needs to be used. It would appear that DQS is missing the very important ability to have multiple domains tested in the same Composite Domain Rule.
The key to creating this type of Composite Domain Rule is to ensure that no part of the current condition ("Marital Status Value is equal to S") is in focus when you click on the "Add new condition to the selected clause" button. You will know when the current condition is in focus because the vertical line will be solid blue or a blue box will appear around a part of the condition. See below:
What is required is to: 1. click away from the current condition before 2. you click on the "Add new condition to the selected clause" button.
If you do that, then you can create the required condition using the Gender domain.
Now when a project is run with this Composite Domain, the data in the Title column will have the correct values:
BlueGranite, a company that I do a lot of work for and which has a great team of BI and DW professionals, is creating great BLOG posts that will be useful for those who are not in-the-weeds developers, such as self
Another great resource is Rob Kerr's (SQL MVP) MSBI Academy, which has some good short videos on a great variety of Microsoft BI topics. I like his videos because they get to the point and have much better sound quality. No more having to sit through long demos that ramble on and at times are very hard to understand. (I have been known to do that from time to time.)
To be completely transparent I will be creating some content for these sites as well. You might even see content there before it appears in my BLOG.
My SQL Performance Tuning Test Results for Data Warehouse Architectures post compiles over a year's worth of SQLIO work with a variety of SQL Server Data Warehouse hardware architectures.
I also like Rob’s Big Data Series
And in case you missed TechEd North America 2013 here are my top picks:
Extended Events are not new to SQL Server 2012, but they have been greatly enhanced from when they were first introduced in SQL Server 2008. As is my style, I am not going to go into a long, protracted discussion of Extended Events. That is what MSDN is for, and they do a good job of covering all the details there.
What I am going to focus on is how Extended Events can be used in data warehousing. To begin with, Extended Events (XEvents) can be incredibly valuable for data warehouse scenarios because they can measure the throughput (MB/sec) for each query executed against the data warehouse server. Knowing the throughput requirements (MB/sec) of the data warehouse queries is absolutely critical for architecting a data warehouse server that performs as expected! See: Data Warehouse Storage Throughput (MB/sec) for more information.
If, for example, you know that a data warehouse query returns 2,000 MB and that the users want the query to complete in 10 seconds, and assuming that the query takes 5 seconds to aggregate on the CPU (leaving 5 seconds for IO), then you know that the minimum required throughput is 400 MB/sec:
Query MB / seconds available for IO
2,000 / 5 = 400 MB/sec
Note: The above does not account for concurrency (the number of users and the other queries running at the same time on the server), which will have to be accounted for as well.
You can use IO statistics to measure a query's MB as explained here: Page Compression, but that is tedious to do for a lot of queries and doesn't give a good idea of the other queries running on the server. Enter Extended Events: with an Extended Events session you can get all that information plus much more.
1. Create an Extended Events session with the following actions:
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
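A minimal sketch of what such a session might look like; the session name, file path, and the exact set of actions are assumptions for illustration:

-- Capture completed statements with their IO so per-query MB can be calculated later
CREATE EVENT SESSION [DWQueryThroughput] ON SERVER
ADD EVENT sqlserver.sql_statement_completed
(
    ACTION (sqlserver.sql_text, sqlserver.database_id, sqlserver.query_hash)
)
ADD TARGET package0.event_file
(
    SET filename = N'C:\XEvents\DWQueryThroughput.xel'
)
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
GO

ALTER EVENT SESSION [DWQueryThroughput] ON SERVER STATE = START;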
Note: I chose to save the results to an XML file instead of keeping them in memory because they can get very large, depending on how active your server is.
2. You then let the session run for the period of time that you want to collect query data for, and with a little bit of knowledge about querying XML you have access to a wealth of information.
<rant> yeah I know querying xml makes my brain hurt too, but it is so worth it !!!</rant>
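A hedged sketch of how the captured events might be pulled back out and converted to MB (the file path matches the illustrative session above; 8 KB pages assumed):

-- Read the .xel file(s) and shred the event XML into rows
;WITH xe AS
(
    SELECT CAST(event_data AS XML) AS event_xml
    FROM sys.fn_xe_file_target_read_file(N'C:\XEvents\DWQueryThroughput*.xel', NULL, NULL, NULL)
)
SELECT
    event_xml.value('(event/@timestamp)[1]', 'datetime2') AS event_time,
    event_xml.value('(event/data[@name="duration"]/value)[1]', 'bigint') AS duration,
    event_xml.value('(event/data[@name="logical_reads"]/value)[1]', 'bigint') AS logical_reads,
    event_xml.value('(event/data[@name="logical_reads"]/value)[1]', 'bigint') * 8 / 1024.0 AS approx_mb,
    event_xml.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS sql_text
FROM xe;
-- From here the results can be aggregated by hour of day and compared to the SQLIO numbers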
With these results you can now analyze the query patterns aggregated by a certain time interval (e.g., by hour of day) and compare that to the actual hardware IO limitations as measured by SQLIO. You can now answer difficult questions like the following with confidence:
How will the data warehouse server perform if I add X amount of data?
What are the largest queries and peak query times?
Will my data warehouse perform well if I add X amount of users?
Is my ETL taking too long because of a lack of server resources or because of slow ETL logic?
There is so much more that can be learned from the above Extended Events XML query, such as the query and query plan hashes, but I hope this gives you an idea of what is possible.
I recently upgraded the OS on my laptop to Windows 8 and there are things that I like and things that I don't like about it. I think that most of the things I don't like have more to do with some of the new navigation features and should become second nature over time.
One of my favorite surprises in Win8 is the new Task Manager. When I open it up, I am presented with a much cleaner interface that also provides more information than the previous versions of Task Manager. One of these new goodies is that you can now see the sizes of the L1, L2, and L3 caches:
These CPU caches act like stepping stones for data as it travels from main memory (RAM) to the CPU, and the closer the cache is to the CPU, the faster the data can be processed. For example, the L1 and L2 caches are considerably faster than the L3 cache, and as you can tell from the above image, the closer the cache is to the CPU (L1), the smaller the cache size is.
If you have an application that you want to run faster on the CPU, then a good way to do it would be to have the application divide up its instructions and data into batches that ideally fit in the L1 or L2 caches. Then, magically, your application would run faster (there is more to it than just that, but I am keeping things simple).
Well, that is great, but what do L1, L2, and L3 caches have to do with SQL Server databases and data warehousing? Well, in a word, A LOT! The new SQL Server 2012 Columnstore Indexes not only compress and store data (tables and indexes) in a column-wise format, but this columnar data is also broken down into "batches" that just so happen to fit the sizes of the L1 and L2 caches on the SQL Server 2012 data warehouse database server. This means that, all things being equal, the same data set will actually process faster on the CPU using the Columnstore Index's data "batches" than had the same data been processed without Columnstore Indexes.
And guess what, you can actually see this happen in SQL Server Management Studio (SSMS) when you run a query on a table that has a SQL Server 2012 Columnstore Index and view the Execution Plan.
The table dbo.CSI_100m_Fact has a SQL Server 2012 Columnstore Index, and at execution time the query uses Actual Execution Mode = Batch for the Columnstore Index Scan operator, which means that it is doing its L1 and L2 magic behind the scenes and your queries now run much faster!
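A minimal sketch of the kind of query that produces a batch mode Columnstore Index Scan against dbo.CSI_100m_Fact (the column names are assumptions for illustration):

-- Typical scan-centric aggregation; with a columnstore index present, the
-- Columnstore Index Scan operator in the Actual Execution Plan should show
-- Actual Execution Mode = Batch
SELECT SalesTerritoryKey,
       SUM(SalesAmount) AS TotalSalesAmount
FROM dbo.CSI_100m_Fact
GROUP BY SalesTerritoryKey;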
Note: If you create a columnstore index on your SQL Server 2012 database and your SQL 2012 data warehouse database query does not use the columnstore index then that means that your data model is wrong (fact and dimension tables) or your query does not match typical data warehouse workloads. See: Columnstore Indexes on MSDN.
Table partitioning for SQL Server database fact tables is one of the best ways to improve processing for fact data loads during the ETL process, and it can dramatically improve query response time as well. SQL Server table partitioning should be used on all but the most trivial database fact tables (and even then I would still use it). However, despite its great advantages and the fact that it has been around since SQL Server 2005, I still find it under-utilized.
Database table partitioning complements Kimball-style Transaction or Snapshot fact tables very nicely (see Slowly Changing Facts). In SQL Server 2012 the number of possible partitions per table has even been increased to 15,000! This is great news because it means that if you partition your database fact table by day, it could hold more than 40 years' worth of data without changing the partition scheme!!! And if the database fact table is loaded nightly with a single day's data, then SQL Server table partition switching becomes a breeze.
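For reference, a by-day partition function and scheme on an integer date key might look something like this; the boundary values are illustrative, and in practice a script would generate thousands of daily boundaries:

-- RANGE RIGHT so each boundary value starts a new daily partition
CREATE PARTITION FUNCTION pf_OrderDateKey (int)
    AS RANGE RIGHT FOR VALUES (20040730, 20040731, 20040801);

-- Map every partition to the same filegroup for simplicity
CREATE PARTITION SCHEME ps_OrderDateKey
    AS PARTITION pf_OrderDateKey ALL TO ([PRIMARY]);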
I won't go into all the details, but SQL Server table partition switching allows inserts and deletes to a partitioned database fact table, no matter how large, to occur almost instantly. Updates can occur orders of magnitude faster as well (but I don't recommend updating fact tables).
I am not going to go through all the steps. See Script for details
1. Create two tables and populate them with data:
1. dbo.TablePartitionDemoFact – holds historical fact data
2. dbo.TablePartitionDemoFact_switch – simulates new rows to be loaded into the fact table from the nightly ETL process
SELECT DISTINCT [OrderDateKey] AS TablePartitionDemoFact_switch
dbo.TablePartitionDemoFact: 60,398 rows – max transaction date 20040731
dbo.TablePartitionDemoFact_switch: 60,398 rows – max transaction date 20040801
Both tables have about 60,000 rows, but the _switch table has rows that need to be inserted into the fact table.
2. Switch the rows in by simply using the ALTER TABLE … SWITCH command:
TO [dbo].[TablePartitionDemoFact] PARTITION (2)
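Only the tail of the statement is shown above; the complete command would look roughly like this (whether the source table specifies a partition number depends on how the _switch table was built, so treat this as a sketch):

-- Metadata-only operation: the loaded rows instantly become part of the fact table
ALTER TABLE [dbo].[TablePartitionDemoFact_switch]
    SWITCH PARTITION 2
    TO [dbo].[TablePartitionDemoFact] PARTITION (2);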
Rerun the queries from before and you will see that rows have now been switched to the fact table in < 1 Second!
120,796 – row count
The process can easily be reversed if the data needs to be modified or archived/deleted, by simply reversing the tables in the switch command.
TO [dbo].[TablePartitionDemoFact_switch] PARTITION (2)
60398 – row count
60398 – row count
Note: Both table schemas must be identical. This includes indexes as well! During the switch process no table or index updates are required. There is just a simple pointer/metadata operation, and the magic happens almost instantly!
For the purpose of this demo I didn't create any indexes, just to keep things simple, but now you might understand why SQL Server 2012 Columnstore Indexes were not updateable: the assumption was that any decently sized database fact table is already partitioned, so there was no need to make Columnstore Indexes updateable.
The best practice is to bulk load the data, in parallel if possible, into the switch/stage table using minimally logged transactions, create the required SQL Server indexes, and then simply switch the partitions. It is that simple. This process will perform well on fact tables with billions or even trillions of rows!
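A hedged sketch of that load pattern, with an illustrative file path and batch size (the actual load method, e.g. SSIS or bcp, will vary):

-- 1. Minimally logged bulk load into the switch/stage table (TABLOCK enables minimal logging
--    when the database is in the SIMPLE or BULK_LOGGED recovery model and the target is a heap)
BULK INSERT dbo.TablePartitionDemoFact_switch
FROM 'D:\Staging\Fact_20040801.dat'
WITH (TABLOCK, BATCHSIZE = 1000000);

-- 2. Build the same indexes that exist on the target fact table (none in this simple demo)

-- 3. Switch the loaded data into the fact table, as shown earlier, an almost instant
--    metadata-only operation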
Brent Ozar has put together a wonderful list of the most important Table Partitioning Resources.
I personally like using the SQL Server Partition Management tool from CodePlex.