
Mastering SQL Server 2014 Data Mining

January 24, 2015


I recently had the good fortune to collaborate with Amarpreet Bassan and Debarchan Sarkar from Microsoft in writing Mastering SQL Server 2014 Data Mining. This book is unique in that it covers all the major Microsoft tools for Data Warehousing, Data Mining, and Machine Learning. Each of these tools is a major subject area in its own right, and many books could be written about each one. For example, in Data Warehousing alone there are multiple books devoted to each subcategory of the Microsoft BI Stack, e.g. SSIS (ETL), Dimensional Modeling (The Kimball Group), SSAS (OLAP), SSRS (Reporting), and MDS (Master Data Management).

Most books devoted to these topics aim to guide the beginner step by step through the process of building solutions with each of these tools. This book, however, devotes only one chapter to these topics. In other words, it is not, for the most part, aimed at the absolute DW beginner. The table of contents is a good guide to the topics the book chooses to focus on.

Note: I would highly recommend getting the digital version of the book, as it has full-color graphics, which the print version does not. I would also highly recommend downloading the book’s code, since this is not a step-by-step book and it would be very time-consuming to reproduce every example from scratch.


  1. Identifying, Staging, and Understanding Data
  2. Data Model Preparation and Deployment
  3. Tools of the Trade
  4. Preparing the Data
  5. Classification Models
  6. Segmentation and Association Models
  7. Sequence and Regression Models
  8. Data Mining Using Excel and Big Data
  9. Tuning the Models
  10. Troubleshooting


Below is a summary of the things from each chapter that I think are interesting and/or unique.

Chapter 1 – Identifying, Staging, and Understanding Data

I like that several AdventureWorks-based dimensional data loading scripts are provided to give the reader a taste of what a real-world ETL process might look like. I wish I had had access to something like this over a decade ago, when I was just starting out with DW.

Chapter 2 – Data Model Preparation and Deployment

This chapter makes references to several excellent outside sources that provide much more in-depth treatment of the topics discussed. Example:

Chapter 3 – Tools of the Trade

This is a brief introduction to the BI Stack. If you are not already familiar with it, I would highly recommend picking up a more beginner-oriented book. I do like that data warehousing is included in a data mining book, because in the real world I see too much separation between traditional IT/DW and Data Science/Data Mining/Statistics professionals. I believe that having both work together will only improve the final result.

Chapter 4 – Preparing the Data

I think it is great that this chapter dives into the details of extracting data from Oracle and IBM. In the real world, most data warehouses and data mining solutions do not extract all of their data from just one source (SQL Server).

Chapters 5 – 7

These chapters provide a good overview of the various SQL Server Data Mining algorithms. I really like that they also include techniques for tuning the algorithms in order to improve their accuracy.

Chapter 8 – Data Mining Using Excel and Big Data

While the Data Mining plugin for Excel has been around for a number of years now, this chapter also provides an in-depth discussion of the newer data mining tools, namely HDInsight and Azure Machine Learning. I would also highly recommend Predictive Analytics with Microsoft Azure Machine Learning: Build and Deploy Actionable Solutions in Minutes for those who are just getting started with AzureML.

Chapter 9 – Tuning the Models

This is perhaps my favorite chapter because it uses real-world data. No AdventureWorks in sight!!! Plus, the dataset (Housing Affordability Data System (HADS)) is very interesting to me because I am looking at purchasing a house in the near future. I have spent many hours with the dataset, making data mining discoveries beyond those covered in the book.