
Installing SSAS 2012

May 6, 2012

I don’t usually write posts like this, but there have been a few substantial changes to how SSAS 2012 is installed compared to previous versions. Those changes also underscore how much SSAS itself has changed with the introduction of the Tabular Model.

I am not going to take you through all the steps, but when you get to the Analysis Services Configuration screen you can now choose to install SSAS in the traditional Multidimensional Mode or in Tabular Mode. You cannot install both at the same time; you must install them one at a time.
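As an aside (not shown in the screenshots that follow), the server mode can also be set in a scripted install through setup's /ASSERVERMODE switch, which accepts MULTIDIMENSIONAL, TABULAR, or POWERPIVOT. A fragment of an unattended command line, as a sketch only, assuming the rest of the required options (service accounts, license acceptance, etc.) are supplied:

setup.exe /ACTION=Install /FEATURES=AS /INSTANCENAME=TABULAR /ASSERVERMODE=TABULAR ...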

In the example below I chose to install Tabular Mode first:

[screenshot: Analysis Services Configuration screen with Tabular Mode selected]

I am not going to walk through the rest of the steps. If you want Multidimensional Mode as well, you will have to re-run the installation and choose Multidimensional Mode. See the next three screenshots:

[screenshots: re-running setup and selecting Multidimensional Mode]

If you have run through the installation twice, once for each mode, you should see something like the following in SSMS. Notice the two different instances:

[screenshot: SSMS Object Explorer showing the Multidimensional and Tabular instances]
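A quick note on connecting: each mode is simply a separate named instance of SSAS, so in SSMS you connect with whatever instance names you chose during setup, for example (hypothetical names):

localhost\MULTIDIM – the Multidimensional instance
localhost\TABULAR – the Tabular instance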

The above serves only as an introduction and provides context for some of the differences you will encounter when working with a Tabular project in SQL Server Data Tools (SSDT), aka the new BIDS.

When you first fire up SSDT (I will miss the name BIDS because it sounds so much smoother) and create your first Tabular project (see below):

[screenshot: creating a new Tabular project in SSDT]

You will be confronted with a strange new question:

[screenshot: the Tabular model designer prompting for a workspace server]

You will be asked to choose a workspace server, which means you must point to a previously installed instance of SSAS running in Tabular mode. The workspace server is used during development of the Tabular project in SSDT; in fact, SSAS automatically creates a new workspace database named after the project, with a GUID appended to it.

[screenshot: the workspace database, with GUID suffix, in SSMS]

This means that in order to develop a Tabular project in SSDT you must first have installed an SSAS instance in Tabular mode. (Don’t be surprised if you see this on a test sometime; see my LinkedIn profile.) With that said, I like this change: unlike the old multidimensional project in BIDS, which runs in 32-bit during debugging, you can now develop a Tabular project in SSDT in 64-bit mode and then deploy it to your 64-bit server. This eliminates the headache of having to deal with both 32-bit and 64-bit code when deploying a project.

Additional Resource:

Cathy Dumas: Configuring a workspace database server

SSAS Tabular Model – Query Modes

March 8, 2012

One of the most compelling features of the new SSAS Tabular Model is the ability to set the model’s query mode to in-memory (aka VertiPaq), DirectQuery, or both!

QueryMode can be set to one of the following:

  • DirectQuery – All queries to the model use the relational data source only.
  • DirectQuery with In-Memory – By default, queries are answered using the relational source, unless the client’s connection string specifies otherwise.
  • In-Memory – Queries are answered using the cache only.
  • In-Memory with DirectQuery – By default, queries are answered using the cache, unless the client’s connection string specifies otherwise.
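For reference, the client-side override mentioned above is just a connection string property. Here is a sketch with hypothetical server and database names (to the best of my knowledge the property is DirectQueryMode; check your provider’s documentation):

Provider=MSOLAP;Data Source=MyServer\TABULAR;Initial Catalog=MyTabularModel;DirectQueryMode=InMemory

A client could use this, for example, to get in-memory answers from a model whose default mode is DirectQuery with In-Memory.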

The hybrid modes give you the best of both in-memory and DirectQuery, but there are a few gotchas. One of the hybrid or DirectQuery modes must be set in two places.

Configuration: Set Query Mode at Project level:

[screenshot: project properties showing the Query Mode setting]

Next you can edit the table partitions if you don’t like the defaults:

[screenshot: editing the table partitions]

Some DAX functions and calculated columns are not yet supported in DirectQuery mode. If you try to deploy/process the model, you will get errors explaining which objects need to be removed.

[screenshot: deployment errors listing the unsupported objects]

In the above example, the calculated column needs to be deleted before DirectQuery mode can be enabled.
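One common workaround, not covered in this post, is to push the calculation into the relational source so the model imports the result as an ordinary column. A minimal sketch, assuming the AdventureWorksDW FactInternetSales table with its SalesAmount and TotalProductCost columns:

CREATE VIEW dbo.vFactInternetSales
AS
SELECT
    f.*
    -- computed in SQL instead of as a DAX calculated column
    ,f.SalesAmount - f.TotalProductCost AS GrossProfit
FROM dbo.FactInternetSales AS f;

Point the model’s table at the view and the DirectQuery restriction on calculated columns no longer applies to this particular calculation.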

Summary:

Configuring a Tabular model to use one of the hybrid query modes allows it to scale past the memory limitations of the server, and when combined with columnstore indexes in SQL Server 2012 it provides a compelling reason to use a Tabular Model for enterprise BI solutions. Although there are currently some limitations, I would expect them to be resolved in the future.

Additionally, making columnstore indexes available in the next version of Microsoft’s Parallel Data Warehouse would be a huge performance boost to an already very powerful appliance. The next logical step would be to simply put a Tabular Model on top of it. All of this is just a dream right now, but wouldn’t it be nice!

Tabular Model

January 30, 2012

One of the most exciting features in SQL Server 2012 is the introduction of the Tabular Model. The Tabular Model, along with the Multidimensional Model, comprises the Business Intelligence Semantic Model (BISM).

To be honest, when the Tabular Model and the BISM were introduced along with DAX, I was more than a little skeptical. It seemed like yet another addition to the already complex Microsoft BI stack. It was presented as ‘individual’ or ‘group’ BI that allowed users to quickly and easily develop their own models, but it didn’t seem to scale, because the model had to fit into memory, and it could not handle complex requirements like Parent-Child relationships or financial calculations.

However, after playing around with it for a little while and talking to some of the leaders in that space, I have completely changed my mind. I will address my initial concerns in a general manner below:

1. Scalability – While it is true that the Tabular Model is an in-memory database system, the memory limitation can be overcome by using the DirectQuery query mode (or one of the hybrid modes), which passes queries through to the SQL Server Database Engine. When combined with columnstore indexes, you can build BI solutions that scale to the enterprise level (see the columnstore sketch after this list).

2. Complexity – Although DAX has not yet reached MDX’s level of sophistication for handling complex requirements, the gap is closing, and it is closing fast! Here are some good examples:

Parent/Child Hierarchies

Unary Operators
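To make the scalability point in item 1 concrete, here is a minimal sketch (my own example, not from the links above) of a SQL Server 2012 nonclustered columnstore index on the AdventureWorksDW fact table that DirectQuery-generated queries would hit:

CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactInternetSales
ON dbo.FactInternetSales
(
    OrderDateKey
    ,CustomerKey
    ,ProductKey
    ,OrderQuantity
    ,SalesAmount
);

One caveat: in SQL Server 2012 a table with a nonclustered columnstore index becomes read-only for DML until the index is disabled or dropped, which usually fits a data warehouse load pattern (disable, load, rebuild).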

In summary, I am now convinced that Tabular and DAX are not going to complicate MS BI but simplify it, allowing me to deliver solutions to my clients more quickly and more iteratively. I am excited about the future.

Here are some of the best resources that I have found:

Cathy Dumas BLOG – great info.

SQLBI: Marco and Alberto have tons of great DAX information.

DAX Editor: The DAX Editor for SQL Server is an extension to Visual Studio 2010 that implements a language service for DAX, the expression language used for SQL Server Analysis Services tabular models. This extension can be used when editing DAX queries and also for writing DAX measures.

MDX and DAX formatter

DAX in the BI Tabular Model Whitepaper and Samples: This whitepaper and sample workbook introduce Data Analysis Expressions (DAX), a formula expression language used to define calculations in PowerPivot for Excel® workbooks and Analysis Services tabular model projects authored in SQL Server Data Tools.

PowerPivot-info: The one-stop shop for all things DAX/PowerPivot/Tabular!

DAX Resource Center: This Wiki includes articles, whitepapers, videos, and samples published by both Microsoft as well as experts in the Business Intelligence community. Because this is a Wiki, you too can contribute. If you have some great information about DAX and how you use it in your organization, please share it!

Data Mining

December 14, 2011

Data mining techniques can be used in virtually all business applications, answering various types of business questions. In truth, given the software available today, all you need is the motivation and the know-how. In general, data mining can be applied whenever something could be known, but is not. The following examples describe some scenarios:

Recommendation generation — What products or services should you offer to your customers? Generating recommendations is an important business challenge for retailers and service providers. Customers who are provided appropriate and timely recommendations are likely to be more valuable (because they purchase more) and more loyal (because they feel a stronger relationship to the vendor). For example, if you go to online stores such as Amazon.com or Barnesandnoble.com to purchase an item, you are provided with recommendations about other items you may be interested in. These recommendations are derived from using data mining to analyze purchase behavior of all of the retailer’s customers, and applying the derived rules to your personal information.

Anomaly detection — How do you know whether your data is “good” or not? Data mining can analyze your data and pick out those items that don’t fit with the rest. Credit card companies use data mining–driven anomaly detection to determine if a particular transaction is valid. If the data mining system flags the transaction as anomalous, you get a call to see if it was really you who used your card. Insurance companies also use anomaly detection to determine if claims are fraudulent. Because these companies process thousands of claims a day, it is impossible to investigate each case, and data mining can identify which claims are likely to be false. Anomaly detection can even be used to validate data entry—checking to see if the data entered is correct at the point of entry.

Churn analysis — Which customers are most likely to switch to a competitor? The telecom, banking, and insurance industries face severe competition. On average, obtaining a single new mobile phone subscriber costs more than $200. Every business would like to retain as many customers as possible. Churn analysis can help marketing managers identify the customers who are likely to leave and why, and as a result, they can improve customer relations and retain customers.

Risk management—Should a loan be approved for a particular customer? Since the subprime mortgage meltdown, this is the single most common question in banking. Data mining techniques are used to determine the risk of a loan application, helping the loan officer make appropriate decisions on the cost and validity of each application.

Customer segmentation — How do you think of your customers? Are your customers the indescribable masses, or can you learn more about your customers to have a more intimate and appropriate discussion with them? Customer segmentation determines the behavioral and descriptive profiles for your customers. These profiles are then used to provide personalized marketing programs and strategies that are appropriate for each group.

Targeted ads — Web retailers or portal sites like to personalize their content for their Web customers. Using navigation or online purchase patterns, these sites can use data mining solutions to display targeted advertisements to their Web navigators.

Forecasting — How many cases of wine will you sell next week in this  store? What will the inventory level be in one month? Data mining forecasting techniques can be used to answer these types of time-related questions.

Resources:

Data Mining with Microsoft SQL Server 2008: Provides in-depth knowledge about Data Mining with SSAS. Covers the Algorithms, DMX and SSIS Data Mining solutions in detail. This is an absolute must for learning how to develop a Data Mining solution. It has tons of great examples.

Predixion Insight: Jamie MacLennan, Director of SSAS at Microsoft, left Microsoft and started Predixion, which offers the best SSAS-based Data Mining product on the market. The Excel demo comes with great examples as well.

Data Mining Techniques: Provides good examples of how to develop a strategy for using Data Mining to solve business problems.

Data Mining Webcasts

Automation of Data Mining Using Integration Services: This article is a walkthrough that illustrates how to build multiple related data models by using the tools that are provided with Microsoft SQL Server Integration Services. In this walkthrough, you will learn how to automatically build and process multiple data mining models based on a single mining structure, how to create predictions from all related models, and how to save the results to a relational database for further analysis. Finally, you view and compare the predictions, historical trends, and model statistics in SQL Server Reporting Services reports.

Dimensional Modeling Financial Data in SSAS pt2

November 3, 2011

new: Financial Data Warehouse video 

Unary Operator and Aggregating over time

In my last post I discussed how to arrange all of the GLs into a Parent Child Hierarchy. I also mentioned that I would address the issue of having the child GLs sum to their parent through addition or subtraction.

Using a unary operator attribute in the GL Account dimension controls how child GLs aggregate to their parents.

See example below:

Net Sales = Gross Sales – Returns and Adjustments – Discounts

[image: net income statement]

The unary Operator attribute in the AdventureWorks DimAccount dimension:

SELECT
    AccountCodeAlternateKey
    ,[ParentAccountCodeAlternateKey]
    ,[AccountDescription]
    ,[Operator]
FROM [AdventureWorksDW2008R2].[dbo].[DimAccount]
WHERE AccountDescription = 'Net Sales'
    OR AccountDescription = 'Gross Sales'
    OR AccountDescription = 'Discounts'
    OR AccountDescription = 'Returns and Adjustments'

[query results showing each account’s Operator value]

Cube configuration in BIDS:

[screenshot: unary operator configuration in BIDS]

The last step is to create another attribute to control how these GLs aggregate across time. The Balance Sheet GLs are semi-additive (like personal bank accounts, they carry running balances), while the Net Income GLs are fully additive across time. More SSAS cube configuration is needed for different parts of the same GL Parent Child Hierarchy to aggregate differently over time.
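A quick worked example of the difference (my own numbers): if a cash GL has balances of 100 in January, 110 in February, and 120 in March, the first quarter should show 120 (the last balance), not 330. If a sales GL books 100 in each of those months, the quarter should show 300. Same hierarchy, two different aggregation rules over time.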

Give the GL Account dimension an Account Type attribute, then configure the SSAS database like the following:

[screenshot: Account Type mappings in the SSAS database properties]

Keep in mind that you will need to create the ‘Account Type’ attribute to hold the values shown above.
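For context, here is the kind of database-level mapping I mean, as a sketch of the typical defaults (your account type aliases and aggregation functions may differ). Note that semi-additive behavior also requires the measure’s AggregateFunction property to be set to ByAccount:

Account Type    Aggregation Function
Asset           LastNonEmpty
Liability       LastNonEmpty
Balance         LastNonEmpty
Income          Sum
Expense         Sum
Flow            Sum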

Here is what it looks like in the AdventureWorks DimAccount table:

SELECT
    AccountCodeAlternateKey
    ,[ParentAccountCodeAlternateKey]
    ,[AccountDescription]
    ,[Operator]
    ,AccountType
FROM [AdventureWorksDW2008R2].[dbo].[DimAccount]
WHERE AccountDescription = 'Net Sales'
    OR AccountDescription = 'Gross Sales'
    OR AccountDescription = 'Discounts'
    OR AccountDescription = 'Returns and Adjustments'

[query results showing each account’s Operator and AccountType values]

Now that all the required GL Account attributes have been created (the unary Operator and ‘Account Type’) and the GLs have been arranged into a Parent Child Hierarchy, how do you manage these important attributes that are not present in the source system? The answer is:

Master Data Management with SQL Master Data Services

Dimensional Modeling Financial Data in SSAS

October 26, 2011

Part 1 – Parent Child Hierarchies and Pseudo GL Accounts

new: Financial Data Warehouse video 

This post is born out of some real-life lessons that I learned while putting together a Financial Data Warehouse based on General Ledger (GL) data, i.e. all the GLs in the foundational financial statements, from Net Income and Balance Sheet to Statistical GLs and Cash Flow statements.

While there are many great sources for dimensional modeling in general (Kimball’s book The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition) is essential), there are few resources that discuss dimensional modeling for financial data. I am going to assume that the reader already has a firm knowledge base in dimensional modeling. If not, see my recommendations for books that will help form that foundation: The Start of Something Great.

To be frank, this could be a book-sized topic in and of itself, but I am going to try to cover all the basics so that the reader can avoid the major pitfalls.

The first step is to gain an understanding of the business processes being modeled (GL creation and maintenance, period-end close-out, etc.) and the associated requirements (financial statements: Net Income, Balance Sheet, Cash Flow, etc.). It is absolutely critical to gain a deep understanding of the GL analytical attributes, e.g. Business Units, Cost Centers, and Sales Channels. Don’t be surprised when you find out that most of the critical GL attributes are not maintained in any source system but in Excel spreadsheets!

I am going to dispense with all the gory details that can be encountered in this process and jump right into the nature and structure of Financial Statements, because their structure will have a significant effect on how the data is modeled.

To make things easier, I am going to use AdventureWorks as a common point of reference. Here is an example Net Income statement:

Adventure Works Cube -> Financial Reporting -> rows = named set -> Summary P&L, columns = Amount

I have rearranged the rows from the named set Summary P&L by clicking and dragging them into something that might look like a financial statement.

A typical financial system will generate financial statements like the one shown above through a series of calculations for each line in the financial statement.

Example:

Line/row 5 (Net Sales) = line 2 (Gross Sales) – line 3 (Returns and Adjustments) – line 4 (Discounts)

In other words, Net Sales is calculated by aggregating its dependents/children (Gross Sales, Returns and Adjustments, and Discounts); the query sketch below shows the same dependency structure in the data.
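To see that dependency structure for yourself, a recursive query over the DimAccount table (a sketch, assuming the AdventureWorksDW2008R2 sample database used in part 2) walks from Net Sales down through its descendants:

WITH gl AS
(
    SELECT AccountCodeAlternateKey, ParentAccountCodeAlternateKey, AccountDescription
    FROM [AdventureWorksDW2008R2].[dbo].[DimAccount]
    WHERE AccountDescription = 'Net Sales'
    UNION ALL
    -- recurse: pull in every account whose parent is already in the set
    SELECT a.AccountCodeAlternateKey, a.ParentAccountCodeAlternateKey, a.AccountDescription
    FROM [AdventureWorksDW2008R2].[dbo].[DimAccount] AS a
    INNER JOIN gl
        ON a.ParentAccountCodeAlternateKey = gl.AccountCodeAlternateKey
)
SELECT AccountDescription, AccountCodeAlternateKey, ParentAccountCodeAlternateKey
FROM gl;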

Note: I know that Returns and Discounts need to be subtracted instead of added, I will discuss ways of dealing with that later.

In fact, if you look at the rest of the P&L, you will realize that all of the line items have a parent-child relationship to each other, with Net Income being the parent of all the other line items. The following image shows exactly what a financial GL Parent Child Hierarchy might look like.

Create another pivot table by using the following:

Adventure Works Cube -> Financial Reporting -> Rows = Accounts, Columns = Amount

This is the full GL Accounts Parent Child Hierarchy, and it is incredibly valuable for providing consistent and flexible financial analysis. The user can drill down to any level, starting from the top (Net Income) to the bottom (individual GLs), and perform all their analysis at any level of granularity. Add to that the analytical attributes like Business Unit, Sales Channel, and Cost Center, and you have an extremely valuable solution capable of analyzing any part of the company horizontally (the GL Hierarchy) and vertically (analytical attributes, which can have their own hierarchies) instantly!!

Note: The full GL Hierarchy contains not only the Net Income GLs but the Balance Sheet and Statistical GLs as well. Net Income is a child of the Balance Sheet; it typically goes under one of the Current Retained Earnings accounts.

Take a look at the DimAccount table in the AdventureWorksDW database:

Notice how the parent of the Balance Sheet member is null. That is because Balance Sheet, being the parent of all GLs, has no parent itself.
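You can confirm this with a quick query (again assuming the AdventureWorksDW2008R2 sample database); only the top of the hierarchy should come back:

SELECT AccountCodeAlternateKey, ParentAccountCodeAlternateKey, AccountDescription
FROM [AdventureWorksDW2008R2].[dbo].[DimAccount]
WHERE ParentAccountCodeAlternateKey IS NULL;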

OK, now that we have explored the potential of the GL Parent Child Hierarchy, the downside is that most financial source systems don’t have a nice GL Hierarchy pre-built for you. You are going to have to get business consensus on the structure of the hierarchy and build it yourself, and you will also need a way to manage it once it is built. You will also have to create pseudo GL accounts that don’t already exist in the source system. These pseudo GLs will be parent GLs, like Balance Sheet or Net Income, which are required to complete the calculations in the financial statements.

For example, what do you do when new GLs are created, and where do they go in the hierarchy?

Fortunately, there is a solution for that through Master Data Management. Microsoft’s Master Data Management solution is called Master Data Services, which handles GL Parent Child Hierarchies very nicely.

Building the GL hierarchy is not a trivial task. The one that I built recently was for a large company and had 180,000-plus GLs. All the GLs (members of the GL Account dimension) had to be put into exactly the correct order for the calculations to be correct.

The next post will discuss the other critical parts of the Financial DW solution.

Things to consider: Parent Child Hierarchies and Cube performance:

See: Analysis Services 2008 R2 Performance Guide

Note: Chris Webb has a good blog post about how to change the order of the named set (Income Statement) http://cwebbbi.wordpress.com/2013/07/31/ordering-of-named-sets-in-excel/

Self Service BI and PowerPivot

October 17, 2011

Marco Russo and Alberto Ferrari have written a book called Microsoft PowerPivot for Excel 2010: Give Your Data Meaning that I finally got around to reading. It is a great book with many good practical examples. If you are not already familiar with PowerPivot and DAX, I highly recommend reading and walking through the examples in the book and then reading the authors’ BLOGs linked above.

This book changed my perception of the value of PowerPivot and its usefulness in the Microsoft BI stack.  I greatly respect Marco and Alberto and as a rule I try to read everything they have written.

Another great resource is the Analysis Services and PowerPivot Team Blog

From the book:

Since we first began thinking about Microsoft PowerPivot for Excel 2010, we have had to revise our thoughts a number of times. First we needed to be persuaded to look at the tool seriously.

A first impression of PowerPivot might mislead you. We made this wrong evaluation too, at the beginning of our acquaintance with PowerPivot. Do not make the same mistake. PowerPivot is powerful and useful, and it defines a new path for the era of self-service Business Intelligence (BI).

Microsoft SQL Server PowerPivot for Excel is a new technology aimed at providing self-service Business Intelligence (BI). PowerPivot is a real revolution inside the world of data analysis because it gives you all the power you need to perform complex analysis of data without requiring the intervention of BI technicians. This tool, an Excel add-in, implements a powerful in-memory database that can organize data, detect interesting relationships, and give you a swift way to browse information. These are some of the most interesting features of PowerPivot.

■ The ability to organize tables for the PivotTable tool in a relational way, freeing the analyst from the need to import data as Excel worksheets before analyzing the data.

■ The availability of a fast, space-saving columnar database that can handle huge amounts of data without the limitations of Excel worksheets.

■ DAX, a powerful programming language that defines complex expressions on top of the relational database. DAX allows you to define surprisingly rich expressions, compared to those that are standard in Excel.

■ The ability to integrate different sources and almost any kind of data, such as information from databases, Excel worksheets, and sources available on the Internet.

■ Amazingly fast in-memory processing of complex queries over the whole database.

Some people might think of PowerPivot as a simple replacement for the PivotTable, some might use it as a rapid development tool for complex BI solutions, and others might believe it is a real replacement for a complex BI solution. PowerPivot is a great tool for exploring the BI world and implementing BI solutions. It is not a replacement for a complex BI solution, such as the ones built on top of Microsoft Analysis Services, but it is much more than a simple replacement for the Excel PivotTable.

PowerPivot fills the gap between an Excel workbook and a complete BI solution, and it has some unique characteristics that make it appealing to both Excel power users and seasoned BI analysts. This book examines all the features of PowerPivot.

Analysis Services 2008 R2 Performance Guide

October 11, 2011

You must read this if you work with SSAS:

Analysis Services 2008 R2 Performance Guide

In fact, I would recommend re-reading it a number of times.

Introduction

This guide contains information about building and tuning Analysis Services in SQL Server 2005, SQL Server 2008, and SQL Server 2008 R2 cubes for the best possible performance. It is primarily aimed at business intelligence (BI) developers who are building a new cube from scratch or optimizing an existing cube for better performance.

The goal of this guide is to provide you with the necessary background to understand design tradeoffs and with techniques and design patterns that will help you achieve the best possible performance of even large cubes.

Cube performance can be divided into two types of workload: query performance and processing performance. Because these workloads are very different, this paper is organized into four main sections.

Design Patterns for Scalable Cubes – No amount of query tuning and optimization can beat the benefits of a well-designed data model. This section contains guidance to help you get the design right the first time. In general, good cube design follows Kimball modeling techniques, and if you avoid some typical design mistakes, you are in very good shape.

Tuning Query Performance – Query performance directly impacts the quality of the end-user experience. As such, it is the primary benchmark used to evaluate the success of an online analytical processing (OLAP) implementation. Analysis Services provides a variety of mechanisms to accelerate query performance, including aggregations, caching, and indexed data retrieval. This section also provides guidance on writing efficient Multidimensional Expressions (MDX) calculation scripts.

Tuning Processing Performance – Processing is the operation that refreshes data in an Analysis Services database. The faster the processing performance, the sooner users can access refreshed data. Analysis Services provides a variety of mechanisms that you can use to influence processing performance, including parallelized processing designs, relational tuning, and an economical processing strategy (for example, incremental versus full refresh versus proactive caching).

Special Considerations – Some features of Analysis Services such as distinct count measures and many-to-many dimensions require more careful attention to the cube design than others. At the end of the paper you will find a section that describes the special techniques you should apply when using these features.

The Start of Something Great

July 20, 2011

The purpose of this BLOG is to archive, as much as I can, the solutions that I have found or developed throughout my journey in Data Warehousing. These are things that I have found useful, so they might be useful for you as well.

Getting started:

Gathering a good knowledge base is a good place to start. While BLOGs are a good way to find information on specific problems, I like to read a few books on a subject before I start diving in. They provide me with good context for understanding the subject area. Listed below, in no particular order, are the books that I have read that have helped me the most.

Kimball:

MDS:

SQL Server:

SSAS:

SSIS:

SSRS:

Read, Develop, Break, Fix and Repeat …..