Archive for October, 2011

Dimensional Modeling Financial Data in SSAS

October 26, 2011

Part 1 – Parent Child Hierarchies and Pseudo GL Accounts

New: Financial Data Warehouse video

This post is born out of some real-life lessons that I learned while putting together a Financial Data Warehouse based on General Ledger (GL) data, i.e. all the GLs in the foundational financial statements, from Net Income and Balance Sheet to Statistical GLs and Cash Flow statements.

While there are many great sources for dimensional modeling (Kimball’s book The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition) is essential), there are few resources which discuss dimensional modeling for financial data. I am going to assume that the reader already has a firm knowledge base of Dimensional Modeling. If not, see my recommendations for books that will help form that foundation: The Start of Something Great.

To be frank, this could be a book-sized topic in and of itself, but I am going to try to cover all the basics so that the reader can avoid the major pitfalls.

The first step is to gain an understanding of the business processes being modeled (GL creation and maintenance, Period End Close-Out, etc.) and the associated requirements (financial statements: Net Income, Balance Sheet, Cash Flow, etc.). It is absolutely critical to gain a deep understanding of the GL analytical attributes, e.g. Business Units, Cost Centers, and Sales Channels. Don’t be surprised when you find out that most of the critical GL attributes are not maintained in any source system but in Excel spreadsheets!

I am going to dispense with all the gory details that can be encountered in this process and jump right into the nature and structure of Financial Statements, because their structure will have a significant effect on how the data is modeled.

To make things easier, I am going to use AdventureWorks as a common point of reference. Here is an example Net Income Statement:

Adventure Works Cube -> Financial Reporting -> rows = named set -> Summary P&L, columns = Amount

I have rearranged the rows from the named set Summary P&L by clicking and dragging them into something that might look like a financial statement.

A typical financial system will generate financial statements like the one shown above through a series of calculations for each line in the financial statement.

Example:

Line/row 5 (Net Sales) = line 2 (Gross Sales) – line 3 (Returns and Adjustments) – line 4 (Discounts)

In other words, Net Sales is calculated by aggregating its dependents/children (Gross Sales, Returns and Adjustments, and Discounts).

Note: I know that Returns and Discounts need to be subtracted instead of added; I will discuss ways of dealing with that later.
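To make the roll-up concrete, here is a rough T-SQL sketch against AdventureWorksDW that sums a parent line from its direct children, flipping the sign for accounts whose unary Operator is '-'. The table and column names (FactFinance, DimAccount, Operator) are as I recall them from AdventureWorksDW, the 'Net Sales' member name is assumed, and the CASE expression is just one way of handling the sign issue mentioned above:

SELECT
    p.AccountDescription AS ParentAccount,
    SUM(CASE a.Operator WHEN '-' THEN -f.Amount ELSE f.Amount END) AS Amount
FROM dbo.FactFinance AS f
JOIN dbo.DimAccount  AS a ON a.AccountKey = f.AccountKey          -- child accounts
JOIN dbo.DimAccount  AS p ON p.AccountKey = a.ParentAccountKey    -- their parent line
WHERE p.AccountDescription = 'Net Sales'
GROUP BY p.AccountDescription;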

In fact, if you look at the rest of the P&L you will realize that all of the line items have a parent-child relationship to each other, with Net Income being the parent of all the other line items. The following image shows exactly what a Financial GL Parent Child Hierarchy might look like.

Create another pivot table by using the following:

Adventure Works Cube -> Financial Reporting -> Rows = Accounts, Columns = Amount

This is the full GL Accounts Parent Child Hierarchy, and it is incredibly valuable for providing consistent and flexible financial analysis. The user can drill down to any level, starting from the top (Net Income) to the bottom (individual GLs), and perform all their analysis at any level of granularity. Add to that the analytical attributes like Business Unit, Sales Channel, and Cost Center and you have an extremely valuable solution capable of analyzing any part of the company horizontally (GL Hierarchy) and vertically (analytical attributes, which can have their own hierarchies) instantly!

Note: The full GL Hierarchy contains not only the Net Income GLs but the Balance Sheet and Statistical GLs as well. Net Income is a child of the Balance Sheet; it typically goes under one of the Current Retained Earnings accounts.

Take a look at the DimAccount table in the AdventureWorksDW database:

Notice how the parent of the Balance Sheet member is null. That is because Balance Sheet is the parent of all GLs, so it has no parent of its own.
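If you want to see the whole hierarchy laid out from that root, a recursive CTE does the job. A minimal sketch, assuming the AdventureWorksDW DimAccount column names (AccountKey, ParentAccountKey, AccountDescription):

WITH AccountTree AS
(
    -- anchor: the root member, which has no parent (the Balance Sheet)
    SELECT AccountKey, ParentAccountKey, AccountDescription, 0 AS AccountLevel
    FROM dbo.DimAccount
    WHERE ParentAccountKey IS NULL

    UNION ALL

    -- recursive step: add each member's children, one level at a time
    SELECT c.AccountKey, c.ParentAccountKey, c.AccountDescription, t.AccountLevel + 1
    FROM dbo.DimAccount AS c
    INNER JOIN AccountTree AS t ON c.ParentAccountKey = t.AccountKey
)
SELECT REPLICATE('    ', AccountLevel) + AccountDescription AS Account
FROM AccountTree
ORDER BY AccountLevel, AccountDescription;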

OK, now that we have explored the potential of the GL Parent Child Hierarchy, the downside is that most financial source systems don’t have a nice GL Hierarchy pre-built for you. You are going to have to get business consensus on the structure of the hierarchy and build it yourself, and you will also need a way to manage it once it is built. You will also have to create pseudo GL accounts that don’t already exist in the source system. These pseudo GLs will be parent GLs, like Balance Sheet or Net Income, which are required to complete the calculations in the financial statements.
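As a loose illustration, seeding those pseudo parent GLs in the warehouse might look something like this; the table and column names are hypothetical and not taken from any particular source system:

-- Pseudo parent GLs exist only in the warehouse, never in the source GL master.
INSERT INTO dbo.DimGLAccount (AccountCode, ParentAccountCode, AccountDescription)
VALUES ('BS-ROOT', NULL,      'Balance Sheet'),   -- top of the hierarchy, no parent
       ('NI-ROOT', 'BS-ROOT', 'Net Income');      -- in practice rolls up under a Current Retained Earnings account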

For example, what do you do when new GLs are created, and where do they go in the hierarchy?

Fortunately, there is a solution for that through Master Data Management. Microsoft’s Master Data Management solution is called Master Data Services, and it handles GL Parent Child Hierarchies very nicely.

Building the GL hierarchy is not a trivial task. The one that I built recently was for a larger company and had 180,000-plus GLs. All the GLs (members of the GL Account dimension) had to be put into exactly the correct order for the calculations to be correct.

The next post will discuss the other critical parts of the Financial DW Solution.

Things to consider: Parent Child Hierarchies and Cube performance:

See: Analysis Services 2008 R2 Performance Guide

Note: Chris Webb has a good blog post about how to change the order of the named set (Income Statement): http://cwebbbi.wordpress.com/2013/07/31/ordering-of-named-sets-in-excel/


Self Service BI and PowerPivot

October 17, 2011

Marco Russo and Alberto Ferrari have written a book called Microsoft PowerPivot for Excel 2010: Give Your Data Meaning that I finally got around to reading. It is a great book with many good practical examples. If you are not already familiar with PowerPivot and DAX, I highly recommend reading and walking through the examples in the book and then reading the authors’ blogs linked above.

This book changed my perception of the value of PowerPivot and its usefulness in the Microsoft BI stack.  I greatly respect Marco and Alberto and as a rule I try to read everything they have written.

Another great resource is the Analysis Services and PowerPivot Team Blog

From the book:

Since we first began thinking about Microsoft PowerPivot for Excel 2010, we have had to revise our thoughts a number of times. First we needed to be persuaded to look at the tool seriously.

A first impression of PowerPivot might mislead you. We made this wrong evaluation too, at the beginning of our acquaintance with PowerPivot. Do not make the same mistake. PowerPivot is powerful and useful, and it defines a new path for the era of self-service Business Intelligence (BI).

Microsoft SQL Server PowerPivot for Excel is a new technology aimed at providing self-service Business Intelligence (BI). PowerPivot is a real revolution inside the world of data analysis because it gives you all the power you need to perform complex analysis of data without requiring the intervention of BI technicians. This tool, an Excel add-in, implements a powerful in-memory database that can organize data, detect interesting relationships, and give you a swift way to browse information. These are some of the most interesting features of PowerPivot.

■ The ability to organize tables for the PivotTable tool in a relational way, freeing the analyst from the need to import data as Excel worksheets before analyzing the data.

■ The availability of a fast, space-saving columnar database that can handle huge amounts of data without the limitations of Excel worksheets.

■ DAX, a powerful programming language that defines complex expressions on top of the relational database. DAX allows you to define surprisingly rich expressions, compared to those that are standard in Excel.

■ The ability to integrate different sources and almost any kind of data, such as information from databases, Excel worksheets, and sources available on the Internet.

■ Amazingly fast in-memory processing of complex queries over the whole database.

Some people might think of PowerPivot as a simple replacement for the PivotTable, some might use it as a rapid development tool for complex BI solutions, and others might believe it is a real replacement for a complex BI solution. PowerPivot is a great tool for exploring the BI world and implementing BI solutions. It is not a replacement for a complex BI solution, such as the ones built on top of Microsoft Analysis Services, but it is much more than a simple replacement for the Excel PivotTable.

PowerPivot fills the gap between an Excel workbook and a complete BI solution, and it has some unique characteristics that make it appealing to both Excel power users and seasoned BI analysts. This book examines all the features of PowerPivot.

Event Driven SSIS with Synchronous Processing and Dynamic Subscriptions

October 12, 2011

I know the title of this post is a little cryptic and something that only a techie would like, but let me give you the real-world requirements so that it will be a little more concrete.

Solution requirements:

  1. User uploads a file to SharePoint 2010
  2. On file upload, an Event Handler executes an SSIS package
  3. The SSIS package loads aggregate data into a database for SSRS Reports
  4. Upon completion of the SSIS package, an SSRS subscription executes SSRS reports (in SharePoint Integrated mode) which get the new data from the database
  5. The subscription delivers the reports (email, shared folder) and the Event Handler continues with some other tasks

Sounds simple enough, right? But there are a few challenges:

  1. SSIS is not Event Driven

This is by design, because in most cases SSIS is used to ETL large amounts of data. Executing SSIS packages can have a big impact on IT environment resources (think of the nightly ETL process for an Enterprise Data Warehouse).

However, there are several ways to execute an SSIS package outside of its schedule. DTEXEC can do it, but for security reasons in this situation I prefer to use the sp_start_job system stored procedure.
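For example, if the SSIS package is wrapped in a SQL Server Agent job, the Event Handler can kick it off like this (the job name is made up):

EXEC msdb.dbo.sp_start_job @job_name = N'Load Report Staging - SSIS';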

  2. sp_start_job permissions

Make sure that the user/service account that executes sp_start_job has the correct permissions:

By default, members of the sysadmin fixed server role can execute this stored procedure. Other users must be granted one of the following SQL Server Agent fixed database roles in the msdb database:

  • SQLAgentUserRole
  • SQLAgentReaderRole
  • SQLAgentOperatorRole

For details about the permissions of these roles, see SQL Server Agent Fixed Database Roles.

Members of SQLAgentUserRole and SQLAgentReaderRole can only start jobs that they own. Members of SQLAgentOperatorRole can start all local jobs including those that are owned by other users. Members of sysadmin can start all local and multiserver jobs.
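For instance, if the account that calls sp_start_job is not a sysadmin, granting it one of these roles in msdb might look like this (the account name is a placeholder, and the login must already be mapped to a user in msdb):

USE msdb;
GO
-- Placeholder account name; grant the least-privileged role that meets your needs.
EXEC sp_addrolemember @rolename = N'SQLAgentOperatorRole',
                      @membername = N'DOMAIN\SharePointAppPool';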

  3. sp_start_job is asynchronous

Although it is not officially documented by Microsoft, sp_start_job is asynchronous, which means that when sp_start_job executes it immediately returns that the SSIS package was executed successfully. Per the requirements above, this will cause the subscription to generate the reports before the SSIS package has staged the new data into the database.

sp_start_job can be made synchronous by adding code to the stored procedure that calls sp_start_job so that it waits for the SQL Agent job to finish before returning.
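A rough sketch of that idea, polling msdb.dbo.sysjobactivity until the current run finishes (the procedure name is illustrative, and a production version would also want a timeout and a check of the job outcome):

CREATE PROCEDURE dbo.usp_StartJobAndWait
    @job_name sysname
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @job_id uniqueidentifier;
    SELECT @job_id = job_id FROM msdb.dbo.sysjobs WHERE name = @job_name;

    EXEC msdb.dbo.sp_start_job @job_name = @job_name;

    -- Give the Agent a moment to register the run, then poll until it stops.
    WAITFOR DELAY '00:00:05';
    WHILE EXISTS (SELECT 1
                  FROM msdb.dbo.sysjobactivity
                  WHERE job_id = @job_id
                    AND start_execution_date IS NOT NULL
                    AND stop_execution_date IS NULL)
    BEGIN
        WAITFOR DELAY '00:00:05';
    END
END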

  4. Dynamic SSRS Subscription

I am not referring to Data Driven Subscriptions. Now that the SSIS package can be executed synchronously, I can be sure that the reports have fresh data and that the Event Handler can continue with some other tasks that are dependent on the SSIS package.

I could simply create an Execute SQL task in the SSIS package which uses sp_start_job to run the subscription, but there are a few issues with this.

When a subscription is created for the SSRS reports (the reports in this case are deployed to SharePoint), the SQL Agent job which executes the subscription gets a GUID for its name.

I don’t want to have to hard-code the job GUID in my SSIS package. That would also add more steps to my deployment to production, and if the subscription is ever modified it could cause the whole process to fail.

If you open up the step which executes the subscription you will see the following:
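Roughly, the step body looks like this (the GUID below is just a placeholder for the real subscription ID):

exec [ReportServer].dbo.AddEvent @EventType='TimedSubscription', @EventData='00000000-0000-0000-0000-000000000000'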

The exec [ReportServer].dbo.AddEvent call schedules the subscription to run, but getting the exact subscription for the report requires a little more work.

Notice that the @EventData = … is the GUID for the subscription.

But the subscription GUID needs to be associated with the correct report. To do that, use the following query in the ReportServer DB:

SELECT *
FROM [ReportServer].[dbo].[Subscriptions]

In the Results pane, expand the ExtensionSettings field and you will see something like the following:

<ParameterValues><ParameterValue><Name>FILENAME</Name><Value>Weekly_Summary_Analytics</Value></ParameterValue>…

Now the file name from the ExtensionSettings can be associated with the Subscription GUID, and the correct subscription can be executed without hard-coding GUIDs.

Just add the following code to an Execute SQL task that is connected to the ReportServer DB.

DECLARE @GUID uniqueidentifier =
   (SELECT TOP 1 [SubscriptionID]
    FROM [ReportServer].[dbo].[Subscriptions]
    WHERE ExtensionSettings LIKE '<ParameterValues><ParameterValue><Name>FILENAME</Name><Value>Weekly_Summary_Analytics</Value></ParameterValue>%')

EXEC [ReportServer].dbo.AddEvent @EventType = 'TimedSubscription', @EventData = @GUID

This assumes that you have unique file names / SSRS report names.

Analysis Services 2008 R2 Performance Guide

October 11, 2011

You must read this if you work with SSAS:

Analysis Services 2008 R2 Performance Guide

In fact, I would recommend re-reading it a number of times.

Introduction

This guide contains information about building and tuning Analysis Services in SQL Server 2005, SQL Server 2008, and SQL Server 2008 R2 cubes for the best possible performance. It is primarily aimed at business intelligence (BI) developers who are building a new cube from scratch or optimizing an existing cube for better performance.

The goal of this guide is to provide you with the necessary background to understand design tradeoffs and with techniques and design patterns that will help you achieve the best possible performance of even large cubes.

Cube performance can be divided into two types of workload: query performance and processing performance. Because these workloads are very different, this paper is organized into four main sections.

Design Patterns for Scalable Cubes – No amount of query tuning and optimization can beat the benefits of a well-designed data model. This section contains guidance to help you get the design right the first time. In general, good cube design follows Kimball modeling techniques, and if you avoid some typical design mistakes, you are in very good shape.

Tuning Query Performance – Query performance directly impacts the quality of the end-user experience. As such, it is the primary benchmark used to evaluate the success of an online analytical processing (OLAP) implementation. Analysis Services provides a variety of mechanisms to accelerate query performance, including aggregations, caching, and indexed data retrieval. This section also provides guidance on writing efficient Multidimensional Expressions (MDX) calculation scripts.

Tuning Processing Performance – Processing is the operation that refreshes data in an Analysis Services database. The faster the processing performance, the sooner users can access refreshed data. Analysis Services provides a variety of mechanisms that you can use to influence processing performance, including parallelized processing designs, relational tuning, and an economical processing strategy (for example, incremental versus full refresh versus proactive caching).

Special Considerations – Some features of Analysis Services such as distinct count measures and many-to-many dimensions require more careful attention to the cube design than others. At the end of the paper you will find a section that describes the special techniques you should apply when using these features.

Microsoft Data Warehouse Offerings

October 6, 2011

James Serra has recently posted a lot of good information about Microsoft’s Data Warehouse offerings. He has also provided a lot of good links to more information.

Microsoft SQL Server Reference Architecture and Appliances

Microsoft SQL Server Parallel Data Warehouse (PDW) Explained

The article Implementing a Microsoft SQL Server Parallel Data Warehouse Using the Kimball Approach is also a great way to get a little deeper into Microsoft’s Parallel Data Warehouse.

This white paper explores how the Kimball approach to architecting and building a data warehouse/business intelligence (DW/BI) system works with Microsoft’s Parallel Data Warehouse, and how you would incorporate this new product as the cornerstone of your DW/BI system. For readers who are not familiar with the Kimball approach, we begin with a brief overview of the approach and its key principles. We then explore the Parallel Data Warehouse (PDW) system architecture and discuss its alignment with the Kimball approach. In the last section, we identify key best practices and pitfalls to avoid when building or migrating a large data warehouse to a Microsoft SQL Server PDW system. This includes a look at how Parallel Data Warehouse can work with other Microsoft offerings such as SQL Server 2008 R2, the Microsoft Fast Track Reference Architecture for Data Warehouse, and the new Business Data Warehouse to provide a complete solution for enterprise-class data warehousing.

I have even proposed my own ideas to help with SSIS ETL Scalability.

As a side note, I was recently drawn into a discussion of Kimball vs. Inmon. In my experience, most people who say they understand both really don’t. Here is a good article from Kimball which explains some of the differences.

The Kimball bus architecture and the Corporate Information Factory: What are the fundamental differences?

Facts and Fables About Dimensional Modeling

Honestly, I believe that reading all of Kimball’s books should be a requirement for anyone who is called a Microsoft BI Professional.

Transforming Source Keys to Real Keys

October 3, 2011

Thomas Kejser, a SQL Server Regional Director, has recently written several great posts about

Transforming Source Keys to Real Keys – Part 1

Transforming Source Keys to Real Keys – Part 2

fixing Zombie and Dolly keys!

It is a good approach to getting started with Master Data Management.

His Physically Placing the Maps in the Architecture post brings it all together.