Archive for July, 2012

SSIS 2012 Flat File Connection Manager

July 15, 2012

There have been several enhancements made and features added to Integration Services with the release of SQL Server 2012.

One of the SSIS components that has been enhanced is the Flat File Connection Manager. The Flat File Connection Manager no longer errors out when columns are missing from the Flat File. See Row2 in example below:

Figure 1

image

Figure 1 is a simple example of a text file with two columns and three rows that tests three different scenarios.  Row1 has values for both columns. Row2 is missing the last column and row3 contains extra columns which are not defined in SSIS package metadata.

Figure 2

image

In Figure 2 I setup a very simple SSIS package that pulls all the rows from the text file in Figure 1. Notice that row2 from the text file is missing the last column and returns an empty/null value. In the older version of SSIS row2 would have produced an error at run time.

The interesting thing about this example is that row3 col2 contains all the  values for columns that were not originally defined in the SSIS metadata. This means that unless you want all those values in the last column you should redirect those rows using a Conditional Split.

SQL 2012 DW Learning Resources

July 8, 2012

There are a lot of things to learn with the release of SQL Server 2012.  While there are many good resources to learn about the new features, TechEd is one the best places to learn from the actual engineers that made the feature or from industry leaders:

TechEd North America

TechEd is Microsoft’s premier technology conference for IT professionals and developers, offering the most comprehensive technical education across Microsoft’s current and soon-to-be- released suite of products, solutions, tools, and services. This June we celebrate the 20th annual TechEd North America in Orlando. Experience four days of hands-on training from Microsoft experts.

The following are some of my favorite sessions:

Killer Real-World PowerPivot Examples -This session is even more jam packed with amazing PowerPivot report examples than last year’s presentation. This is *the* session to attend if you want to get the most out of PowerPivot visualizations. See real-world PowerPivot reports from clients in Education, Retail, Banking, and Telco, and learn a variety of visualization techniques including sparklines, slicers, and charts. You will leave this fun session armed with plenty of ideas for your next personal BI project. Oh, and watch out for flying Koalas. #TEDBI330

SQL Server AlwaysOn: Active Secondaries – One of the most highly requested features in Microsoft SQL Server 2012 is the ability to offload read workloads and backups to secondary replicas in AlwaysOn configurations. In this talk we describe, using demos, how this feature works, how to configure it, and what the advantages and limitations are.

Enterprise Information Management (EIM): Bringing Together SSIS, DQS, and MDS – Enterprise Information Management (EIM) is an industry term for managing your data for data integration, quality, and governance, and is an important part of the Microsoft SQL Server 2012 release. This session revolves around a demo which brings together our EIM functionality in SQL Server 2012 and which tells our Credible, Consistent Data story. We show you how SQL Server Integration Services (SSIS), Data Quality Services (DQS), Master Data Services (MDS) and other Microsoft technologies work together to provide a comprehensive EIM solution. #TEDBI310

SQLCAT: SQL Server HA and DR Design Patterns, Architectures, and Best Practices Using Microsoft SQL Server 2012 AlwaysOnThis session describes how to build practical end-to-end HA and DR solutions for mission critical applications using SQL Server 2012 AlwaysOn technologies. For example, how to combine Failover Cluster Instances with Availability Groups to provide a complete HA and DR solution, and how to use Availability Groups Multiple Secondaries to replace existing database mirroring + log shipping solutions. We share learnings from early customer engagements, and discuss some practical customer examples on why they picked a particular solution. We also discuss upgrade from existing solutions to new "Denali" solutions with minimal downtime. #TEDBI316

Incremental ETL Using CDC for SQL and Oracle with SQL Server Integration Services (SSIS) 2012 Exploring real-world customer scenarios, we take a look at how the new Change Data Capture (CDC) components for SSIS in Microsoft SQL Server 2012 simplify incremental ETL and Data Warehouse loads. We demo the new functionality in SSIS, and show off how the new tools make it easy to set up and configure CDC against both SQL Server and Oracle sources. We present the different CDC configuration options, and demonstrate design patterns that can be used to overcome common problems you’ll face while doing incremental loading of data. If your ETL solution needs to handle incremental loads from data sources that were designed explicitly to support them, then this is a session you can’t afford to miss. #TEDBI322

Optimizing Microsoft SQL Server Analysis Services for Big Data – Big Data is now a reality and more than ever we are asked to analyze greter volumes of data. Find out how to make Analysis Services work harder and more effectively in this new world of Big Data. Specific strategies will be discussed. #TEDBI412

Big Data, Big Deal? Are you ready for the exploding world of big data? Do you know the difference between Hive and Pig? Do you know why MapReduce is being taught in many universities rather than SQL? If not, pay attention because this talk will help get you started in understanding this new world. While sometimes the Hadoop toolkit (which includes HDFS, MapReduce, Hive, Pig, and Sqoop) is used as an alternative to relational database systems such as SQL Server, more frequently customers are using it as a complementary tool. Sometimes it may be used as an ETL tool or to perform an initial analysis of a freshly acquired data set to determine whether or not it is worth loading into the data warehouse, and sometimes to process massive data sets that are too big to even contemplate loading into all but the very largest data warehouses. In addition to covering the basics of the various parts of the Hadoop stack, this talk discusses the strengths and weakness of the Hadoop approach compared to that provided by relational database systems and explores how the two technologies can be used productively in conjunction with one another. #TEDBI209

SQL Server Columnstore Performance TuningColumnstore indexes, and a new, vector-based query execution method called “batch mode,” can speed up query processing in data warehouses and data marts using Microsoft SQL Server 2012 by a breathtaking 10X to 100X. This talk describes how to tune indexes, queries, and load processes to help your customers get the most out of columnstore indexes and query execution. Learn secrets that can help you turn modest performance gains into stunning, orders-of-magnitude improvements, and blow away the competition. We focus on how to work around constraints that limit the use of batch processing when outer joins, unions, and certain other constructs are present in queries. We also describe how to load columnstores effectively, and even perform real-time loads on columnstore indexes that are technically read-only.

Building the Fastest SQL ServersSQL Server speed boils down to how fast we can get data in and out of storage. Sooner or later, we’re going to run out of memory to cache. In this session, we take a lesson from Microsoft’s blazing-fast Fast Track Reference Architectures, look at why they work so well for the wildest queries, and explain how to build our own SQL Servers for quick queries. Microsoft Certified Master Brent Ozar shows you how to test your own servers and compares them to servers he’s worked with in the field.

Building BI Solutions with Microsoft SQL Server PDW AU3SQL Server is releasing AU3 version of Parallel Data Warehouse (PDW) with a new software architecture featuring a cost-based optimizer, stored procedure support, SQL Server security model, as well as the Microsoft TDS protocol that allows native integration with existing Microsoft tools. The new architecture delivers order of magnitude (10x or more) performance improvement compared to the existing AU2 product, and greatly improves connectivity and programmability functionality. This session focuses on the benefits to BI solutions using SQL Server PDW, showing how to integrate PDW AU3 into key BI solutions: Microsoft BI Platform as well as key third-party products –

BISM: Multidimensional vs. TabularWith the introduction of BISM Tabular in Microsoft SQL Server 2012, you have two options for a new project. BISM Multidimensional, formerly known as UDM, is still an option. In this session, see a comparison between the two “engines” available in Analysis Services to learn when they are similar and when they are different. Learn what to choose and why, depending on your requirements.

Optimizing Your BI Semantic Model for Performance and Scale – The release of Microsoft SQL Server 2012 allows standalone tabular models. This session goes into the internal architecture of the engine and describes implications on performance and scale of the tabular models and how they can be tuned effectively.

SSIS Expressions 4000 Character Limit

July 1, 2012

One of the most powerful and most overlooked features of SSIS is the ability have virtually every property of any component be controled by an expression. This allows for SSIS packages to have great flexibility.

However in SSIS 2008 the output of an expression cannot exceed 4000 characters. While in most cases your expression output will not exceed the limit, if you are ever unfortunate enough to run across this limit you will soon realize that there is no easy work-around for it.

See this Microsoft connect: SSIS Maximum string length of 4000 is too restrictive

Testing this restriction is as easy as creating a expression on a variable that returns more than 4000 characters.

Example: REPLICATE(“1”,4001)

image

Note: The above uses BIDS Helper to evaluate the expression.

If you change the expression to  REPLICATE(“1”,4000) then the expression is evaluated successfully.

image

However, this limitation has been removed in SSIS 2012. Just open up SSDT (no more BIDS) and create the following  expression on a variable:  REPLICATE(“1”,40001)

image