SQL Server Database Performance Tuning: To ETL or To ELT ?

ELT v.s. ELT has been something that has been talked about a lot over the years but I still find that it is still an esoteric subject to most people. One reason for this might be because, in the beginning days of SQL Server SSIS, there were some mixed signals which has lead to some confusion that still persists to today.

Just to make sure that we are all on the same page, ETL , for the purposes of this post, means that you are using SSIS  to first Extract data from your source system(s)and then SSIS to Transform the data into a dimensional format using all those pretty transforms in  SSIS Toolbox. The last step is then to Load the data into the target dimension or fact tables.

image

Figure 1: Pretty Transforms

They are even prettier in SSDT. Awwwe …they are so cute!

ELT then means first Extracting the data from your sources and then Loading it to a staging location on the SQL Server data warehouse server. The data is then Transformed into a dimensional format not with SSIS but with the DATABASE ENGINE. In other words you won’t get to use many of those pretty transforms in the Data Flow – SSIS toolbox. This is usually accomplished by using  the Control Flow – Execute T-SQL Statement to execute store procedures or SQL commands.

image

Figure 2: ELT SSIS Package

As shown in Figure 2 it is best to use the T-SQL Merge command to dimensionalize the data because it preforms much faster than separate update, insert, and delete commands.

Which is Better ?

So now the question becomes, which is better for data warehousing loads ETL or ELT? The answer is… ELT !!!! or in other words ALWAYS ELT !!! The reason for this is strait forward, the database engine can preform asynchronous transformations much more efficiently than SSIS !

The SQL Server Database Engine has been around much more longer than SSIS and has been tuned a lot over the years to preform things like sorts and joins (which are need to dimensionalize data) efficiently. That is kind of the whole point of the Query Engine and Query Optimizer!

Bottom Line

Let the Database Engine do what it is best at namely TRANSFORMATIONS and let SSIS do what it does best  namely data pipeline and workflow management. The right tool for the right job!

SSIS Transforms

In the world of SSIS not all transforms are created equal in terms of performance. If you must use them, then you should be aware of which ones will cause the most performance issues.

image

Figure 3: Data Flow Transformations

The worst offenders starting on the right  are the fully-blocking asynchronous transformations followed by the partially-blocking ones which are not as bad but they can still cause issues. The synchronous data flow transformation preform good but you still need to watch out for memory pressure as the datasets increase in size.

Last note: NEVER use the OLE DB Command for a SQL Server data warehouse batch load process it is pure evil because  it does DML commands on a row-by-row basis. Good luck loading a lot rows with that!

Advertisements

3 Responses to “SQL Server Database Performance Tuning: To ETL or To ELT ?”

  1. Difference between ETL and ELT | James Serra's Blog Says:

    […] To ETL or To ELT ? […]

  2. Ralph D. Wilson II Says:

    Garrett,

    First, let me thank you for your blog . . . it provides great insights (and I don’t just say that because this one validated my personal preference for accomplishing data loads ;-).

    Second, a request for clarification/expansion of your statement:
    “Last note: NEVER use the OLE DB Command for a datawahrehouse batch load process it is pure evil because as it does DML commands on a row-by-row basis. ”

    I have to admit that, after having briefly looked at the description of the “OLE DB Command”, I have _never_ used it (or understood the need for it); however, are there other transforms that resuklt in rows being processed in a RBAR manner? For instance, it appears as though the use of Lookup, Data Conversion, Derived Column, and several other tasks that are not uncommonly used within Data Flow Tasks tend to result in a RBAR processing (or, at the very kleast, to significantly slow down the processing of the data).

    I have found that several of these can often be accomplished within SQL statatements with a bit of creativivity in using JOINs and CTE’s and the results seem to be derived more quickly.

    • Garrett Edmondson Says:

      Ralph –
      I think that you are on the right track. At a high level SSIS processes data in batches/ “buffers” however each individual Data Flow Tasks can process each buffer differently. Synchronous tasks are like the T-SQL CAST. They don’t need to spool rows in memory. The SORT task is asynchronous and fully blocking because it has to spool all the rows into memory in order to perform the sort. The OLE BE COMMAND accepts in coming rows from the data flow and executes an individual DML command for every row.
      Matt Masson has a great presentation that talks about all this in detail: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2010/BIE13-INT . BTW He is a product manager for SSIS and personally developed some of the Data Flow Tasks.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: