Posts Tagged ‘Data Loading performance’

Data Loading performance considerations with Clustered Columnstore indexes

March 16, 2015

The SQL CAT resently released an excellent article called “Data Loading performance considerations with Clustered Columnstore indexes“.  It is useful even for those who, like me, have been working with the Clustered Columnstore Indexes for several years now. The Overview of Loading Data into Columnstore Indexes section mentions that an index REGORG can effect batch size. In my experience, inconsistent  performance can sometimes be caused by a batch being too small i.e. less than the 102,400 row minimum, and so the tuple mover may not be triggered in time leaving some data operations in row mode instead of the more efficient batch mode. For small to medium size fact tables that maybe not always exceed the 102,400 row batch medium it maybe necessary to execute an index REORG which immediately will trigger the tuple mover after the normal data loading process is complete. This disadvantage to his approach is that the batch sizes over time may not be optimal because they were forced to be smaller than the 102,400 row minimum. If performance starts to suffer then implementing a periodic index REDBUILD routine will rebuild the index/table with batches closer to the 1 million rows/batch goal.

Additionally, I still, from time to time, see a lot of confusion on the distinction between an Operational Data Store (ODS) and a Data Warehouse. James Serra recently wrote an excellent BLOG about Operational Data Stores that captures their most important  functions. In my opinion one of the easiest and most effective ways to create an ODS is to simply setup an asynchronous SQL Server AllwaysOn Availability Group secondary.  Several  of the Availability Group database could form the core of an ODS but would most likely be a poor substitute for a Data Warehouse.