SQLStarters

SQL Server Data Compression

Older compression tech
Compression has been around for a while…
SQL Server has had the Data Compression feature for a while, ever since the version 2008, so it is hardly a new thing. However as it has been Enterprise Edition feature until SQL Server 2016 Service Pack 1, it is not something you see employed very often. Technically speaking, you could also compress data before 2008 by using NTFS file level compression on a read only data. However with the implementation of SQL Server Data Compression you could now do it inside the database on a page or a row level.

Reasons for Data Compression

If you have a version and/or edition of SQL Server that supports Data Compression, I can’t think of a good reason not to use it. Especially today, when new data is generated in ever-increasing speed. You will not only save precious storage space but can also find other potential benefits as well, such as improved performance, without changing anything in your application. In theory, reading compressed data reduces the need for I/O with the cost of slightly increased use of CPU, so the faster query processing doesn’t come free. In practice however, I’ve also seen both I/O and CPU usage go down for some queries after enabling Data Compression.

It should be noted that Data Compression in SQL Server is highly flexible. You can mix both Row and Page compression and you can do the compression on one table at the time if you wish, to better monitor the impact it might have in your system.

Different types of Data Compression

SQL Server provides few different types of data compression methods, but in this post we will be looking at Page and Row Compressions. As I want to keep the length of this post at least bit limited, I will leave both Backup Compression and ColumnStore indexes to wait for another post. As the names suggest, these two compression methods have slightly different approach to compression and we will start by looking at these differences first.

Page Compression

Page compression is, in short, de-duplication of repeating data. When you enable page compression for a table, it will look at the values and for each repeating value and replace it with a point. Out of these two options, page compression is definitely more resource heavy. It will first apply row compression, then do prefix compression and finally dictionary compression. You are very likely to get higher compression ratio with page than row compression.

There is a good article in MSDN explaining page compression bit more thoroughly.

Row Compression

Row compression is the more lightweight option and will generally offer bit lower compression ratio than with page compression. Row compression isn’t necessarily the most accurate name, as there is no actual compression, but SQL Server rather removes unused bytes in some of the fixed data types. You can find the full list of data types and how they are affected from this MSDN article.

What to compress?

A good way to start determining what to compress is by looking at the largest tables in your database. After you have a list of the largest table, SQL Server does provide you with some tools to help you make the decision. The first one is a Stored Procedure called sp_estimate_data_compression_savings and the name really says it all. Running this Stored Procedure will give a good idea of how much space will you save by compressing the table. Using it is pretty straightforward.

EXEC sp_estimate_data_compression_savings 
    @schema_name = 'schema', 
    @object_name = 'table_name', 
    @index_id = NULL, 
    @partition_number = NULL, 
    @data_compression = N'row/page/none';

If you don’t specify any specific Index ID and use NULL instead, then it will do the estimation for all the indexes. The output will tell you the size with the current compression settings and what the size would be with requested compression setting. Besides compression efficiency, other factor would be to find out how the table is used. A rule of a thumb would be that less frequently updated tables that are read often are good candidates for Page compression. Where as tables that are updated often are perhaps better candidates for Row compression.

In short, consider the following: Compression ratio with both Page and Row, are you doing more Updates than Selects and finally, do you have CPU power to spare for the compression.

How to compress?

Once you know the tables you wish to compress, it is rather easy. You do it by running the following ALTER TABLE command.

ALTER TABLE [table_name] REBULD PARTITION = ALL
WITH
    (DATA_COMPRESSION = PAGE);

And after that you can just wait for the compression to complete. If you want to only compress specific index, you would do it with the ALTER INDEX command.

ALTER INDEX [index_name] ON [table_name] REBUILD PARTITION = ALL
WITH
    (DATA_COMPRESSION = PAGE);

While the compression itself is easy to implement, you should be aware that with ALTER TABLE and ALTER INDEX commands there is a risk of these operations affecting queries being run against the tables or indexes that are being compressed. Thus for very large tables and busy systems it is a good idea to schedule compression to happen during quiet hours or a maintenance break, if possible.

Published by

Leave a Reply

%d bloggers like this: