
Data Compression (SQL Server 2008, 2008R2 Enterprise Edition)

Data Compression is a “new” feature, introduced in SQL Server 2008 and carried onward to SQL Server 2008R2. It’s available only in Enterprise Edition, though.

There are tons of posts about it published on the web already, so I will keep following my old self-imposed policy: no unnecessary repetition of already well-thought-out and well-written material.

These posts / articles explain SQL Server 2008 Data Compression (one of them also has a few words about Backup Compression):

By going through these articles, you should get a more or less comprehensive grasp of what Data Compression in SQL Server 2008/2008R2 is.

The only thing none of the above articles mentions is why and when you would want to consider data compression, and what for. Here it is. Long story short, as you’ve figured out by now, by applying data compression you save on storage space but pay in CPU and thus in processing speed, i.e. performance. So, which scenarios make this trade-off of “winning storage space vs losing performance” ideal for a data compression implementation? The answer is:

  • landing area – no “pros and cons” to weigh, i.e. just go with it
  • staging area – the same as above
  • reporting database (or simply put – a data warehouse)

Well, in the case of a reporting database, it’s not as straightforward as with landing and staging. Instead, it’s a game of requirements and characteristics: how many people will be using it simultaneously, the complexity of the reports, whether there are going to be some “heavy” reports, which tables or views (which are tables at the bottom line anyway) they are going to be selecting from, online vs batch reports, etc. There is no single universal recipe here, yet after careful examination and enough thought, you can come up with a data compression plan which, even if not comprehensive, i.e. not covering all the tables, can still save you tons of gigabytes of storage with either no impact or a very limited impact on performance and execution times.

The scenarios are clear, but you might still wonder why to use data compression at all – what reasons would all of a sudden make you do something to your database that you have probably never done before, namely compress data, with the accompanying risk of performance degradation. Well, here it is.

Although disk storage is getting cheaper and cheaper nowadays, don’t count on that too much. Unfortunately, there is a good number of factors, other than purely financial ones, standing in the way of obtaining (read “buying”) more storage when needed. I don’t want to go down that route, it’s a “don’t get me started on this” kind of thing, so you will have to take my word for it (or not). No matter how ridiculous it may sound, this happens more often than you have probably thought or encountered so far in your career. And this is exactly where SQL Server Data Compression comes in not only handy, but sometimes as a savior which makes all the difference between “yes, we can start with it now” and “no, we can’t do it until new storage is acquired and in place”.

You may think, “what the heck, it’s not my job as a database administrator to care about storage, purchases and other financial or you-name-it matters”. Technically you’re right, it is not. But in real life, the game called “blame it all on the DBA” is always going on to one extent or another. Don’t ask me why, because I would love to know the answer to that question myself. But that’s how it is, one of those real-life realities you would never hear about sitting at your desk in school, until you get out and start rolling. So I do encourage you to use Data Compression whenever you find it applicable. It does stand out from the rest of the new features of SQL Server 2008. I’m sure we all love backup compression, and data compression apparently falls way behind it in popularity and acceptance. That’s where we get it wrong: it is no less important and useful than backup compression. Not less at all.

Now let’s go over a demo. The numbers will speak for themselves, and I hope they surprise you as much as they surprised me. If your amazement is even half of mine, that too will be an awesome thing!

Let’s see the example. I have a non-compressed table holding just a subset of records pulled from one of the Oracle BI systems using SSMS’ Data Import. So, the table is a heap, has no NIXes (non-clustered indexes), contains 18,043,702 records and “weighs” roughly 16Gb, which makes for an average record length of roughly 913 bytes.
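As a quick sanity check (not part of the original write-up), the same numbers can be pulled with sp_spaceused or the partition-stats DMV; the schema and table names below are the ones used in the compression commands further down:

-- Size and row count of the heap
EXECUTE sp_spaceused 'DATAW.TRAN_STATS_SRI_DLY';

-- Or via the DMV: index_id = 0 confirms it is a heap with no NIXes on top
SELECT index_id
, SUM(row_count) AS row_count
, SUM(used_page_count)*8/1024 AS used_Mb
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('DATAW.TRAN_STATS_SRI_DLY')
GROUP BY index_id;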

Now let’s get our estimates of how much space the table would take once compressed:

  • ROW Compression:

EXECUTE sp_estimate_data_compression_savings 'DATAW', 'TRAN_STATS_SRI_DLY', NULL, NULL, 'ROW';

  • PAGE Compression:

EXECUTE sp_estimate_data_compression_savings 'DATAW', 'TRAN_STATS_SRI_DLY', NULL, NULL, 'PAGE';

PAGE compression is usually more efficient than ROW compression because it adds prefix and dictionary compression on top of what ROW compression does, and in general, IMHO, it should be the default data compression method to go with. Here the difference is not huge, but PAGE still saves around 230Mb over ROW, which accounts for about 16% of the final ROW-compressed table size (putting the estimates at roughly 1.4Gb for ROW and 1.2Gb for PAGE). The ratio of non-compressed to page-compressed size is 13(!)

In other words, after PAGE compression is applied, the table should take about 13 times less space than it does now. Quite something, isn’t it?

(this script lets you check all the table sizes and their respective compression estimates)
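That script is not reproduced here, but a minimal sketch of the same idea (my own, with hypothetical temp-table column names) could look like this: loop over the user tables and collect the PAGE estimates from sp_estimate_data_compression_savings into a temp table, then sort by current size:

IF OBJECT_ID('tempdb..#estimates') IS NOT NULL DROP TABLE #estimates;

-- Columns mirror the proc's result set (current vs requested sizes are in KB)
CREATE TABLE #estimates
( object_name sysname
, schema_name sysname
, index_id int
, partition_number int
, size_current_KB bigint
, size_requested_KB bigint
, sample_size_current_KB bigint
, sample_size_requested_KB bigint
);

DECLARE @schema sysname, @table sysname;

DECLARE table_cursor CURSOR FAST_FORWARD FOR
SELECT s.name, t.name
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @schema, @table;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Estimate PAGE compression for every index/partition of this table
    INSERT INTO #estimates
    EXECUTE sp_estimate_data_compression_savings @schema, @table, NULL, NULL, 'PAGE';

    FETCH NEXT FROM table_cursor INTO @schema, @table;
END;

CLOSE table_cursor;
DEALLOCATE table_cursor;

SELECT * FROM #estimates
ORDER BY size_current_KB DESC;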

Let’s do it now!

Let’s first look at the space taken by the database single datafile:

SELECT DB_NAME(database_id) AS database_name
, name AS logical_DATAFILE_name
, physical_name AS physical_DATAFILE_name
, size/128 AS size_Mb
, max_size/128 AS max_size_Mb
, ROUND(size*8.0,0) AS size_Kb
, ROUND(max_size*8.0,0) AS max_size_Kb
FROM sys.master_files
WHERE type_desc = 'ROWS' AND DB_NAME(database_id) = '[DB_NAME]'
ORDER BY DB_NAME(database_id)

This is obviously not the only table in the entire database, but it is the largest one, alone taking 16 out of 27 gigabytes. Now, let’s run the PAGE compression.

ALTER TABLE DATAW.TRAN_STATS_SRI_DLY REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);

Wait until it completes, then query the table size again:
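The screenshot with the result isn’t reproduced here, but the same sp_spaceused call as before is enough to see the new, compressed size:

-- Re-check the table size after the PAGE-compressed rebuild
EXECUTE sp_spaceused 'DATAW.TRAN_STATS_SRI_DLY';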

Voilà, 1Gb! But that’s not everything. Now we have to reclaim all that freed space from the datafile by shrinking it:

DBCC SHRINKFILE(N'[DATAFILE_LOGICAL_FILENAME]', 1);

(and give it some time to complete)
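If you want to double-check how much free space there is inside the datafile for the shrink to release, a quick check (run in the context of the database itself rather than master, hence sys.database_files) could be:

SELECT name AS logical_DATAFILE_name
, size/128 AS size_Mb
, FILEPROPERTY(name, 'SpaceUsed')/128 AS used_Mb
, (size - FILEPROPERTY(name, 'SpaceUsed'))/128 AS free_Mb
FROM sys.database_files
WHERE type_desc = 'ROWS';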

Now, let’s see what the datafile size is:
SELECT DB_NAME(database_id) AS database_name
, name AS logical_DATAFILE_name
, physical_name AS physical_DATAFILE_name
, size/128 AS size_Mb
, max_size/128 AS max_size_Mb
, ROUND(size*8.0,0) AS size_Kb
, ROUND(max_size*8.0,0) AS max_size_Kb
FROM sys.master_files
WHERE type_desc = 'ROWS' AND DB_NAME(database_id) = '[DB_NAME]'
ORDER BY DB_NAME(database_id)

Nice! It went all the way back from 27+Gb to slightly less than 13Gb. Almost a twofold reduction, from compressing the data in just the single biggest table in the database.

For development, testing and QA environments, this can really make the difference when it comes down to available storage.

To conclude.

In my strong opinion, using data compression should become a standard practice for any landing / staging area. A “must-use”, as it saves you lots and lots of storage for more databases, more projects or just other purposes, while its downsides, such as the risk of performance degradation, are minimal if data compression is applied carefully enough and with some thought. As simple as that.

  1. Ben
    June 13, 2013 at 14:14

    Keep in mind that increased processor utilization does not necessarily mean slower. In fact, most of the time compression leads to faster performance, despite the added processor utilization. It’s important to remember that I/O (reading from and writing to disk) is generally the slowest part of any process. And not just a little bit slower – it’s a full order of magnitude slower. So unless your processor is already maxed out, adding 5%-10% extra utilization actually speeds things up. Your processor is the fastest part. By reading and writing 50% less data, you should be moving much faster.

