Archive for September, 2012

Just another example of a bad reply and the real value of professional certification

September 28, 2012

I don’t like forum or chat flame wars of any sort. As a matter of fact, I feel disgusted by the people who participate in that kind of activity. Besides, there are no real “winners” and “losers” in it – both parties prove themselves to be morons. But sometimes… Here is a good example. It’s not really a flame war, it just looks like one, and one side is totally right. Here I’m completely on Tara Kizer’s side.

Please, do not be like that guy – neither in forums nor in life, as much as you can help it. A bad example indeed. And hey, his signature reads “…MCP, MCTS”. Another proof of what all that so-called “professional certification” is worth. Such a scam…


Data Compression (SQL Server 2008, 2008R2 Enterprise Edition)

September 24, 2012

Data Compression is a “new” feature, introduced in SQL Server 2008 and carried forward to SQL Server 2008R2. It’s available only in Enterprise Edition, though.

There are tons of posts published on the web already, so I will keep following my old self-imposed policy: no unnecessary repetition of well-thought-out, well-written stuff.

These posts / articles explain SQL Server 2008 Data Compression (one of them has a word about Backup Compression as well):

By going through these articles, you should get a more or less comprehensive grasp of what Data Compression in SQL Server 2008/2008R2 is.

The one thing none of the above articles mentions is why and when you would want to consider data compression, and what for. Here it is. Long story short, as you’ve figured out by now, by applying data compression you’re saving on storage space but losing on CPU, and thus on processing speed, i.e. performance. So, which scenarios make this trade-off of “storage space won vs performance lost” ideal for a data compression implementation? The answer is:

  • landing area – no “pros and cons”, i.e. just go with it
  • staging area – the same as above
  • reporting database (or simply put – a data warehouse)

Well, in the case of a reporting database, it’s not as straightforward as with landing and staging areas. Instead, it’s a game of requirements and characteristics: how many people will be using it simultaneously, the complexity of the reports, whether there are going to be some “heavy” reports, which tables or views (which are tables at the bottom line anyway) they are going to be selecting from, online vs batch reports, etc. There is no single universal recipe here, yet after careful examination and enough thought you can come up with a data compression plan which, even if not comprehensive, i.e. not covering all the tables, can still save you tons of Gb of storage with either no or very limited impact on performance and execution timings.
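
For a starting point, here is a minimal sketch of my own (not from the articles above): large tables with little update activity are usually the safest PAGE compression candidates in a reporting database. It assumes the SQL Server 2008 DMVs sys.dm_db_partition_stats and sys.dm_db_index_usage_stats, and keep in mind the usage stats reset on instance restart:

-- shortlist compression candidates: big tables, few updates
SELECT s.name AS [schema]
, t.name AS table_name
, (SELECT SUM(ps.reserved_page_count) * 8 / 1024
   FROM sys.dm_db_partition_stats ps
   WHERE ps.object_id = t.object_id) AS reserved_Mb
, (SELECT SUM(us.user_updates)
   FROM sys.dm_db_index_usage_stats us
   WHERE us.object_id = t.object_id AND us.database_id = DB_ID()) AS user_updates
FROM sys.tables t
INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
ORDER BY reserved_Mb DESC;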

The scenarios are clear, but you might still wonder why to use data compression at all – what reasons would all of a sudden make you do something to your database that you have probably never done before, compress data, with the risk of performance degradation attached. Well, here it is.

Although disk storage is getting cheaper and cheaper nowadays, don’t count on that very much. Unfortunately, there is a good number of factors, other than just financial ones, standing in the way of obtaining (read “buying”) more storage when needed. I don’t want to go down that route – it’s a “don’t get me started on this” kind of thing – so you will have to take my word for it (or not). No matter how ridiculous it may sound, this is how it is, more often than you have probably thought or encountered so far in your career. And this is exactly where SQL Server Data Compression comes in not only handy, but sometimes as a savior that makes all the difference between “yes, we can start with it now” and “no, we can’t do it until new storage is acquired and in place”.

You may think, “what the heck, it’s not my job as a database administrator to care about storage, purchases and other financial or you-name-it matters”. Technically you’re right – it is not. But in real life the game called “blame it all on the DBA” is always going on to one extent or another. Don’t ask me why, please, because I would love to know the answer to that question myself. But this is how it is – one of those real-life realities you would never hear about sitting at your desk in school, until you get out and start rolling. So I do encourage you to use Data Compression whenever you find it applicable. It stands out from the rest of the new features of SQL Server 2008. I’m sure we all love backup compression, and data compression apparently falls way behind the former in popularity and acceptance. That’s where we get it wrong: it is no less important and useful than backup compression. Not less at all.

Now let’s go over a demo. The numbers will speak for themselves, and I hope they surprise you as much as they surprised me. If your degree of wonder is even half of mine, that alone will be an awesome thing!

Let’s see the example. I have a non-compressed table holding some of the records pulled from one of our Oracle BI systems using SSMS’ Data Import. The table is a heap with no NIXes (non-clustered indexes), has 18043702 records and “weighs” roughly 16Gb, which makes for an average record length of roughly 913 bytes.
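
If you want to pull the same numbers for your own table, sp_spaceused is enough (the schema and table names below are the ones from my environment):

EXEC sp_spaceused 'DATAW.TRAN_STATS_SRI_DLY';
-- num_rows gives the record count, data gives the data size in Kb;
-- average record length = data size in bytes / number of records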

Now let’s get our estimates of how much space it would take once compressed:

  • ROW Compression:

EXECUTE sp_estimate_data_compression_savings 'DATAW', 'TRAN_STATS_SRI_DLY', NULL, NULL, 'ROW';

  • PAGE Compression:

EXECUTE sp_estimate_data_compression_savings 'DATAW', 'TRAN_STATS_SRI_DLY', NULL, NULL, 'PAGE';

PAGE compression is usually more efficient than ROW compression because it applies prefix and dictionary compression on top of what ROW compression does, and in general, IMHO, it should be the default data compression method to go with. Here it’s not much different, although it still saves around 230Mb of space, which amounts to 16% of the final ROW-compressed table size. The ratio of non-compressed to PAGE-compressed size is 13(!)

This is quite something – with PAGE compression applied, the table will take up 13 times less space than it originally did. Quite something, isn’t it?

(this script lets you check all table sizes and their respective compression estimates)

Let’s do it now!

Let’s first look at the space taken by the database’s single datafile:

SELECT DB_NAME(database_id) AS database_name
, name AS logical_DATAFILE_name
, physical_name AS physical_DATAFILE_name
, size/128 AS size_Mb -- size is in 8Kb pages, so /128 gives Mb
, max_size/128 AS max_size_Mb
, ROUND(size*8.0,0) AS size_Kb
, ROUND(max_size*8.0,0) AS max_size_Kb
FROM sys.master_files
WHERE type_desc = 'ROWS' AND DB_NAME(database_id) = '[DB_NAME]' -- put your database name here
ORDER BY DB_NAME(database_id)

This is obviously not the only table in the entire database, but it is the largest one – alone taking 16 out of 27 gigabytes. Now, let’s run the PAGE compression.

ALTER TABLE DATAW.TRAN_STATS_SRI_DLY REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);

Wait till it completes and query the table size now:
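
The same sp_spaceused call as before does the job:

EXEC sp_spaceused 'DATAW.TRAN_STATS_SRI_DLY';
-- the data figure should now read around 1Gb, down from roughly 16Gb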

Voilà, 1Gb! But that’s not everything. Now we have to reclaim all that freed space from the datafile by shrinking it:

DBCC SHRINKFILE(N'[DATAFILE_LOGICAL_FILENAME]', 1);

(and give it some time to complete)
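
By the way, to see how much of the datafile is actually free (and thus reclaimable) before or after the shrink, a quick sketch like this works – FILEPROPERTY returns the used space in 8Kb pages:

-- free space inside the current database's datafiles
SELECT name AS logical_DATAFILE_name
, size/128 AS size_Mb
, size/128 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS INT)/128 AS free_space_Mb
FROM sys.database_files
WHERE type_desc = 'ROWS';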

Now, let’s see what the datafile size is:
SELECT DB_NAME(database_id) AS database_name
, name AS logical_DATAFILE_name
, physical_name AS physical_DATAFILE_name
, size/128 AS size_Mb
, max_size/128 AS max_size_Mb
, ROUND(size*8.0,0) AS size_Kb
, ROUND(max_size*8.0,0) AS max_size_Kb
FROM sys.master_files
WHERE type_desc = 'ROWS' AND DB_NAME(database_id) = '[DB_NAME]'
ORDER BY DB_NAME(database_id)

Nice! It went all the way back from 27+Gb to slightly less than 13Gb – almost halved, by compressing a single table, the biggest one in the database.

For development, testing and QA environments, this can really make a difference when it comes down to available storage.

To conclude.

In my strong opinion, data compression should become a standard practice for any landing / staging area. A “must-use”, as it will save you lots and lots of storage for more databases, more projects or just other purposes, while its downsides, such as the risk of performance degradation, are minimal if data compression is applied carefully and with thought. As simple as that.

Check all database table sizes and their Data Compression estimates

September 24, 2012

This script is very useful to run before making any decisions and taking action on Data Compression:

DECLARE @tv_table_size TABLE (table_name SYSNAME
, num_rows INT
, reserved_size VARCHAR(20)
, data_size VARCHAR(20)
, index_size VARCHAR(20)
, unused VARCHAR(20)
);
DECLARE @tv_compression_estimate_PAGE TABLE ([object_name] SYSNAME
, [schema_name] SYSNAME
, index_id INT
, partition_number INT
, size_with_current_compression_setting_KB INT
, size_with_requested_compression_setting_KB INT
, sample_size_with_current_compression_setting_KB INT
, sample_size_with_requested_compression_setting_KB INT
);
DECLARE @tv_compression_estimate_ROW TABLE ([object_name] SYSNAME
, [schema_name] SYSNAME
, index_id INT
, partition_number INT
, size_with_current_compression_setting_KB INT
, size_with_requested_compression_setting_KB INT
, sample_size_with_current_compression_setting_KB INT
, sample_size_with_requested_compression_setting_KB INT
);
DECLARE @v_sql NVARCHAR(500);
DECLARE cur CURSOR FOR
SELECT s.name, t.name
FROM sys.schemas s INNER JOIN sys.tables t
ON s.schema_id = t.schema_id
ORDER BY s.name, t.name;
DECLARE @cv_schema_name SYSNAME, @cv_table_name SYSNAME;
OPEN cur;
FETCH cur INTO @cv_schema_name, @cv_table_name;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @v_sql = @cv_schema_name + '.' + @cv_table_name;
INSERT INTO @tv_table_size EXEC sp_spaceused @v_sql;

SET @v_sql = N'EXECUTE sp_estimate_data_compression_savings ''' + @cv_schema_name + ''', ''' + @cv_table_name + ''', NULL, NULL, ''PAGE''';
INSERT INTO @tv_compression_estimate_PAGE EXEC sp_executesql @v_sql;

SET @v_sql = N'EXECUTE sp_estimate_data_compression_savings ''' + @cv_schema_name + ''', ''' + @cv_table_name + ''', NULL, NULL, ''ROW''';
INSERT INTO @tv_compression_estimate_ROW EXEC sp_executesql @v_sql;
FETCH cur INTO @cv_schema_name, @cv_table_name;
END
CLOSE cur;
DEALLOCATE cur;
SELECT s.name AS [schema]
, t.name AS table_name
, CONVERT(INT, REPLACE(data_size, ' KB', ''))/1024 AS table_size_Mb
------ PAGE --------
, cep.size_with_requested_compression_setting_KB/1024 AS PAGE_compression_size_Mb
, (cep.size_with_requested_compression_setting_KB/1024) * 100
/ (CASE CONVERT(INT, REPLACE(data_size, ' KB', ''))/1024
   WHEN 0 THEN 1 -- guard against division by zero for empty tables
   ELSE CONVERT(INT, REPLACE(data_size, ' KB', ''))/1024
   END) AS ratio_from_original_size_PAGE_pct
, 100 - (cep.size_with_requested_compression_setting_KB/1024) * 100
/ (CASE CONVERT(INT, REPLACE(data_size, ' KB', ''))/1024
   WHEN 0 THEN 1
   ELSE CONVERT(INT, REPLACE(data_size, ' KB', ''))/1024
   END) AS space_saving_PAGE_pct
-------------------
------ ROW --------
, cer.size_with_requested_compression_setting_KB/1024 AS ROW_compression_size_Mb
, (cer.size_with_requested_compression_setting_KB/1024) * 100
/ (CASE CONVERT(INT, REPLACE(data_size, ' KB', ''))/1024
   WHEN 0 THEN 1
   ELSE CONVERT(INT, REPLACE(data_size, ' KB', ''))/1024
   END) AS ratio_from_original_size_ROW_pct
, 100 - (cer.size_with_requested_compression_setting_KB/1024) * 100
/ (CASE CONVERT(INT, REPLACE(data_size, ' KB', ''))/1024
   WHEN 0 THEN 1
   ELSE CONVERT(INT, REPLACE(data_size, ' KB', ''))/1024
   END) AS space_saving_ROW_pct
-------------------
FROM @tv_table_size tv
INNER JOIN sys.tables t ON tv.table_name = t.name
INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
INNER JOIN @tv_compression_estimate_PAGE cep ON s.name = cep.[schema_name] AND t.name = cep.[object_name]
INNER JOIN @tv_compression_estimate_ROW cer ON s.name = cer.[schema_name] AND t.name = cer.[object_name]
ORDER BY CONVERT(INT, REPLACE(data_size, ' KB', '')) DESC, s.name, t.name;
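
A note on running it: sp_estimate_data_compression_savings works by loading a sample of the table into tempdb and compressing it there, so on a database with many large tables this script can take a while and generate noticeable tempdb activity. Also, the procedure returns a row per index and partition, so a table with non-clustered indexes may show up more than once in the final result.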

Find table sizes in all databases

September 24, 2012

Getting table sizes, row counts and the average row size per table:

--sorted by table size, schema, table
DECLARE @tv_table_size TABLE (table_name SYSNAME
, num_rows INT
, reserved_size VARCHAR(20)
, data_size VARCHAR(20)
, index_size VARCHAR(20)
, unused VARCHAR(20)
);
DECLARE @v_sql VARCHAR(500);
DECLARE cur CURSOR FOR
SELECT s.name, t.name
FROM sys.schemas s INNER JOIN sys.tables t
ON s.schema_id = t.schema_id
ORDER BY s.name, t.name;
DECLARE @cv_schema_name SYSNAME, @cv_table_name SYSNAME;
OPEN cur;
FETCH cur INTO @cv_schema_name, @cv_table_name;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @v_sql = @cv_schema_name + '.' + @cv_table_name;
INSERT INTO @tv_table_size EXEC sp_spaceused @v_sql;
FETCH cur INTO @cv_schema_name, @cv_table_name;
END
CLOSE cur;
DEALLOCATE cur;

SELECT s.name AS [schema]
, t.name AS [table_name]
, CONVERT(INT, REPLACE(tv.data_size, ' KB', '')) AS [table_size_Kb]
, tv.num_rows AS [number_rows]
, CASE
WHEN tv.num_rows <> 0 THEN ROUND(CONVERT(FLOAT, REPLACE(tv.data_size, ' KB', '')) * 1024 / tv.num_rows,2)
END AS [avg_row_size_Bytes]
, REPLACE(tv.index_size, ' KB', '') AS [index_size_Kb]
, REPLACE(tv.reserved_size, ' KB', '') AS [reserved_size_Kb]
, REPLACE(tv.unused, ' KB', '') AS [unused_space_Kb]
FROM @tv_table_size tv
INNER JOIN sys.tables t ON tv.table_name = t.name
INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
ORDER BY [table_size_Kb] DESC, s.name, t.name;

To get more detailed data, replace the above select with this one:

SELECT *
FROM @tv_table_size tv
INNER JOIN sys.tables t ON tv.table_name = t.name
INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
ORDER BY CONVERT(INT, REPLACE(data_size, ' KB', '')) DESC, s.name, t.name;

And another way – using the undocumented sp_MSforeachtable stored procedure.

(This is a good article by Gregory A. Larsen in Database Journal: SQL Server Undocumented Stored Procedures sp_MSforeachtable and sp_MSforeachdb)

DECLARE @tv_table_size TABLE (table_name SYSNAME
, num_rows INT
, reserved_size VARCHAR(20)
, data_size VARCHAR(20)
, index_size VARCHAR(20)
, unused VARCHAR(20)
);
INSERT INTO @tv_table_size EXEC sp_MSforeachtable @command1="EXEC sp_spaceused '?'";
-- data_size is VARCHAR, so convert it before sorting to get a numeric order
SELECT * FROM @tv_table_size ORDER BY CONVERT(INT, REPLACE(data_size, ' KB', '')) DESC, table_name;
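
And since the title says all databases while everything above runs in the current database only, here is a minimal sketch using sp_MSforeachdb (covered in the same Database Journal article) to at least eyeball every database. sp_spaceused without arguments reports on the database it runs in, so the results come back as one pair of result sets per database:

-- '?' is replaced with each database name in turn
EXEC sp_MSforeachdb @command1 = 'USE [?]; EXEC sp_spaceused;';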