Welcome to the World of SQLServer.........: Data Compression: What, types of compression and how to configure...

Data Compression:

Data compression is a feature introduced in SQL Server 2008. It compresses database objects (tables, clustered and nonclustered indexes, indexed views, and partitioned tables and indexes) to reduce their size, and choosing the right type of compression for the workload can also improve performance.

Let us dig a little deeper:

How does it work?

How many types of compression are there?

Will it affect the performance of my server?

How does it work?

Data compression is handled entirely under the covers by the SQL Server Storage Engine. When data is passed to the Storage Engine, it is compressed and stored in the designated compressed format (on disk and in the Buffer Cache). When the Storage Engine passes the information to another component of SQL Server, the Storage Engine has to uncompress it. In other words, every time data is passed to or from the Storage Engine, it has to be compressed or uncompressed. While this takes extra CPU overhead, in many cases the disk I/O saved by compression more than makes up for the CPU cost, boosting the overall performance of SQL Server.

Let’s say that we want to update a row in a table, and that the row we want to update is currently stored on disk in a table that is using row-level data compression. When we execute the UPDATE statement, the Relational Engine (Query Processor) parses, compiles, and optimizes the UPDATE statement, ready to execute it. Before the statement can be executed, the Relational Engine needs the row of data that is currently stored on disk in the compressed format, so the Relational Engine requests the data by asking the Storage Engine to go get it. The Storage Engine (with the help of the SQLOS) goes and gets the compressed data from disk and brings it into the Buffer Cache, where the data continues to remain in its compressed format.

Once the data is in the Buffer Cache, the row is handed off from the Storage Engine to the Relational Engine. During this handoff, the compressed row is uncompressed and given to the Relational Engine to UPDATE. Once the row has been updated, it is passed back to the Storage Engine, where it is again compressed and stored in the Buffer Cache. At some point, the row will be flushed to disk, where it is stored in its compressed format.

Data compression offers many benefits. Besides the obvious one of reducing the amount of physical disk space required to store data—and the disk I/O needed to write and read it—it also reduces the amount of Buffer Cache memory needed to store data in the Buffer Cache. This in turn allows more data to be stored in the Buffer Cache, reducing the need for SQL Server to access the disk to get data, as the data is now more likely to be in memory than disk, further reducing disk I/O.

Just as data compression offers benefits, so it has some disadvantages. Using compression uses up additional CPU cycles. If your server has plenty to spare, then you have no problem. But if your server is already experiencing a CPU bottleneck, then perhaps compression is better left turned off.

How many types of compression are there?

Row-level Data Compression: Row-level data compression essentially turns fixed-length data types into variable-length data types, freeing up empty space. It can also ignore zero and NULL values, saving additional space. In turn, more rows can fit on a single data page. It works using these techniques:

o Reducing the amount of metadata used to store a row.

o Storing fixed-length numeric data types as if they were variable-length data types. For example, if you store the value 1 in a bigint column, it takes only 1 byte of storage rather than the 8 bytes a bigint normally takes.

o Storing CHAR data types as variable-length data types. For example, if you have a CHAR(100) column and store only 10 characters in it, the blank characters are not stored, reducing the space needed to store the data.

o Not storing NULL or 0 values.

Row-level data compression offers less compression than page-level data compression, but it also incurs less overhead, reducing the amount of CPU resources required to implement it.
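As a rough illustration of the points above, here is a minimal sketch of a table created with row-level compression. The table and column names (dbo.RowCompressionDemo and so on) are hypothetical and chosen only to mirror the bigint and CHAR(100) examples; they are not from any real database.

```sql
-- Hypothetical demo table: names are illustrative only.
CREATE TABLE dbo.RowCompressionDemo
(
    DemoID   bigint    NOT NULL,  -- small values need fewer than 8 bytes under row compression
    Code     char(100) NOT NULL,  -- trailing blanks are not stored under row compression
    Quantity int       NULL       -- NULL and 0 values take no data space under row compression
)
WITH (DATA_COMPRESSION = ROW);

-- A couple of rows matching the cases described in the text.
INSERT INTO dbo.RowCompressionDemo (DemoID, Code, Quantity)
VALUES (1, 'ABC', 0),
       (2, 'ABCDEFGHIJ', NULL);

-- Report the space the table currently uses.
EXEC sp_spaceused N'dbo.RowCompressionDemo';
```

With only two rows the space figures will not show a meaningful difference; the savings only become visible on tables with many rows.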

Page-level Data Compression: Page-level data compression starts with row-level data compression, then adds two further compression features: prefix compression and dictionary compression. It therefore compresses more than row-level compression alone, but at the expense of greater CPU utilization. It works using these techniques:

o It starts out by using row-level data compression to get as many rows as it can on a single page.

o Next, prefix compression is run. Essentially, repeating patterns of data at the beginning of the values of a given column are removed and replaced with an abbreviated reference that is stored in the compression information (CI) structure, which immediately follows the page header of a data page.

o Last, dictionary compression is used. Dictionary compression searches for repeated values anywhere on a page and stores them in the CI. One of the major differences between prefix and dictionary compression is that prefix compression is restricted to one column, while dictionary compression works anywhere on a data page.

The amount of compression provided by page-level data compression is highly dependent on the data stored in a table or index. If a lot of the data repeats itself, compression is more efficient. If the data is more random, little benefit can be gained from page-level compression.
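To make this concrete, the sketch below switches a table and its nonclustered index to page-level compression by rebuilding them. The object names (dbo.AddressDemo, IX_AddressDemo_City) are hypothetical, and the comments about which columns compress well are assumptions about typical data, not measurements.

```sql
-- Hypothetical demo table: names are illustrative only.
CREATE TABLE dbo.AddressDemo
(
    AddressID  int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    City       varchar(50) NOT NULL,  -- few distinct values: likely dictionary candidates
    PostalCode varchar(10) NOT NULL   -- shared leading characters: likely prefix candidates
);

CREATE NONCLUSTERED INDEX IX_AddressDemo_City
    ON dbo.AddressDemo (City);

-- Rebuild every index on the table (the clustered index holds the table data)
-- with page-level compression.
ALTER INDEX ALL ON dbo.AddressDemo
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```

The compression setting is held per index (and per partition), so a nonclustered index does not pick up the table's setting automatically; rebuilding with ALL is one way to keep them consistent.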

Implementation:

Data compression can be performed using either SQL Server Management Studio (SSMS) or by using Transact-SQL.

The first step is to estimate the savings.

Out of the box there are two ways to estimate compression. I suspect they both use the stored procedure sp_estimate_data_compression_savings.

sp_estimate_data_compression_savings
      [ @schema_name = ] 'schema_name'
    , [ @object_name = ] 'object_name'
    , [ @index_id = ] index_id
    , [ @partition_number = ] partition_number
    , [ @data_compression = ] 'data_compression'
[ ; ]

The following example estimates the size of the start.WorkOrder table if it is compressed by using ROW compression.

USE ABC;

EXEC sp_estimate_data_compression_savings 'start', 'WorkOrder', NULL, NULL, 'ROW' ;
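Since page-level compression usually saves more space at a higher CPU cost, it can be useful to run the estimate for both settings and compare. The sketch below reuses the ABC database and start.WorkOrder table from the example above and passes the parameters by name; NULL for @index_id and @partition_number asks for all indexes and all partitions.

```sql
USE ABC;

-- Estimate for row-level compression.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'start',
    @object_name      = 'WorkOrder',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'ROW';

-- Estimate for page-level compression on the same table.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'start',
    @object_name      = 'WorkOrder',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';
```

Each call returns one row per index and partition, showing the current size and the estimated size under the requested setting. The procedure works from a sample of the data, so the figures are estimates rather than exact results.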

Management Studio can also produce an estimate. Right-click the table, then choose Storage and Manage Compression.

Then choose the compression type you are interested in and click Calculate.

There is also a tool on CodePlex that will estimate all tables in a database. This is handier than a table-by-table estimate.

I just tried this tool, and the estimation worked great. Note, though, that these tools are unsupported.
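Once an estimate looks worthwhile, the chosen setting can be applied with Transact-SQL by rebuilding the table, and the result can be checked in the sys.partitions catalog view. This is a minimal sketch that assumes the ABC database and start.WorkOrder table from the estimation example; ROW is used only because that example estimated ROW, not as a recommendation.

```sql
USE ABC;

-- Apply row-level compression to the table (heap or clustered index), all partitions.
-- A rebuild can be resource-intensive on large tables, so schedule it accordingly.
ALTER TABLE [start].WorkOrder
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = ROW);

-- Check the compression setting now recorded for the table and each of its indexes.
SELECT
    OBJECT_SCHEMA_NAME(p.object_id) AS schema_name,
    OBJECT_NAME(p.object_id)        AS table_name,
    i.name                          AS index_name,   -- NULL for a heap
    p.partition_number,
    p.data_compression_desc,                          -- NONE, ROW or PAGE
    p.rows
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id  = p.index_id
WHERE p.object_id = OBJECT_ID(N'start.WorkOrder');
```

Nonclustered indexes are not changed by ALTER TABLE ... REBUILD, so they would still show their previous setting in this query until they are rebuilt with their own DATA_COMPRESSION option.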

This page is helpful. http://ssce.codeplex.com/documentation

http://msdn.microsoft.com/en-us/library/cc280574.aspx

SQL Server Compression Estimator http://ssce.codeplex.com/


Saturday, November 10, 2012
