Yet another SQL Server enthusiast

2013-08-02

Choosing the right data type

Filed under: data type,index,indexed view,sql server — admin @ 5:18 PM

Overview

Column’s data type influences the storage on disk (less important because of low price) and in memory (buffer pool) of table (clustered index or heap) but it affects, also, the storage of others objects, an aspect which is sometimes less obvious. Why is so important this aspect ? More storage space on disk means, also, many more 8K pages on disk and in memory (buffer pool), which means more LIO (logical reads) and more PIO (physical reads). Bellow diagram shows column’s dependencies on three levels: current object (table or indexed view), child tables (tables which have foreign key pointing towards current table.column) and indexed views. The main purpose of this diagram is to show how column’s type could influence the storage size of others objects (clustered and non-clustered indexes, child tables, indexed views).

Diagram01

Note #1: The expression “column is duplicated” means that the column’s values as a whole or as part (see full text indexes) are duplicated within another column (usually this means, also, another object) using the same data type or using a compatible data type (see full text indexes defined on varbinary(max) columns).

Note #2: Term “object” refers to a table or to an indexed view.

Note #3: All examples use AdventureWorks2008R2  database.

If an object (a table or an indexed view) has a clustered index then all non-clustered indexes will include the clustered index’s key.  An example is the clustered primary key SalesOrderID defined on Sales.SalesOrderHeader and a non-clustered index which is defined on SalesPersonID column. This means that following indexes are equivalent because SalesOrderID is clustered index’s key:

Code Snippet
CREATE INDEX ix1_v1
ON Sales.SalesOrderHeader(SalesPersonID);

CREATE INDEX ix1_v2
ON Sales.SalesOrderHeader(SalesPersonID)INCLUDE(SalesOrderID);

An UNIQUE or a PRIMARY KEY constraint will be enforced using an unique index but an unique index could exists without one of these constraints. Data type (including precision, scale, max. length) and collation for every column (child table) which is included in a foreign key constraint  and should be identical with those columns which are included in referenced unique index’s key (parent table).  An example:

Code Snippet
ALTER TABLE Purchasing.PurchaseOrderHeader
ADD CONSTRAINT FK_PurchaseOrderHeader_ShipMethod_ShipMethodID
FOREIGN KEY(ShipMethodID)
REFERENCES Purchasing.ShipMethod (ShipMethodID)

In this case ShipMethodID columns from [Purchasing].[PurchaseOrderHeader] (child table) and [Purchasing].[ShipMethod] (parent table) must have the same data type (which is INT).

Note #4: A foreign key constraint itself does not requires storage space except for metadata.

If the clustered index is table’s primary key then the clustered key will be duplicated in every primary and secondary XML (FOR PROPERTY, FOR VALUE, FOR PATH) index and, also, in every spatial index. This happens because these type of indexes requires an clustered primary key.

Full-text indexes requires a single column unique index:

Code Snippet
CREATE FULLTEXT INDEX ON table_name [({column_name…}[,…n])]
KEY INDEX index_name

If a regular column is part of index’s key (ex. MyColumn1) or is included as covered column (ex. MyColumn2)

Code Snippet
CREATE [UNIQUE] INDEX ix_name ON MySchema.MyTable(…MyColum1…)
INCLUDE [(…MyColumn2…)]

then the values of that column are “duplicated” within that index. 

A indexed view must have a clustered unique index but, also, it could have non-clustered indexes. These are other opportunities for SQL Server to “duplicate” the content of those columns.

Test

The following test shows how changing the data type for Purchasing.ShipMethod.ShipMethodID column (which is a clustered primary key) from INT to TINYINT, SMALLINT and DECIMAL(18,0) influences the size of others objects. This table has only five rows. The following objects are referenced by this column:

No. Object Rows
1 PK_ShipMethod_ShipMethodID 5
2 PK_PurchaseOrderHeader_PurchaseOrderID 4012
3 PK_SalesOrderHeader_SalesOrderID 31465
4 IX_SalesOrderHeader_ShipMethodID__OnlineOrderFlag_OrderDate* 31465
5 PK_vOrdersWithShipMethod_SalesOrderID* 31465
6 IX_vOrdersWithShipMethod_ShipMethodID_ShipDate* 31465

Note #5: * New objects.

Note #6: All PK_ objects are clustered unique indexes. All IX_ objects are non-clustered indexes.

Note #7: PurchaseOrderHeader and SalesOrderHeader tables include a ShipMethodID column (FK).

Test Methodology

For every data type (TINYINT, SMALLINT, INT, BIGINT and DECIMAL(18,0)) :

1. I deleted (DROP) some objects that reference ShipMethodID columns.

2. I changed the data type for the following columns: Purchasing.ShipMethod.ShipMethodID, Purchasing.PurchaseOrderHeader.ShipMethodID and Sales.SalesOrderHeader.ShipMethodID

3. I recreated objects.

4. Rebuilding indexes.

5. Page count.

Download script

Results

Results

Remark

Simply changing the data type of Purchasing.ShipMethod.ShipMethodID column from INT to TINYINT reduces the number of pages for all referenced objects in this test with 53 pages. This means that for 3 bytes (INT – TINYINT) x 5 rows = 15 bytes difference just for PK_ShipMethod_ShipMethodID clustered primary key the real impact of this change is 53 pages x 8K = 424 KB = 434176 bytes.

15 bytes vs. 434176 bytes

Revision History:

2013-08-18: Updated [Download script] link

Blog at WordPress.com.