Yet another SQL Server enthusiast

2015-03-01

NonSARGable predicates #2: about nullable columns, ISNULL, IS NULL and COALESCE

Filed under: index,optimization,predicate,SARG,scalar,sql server 2012 — sa @ 12:37 AM

This post discuss how query complexity can be reduced and query performance can be improved by simply using mandatory columns (NOT NULL) instead of optional (nullable) columns. Also, no query change is needed.

First, let’s create a table to store all changes of products prices:

Code Snippet
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE dbo.PriceHistory(
    ID            INT IDENTITY(1,1) NOT NULL
    CONSTRAINT    PK_PriceHistory_ID PRIMARY KEY (ID),
    ProductID    INT NOT NULL, — Foreign key
    Price        NUMERIC(9,2) NOT NULL,
    StartDate    DATETIME NULL, — nullable / optional column
    EndDate        DATETIME NULL — nullable / optional column
);
GO
CREATE    INDEX IX_PriceHistory_StartDate
ON        dbo.PriceHistory(StartDate)
INCLUDE    (EndDate)
GO

Our query should compute how many prices are active now. I’ve used three different methods to write conditions for row filtering and none of them follow the rules for SARG-able predicates ([1], [2]):

Code Snippet
DECLARE    @d DATETIME = GETDATE()

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    ISNULL(d.StartDate, @d) <= @d
AND        @d <= ISNULL(d.EndDate, @d)

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    (d.StartDate <= @d OR d.StartDate IS NULL)
AND        (@d <= d.EndDate OR d.EndDate IS NULL)

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    COALESCE(d.StartDate, @d) <= @d
AND        @d <= COALESCE(d.EndDate, @d)

For these queries the execution plans include an Index Scan which is the expected data access operator because all conditions are non SARG-able:

Nullable.Columns.Query.Execution.Plan

Second, we change StartDate and EndDate in order to be mandatory columns (NOT NULL) thus:

Code Snippet
DROP INDEX IX_PriceHistory_StartDate ON dbo.PriceHistory
GO
UPDATE    dbo.PriceHistory
SET        StartDate    = CASE WHEN StartDate IS NULL THEN {d '1753-01-01'} ELSE StartDate END,
        EndDate        = CASE WHEN EndDate IS NULL THEN {ts '9999-12-31 23:59:59.997'} ELSE EndDate END
        — 1753-01-01 and 9999-12-31 23:59:59.997
        — represents the minimum and maximum value for DATETIME data type
WHERE    StartDate IS NULL
OR        EndDate    IS NULL
GO
ALTER TABLE dbo.PriceHistory ALTER COLUMN StartDate DATETIME NOT NULL
ALTER TABLE dbo.PriceHistory ALTER COLUMN EndDate DATETIME NOT NULL
GO
CREATE    INDEX IX_PriceHistory_StartDate
ON        dbo.PriceHistory(StartDate)
INCLUDE    (EndDate)
GO

and then re-run the same queries. Second time, we get next execution plans:

Mandatory.Columns.Query.Execution.Plan

This time, for the first two queries, the execution plans include the Index Seek operator even though the conditions (ISNULL / OR IS NULL) aren’t SARG-able. Examining the properties of Index Seek operator for Query 1 and Query 2 reveals the SQL Server rewrote the filtering conditions (Query 1: @d <= ISNULL(d.EndDate, @d), Query 2: (d.StartDate <= @d OR d.StartDate IS NULL) AND (@d <= d.EndDate OR d.EndDate IS NULL)) into something much simple because StartDate and EndDate are mandatory columns (doesn’t allow NULLs). For example, the condition for Query 1 becomes StartDate >= @d AND EndDate <= @d:

Index.Seek Also, we can see that for the third query the condition (COALESCE) wasn’t changed and the execution plan includes Index Scan.

TLTR

When next predicates

  • ISNULL(Column, @variable) {=|>|>=|<|<=} @variable 
  • Column  {=|>|>=|<|<=} @variable OR Column IS NULL

are defined on null-able indexed columns SQL Server uses a Scan operators for data access. When nullability is changed to NOT NULL then SQL Server can choose a Seek without any change in source code. For example changing the source code from ISNULL(Column, @variable) <= @variable to Column <= @variable is not needed. I have observed this behavior on SQL Server 2008 and SQL Server 2012.

Advertisements

2014-01-13

Conditional joins

Let’s start with following example (based on Supertypes and Subtypes modeling approach) which creates three tables:

  1. dbo.BankAccountOwner (BankAccountOwnerID-PK, OwnerType, CreateDate)
  2. dbo.Person (BankAccountOwnerID – PK, FK, FirstName, LastName)
  3. dbo.Company (BankAccountOwnerID – PK, FK, CompanyName)

T-SQL Script:

Code Snippet
CREATE TABLE dbo.BankAccountOwner (
    BankAccountOwnerID INT PRIMARY KEY,
    OwnerType CHAR(1) NOT NULL CHECK( OwnerType IN ('P', 'C') ), — P=Person, C=Company
    CreateDate DATETIME NOT NULL DEFAULT (GETDATE())
);
CREATE TABLE dbo.Person (
    BankAccountOwnerID INT PRIMARY KEY
        REFERENCES dbo.BankAccountOwner(BankAccountOwnerID),
    FirstName NVARCHAR(50) NOT NULL,
    LastName NVARCHAR(50) NOT NULL
);
CREATE TABLE dbo.Company (
    BankAccountOwnerID INT PRIMARY KEY
        REFERENCES dbo.BankAccountOwner(BankAccountOwnerID),
    CompanyName NVARCHAR(100) NOT NULL
);

INSERT    dbo.BankAccountOwner (BankAccountOwnerID, OwnerType)
VALUES    (1, 'P'), (2, 'P'), (3, 'C'), (4, 'C'), (5, 'C');
INSERT    dbo.Person (BankAccountOwnerID, FirstName, LastName)
VALUES    (1, N'John', N'Doe'), (2, N'Mary', N'Doe');
INSERT    dbo.Company (BankAccountOwnerID, CompanyName)
VALUES (3, N'MyComputer'), (4, N'Control Panel'), (5, N'Device Manager');
GO

Problem: how can we get FirstName, LastName and CompanyName values for the following owners: 1, 2, 3, 4, 5 ?

First solution:

Code Snippet
SET NOCOUNT ON;
SET STATISTICS IO ON;
PRINT 'Test #1';
SELECT    bao.*, p.FirstName, p.LastName, c.CompanyName
FROM    dbo.BankAccountOwner bao
LEFT JOIN dbo.Person p ON bao.BankAccountOwnerID = p.BankAccountOwnerID
LEFT JOIN dbo.Company c ON bao.BankAccountOwnerID = c.BankAccountOwnerID
WHERE    bao.BankAccountOwnerID IN (1,2,3,4,5);

My solution:

Because

  1. dbo.Person table contains only rows with OwnerType = ‘P’ and dbo.Company table contains only rows with OwnerType = ‘C’ and
  2. SQL Server doesn’t knows this

I added these predicates to every left join thus:

Code Snippet
PRINT 'Test #2';
SELECT    bao.*, p.FirstName, p.LastName, c.CompanyName
FROM    dbo.BankAccountOwner bao
LEFT JOIN dbo.Person p ON bao.OwnerType = 'P' AND bao.BankAccountOwnerID = p.BankAccountOwnerID
LEFT JOIN dbo.Company c ON bao.OwnerType = 'C' AND bao.BankAccountOwnerID = c.BankAccountOwnerID
WHERE    bao.BankAccountOwnerID IN (1,2,3,4,5);

These are the execution plans:

Conditional.Joins.Execution.Plans

First solution: as you can see from the properties of Index Seek operators SQL Server will try to find every bank account owner (1, 2, …, 5) within dbo.Person table and within dbo.Company table (Number of Execution  = 5).

Second solution: this time because SQL Server knows that dbo.Person table contains only OwnerType = ‘P’ rows it will read from dbo.Person table using an Index Seek (on Person.PK_Person_…) just two times (Number of Executions = 2) because there are only two persons among those five owners. This was possible because of Filter operator which will verify before executing Index Seek if the current owner is ‘P’:

Filter.Operator

Also, SQL Server will read from dbo.Company table using an Index Seek (on Company.PK_Company_…) three times (Number of Executions = 3) because there are three companies.

Because we have reduced the number of execution for Index Seek operators, this is means that we have reduced also the number of logical reads as we can see from the output of STATISTICS IO ON:

Code Snippet
Test #1
Table 'Company'. Scan count 0, logical reads 10
Table 'Person'. Scan count 0, logical reads 10
Table 'BankAccountOwner'. Scan count 5, logical reads 10
Test #2
Table 'Company'. Scan count 0, logical reads 6
Table 'Person'. Scan count 0, logical reads 4
Table 'BankAccountOwner'. Scan count 5, logical reads 10

2013-12-09

Non SARG-able predicates #1

Sometimes isn’t enough to create an index. to optimize a query. Even we create the proper index we may discover by examining the execution plan that SQL Server doesn’t use that an Index Seek to find the rows  using instead an Index Scan.

For example, if we want to count the number of sales order for December 2005 a first attempt can use following query and an index on OrderDate:

Code Snippet
CREATE INDEX IX_SalesOrderHeader_OrderDate
ON Sales.SalesOrderHeader (OrderDate);
GO
SET STATISTICS IO ON;
DBCC FREEPROCCACHE;
GO
SELECT COUNT(*)
FROM Sales.SalesOrderHeader h
WHERE YEAR(h.OrderDate) = 2005 AND MONTH(h.OrderDate) = 12
GO
/*
SET STATISTICS IO ON output:
Table 'SalesOrderHeader'. Scan count 1, logical reads 73
*/

Execution.Plan.Index.Scan

We can see that for this query the execution plan includes an Index Scan instead of Index Seek. The reason is these two predicates aren’t SARG-able. A SARG-able predicates allows an Index Seek if an index is available.

The second attempt uses a BETWEEN predicate (which is SARG-able) thus:

Code Snippet
DBCC FREEPROCCACHE;
GO
SELECT    COUNT(*)
FROM    Sales.SalesOrderHeader h
WHERE    h.OrderDate BETWEEN '20051201' AND '20051231 23:59:59.997'
— or WHERE    h.OrderDate >= '20051201' AND h.OrderDate < '20060101'
GO
/*
SET STATISTICS IO ON output:
Table 'SalesOrderHeader'. Scan count 1, logical reads 3
*/

Execution.Plan.Index.Seek

This time the execution plan includes an Index Seek and the performance (Logical Reads) is better: just 3 pages / logical reads (Seek) instead of 73 pages logical reads (Index Scan).

More information on SARG-able predicates can be found here.

Note: For this test I used Adventure Works for SQL Server 2012 (data file) database

Create a free website or blog at WordPress.com.