Yet another SQL Server enthusiast

2015-03-01

NonSARGable predicates #2: about nullable columns, ISNULL, IS NULL and COALESCE

Filed under: index,optimization,predicate,SARG,scalar,sql server 2012 — sa @ 12:37 AM

This post discuss how query complexity can be reduced and query performance can be improved by simply using mandatory columns (NOT NULL) instead of optional (nullable) columns. Also, no query change is needed.

First, let’s create a table to store all changes of products prices:

Code Snippet
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE dbo.PriceHistory(
    ID            INT IDENTITY(1,1) NOT NULL
    CONSTRAINT    PK_PriceHistory_ID PRIMARY KEY (ID),
    ProductID    INT NOT NULL, — Foreign key
    Price        NUMERIC(9,2) NOT NULL,
    StartDate    DATETIME NULL, — nullable / optional column
    EndDate        DATETIME NULL — nullable / optional column
);
GO
CREATE    INDEX IX_PriceHistory_StartDate
ON        dbo.PriceHistory(StartDate)
INCLUDE    (EndDate)
GO

Our query should compute how many prices are active now. I’ve used three different methods to write conditions for row filtering and none of them follow the rules for SARG-able predicates ([1], [2]):

Code Snippet
DECLARE    @d DATETIME = GETDATE()

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    ISNULL(d.StartDate, @d) <= @d
AND        @d <= ISNULL(d.EndDate, @d)

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    (d.StartDate <= @d OR d.StartDate IS NULL)
AND        (@d <= d.EndDate OR d.EndDate IS NULL)

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    COALESCE(d.StartDate, @d) <= @d
AND        @d <= COALESCE(d.EndDate, @d)

For these queries the execution plans include an Index Scan which is the expected data access operator because all conditions are non SARG-able:

Nullable.Columns.Query.Execution.Plan

Second, we change StartDate and EndDate in order to be mandatory columns (NOT NULL) thus:

Code Snippet
DROP INDEX IX_PriceHistory_StartDate ON dbo.PriceHistory
GO
UPDATE    dbo.PriceHistory
SET        StartDate    = CASE WHEN StartDate IS NULL THEN {d '1753-01-01'} ELSE StartDate END,
        EndDate        = CASE WHEN EndDate IS NULL THEN {ts '9999-12-31 23:59:59.997'} ELSE EndDate END
        — 1753-01-01 and 9999-12-31 23:59:59.997
        — represents the minimum and maximum value for DATETIME data type
WHERE    StartDate IS NULL
OR        EndDate    IS NULL
GO
ALTER TABLE dbo.PriceHistory ALTER COLUMN StartDate DATETIME NOT NULL
ALTER TABLE dbo.PriceHistory ALTER COLUMN EndDate DATETIME NOT NULL
GO
CREATE    INDEX IX_PriceHistory_StartDate
ON        dbo.PriceHistory(StartDate)
INCLUDE    (EndDate)
GO

and then re-run the same queries. Second time, we get next execution plans:

Mandatory.Columns.Query.Execution.Plan

This time, for the first two queries, the execution plans include the Index Seek operator even though the conditions (ISNULL / OR IS NULL) aren’t SARG-able. Examining the properties of Index Seek operator for Query 1 and Query 2 reveals the SQL Server rewrote the filtering conditions (Query 1: @d <= ISNULL(d.EndDate, @d), Query 2: (d.StartDate <= @d OR d.StartDate IS NULL) AND (@d <= d.EndDate OR d.EndDate IS NULL)) into something much simple because StartDate and EndDate are mandatory columns (doesn’t allow NULLs). For example, the condition for Query 1 becomes StartDate >= @d AND EndDate <= @d:

Index.Seek Also, we can see that for the third query the condition (COALESCE) wasn’t changed and the execution plan includes Index Scan.

TLTR

When next predicates

  • ISNULL(Column, @variable) {=|>|>=|<|<=} @variable 
  • Column  {=|>|>=|<|<=} @variable OR Column IS NULL

are defined on null-able indexed columns SQL Server uses a Scan operators for data access. When nullability is changed to NOT NULL then SQL Server can choose a Seek without any change in source code. For example changing the source code from ISNULL(Column, @variable) <= @variable to Column <= @variable is not needed. I have observed this behavior on SQL Server 2008 and SQL Server 2012.

Advertisements

2014-04-08

SQL Server: SARG-able predicates and Scan operators

Filed under: index,optimization,SARG,scan,seek,sql server — sa @ 12:15 AM

I will give a presentation at .Net Bucharest software developers meeting about SARG-able predicates and those reasons that can cause scans even there are proper indexes. For every example I will present the problem and the solutions.

Files: Net_User_Group_Bucharest_2014-04-08.zip

 

2013-12-25

Notes regarding SQL Server and BETWEEN operator

Filed under: between,optimization,SARG,sql server — sa @ 12:49 AM

Note #1

According to ANSI SQL ’92 "X BETWEEN Y AND Z" is equivalent to "X>=Y AND X<=Z". This is the reason why "WHERE h.OrderDate BETWEEN @StartDate AND @EndDate" is translated into "WHERE h.OrderDate >= @StartDate AND h.OrderDate <= @EndDate":Sql.Serevr.Between.Execution.Plan.1

This means also that Y must be less or equal than Z (Y <= Z) and @StartDate must be less or equal than @EndDate (@StartDate <= @EndDate). Now, if we run following query having @StartDate = Dec 31, 2005 and @EndDate = Jan 1, 2005,

Code Snippet
  1. SET STATISTICS IO ON;
  2.  
  3. DECLARE @StartDate DATETIME = '20051231 23:59:59.997',
  4.         @EndDate DATETIME = '20050101 00:00:00.000';
  5.         
  6. SELECT    COUNT(*) AS OrderCount1
  7. FROM    Sales.SalesOrderHeader h
  8. WHERE    h.OrderDate BETWEEN @StartDate AND @EndDate
  9. /*
  10. STATISTICS IO ON output:
  11. Table 'SalesOrderHeader'. Scan count 1, logical reads 686
  12. */

the result of COUNT(*) is 0 sales orders and the output of STATISTICS IO ON (the number of logical reads = the number of 8K data pages read from buffer pool) is 686 [logical reads]. This means that SQL Server reads data from Sales.SalesOrderHeader table though @StartDate <= @EndDate predicate is False for current values ("WHERE h.OrderDate BETWEEN ‘20051231 23:59:59.997’ AND ‘20050101 00:00:00.000’" or "WHERE h.OrderDate >= ‘20051231 23:59:59.997’ AND h.OrderDate <= ‘20050101 00:00:00.000’").

What options do we have in such cases ?

  1. We validate the values of parameters
  2. We tell SQL Server that Y must be less or equal than Z:
Code Snippet
  1. SET STATISTICS IO ON;
  2. DECLARE @StartDate DATETIME = '20051231 23:59:59.997',
  3.         @EndDate DATETIME = '20050101 00:00:00.000';
  4.         
  5. SELECT    COUNT(*) AS OrderCount1
  6. FROM    Sales.SalesOrderHeader h
  7. WHERE    h.OrderDate BETWEEN @StartDate AND @EndDate
  8. AND        @StartDate <= @EndDate
  9. /*
  10. STATISTICS IO ON output:
  11. Table 'Worktable'. Scan count 0, logical reads 0
  12. */

This time, because SQL Server knows that @StartDate must be less or equal than @EndDate the output of STATISTICS IO ON shows that it never reads data from Sales.SalesOrderHeader table (0 logical reads). The execution plan of this query includes a Filter operator which checks  @StartDate <= @EndDate predicate (‘20051231 23:59:59.997’ <= ‘20050101 00:00:00.000’) preventing (in this case) the execution of Index Scan / Index Seek operator and thus preventing reading data from Sales.SalesOrderHeader table:

Sql.Serevr.Between.Execution.Plan.3

Note #2

What if we want to count sales orders when @StartDate <= @EndDate but also when @StartDate > @EndDate thus avoiding the limitation of BETWEEN operator from ANSI SQL ? One solution is to rewrite the BETWEEN predicate in such way that BETWEEN always will bet BETWEEN @MinimumValue AND @MaximumValue.

Example:

Code Snippet
  1. SET STATISTICS IO ON;
  2. DECLARE @StartDate DATETIME = '20051231 23:59:59.997',
  3.         @EndDate DATETIME = '20050101 00:00:00.000';
  4.         
  5. SELECT    COUNT(*) AS OrderCount2
  6. FROM    Sales.SalesOrderHeader h
  7. WHERE    h.OrderDate
  8.         BETWEEN
  9.         — Minimum value
  10.         (CASE WHEN @StartDate <= @EndDate THEN @StartDate ELSE @EndDate END)
  11.         AND
  12.         — Maximum value
  13.         (CASE WHEN @StartDate <= @EndDate THEN @EndDate ELSE @StartDate END)

This solution has an advantage: is SARG-able and if there is an index on OrderDate column

Code Snippet
  1. CREATE INDEX IX_SalesOrderHeader_OrderDate
  2. ON Sales.SalesOrderHeader (OrderDate);

then the execution plan will include an Index Seek operator:

Sql.Serevr.Between.Execution.Plan.2

2013-12-09

Non SARG-able predicates #1

Sometimes isn’t enough to create an index. to optimize a query. Even we create the proper index we may discover by examining the execution plan that SQL Server doesn’t use that an Index Seek to find the rows  using instead an Index Scan.

For example, if we want to count the number of sales order for December 2005 a first attempt can use following query and an index on OrderDate:

Code Snippet
CREATE INDEX IX_SalesOrderHeader_OrderDate
ON Sales.SalesOrderHeader (OrderDate);
GO
SET STATISTICS IO ON;
DBCC FREEPROCCACHE;
GO
SELECT COUNT(*)
FROM Sales.SalesOrderHeader h
WHERE YEAR(h.OrderDate) = 2005 AND MONTH(h.OrderDate) = 12
GO
/*
SET STATISTICS IO ON output:
Table 'SalesOrderHeader'. Scan count 1, logical reads 73
*/

Execution.Plan.Index.Scan

We can see that for this query the execution plan includes an Index Scan instead of Index Seek. The reason is these two predicates aren’t SARG-able. A SARG-able predicates allows an Index Seek if an index is available.

The second attempt uses a BETWEEN predicate (which is SARG-able) thus:

Code Snippet
DBCC FREEPROCCACHE;
GO
SELECT    COUNT(*)
FROM    Sales.SalesOrderHeader h
WHERE    h.OrderDate BETWEEN '20051201' AND '20051231 23:59:59.997'
— or WHERE    h.OrderDate >= '20051201' AND h.OrderDate < '20060101'
GO
/*
SET STATISTICS IO ON output:
Table 'SalesOrderHeader'. Scan count 1, logical reads 3
*/

Execution.Plan.Index.Seek

This time the execution plan includes an Index Seek and the performance (Logical Reads) is better: just 3 pages / logical reads (Seek) instead of 73 pages logical reads (Index Scan).

More information on SARG-able predicates can be found here.

Note: For this test I used Adventure Works for SQL Server 2012 (data file) database

Create a free website or blog at WordPress.com.