Yet another SQL Server enthusiast

2015-03-01

NonSARGable predicates #2: about nullable columns, ISNULL, IS NULL and COALESCE

Filed under: index,optimization,predicate,SARG,scalar,sql server 2012 — admin @ 12:37 AM

This post discusses how query complexity can be reduced and query performance improved simply by using mandatory (NOT NULL) columns instead of optional (nullable) ones, with no change to the queries themselves.

First, let’s create a table to store all changes of products prices:

Code Snippet
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE dbo.PriceHistory(
    ID           INT IDENTITY(1,1) NOT NULL,
    ProductID    INT NOT NULL, -- foreign key
    Price        NUMERIC(9,2) NOT NULL,
    StartDate    DATETIME NULL, -- nullable / optional column
    EndDate      DATETIME NULL, -- nullable / optional column
    CONSTRAINT PK_PriceHistory_ID PRIMARY KEY (ID)
);
GO
CREATE    INDEX IX_PriceHistory_StartDate
ON        dbo.PriceHistory(StartDate)
INCLUDE    (EndDate)
GO
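
The post does not show sample data; to reproduce the plans below, one might load a few hypothetical rows (the values here are made up for illustration):

```sql
INSERT INTO dbo.PriceHistory (ProductID, Price, StartDate, EndDate)
VALUES  (1, 19.99, '2014-01-01', '2014-06-30'), -- expired price
        (1, 24.99, '2014-07-01', NULL),         -- current price, open-ended
        (2,  9.99, NULL, NULL);                 -- always active
GO
```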

Our query should compute how many prices are active now. I've used three different methods to write the row-filtering conditions, and none of them follows the rules for SARGable predicates ([1], [2]):

Code Snippet
DECLARE    @d DATETIME = GETDATE()

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    ISNULL(d.StartDate, @d) <= @d
AND        @d <= ISNULL(d.EndDate, @d)

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    (d.StartDate <= @d OR d.StartDate IS NULL)
AND        (@d <= d.EndDate OR d.EndDate IS NULL)

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    COALESCE(d.StartDate, @d) <= @d
AND        @d <= COALESCE(d.EndDate, @d)

For these queries the execution plans include an Index Scan, which is the expected data access operator because all the conditions are non-SARGable:

Nullable.Columns.Query.Execution.Plan

Second, we change StartDate and EndDate into mandatory (NOT NULL) columns:

Code Snippet
DROP INDEX IX_PriceHistory_StartDate ON dbo.PriceHistory
GO
UPDATE    dbo.PriceHistory
SET       StartDate = CASE WHEN StartDate IS NULL THEN {d '1753-01-01'} ELSE StartDate END,
          EndDate   = CASE WHEN EndDate IS NULL THEN {ts '9999-12-31 23:59:59.997'} ELSE EndDate END
          -- 1753-01-01 and 9999-12-31 23:59:59.997 are
          -- the minimum and maximum values of the DATETIME data type
WHERE     StartDate IS NULL
OR        EndDate IS NULL
GO
ALTER TABLE dbo.PriceHistory ALTER COLUMN StartDate DATETIME NOT NULL
ALTER TABLE dbo.PriceHistory ALTER COLUMN EndDate DATETIME NOT NULL
GO
CREATE    INDEX IX_PriceHistory_StartDate
ON        dbo.PriceHistory(StartDate)
INCLUDE    (EndDate)
GO

and then re-run the same queries. This time, we get the following execution plans:

Mandatory.Columns.Query.Execution.Plan

This time, for the first two queries, the execution plans include an Index Seek operator even though the conditions (ISNULL / OR IS NULL) aren't SARGable. Examining the properties of the Index Seek operator for Query 1 and Query 2 reveals that SQL Server rewrote the filtering conditions (Query 1: ISNULL(d.StartDate, @d) <= @d AND @d <= ISNULL(d.EndDate, @d); Query 2: (d.StartDate <= @d OR d.StartDate IS NULL) AND (@d <= d.EndDate OR d.EndDate IS NULL)) into something much simpler, because StartDate and EndDate are now mandatory columns (they don't allow NULLs). For example, the condition for Query 1 becomes StartDate <= @d AND EndDate >= @d:

Index.Seek

Also, we can see that for the third query the condition (COALESCE) wasn't changed, and the execution plan still includes an Index Scan.
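
Since the optimizer leaves the COALESCE form untouched, Query 3 has to be rewritten by hand to get a Seek. A minimal sketch, valid only because StartDate and EndDate are now NOT NULL:

```sql
DECLARE @d DATETIME = GETDATE()

-- Manual SARGable rewrite of Query 3; equivalent to the COALESCE
-- version only when StartDate and EndDate cannot be NULL
SELECT  COUNT(*)
FROM    dbo.PriceHistory d
WHERE   d.StartDate <= @d
AND     @d <= d.EndDate
```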

TL;DR

When predicates of the form

  • ISNULL(Column, @variable) {=|>|>=|<|<=} @variable 
  • Column {=|>|>=|<|<=} @variable OR Column IS NULL

are defined on nullable indexed columns, SQL Server uses a Scan operator for data access. When the columns' nullability is changed to NOT NULL, SQL Server can choose a Seek without any change to the source code; for example, rewriting ISNULL(Column, @variable) <= @variable as Column <= @variable is not needed. I have observed this behavior on SQL Server 2008 and SQL Server 2012.

2014-10-13

One solution to optimize SQL queries that use scalar functions

Filed under: function,optimization,scalar,UDF — admin @ 7:05 AM

Scenario

  1. There is a query that uses big tables,
  2. It's executed rarely (e.g. once or twice per month),
  3. This query has many conditions, some of which call scalar UDFs. The combined selectivity of the non-UDF conditions is high (e.g. 3,295 rows out of 31,263,601).

Problem

The query is slow. How can this query be optimized quickly, without rewriting the scalar function?

Scalar UDF used for this test:

Code Snippet
CREATE FUNCTION dbo.SlowScalarFunction(@Name NVARCHAR(250))
RETURNS NVARCHAR(250)
AS
BEGIN
    DECLARE @Result NVARCHAR(250), @Index TINYINT = 1, @Len TINYINT = LEN(@Name + '$') - 1;
    SET @Result = CASE WHEN @Name IS NOT NULL THEN '' END
    WHILE @Index <= @Len
    BEGIN
        SET @Result += UPPER(SUBSTRING(@Name, @Index, 1))
        SET @Index += 1
    END
    RETURN @Result
END;

Note: this scalar function was intentionally written to be slow.
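
For context: the function just upper-cases its argument one character at a time (returning NULL for NULL input), so its set-based equivalent should simply be the built-in UPPER:

```sql
-- Same result without the character-by-character loop
SELECT UPPER(h.ProductName) AS UpperName
FROM   BigTransactionHistory h
```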

I’ve used following query for this test:

Code Snippet
-- Test #1
SELECT    dbo.SlowScalarFunction(h.ProductName) AS UpperName, h.Quantity
FROM    [dbo].[BigProduct] p JOIN BigTransactionHistory h ON p.ProductID = h.ProductID
WHERE    p.ListPrice > 3550 AND h.Quantity >= 100 AND dbo.SlowScalarFunction(h.ProductName) LIKE N'ROAD%'
GO

Execution plan

Serial.Execution.Plan.Scalar.User.Defined.Function.WHERE.01

As you can see,

  • The WHERE clause contains one condition that calls a scalar UDF ( dbo.SlowScalarFunction(h.ProductName) LIKE N'ROAD%' ).
  • The same scalar UDF is also used in the SELECT clause.
  • The execution plan is serial.
  • There is a "missing index" warning suggesting we create a non-clustered index:
Code Snippet
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[BigTransactionHistory] ([ProductID],[Quantity])
INCLUDE ([ProductName])

[One] Solution

1) Because this query is executed rarely, the suggested "missing" index would also be used rarely (for read operations), while maintaining it would add overhead to every INSERT, UPDATE, or DELETE operation.

2) The usage of scalar function forces SQL Server to choose a serial plan.
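
One way to confirm this (assuming SQL Server 2012 or later, where the showplan XML exposes a NonParallelPlanReason attribute) is to search the cached plans:

```sql
-- Find cached plans that were forced to run serially (SQL Server 2012+)
SELECT  qp.query_plan
FROM    sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE   qp.query_plan.exist('declare default element namespace
            "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
            //QueryPlan[@NonParallelPlanReason]') = 1
```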

For these two reasons, I decided not to create the suggested index and to decompose the initial query as follows:

  • One query without scalar function calls. This query inserts data into a temporary table (#Results), which will usually contain a small amount of data. Because there are no scalar function calls, the execution plan can use parallelism.
  • The scalar UDF is then called only for the small number of rows inserted into the temporary table.
Code Snippet
SELECT    ProductName, Quantity
INTO    #Results
FROM    [dbo].[BigProduct] p JOIN BigTransactionHistory h ON p.ProductID = h.ProductID
WHERE    ListPrice > 3550 AND Quantity >= 100

ALTER TABLE #Results
ADD UpperName NVARCHAR(250)

UPDATE    #Results
SET        UpperName = dbo.SlowScalarFunction(ProductName)

SELECT    UpperName, Quantity
FROM    #Results
WHERE    UpperName LIKE N'ROAD%'

DROP TABLE #Results

Execution plan:

Paralelism.Plan.Scalar.User.Defined.Function.WHERE.02

Using a warm buffer, after these changes the execution time dropped from 10 seconds to 1 second.
