Yet another SQL Server enthusiast

2015-04-23

SQL Server: autocommit transactions vs. explicit transactions. A simple test.

Hypothesis

Encapsulating several SQL DML statements (INSERT, UPDATE, DELETE, MERGE) in a single transaction can be beneficial for performance compared with an approach that doesn't use explicit transactions.

Test

To test the above hypothesis I have created a small table named dbo.SalesOrder and two batches that insert 100,000 rows each:

  • The first batch uses autocommit transactions (in this context, for every INSERT statement SQL Server creates a transaction that is committed or rolled back when the statement completes).
  • The second batch uses explicit transactions to encapsulate each group of five INSERT statements in a single transaction.
Code Snippet
CREATE TABLE dbo.SalesOrder (
    SalesOrderID    INT IDENTITY(1,1) PRIMARY KEY,
    OrderDate        DATE NOT NULL,
    FillerColumn    VARCHAR(100) NOT NULL
);
GO

TRUNCATE TABLE dbo.SalesOrder
GO

-- First batch
DECLARE @i INT = 1;
WHILE @i <= 100000
BEGIN
    INSERT    dbo.SalesOrder (OrderDate, FillerColumn)
    VALUES    (GETDATE(), REPLICATE('*', 100));

    SET @i += 1;
END;
GO

-- Second batch
DECLARE @i INT = 0;
BEGIN TRANSACTION;

WHILE @i < 100000
BEGIN
    INSERT    dbo.SalesOrder (OrderDate, FillerColumn)
    VALUES    (GETDATE(), REPLICATE('*', 100));

    SET @i += 1;
    IF @i % 5 = 0
    BEGIN
        COMMIT TRANSACTION;
        BEGIN TRANSACTION;
    END
END;
IF @@TRANCOUNT > 0
    COMMIT
GO

Test results

To measure the performance of every batch I have used two performance counters:

  • Log Bytes Flushed/sec
  • Log Flushes/sec (number of I/O operations per second)

TX.performance.AutoCommit.vs.Explicit.Transactions

As you can see in this simple test, encapsulating several DML statements in a single transaction increased the volume of data written to disk per second (Log Bytes Flushed/sec), while the number of physical write operations against the database log (Log Flushes/sec) and the execution time decreased.
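These counters can be read from Performance Monitor, but they are also exposed through sys.dm_os_performance_counters. Below is a minimal sketch of how they could be sampled around a batch; it assumes the default instance (for a named instance the object_name prefix differs) and that, because these are cumulative counters, you take a snapshot before and after the batch and subtract:

Code Snippet
-- Minimal sketch (assumption: default instance; for a named instance the
-- object_name is 'MSSQL$<InstanceName>:Databases'). These counters are
-- cumulative, so snapshot before and after a batch and compute the delta.
SELECT  object_name, counter_name, instance_name, cntr_value
FROM    sys.dm_os_performance_counters
WHERE   object_name   = 'SQLServer:Databases'
AND     counter_name IN ('Log Bytes Flushed/sec', 'Log Flushes/sec')
AND     instance_name = DB_NAME();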

Note #1: the main reason for using transactions should be the ACID properties, but the way we manage the D (durability) can have a significant impact on database performance.
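As an illustration of this note only (not something used in this test): starting with SQL Server 2014, delayed durability is an explicit way of trading durability for fewer log flushes.

Code Snippet
-- Illustration only (SQL Server 2014+), not used in this test:
-- with FORCED, commits do not wait for the log flush, so the most recent
-- transactions can be lost in case of a crash.
ALTER DATABASE CURRENT SET DELAYED_DURABILITY = FORCED;
-- ... run the workload ...
ALTER DATABASE CURRENT SET DELAYED_DURABILITY = DISABLED;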

Note #2: this test is simple. It wasn't my goal to do a full performance test. You should do your own tests.

2015-03-01

NonSARGable predicates #2: about nullable columns, ISNULL, IS NULL and COALESCE

Filed under: index,optimization,predicate,SARG,scalar,sql server 2012 — admin @ 12:37 AM

This post discusses how query complexity can be reduced and query performance can be improved simply by using mandatory (NOT NULL) columns instead of optional (nullable) columns, without changing the query itself.

First, let’s create a table to store all changes of products prices:

Code Snippet
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE dbo.PriceHistory(
    ID            INT IDENTITY(1,1) NOT NULL
    CONSTRAINT    PK_PriceHistory_ID PRIMARY KEY (ID),
    ProductID    INT NOT NULL, -- Foreign key
    Price        NUMERIC(9,2) NOT NULL,
    StartDate    DATETIME NULL, -- nullable / optional column
    EndDate        DATETIME NULL -- nullable / optional column
);
GO
CREATE    INDEX IX_PriceHistory_StartDate
ON        dbo.PriceHistory(StartDate)
INCLUDE    (EndDate)
GO

Our query should compute how many prices are active now. I've used three different methods to write the filtering conditions, and none of them follow the rules for SARG-able predicates ([1], [2]):

Code Snippet
DECLARE    @d DATETIME = GETDATE()

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    ISNULL(d.StartDate, @d) <= @d
AND        @d <= ISNULL(d.EndDate, @d)

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    (d.StartDate <= @d OR d.StartDate IS NULL)
AND        (@d <= d.EndDate OR d.EndDate IS NULL)

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    COALESCE(d.StartDate, @d) <= @d
AND        @d <= COALESCE(d.EndDate, @d)

For these queries the execution plans include an Index Scan, which is the expected data access operator because all the conditions are non-SARG-able:

Nullable.Columns.Query.Execution.Plan

Second, we change StartDate and EndDate to be mandatory columns (NOT NULL):

Code Snippet
DROP INDEX IX_PriceHistory_StartDate ON dbo.PriceHistory
GO
UPDATE    dbo.PriceHistory
SET        StartDate    = CASE WHEN StartDate IS NULL THEN {d '1753-01-01'} ELSE StartDate END,
        EndDate        = CASE WHEN EndDate IS NULL THEN {ts '9999-12-31 23:59:59.997'} ELSE EndDate END
        -- 1753-01-01 and 9999-12-31 23:59:59.997
        -- represent the minimum and maximum values of the DATETIME data type
WHERE    StartDate IS NULL
OR        EndDate    IS NULL
GO
ALTER TABLE dbo.PriceHistory ALTER COLUMN StartDate DATETIME NOT NULL
ALTER TABLE dbo.PriceHistory ALTER COLUMN EndDate DATETIME NOT NULL
GO
CREATE    INDEX IX_PriceHistory_StartDate
ON        dbo.PriceHistory(StartDate)
INCLUDE    (EndDate)
GO

and then re-run the same queries. This time, we get the following execution plans:

Mandatory.Columns.Query.Execution.Plan

This time, for the first two queries, the execution plans include the Index Seek operator even though the conditions (ISNULL / OR IS NULL) aren't SARG-able. Examining the properties of the Index Seek operator for Query 1 and Query 2 reveals that SQL Server rewrote the filtering conditions (Query 1: ISNULL(d.StartDate, @d) <= @d AND @d <= ISNULL(d.EndDate, @d); Query 2: (d.StartDate <= @d OR d.StartDate IS NULL) AND (@d <= d.EndDate OR d.EndDate IS NULL)) into something much simpler, because StartDate and EndDate are now mandatory columns (they don't allow NULLs). For example, the condition for Query 1 becomes StartDate <= @d AND EndDate >= @d:

Index.Seek

Also, we can see that for the third query the condition (COALESCE) wasn't rewritten and the execution plan still includes an Index Scan.
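For comparison, a manually rewritten, SARG-able form of Query 1 (a sketch that is valid only after StartDate and EndDate were made NOT NULL, as above) would be:

Code Snippet
DECLARE @d DATETIME = GETDATE();

-- Equivalent to Query 1 once StartDate and EndDate are NOT NULL
SELECT  COUNT(*)
FROM    dbo.PriceHistory d
WHERE   d.StartDate <= @d
AND     d.EndDate   >= @d;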

TL;DR

When predicates such as

  • ISNULL(Column, @variable) {=|>|>=|<|<=} @variable
  • Column {=|>|>=|<|<=} @variable OR Column IS NULL

are defined on nullable indexed columns, SQL Server uses Scan operators for data access. When the columns are changed to NOT NULL, SQL Server can choose a Seek without any change in the source code; for example, rewriting ISNULL(Column, @variable) <= @variable as Column <= @variable is not needed. I have observed this behavior on SQL Server 2008 and SQL Server 2012.

2014-10-13

One solution to optimize SQL queries that use scalar functions

Filed under: function,optimization,scalar,UDF — admin @ 7:05 AM

Scenario

  1. There is a query that uses big tables,
  2. It's executed rarely (e.g. once or twice per month),
  3. The query has many conditions, some of which call scalar UDFs. The combined selectivity of the non-UDF conditions is high (e.g. 3,295 rows out of 31,263,601).

Problem

The query is slow. How can this query be optimized quickly without rewriting the scalar function?

Scalar UDF used for this test:

Code Snippet
CREATE FUNCTION dbo.SlowScalarFunction(@Name NVARCHAR(250))
RETURNS NVARCHAR(250)
AS
BEGIN
    DECLARE @Result NVARCHAR(250), @Index TINYINT = 1, @Len TINYINT = LEN(@Name + '$') - 1;
    SET @Result = CASE WHEN @Name IS NOT NULL THEN '' END
    WHILE @Index <= @Len
    BEGIN
        SET @Result += UPPER(SUBSTRING(@Name, @Index, 1))
        SET @Index += 1
    END
    RETURN @Result
END;

Note: this scalar function was written intentionally to be slow.

I've used the following query for this test:

Code Snippet
— Test #1
SELECT    dbo.SlowScalarFunction(h.ProductName) AS UpperName, h.Quantity
FROM    [dbo].[BigProduct] p JOIN BigTransactionHistory h ON p.ProductID = h.ProductID
WHERE    p.ListPrice > 3550 AND h.Quantity >= 100 AND dbo.SlowScalarFunction(h.ProductName) LIKE N'ROAD%'
GO

Execution plan

Serial.Execution.Plan.Scalar.User.Defined.Function.WHERE.01

As you can see,

  • The WHERE clause contains one condition that calls a scalar UDF (dbo.SlowScalarFunction(h.ProductName) LIKE N'ROAD%').
  • The same scalar UDF is used also by SELECT clause.
  • The execution plan is serial.
  • There is a "missing index" warning suggesting that we should create a non-clustered index:
Code Snippet
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[BigTransactionHistory] ([ProductID],[Quantity])
INCLUDE ([ProductName])

[One] Solution

1) Because this query is executed rarely, the suggested "missing" index would rarely be used (for read operations). This means that, after creating the index, there would be an overhead due to its maintenance after every INSERT, UPDATE or DELETE operation.

2) The usage of a scalar function forces SQL Server to choose a serial plan.

Because of these two reasons I decided not to create the suggested index and to decompose the initial query as follows:

  • One query without scalar function calls. This query inserts data into a temporary table (#Results), which will usually contain a small amount of data. Because there are no scalar function calls, the execution plan can use parallelism.
  • For the small number of rows inserted into the temporary table I call the scalar UDF.
Code Snippet
SELECT    ProductName, Quantity
INTO    #Results
FROM    [dbo].[BigProduct] p JOIN BigTransactionHistory h ON p.ProductID = h.ProductID
WHERE    ListPrice > 3550 AND Quantity >= 100

ALTER TABLE #Results
ADD UpperName NVARCHAR(250)

UPDATE    #Results
SET        UpperName = dbo.SlowScalarFunction(ProductName)

SELECT    UpperName, Quantity
FROM    #Results
WHERE    UpperName LIKE N'ROAD%'

DROP TABLE #Results

Execution plan:

Paralelism.Plan.Scalar.User.Defined.Function.WHERE.02

Using a warm buffer, after these changes, the execution time dropped from 10 seconds to 1 second.

2014-09-28

Simple solution to optimize bulk insert operations

Filed under: optimization,sql server,sql server 2012 — admin @ 11:44 PM

What can be done to optimize (minimize the execution time of) bulk insert operations of large *.csv files into a SQL Server database? Here you may find an excellent article on this subject.

But there is also a simple thing you should consider: autogrowth events. These events can slow down operations executed on a SQL Server database (including bulk insert operations). This means that if we eliminate these autogrowth events the execution time will decrease.
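To check whether (and how many) autogrowth events occurred during a load, one option is to read the default trace; a minimal sketch, assuming the default trace is enabled (it is by default):

Code Snippet
-- Sketch: list autogrowth events captured by the default trace
-- (EventClass 92 = Data File Auto Grow, 93 = Log File Auto Grow)
DECLARE @TracePath NVARCHAR(260);

SELECT  @TracePath = [path]
FROM    sys.traces
WHERE   is_default = 1;

SELECT  te.name AS EventName, t.DatabaseName, t.FileName,
        t.StartTime, t.Duration
FROM    sys.fn_trace_gettable(@TracePath, DEFAULT) AS t
JOIN    sys.trace_events AS te ON t.EventClass = te.trace_event_id
WHERE   t.EventClass IN (92, 93)
ORDER BY t.StartTime;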

To test this hypothesis I've done two sets of tests using a *.csv file (4,366,563 rows) which has an uncompressed size of 462 MB:

  1. One test uses the default database options (for size and auto growth), and
  2. The other test uses a database with a data file with an initial size of 2 GB and a log file with an initial size of 100 MB (see the scripts below). Because the source file uses UTF-8 encoding, the text columns are defined using the NVARCHAR data type; for every character SQL Server uses two bytes (without data compression).

I’ve used following scripts:

  1. First test:

    Code Snippet
    CREATE DATABASE TestBulkInsert
    ON PRIMARY  (NAME = 'TestBulkInsert_01_Data', FILENAME = 'E:\BD\TestBulkInsert.mdf')
    LOG ON (NAME = 'TestBulkInsert_01_Log', FILENAME = 'E:\BD\TestBulkInsert.ldf')
    GO
    ALTER DATABASE TestBulkInsert
    SET RECOVERY SIMPLE
    GO
    USE TestBulkInsert;
    GO
    SELECT    file_id,  name, size * 8 AS [size KB],
            CASE
                WHEN growth = 0 THEN 'fixed size'
                WHEN is_percent_growth = 1 THEN STR(growth, 10, 0) + ' %'
                ELSE STR(growth * 8, 10, 0) + ' KB'
            END  AS growth_description
    FROM    sys.database_files
    GO
    /*
    file_id name                   size KB     growth_description
    ------- ---------------------- ----------- ------------------
    1       TestBulkInsert_01_Data 3136              1024 KB
    2       TestBulkInsert_01_Log  1024                10 %
    */

  2. Second test:
    Code Snippet
    CREATE DATABASE TestBulkInsert
    ON PRIMARY  (NAME = 'TestBulkInsert_01_Data', FILENAME = 'E:\BD\TestBulkInsert.mdf', SIZE = 2GB, FILEGROWTH = 500MB)
    LOG ON (NAME = 'TestBulkInsert_01_Log', FILENAME = 'E:\BD\TestBulkInsert.ldf', SIZE = 100MB, FILEGROWTH = 100MB)
    GO
    ALTER DATABASE TestBulkInsert
    SET RECOVERY SIMPLE
    GO
    USE TestBulkInsert;
    GO
    SELECT    file_id,  name, size * 8 AS [size KB],
            CASE
                WHEN growth = 0 THEN 'fixed size'
                WHEN is_percent_growth = 1 THEN STR(growth, 10, 0) + ' %'
                ELSE STR(growth * 8, 10, 0) + ' KB'
            END  AS growth_description
    FROM    sys.database_files
    GO
    /*
    file_id name                   size KB     growth_description
    ------- ---------------------- ----------- ------------------
    1       TestBulkInsert_01_Data 2097152         512000 KB
    2       TestBulkInsert_01_Log  102400          102400 KB
    */

Target table was created using the next script:

Code Snippet
CREATE TABLE [dbo].[currenttmp](
    [ColA] [bigint] NOT NULL PRIMARY KEY,
    [ColB] [nvarchar](20) NULL,
    [ColC] [nvarchar](4000) NULL
) ON [PRIMARY]
GO

To import data I’ve used an SSIS package:

ssis.dataflowtask
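For reference, a similar load could also be driven from T-SQL with BULK INSERT instead of SSIS; the sketch below is only a hypothetical illustration (the file path, delimiters and header handling are placeholders, and UTF-8 handling may need additional options depending on the SQL Server version):

Code Snippet
-- Hypothetical sketch only; the tests above were done with SSIS.
BULK INSERT dbo.currenttmp
FROM 'E:\BD\source_file.csv'          -- placeholder path
WITH (
    FIELDTERMINATOR = ',',            -- placeholder delimiter
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,              -- skip a header row, if present
    TABLOCK,                          -- helps enable minimal logging
    BATCHSIZE       = 100000
);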

Below are my results:

                                  Test 1    Test 2    Difference
  Average execution time (sec.)   326       222       104
  Average execution time (mm:ss)  5:26      3:42      1:44
  Autogrowth events               946       0         946

As you can see, simply eliminating autogrowth events can significantly reduce the execution time when importing large files. This is possible because autogrowth events are expensive.

Note: It’s worth reading about Instant File Initialization (not used in these tests).

2014-06-22

Common table expressions and the number of executions

Filed under: common table expression,CTE,optimization,sql server — admin @ 10:53 PM

A common usage pattern for common table expressions before SQL Server 2012 is to get the previous row's values using a combination of ROW_NUMBER, a CTE and a self-join:

Code Snippet
CREATE TABLE dbo.MyTable (
    MyID INT  PRIMARY KEY,
    Filler NCHAR(3500) NOT NULL
);
GO
INSERT  MyTable (MyID, Filler)
VALUES  (11, N'A'), (22, N'B'), (33, N'C'), (44, N'D'), (55, N'E');
GO

— Test #1
WITH MyCTE
AS (
    SELECT  *, ROW_NUMBER()OVER(ORDER BY MyID) AS RowNum
    FROM    dbo.MyTable
)
SELECT  *
FROM    MyCTE crt
LEFT JOIN MyCTE prev ON crt.RowNum=prev.RowNum+1;
GO

In this case, for every row the previous row's values are displayed:

Code Snippet
-- current row --  -- previous row --
MyID Filler RowNum MyID Filler RowNum
---- ------ ------ ---- ------ ------
11   A      1      NULL NULL   NULL
22   B      2      11   A      1
33   C      3      22   B      2
44   D      4      33   C      3
55   E      5      44   D      4

Because a common table expression is in effect a local view, its source code is expanded into the calling statement (e.g. an INSERT, UPDATE, DELETE or SELECT statement), and the execution plan for the above SELECT statement looks like this:

CTE.Outer.Table.Inner.Table

As you can see from the above execution plan, the common table expression (MyCTE) is expanded two times: crt and prev. If we want to see how many times every "instance" of MyCTE is executed, we can inspect the value of the Number of Executions property for every operator. Below, you can see that the Number of Executions of the Clustered Index Scan is 1 for crt and 5 for prev. Basically, in this example, the prev part is executed five times, one time for every row from the crt part, and overall MyCTE is executed six times:

CTE.Execution.Plans.LOOP.HASH.MERGE.Join.#1 

Is it possible to change this behavior? Yes, it's possible, and the answer (at least one answer) comes from changing the physical join type, which in this case is LOOP / NESTED LOOPS.

I've changed the physical join from LOOP to HASH and to MERGE JOIN, and the result was that the prev part is executed, in my example, just one time, and overall MyCTE is executed two times:

Code Snippet
— Test #2
WITH MyCTE
AS (
    SELECT  *, ROW_NUMBER()OVER(ORDER BY MyID) AS RowNum
    FROM    dbo.MyTable
)
SELECT  *
FROM    MyCTE crt
LEFT HASH JOIN MyCTE prev ON crt.RowNum=prev.RowNum+1;
GO

— Test #3
WITH MyCTE
AS (
    SELECT  *, ROW_NUMBER()OVER(ORDER BY MyID) AS RowNum
    FROM    dbo.MyTable
)
SELECT  *
FROM    MyCTE crt
LEFT MERGE JOIN MyCTE prev ON crt.RowNum=prev.RowNum+1;
GO

CTE.Execution.Plans.LOOP.HASH.MERGE.Join.#2&3

Another aspect is also interesting: in this small test, the number of logical reads is smaller for HASH and MERGE JOIN than for LOOP JOIN:

SQLTrace.SQLProfiler.Logical.Reads 

Note #1: because of the join hints (LOOP, MERGE, HASH) the join order is enforced. Don't use these hints without properly testing your queries. Using these hints without carefully analyzing the execution plans can lead to suboptimal plans.

Note #2: Starting with SQL Server 2012 you can use the LAG function to get the previous row's values. See also the LEAD function.
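For completeness, a sketch of the same "previous row" result using LAG on the table above (this requires SQL Server 2012 or later):

Code Snippet
-- Same previous-row output without a self-join (SQL Server 2012+)
SELECT  MyID, Filler,
        LAG(MyID)   OVER (ORDER BY MyID) AS PrevMyID,
        LAG(Filler) OVER (ORDER BY MyID) AS PrevFiller
FROM    dbo.MyTable;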

2014-04-08

SQL Server: SARG-able predicates and Scan operators

Filed under: index,optimization,SARG,scan,seek,sql server — admin @ 12:15 AM

I will give a presentation at the .Net Bucharest software developers meeting about SARG-able predicates and the reasons that can cause scans even when there are proper indexes. For every example I will present the problem and the solutions.

Files: Net_User_Group_Bucharest_2014-04-08.zip

 

2014-01-13

Conditional joins

Filed under: conditional,execution plan,join,optimization,predicate,sql server — admin @ 12:36 AM

Let’s start with following example (based on Supertypes and Subtypes modeling approach) which creates three tables:

  1. dbo.BankAccountOwner (BankAccountOwnerID-PK, OwnerType, CreateDate)
  2. dbo.Person (BankAccountOwnerID – PK, FK, FirstName, LastName)
  3. dbo.Company (BankAccountOwnerID – PK, FK, CompanyName)

T-SQL Script:

Code Snippet
CREATE TABLE dbo.BankAccountOwner (
    BankAccountOwnerID INT PRIMARY KEY,
    OwnerType CHAR(1) NOT NULL CHECK( OwnerType IN ('P', 'C') ), -- P=Person, C=Company
    CreateDate DATETIME NOT NULL DEFAULT (GETDATE())
);
CREATE TABLE dbo.Person (
    BankAccountOwnerID INT PRIMARY KEY
        REFERENCES dbo.BankAccountOwner(BankAccountOwnerID),
    FirstName NVARCHAR(50) NOT NULL,
    LastName NVARCHAR(50) NOT NULL
);
CREATE TABLE dbo.Company (
    BankAccountOwnerID INT PRIMARY KEY
        REFERENCES dbo.BankAccountOwner(BankAccountOwnerID),
    CompanyName NVARCHAR(100) NOT NULL
);

INSERT    dbo.BankAccountOwner (BankAccountOwnerID, OwnerType)
VALUES    (1, 'P'), (2, 'P'), (3, 'C'), (4, 'C'), (5, 'C');
INSERT    dbo.Person (BankAccountOwnerID, FirstName, LastName)
VALUES    (1, N'John', N'Doe'), (2, N'Mary', N'Doe');
INSERT    dbo.Company (BankAccountOwnerID, CompanyName)
VALUES (3, N'MyComputer'), (4, N'Control Panel'), (5, N'Device Manager');
GO

Problem: how can we get the FirstName, LastName and CompanyName values for the following owners: 1, 2, 3, 4, 5?

First solution:

Code Snippet
SET NOCOUNT ON;
SET STATISTICS IO ON;
PRINT 'Test #1';
SELECT    bao.*, p.FirstName, p.LastName, c.CompanyName
FROM    dbo.BankAccountOwner bao
LEFT JOIN dbo.Person p ON bao.BankAccountOwnerID = p.BankAccountOwnerID
LEFT JOIN dbo.Company c ON bao.BankAccountOwnerID = c.BankAccountOwnerID
WHERE    bao.BankAccountOwnerID IN (1,2,3,4,5);

My solution:

Because

  1. dbo.Person table contains only rows with OwnerType = ‘P’ and dbo.Company table contains only rows with OwnerType = ‘C’ and
  2. SQL Server doesn't know this,

I added these predicates to every left join thus:

Code Snippet
PRINT 'Test #2';
SELECT    bao.*, p.FirstName, p.LastName, c.CompanyName
FROM    dbo.BankAccountOwner bao
LEFT JOIN dbo.Person p ON bao.OwnerType = 'P' AND bao.BankAccountOwnerID = p.BankAccountOwnerID
LEFT JOIN dbo.Company c ON bao.OwnerType = 'C' AND bao.BankAccountOwnerID = c.BankAccountOwnerID
WHERE    bao.BankAccountOwnerID IN (1,2,3,4,5);

These are the execution plans:

Conditional.Joins.Execution.Plans

First solution: as you can see from the properties of the Index Seek operators, SQL Server will try to find every bank account owner (1, 2, …, 5) within the dbo.Person table and within the dbo.Company table (Number of Executions = 5).

Second solution: this time, because SQL Server knows that the dbo.Person table contains only OwnerType = 'P' rows, it will read from the dbo.Person table using an Index Seek (on Person.PK_Person_…) just two times (Number of Executions = 2), because there are only two persons among those five owners. This is possible because of the Filter operator, which verifies, before executing the Index Seek, whether the current owner is 'P':

Filter.Operator

Also, SQL Server will read from dbo.Company table using an Index Seek (on Company.PK_Company_…) three times (Number of Executions = 3) because there are three companies.

Because we have reduced the number of executions for the Index Seek operators, we have also reduced the number of logical reads, as we can see from the output of STATISTICS IO ON:

Code Snippet
Test #1
Table 'Company'. Scan count 0, logical reads 10
Table 'Person'. Scan count 0, logical reads 10
Table 'BankAccountOwner'. Scan count 5, logical reads 10
Test #2
Table 'Company'. Scan count 0, logical reads 6
Table 'Person'. Scan count 0, logical reads 4
Table 'BankAccountOwner'. Scan count 5, logical reads 10

2013-12-25

Notes regarding SQL Server and BETWEEN operator

Filed under: between,optimization,SARG,sql server — admin @ 12:49 AM

Note #1

According to ANSI SQL '92, "X BETWEEN Y AND Z" is equivalent to "X >= Y AND X <= Z". This is the reason why "WHERE h.OrderDate BETWEEN @StartDate AND @EndDate" is translated into "WHERE h.OrderDate >= @StartDate AND h.OrderDate <= @EndDate":

Sql.Serevr.Between.Execution.Plan.1

This also means that Y must be less than or equal to Z (Y <= Z), i.e. @StartDate must be less than or equal to @EndDate (@StartDate <= @EndDate). Now, if we run the following query having @StartDate = Dec 31, 2005 and @EndDate = Jan 1, 2005,

Code Snippet
SET STATISTICS IO ON;

DECLARE @StartDate DATETIME = '20051231 23:59:59.997',
        @EndDate DATETIME = '20050101 00:00:00.000';

SELECT    COUNT(*) AS OrderCount1
FROM    Sales.SalesOrderHeader h
WHERE    h.OrderDate BETWEEN @StartDate AND @EndDate
/*
STATISTICS IO ON output:
Table 'SalesOrderHeader'. Scan count 1, logical reads 686
*/

the result of COUNT(*) is 0 sales orders and the output of STATISTICS IO ON (the number of logical reads = the number of 8 KB data pages read from the buffer pool) is 686 logical reads. This means that SQL Server reads data from the Sales.SalesOrderHeader table even though the @StartDate <= @EndDate condition is false for the current values ("WHERE h.OrderDate BETWEEN '20051231 23:59:59.997' AND '20050101 00:00:00.000'" or "WHERE h.OrderDate >= '20051231 23:59:59.997' AND h.OrderDate <= '20050101 00:00:00.000'").

What options do we have in such cases?

  1. We validate the values of the parameters, or
  2. We tell SQL Server that Y must be less than or equal to Z:
Code Snippet
SET STATISTICS IO ON;
DECLARE @StartDate DATETIME = '20051231 23:59:59.997',
        @EndDate DATETIME = '20050101 00:00:00.000';

SELECT    COUNT(*) AS OrderCount1
FROM    Sales.SalesOrderHeader h
WHERE    h.OrderDate BETWEEN @StartDate AND @EndDate
AND        @StartDate <= @EndDate
/*
STATISTICS IO ON output:
Table 'Worktable'. Scan count 0, logical reads 0
*/

This time, because SQL Server knows that @StartDate must be less than or equal to @EndDate, the output of STATISTICS IO ON shows that it never reads data from the Sales.SalesOrderHeader table (0 logical reads). The execution plan of this query includes a Filter operator which checks the @StartDate <= @EndDate predicate ('20051231 23:59:59.997' <= '20050101 00:00:00.000'), preventing (in this case) the execution of the Index Scan / Index Seek operator and thus preventing reading data from the Sales.SalesOrderHeader table:

Sql.Serevr.Between.Execution.Plan.3

Note #2

What if we want to count sales orders when @StartDate <= @EndDate but also when @StartDate > @EndDate, thus avoiding the limitation of the BETWEEN operator from ANSI SQL? One solution is to rewrite the BETWEEN predicate in such a way that it always becomes BETWEEN @MinimumValue AND @MaximumValue.

Example:

Code Snippet
SET STATISTICS IO ON;
DECLARE @StartDate DATETIME = '20051231 23:59:59.997',
        @EndDate DATETIME = '20050101 00:00:00.000';

SELECT    COUNT(*) AS OrderCount2
FROM    Sales.SalesOrderHeader h
WHERE    h.OrderDate
        BETWEEN
        -- Minimum value
        (CASE WHEN @StartDate <= @EndDate THEN @StartDate ELSE @EndDate END)
        AND
        -- Maximum value
        (CASE WHEN @StartDate <= @EndDate THEN @EndDate ELSE @StartDate END)

This solution has an advantage: it is SARG-able, and if there is an index on the OrderDate column

Code Snippet
CREATE INDEX IX_SalesOrderHeader_OrderDate
ON Sales.SalesOrderHeader (OrderDate);

then the execution plan will include an Index Seek operator:

Sql.Serevr.Between.Execution.Plan.2

2013-12-09

Non SARG-able predicates #1

Sometimes it isn't enough to create an index to optimize a query. Even if we create the proper index, we may discover by examining the execution plan that SQL Server doesn't use an Index Seek to find the rows, using instead an Index Scan.

For example, if we want to count the number of sales orders for December 2005, a first attempt can use the following query and an index on OrderDate:

Code Snippet
CREATE INDEX IX_SalesOrderHeader_OrderDate
ON Sales.SalesOrderHeader (OrderDate);
GO
SET STATISTICS IO ON;
DBCC FREEPROCCACHE;
GO
SELECT COUNT(*)
FROM Sales.SalesOrderHeader h
WHERE YEAR(h.OrderDate) = 2005 AND MONTH(h.OrderDate) = 12
GO
/*
SET STATISTICS IO ON output:
Table 'SalesOrderHeader'. Scan count 1, logical reads 73
*/

Execution.Plan.Index.Scan

We can see that for this query the execution plan includes an Index Scan instead of an Index Seek. The reason is that these two predicates aren't SARG-able. A SARG-able predicate allows an Index Seek if an index is available.

The second attempt uses a BETWEEN predicate (which is SARG-able) thus:

Code Snippet
DBCC FREEPROCCACHE;
GO
SELECT    COUNT(*)
FROM    Sales.SalesOrderHeader h
WHERE    h.OrderDate BETWEEN '20051201' AND '20051231 23:59:59.997'
-- or WHERE    h.OrderDate >= '20051201' AND h.OrderDate < '20060101'
GO
/*
SET STATISTICS IO ON output:
Table 'SalesOrderHeader'. Scan count 1, logical reads 3
*/

Execution.Plan.Index.Seek

This time the execution plan includes an Index Seek and the performance (logical reads) is better: just 3 logical reads (Index Seek) instead of 73 logical reads (Index Scan).

More information on SARG-able predicates can be found here.

Note: For this test I used the Adventure Works for SQL Server 2012 (data file) database.

2013-09-05

Misinterpretation of the missing indexes

When we analyze execution plans (estimated: Ctrl+L; actual: Ctrl+M, then F5) we sometimes see information regarding missing indexes:

ActualExecutionPlanMissingIndexes

And when we open the contextual menu and then select the "Missing Index Details …" submenu

ActualExecutionPlanMissingIndexes#2

we get the script to create the index:

Code Snippet
/*
Missing Index Details from SQLQuery33.sql – (local)\SQL2008R2.AdventureWorks2008R2 (N4FG80D9FGDFJGO\sqlservr.exe (53))
The Query Processor estimates that implementing the following index could improve the query cost by 94.1312%.
*/

/*
USE [AdventureWorks2008R2]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [Sales].[SalesOrderHeader] ([OrderDate])
INCLUDE ([SalesOrderID],[DueDate])
GO
*/

Now, when we see the CREATE INDEX statement, some of us might think that this is an imperative command: "you should create this index". Wrong. In my opinion, this is the way to say which index could improve the performance of the current query. For example, the performance of this query could be improved by an index with the key (OrderDate) and the covering (included) columns SalesOrderID and DueDate. What we should do is check whether we already have a similar index. In my example, we have the following index with the same key (OrderDate) as the missing index:

Code Snippet
CREATE NONCLUSTERED INDEX [IX_SalesOrderHeader_OrderDate]
ON [Sales].[SalesOrderHeader]
(
    [OrderDate] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

In this case, what we should do is simply change the existing index by adding the SalesOrderID and DueDate columns thus:

Code Snippet
CREATE NONCLUSTERED INDEX [IX_SalesOrderHeader_OrderDate]
ON [Sales].[SalesOrderHeader]
(
    [OrderDate] ASC
)
INCLUDE ([SalesOrderID],[DueDate])
WITH (DROP_EXISTING = ON);
GO
EXEC sp_rename  
    @objname = 'Sales.SalesOrderHeader.IX_SalesOrderHeader_OrderDate',
    @newname = 'IX_SalesOrderHeader_OrderDate#DueDate_SalesOrderID',
    @objtype = 'INDEX';
GO    

Finally, if we rerun the query, the execution plan changes and includes an Index Seek operator instead of a Clustered Index Scan:

ActualExecutionPlanMissingIndexes#3
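On a related note, the same kind of suggestions can also be reviewed across the whole instance through the missing-index DMVs, which helps when deciding whether to merge them into existing indexes; a minimal sketch (the ordering is just one common heuristic, not an official ranking):

Code Snippet
-- Sketch: review missing-index suggestions recorded since the last restart
SELECT  d.statement AS table_name,
        d.equality_columns, d.inequality_columns, d.included_columns,
        s.user_seeks, s.avg_user_impact
FROM    sys.dm_db_missing_index_details AS d
JOIN    sys.dm_db_missing_index_groups AS g ON d.index_handle = g.index_handle
JOIN    sys.dm_db_missing_index_group_stats AS s ON g.index_group_handle = s.group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;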

