Yet another SQL Server enthusiast

2015-05-08

How to find execution plans with warnings and missing indexes

Starting with SQL Server 2005, execution plans include information about missing indexes, and starting with SQL Server 2012, execution plans can also include a few warnings (e.g. implicit conversions, missing join predicates). Because both types of information are included in [estimated] execution plans, and these plans are stored as XML data, we can use the following queries to find execution plans with warnings or with missing indexes for the TOP(100) queries sorted by average logical reads:

Code Snippet
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET NOCOUNT ON;

IF OBJECT_ID('tempdb.dbo.#QueryPlans') IS NOT NULL
    DROP TABLE #QueryPlans;

SELECT    pa.*,
        SUBSTRING(st.[text], (qs.statement_start_offset/2) + 1, 100) AS statement_text,

        qs.execution_count, qs.creation_time, qs.last_execution_time,

        min_elapsed_time    = qs.min_elapsed_time,
        avg_elapsed_time    = CONVERT(NUMERIC(38, 4), qs.total_elapsed_time * 1. / qs.execution_count),
        max_elapsed_time    = qs.max_elapsed_time,

        min_worker_time        = qs.min_worker_time,
        avg_worker_time        = CONVERT(NUMERIC(38, 4), qs.total_worker_time * 1. / qs.execution_count),
        max_worker_time        = qs.max_worker_time,

        min_logical_reads    = qs.min_logical_reads,
        avg_logical_reads    = CONVERT(NUMERIC(38, 4), qs.total_logical_reads * 1. / qs.execution_count),
        max_logical_reads    = qs.max_logical_reads,

        min_logical_writes    = qs.min_logical_writes,
        avg_logical_writes    = CONVERT(NUMERIC(38, 4), qs.total_logical_writes * 1. / qs.execution_count),
        max_logical_writes    = qs.max_logical_writes,

        min_clr_time        = qs.min_clr_time,
        avg_clr_time        = CONVERT(NUMERIC(38, 4), qs.total_clr_time * 1. / qs.execution_count),
        max_clr_time        = qs.max_clr_time,

        min_rows            = qs.min_rows,
        avg_rows            = CONVERT(NUMERIC(38, 4), qs.total_rows * 1. / qs.execution_count),
        max_rows            = qs.max_rows,

        qs.[sql_handle], qs.plan_handle, qs.query_hash, qs.query_plan_hash
INTO    #QueryPlans
FROM    sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.[sql_handle]) st
CROSS APPLY (
    SELECT    database_name = DB_NAME(rpvt.[dbid]), object_name = OBJECT_NAME(rpvt.objectid, rpvt.[dbid])
    FROM (
        SELECT    xpa.attribute, int_value = CONVERT(INT, xpa.value)
        FROM    sys.dm_exec_plan_attributes(qs.plan_handle) xpa
        WHERE    xpa.attribute IN (N'dbid', N'objectid')
    ) spvt
    PIVOT( MAX(spvt.int_value) FOR spvt.attribute IN ([dbid], [objectid])) rpvt
) pa
OPTION(RECOMPILE);

WITH XMLNAMESPACES (
    'http://www.w3.org/2001/XMLSchema-instance' AS xsi,
    'http://www.w3.org/2001/XMLSchema' AS xsd,
    DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
)
SELECT    TOP(100)
        XmlMissingIndexes    = qp.query_plan.query('//MissingIndexes'),
        XmlWarnings            = qp.query_plan.query('//Warnings'),
        XmlExecutionPlan    = qp.query_plan,
        t.*
FROM    #QueryPlans t
OUTER APPLY sys.dm_exec_query_plan(t.plan_handle) qp
WHERE    qp.query_plan.exist('//Warnings') = 1
OR        qp.query_plan.exist('//MissingIndexes') = 1
ORDER BY t.avg_logical_reads DESC -- you may change the sort criteria: avg_elapsed_time, avg_worker_time, avg_logical_writes, etc.
OPTION(RECOMPILE)

Results:

[Screenshot: SQL Server execution plans with warnings and missing indexes]

You may change the default sort criteria to ORDER BY avg_elapsed_time, avg_worker_time, or any other column from the #QueryPlans table.
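If you want to dig deeper into the missing index suggestions, the MissingIndexes fragment can also be shredded into relational columns. Below is a minimal sketch (my addition, assuming the #QueryPlans temp table created by the script above still exists in the session):

Code Snippet
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT    t.statement_text,
        impact             = mi.grp.value('(@Impact)[1]', 'NUMERIC(5, 2)'),
        [database]         = mi.grp.value('(MissingIndex/@Database)[1]', 'NVARCHAR(128)'),
        [table]            = mi.grp.value('(MissingIndex/@Table)[1]', 'NVARCHAR(128)'),
        equality_columns   = mi.grp.query('MissingIndex/ColumnGroup[@Usage="EQUALITY"]/Column'),
        inequality_columns = mi.grp.query('MissingIndex/ColumnGroup[@Usage="INEQUALITY"]/Column'),
        include_columns    = mi.grp.query('MissingIndex/ColumnGroup[@Usage="INCLUDE"]/Column')
FROM    #QueryPlans t
CROSS APPLY sys.dm_exec_query_plan(t.plan_handle) qp
-- one row per missing index group found anywhere in the plan
CROSS APPLY qp.query_plan.nodes('//MissingIndexGroup') mi(grp)
OPTION(RECOMPILE);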

Download script


2015-03-01

NonSARGable predicates #2: about nullable columns, ISNULL, IS NULL and COALESCE

Filed under: index,optimization,predicate,SARG,scalar,sql server 2012 — sa @ 12:37 AM

This post discusses how query complexity can be reduced and query performance improved simply by using mandatory (NOT NULL) columns instead of optional (nullable) ones, with no change to the queries themselves.

First, let’s create a table to store all changes of product prices:

Code Snippet
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE dbo.PriceHistory(
    ID            INT IDENTITY(1,1) NOT NULL,
    CONSTRAINT    PK_PriceHistory_ID PRIMARY KEY (ID),
    ProductID    INT NOT NULL, -- foreign key
    Price        NUMERIC(9,2) NOT NULL,
    StartDate    DATETIME NULL, -- nullable / optional column
    EndDate        DATETIME NULL -- nullable / optional column
);
GO
CREATE    INDEX IX_PriceHistory_StartDate
ON        dbo.PriceHistory(StartDate)
INCLUDE    (EndDate)
GO
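To have something to run the queries against, you can insert a few rows; these sample values are my own, not from the original post:

Code Snippet
-- illustrative sample data: open-ended, bounded, and always-valid price intervals
INSERT INTO dbo.PriceHistory (ProductID, Price, StartDate, EndDate)
VALUES  (1, 10.00, NULL, '20140101'),       -- valid until 2014-01-01
        (1, 12.50, '20140101', '20150101'), -- bounded interval
        (1, 14.99, '20150101', NULL),       -- current price, no end date
        (2, 99.00, NULL, NULL);             -- always valid
GO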

Our query should compute how many prices are active now. I’ve used three different ways to write the filtering conditions, and none of them follows the rules for SARGable predicates ([1], [2]):

Code Snippet
DECLARE    @d DATETIME = GETDATE()

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    ISNULL(d.StartDate, @d) <= @d
AND        @d <= ISNULL(d.EndDate, @d)

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    (d.StartDate <= @d OR d.StartDate IS NULL)
AND        (@d <= d.EndDate OR d.EndDate IS NULL)

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    COALESCE(d.StartDate, @d) <= @d
AND        @d <= COALESCE(d.EndDate, @d)

For these queries the execution plans include an Index Scan, which is the expected data access operator because all the conditions are non-SARGable:

[Screenshot: Nullable.Columns.Query.Execution.Plan]

Second, we alter StartDate and EndDate so that they become mandatory (NOT NULL) columns:

Code Snippet
DROP INDEX IX_PriceHistory_StartDate ON dbo.PriceHistory
GO
UPDATE    dbo.PriceHistory
SET        StartDate    = CASE WHEN StartDate IS NULL THEN {d '1753-01-01'} ELSE StartDate END,
        EndDate        = CASE WHEN EndDate IS NULL THEN {ts '9999-12-31 23:59:59.997'} ELSE EndDate END
        — 1753-01-01 and 9999-12-31 23:59:59.997
        — represents the minimum and maximum value for DATETIME data type
WHERE    StartDate IS NULL
OR        EndDate    IS NULL
GO
ALTER TABLE dbo.PriceHistory ALTER COLUMN StartDate DATETIME NOT NULL
ALTER TABLE dbo.PriceHistory ALTER COLUMN EndDate DATETIME NOT NULL
GO
CREATE    INDEX IX_PriceHistory_StartDate
ON        dbo.PriceHistory(StartDate)
INCLUDE    (EndDate)
GO

and then re-run the same queries. This time, we get the following execution plans:

[Screenshot: Mandatory.Columns.Query.Execution.Plan]

This time, for the first two queries, the execution plans include an Index Seek operator even though the conditions (ISNULL / OR IS NULL) aren’t SARGable. Examining the properties of the Index Seek operator for Query 1 and Query 2 reveals that SQL Server rewrote the filtering conditions (Query 1: ISNULL(d.StartDate, @d) <= @d AND @d <= ISNULL(d.EndDate, @d); Query 2: (d.StartDate <= @d OR d.StartDate IS NULL) AND (@d <= d.EndDate OR d.EndDate IS NULL)) into something much simpler, because StartDate and EndDate are now mandatory columns (they don’t allow NULLs). For example, the condition for Query 1 becomes StartDate <= @d AND EndDate >= @d:

[Screenshot: Index Seek operator properties]

Also, we can see that for the third query the condition (COALESCE) wasn’t rewritten and the execution plan still includes an Index Scan.
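If you prefer to inspect the rewritten predicates without the graphical plan, one quick option (my addition, not from the original post) is to return the actual plan as XML alongside the query:

Code Snippet
SET STATISTICS XML ON; -- each statement now also returns its actual plan as XML
GO
DECLARE @d DATETIME = GETDATE();

SELECT    COUNT(*)
FROM    dbo.PriceHistory d
WHERE    ISNULL(d.StartDate, @d) <= @d
AND        @d <= ISNULL(d.EndDate, @d);
GO
SET STATISTICS XML OFF;
GO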

TL;DR

When predicates of the form

  • ISNULL(Column, @variable) {=|>|>=|<|<=} @variable
  • Column {=|>|>=|<|<=} @variable OR Column IS NULL

are defined on nullable indexed columns, SQL Server uses a Scan operator for data access. When the nullability is changed to NOT NULL, SQL Server can choose a Seek without any change in the source code; for example, rewriting ISNULL(Column, @variable) <= @variable as Column <= @variable is not needed. I have observed this behavior on SQL Server 2008 and SQL Server 2012.
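As a side note (my addition, not part of the original post): if you adopt sentinel values like the ones used in the UPDATE above, you may want DEFAULT constraints so that newly inserted rows keep working without application changes. A minimal sketch:

Code Snippet
-- same sentinel values as in the UPDATE statement above
ALTER TABLE dbo.PriceHistory
    ADD CONSTRAINT DF_PriceHistory_StartDate DEFAULT ({d '1753-01-01'}) FOR StartDate;
ALTER TABLE dbo.PriceHistory
    ADD CONSTRAINT DF_PriceHistory_EndDate DEFAULT ({ts '9999-12-31 23:59:59.997'}) FOR EndDate;
GO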

2014-09-28

Simple solution to optimize bulk insert operations

Filed under: optimization,sql server,sql server 2012 — sa @ 11:44 PM

What can be done to optimize (minimize the execution time of) bulk insert operations of large *.csv files into a SQL Server database? Here you may find an excellent article on this subject.

But there is also a simple thing you should consider: autogrowth events. These events can slow down operations executed against a SQL Server database, including bulk inserts. This means that if we eliminate the autogrowth events, the execution time should decrease.
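To count how many autogrowth events a test produced, one option (my addition; it assumes the default trace is enabled, which it is out of the box) is to read the default trace:

Code Snippet
-- count data/log file autogrowth events recorded by the default trace
DECLARE @trace_path NVARCHAR(260);
SELECT    @trace_path = [path] FROM sys.traces WHERE is_default = 1;

SELECT    te.name AS event_name, COUNT(*) AS event_count
FROM    sys.fn_trace_gettable(@trace_path, DEFAULT) tr
JOIN    sys.trace_events te ON te.trace_event_id = tr.EventClass
WHERE    tr.EventClass IN (92, 93) -- Data File Auto Grow, Log File Auto Grow
AND        tr.DatabaseName = N'TestBulkInsert'
GROUP BY te.name;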

To test this hypothesis I’ve done two sets of tests using a *.csv file (4,366,563 rows) with an uncompressed size of 462 MB:

  1. One test using the default database options (for size and autogrowth), and
  2. The other test using a database with a data file pre-sized to 2 GB and a log file pre-sized to 100 MB (see the scripts below). Because the source file uses UTF-8 encoding, the text columns are defined using the NVARCHAR data type, so SQL Server stores two bytes per character (without data compression).

I’ve used following scripts:

  1. First test:

    Code Snippet
    CREATE DATABASE TestBulkInsert
    ON PRIMARY  (NAME = 'TestBulkInsert_01_Data', FILENAME = 'E:\BD\TestBulkInsert.mdf')
    LOG ON (NAME = 'TestBulkInsert_01_Log', FILENAME = 'E:\BD\TestBulkInsert.ldf')
    GO
    ALTER DATABASE TestBulkInsert
    SET RECOVERY SIMPLE
    GO
    USE TestBulkInsert;
    GO
    SELECT    file_id,  name, size * 8 AS [size KB],
            CASE
                WHEN growth = 0 THEN 'fixed size'
                WHEN is_percent_growth = 1 THEN STR(growth, 10, 0) + ' %'
                ELSE STR(growth * 8, 10, 0) + ' KB'
            END  AS growth_description
    FROM    sys.database_files
    GO
    /*
    file_id name                   size KB     growth_description
    ------- ---------------------- ----------- ------------------
    1       TestBulkInsert_01_Data 3136              1024 KB
    2       TestBulkInsert_01_Log  1024                10 %
    */

  2. Second test:
    Code Snippet
    CREATE DATABASE TestBulkInsert
    ON PRIMARY  (NAME = 'TestBulkInsert_01_Data', FILENAME = 'E:\BD\TestBulkInsert.mdf', SIZE = 2GB, FILEGROWTH = 500MB)
    LOG ON (NAME = 'TestBulkInsert_01_Log', FILENAME = 'E:\BD\TestBulkInsert.ldf', SIZE = 100MB, FILEGROWTH = 100MB)
    GO
    ALTER DATABASE TestBulkInsert
    SET RECOVERY SIMPLE
    GO
    USE TestBulkInsert;
    GO
    SELECT    file_id,  name, size * 8 AS [size KB],
            CASE
                WHEN growth = 0 THEN 'fixed size'
                WHEN is_percent_growth = 1 THEN STR(growth, 10, 0) + ' %'
                ELSE STR(growth * 8, 10, 0) + ' KB'
            END  AS growth_description
    FROM    sys.database_files
    GO
    /*
    file_id name                   size KB     growth_description
    ------- ---------------------- ----------- ------------------
    1       TestBulkInsert_01_Data 2097152         512000 KB
    2       TestBulkInsert_01_Log  102400          102400 KB
    */

The target table was created using the following script:

Code Snippet
CREATE TABLE [dbo].[currenttmp](
    [ColA] [bigint] NOT NULL PRIMARY KEY,
    [ColB] [nvarchar](20) NULL,
    [ColC] [nvarchar](4000) NULL
) ON [PRIMARY]
GO

To import data I’ve used an SSIS package:

[Screenshot: SSIS data flow task]
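For reference, a plain T-SQL alternative could look like the sketch below (my addition; the file path and delimiters are hypothetical, and note that BULK INSERT on this era of SQL Server cannot read UTF-8 files directly, since CODEPAGE = '65001' is only supported starting with SQL Server 2016, which is one reason to use SSIS here):

Code Snippet
-- hypothetical path and delimiters; adjust them to the actual *.csv file
BULK INSERT dbo.currenttmp
FROM 'E:\BD\source_file.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK -- bulk-update table lock, helps qualify for minimal logging
);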

Below are my results:

                                 Test 1    Test 2    Difference
Average execution time (sec.)    326       222       104
Average execution time (mm:ss)   5:26      3:42      1:44
Autogrowth events                946       0         946

As you can see, simply eliminating autogrowth events can significantly reduce the execution time when importing large files. This is possible because autogrowth events are expensive.

Note: It’s worth reading about Instant File Initialization (not used in these tests).
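On newer builds (SQL Server 2016 SP1 and later; I believe the column was also backported to late service packs of older versions) you can check whether Instant File Initialization is enabled with a simple query:

Code Snippet
-- shows Y/N per service account; the column exists only on builds that expose it
SELECT    servicename, instant_file_initialization_enabled
FROM    sys.dm_server_services;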
