Yet another SQL Server enthusiast

2013-10-04

Finding Scans

Filed under: index,scan,sql server 2008 r2 — sa @ 8:19 AM

Starting with SQL Server 2005, the execution plans are stored as XML. Combining this fact with powerful XQueries the result is endless (well, almost) possibilities. For example, somebody could write a simple SELECT statement to find out all execution plans including [ [ Clustered] Index | Table ] Scan operators.

Step #1 Run this script to create a stored procedure:

Code Snippet
USE AdventureWorks2008R2;
GO
CREATE PROCEDURE Sales.spTest
AS
SELECT  soh.SalesOrderID
FROM    Sales.SalesOrderHeader soh
WHERE   soh.ShipDate>='20070101';
GO

Step #2 Execute the newly created stored procedure (EXEC Sales.spTest) using Actual Execution Plan option activated (see here how). The result will be the following execution plan:

XmlExecutionPlan

Step #3: Now, run the following SELECT statement to find out all execution plans which include an [ [ Clustered] Index | Table ] Scan operator:

Code Snippet
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(
    DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
)
SELECT    TOP(10)
        crt_db.[dbid],
        DB_NAME(crt_db.[dbid]) AS database_name,
        pl.objectid,
        QUOTENAME(OBJECT_SCHEMA_NAME(pl.objectid, crt_db.[dbid]))+'.'+QUOTENAME(OBJECT_NAME(pl.objectid, crt_db.[dbid])) AS object_name,
        pl.query_plan,
        src.text AS source_code,
        pl.query_plan.exist('//RelOp[@PhysicalOp="Clustered Index Scan"]') AS [Has Clustered Index Scan],
        pl.query_plan.exist('//RelOp[@PhysicalOp="Index Scan"]') AS [Has Index Scan],
        pl.query_plan.exist('//RelOp[@PhysicalOp="Table Scan"]') AS [Has Table Scan],
        cp.size_in_bytes,
        cp.usecounts
FROM    sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) pl
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) src
CROSS APPLY (
    SELECT    CONVERT(INT, plat.value) AS [dbid]
    FROM    sys.dm_exec_plan_attributes(cp.plan_handle) plat
    WHERE    plat.attribute = 'dbid'
) crt_db
WHERE   pl.query_plan.exist('//RelOp[
    @PhysicalOp = "Clustered Index Scan"
    or @PhysicalOp = "Index Scan"
    or @PhysicalOp = "Table Scan"
]')=1
ORDER BY cp.usecounts DESC;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

Results:

QueryCache

Revision History

2013-10-07 Update #1 (Changes: step 2 and 3)

2013-09-05

Misinterpretation of the missing indexes

When we analyze the execution plans (estimated: Ctrl+L; actual:Ctrl + M, F5) we could see sometimes information regarding the missing indexes thus:

ActualExecutionPlanMissingIndexes And when we open the contextual menu and then select the “Missing Index Details …” submenu

ActualExecutionPlanMissingIndexes#2 we get the script to create the index:

Code Snippet
/*
Missing Index Details from SQLQuery33.sql – (local)\SQL2008R2.AdventureWorks2008R2 (N4FG80D9FGDFJGO\sqlservr.exe (53))
The Query Processor estimates that implementing the following index could improve the query cost by 94.1312%.
*/

/*
USE [AdventureWorks2008R2]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [Sales].[SalesOrderHeader] ([OrderDate])
INCLUDE ([SalesOrderID],[DueDate])
GO
*/

Now, when we see the CREATE INDEX statement some of us would think that this is an imperative command: “you should create this index”. Wrong. In my opinion, this is the way to say what index could improve the performance of current query. For example, the performance of this query could be improved if you have an index with the following key (OrderDate) and having the following covering columns (SalesOrderID and DueDate). What we could do is to check if you have a similar index. In my example, we have the following index having the same key (OrderDate) with the missing index:

Code Snippet
CREATE NONCLUSTERED INDEX [IX_SalesOrderHeader_OrderDate]
ON [Sales].[SalesOrderHeader]
(
    [OrderDate] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

In this case, what we should do is to simply change the existing index by adding SalesOrderID and DueDate columns thus:

Code Snippet
CREATE NONCLUSTERED INDEX [IX_SalesOrderHeader_OrderDate]
ON [Sales].[SalesOrderHeader]
(
    [OrderDate] ASC
)
INCLUDE ([SalesOrderID],[DueDate])
WITH (DROP_EXISTING = ON);
GO
EXEC sp_rename  
    @objname = 'Sales.SalesOrderHeader.IX_SalesOrderHeader_OrderDate',
    @newname = 'IX_SalesOrderHeader_OrderDate#DueDate_SalesOrderID',
    @objtype = 'INDEX';
GO    

Finally, if we rerun the query the execution plan is changed and it includes the Index Seek operator instead of Clustered Index Scan:

ActualExecutionPlanMissingIndexes#3

2013-08-20

Seek vs. Scan (II)

Filed under: index,optimization,scan,seek,sql server,sql server 2008 r2 — sa @ 7:45 PM

This blog post tries to explain the results of the first series of tests. To accomplish this goal I will do a new series of tests using (almost) the same methodology. The only difference is how I select the rows (164 rows or 1640 rows) used  for testing. Please remember that the clustered index of Sales.Customer table (PK_Customer_CustomerID) has 121 leaf pages and 19820 rows. This is the script used for this:

Code Snippet
— Test Series #2: Seek vs. Scan (II)
INSERT INTO dbo.[Rows](CustomerID)
SELECT x.CustomerID
FROM(
SELECT    c.CustomerID,(ROW_NUMBER() OVER(ORDER BY c.CustomerID)-1) % 121 AS RowNum
FROM    Sales.Customer c
) x
WHERE x.RowNum = 0; /* x.RowNum BETWEEN 0 AND 9; */

This time I got these results:

TestSeries#2_current

To compare both results in a simple manner I added the previous results:

TestSeries#1_previous

As you can see, the current  test series (#2) has some interesting facts:

  • Merge Join and Scan Merge Join operators give the best performance only from the point of view of logical reads.
  • However the best overall performance (Duration, CPU) is achieved by Nested Loops and Index Seek (which isn’t all surprising).
  • And, again, these series of tests show that [Logical] Reads shouldn’t be the only key performance indicator used when we evaluate different execution plans for the same query.

Why ?

Visually, the main difference between these two series of tests is as follows:

TestSeries#1vsTestSeries#2Test series #1 has to find 164 records which are physical stored in the “first” leaf page of PK_Customer_CustomerID clustered index (or it has to find 1640 records stored in the “first” 10 leaf pages of the same index). On the other hand test series #2 reads reads the same number of records (164 records, 1640 records) which are equally distributed in all those 121 leaf pages of PK_Customer_CustomerID clustered index. As you can see, this small detail (the physically distribution of records in the clustered index: PK_Customer_CustomerID) has a huge impact on the query performance.

The last question

The last question and the main question is (I think) why Merge Join + Index Scan (Query 3; all 164 records are stored in the “first” leaf page) give the best results for the Test series #1 ?

ActualExecutionPlansTestSeries#1Query3We would expect that Clustered Index Scan operator to read all records from all leaf pages:

ClusteredindexScan1

But, the output of SET STATISTICS IO ON shows something unexpected:

Code Snippet
Table ‘Customer’. Scan count 1, logical reads 3
Table ‘Rows’. Scan count 1, logical reads 2

As you can see, instead of showing 1 (root page) + 121 leaf pages + 1 page IAM =  the output of STATISTICS IO shows only 3 logical reads representing 1 root page + 1 leaf page + 1 IAM = 123 logical reads. This output shows that after reading 164 records from the “first” leaf page the Clustered Index Scan stops and doesn’t tries to read following leaf pages:

ClusteredindexScan2 From this point of view, the behavior of Scan operator in this context is partially similar to behavior of Index Seek – range scan :

Code Snippet
SELECT *
FROM Sales.Customer c
WHERE c.CustomerID <= 164
ORDER BY c.CustomerID;

The behavior of this Scan operator is also partially similar to behavior of Clustered Index Scan in the following query:

Code Snippet
SELECT TOP(164) *
FROM Sales.Customer c
ORDER BY c.CustomerID;

ActualExecutionPlansTopClusteredIndexScan

Code Snippet
— SET STATISTICS IO ON
Table ‘Customer’. Scan count 1, logical reads 2

Remarks

  1. A Scan operator is not always something “bad”. In some cases, Scan will give better results than Index Seek (at least than Index Seek – singleton seek).
  2. Not always Merge Join gives the best results even if there are proper indexes.
  3. Logical reads shouldn’t be the only query performance metric used to measure the query performance. You should look at CPU and, of course, Duration.

Note: To understand why I used all these table/index hints

Code Snippet
MyTable WITH(FORCESEEK(PK_Customer_CustomerID(CustomerID)))
— index_id 1 is the clustered index (PK_Customer_CustomerID)
MyTable WITH(INDEX(1),FORCESCAN)

please read this blog post.

Create a free website or blog at WordPress.com.