There is a query that uses big tables,
It’s executed rarely (ex. it’s executed once or twice per month),
This query has many conditions, some of them uses UDF scalar functions. The combined selectivity of non-UDF conditions is high (ex. 3.295 rows from 31.263.601).
The query is slow. How can be optimized quickly this query without rewriting scalar function ?
Scalar UDF used for this test:
Note: this scalar function was wrote intentionally to be slow.
I’ve used following query for this test:
As you can see,
The WHERE clause contains one condition that calls a scalar UDF ( dbo.SlowScalarFunction(h.ProductName) LIKE N’ROAD%’ ).
The same scalar UDF is used also by SELECT clause.
The execution plan is serial.
There is a “missing index” warning that tells we should create a non-clustered index:
1) Because this query is executed rarely, the suggested “missing” index will be / could be used rarely (for read operations). This means that after creation of this index will be an overhead due to maintenance of this index after every INSER, UPDATE or DELETE operation.
2) The usage of scalar function forces SQL Server to choose a serial plan.
Because of these two reasons I decided to not create the suggested index and to decompose the initial query thus:
One query without scalar functions calls. This query inserts data that into a temporary table (#Results). This table will contain, usually, a small amount of data. Because of lack of scalar functions, the execution plan will use parallelism.
For that small amount of rows inserted into temporary table I call the scalar UDF.
Using a warm buffer, after these changes, the execution time dropped from 10 seconds to 1 second .