Encapsulating several SQL DML statements (INSERT, UPDATE, DELETE, MERGE) in a single transaction can be beneficial for performance compared with an approach that does not use explicit transactions.
To test this hypothesis I created a small table named dbo.SalesOrder and two batches that insert 100,000 rows each:
The first batch uses autocommit transactions (in this context, for every INSERT statement SQL Server creates a transaction that is committed or rolled back when the statement completes).
The second batch uses explicit transactions, encapsulating every five INSERT statements in a single transaction (as sketched below).
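A minimal sketch of the two batches; the table definition is illustrative (the original dbo.SalesOrder columns are not shown), while the row count and the five-statement grouping come from the description above:

```sql
-- Illustrative table; the real dbo.SalesOrder definition is not shown.
CREATE TABLE dbo.SalesOrder (
    SalesOrderID INT IDENTITY(1, 1) PRIMARY KEY,
    OrderDate    DATETIME NOT NULL
);
GO

-- Batch 1: autocommit transactions (one implicit transaction per INSERT)
SET NOCOUNT ON;
DECLARE @i INT = 1;
WHILE @i <= 100000
BEGIN
    INSERT INTO dbo.SalesOrder (OrderDate) VALUES (GETDATE());
    SET @i += 1;
END;
GO

-- Batch 2: one explicit transaction for every five INSERT statements
SET NOCOUNT ON;
DECLARE @i INT = 1;
WHILE @i <= 100000
BEGIN
    BEGIN TRANSACTION;
        INSERT INTO dbo.SalesOrder (OrderDate) VALUES (GETDATE());
        INSERT INTO dbo.SalesOrder (OrderDate) VALUES (GETDATE());
        INSERT INTO dbo.SalesOrder (OrderDate) VALUES (GETDATE());
        INSERT INTO dbo.SalesOrder (OrderDate) VALUES (GETDATE());
        INSERT INTO dbo.SalesOrder (OrderDate) VALUES (GETDATE());
    COMMIT TRANSACTION;
    SET @i += 5;
END;
GO
```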
To measure the performance of each batch I used two performance counters:
Log Bytes Flushed/sec
Log Flushes/sec (number of I/O operations per second)
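The same counters can also be read from T-SQL instead of Performance Monitor; a minimal sketch (note that sys.dm_os_performance_counters exposes cumulative raw values for per-second counters, so an actual rate requires two samples and dividing the delta by the elapsed seconds):

```sql
-- Raw values of the two log counters for the current database.
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Log Bytes Flushed/sec', 'Log Flushes/sec')
  AND instance_name = DB_NAME();
```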
As you can see in this simple test, encapsulating several DML statements in a single transaction increased the volume of data written to disk per second (Log Bytes Flushed/sec), while the number of physical write operations against the database log (Log Flushes/sec) and the execution time decreased.
Note #1: the main reason for using transactions should be the ACID properties, but the way we manage the D (durability) can have a significant impact on database performance.
Note #2: this is a simple test; my goal was not a full performance test. You should run your own tests.
SQL Server supports a subset of the XQuery language. One of the features implemented in SQL Server XQuery is comparison operators. There are two kinds of comparison operators:
Operators for sequence comparison (=, !=, <, <=, >, >=) and
Operators for singleton (also called scalar value) comparison (eq, ne, lt, le, gt, ge).
Two operators are available for equality comparison: = and eq. The first operator (=) can be used to compare sequences and/or singleton operands. The second operator (eq) can be used only with singleton operands. When two sequences are compared, the result is true if the sequences have at least one value in common; otherwise (no common value) the result is false.
Example 1: sequences and = operator
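An illustrative query over two constructed integer sequences that share the value 2:

```sql
DECLARE @x XML = '';
-- (1, 2) and (2, 3) have the value 2 in common, so = returns true
SELECT @x.query('(1, 2) = (2, 3)');
```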
Example 2: singleton values and = operator
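With two singletons, = behaves like a scalar equality test (illustrative):

```sql
DECLARE @x XML = '';
-- both operands are singletons; returns true
SELECT @x.query('1 = 1');
```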
Example 3: sequences and eq operator
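With the same sequences, eq raises an error, because it requires singleton operands (illustrative):

```sql
DECLARE @x XML = '';
-- fails: 'eq' requires a singleton (or empty sequence) as each operand
SELECT @x.query('(1, 2) eq (2, 3)');
```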
Example 4: singleton values and eq operator
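With two singletons, eq is allowed (illustrative):

```sql
DECLARE @x XML = '';
-- both operands are singletons; returns true
SELECT @x.query('1 eq 1');
```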
Solution: one solution is to use the DATEDIFF function to compute the difference in weeks between two dates.
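A minimal sketch; the variable names are placeholders, and the sample dates are the ones discussed below:

```sql
-- DATEDIFF(WEEK, ...) counts the week boundaries crossed between the two dates
DECLARE @StartDate DATE = '2015-03-28',  -- a Saturday
        @EndDate   DATE = '2015-04-04';  -- the next Saturday
SELECT DATEDIFF(WEEK, @StartDate, @EndDate) AS WeeksBetween;  -- returns 1
```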
In this example, the result is 1 week, and it can also be interpreted as 1 full weekend (Saturday + Sunday).
Why does it work? The DATEDIFF function has an interesting behavior when it comes to the @@DATEFIRST setting and weeks:
Specifying SET DATEFIRST has no effect on DATEDIFF. DATEDIFF always uses Sunday as the first day of the week to ensure the function is deterministic.
Because for DATEDIFF function, Sunday is always the first day of week, a simple difference in weeks between Saturday (ex. 2015-03-28) and next day which is Sunday (ex. 2015-03-29) will give 1 [week] and also this result can be interpreted as 1 weekend:
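```sql
-- Saturday to the following Sunday crosses one week boundary (a Sunday)
SELECT DATEDIFF(WEEK, '2015-03-28', '2015-03-29') AS WeeksBetween;  -- returns 1
```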
Moreover, the difference in weeks between a Saturday (e.g. 2015-03-28) and the next Saturday (e.g. 2015-04-04) also gives 1 week, which can be considered 1 full weekend, even though the interval contains one Saturday, one Sunday, and another Saturday:
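```sql
-- Saturday to the next Saturday still crosses exactly one Sunday
SELECT DATEDIFF(WEEK, '2015-03-28', '2015-04-04') AS WeeksBetween;  -- returns 1
```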
SQL Profiler output:
Note: this solution should be used only for debugging.
Note #2: it won’t work with user-defined functions.
What can be done to optimize (minimize the execution time of) bulk insert operations of large *.csv files into a SQL Server database? Here you may find an excellent article on this subject.
But there is also a simple thing you should consider: autogrowth events. These operations can slow down some operations executed on a SQL Server database (including bulk inserts). This means that if we eliminate these autogrowth events, the execution time should decrease.
To test this hypothesis I ran two sets of tests using a *.csv file (4,366,563 rows) with an uncompressed size of 462 MB:
One test using the default database options (for size and autogrowth), and
The other test using a database with an initial data file size of 1 GB and a log file with an initial size of 500 MB. Because the source file uses UTF-8 encoding, the text columns are defined with the NVARCHAR data type; for every character SQL Server uses two bytes (without data compression).
I used the following scripts.
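A sketch of the setup for the second test; the database name and file paths are illustrative assumptions, while the initial sizes come from the description above:

```sql
-- Test 2: pre-size the files so no autogrowth events occur during the import.
-- Database name and file paths are illustrative.
CREATE DATABASE BulkImportTest
ON PRIMARY
    (NAME = BulkImportTest_data, FILENAME = N'D:\SQLData\BulkImportTest.mdf', SIZE = 1GB)
LOG ON
    (NAME = BulkImportTest_log, FILENAME = N'D:\SQLData\BulkImportTest.ldf', SIZE = 500MB);
```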
The target table was created with a T-SQL script.
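A sketch of such a table, assuming a handful of NVARCHAR columns (the column names are placeholders; the real column list is an assumption):

```sql
-- Illustrative target table; NVARCHAR is used because the *.csv file is UTF-8.
CREATE TABLE dbo.ImportTarget (
    Column1 NVARCHAR(100) NULL,
    Column2 NVARCHAR(100) NULL,
    Column3 NVARCHAR(100) NULL
);
```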
To import the data I used an SSIS package:
Below are my results:
|                                 | Test 1 | Test 2 | Difference |
|---------------------------------|--------|--------|------------|
| Average execution time (sec.)   | 326    | 222    | 104        |
| Average execution time (mm:ss)  | 5:26   | 3:42   | 1:44       |
As you can see, simply eliminating autogrowth events can significantly reduce the execution time when importing large files. This is possible because autogrowth events are expensive.
Note: It’s worth reading about Instant File Initialization (not used in these tests).
sp_DeadlockAnalyzer is a stored procedure created with one goal: to help DBAs debug deadlocks in SQL Server. It has one parameter, @xdl, which is the XML content of a [deadlock graph event] captured with SQL Trace (SQL Profiler). It shows information about the isolation level, blocked resources, connections, and T-SQL statements with their execution plans. The stored procedure should be executed in the context of the original database (e.g. SimpleDeadlock in my example).
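A usage sketch; the database name SimpleDeadlock comes from the example above, and the XML payload here is a placeholder for a real deadlock graph:

```sql
USE SimpleDeadlock;  -- run in the context of the original database
GO
-- Placeholder payload: paste the XML of the captured deadlock graph event here
DECLARE @xdl XML = N'<deadlock-list>...</deadlock-list>';
EXEC dbo.sp_DeadlockAnalyzer @xdl = @xdl;
```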
- Information about the execution plan operators (and the corresponding T-SQL statements) that cause each lock (see the [blocked resources] section).
- To correlate information about statement types (INSERT, UPDATE, DELETE, MERGE, SELECT), table hints (e.g. XLOCK), isolation levels, and foreign keys.
- To show information about how the deadlock can be solved.
- To test using deadlock graph events generated by SQL Trace/SQL Profiler on versions other than SQL Server 2012 (the only version I have tested) and by Extended Events sessions (including system_health; SQL Server 2008 and later).
2014-06-10 Small revision of source code (SUBSTRING: ISNULL, NULLIF)