Yet another SQL Server enthusiast

2014-01-22

UPSERT – safe solutions for SQL Server [DRAFT]

Filed under: concurrency,EXISTS,MERGE,sql server,UPSERT — sa @ 9:39 PM

This article tackles how to implement safe UPSERTs from concurrency point of view. A previous blog has presented a common solution for UPSERT and it has demonstrated that this solution isn’t safe. Current blog specifies common solutions for UPSERT operations with a single row.

To discuss these solutions I will use the same table dbo.Customer:

Code Snippet
IF OBJECT_ID(N'dbo.Customer') IS NOT NULL
    DROP TABLE dbo.Customer;
CREATE TABLE dbo.Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    Email NVARCHAR(100) NOT NULL,
    Note NVARCHAR(50) NULL,    
    OrdersCount INT NOT NULL DEFAULT (1)
);
GO

If we want to insert a new customer then the identifier used for customers is the email address. If there is no email address stored within dbo.Customer then we need to insert a new customer with 1 order. Otherwise (the customer’s email already exists) we need to increment by 1 the OrdersCount column.

From this point of view we need also an unique index on Email column:

Code Snippet
CREATE UNIQUE INDEX IUN_Customer_Email
ON dbo.Customer(Email);

This index has two important roles:

  1. It prevents duplicated emails and
  2. It helps those queries which tries to find customers using the email address as identifier (WHERE Email = constant / @variable).

Solutions for a single row

Code Snippet
— Solution #1: IF EXISTS
— Parameters
DECLARE @Email NVARCHAR(100)
SET @Email = 'b@b.c';
— End of Parameters
BEGIN TRANSACTION;
    IF EXISTS(SELECT * FROM dbo.Customer WITH(XLOCK, HOLDLOCK) WHERE Email = @Email)
    BEGIN
        UPDATE  dbo.Customer
        SET        OrdersCount = OrdersCount + 1
        WHERE   Email = @Email;
    END
    ELSE
    BEGIN
        INSERT    dbo.Customer(Email)
        VALUES    (@Email) — OrdersCount = Default 1
    END    
COMMIT; — Use ROLLBACK for testing
GO

— Solution #2: MERGE
— Parameters
DECLARE @Email NVARCHAR(100)
SET @Email = 'b@b.b';
— End of Parameters
BEGIN TRANSACTION;
    MERGE    dbo.Customer WITH(HOLDLOCK) trg
    USING    (VALUES(@Email)) src(Email) ON trg.Email = src.Email
    WHEN MATCHED
        THEN    
        UPDATE
        SET        OrdersCount = trg.OrdersCount + 1
    WHEN NOT MATCHED BY TARGET
        THEN    
        INSERT (Email)
        VALUES    (@Email);
COMMIT;
GO

— Solution #3: UPDATE + INSERT
— Parameters
DECLARE @Email NVARCHAR(100)
— End of Parameters
SET @Email = 'b@b.b';
BEGIN TRANSACTION;
    UPDATE    trg
    SET        OrdersCount = OrdersCount + 1
    FROM    dbo.Customer trg WITH(HOLDLOCK)
    WHERE    trg.Email = @Email;

    IF @@ROWCOUNT = 0
        INSERT    dbo.Customer(Email)
        VALUES    (@Email);        
COMMIT;
GO

Solutions for multiple rows

Code Snippet
— Solution #1: EXISTS
— Parameters
DECLARE @CustomerEmail TABLE( Email NVARCHAR(100) NOT NULL PRIMARY KEY );
INSERT    @CustomerEmail (Email) VALUES (N'a@a.a');
INSERT    @CustomerEmail (Email) VALUES (N'c@c.c');
INSERT    @CustomerEmail (Email) VALUES (N'd@d.d');
— End of Parameters
DECLARE @CustomerEmailForUPDATE TABLE(
    Email NVARCHAR(100) NOT NULL PRIMARY KEY
);

BEGIN TRANSACTION;
    INSERT    @CustomerEmailForUPDATE (Email)
    SELECT    ce.Email
    FROM    @CustomerEmail ce
    WHERE    EXISTS(SELECT * FROM dbo.Customer c WITH(XLOCK, HOLDLOCK) WHERE c.Email = ce.Email);

    UPDATE  dbo.Customer
    SET        OrdersCount = OrdersCount + 1
    FROM    dbo.Customer c
    INNER JOIN @CustomerEmailForUPDATE cefu ON c.Email = cefu.Email
    
    INSERT  dbo.Customer(Email)
    SELECT    ce.Email — OrdersCount = Default 1
    FROM    @CustomerEmail ce
    WHERE    NOT EXISTS(SELECT * FROM @CustomerEmailForUPDATE cefu WHERE cefu.Email = ce.Email)
COMMIT; — Use ROLLBACK for testing
GO

— Solution #2: MERGE
— Parameters
DECLARE @CustomerEmail TABLE( Email NVARCHAR(100) NOT NULL PRIMARY KEY );
INSERT    @CustomerEmail (Email) VALUES (N'a@a.a');
INSERT    @CustomerEmail (Email) VALUES (N'c@c.c');
INSERT    @CustomerEmail (Email) VALUES (N'd@d.d');
— End of Parameters
BEGIN TRANSACTION;
    MERGE    dbo.Customer WITH(HOLDLOCK) trg
    USING    @CustomerEmail src ON trg.Email = src.Email
    WHEN MATCHED
        THEN    
        UPDATE
        SET        OrdersCount = trg.OrdersCount + 1
    WHEN NOT MATCHED BY TARGET
        THEN    
        INSERT (Email)
        VALUES    (src.Email);
COMMIT; — Use ROLLBACK for testing
GO

— Solution #3: UPDATE + INSERT
— Parameters
DECLARE @CustomerEmail TABLE( Email NVARCHAR(100) NOT NULL PRIMARY KEY );
INSERT    @CustomerEmail (Email) VALUES (N'a@a.a');
INSERT    @CustomerEmail (Email) VALUES (N'c@c.c');
INSERT    @CustomerEmail (Email) VALUES (N'd@d.d');
— End of Parameters

DECLARE @CustomerEmailUPDATED TABLE(
    Email NVARCHAR(100) NOT NULL PRIMARY KEY
);
BEGIN TRANSACTION;
    UPDATE    trg
    SET        OrdersCount = OrdersCount + 1
    OUTPUT    inserted.Email INTO @CustomerEmailUPDATED
    FROM    dbo.Customer trg WITH(HOLDLOCK)
    INNER JOIN @CustomerEmail src ON trg.Email = src.Email;

    INSERT    dbo.Customer(Email)
    SELECT    ce.Email
    FROM    @CustomerEmail ce
    WHERE    NOT EXISTS(SELECT * FROM @CustomerEmailUPDATED upd WHERE upd.Email = ce.Email);
COMMIT;
GO

The next blog

  • will present solutions for scenarios where we need to insert or update multiple rows (2014-02-18) and
  • will discuss pros and cons of these solutions.

Revision History:

2014-02-08: First update. I have added a new section for multiple rows. Solutions for a single row: some minor changes.

Advertisements

2014-01-05

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ?

Filed under: concurrency,EXISTS,index,MERGE,safe,sql server,UPSERT — sa @ 10:47 PM

Is it safe the following approach for UPSERT ?

Code Snippet
IF EXISTS (SELECT FROM MyTable WHERE ID = @param1)
    UPDATE    MyTable
    SET        Col1 = NewValue
    WHERE    ID = @param1
ELSE
    INSERT    MyTable(ID, Col1)
    VALUES    (@param1, @param2)

The short answer is no!

Example:

  1. The transaction isolation level is READ COMMITTED which is the default isolation level.
  2. For this demo I’ve used a customer table to store the customer’s email and the number of sales orders (for every customer).
  3. I’ve inserted two customers (1 – a@a.a; 2 – b@b.b).
  4. I’ve created a stored procedure for UPSERT (dbo.InsertOrder). This stored procedure receive the customer’s email, it checks if exists this email and if not then it inserts the new customer (sales orders 1). Otherwise it increments the number of sales orders by 1.

T-SQL script:

Code Snippet
IF OBJECT_ID(N'dbo.Customer') IS NOT NULL
    DROP TABLE dbo.Customer;
IF OBJECT_ID(N'dbo.InsertOrder') IS NOT NULL
    DROP PROCEDURE dbo.InsertOrder;
GO

CREATE TABLE dbo.Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    Email NVARCHAR(100) NOT NULL,
    Note NVARCHAR(50) NULL,    
    OrdersCount INT NOT NULL DEFAULT (1)
);
GO

INSERT    dbo.Customer (Email)
VALUES    (N'a@a.a'), — CustomerID 1
        (N'b@b.b'); — CustomerID 2
GO

CREATE PROCEDURE dbo.InsertOrder (
    @Email NVARCHAR(100)
)
AS
BEGIN
    IF EXISTS(SELECT * FROM dbo.Customer WHERE Email = @Email)
    BEGIN
        UPDATE    dbo.Customer
        SET        OrdersCount = OrdersCount + 1
        WHERE    Email = @Email;
    END
    ELSE
    BEGIN
        WAITFOR DELAY '00:00:03';
        INSERT    dbo.Customer (Email)
        VALUES    (@Email) — OrdersCount = Default 1
    END
END;
GO

Note: WAITFOR DELAY is used to delay the INSERTion of the new customers with just 3 seconds.

Now we can start this test executing following statements in separate windows:

Is.NOT.Safe.If.Exists.Update.Insert.UPSERT

Query window #1 (Step 1):

Code Snippet
BEGIN TRANSACTION
    UPDATE    dbo.Customer
    SET        Note = N'Note#2'
    WHERE    Email = N'b@b.b';
— ROLLBACK — or COMMIT

At this moment the CustomerID = 2 is locked X (exclusive) and no other transaction can read (excepting NOLOCK and READ UNCOMMITTED), update or delete this record. Because this transaction remains open (no COMMIT or ROLLBACK) this record will be locked till the end of transaction.

Query window #2 & #3 (Step 2 & Step 3):

Code Snippet
EXEC dbo.InsertOrder N'e@e.e';
SELECT * FROM dbo.Customer;

At this moment the IF EXISTS(SELECT * FROM dbo.Customer WHERE Email = @Email) statement will try to read the record with CustomerID = 1 and then the record with CustomerID = 2 requesting a S lock (shared). Because X and S locks aren’t compatible (see Lock Compatibility) both (Step 2 & 3) SELECT * FROM dbo.Customer WHERE Email = @Email statements will be blocked. Note: you can see the “Executing ….” message within tab title.

Query window #1 (Step 4):

I execute the ROLLBACK or COMMIT statement which ends the initial transaction. The X lock for CustomerID = 2 record is released and the SELECT statements (SELECT * FROM dbo.Customer WHERE Email = @Email; Step 2 & 3)  continue to be executed. Every EXISTS(SELECT …. WHERE Email = ‘e@e.e’) statement will return FALSE and both stored procedures will INSERT the customer `e@e.e`.

This is the reason why I get duplicated emails at the end (Step 5) of my example:

Results.Step5Solutions ?

Simplest solution is to create an unique index/constraint on Email column.

Code Snippet
CREATE UNIQUE INDEX IUN_Customer_Email
ON dbo.Customer(Email);

This UNIQUE index/constraint guarantees that we can’t have duplicated emails. But this doesn’t means this UPSERT approach is 100% safe because dbo.InsertOrder stored procedure still tries to INSERT duplicated emails (at Step 5):

Results.Step5.With.UNIQUE.index

Next blog On Jan 27 I will discuss what solutions do we have for safe UPSERTs.

See also:

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ? #2

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ? #3

Revision History:

2014-01-12: I changed the last paragraph.

2014-01-27: I changed the last paragraph.

2014-04-06: I added the [See also] section

Create a free website or blog at WordPress.com.