Structured Query Language/MERGE

Hint: Be carefull and deactivate AUTOCOMMIT.

In many cases, applications want to store rows in the database without knowing whether these rows previously exist in the database or not. If the rows exist, they must use the UPDATE command, if not, the INSERT command. To do so, the following construct is often used:

-- pseudocode
IF (SELECT COUNT(*) = 0 ...) THEN
  INSERT ...
ELSE
  UPDATE ...
;

This situation is unpleasant in many ways:

There are two roundtrips between application and DBMS, either SELECT + INSERT or SELECT + UPDATE.
The application must transfer one row after the other. A 'bulk storing' is not possible because the evaluation of the criterion which decides between INSERT and UPDATE may lead to different results from row to row.
The syntax is spread across three SQL statements. This is error-prone.

To overcome the disadvantages, the SQL standard defines a MERGE command, which contains the complete code shown above in one single statement. The MERGE performs an INSERT or an UPDATE depending on the existence of individual rows at the target table.

-- Define target, source, match criterion, INSERT and UPDATE within one single command
MERGE INTO  <target_table>      <target_table_alias>  -- denote the target table
      USING <syntaxhighlight_table>      <syntaxhighlight_table_alias>  -- denote the source table
      ON    (<match_criterion>)                       -- define the 'match criterion' which compares the source and 
                                                      -- target rows with the same syntax as in any WHERE clause  
  WHEN MATCHED THEN
UPDATE SET column1 = value1 [, column2 = value2 ...]  -- a variant of the regular UPDATE command
  WHEN NOT MATCHED THEN
INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...]) -- a variant of the regular INSERT command
;

Description

The target table is named after the MERGE INTO keyword, the source table after the USING keyword.

The comparison between target rows and source rows, which is necessary to decide between INSERT and UPDATE, is specified after the ON keyword with a syntax, which is identical to the syntax of a WHERE clause. If this comparison matches, the UPDATE will be performed, else the INSERT. In simple cases, the comparison compares Primary Key or Foreign Key columns. But it is also possible to use very sophisticated conditions on any column.

In the 'MATCHED' case, a variant of the UPDATE follows. It differs from the regular UPDATE command in that it has no table name (the table name is already denoted after the MERGE INTO) and no WHERE clause (it uses the match criterion after the ON keyword).

In the 'NOT MATCHED' case, a variant of the INSERT follows. For the same reason as before, the target table is not named within the INSERT.

Example

Create a table 'hobby_shadow' to store some of the 'hobby' rows. The subsequent MERGE command shall perform an INSERT or an UPDATE depending on the existence of correlating rows.

-- store every second row in a new table 'hobby_shadow'
CREATE TABLE hobby_shadow AS SELECT * FROM hobby WHERE MOD(id, 2) = 0;
SELECT * FROM hobby_shadow;

-- INSERT / UPDATE depending on the column 'id'.
MERGE INTO  hobby_shadow                   t   -- the target
      USING (SELECT id, hobbyname, remark
             FROM   hobby)                 s   -- the source
      ON    (t.id = s.id)                      -- the 'match criterion'
  WHEN MATCHED THEN
UPDATE SET remark = concat(s.remark, ' Merge / Update')
  WHEN NOT MATCHED THEN
INSERT (id, hobbyname, remark) VALUES (s.id, s.hobbyname, concat(s.remark, ' Merge / Insert'))
;
COMMIT;

-- Check the result
SELECT * FROM hobby_shadow;

The MERGE command handles all rows, but there is only 1 round-trip between the application and the DBMS. Some of the rows are handled by the INSERT part of MERGE, others by its UPDATE part. This distinction may be observed by the last part of the column 'remark'.

Use Case

Typical use cases for the MERGE command are ETL processes. Often those processes have to aggregate some values for a grouping criterion (eg: a product line) over a time period. The first access per product line and period has to insert new rows with given values, subsequent accesses have to update them by increasing values.

Extentions

The SQL standard defines some more features within the MERGE command.

WHEN clause
The WHEN MATCHED and WHEN NOT MATCHED clauses may be extended by an optional query expression like AND (place_of_birth = 'Dallas'). As a consequence, it's possible to use a series of WHEN MATCHED / WHEN NOT MATCHED clauses.

... 
  WHEN MATCHED AND (t.hobby_name IN ('Fishing', 'Underwater Diving')) THEN
UPDATE SET remark = concat('Water sports: ', t.remark)
  WHEN MATCHED AND (t.hobby_name IN ('Astronomy', 'Microscopy', 'Literature')) THEN
UPDATE SET remark = concat('Semi-professional leisure activity: ', t.remark)
  WHEN MATCHED THEN
UPDATE SET remark = concat('Leisure activity: ', t.remark)
...
-- The same is possible with WHEN NOT MATCHED in combination with INSERT

DELETE
Within a WHEN MATCHED clause, it is possible to use a DELETE command instead of an UPDATE to remove the matched row. This feature may be combined with the previously presented extension by an optional query expression. In the SQL standard, the DELETE command is not applicable to the WHEN NOT MATCHED clause.

-- Update 'Fishing' and 'Underwater Diving'. Delete all others which have a match between source and target.
...
  WHEN MATCHED AND (t.hobby_name IN ('Fishing', 'Underwater Diving')) THEN
UPDATE SET remark = concat('Water sports: ', t.remark)
  WHEN MATCHED THEN
DELETE
...

Caveat

The MERGE command is clearly defined by standard SQL. The command itself, as well as the extensions described before, are implemented by a lot of DBMS. Deviating from the standard most implementations unfortunately use different and/or additional keywords and - sometimes - different concepts. Even the introductive keywords MERGE INTO may differ from the standard.

Exercises

A) Create a new table 'contact_merge' with the same structure as 'contact'.
B) Copy row number 3 from 'contact' to 'contact_merge'.
C) Use the MERGE command to insert/update all E-Mail-adresses from 'contact' to 'contact_merge' and add the e-mail-protocol name to the contact values (prepend column contact_value by the string 'mailto:').

Click to see solution

-- Create table and copy one row
CREATE TABLE contact_merge AS SELECT * FROM contact WHERE id = 3;
SELECT * FROM contact_merge;

-- INSERT / UPDATE depending on the column 'id'.
MERGE INTO  contact_merge                             t   -- the target
      USING (SELECT id, person_id, contact_type, contact_value
             FROM   contact
             WHERE  contact_type = 'email')           s   -- the source
      ON    (t.id = s.id)                                 -- the 'match criterion'
  WHEN MATCHED THEN
UPDATE SET contact_value = concat('mailto:', t.contact_value)
  WHEN NOT MATCHED THEN
INSERT (id, person_id, contact_type, contact_value) VALUES (s.id, s.person_id, s.contact_type, concat('mailto:', s.contact_value))
;
COMMIT;
 
-- Check the result
SELECT * FROM contact_merge;

DELETE 2

Structured Query Language
MERGE

TRUNCATE