Structured Query Language/MERGE
Hint: Be carefull and deactivate AUTOCOMMIT.
In many cases, applications want to store rows in the database without knowing whether these rows previously exist in the database or not. If the rows exist, they must use the UPDATE command, if not, the INSERT command. To do so, the following construct is often used:
-- pseudocode
IF (SELECT COUNT(*) = 0 ...) THEN
INSERT ...
ELSE
UPDATE ...
;
This situation is unpleasant in many ways:
- There are two roundtrips between application and DBMS, either SELECT + INSERT or SELECT + UPDATE.
- The application must transfer one row after the other. A 'bulk storing' is not possible because the evaluation of the criterion which decides between INSERT and UPDATE may lead to different results from row to row.
- The syntax is spread across three SQL statements. This is error-prone.
To overcome the disadvantages, the SQL standard defines a MERGE command, which contains the complete code shown above in one single statement. The MERGE performs an INSERT or an UPDATE depending on the existence of individual rows at the target table.
-- Define target, source, match criterion, INSERT and UPDATE within one single command
MERGE INTO <target_table> <target_table_alias> -- denote the target table
USING <syntaxhighlight_table> <syntaxhighlight_table_alias> -- denote the source table
ON (<match_criterion>) -- define the 'match criterion' which compares the source and
-- target rows with the same syntax as in any WHERE clause
WHEN MATCHED THEN
UPDATE SET column1 = value1 [, column2 = value2 ...] -- a variant of the regular UPDATE command
WHEN NOT MATCHED THEN
INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...]) -- a variant of the regular INSERT command
;
Description
[edit | edit source]The target table is named after the MERGE INTO keyword, the source table after the USING keyword.
The comparison between target rows and source rows, which is necessary to decide between INSERT and UPDATE, is specified after the ON keyword with a syntax, which is identical to the syntax of a WHERE clause. If this comparison matches, the UPDATE will be performed, else the INSERT. In simple cases, the comparison compares Primary Key or Foreign Key columns. But it is also possible to use very sophisticated conditions on any column.
In the 'MATCHED' case, a variant of the UPDATE follows. It differs from the regular UPDATE command in that it has no table name (the table name is already denoted after the MERGE INTO) and no WHERE clause (it uses the match criterion after the ON keyword).
In the 'NOT MATCHED' case, a variant of the INSERT follows. For the same reason as before, the target table is not named within the INSERT.
Example
[edit | edit source]Create a table 'hobby_shadow' to store some of the 'hobby' rows. The subsequent MERGE command shall perform an INSERT or an UPDATE depending on the existence of correlating rows.
-- store every second row in a new table 'hobby_shadow'
CREATE TABLE hobby_shadow AS SELECT * FROM hobby WHERE MOD(id, 2) = 0;
SELECT * FROM hobby_shadow;
-- INSERT / UPDATE depending on the column 'id'.
MERGE INTO hobby_shadow t -- the target
USING (SELECT id, hobbyname, remark
FROM hobby) s -- the source
ON (t.id = s.id) -- the 'match criterion'
WHEN MATCHED THEN
UPDATE SET remark = concat(s.remark, ' Merge / Update')
WHEN NOT MATCHED THEN
INSERT (id, hobbyname, remark) VALUES (s.id, s.hobbyname, concat(s.remark, ' Merge / Insert'))
;
COMMIT;
-- Check the result
SELECT * FROM hobby_shadow;
The MERGE command handles all rows, but there is only 1 round-trip between the application and the DBMS. Some of the rows are handled by the INSERT part of MERGE, others by its UPDATE part. This distinction may be observed by the last part of the column 'remark'.
Use Case
[edit | edit source]Typical use cases for the MERGE command are ETL processes. Often those processes have to aggregate some values for a grouping criterion (eg: a product line) over a time period. The first access per product line and period has to insert new rows with given values, subsequent accesses have to update them by increasing values.
Extentions
[edit | edit source]The SQL standard defines some more features within the MERGE command.
WHEN clause
The WHEN MATCHED and WHEN NOT MATCHED clauses may be extended by an optional query expression like AND (place_of_birth = 'Dallas')
. As a consequence, it's possible to use a series of WHEN MATCHED / WHEN NOT MATCHED clauses.
...
WHEN MATCHED AND (t.hobby_name IN ('Fishing', 'Underwater Diving')) THEN
UPDATE SET remark = concat('Water sports: ', t.remark)
WHEN MATCHED AND (t.hobby_name IN ('Astronomy', 'Microscopy', 'Literature')) THEN
UPDATE SET remark = concat('Semi-professional leisure activity: ', t.remark)
WHEN MATCHED THEN
UPDATE SET remark = concat('Leisure activity: ', t.remark)
...
-- The same is possible with WHEN NOT MATCHED in combination with INSERT
DELETE
Within a WHEN MATCHED clause, it is possible to use a DELETE command instead of an UPDATE to remove the matched row. This feature may be combined with the previously presented extension by an optional query expression. In the SQL standard, the DELETE command is not applicable to the WHEN NOT MATCHED clause.
-- Update 'Fishing' and 'Underwater Diving'. Delete all others which have a match between source and target.
...
WHEN MATCHED AND (t.hobby_name IN ('Fishing', 'Underwater Diving')) THEN
UPDATE SET remark = concat('Water sports: ', t.remark)
WHEN MATCHED THEN
DELETE
...
Caveat
[edit | edit source]The MERGE command is clearly defined by standard SQL. The command itself, as well as the extensions described before, are implemented by a lot of DBMS. Deviating from the standard most implementations unfortunately use different and/or additional keywords and - sometimes - different concepts. Even the introductive keywords MERGE INTO may differ from the standard.
Exercises
[edit | edit source]A) Create a new table 'contact_merge' with the same structure as 'contact'.
B) Copy row number 3 from 'contact' to 'contact_merge'.
C) Use the MERGE command to insert/update all E-Mail-adresses from 'contact' to 'contact_merge' and add the e-mail-protocol name to the contact values (prepend column contact_value by the string 'mailto:').
-- Create table and copy one row
CREATE TABLE contact_merge AS SELECT * FROM contact WHERE id = 3;
SELECT * FROM contact_merge;
-- INSERT / UPDATE depending on the column 'id'.
MERGE INTO contact_merge t -- the target
USING (SELECT id, person_id, contact_type, contact_value
FROM contact
WHERE contact_type = 'email') s -- the source
ON (t.id = s.id) -- the 'match criterion'
WHEN MATCHED THEN
UPDATE SET contact_value = concat('mailto:', t.contact_value)
WHEN NOT MATCHED THEN
INSERT (id, person_id, contact_type, contact_value) VALUES (s.id, s.person_id, s.contact_type, concat('mailto:', s.contact_value))
;
COMMIT;
-- Check the result
SELECT * FROM contact_merge;