PostgreSQL/Select without elimination
One common problem faced with SQL is using boolean conditionals to narrow down a set without any desire of eliminating all possible results. In this example you seek a best match, if one can be given or NULL. This can be easily and cleanly achieved relationally with a slight abuse of UNION ALL. There are many ways to achieve this procedurally such as fuzzy matching. [1]
Description
[edit | edit source]The basic idea is to SELECT *
eliminating nothing, UNION ALL
against a SELECT
on that same set (different instance) such that there is a possibility of tuples being eliminated in the UNION'ed set. Then you are going to count()
the occurrence of all tuples, each tuple will have an occurrence of at least one (the LHS of the first UNION ALL
) this will result in no eliminations. This resulting set you will select the max()
of the count()
.
Synopsis
[edit | edit source]Source table
[edit | edit source]db=# select * from foo;
bar ----- 5 4 3 6 (4 rows)
SQL Non-elimination conditional
[edit | edit source]SELECT count(foo.bar), bar
FROM (
SELECT *
FROM foo
UNION ALL
SELECT *
FROM foo
WHERE bar = 4 -- Conditional that will not eliminate
)
AS foo
GROUP BY bar
;
Result
[edit | edit source]The key here, is if the conditional bar = 50
was used, you would still be left with the original dataset.
count | bar -------+----- 1 | 5 2 | 4 1 | 3 1 | 6 (3 rows)
Example
[edit | edit source]A third party data provider has given you a set of data that you wish to select the best match from. You're equipped with an array of things that might help you identify a best match, but you want to always return one tuple, that is to say you don't ever want to eliminate all tuples returned. Your data that is supposed to only trim down the superfluous matches.
Let's use for instance a vin 1GCEC14C57Z590788:
- They call a trim "Work Truck" you call the trim "1WT".
You don't want to eliminate all vehicles because of this. So here using a conditional trim LIKE '%1WT%'
, will not work, because 1WT is not contained in "Work Truck".
Now let's assume that other information provided would help you better identify what style your vehicle matches up against. As is the case of the "MSRP", if is within a range of 1.00 USD of 18,123 you can successfully eliminate all but one style.
Observe one way we could do this:
Footnotes
[edit | edit source]- ^ Fuzzy matching often uses a simple Levenshtein edit distance formula. Using plperl (Perl), you can easily accomplish this using
String::Approx
module'sadist()
.