SPARQL/SELECT
The SELECT
clause consists of 2 or 3 parts.
SELECT ... query result variables ...
WHERE {
... query pattern ...
}
... optional query modifiers ...
GROUP BY ...
HAVING ...
ORDER BY ...
LIMIT ...
The first part is the query result variables. These will be shown when the query is executed. The second part is the WHERE
clause with the query pattern. This defines the data selection and generates the variables, The last part are the optional modifiers.
Example
SELECT ?child ?childLabel
WHERE
{
# ?child father Bach
?child wdt:P22 wd:Q1339.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Query result variables
[edit | edit source]SELECT ?child ?childLabel ...
In the above example ?child
and ?childLabel
are the variables. The variables are separated by spaces. A variable can be displayed with another name (Alias) by using AS
, like (?child AS ?Child_of_Bach)
. Notice that the alias again should be a variable, and the combination should start and end with brackets (?a AS ?b)
.
PS. For aliases of Labels the label should be defined explicitly in the SERVICE
. Alternatively the variable could be named as requested there as well.
SELECT (?child AS ?Child_of_Bach) (?childLabel AS ?Name)
WHERE
{
# ?child father Bach
?child wdt:P22 wd:Q1339.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en".
?child rdfs:label ?childLabel.
}
}
Query pattern
[edit | edit source]The query pattern specifies the data selection and generates the variables,
# ?child father Bach ?child wdt:P22 wd:Q1339.
In this case the triple ?child wdt:p22 wd:Q1339
specifies that the variable ?child
must have the parent/father Bach.
Any of the triple parts Subject, Predicate and Object may be variables. This makes this selection very versatile.
Additional triples can be added, for instance to show gender, birth date and date of death. Each sentence should end with a period. The new variable should be added at the top (query result variables) to display them.
SELECT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
?child wdt:P22 wd:Q1339.# ?child has father Bach
?child wdt:P21 ?gender.
?child wdt:P569 ?birth_date.
?child wdt:P570 ?date_of_death.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Optional query modifiers
[edit | edit source]See the chapter modifiers for a full description.
The above query can be sorted by the birth date by adding ORDER BY ?birth_date
after the last curly bracket of the WHERE { }
clause
SELECT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
?child wdt:P22 wd:Q1339.# ?child has father Bach
?child wdt:P21 ?gender.
?child wdt:P569 ?birth_date.
?child wdt:P570 ?date_of_death.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?birth_date
Removing duplicates
[edit | edit source]You might have noticed that the above query resulted in 21 records, while Bach had only 20 children. This is caused by 2 entries of Johann Christoph Friedrich Bach, because there are 2 different birth dates, 21 and 23 of June 1732.
Generally it is advised to use DISTINCT
in the result variables like
SELECT DISTINCT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
?child wdt:P22 wd:Q1339.# ?child has father Bach
?child wdt:P21 ?gender.
?child wdt:P569 ?birth_date.
?child wdt:P570 ?date_of_death.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?child
DISTINCT
removes duplicate entries, but in this case that does not help as the entries are different. To be helpful ALL parameters should have been equal.
What helps is to group by child and combine the values of ?birth_date
. Also ?date_of_death
and ?genderLabel
are grouped to be consistent
SELECT ?child ?childLabel
(GROUP_CONCAT(DISTINCT ?genderLabel; SEPARATOR=", ") AS ?genderLabels)
(GROUP_CONCAT(DISTINCT ?birth_date; SEPARATOR=", ") AS ?birth_dates)
(GROUP_CONCAT(DISTINCT ?date_of_death; SEPARATOR=", ") AS ?dates_of_death)
WHERE
{
?child wdt:P22 wd:Q1339.# ?child has father Bach
?child wdt:P21 ?gender.
?child wdt:P569 ?birth_date.
?child wdt:P570 ?date_of_death.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en".
?child rdfs:label ?childLabel.
?gender rdfs:label ?genderLabel.
}
}
GROUP BY ?child ?childLabel
ORDER BY ?birth_dates
The optional query modifier used is GROUP BY
, and the variables are combined by using (GROUP_CONCAT(DISTINCT ?var1; SEPARATOR=", ") AS ?var2)
. The ORDER BY
has been modified by using the combined variable ?birth_dates
, instead of ?birth_date
. All labels should be defined explicitly in the SERVICE
.
See the chapter modifiers for a full description.
Another way to remove duplicates it so show only one of the possible values, by using one of the Aggregate functions MIN
, MAX
, SUM
or AVG
.
SELECT ?child ?childLabel
(MIN(?genderLabel) AS ?genderlabel1)
(MIN(?birth_date) AS ?birth_date1)
(MAX(?date_of_death) AS ?date_of_death1)
WHERE
{
?child wdt:P22 wd:Q1339.# ?child has father Bach
?child wdt:P21 ?gender.
?child wdt:P569 ?birth_date.
?child wdt:P570 ?date_of_death.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en".
?child rdfs:label ?childLabel.
?gender rdfs:label ?genderLabel.
}
}
GROUP BY ?child ?childLabel
ORDER BY ?birth_date1
The optional query modifier used is also GROUP BY
, and the variables are combined by using (MIN(?var) AS ?var1)
. The ORDER BY
has been modified by using the first birth date ?birth_date1
, instead of ?birth_date
. All labels should also be defined explicitly in the SERVICE
.
Adding missing entries
[edit | edit source]Let's try one of the above queries for Obama (Q76):
SELECT DISTINCT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
?child wdt:P22 wd:Q76.# ?child has father Obama
?child wdt:P21 ?gender.
?child wdt:P569 ?birth_date.
?child wdt:P570 ?date_of_death.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?child
This results in NO results although Obama has 2 children. The reason is that to match this query, a potential result (a child) must match all the triples we listed: it must have a gender, and a birth data, and a date of death. If one or more of those properties don't exist, it won’t match. And that’s not what we want in this case: we primarily want a list of all the children – if additional data is available, we’d like to include it, but we don’t want that to limit our list of results.
The solution is to tell WDQS that those triples are OPTIONAL
:
SELECT DISTINCT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
?child wdt:P22 wd:Q76.# ?child has father Obama
OPTIONAL{ ?child wdt:P21 ?gender. }
OPTIONAL{ ?child wdt:P569 ?birth_date. }
OPTIONAL{ ?child wdt:P570 ?date_of_death. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?child
Now both children are shown, and it is clear that the date of death is not filled in yet, as these children are still alive.
See the chapter OPTIONAL for a full description.
Fool proof: Add missing and remove duplicate entries
[edit | edit source]A fool proof list of children and their details of any given father would be
SELECT ?child ?childLabel
(GROUP_CONCAT(DISTINCT ?genderLabel; SEPARATOR=", ") AS ?genderLabels)
(GROUP_CONCAT(DISTINCT ?birth_date; SEPARATOR=", ") AS ?birth_dates)
(GROUP_CONCAT(DISTINCT ?date_of_death; SEPARATOR=", ") AS ?dates_of_death)
WHERE
{
?child wdt:P22 wd:Q76.# ?child has father Obama
OPTIONAL{ ?child wdt:P21 ?gender. }
OPTIONAL{ ?child wdt:P569 ?birth_date. }
OPTIONAL{ ?child wdt:P570 ?date_of_death. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en".
?child rdfs:label ?childLabel.
?gender rdfs:label ?genderLabel.
}
}
GROUP BY ?child ?childLabel
ORDER BY ?birth_dates
This combines the OPTIONAL
clause in case a property is not present and the GROUP_CONCAT
clause in case of multiple properties.