SPARQL/Expressions and Functions
Expressions
[edit | edit source]BIND
[edit | edit source]The BIND( expression AS ?variable ).
clause can be used to assign the result of an expression to a variable (usually a new variable, but you can also overwrite existing ones).
# Persons died by capital punishment
SELECT ?person ?personLabel ?personDescription ?age
WHERE
{
?person wdt:P31 wd:Q5;
wdt:P569 ?born;
wdt:P570 ?died;
wdt:P1196 wd:Q8454.
BIND(?died - ?born AS ?ageInDays).
BIND(?ageInDays/365.2425 AS ?ageInYears).
BIND(FLOOR(?ageInYears) AS ?age).
# or, as one expression:
#BIND(FLOOR((?died - ?born)/365.2425) AS ?age).
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
BIND
can also be used to simply bind constant values to variables in order to increase readability. For example, a query that finds all female priests:
SELECT ?woman ?womanLabel
WHERE
{
?woman ?instanceOf ?human;
?sexOrGender ?female;
?occupation ?priest.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
# linking to wikidata Q and P codes
BIND(wdt:P31 AS ?instanceOf).
BIND(wd:Q5 AS ?human).
BIND(wdt:P21 AS ?sexOrGender).
BIND(wd:Q6581072 AS ?female).
BIND(wdt:P106 AS ?occupation).
BIND(wd:Q42603 AS ?priest).
}
The above gives the same result as
SELECT ?woman ?womanLabel
WHERE
{
?woman wdt:P31 wd:Q5; # instance of human
wdt:P21 wd:Q6581072; # sex or gender female
wdt:P106 wd:Q42603. # occupation priest
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
IF
[edit | edit source]The IF( condition, thenExpression, elseExpression )
expressions are used to assign different values based on a condition.
For instance to calculate the age, precisely to the day. In the expression BIND( .... - (IF( condition,1,0 ) AS ?age)
the condition determines to subtract 1 or 0 (nothing) from the ?age
based on the month and the day of the birth and the death.
# Query to find all musicians who have already died
# calculate their age (full years) at death
# count how many of them died at each age
#defaultView:LineChart
SELECT ?age (COUNT (DISTINCT ?a) AS ?count)
WHERE {
?a wdt:P31 wd:Q5. #instance of human
?a wdt:P106/wdt:P279 wd:Q639669. #occupation a subclass of musician
?a p:P569/psv:P569 ?birth_date_node.
?a p:P570/psv:P570 ?death_date_node.
?birth_date_node wikibase:timeValue ?birth_date.
?death_date_node wikibase:timeValue ?death_date.
BIND( YEAR(?death_date) - YEAR(?birth_date) -
IF(MONTH(?death_date)<MONTH(?birth_date) ||
(MONTH(?death_date)=MONTH(?birth_date) && DAY(?death_date)<DAY(?birth_date)),1,0) AS ?age )
# calculate the age, precisely to the day (times and timezones ignored)
FILTER(?age > 10 && ?age < 100). #ignore outlyers, several of which are probably errors
}
GROUP BY ?age
ORDER BY ?age
Here an example to divide population in groups. The ?layer
in the #defaultView:Map
shows it in different colors.
#Big cities, grouped into map layers by population
#defaultView:Map
SELECT DISTINCT ?city ?cityLabel (SAMPLE(?location) AS ?location) (MAX(?population) AS ?population) (SAMPLE(?layer) AS ?layer)
WHERE
{
?city wdt:P31/wdt:P279* wd:Q515;
wdt:P625 ?location;
wdt:P1082 ?population.
FILTER(?population >= 500000).
BIND(
IF(?population < 1000000, "<1M",
IF(?population < 2000000, "1M-2M",
IF(?population < 5000000, "2M-5M",
IF(?population < 10000000, "5M-10M",
IF(?population < 20000000, "10M-20M",
">20M")))))
AS ?layer).
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?city ?cityLabel
ORDER BY ?population
Operators
[edit | edit source]The familiar mathematical operators are available: +
, -
, *
, /
to add, subtract, multiply or divide numbers.
The operators <
, >
, =
, <=
, >=
are available to compare values. Comparison is also defined for other types; for example, "abc" < "abd"
is true (lexical comparison), as is "2016-01-01"^^xsd:dateTime > "2015-12-31"^^xsd:dateTime
and wd:Q4653 != wd:Q283111
. The inequality test ≠ is written !=
. The !
can also be used as a prefix to functions that results a boolean like !BOUND
and !REGEX
.
Boolean conditions can be combined with &&
(logical and: a && b
is true if both a
and b
are true) and ||
(logical or: a || b
is true if either (or both) of a
and b
is true).
Examples IF( ?a != ?b, ... , ... )
and IF( ?a = ?b && ?c = ?d, ... , ... )
.
VALUES
[edit | edit source]The VALUES ?var { val1 ... }
clause generates a variable with one or more values. The values can be of any type, like numeric, strings, or even items, like VALUES ?number { 1 2 3 }
, VALUES ?abc { "a" "b" "c" }
or VALUES ?city { wd:Q84 wd:Q90 }.
SELECT ?city ?cityLabel WHERE {
VALUES ?city { wd:Q84 wd:Q90 }. # London, Paris
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Values can have more dimensions too. The dimensions are grouped by ( ) like VALUES ( ?varA ?varB ... ) { ( valA1 valB1 ... ) ... }
.
SELECT ?country ?capital WHERE {
VALUES ( ?country ?capital ) {
( "United Kingdom" "London" )
( "France" "Paris" )
( "Etc" UNDEF ) # Use UNDEF for an undefined value
}
}
Functions
[edit | edit source]General functions
[edit | edit source]DATATYPE
[edit | edit source]Each value in SPARQL has a type, which tells you what kind of value it is and what you can do with it. The most important types are:
- item, like
wd:Q42
for Douglas Adams (Q42). - boolean, with the two possible values
true
andfalse
. Boolean values aren’t stored in statements, but many expressions return a boolean value, e. g.2 < 3
(true
) or"a" = "b"
(false
). - string, a piece of text. String literals are written in double quotes.
- monolingual text, a string with a language tag attached. In a literal, you can add the language tag after the string with an
@
sign, e. g."Douglas Adams"@en
. - numbers, either integers (
1
) or decimals (1.23
). - dates. Date literals can be written by adding
^^xsd:dateTime
(case sensitive –^^xsd:datetime
won’t work!) to an ISO 8601 date string:"2012-10-29"^^xsd:dateTime
.
# Date related properties of Bach
SELECT ?predicate ?pLabel ?object
WHERE
{
wd:Q1339 ?predicate ?object. # Bach
FILTER(DATATYPE(?object) = xsd:dateTime).
BIND( IRI(REPLACE( STR(?predicate),"prop/direct/","entity/" )) AS ?p).
# or ?p wikibase:directClaim ?predicate.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?object
Some datatypes are rdf:langString, xsd:string, xsd:integer, xsd:dateTime, xsd:decimal or "<http://www.opengis.net/ont/geosparql#wktLiteral>".
STR
[edit | edit source]The STR( value )
function converts values to a string. It also resolves prefixes, for instance wd:Q1339 will be converted into http://www.wikidata.org/entity/Q1339. The reversal of this is the IRI( string )
, which will convert a string to an IRI.
SELECT ?item ?itemLabel ?string ?stringLabel ?iri ?iriLabel
WHERE {
VALUES ?item { wd:Q1339 }.
BIND( STR(?item) AS ?string ).
BIND( IRI(?string) AS ?iri ).
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
IRI
[edit | edit source]the IRI( string )
, which will convert a string to an IRI, by applying the prefixes.
In the example below a Predicate like wdt:P569
get converted to a string http://www.wikidata.org/prop/direct/P569. To get a Label for the predicate it first needs to be replaced to get an entity (http://www.wikidata.org/entity/P569), which with IRI
gets converted to wd:P569
. The label of this can then be displayed as "date of birth (P569)".
# Date related properties of Bach
SELECT ?object ?predicate ?string ?entity ?p ?pLabel
WHERE
{
wd:Q1339 ?predicate ?object. # Bach
FILTER(DATATYPE(?object) = xsd:dateTime).
BIND( STR(?predicate) AS ?string ).
BIND( REPLACE( ?string,"prop/direct/","entity/" ) AS ?entity ).
BIND( IRI(?entity) AS ?p ).
# or all on one line:
# BIND( IRI(REPLACE( STR(?predicate),"prop/direct/","entity/" )) AS ?p).
# This can also be written as: ?p wikibase:directClaim ?predicate.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?object
LANG
[edit | edit source]The LANG( string )
function retrieves the language tag of Labels, Descriptions, Aliases and of Monolingual texts.
#Countries in European Union with native name and language
SELECT ?country ?countryLabel ?nativename ?language
WHERE
{
wd:Q458 wdt:P150 ?country. # European Union contains administrative territorial entity
OPTIONAL { ?country wdt:P1705 ?nativename.
BIND( LANG(?nativename) AS ?language). }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?language)
BOUND
[edit | edit source]The function BOUND( variable )
returns true if variable is bound to a value. It returns false otherwise. The BOUND
expression is used often with IF(condition,thenExpression,elseExpression)
expressions. For example, suppose you have a query that shows some humans, and instead of just showing their label, you’d like to display their pseudonym (P742) if they have one, and only use the label if a pseudonym doesn’t exist. For this, you select the pseudonym in an OPTIONAL
clause (it has to be optional – you don’t want to throw out results that don’t have a pseudonym), and then use BIND(IF(BOUND(…
to select either the pseudonym or the label.
SELECT ?writer ?label
WHERE
{
?writer wdt:P31 wd:Q5; # French writer
wdt:P27 wd:Q142;
wdt:P106 wd:Q36180;
wdt:P569 ?dob.
FILTER("1751-01-01"^^xsd:dateTime <= ?dob && ?dob < "1801-01-01"^^xsd:dateTime) # born in the second half of the 18th century
?writer rdfs:label ?writerLabel. # get the English label
FILTER(LANG(?writerLabel) = "en")
OPTIONAL { ?writer wdt:P742 ?pseudonym. } # get the pseudonym, if it exists
BIND(IF(BOUND(?pseudonym),?pseudonym,?writerLabel) AS ?label). # bind the pseudonym, or if it doesn’t exist the English label, as ?label
}
IN and NOT IN
[edit | edit source]The function value IN( list of values )
checks if a value is found in a list of values. For example 2 IN( 1, 2, 3 )
is true. A similar function value NOT IN( list of values )
checks if a value is not found in a list of values.
# Railway stations in London or Paris
SELECT ?cityLabel ?station ?stationLabel ?location
WHERE {
?station wdt:P31 wd:Q55488. # is a railway station
?station wdt:P131* ?city.
?station wdt:P625 ?location
FILTER( ?city IN(wd:Q84, wd:Q90) ). # in London or Paris
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr". }
}
isBLANK
[edit | edit source]The isBLANK( variable )
expression checks if a value is an "unknown value".
#Demonstrates "unknown value" handling
SELECT ?human ?humanLabel
WHERE
{
?human wdt:P21 ?gender
FILTER isBLANK(?gender) .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
Wikidata value can also have special value "no value". With normal triples this simply results in the absence of a result.
# persons who were stateless (country of citizenship: no value) for some time (start time and end time qualifiers)
SELECT ?person ?personLabel ?start ?end
WHERE {
?person wdt:P31 wd:Q5;
p:P27 [
a wdno:P27; # no value for country of citizenship
pq:P580 ?start;
pq:P582 ?end
].
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?end - ?start)
The prefix wdno:
selects the "no value" node.
The code a
is a special word.
Functions on strings
[edit | edit source]STRLEN
[edit | edit source]The function STRLEN( string )
returns the length of a string. For instance STRLEN(?name)
where ?name
is "ABC" will return 3.
SUBSTR
[edit | edit source]The function SUBSTR( string, beginposition, stringlength )
returns a substring of a string beginning at the position marked by beginposition, and will have a length as indicated. For instance SUBSTR( ?name,2,3 )
where ?name
= "ABCDEFGH" will return "BCD". A substring without length like SUBSTR( ?name,2 )
will return "BCDEFGH".
UCASE
[edit | edit source]The function UCASE( string )
returns the string in Upper case. For instance a string "Abc" will be returned as "ABC".
LCASE
[edit | edit source]The function LCASE( string )
returns the string in Lower case. For instance a string "Abc" will be returned as "abc".
STRSTARTS
[edit | edit source]The function STRSTARTS( string, comparestring )
checks if the string starts with the compare string. For instance STRSTARTS( "ABCDEFGH", "ABC" )
returns true, indicating that the string starts with "ABC".
STRENDS
[edit | edit source]The function STRENDS( string, comparestring )
checks if the string ends with the compare string. For instance STRENDS( "ABCDEFGH", "FGH" )
returns true, indicating that the string ends with "FGH".
CONTAINS
[edit | edit source]The function CONTAINS( string, comparestring )
checks if the string contains the compare string. For instance CONTAINS( "ABCDEFGH", "DEF" )
returns true, indicating that the string contains "DEF".
STRBEFORE
[edit | edit source]The function STRBEFORE( string, comparestring )
returns the part of the string before the compare string. For instance STRBEFORE( "ABCDEFGH", "DEF" )
returns "ABC". If the compare string is not found it returns "".
STRAFTER
[edit | edit source]The function STRAFTER( string, comparestring )
returns the part of the string after the compare string. For instance STRAFTER( "ABCDEFGH", "DEF" )
returns "GH". If the compare string is not found, it returns "".
ENCODE_FOR_URI
[edit | edit source]The function ENCODE_FOR_URI( string )
converts the special characters in the string, to be able to use it in an web URL. For instance ENCODE_FOR_URI( "ABC DËFGH" )
returns "ABC%20D%C3%8BFGH". The reverse of this function is wikibase:decodeUri( string )
.
CONCAT
[edit | edit source]The function CONCAT( string1, string2 ... )
returns the concatenation of 2 or more strings. For instance CONCAT( "ABCDEFGH", "XYZ" )
returns "ABCDEFGHXYZ".
LANGMATCHES
[edit | edit source]The function LANGMATCHES( languagetag, languagerange )
checks if the language tag matches the language range. For example this code FILTER LANGMATCHES( LANG(?label), "fr" ).
filters labels in the french language. It is similar to FILTER (LANG(?label) = "fr").
with the difference that the LANGMATCHES filter will also output language tags with regions, e.g. "fr-BE" while (FILTER (LANG(?label) = "fr").
will only output the strings with the exact tag "fr". A language range of "*" matches any non-empty language-tag string.
REGEX
[edit | edit source]The function REGEX( string, pattern, flag )
checks if the string matches the pattern. It is mostly used in a FILTER
clause like FILTER REGEX( string, pattern ).
.
The pattern may contain different special characters. Blazegraph instances such as the Wikidata Query Service interpret the pattern as a Java Pattern
,[1] which is a subset of ICU regular expressions. The table below lists the most common special characters:
|
|
|
The flag is optional. Flag "i" means the match is case-insensitive.
# Names of human settlements ending in "-ow" or "-itz" in Germany
#defaultView:Map
SELECT DISTINCT ?item ?itemLabel ?coord
WHERE
{
?item wdt:P31/wdt:P279* wd:Q486972; # instance/subclass of human settlement
wdt:P17 wd:Q183; # Germany
rdfs:label ?itemLabel;
wdt:P625 ?coord;
FILTER (LANG(?itemLabel) = "de").
FILTER REGEX (?itemLabel, "(ow|itz)$").
}
LIMIT 1000
Examples
- title might be a rhyme
FILTER(REGEX(?title, "^\\w*(\\w{3})(\\W+\\w*\\1)+$", "i") && !REGEX(?title, "^(\\w+)(\\W+\\1)+$", "i")).
- title is an alliteration
FILTER(REGEX(STR(?title), "^(\\p{L})\\w+(?:\\W+\\1\\w+){2,}$", "i")).
PS: A single \ is used as an escape symbol in strings, so \\ is used to indicate a single \.
REPLACE
[edit | edit source]The function REPLACE( string, pattern, replacement, flag )
returns the string after replacing all occurrences of pattern
in string
with replacement
. pattern
is interpreted the same way as in REGEX
. The replacement
can contain $n
or ${name}
, which are replaced by the corresponding numbered or named capture group in the pattern.[1] An optional flag
affects the regular expression pattern, just as with the flag
argument to REGEX()
. For example, REPLACE( "ABCDEFGH", "DEF", "_def_" )
returns "ABC_def_GH". REPLACE( "ABCDEFGH", "[AEIOU]", "" )
removes all the vowels from the original string.
Functions on numbers
[edit | edit source]ABS
[edit | edit source]The function ABS( number )
returns the absolute value of a number. For instance, ABS( -1 )
returns 1.
ROUND
[edit | edit source]The function ROUND( number )
returns the rounded value of a number. For instance, ROUND( 1.4 )
returns 1, while ROUND( 1.6 )
returns 2.
CEIL
[edit | edit source]The function CEIL( number )
returns the largest number (round up). For instance, both CEIL( 1.4 )
and CEIL( 1.6 )
return 2.
FLOOR
[edit | edit source]The function FLOOR( number )
returns the smallest number (round down). For instance, both FLOOR( 1.4 )
and FLOOR( 1.6 )
return 1.
RAND
[edit | edit source]The function RAND( )
returns a random value between 0 and 1. For instance, RAND( )
returns 0.7156405780739334.
COUNT, MIN, MAX, AVG and SUM
[edit | edit source]The functions COUNT
, MIN
, MAX
, AVG
, and SUM
can only be used as Aggregate functions.
# average age of painters by century
SELECT ?century (AVG(?age) AS ?average_age) (ROUND(AVG(?age)) AS ?rounded_average_age)
WHERE
{
?item wdt:P31 wd:Q5. # is a human
?item wdt:P106 wd:Q1028181. # occupation painter
?item wdt:P569 ?born.
?item wdt:P570 ?died.
FILTER( ?died > ?born ).
BIND( (?died - ?born)/365.2425 AS ?age )
BIND( FLOOR(YEAR(?born)/100)*100 AS ?century )
}
GROUP BY ?century
ORDER BY ?century
For instance, BIND( MAX( ?var1, ?var2 ) AS ?max)
does NOT work. Instead, use the expression BIND( IF( ?var1>?var2, ?var1, ?var2 ) AS ?max)
.
Functions on dates and times
[edit | edit source]NOW
[edit | edit source]The function NOW( )
returns the current date and time.
With constructions like NOW() + "P1D"^^xsd:duration
it is possible to add or subtract days from the current date.
NOW() + "P1M"^^xsd:duration
will add 1 month.
NOW() + "P1Y"^^xsd:duration
will add 1 year.
You can add or subtract any combination of years, months, days and even hours, minutes and seconds using "P1Y1M1DT0H0M0.000S"^^xsd:duration
.
YEAR, MONTH and DAY
[edit | edit source]The function YEAR( datevalue )
returns the year of the date value. The functions MONTH
and DAY
return the month or day.
HOURS, MINUTES and SECONDS
[edit | edit source]The function HOURS( datevalue )
returns the hour of the date value. The functions MINUTES
and SECONDS
returns the minutes or seconds.
Currently Wikidata does not hold date values in hour, minutes or seconds.
TIMEZONE and TZ
[edit | edit source]The function TIMEZONE( datevalue )
returns the time zone of the date value. Currently in Wikidata all dates have a TIMEZONE
of "PT0S" for UTC. Other valid values would be between -PT14H and PT14H indicating the time offset in hours.
The function TZ( datevalue )
returns the time zone as a simple literal of the date value. Currently in Wikidata all dates have a TZ
of "Z" for UTC.
Example with dates
# Query to find all musicians who have already died
# calculate their age (full years) at death
# count how many of them died at each age
#defaultView:LineChart
SELECT ?age (COUNT (DISTINCT ?a) AS ?count)
WHERE {
?a wdt:P31 wd:Q5. #instance of human
?a wdt:P106/wdt:P279 wd:Q639669. #occupation a subclass of musician
?a p:P569/psv:P569 ?birth_date_node.
?a p:P570/psv:P570 ?death_date_node.
?birth_date_node wikibase:timeValue ?birth_date.
?death_date_node wikibase:timeValue ?death_date.
BIND( YEAR(?death_date) - YEAR(?birth_date) -
IF(MONTH(?death_date)<MONTH(?birth_date) ||
(MONTH(?death_date)=MONTH(?birth_date) && DAY(?death_date)<DAY(?birth_date)),1,0) AS ?age )
# calculate the age, precisely to the day (times and timezones ignored)
FILTER(?age > 10 && ?age < 100). #ignore outliers, several of which are probably errors
}
GROUP BY ?age
ORDER BY ?age
Functions on coordinates
[edit | edit source]geof:distance
[edit | edit source]The function geof:distance
returns distance between two points, in kilometers.
Example usage:
# distance between 2 cities
SELECT ?city1 ?city1Label ?location1 ?city2 ?city2Label ?location2 ?dist
WHERE
{
VALUES ?city1 { wd:Q84 }. # London
VALUES ?city2 { wd:Q90 }. # Paris
?city1 wdt:P625 ?location1.
?city2 wdt:P625 ?location2.
BIND(geof:distance(?location1, ?location2) as ?dist)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
geof:longitude and geof:latitude
[edit | edit source]The functions geof:longitude
and geof:latitude
convert a well-known text literal (<http://www.opengis.net/ont/geosparql#wktLiteral>) of a Point geometry to its longitude and latitude, respectively.
Example usage:
# Museums in Barcelona with coordinates
SELECT ?item ?itemLabel ?coord ?lon ?lat
WHERE
{
?item wdt:P131 wd:Q1492; # in the administrative territory of Barcelona
wdt:P31 wd:Q33506; # is a museum
wdt:P625 ?coord.
BIND(geof:longitude(?coord) AS ?lon)
BIND(geof:latitude(?coord) AS ?lat)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
wikibase:geoLongitude and wikibase:geoLatitude
[edit | edit source]It is also possible to retrieve the longitude and latitude of a coordinate node using wikibase:geoLongitude
and wikibase:geoLatitude
qualifiers. See here for an explanation.
# Museums in Barcelona with coordinates
SELECT ?item ?itemLabel ?coord ?lon ?lat
WHERE
{
?item wdt:P131 wd:Q1492; # in the administrative territory of Barcelona
wdt:P31 wd:Q33506; # is a museum
p:P625 [
ps:P625 ?coord;
psv:P625 [
wikibase:geoLongitude ?lon;
wikibase:geoLatitude ?lat;
]
].
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
AUTO_COORDINATES
[edit | edit source]The code BIND("[AUTO_COORDINATES]" as ?loc)
returns the current location, if you allow your browser to use it.
# Drinking establishments near me
SELECT DISTINCT ?pub ?pubLabel ?dist
WHERE
{
BIND("[AUTO_COORDINATES]" as ?loc) .
SERVICE wikibase:around {
?pub wdt:P625 ?location .
bd:serviceParam wikibase:center ?loc.
bd:serviceParam wikibase:radius "1" .
}
FILTER EXISTS { ?pub wdt:P31/wdt:P279* wd:Q5307737 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
BIND(geof:distance(?loc, ?location) as ?dist)
} ORDER BY ?dist
Aggregate functions
[edit | edit source]See Aggregate functions for COUNT, MIN, MAX, SUM, AVG, SAMPLE and GROUP_CONCAT
References
[edit | edit source]- ↑ a b The Wikidata Query Service uses a Wikimedia fork of Blazegraph that depends on OpenJDK 8. Other instances may run different versions of Java with different levels of Unicode support.