SPARQL/Modifiers

SERVICE - mwapi

SPARQL
Modifiers

Aggregate functions

There are 4 optional modifiers to the SELECT ... WHERE { .. }. See the chapter on SELECT for an introduction.

The five modifiers are GROUP BY ..., HAVING ..., ORDER BY ..., LIMIT ... and OFFSET ....

GROUP BY

Let us consider this list of children of Bach, listing also their mother

SELECT ?mother ?motherLabel ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child has father Bach
  ?child wdt:P25 ?mother.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Let us group this list by the mothers.

SELECT ?mother ?motherLabel (COUNT(?child) AS ?children) 
        (GROUP_CONCAT(DISTINCT ?childLabel; SEPARATOR=", ") AS ?names)
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child has father Bach
  ?child wdt:P25 ?mother.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                          ?mother rdfs:label ?motherLabel.
                          ?child  rdfs:label ?childLabel.
                         }
}
GROUP BY ?mother ?motherLabel

Try it!

We GROUP BY both variables ?mother and ?motherLabel because if we would leave out the label we would end up with an error "Bad Aggregate".

First we added (COUNT(?child) AS ?children) to count the number of children. COUNT is one of the Aggregate functions MIN, MAX, SUM, AVG, COUNT or SAMPLE.
Notice first that it counts the variable ?child. Notice as well that the syntax is (COUNT(?var1) AS ?var2).

As second information we added a combined list of children per mother by using (GROUP_CONCAT(DISTINCT ?var1; SEPARATOR=", ") AS ?var2).

When using GROUP_CONCAT and a label, all labels should be defined explicitly in the SERVICE.

HAVING

HAVING is always used in combination with GROUP BY

SELECT ?mother ?motherLabel (COUNT(?child) AS ?children) 
        (GROUP_CONCAT(DISTINCT ?childLabel; SEPARATOR=", ") AS ?names)
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child has father Bach
  ?child wdt:P25 ?mother.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                          ?mother rdfs:label ?motherLabel.
                          ?child  rdfs:label ?childLabel.
                         }
}
GROUP BY ?mother ?motherLabel 
HAVING (COUNT(?child)>7)

Try it!

HAVING will filter out groups that does not meet the condition specified. In this case only one mother is shown, with 13 children.

As COUNT(?child) is bound to the variable ?children the HAVING clause can also be written as HAVING (?children>7).

The HAVING clause can be useful for finding duplicates for instance like HAVING (COUNT(?var)>1).

ORDER BY

ORDER BY something sorts the results by something. something can be any expression or variables. This expression can also be wrapped in either ASC() or DESC() to specify the sorting order (ascending or descending). (If you don’t specify either, the default is ascending sort, so ASC(something) is equivalent to just something.)

SELECT ?mother ?motherLabel ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child has father Bach
  ?child wdt:P25 ?mother.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?motherLabel) ?childLabel

Try it!

LIMIT

LIMIT count cuts off the result list at count results, where count is any natural number. For example, LIMIT 10 limits the query to ten results. LIMIT 1 only returns a single result.

This could be useful to get the top 10 results, or only 10 random results to see how the data looks like.

OFFSET

OFFSET count can be used to skip the first number of results. OFFSET 100 LIMIT 10 returns records 101-110.

References

d:User:TweetsFactsAndQueries/A Guide To WDQS

SERVICE - mwapi

SPARQL
Modifiers

Aggregate functions