Jump to content

SPARQL/Printable version

From Wikibooks, open books for an open world


SPARQL

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/SPARQL

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Basics

SPARQL may look complicated, but the simple basics will already get you a long way – if you want, you can stop reading after this chapter, and you’ll already know enough to write many interesting queries. The other chapters just add information about more topics that you can use to write different queries. Each of them will empower you to write even more awesome queries, but none of them are necessary – you can stop reading at any point and hopefully still walk away with a lot of useful knowledge!

Also, if you’ve never heard of Wikidata, SPARQL, or WDQS before, here’s a short explanation of those terms:

SPARQL basics

[edit | edit source]

A simple SPARQL query looks like this:

SELECT ?a ?b ?c
WHERE
{
  x y ?a.
  m n ?b.
  ?b f ?c.
}

The SELECT clause lists variables that you want returned (variables start with a question mark), and the WHERE clause contains restrictions on them, mostly in the form of triples. All information in Wikidata (and similar knowledge databases) is stored in the form of triples; when you run the query, the query service tries to fill in the variables with actual values so that the resulting triples appear in the knowledge database, and returns one result for each combination of variables it finds.

A triple can be read like a sentence (which is why it ends with a period), with a subject, a predicate, and an object:

SELECT ?fruit
WHERE
{
  ?fruit hasColor yellow.
  ?fruit tastes sour.
}

The results for this query could include, for example, “lemon”. In Wikidata, most properties are “has”-kind properties, so the query might instead read:

SELECT ?fruit
WHERE
{
  ?fruit color yellow.
  ?fruit taste sour.
}

which reads like “?fruit has color ‘yellow’” (not?fruit is the color of ‘yellow’” – keep this in mind for property pairs like “parent”/“child”!).

However, that’s not a good example for WDQS. Taste is subjective, so Wikidata doesn’t have a property for it. Instead, let’s think about parent/child relationships, which are mostly unambiguous.

Our first query

[edit | edit source]

Suppose we want to list all children of the baroque composer Johann Sebastian Bach. Using pseudo-elements like in the queries above, how would you write that query?

Hopefully you got something like this:

SELECT ?child
WHERE
{
  # either this...
  ?child parent Bach.
  # or this...
  ?child father Bach.
  # or this.
  Bach child ?child.
  # (note: everything after a ‘#’ is a comment and ignored by WDQS.)
}

The first two triples say that the ?child must have the parent/father Bach; the third says that Bach must have the child ?child. Let’s go with the second one for now.

So what remains to be done in order to turn this into a proper WDQS query? On Wikidata, items and properties are not identified by human-readable names like “father” (property) or “Bach” (item). (For good reason: “Johann Sebastian Bach” is also the name of a German painter, and “Bach” might also refer to the surname, the French commune, the Mercury crater, etc.) Instead, Wikidata items and properties are assigned an identifier. To find the identifier for an item, we search for the item and copy the Q-number of the result that sounds like it’s the item we’re looking for (based on the description, for example). To find the identifier for a property, we do the same, but search for “P:search term” instead of just “search term”, which limits the search to properties. This tells us that the famous composer Johann Sebastian Bach is Q1339, and the property to designate an item’s father is P22.

And last but not least, we need to include prefixes. For simple WDQS triples, items should be prefixed with wd:, and properties with wdt:. (But this only applies to fixed values – variables don’t get a prefix!)

Putting this together, we arrive at our first proper WDQS query:

SELECT ?child
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
}

Try it!

Click that “Try it” link, then “Run” the query on the WDQS page. What do you get?

child
wd:Q57225
wd:Q76428

Well that’s disappointing. You just see the identifiers. You can click on them to see their Wikidata page (including a human-readable label), but isn’t there a better way to see the results?

Well, as it happens, there is! (Aren’t rhetorical questions great?) If you include the magic text

SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }

somewhere within the WHERE clause, you get additional variables: For every variable ?foo in your query, you now also have a variable ?fooLabel, which contains the label of the item behind ?foo. If you add this to the SELECT clause, you get the item as well as its label:

SELECT ?child ?childLabel
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Try running that query – you should see not only the item numbers, but also the names of the various children.

child childLabel
wd:Q57225 Johann Christoph Friedrich Bach
wd:Q76428 Carl Philipp Emanuel Bach

This completes the basics. Try amending this by varying the properties.

References

[edit | edit source]


Wikidata Query Service

Wikimedia runs the public service instance of WDQS (Wikidata Query Service), which is available for use at http://query.wikidata.org/.

The GUI at the home page of http://query.wikidata.org/ allows you to edit and submit SPARQL queries to the query engine. The results are displayed as an HTML table. Note that every query has a unique URL which can be bookmarked for later use. Going to this URL will put the query in the edit window, but will not run it - you still have to click "Execute" for that.

One can also generate a short URL for the query via a URL shortening service by clicking the "Generate short URL" link on the right - this will produce the shortened URL for the current query.

The "Add prefixes" button generates the header containing standard prefixes for SPARQL queries. The full list of prefixes that can be useful is listed in the RDF format documentation. Note that most common prefixes work automatically, since WDQS supports them out of the box.

The GUI also features a simple entity explorer which can be activated by clicking on the "🔍" symbol next to the entity result. Clicking on the entity Q-id itself will take you to the entity page on wikidata.org.

Default views

[edit | edit source]

If you run the query in the WDQS GUI, you can choose which view to present by specifying a comment: #defaultView:viewName at the beginning of the query. Supported views are:

  • Table - default view, displays the results as a table of values
  • Map - displays coordinate points if any present in the result
  • ImageGrid - displays images present in the result as a grid
  • BubbleChart - displays bubble chart for numbers found in the result
  • TreeMap - displays hierarchical tree map for numbers found in the result
  • Timeline - for results having dates, displays timeline placing each row at appropriate time
  • Dimensions - displays rows as lines between points on the scales representing each column
  • Graph - displays result as a connected graph, using linkTo column

SPARQL endpoint

[edit | edit source]

SPARQL queries can be submitted directly to the SPARQL endpoint with a GET request to https://query.wikidata.org/sparql?query=SPARQL POST requests can be sent to query.wikidata.org/bigdata/namespace/wdq/sparql. The result is returned as XML by default, or as JSON if either the query parameter format=json or the header Accept: application/sparql-results+json are provided.

JSON format is standard SPARQL 1.1 Query Results JSON Format.

Supported formats

[edit | edit source]

The following output formats are currently supported by the SPARQL endpoint:

Format HTTP Header Query parameter Description
XML Accept: application/sparql-results+xml format=xml XML result format, is returned by default. As specified in https://www.w3.org/TR/rdf-sparql-XMLres/
JSON Accept: application/sparql-results+json format=json JSON result format, as in: https://www.w3.org/TR/sparql11-results-json/
TSV Accept: text/tab-separated-values As specified in https://www.w3.org/TR/sparql11-results-csv-tsv/
CSV Accept: text/csv As specified in https://www.w3.org/TR/sparql11-results-csv-tsv/
Binary RDF Accept: application/x-binary-rdf-results-table

Query timeout

[edit | edit source]

There is a hard query deadline configured which is set to 60 seconds. That is true both for the GUI and the public SPARQL endpoint. If your query does not execute in the allowed time, try to optimize it, or reduce the amount of data it returns.

Every query will timeout when it takes more time to execute than this configured deadline. You may want to optimize the query or report a problematic query here mw:Wikidata_query_service/Problematic_queries

Introduction to Wikidata Query Service

[edit | edit source]

See SPARQL/Wikidata Query Service - Introduction

References

[edit | edit source]

Other SPARQL Endpoints

[edit | edit source]

See www.w3.org/wiki/SparqlEndpoints for a list of other SPARQL Endpoints.


Wikidata Query Service - Introduction

How to use Wikidata for a simple query

[edit | edit source]

Let’s go through a simple example demonstrating how to get a list of all known cats in the world.

Get a list of all the cats in the world

[edit | edit source]

Use this URL to get to the Query Service: https://query.wikidata.org

  1. Click Examples
  2. Select Cats from the list that appears in the new window
  3. Click Run just below the code box

This has now given us a list of all the famous and infamous cats on the Internet – or at least the ones Wikidata knows about. This is great, provided you are a cat person and not, say, a dog person.

What about dogs?

[edit | edit source]

If you want to list the dogs instead (or anything else for that matter), there are two ways to edit your query:

  1. Use the visual query editor to change the item from cat to dog:
  2. Manually edit the query Each item on Wikidata is uniquely identified using a code. This code is Q146 for “cat”. To find the corresponding code for “dog”, you can look it up on Wikidata:
  1. Go to https://www.wikidata.org
  2. Type “dog” in the search box
  3. Click on the first result from the drop-down menu
  4. Take note of the item number for “dog” from the Wikidata page that opens (Q144)

To change your query from “cat” to “dog”, simply replace Q146 by Q144 in the SPARQL query editor.

Run the program, and you will be presented with a list of all the famous and infamous dogs on the Internet

Diving into the User Interface

[edit | edit source]

How to change the language that the results are displayed in?

[edit | edit source]

With this search query tool, you can not only customize and search for simple or aggregated, compound, nested and complex queries but you can also search in ANY language and get results in ANY language too.

  • Change the default language code from “en” (English) to any other language code, say “ja” for Japanese.
#Cats  
SELECT ?item ?itemLabel WHERE
{
     ?item wdt:P31 wd:Q146 .
     SERVICE wikibase:label { bd:serviceParam wikibase:language "ja" }
}

Try it!

How to change the sorting order of the results displayed?

[edit | edit source]

The order of the items displayed can be easily changed in the interface. Just click on the relevant columns of the search results to customize the sorting order:

How to change the language of the UI?

[edit | edit source]

Click the word English and select the language from the list displayed:

Notice the UI has changed to the selected language:

But I want pictures of cats! How to search for images?

[edit | edit source]

So far, we’ve seen the output of the queries displayed as a table. Let’s try for some images:

Click the “Examples” button and select the “Even more cats, with pictures” example. Run the query and cat images should pop up at the lower half of the screen.#

This is what displays the result as images instead of a table. You can manually switch how the data is displayed by using the “Display” menu in the lower right corner of the result. Try switching the menu to “Table” to see the same result in a table view.

Let’s have a look at what’s new in the query, compared to the query in the previous chapter. Written in plain English, the query would read “Give me all items that have something to do with cats. Also give me the image for each item. And by the way, display the result as a grid of images, not as a table.”

  • #defaultView:ImageGrid What looks like a comment, is actually an instruction to display the query result as images instead of a table.
  • ?x1 This means that we query for any predicate, not only for “is an instance of”. Think of predicates like “depicts” (P180) or “is named after” (P138). We are not using ?x1 anywhere else in the query, meaning the placeholder stands for “I don’t care what the predicate is, give me all results that somehow have a cat as the subject”
  • OPTIONAL Inside the curly braces you see another statement that adds optional data to the result. The placeholder ?item is reused and coupled with the predicate P18 (meaning “(has) image” and a new placeholder ?pic. Wrapping the statement with OPTIONAL means that items do not necessarily have to have a P18 predicate to show up in the list and that the ?pic placeholder can be empty.

Try modifying the query:

  • Search for dogs instead of cats. Hint: Change the Object part of the first statement in WHERE.
  • Only show items that “depict” cats (you’ll get paintings). Hint: replace ?x1 with something else.
  • Remove the OPTIONAL

There are other ways to show your data, but not all of them are always applicable, which is why some are grayed out in the menu. “Image map” is only selectable if the result data actually contains image URLs. In one of the following chapters you’ll learn how to display items as points on a map.

How to share your query for others to use?

[edit | edit source]

If you want to share your query with someone else – say, on social media – you can create a short link for your query.

  • Complete your query
  • Click the LINK icon:
  • Copy the URL listed there. This is the URL for the query.

Finding things on maps

[edit | edit source]

There are more ways to visualize the query results. If the query asks for geocoordinates, the results can be displayed on a map.

Look at this example of lighthouses in Norway. When you run the query, you’ll see red dots that mark the location of lighthouses on the Norwegian coast.


There is even more

[edit | edit source]

There are more visualizations for you to explore:

  • Bubble Chart
  • Tree Map
  • Dimensions
  • Graph

Look through the query examples and try them out!


See also

[edit | edit source]

See also this Video Querying Wikidata with SPARQL for Absolute Beginners

References

[edit | edit source]


Prefixes

WDQS understands many shortcut abbreviations, known as prefixes. Some are internal to Wikidata wd, wdt, p, ps, bd, etc. and many others are commonly used external prefixes, like rdf, skos, owl, schema, etc.

In the following query, we are asking for items where there is a statement of "P279 = Q7725634" or in fuller terms, selecting subjects that have a predicate of "subclass of" with an object of = "literary work".

For simple WDQS triples, items should be prefixed with wd:, and properties with wdt:. This only applies to fixed values – variables don’t get a prefix.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wds: <http://www.wikidata.org/entity/statement/>
PREFIX wdv: <http://www.wikidata.org/value/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bd: <http://www.bigdata.com/rdf#>

# The below SELECT query does the following:
# Selects all the items(?s subjects) and their descriptions(?desc)
# that have(WHERE) the statement(?s subject) has a direct property(wdt:) = P279 <subclasses of>
# with a value of entity(wd:) = Q7725634 <Literary Work>
# and Optionally return the label and description using the Wikidata label service

SELECT ?s ?desc WHERE {
  ?s wdt:P279 wd:Q7725634 .
  OPTIONAL {
      ?s rdfs:label ?desc 
      FILTER (LANG(?desc) = "en").
  }
}

Try it!

Without the use of prefixes this should be written as

# The below SELECT query does the following:
# Selects all the items(?s subjects) and their descriptions(?desc)
# that have(WHERE) the statement(?s subject) has a direct property(wdt:) = P279 <subclasses of>
# with a value of entity(wd:) = Q7725634 <Literary Work>
# and Optionally return the label and description using the Wikidata label service

SELECT ?s ?desc WHERE {
  ?s <http://www.wikidata.org/prop/direct/P279> <http://www.wikidata.org/entity/Q7725634> .
  OPTIONAL {
     ?s <http://www.w3.org/2000/01/rdf-schema#label> ?desc 
     FILTER (LANG(?desc) = "en").
  }
}

Try it!

Actually all mentioned prefixes are built in in the Wikidata Query Service, so they can be left out

# The below SELECT query does the following:
# Selects all the items(?s subjects) and their descriptions(?desc)
# that have(WHERE) the statement(?s subject) has a direct property(wdt:) = P279 <subclasses of>
# with a value of entity(wd:) = Q7725634 <Literary Work>
# and Optionally return the label and description using the Wikidata label service

SELECT ?s ?desc WHERE {
  ?s wdt:P279 wd:Q7725634 .
  OPTIONAL {
     ?s rdfs:label ?desc 
     FILTER (LANG(?desc) = "en").
  }
}

Try it!

The full list of built in prefixes is

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wds: <http://www.wikidata.org/entity/statement/>
PREFIX wdv: <http://www.wikidata.org/value/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bd: <http://www.bigdata.com/rdf#>

PREFIX wdref: <http://www.wikidata.org/reference/>
PREFIX psv: <http://www.wikidata.org/prop/statement/value/>
PREFIX psn: <http://www.wikidata.org/prop/statement/value-normalized/>
PREFIX pqv: <http://www.wikidata.org/prop/qualifier/value/>
PREFIX pqn: <http://www.wikidata.org/prop/qualifier/value-normalized/>
PREFIX pr: <http://www.wikidata.org/prop/reference/>
PREFIX prv: <http://www.wikidata.org/prop/reference/value/>
PREFIX prn: <http://www.wikidata.org/prop/reference/value-normalized/>
PREFIX wdno: <http://www.wikidata.org/prop/novalue/>
PREFIX wdata: <http://www.wikidata.org/wiki/Special:EntityData/>

PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX bds: <http://www.bigdata.com/rdf/search#>
PREFIX gas: <http://www.bigdata.com/rdf/gas#>
PREFIX hint: <http://www.bigdata.com/queryHints#>

References

[edit | edit source]


Sentences

Comma, Semicolon and Period

[edit | edit source]

At the Basics chapter we’ve seen all children of Johann Sebastian Bach – more specifically: all items with the father Johann Sebastian Bach. But Bach had two wives, and so those items have two different mothers: what if we only want to see the children of Johann Sebastian Bach with his first wife, Maria Barbara Bach (Q57487)? Try writing that query, based on the one above.

Done that? Okay, then onto the solution! The simplest way to do this is to add a second triple with that restriction:

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339.
  ?child wdt:P25 wd:Q57487.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

In English, this reads:

Child has father Johann Sebastian Bach. Child has mother Maria Barbara Bach.

That sounds a bit awkward, doesn’t it? In natural language, we’d abbreviate this:

Child has father Johann Sebastian Bach and mother Maria Barbara Bach.

In fact, it’s possible to express the same abbreviation in SPARQL as well: if you end a triple with a semicolon (;) instead of a period, you can add another predicate-object pair. This allows us to abbreviate the above query to:

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339;
         wdt:P25 wd:Q57487.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

which has the same results, but less repetition in the query.

Now suppose that, out of those results, we’re interested only in those children who also were also composers and pianists. The relevant properties and items are occupation (P106), composer (Q36834) and pianist (Q486748). Try updating the above query to add these restrictions!

Here’s my solution:

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339;
         wdt:P25 wd:Q57487;
         wdt:P106 wd:Q36834;
         wdt:P106 wd:Q486748.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

This uses the ; abbreviation two more times to add the two required occupations. But as you might notice, there’s still some repetition. This is as if we said:

Child has occupation composer and occupation pianist.

which we would usually abbreviate as:

Child has occupation composer and pianist.

And SPARQL has some syntax for that as well: just like a ; allows you to append a predicate-object pair to a triple (reusing the subject), a , allows you to append another object to a triple (reusing both subject and predicate). With this, the query can be abbreviated to:

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339;
         wdt:P25 wd:Q57487;
         wdt:P106 wd:Q36834,
                  wd:Q486748.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Note: indentation and other whitespace doesn’t actually matter – I’ve just indented the query to make it more readable. You can also write this as:

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339;
         wdt:P25 wd:Q57487;
         wdt:P106 wd:Q36834, wd:Q486748.
  # both occupations in one line
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

or, rather less readable:

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339;
  wdt:P25 wd:Q57487;
  wdt:P106 wd:Q36834,
  wd:Q486748.
  # no indentation; makes it hard to distinguish between ; and ,
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Luckily, the WDQS editor indents lines for you automatically, so you usually don’t have to worry about this.

Alright, let’s summarize here. We’ve seen that queries are structured like text. Each triple about a subject is terminated by a period. Multiple predicates about the same subject are separated by semicolons, and multiple objects for the same subject and predicate can be listed separated by commas.

SELECT ?s1 ?s2 ?s3
WHERE
{
  ?s1 p1 o1;
      p2 o2;
      p3 o31, o32, o33.
  ?s2 p4 o41, o42.
  ?s3 p5 o5;
      p6 o6.
}

Brackets ([ ])

[edit | edit source]

Now I want to introduce one more abbreviation that SPARQL offers. So if you’ll humor me for one more hypothetical scenario…

Suppose we’re not actually interested in Bach’s children. (Who knows, perhaps that’s actually true for you!) But we are interested in his grandchildren. (Hypothetically.) There’s one complication here: a grandchild may be related to Bach via the mother or the father. That’s two different properties, which is inconvenient. Instead, let’s flip the relation around: Wikidata also has a “child” property, child (P40), which points from parent to child and is gender-independent. With this information, can you write a query that returns Bach’s grandchildren?

Here’s my solution:

SELECT ?grandChild ?grandChildLabel
WHERE
{
  wd:Q1339 wdt:P40 ?child.
  ?child wdt:P40 ?grandChild.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

In natural language, this reads:

Bach has a child ?child. ?child has a child ?grandChild.

Once more, I propose that we abbreviate this English sentence, and then I want to show you how SPARQL supports a similar abbreviation. Observe how we actually don’t care about the child: we don’t use the variable except to talk about the grandchild. We could therefore abbreviate the sentence to:

Bach has as child someone who has a child ?grandChild.

Instead of saying who Bach’s child is, we just say “someone”: we don’t care who it is. But we can refer back to them because we’ve said “someone who”: this starts a relative clause, and within that relative clause we can say things about “someone” (e. g., that he or she “has a child ?grandChild”). In a way, “someone” is a variable, but a special one that’s only valid within this relative clause, and one that we don’t explicitly refer to (we say “someone who is this and does that”, not “someone who is this and someone who does that” – that’s two different “someone”s).

In SPARQL, this can be written as:

SELECT ?grandChild ?grandChildLabel
WHERE
{
  wd:Q1339 wdt:P40 [ wdt:P40 ?grandChild ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

You can use a pair of brackets ([]) in place of a variable, which acts as an anonymous variable. Inside the brackets, you can specify predicate-object pairs, just like after a ; after a normal triple; the implicit subject is in this case the anonymous variable that the brackets represent. (Note: also just like after a ;, you can add more predicate-object pairs with more semicolons, or more objects for the same predicate with commas.)

And that’s it for triple patterns! There’s more to SPARQL, but as we’re about to leave the parts of it that are strongly analogous to natural language,

Summary

[edit | edit source]

I’d like to summarize that relationship once more:

natural language example SPARQL example
sentence Juliet loves Romeo. period juliet loves romeo.
conjunction (clause) Romeo loves Juliet and kills himself. semicolon romeo loves juliet; kills romeo.
conjunction (noun) Romeo kills Tybalt and himself. comma romeo kills tybalt, romeo.
relative clause Juliet loves someone who kills Tybalt. brackets juliet loves [ kills tybalt ].

References

[edit | edit source]


Triples

Introduction

[edit | edit source]

The statement "The sky has the color blue", consists of a subject ("the sky"), a predicate ("has the color"), and an object ("blue").

SPO or "subject, predicate, object" is known as a (Semantic) triple, or commonly referred to in Wikidata as a statement about data.

SPO is also used as a form of basic syntax layout for querying RDF data structures, or any graph database or triplestore, such as the Wikidata Query Service (WDQS).

See also w:en:Semantic triple

In Wikidata Query Service (WDQS) triples are used to describe the Query pattern in the WHERE clause of the SELECT statement

# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.

In this case the triple ?child wdt:p22 wd:Q1339 specifies that the variable ?child must have the parent/father Bach.

Any of the triple parts Subject, Predicate and Object may be variables. This makes this selection very versatile.

Triples with the same subject

[edit | edit source]
Example of SPARQL Triples
Example of SPARQL Triples

Aditional variables can be added by adding additional triples. In the simplest case these triples use the same subject.

SELECT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

The first triple selects all the children of Bach. The additional triples links all these triples with a value for gender, birth date and date of death. The variable ?child links all of them together.

If you look closely at the result you might have noticed that Johann Christoph Friedrich Bach has 2 lines in the list because there are 2 different birth dates, 21 and 23 of June 1732. In his case ?child wdt:P569 ?birth_date. resulted into 2 values. See for further details at removing duplicates and modifiers.

OPTIONAL triples

[edit | edit source]

If not all subjects have a value for a certain triple the subject is excluded. To have it included the OPTIONAL keyword comes in handy.

SELECT DISTINCT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q76.# ?child  has father   Obama
  OPTIONAL{ ?child wdt:P21 ?gender. }
  OPTIONAL{ ?child wdt:P569 ?birth_date. }
  OPTIONAL{ ?child wdt:P570 ?date_of_death. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Both children are shown, even if one of the variables (in this case the date of death) is not filled in.

See the chapter OPTIONAL for a full description.

Complex triples

[edit | edit source]

Triples are not limited to one subject. In fact triples can be linked in any thinkable way.

You would for instance be able to list the coordinates of the birth places of the children of Bach

SELECT ?child ?childLabel ?placeofbirthLabel ?coordinates
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P19 ?placeofbirth.
  ?placeofbirth wdt:P625 ?coordinates. 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?placeofbirthLabel

Try it!

You could even see these birthplaces (Köthen, Leipzig and Weimar) on a map by using #defaultView:Map

#defaultView:Map
SELECT ?placeofbirthLabel ?coordinates
       (GROUP_CONCAT(DISTINCT ?childLabel; SEPARATOR=", ") AS ?children)
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P19 ?placeofbirth.
  ?placeofbirth wdt:P625 ?coordinates. 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                         ?child        rdfs:label ?childLabel.
                         ?placeofbirth rdfs:label ?placeofbirthLabel.
                         }
}
GROUP BY ?placeofbirthLabel ?coordinates ?children

Try it!

If you click on a red dot you will get additional data as specified above with the variables ?placeofbirthLabel and ?children. We had to use GROUP BY, GROUP_CONCAT, DISTINCT and all labels should be defined explicitly in the SERVICE. You can toggle between the Map display and standard table display by the Display drop down list, at the right side of the Run button.

See more about views at Map views or all views

Triples by number of variables

[edit | edit source]

Triples with one variable

[edit | edit source]

An example of a triple with one variable for Subject would be

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339.         # ?child  has father   Bach
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

This will list all Subjects (as variable ?child) with Predicate father (P22) and Object Johann Sebastian Bach (Q1339).

An example of a triple with one variable for Predicate would be

SELECT ?predicate ?pLabel
WHERE
{
  wd:Q57225 ?predicate wd:Q1339.         # Johann Christoph Friedrich Bach ?predicate Johann Sebastian Bach

  BIND( IRI(REPLACE( STR(?predicate),"prop/direct/","entity/" )) AS ?p). 
  # or ?p wikibase:directClaim ?predicate. 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

This will list all Predicates (as variable ?predicate) with Object Johann Christoph Friedrich Bach (Q57225) and Subject Johann Sebastian Bach (Q1339).
It shows that he is not only his father (P22) but also student of (P1066) him

An example of a triple with one variable for Object would be

SELECT ?workloc ?worklocLabel
WHERE
{
  wd:Q1339 wdt:P937 ?workloc.         # Bach  work location  ?workloc
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

This will list all Objects (as variable ?workloc) with Subject Johann Sebastian Bach (Q1339) and Predicate work location (P937).

Triples with two variables

[edit | edit source]

An example of a triple with 2 variables and only a fixed value for Subject would list all raw information available in Wikidata about Bach

SELECT ?predicate ?object
WHERE
{
  wd:Q1339 ?predicate ?object.         # Bach
}

Try it!

See further at next section with 3 variables for further usage

An example of a triple with 2 variables and only a fixed value for Predicate would list all subjects (probably airports) with an IATA airport code

SELECT ?subject ?subjectLabel ?object
WHERE
{
  ?subject wdt:P238 ?object.         # IATA airport code
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?object

Try it!

An usage could be to check for duplicate IATA codes:

SELECT ?object (COUNT(?subject) AS ?count)
               (MIN(?subject) AS ?subject1) (MAX(?subject) AS ?subject2)
               (GROUP_CONCAT(DISTINCT ?subjectLabel; SEPARATOR=", ") AS ?subjectLabels)
WHERE
{
  ?subject wdt:P238 ?object.         # IATA airport code
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                         ?subject rdfs:label ?subjectLabel.
                         }
}
GROUP BY ?object
HAVING(COUNT(?subject) > 1)
ORDER BY ?object

Try it!

An example of a triple with 2 variables and only a fixed value for Object would list all subjects related to Bach

SELECT ?subject ?subjectLabel ?subjectDescription ?predicate ?pLabel
WHERE
{
  ?subject ?predicate wd:Q1339.  # Bach

  BIND( IRI(REPLACE( STR(?predicate),"prop/direct/","entity/" )) AS ?p). 
  # or ?p wikibase:directClaim ?predicate. 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?subject

Try it!

An other possibility of a triple with fixed value for Object would list all subjects with value "ABC", and will show for instance airport Albacete Airport

SELECT ?subject ?subjectLabel ?subjectDescription ?predicate ?pLabel
WHERE
{
  ?subject ?predicate "ABC". 

  BIND( IRI(REPLACE( STR(?predicate),"prop/direct/","entity/" )) AS ?p). 
  # or ?p wikibase:directClaim ?predicate. 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?subject

Try it!

Triples with three variables

[edit | edit source]

When you would use triples with all 3 as variables (one for Subject, one for Predicate and one for Object) you basically will list out the whole database. This can be done for small databases, and can be used as well to get a rough idea of the available data, on all available properties.

All raw information available in Wikidata about the children of Bach:

SELECT ?subject ?predicate ?object 
WHERE
{
  ?subject ?predicate ?object.
  ?subject wdt:P22 wd:Q1339.		# subject has father   Bach
}
ORDER BY ?subject ?predicate ?object
LIMIT 10000

Try it!

The same query but grouped by predicate:

SELECT DISTINCT ?subject ?subjectLabel ?predicate 
       (GROUP_CONCAT(DISTINCT ?object; SEPARATOR=", ") AS ?objects)
WHERE
{
  ?subject ?predicate ?object.
  ?subject wdt:P22 wd:Q1339.		# subject has father   Bach
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?subject ?subjectLabel ?predicate
ORDER BY ?subject ?subjectLabel ?predicate
LIMIT 10000

Try it!

From the query below you can discover triples about the date the Wikidata page was last updated, the total number of statements, the number of sitelinks etc. These are schema:dateModified, wikibase:statements and wikibase:sitelinks respectively.

SELECT ?subject ?subjectLabel ?datemodified ?statements ?sitelinks 
WHERE
{
  ?subject wdt:P22 wd:Q1339.		# subject has father   Bach
  ?subject schema:dateModified ?datemodified.
  ?subject wikibase:statements ?statements.
  ?subject wikibase:sitelinks  ?sitelinks.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!


SELECT

The SELECT clause consists of 2 or 3 parts.

SELECT  ... query result variables ...
WHERE {
        ... query pattern ...
}
        ... optional query modifiers ...
GROUP BY ...
HAVING ...
ORDER BY ...
LIMIT ...

The first part is the query result variables. These will be shown when the query is executed. The second part is the WHERE clause with the query pattern. This defines the data selection and generates the variables, The last part are the optional modifiers.

Example

SELECT ?child ?childLabel
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Query result variables

[edit | edit source]

SELECT ?child ?childLabel ... In the above example ?child and ?childLabel are the variables. The variables are separated by spaces. A variable can be displayed with another name (Alias) by using AS, like (?child AS ?Child_of_Bach). Notice that the alias again should be a variable, and the combination should start and end with brackets (?a AS ?b).
PS. For aliases of Labels the label should be defined explicitly in the SERVICE. Alternatively the variable could be named as requested there as well.

SELECT (?child AS ?Child_of_Bach) (?childLabel AS ?Name)
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                           ?child rdfs:label ?childLabel.
                         }
}

Try it!

Query pattern

[edit | edit source]

The query pattern specifies the data selection and generates the variables,

# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.

In this case the triple ?child wdt:p22 wd:Q1339 specifies that the variable ?child must have the parent/father Bach.

Any of the triple parts Subject, Predicate and Object may be variables. This makes this selection very versatile.

Additional triples can be added, for instance to show gender, birth date and date of death. Each sentence should end with a period. The new variable should be added at the top (query result variables) to display them.

SELECT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Optional query modifiers

[edit | edit source]

See the chapter modifiers for a full description.

The above query can be sorted by the birth date by adding ORDER BY ?birth_date after the last curly bracket of the WHERE { } clause

SELECT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?birth_date

Try it!

Removing duplicates

[edit | edit source]

You might have noticed that the above query resulted in 21 records, while Bach had only 20 children. This is caused by 2 entries of Johann Christoph Friedrich Bach, because there are 2 different birth dates, 21 and 23 of June 1732.

Generally it is advised to use DISTINCT in the result variables like

SELECT DISTINCT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?child

Try it!

DISTINCT removes duplicate entries, but in this case that does not help as the entries are different. To be helpful ALL parameters should have been equal.

What helps is to group by child and combine the values of ?birth_date. Also ?date_of_death and ?genderLabel are grouped to be consistent

SELECT ?child ?childLabel 
       (GROUP_CONCAT(DISTINCT ?genderLabel;   SEPARATOR=", ") AS ?genderLabels)
       (GROUP_CONCAT(DISTINCT ?birth_date;    SEPARATOR=", ") AS ?birth_dates)
       (GROUP_CONCAT(DISTINCT ?date_of_death; SEPARATOR=", ") AS ?dates_of_death)
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                           ?child  rdfs:label ?childLabel.
                           ?gender rdfs:label ?genderLabel.
                         }
}
GROUP BY ?child ?childLabel
ORDER BY ?birth_dates

Try it!

The optional query modifier used is GROUP BY, and the variables are combined by using (GROUP_CONCAT(DISTINCT ?var1; SEPARATOR=", ") AS ?var2). The ORDER BY has been modified by using the combined variable ?birth_dates, instead of ?birth_date. All labels should be defined explicitly in the SERVICE.

See the chapter modifiers for a full description.

Another way to remove duplicates it so show only one of the possible values, by using one of the Aggregate functions MIN, MAX, SUM or AVG.

SELECT ?child ?childLabel 
       (MIN(?genderLabel)   AS ?genderlabel1) 
       (MIN(?birth_date)    AS ?birth_date1) 
       (MAX(?date_of_death) AS ?date_of_death1)
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child  has father   Bach
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                           ?child  rdfs:label ?childLabel.
                           ?gender rdfs:label ?genderLabel.
                         }
}
GROUP BY ?child ?childLabel
ORDER BY ?birth_date1

Try it!

The optional query modifier used is also GROUP BY, and the variables are combined by using (MIN(?var) AS ?var1). The ORDER BY has been modified by using the first birth date ?birth_date1, instead of ?birth_date. All labels should also be defined explicitly in the SERVICE.

Adding missing entries

[edit | edit source]

Let's try one of the above queries for Obama (Q76):

SELECT DISTINCT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q76.# ?child  has father   Obama
  ?child wdt:P21 ?gender.
  ?child wdt:P569 ?birth_date.
  ?child wdt:P570 ?date_of_death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?child

Try it!

This results in NO results although Obama has 2 children. The reason is that to match this query, a potential result (a child) must match all the triples we listed: it must have a gender, and a birth data, and a date of death. If one or more of those properties don't exist, it won’t match. And that’s not what we want in this case: we primarily want a list of all the children – if additional data is available, we’d like to include it, but we don’t want that to limit our list of results.

The solution is to tell WDQS that those triples are OPTIONAL:

SELECT DISTINCT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q76.# ?child  has father   Obama
  OPTIONAL{ ?child wdt:P21 ?gender. }
  OPTIONAL{ ?child wdt:P569 ?birth_date. }
  OPTIONAL{ ?child wdt:P570 ?date_of_death. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?child

Try it!

Now both children are shown, and it is clear that the date of death is not filled in yet, as these children are still alive.

See the chapter OPTIONAL for a full description.

Fool proof: Add missing and remove duplicate entries

[edit | edit source]

A fool proof list of children and their details of any given father would be

SELECT ?child ?childLabel 
       (GROUP_CONCAT(DISTINCT ?genderLabel;   SEPARATOR=", ") AS ?genderLabels)
       (GROUP_CONCAT(DISTINCT ?birth_date;    SEPARATOR=", ") AS ?birth_dates)
       (GROUP_CONCAT(DISTINCT ?date_of_death; SEPARATOR=", ") AS ?dates_of_death)
WHERE
{
  ?child wdt:P22 wd:Q76.# ?child  has father   Obama
  OPTIONAL{ ?child wdt:P21 ?gender. }
  OPTIONAL{ ?child wdt:P569 ?birth_date. }
  OPTIONAL{ ?child wdt:P570 ?date_of_death. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                           ?child  rdfs:label ?childLabel.
                           ?gender rdfs:label ?genderLabel.
                         }
}
GROUP BY ?child ?childLabel
ORDER BY ?birth_dates

Try it!

This combines the OPTIONAL clause in case a property is not present and the GROUP_CONCAT clause in case of multiple properties.


OPTIONAL

With a triple ?person wdt:P569 ?birth_date the birth date of a person can be retrieved. But there might be occasions when a persons birth date is not known. In that case this triple acts as a selection: Only persons with birth dates are selected.

And that’s not what we want in this case: we primarily want to include this person and if additional data is available, we’d like to list that, but we don’t want that to limit our list of results.

The solution is to tell WDQS that those triples are OPTIONAL:

SELECT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q76.# ?child  has father   Obama
  OPTIONAL{ ?child wdt:P21 ?gender. }
  OPTIONAL{ ?child wdt:P569 ?birth_date. }
  OPTIONAL{ ?child wdt:P570 ?date_of_death. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?child

Try it!

Without OPTIONAL none of two the children of Obama would be shown, because the date of death is not filled for any of the children.

In general you should NOT include all optional triplets into one OPTIONAL { } sentence.

SELECT ?child ?childLabel ?genderLabel ?birth_date ?date_of_death
WHERE
{
  ?child wdt:P22 wd:Q76.# ?child  has father   Obama
  OPTIONAL{ ?child wdt:P21 ?gender.            # In general you should NOT include all optional triplets into one sentence   
            ?child wdt:P569 ?birth_date. 
            ?child wdt:P570 ?date_of_death. 
          }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?child

Try it!

The result is that all variables are blank when only one of them is missing in the data.

Cases where this might be useful is when the variables are tightly linked.


FILTER

FILTER(condition) is a clause you can insert into your SPARQL query to, well, filter the results. Inside the parentheses, you can put any expression of boolean type, and only those results where the expression returns true are used.

FILTER on values

[edit | edit source]

For example, to get a list of all humans born in 2015, we first get all humans with their date of birth –

SELECT ?person ?personLabel ?dob
WHERE
{
  ?person wdt:P31 wd:Q5;
          wdt:P569 ?dob.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
}

– and then filter that to only return the results where the year of the date of birth is 2015. There are two ways to do that: extract the year of the date with the YEAR function, and test that it’s 2015 –

FILTER(YEAR(?dob) = 2015).

– or check that the date is between Jan. 1st (inclusive), 2015 and Jan. 1st, 2016 (exclusive):

FILTER("2015-01-01"^^xsd:dateTime <= ?dob && ?dob < "2016-01-01"^^xsd:dateTime).

I’d say that the first one is more straightforward, but it turns out the second one is much faster, so let’s use that:

SELECT ?person ?personLabel ?dob
WHERE
{
  ?person wdt:P31 wd:Q5;
          wdt:P569 ?dob.
  FILTER("2015-01-01"^^xsd:dateTime <= ?dob && ?dob < "2016-01-01"^^xsd:dateTime).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
}

Try it!

FILTER on values in Labels

[edit | edit source]

Another possible use of FILTER is related to labels. The label service is very useful if you just want to display the label of a variable. But if you want to do stuff with the label – for example: check if it starts with “Mr. ” – you’ll find that it doesn’t work:

SELECT ?human ?humanLabel
WHERE
{
  ?human wdt:P31 wd:Q15632617.                # fictional human
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  FILTER(STRSTARTS(?humanLabel, "Mr. ")).     # This does not work
}

Try it!

This query finds all instances of Q15632617 and tests if their label starts with "Mr. " (STRSTARTS is short for “string starts [with]”; there’s also STRENDS and CONTAINS). The reason why this doesn’t work is that the label service adds its variables very late during query evaluation; at the point where we try to filter on ?humanLabel, the label service hasn’t created that variable yet.

Fortunately, the label service isn’t the only way to get an item’s label. Labels are also stored as regular triples, using the predicate rdfs:label. Of course, this means all labels, not just English ones; if we only want English labels, we’ll have to filter on the language of the label:

FILTER(LANG(?label) = "en").

The LANG function returns the language of a monolingual string, and here we only select those labels that are in English. The full query is:

SELECT ?human ?humanLabel
WHERE
{
  ?human wdt:P31 wd:Q15632617;  # fictional human
         rdfs:label ?humanLabel.
  FILTER(LANG(?humanLabel) = "en").
  FILTER(STRSTARTS(?humanLabel, "Mr. ")).
}

Try it!

We get the label with the ?human rdfs:label ?label triple, restrict it to English labels, and then check if it starts with “Mr. ”.

FILTER NOT EXISTS

[edit | edit source]

Let's add images to these fictional humans

SELECT ?human ?humanLabel ?image
WHERE
{
  ?human wdt:P31 wd:Q15632617;  # fictional human
         rdfs:label ?humanLabel.
  FILTER(LANG(?humanLabel) = "en").
  FILTER(STRSTARTS(?humanLabel, "Mr. ")).
  OPTIONAL{ ?human wdt:P18 ?image. }
}

Try it!

The code OPTIONAL{ ?human wdt:P18 ?image. } shows all the fictional humans that start with “Mr. ” and shows an image if available.
To select only fictional humans starting with “Mr. ” with an image the code ?human wdt:P18 ?image. can be used.
To select those fictional humans without an image the code FILTER NOT EXISTS{ ?human wdt:P18 ?image. } should be used

SELECT ?human ?humanLabel ?image
WHERE
{
  ?human wdt:P31 wd:Q15632617;  # fictional human
         rdfs:label ?humanLabel.
  FILTER(LANG(?humanLabel) = "en").
  FILTER(STRSTARTS(?humanLabel, "Mr. ")).
  FILTER NOT EXISTS{ ?human wdt:P18 ?image. }  # without images
}

Try it!

MINUS

[edit | edit source]

A similar way of negation provided in SPARQL is MINUS which evaluates both its arguments, then calculates solutions in the left-hand side that are not compatible with the solutions on the right-hand side.

SELECT ?human ?humanLabel ?image
WHERE
{
  ?human wdt:P31 wd:Q15632617;  # fictional human
         rdfs:label ?humanLabel.
  FILTER(LANG(?humanLabel) = "en").
  FILTER(STRSTARTS(?humanLabel, "Mr. ")).
  MINUS{ ?human wdt:P18 ?image. }  # without images
}

Try it!

References

[edit | edit source]


UNION

Suppose the set A: 28 capitals in European Union

SELECT ?city ?cityLabel
WHERE {
  wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
  ?country wdt:P36 ?city.      # capital
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!

Suppose the set B: 3 cities with millions of inhabitants in European Union.

SELECT ?city ?cityLabel
WHERE {
  wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
  ?city wdt:P17 ?country.      # city in a (European) country
  ?city wdt:P31 wd:Q1637706.   # city with millions of inhabitants
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!

The intersection of A and B, big capitals of the European Union, can simply be achieved by combining all the triples. This results in 1 city.

SELECT ?city ?cityLabel
WHERE {
  wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
  ?country wdt:P36 ?city.      # capital
  ?city wdt:P17 ?country.      # city in a (European) country
  ?city wdt:P31 wd:Q1637706.   # city with millions of inhabitants
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

All capitals of the European union, excluding big cities can be achieved by filtering using FILTER NOT EXISTS { }. This results in the other 27 capitals.

SELECT ?city ?cityLabel
WHERE {
  wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
  ?country wdt:P36 ?city.      # capital
  ?city wdt:P17 ?country.      # city in a (European) country
  FILTER NOT EXISTS{ ?city wdt:P31 wd:Q1637706. }  # NOT a city with millions of inhabitants
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Finally a UNION of capitals and big cities result in 30 cities, one of which was deduplicated by DISTINCT.

SELECT DISTINCT ?city ?cityLabel
WHERE {
  wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
  { ?country wdt:P36 ?city. }  # capital 
  UNION
  { ?city wdt:P17 ?country.    # city in a (European) country
    ?city wdt:P31 wd:Q1637706. # a city with millions of inhabitants
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Mind that the 2 parts of the union should both be within brackets { ... } UNION { ... }.

Two simpler examples of UNION are

The last code can be simplified by using property path ?child wdt:P22|wdt:P25 ?parent.

An overview of all kind of joins:

Venn diagram Mathematical SPARQL
And A.
B.
A.
FILTER NOT EXISTS{ B. }
A.
OPTIONAL{ B. }
Or { A. } UNION { B. }


SERVICE - Label

You can fetch the label, alias, or description of entities you query, with language fallback, using the specialized service with the URI <http://wikiba.se/ontology#label>. The service is very helpful when you want to retrieve labels, as it reduces the complexity of SPARQL queries that you would otherwise need to achieve the same effect.

The service can be used in one of the two modes: manual and automatic.

Automatic Label SERVICE

[edit | edit source]

In automatic mode, you only need to specify the service template, e.g.:

 PREFIX wikibase: <http://wikiba.se/ontology#>
 SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }

and WDQS will automatically generate labels as follows:

  • If an unbound variable in SELECT is named ?NAMELabel, then WDQS produces the label (rdfs:label) for the entity in variable ?NAME.
  • If an unbound variable in SELECT is named ?NAMEAltLabel, then WDQS produces the alias (skos:altLabel) for the entity in variable ?NAME.
  • If an unbound variable in SELECT is named ?NAMEDescription, then WDQS produces the description (schema:description) for the entity in variable ?NAME.

In each case, the variable in ?NAME should be bound, otherwise the service fails.

Example, showing the list of EU country names and capitals in french. For demonstation also Description and AltLabel are shown

SELECT ?country ?countryLabel ?capitalLabel ?capitalDescription ?capitalAltLabel
   WHERE {
     wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
     OPTIONAL{ ?country wdt:P36 ?capital. }
     SERVICE wikibase:label { bd:serviceParam wikibase:language "fr". }
}

Try it!

In this example WDQS automatically creates the labels ?countryLabel, ?capitalLabel, ?capitalDescription and ?capitalAltLabel.

Manual Label SERVICE

[edit | edit source]

In the manual mode, you explicitly bind the label variables within the service call, but WDQS will still provide language resolution and fallback.

Manual Label service is mandatory for using labels in some SPARQL functions like

Example:

SELECT *
WHERE {
     SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,de,en".
          wd:Q123 rdfs:label         ?q123Label.
          wd:Q123 skos:altLabel      ?q123Alt.
          wd:Q123 schema:description ?q123Desc.
      }
}

Try it!

This will consider labels and descriptions in French, German and English, and if none are available, will use the Q-id as the label.

Manual labels can also be used to list labels in more languages, for instance European countries in English, German and French

SELECT ?country ?country_EN ?country_DE ?country_FR
   WHERE {
     wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en".
            ?country rdfs:label ?country_EN.
     }
     SERVICE wikibase:label { bd:serviceParam wikibase:language "de".
            ?country rdfs:label ?country_DE.
     } hint:Prior hint:runLast false.
     SERVICE wikibase:label { bd:serviceParam wikibase:language "fr".
            ?country rdfs:label ?country_FR.
     } hint:Prior hint:runLast false.
}

Try it!

PS: hint:Prior hint:runLast false. is added to prevent error: "there can be only one "run last" join in any group". [1]

It is also possible to write the above query by using rdfs:label directly without the wikibase:label SERVICE:

SELECT ?country ?country_EN ?country_DE ?country_FR
   WHERE {
     wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
     OPTIONAL {?country rdfs:label ?country_EN FILTER (LANG(?country_EN) = "en")}.
     OPTIONAL {?country rdfs:label ?country_DE FILTER (LANG(?country_DE) = "de")}.
     OPTIONAL {?country rdfs:label ?country_FR FILTER (LANG(?country_FR) = "fr")}.
}

Try it!

Languages

[edit | edit source]

You can specify a list of languages to be used as fallback in case a label does not exist in a language You specify your preferred language(s) for the label with one or more of bd:serviceParam wikibase:language "language-code" triples. Each string can contain one or more language codes, separated by commas. WDQS considers languages in the order in which you specify them. If no label is available in any of the specified languages, the Q-id of the entity (without any prefix) is its label.

 SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,de,en" }

The Wikidata Query Service website auto-magically replaces [AUTO_LANGUAGE] with the language code of current user's interface. For example, if the user's UI is in French, the SPARQL's code bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de,en" will be converted to bd:serviceParam wikibase:language "fr,de,en" before being sent to the query service.

 SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de,en" }

References

[edit | edit source]

mw:Wikidata query service/User Manual


SERVICE - around and box

The service allows to search for items with coordinates located within a certain radius of the center or within a certain bounding box.

Most of the examples below use a map view by using #defaultView:Map at the top of the query. You may switch to table display to see the underlying data.

The property most commonly used for locations is coordinate location (P625).

Search around point

[edit | edit source]

Example Airports within 100km from Berlin:

# Airports within 100km from Berlin
#defaultView:Map
SELECT ?place ?placeLabel ?location ?dist 
WHERE {
  # Berlin coordinates
  wd:Q64 wdt:P625 ?berlinLoc . 
  SERVICE wikibase:around { 
      ?place wdt:P625 ?location . 
      bd:serviceParam wikibase:center ?berlinLoc . 
      bd:serviceParam wikibase:radius "100" . 
      bd:serviceParam wikibase:distance ?dist.
  } 
  FILTER EXISTS {
    # Is an airport
    ?place wdt:P31/wdt:P279* wd:Q1248784 .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
} 
ORDER BY ASC(?dist)

Try it!

The first line of the around service call must have format ?item predicate ?location, where the result of the search will bind ?item to items within the specified location and ?location to their coordinates. The parameters supported are:

Predicate Meaning
wikibase:center The point around which search is performed. Must be bound for search to work.
wikibase:radius Distance from the center. Currently the distance is always in kilometers, other units are not supported yet.
wikibase:globe The globe which is being searched. Optional, default it's Earth (wd:Q2).
wikibase:distance The variable receiving distance information

Search within box

[edit | edit source]

Example of box search Schools between San Jose, CA and Sacramento, CA:

# Schools between San Jose, CA and Sacramento, CA
#defaultView:Map
SELECT ?place ?placeLabel ?location 
WHERE {
  wd:Q16553 wdt:P625 ?SJloc.
  wd:Q18013 wdt:P625 ?SCloc.
  SERVICE wikibase:box {
      ?place wdt:P625 ?location.
      bd:serviceParam wikibase:cornerSouthWest ?SJloc.
      bd:serviceParam wikibase:cornerNorthEast ?SCloc.
  }
  ?place wdt:P31/wdt:P279* wd:Q3914.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
}

Try it!

Coordinates may be specified directly:

# Schools between San Jose, CA and Sacramento, CA
#same as previous
#defaultView:Map
SELECT ?place ?placeLabel ?location 
WHERE {
  SERVICE wikibase:box {
    ?place wdt:P625 ?location.
    bd:serviceParam wikibase:cornerWest "Point(-121.872777777 37.304166666)"^^geo:wktLiteral.
    bd:serviceParam wikibase:cornerEast "Point(-121.486111111 38.575277777)"^^geo:wktLiteral.
  }
  ?place wdt:P31/wdt:P279* wd:Q3914.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
}

Try it!

The first line of the box service call must have format ?item predicate ?location, where the result of the search will bind ?item to items within the specified location and ?location to their coordinates. The parameters supported are:

Predicate Meaning
wikibase:cornerSouthWest The south-west corner of the box.
wikibase:cornerNorthEast The north-east corner of the box.
wikibase:cornerWest The western corner of the box.
wikibase:cornerEast The eastern corner of the box.
wikibase:globe The globe which is being searched. Optional, default it's Earth (wd:Q2).

wikibase:cornerSouthWest and wikibase:cornerNorthEast should be used together, as well as wikibase:cornerWest and wikibase:cornerEast, and can not be mixed. If wikibase:cornerWest and wikibase:cornerEast predicates are used, then the points are assumed to be the coordinates of the diagonal of the box, and the corners are derived accordingly.

Distance function

[edit | edit source]

The function geof:distance returns distance between two points, in kilometers. Example usage:

# Airports within 100km from Berlin
SELECT ?place ?placeLabel ?location ?dist 
WHERE {
  # Berlin coordinates
  wd:Q64 wdt:P625 ?berlinLoc. 
  SERVICE wikibase:around { 
      ?place wdt:P625 ?location. 
      bd:serviceParam wikibase:center ?berlinLoc. 
      bd:serviceParam wikibase:radius "100". 
  } 
  # Is an airport
  ?place wdt:P31/wdt:P279* wd:Q1248784.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
  BIND(geof:distance(?berlinLoc, ?location) as ?dist) 
} 
ORDER BY ?dist

Try it!

# Places around 0°,0° 
SELECT ?place ?placeLabel ?location ?dist
WHERE {
  SERVICE wikibase:around { 
      ?place wdt:P625 ?location. 
      bd:serviceParam wikibase:center "Point(0 0)"^^geo:wktLiteral.
      bd:serviceParam wikibase:radius "250". 
  } 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  BIND(geof:distance("Point(0 0)"^^geo:wktLiteral, ?location) as ?dist) 
} 
ORDER BY ?dist

Try it!

References

[edit | edit source]


SERVICE - mwapi

Mediawiki API Service allows to call out to Mediawiki API from SPARQL, and receive the results from inside the SPARQL query. The query is initiated by SERVICE with URL wikibase:mwapi. Currently supported Mediawiki endpoints are: *.wikipedia.org, commons.wikimedia.org, www.mediawiki.org, www.wikidata.org, test.wikidata.org.

Currently the following services are supported:

Service Documentation Inputs Outputs Description
Generator see here generator, prop, pprop title, item, pageid, lastrevid, timestamp Call any generator API. Use "generator" parameter to specify, and specific generator parameters to further amend the search (see the example below).
Categories see here titles,cllimit category, title Get a list of categories on the page.
Search see here srsearch,srwhat,srlimit title Full-text search in wiki.
EntitySearch see here search,language,type,limit item,label Wikibase entity search, by title.

Required parameters are in bold. Please refer to the service documentation (linked in Documentation column) for the meaning of input parameters.

Please see full description on Mediawiki API Service documentation page.

Example finding members of wikipedia category

[edit | edit source]
SELECT * WHERE {
  wd:Q6501349 wdt:P910 ?category . # Parking lot - Main category
  ?link schema:about ?category; schema:isPartOf <https://en.wikipedia.org/>; schema:name ?title .
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:api "Generator" .
     bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
     bd:serviceParam mwapi:gcmtitle ?title .
     bd:serviceParam mwapi:generator "categorymembers" .
     bd:serviceParam mwapi:gcmprop "ids|title|type" .
     bd:serviceParam mwapi:gcmlimit "max" .
     bd:serviceParam wikibase:limit 50 .
    # out
    ?subcat wikibase:apiOutput mwapi:title  .
    ?ns wikibase:apiOutput "@ns" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
}

Try it!

Depicts statements with Dutch labels, of files in one Commons category

[edit | edit source]
SELECT ?file ?title ?depicts ?depicts_label
WITH
{ SELECT ?file ?title
  WHERE
  { SERVICE wikibase:mwapi
    {
      bd:serviceParam wikibase:api "Generator" .
      bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
      bd:serviceParam mwapi:gcmtitle "Category:Historia Naturalis van Rudolf II" .
      bd:serviceParam mwapi:generator "categorymembers" .
      bd:serviceParam mwapi:gcmtype "file" .
      bd:serviceParam mwapi:gcmlimit "max" .
      ?title wikibase:apiOutput mwapi:title .
      ?pageid wikibase:apiOutput "@pageid" .
    }
    BIND (URI(CONCAT('https://commons.wikimedia.org/entity/M', ?pageid)) AS ?file)
  }
} AS %get_files
WHERE
{  INCLUDE %get_files
  ?file wdt:P180 ?depicts .
  service <https://query.wikidata.org/sparql> {
    OPTIONAL {?depicts rdfs:label ?depicts_label FILTER (lang(?depicts_label) = 'nl') } 
    }
}

Try it!

References

[edit | edit source]


Modifiers

There are 4 optional modifiers to the SELECT ... WHERE { .. }. See the chapter on SELECT for an introduction.

The five modifiers are GROUP BY ..., HAVING ..., ORDER BY ..., LIMIT ... and OFFSET ....

GROUP BY

[edit | edit source]

Let us consider this list of children of Bach, listing also their mother

SELECT ?mother ?motherLabel ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child has father Bach
  ?child wdt:P25 ?mother.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Let us group this list by the mothers.

SELECT ?mother ?motherLabel (COUNT(?child) AS ?children) 
        (GROUP_CONCAT(DISTINCT ?childLabel; SEPARATOR=", ") AS ?names)
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child has father Bach
  ?child wdt:P25 ?mother.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                          ?mother rdfs:label ?motherLabel.
                          ?child  rdfs:label ?childLabel.
                         }
}
GROUP BY ?mother ?motherLabel

Try it!

We GROUP BY both variables ?mother and ?motherLabel because if we would leave out the label we would end up with an error "Bad Aggregate".

First we added (COUNT(?child) AS ?children) to count the number of children. COUNT is one of the Aggregate functions MIN, MAX, SUM, AVG, COUNT or SAMPLE.
Notice first that it counts the variable ?child. Notice as well that the syntax is (COUNT(?var1) AS ?var2).

As second information we added a combined list of children per mother by using (GROUP_CONCAT(DISTINCT ?var1; SEPARATOR=", ") AS ?var2).

When using GROUP_CONCAT and a label, all labels should be defined explicitly in the SERVICE.

HAVING

[edit | edit source]

HAVING is always used in combination with GROUP BY

SELECT ?mother ?motherLabel (COUNT(?child) AS ?children) 
        (GROUP_CONCAT(DISTINCT ?childLabel; SEPARATOR=", ") AS ?names)
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child has father Bach
  ?child wdt:P25 ?mother.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                          ?mother rdfs:label ?motherLabel.
                          ?child  rdfs:label ?childLabel.
                         }
}
GROUP BY ?mother ?motherLabel 
HAVING (COUNT(?child)>7)

Try it!

HAVING will filter out groups that does not meet the condition specified. In this case only one mother is shown, with 13 children.

As COUNT(?child) is bound to the variable ?children the HAVING clause can also be written as HAVING (?children>7).

The HAVING clause can be useful for finding duplicates for instance like HAVING (COUNT(?var)>1).

ORDER BY

[edit | edit source]

ORDER BY something sorts the results by something. something can be any expression or variables. This expression can also be wrapped in either ASC() or DESC() to specify the sorting order (ascending or descending). (If you don’t specify either, the default is ascending sort, so ASC(something) is equivalent to just something.)

SELECT ?mother ?motherLabel ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child has father Bach
  ?child wdt:P25 ?mother.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?motherLabel) ?childLabel

Try it!

LIMIT

[edit | edit source]

LIMIT count cuts off the result list at count results, where count is any natural number. For example, LIMIT 10 limits the query to ten results. LIMIT 1 only returns a single result.

This could be useful to get the top 10 results, or only 10 random results to see how the data looks like.

OFFSET

[edit | edit source]

OFFSET count can be used to skip the first number of results. OFFSET 100 LIMIT 10 returns records 101-110.

References

[edit | edit source]


Aggregate functions

Aggregate functions are used in combination with modifier GROUP BY.

Available aggregate functions are:

  • COUNT: the number of elements. You can also write COUNT(*) to simply count all results.
  • SUM, AVG: the sum or average of all elements, respectively. If the elements aren't numbers, you'll get weird results.
  • MIN, MAX: the minimum or maximum value of all elements, respectively. This works for all value types; numbers are sorted numerically, strings and other types lexically.
  • SAMPLE: any element. This is occasionally useful if you know there's only one result, or if you don't care which one is returned.
  • GROUP_CONCAT: concatenates all elements.

Aggregate functions with GROUP BY

[edit | edit source]

See modifier GROUP BY for examples.

# The syntax is like this
SELECT ?group1 ?group2
               (COUNT(?var1) AS ?var2)
               (GROUP_CONCAT(DISTINCT ?var3; SEPARATOR=", ") AS ?var4)
WHERE 
{
...
}
GROUP BY ?group1 ?group2

When using aggregate functions of a label, all labels should be defined explicitly in the SERVICE.

Additionally, you can add a DISTINCT modifier for any of these functions to eliminate duplicate results. For example, if there are two results but they both have the same value in ?var, then COUNT(?var) will return 2 but COUNT(DISTINCT ?var) will only return 1. You often have to use DISTINCT when your query can return the same item multiple times – this can happen if, for example, you use ?item wdt:P31/wdt:P279* ?class, and there are multiple paths from ?item to ?class: you will get a new result for each of those paths, even though all the values in the result are identical. (If you're not grouping, you can also eliminate those duplicate results by starting the query with SELECT DISTINCT instead of just SELECT.)

Aggregate functions with HAVING

[edit | edit source]

Aggregate functions can also be used with the HAVING modifier.

See for instance this query to check for duplicate IATA codes for airports:

SELECT ?object (COUNT(?subject) AS ?count)
               (MIN(?subject) AS ?subject1) (MAX(?subject) AS ?subject2)
               (GROUP_CONCAT(DISTINCT ?subjectLabel; SEPARATOR=", ") AS ?subjectLabels)
WHERE
{
  ?subject wdt:P238 ?object.         # IATA airport code
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                         ?subject rdfs:label ?subjectLabel.
                         }
}
GROUP BY ?object
HAVING(COUNT(?subject) > 1)
ORDER BY ?object

Try it!

References

[edit | edit source]


Bad aggregate

The error Bad aggregate might appear when coding mistakes upon using GROUP BY. See also the chapter modifiers for a description of GROUP BY.

The error "Bad aggregate" does not specify what exactly is the mistake, hence this explanation to help you find out the cause.

This is a correct example. This list the children of Bach, grouped by their mother:

SELECT ?mother ?motherLabel (COUNT(?child) AS ?children) 
        (GROUP_CONCAT(DISTINCT ?childLabel; SEPARATOR=", ") AS ?names)
WHERE
{
  ?child wdt:P22 wd:Q1339.# ?child has father Bach
  ?child wdt:P25 ?mother.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                          ?mother rdfs:label ?motherLabel.
                          ?child  rdfs:label ?childLabel.
                         }
}
GROUP BY ?mother ?motherLabel

Try it!

The first condition is that the grouping variables should match.

  • In the SELECT the grouping is by both variables ?mother and ?motherLabel.
  • In the GROUP BY the same variables ?mother ?motherLabel should be listed.

Try for yourself what happens when you use GROUP BY ?mother.
At the bottom of the long list of the errors you might find the text "Non-aggregate variable in select expression: motherLabel".
If the GROUP BY is left out completely the error message is similar "Non-aggregate variable in select expression: mother".


The second condition is that the rest of the variables in the SELECT should each result into one value.

  • You need expressions that will result into one value, for instance COUNT or one of the other Aggregate functions MIN, MAX, SUM or AVG.
  • You can also use GROUP_CONCAT to list all values into one variable, as all values of ?childLabel into ?names.
  • All variables should be converted with expressions as described above, even if a variable has one value only.

Try for yourself what happens when you change in the rest of the SELECT code.
If for instance you would use ?childLabel you might find at the bottom of the long list of the errors the text "Non-aggregate variable in select expression: childLabel".


The error message is always "Non-aggregate variable in select expression: xxxxx", whether a variable is missing from the GROUP BY or a variable not aggregated at the SELECT clause.


Property paths

Property paths

[edit | edit source]

Statements in a triplestore have a particular Property in the triples. In SPARQL queries you can also write down property paths in the triples.

Property paths are a shorthand to write down a path of properties between two items. The simplest path is just a single property, which forms an ordinary triple:

?item wdt:P31 ?class.

You can add path elements with a forward slash (/).

?item wdt:P31/wdt:P279/wdt:P279 ?class.

This is equivalent to either of the following:

?item wdt:P31 ?temp1.
?temp1 wdt:P279 ?temp2.
?temp2 wdt:P279 ?class.
?item wdt:P31 [ wdt:P279 [ wdt:P279 ?class ] ].

Exercise: (re)write the “grandchildren of Bach” query to use this syntax.

An asterisk (*) after a path element means “zero or more of this element”.

?item wdt:P31/wdt:P279* ?class.
# means:
?item wdt:P31 ?class
# or
?item wdt:P31/wdt:P279 ?class
# or
?item wdt:P31/wdt:P279/wdt:P279 ?class
# or
?item wdt:P31/wdt:P279/wdt:P279/wdt:P279 ?class
# or ...

If there are no other elements in the path, ?a something* ?b means that ?b might also just be ?a directly, with no path elements between them at all.

A plus (+) is similar to an asterisk, but means “one or more of this element”. The following query finds all descendants of Bach:

SELECT ?descendant ?descendantLabel
WHERE
{
  wd:Q1339 wdt:P40+ ?descendant.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

If we used an asterisk instead of a plus here, the query results would include Bach himself.

A question mark (?) is similar to an asterisk or a plus, but means “zero or one of this element”.

You can separate path elements with a vertical bar (|) instead of a forward slash; this means “either-or”: the path might use either of those properties. (But not both – an either-or path segment always matches a path of length one.)

You can also group path elements with parentheses (()), and freely combine all these syntax elements (/|*+?). This means that another way to find all descendants of Bach is:

SELECT ?descendant ?descendantLabel
WHERE
{
  ?descendant (wdt:P22|wdt:P25)+ wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Instead of using the “child” property to go from Bach to his descendants, we use the “father” and “mother” properties to go from the descendants to Bach. The path might include two mothers and one father, or four fathers, or father-mother-mother-father, or any other combination. (Though, of course, Bach can’t be the mother of someone, so the last element will always be father.)

Summary of the codes after a path element:

Code Meaning
? (Question mark) zero or one of this element
* (Asterisk) zero or more of this element
+ (Plus) one or more of this element
[edit | edit source]

Instead of the normal Triple "subject, predicate, object" it is also possible to write it as inverse link "object, predicate, subject". This can be done by adding ^ in front of the predicate. For normal triples this is not very useful, but for property paths it avoids using dummy variables.

For example this query finds the siblings of Johan Sebastian Bach, by querying siblings with the same father.

SELECT ?sibling ?siblingLabel
WHERE
{
  # Bach   father/has father sibling
  wd:Q1339 wdt:P22/^wdt:P22 ?sibling. # ^ = Inverse link
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

With dummy variable this can be written as

SELECT ?sibling ?siblingLabel
WHERE
{
  # Bach   father/has father sibling
  wd:Q1339 wdt:P22 ?dummy.
  ?dummy ^wdt:P22 ?sibling. # ^ = Inverse link
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Or without inverse link:

SELECT ?sibling ?siblingLabel
WHERE
{
  # Bach   father/has father sibling
  wd:Q1339 wdt:P22 ?dummy.
  ?sibling wdt:P22 ?dummy.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Code Meaning
^ (Circumflex) Inverse link

Instances and classes

[edit | edit source]

Most Wikidata properties are “has” relations: has child, has father, has occupation. But sometimes (in fact, frequently), you also need to talk about what something is. But there are in fact two kinds of relations there:

  • Gone with the Wind is a film.
  • A film is a work of art.

Gone with the Wind is one particular film. It has a particular director (Victor Fleming), a specific duration (238 minutes), a list of cast members (Clark Gable, Vivien Leigh, …), and so on.

Film is a general concept. Films can have directors, durations, and cast members, but the concept “film” as such does not have any particular director, duration, or cast members. And although a film is a work of art, and a work of art usually has a creator, the concept of “film” itself does not have a creator – only particular instances of this concept do.

This difference is why there are two properties for “is” in Wikidata: P31 and P279. Gone with the Wind is a particular instance of the class “film”; the class “film” is a subclass (more specific class; specialization) of the more general class “work of art”.

So what does this mean for us when we’re writing SPARQL queries? When we want to search for “all works of art”, it’s not enough search for all items that are directly instances of “work of art”:

SELECT ?work ?workLabel
WHERE
{
  ?work wdt:P31 wd:Q838948. # instance of work of art
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

As I’m writing this, that query only returns 2815 results – obviously, there are more works of art than that! The problem is that this misses items like Gone with the Wind, which is only an instance of “film”, not of “work of art”. “film” is a subclass of “work of art”, but we need to tell SPARQL to take that into account when searching.

One possible solution to this is the [] syntax we talked about: Gone with the Wind is an instance of some subclass of “work of art”. (For exercise, try writing that query!) But that still has problems:

  1. We’re no longer including items that are directly instances of work of art.
  2. We’re still missing items that are instances of some subclass of some other subclass of “work of art” – for example, Snow White and the Seven Dwarfs is an animated film, which is a film, which is a work of art. In this case, we need to follow two “subclass of” statements – but it might also be three, four, five, any number really.

The solution: ?item wdt:P31/wdt:P279* ?class. This means that there’s one “instance of” and then any number of “subclass of” statements between the item and the class.

SELECT ?work ?workLabel
WHERE
{
  ?work wdt:P31/wdt:P279* wd:Q838948. # instance of any subclass of work of art
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1000

Try it!

I don’t recommend running that query for all works of art. WDQS can handle it (just barely), but your browser might crash when trying to display the results because there’s so many of them. For that reason a LIMIT 1000 is inserted.

Now you know how to search for all works of art, or all buildings, or all human settlements: the magic incantation wdt:P31/wdt:P279*, along with the appropriate class. This uses some more SPARQL features that I haven’t explained yet, but quite honestly, this is almost the only relevant use of those features, so you don’t need to understand how it works in order to use WDQS effectively.

References

[edit | edit source]


Variables

A query variable is marked by the use of either "?" or "$"; the "?" or "$" is not part of the variable name. In a query, $abc and ?abc identify the same variable.

In this book only variables marked with "?" are used.

Allowable characters are [a-z], [A-Z], [0-9], _, · and also diacrictics like áéíóú etcetera.

Example

SELECT $child_of_Bach $child_of_BachLabel
WHERE
{
  $child_of_Bach wdt:P22 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!


Expressions and Functions

Expressions

[edit | edit source]

The BIND( expression AS ?variable ). clause can be used to assign the result of an expression to a variable (usually a new variable, but you can also overwrite existing ones).

# Persons died by capital punishment
SELECT ?person ?personLabel ?personDescription ?age
WHERE
{
  ?person wdt:P31 wd:Q5;
          wdt:P569 ?born;
          wdt:P570 ?died;
          wdt:P1196 wd:Q8454.
  BIND(?died - ?born AS ?ageInDays).
  BIND(?ageInDays/365.2425 AS ?ageInYears).
  BIND(FLOOR(?ageInYears) AS ?age).
  # or, as one expression:
  #BIND(FLOOR((?died - ?born)/365.2425) AS ?age).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

BIND can also be used to simply bind constant values to variables in order to increase readability. For example, a query that finds all female priests:

SELECT ?woman ?womanLabel
WHERE
{
  ?woman ?instanceOf ?human;
         ?sexOrGender ?female;
         ?occupation ?priest.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }

  # linking to wikidata Q and P codes
  BIND(wdt:P31 AS ?instanceOf).
  BIND(wd:Q5 AS ?human).
  BIND(wdt:P21 AS ?sexOrGender).
  BIND(wd:Q6581072 AS ?female).
  BIND(wdt:P106 AS ?occupation).
  BIND(wd:Q42603 AS ?priest).
}

Try it!

The above gives the same result as

SELECT ?woman ?womanLabel
WHERE
{
  ?woman wdt:P31 wd:Q5;        # instance of human
         wdt:P21 wd:Q6581072;  # sex or gender female
         wdt:P106 wd:Q42603.   # occupation priest
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

The IF( condition, thenExpression, elseExpression ) expressions are used to assign different values based on a condition.

For instance to calculate the age, precisely to the day. In the expression BIND( .... - (IF( condition,1,0 ) AS ?age) the condition determines to subtract 1 or 0 (nothing) from the ?age based on the month and the day of the birth and the death.

# Query to find all musicians who have already died 
# calculate their age (full years) at death
# count how many of them died at each age
#defaultView:LineChart
SELECT ?age  (COUNT (DISTINCT ?a) AS ?count) 
WHERE {
        ?a wdt:P31 wd:Q5.                   #instance of human
        ?a wdt:P106/wdt:P279 wd:Q639669.    #occupation a subclass of musician
        ?a p:P569/psv:P569 ?birth_date_node. 
        ?a p:P570/psv:P570 ?death_date_node.
        ?birth_date_node wikibase:timeValue ?birth_date.
        ?death_date_node wikibase:timeValue ?death_date.
        BIND( YEAR(?death_date) - YEAR(?birth_date) - 
              IF(MONTH(?death_date)<MONTH(?birth_date) || 
                 (MONTH(?death_date)=MONTH(?birth_date) && DAY(?death_date)<DAY(?birth_date)),1,0) AS ?age )
        # calculate the age, precisely to the day (times and timezones ignored)
        FILTER(?age > 10 && ?age < 100).    #ignore outlyers, several of which are probably errors
}
GROUP BY ?age
ORDER BY ?age

Try it!

Here an example to divide population in groups. The ?layer in the #defaultView:Map shows it in different colors.

#Big cities, grouped into map layers by population
#defaultView:Map
SELECT DISTINCT ?city ?cityLabel (SAMPLE(?location) AS ?location) (MAX(?population) AS ?population) (SAMPLE(?layer) AS ?layer)
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515;
        wdt:P625 ?location;
        wdt:P1082 ?population.
  FILTER(?population >= 500000).
  BIND(
    IF(?population < 1000000, "<1M",
    IF(?population < 2000000, "1M-2M",
    IF(?population < 5000000, "2M-5M",
    IF(?population < 10000000, "5M-10M",
    IF(?population < 20000000, "10M-20M",
    ">20M")))))
    AS ?layer).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?city ?cityLabel
ORDER BY ?population

Try it!

Operators
[edit | edit source]

The familiar mathematical operators are available: +, -, *, / to add, subtract, multiply or divide numbers.

The operators <, >, =, <=, >= are available to compare values. Comparison is also defined for other types; for example, "abc" < "abd" is true (lexical comparison), as is "2016-01-01"^^xsd:dateTime > "2015-12-31"^^xsd:dateTime and wd:Q4653 != wd:Q283111. The inequality test ≠ is written !=. The ! can also be used as a prefix to functions that results a boolean like !BOUND and !REGEX.

Boolean conditions can be combined with && (logical and: a && b is true if both a and b are true) and || (logical or: a || b is true if either (or both) of a and b is true).

Examples IF( ?a != ?b, ... , ... ) and IF( ?a = ?b && ?c = ?d, ... , ... ).

VALUES
[edit | edit source]

The VALUES ?var { val1 ... } clause generates a variable with one or more values. The values can be of any type, like numeric, strings, or even items, like VALUES ?number { 1 2 3 }, VALUES ?abc { "a" "b" "c" } or VALUES ?city { wd:Q84 wd:Q90 }.

SELECT ?city ?cityLabel WHERE {
     VALUES ?city { wd:Q84 wd:Q90 }. # London, Paris 
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }  
}

Try it!

Values can have more dimensions too. The dimensions are grouped by ( ) like VALUES ( ?varA ?varB ... ) { ( valA1 valB1 ... ) ... }.

SELECT ?country ?capital WHERE {
     VALUES ( ?country ?capital ) { 
            ( "United Kingdom" "London" )
            ( "France"         "Paris"  )
            ( "Etc"            UNDEF    ) # Use UNDEF for an undefined value
            }
}

Try it!

Functions

[edit | edit source]

General functions

[edit | edit source]
DATATYPE
[edit | edit source]
See also: d:Special:ListDatatypes

Each value in SPARQL has a type, which tells you what kind of value it is and what you can do with it. The most important types are:

  • item, like wd:Q42 for Douglas Adams (Q42).
  • boolean, with the two possible values true and false. Boolean values aren’t stored in statements, but many expressions return a boolean value, e. g. 2 < 3 (true) or "a" = "b" (false).
  • string, a piece of text. String literals are written in double quotes.
  • monolingual text, a string with a language tag attached. In a literal, you can add the language tag after the string with an @ sign, e. g. "Douglas Adams"@en.
  • numbers, either integers (1) or decimals (1.23).
  • dates. Date literals can be written by adding ^^xsd:dateTime (case sensitive – ^^xsd:datetime won’t work!) to an ISO 8601 date string: "2012-10-29"^^xsd:dateTime.
# Date related properties of Bach
SELECT ?predicate ?pLabel ?object
WHERE
{
  wd:Q1339 ?predicate ?object.         # Bach
  FILTER(DATATYPE(?object) = xsd:dateTime).
  
  BIND( IRI(REPLACE( STR(?predicate),"prop/direct/","entity/" )) AS ?p). 
  # or ?p wikibase:directClaim ?predicate. 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?object

Try it!

Some datatypes are rdf:langString, xsd:string, xsd:integer, xsd:dateTime, xsd:decimal or "<http://www.opengis.net/ont/geosparql#wktLiteral>".

The STR( value ) function converts values to a string. It also resolves prefixes, for instance wd:Q1339 will be converted into http://www.wikidata.org/entity/Q1339. The reversal of this is the IRI( string ), which will convert a string to an IRI.

SELECT ?item ?itemLabel ?string ?stringLabel ?iri ?iriLabel 
WHERE {
    VALUES ?item { wd:Q1339 }.
    BIND( STR(?item) AS ?string ).
  	BIND( IRI(?string) AS ?iri ).
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

the IRI( string ), which will convert a string to an IRI, by applying the prefixes.

In the example below a Predicate like wdt:P569 get converted to a string http://www.wikidata.org/prop/direct/P569. To get a Label for the predicate it first needs to be replaced to get an entity (http://www.wikidata.org/entity/P569), which with IRI gets converted to wd:P569. The label of this can then be displayed as "date of birth (P569)".

# Date related properties of Bach
SELECT ?object ?predicate ?string ?entity ?p ?pLabel 
WHERE
{
  wd:Q1339 ?predicate ?object.                    # Bach
  FILTER(DATATYPE(?object) = xsd:dateTime).
  BIND( STR(?predicate) AS ?string ).
  BIND( REPLACE( ?string,"prop/direct/","entity/" ) AS ?entity ).
  BIND( IRI(?entity) AS ?p ).
  
#  or all on one line: 
#  BIND( IRI(REPLACE( STR(?predicate),"prop/direct/","entity/" )) AS ?p). 
 
#   This can also be written as: ?p wikibase:directClaim ?predicate. 

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?object

Try it!

The LANG( string ) function retrieves the language tag of Labels, Descriptions, Aliases and of Monolingual texts.

#Countries in European Union with native name and language
SELECT ?country ?countryLabel ?nativename ?language
WHERE
{
  wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
  OPTIONAL { ?country wdt:P1705 ?nativename.
              BIND( LANG(?nativename) AS ?language). }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?language)

Try it!

BOUND
[edit | edit source]

The function BOUND( variable ) returns true if variable is bound to a value. It returns false otherwise. The BOUND expression is used often with IF(condition,thenExpression,elseExpression) expressions. For example, suppose you have a query that shows some humans, and instead of just showing their label, you’d like to display their pseudonym (P742) if they have one, and only use the label if a pseudonym doesn’t exist. For this, you select the pseudonym in an OPTIONAL clause (it has to be optional – you don’t want to throw out results that don’t have a pseudonym), and then use BIND(IF(BOUND(… to select either the pseudonym or the label.

SELECT ?writer ?label
WHERE
{
  ?writer wdt:P31 wd:Q5;                                         # French writer 
          wdt:P27 wd:Q142;
          wdt:P106 wd:Q36180;
          wdt:P569 ?dob.
  FILTER("1751-01-01"^^xsd:dateTime <= ?dob && ?dob < "1801-01-01"^^xsd:dateTime) # born in the second half of the 18th century
  ?writer rdfs:label ?writerLabel.                               # get the English label
  FILTER(LANG(?writerLabel) = "en")  
  OPTIONAL { ?writer wdt:P742 ?pseudonym. }                      # get the pseudonym, if it exists
  BIND(IF(BOUND(?pseudonym),?pseudonym,?writerLabel) AS ?label). # bind the pseudonym, or if it doesn’t exist the English label, as ?label
}

Try it!

IN and NOT IN
[edit | edit source]

The function value IN( list of values ) checks if a value is found in a list of values. For example 2 IN( 1, 2, 3 ) is true. A similar function value NOT IN( list of values ) checks if a value is not found in a list of values.

# Railway stations in London or Paris
SELECT ?cityLabel ?station ?stationLabel ?location 
WHERE {
  ?station wdt:P31 wd:Q55488.         # is a railway station
  ?station wdt:P131* ?city.
  ?station wdt:P625  ?location
  FILTER( ?city IN(wd:Q84, wd:Q90) ). # in London or Paris
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr". }
}

Try it!

isBLANK
[edit | edit source]

The isBLANK( variable ) expression checks if a value is an "unknown value".

#Demonstrates "unknown value" handling
SELECT ?human ?humanLabel
WHERE
{
	?human wdt:P21 ?gender
	FILTER isBLANK(?gender) .
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

Try it!

Wikidata value can also have special value "no value". With normal triples this simply results in the absence of a result.

# persons who were stateless (country of citizenship: no value) for some time (start time and end time qualifiers)
SELECT ?person ?personLabel ?start ?end 
WHERE {
  ?person wdt:P31 wd:Q5;
          p:P27 [
            a wdno:P27;			# no value for country of citizenship
            pq:P580 ?start;
            pq:P582 ?end
          ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?end - ?start)

Try it!

The prefix wdno: selects the "no value" node.
The code a is a special word.

Functions on strings

[edit | edit source]
STRLEN
[edit | edit source]

The function STRLEN( string ) returns the length of a string. For instance STRLEN(?name) where ?name is "ABC" will return 3.

SUBSTR
[edit | edit source]

The function SUBSTR( string, beginposition, stringlength ) returns a substring of a string beginning at the position marked by beginposition, and will have a length as indicated. For instance SUBSTR( ?name,2,3 ) where ?name = "ABCDEFGH" will return "BCD". A substring without length like SUBSTR( ?name,2 ) will return "BCDEFGH".

UCASE
[edit | edit source]

The function UCASE( string ) returns the string in Upper case. For instance a string "Abc" will be returned as "ABC".

LCASE
[edit | edit source]

The function LCASE( string ) returns the string in Lower case. For instance a string "Abc" will be returned as "abc".

STRSTARTS
[edit | edit source]

The function STRSTARTS( string, comparestring ) checks if the string starts with the compare string. For instance STRSTARTS( "ABCDEFGH", "ABC" ) returns true, indicating that the string starts with "ABC".

STRENDS
[edit | edit source]

The function STRENDS( string, comparestring ) checks if the string ends with the compare string. For instance STRENDS( "ABCDEFGH", "FGH" ) returns true, indicating that the string ends with "FGH".

CONTAINS
[edit | edit source]

The function CONTAINS( string, comparestring ) checks if the string contains the compare string. For instance CONTAINS( "ABCDEFGH", "DEF" ) returns true, indicating that the string contains "DEF".

STRBEFORE
[edit | edit source]

The function STRBEFORE( string, comparestring ) returns the part of the string before the compare string. For instance STRBEFORE( "ABCDEFGH", "DEF" ) returns "ABC". If the compare string is not found it returns "".

STRAFTER
[edit | edit source]

The function STRAFTER( string, comparestring ) returns the part of the string after the compare string. For instance STRAFTER( "ABCDEFGH", "DEF" ) returns "GH". If the compare string is not found, it returns "".

ENCODE_FOR_URI
[edit | edit source]

The function ENCODE_FOR_URI( string ) converts the special characters in the string, to be able to use it in an web URL. For instance ENCODE_FOR_URI( "ABC DËFGH" ) returns "ABC%20D%C3%8BFGH". The reverse of this function is wikibase:decodeUri( string ).

CONCAT
[edit | edit source]

The function CONCAT( string1, string2 ... ) returns the concatenation of 2 or more strings. For instance CONCAT( "ABCDEFGH", "XYZ" ) returns "ABCDEFGHXYZ".

LANGMATCHES
[edit | edit source]

The function LANGMATCHES( languagetag, languagerange ) checks if the language tag matches the language range. For example this code FILTER LANGMATCHES( LANG(?label), "fr" ). filters labels in the french language. It is similar to FILTER (LANG(?label) = "fr"). with the difference that the LANGMATCHES filter will also output language tags with regions, e.g. "fr-BE" while (FILTER (LANG(?label) = "fr"). will only output the strings with the exact tag "fr". A language range of "*" matches any non-empty language-tag string.

REGEX
[edit | edit source]

The function REGEX( string, pattern, flag ) checks if the string matches the pattern. It is mostly used in a FILTER clause like FILTER REGEX( string, pattern )..

The pattern may contain different special characters. Blazegraph instances such as the Wikidata Query Service interpret the pattern as a Java Pattern,[1] which is a subset of ICU regular expressions. The table below lists the most common special characters:

Special characters Meaning
(a|b) a or b
[abc] Range (a or b or c)
[^abc] Not (a or b or c)
[a-q] Lower case letter from a to q
[A-Q] Upper case letter from A to Q
[0-7] Digit from 0 to 7
. Wildcard: Matches any single character except \n.
   
Special characters Meaning
* 0 or more
+ 1 or more
? 0 or 1
{3} Exactly 3
{3,} 3 or more
{3,5} 3, 4 or 5
(pattern) Matches pattern and saves the match
\1 Retrieves the saved match
(?:pattern) Matches pattern but does not save the match
   
Special characters Meaning
^ Start of string, or start of line in multi-line pattern
\A Start of string
\b \B Word boundary / Not word boundary
\d \D Digits [0-9] / Nondigit characters [^0-9]
\p{ name } Matches any single character in the Unicode general category or named block specified by name
\w \W The characters [A-Za-z0-9_] / None of the characters [A-Za-z0-9_]
\< Start of word
\> End of word
$ End of string, or end of line in multi-line pattern
\Z End of string

The flag is optional. Flag "i" means the match is case-insensitive.

# Names of human settlements ending in "-ow" or "-itz" in Germany
#defaultView:Map
SELECT DISTINCT ?item ?itemLabel ?coord
WHERE 
{
   ?item wdt:P31/wdt:P279* wd:Q486972;   # instance/subclass of human settlement
         wdt:P17 wd:Q183;                # Germany
         rdfs:label ?itemLabel;
         wdt:P625 ?coord;
   FILTER (LANG(?itemLabel) = "de"). 
   FILTER REGEX (?itemLabel, "(ow|itz)$").
}
LIMIT 1000

Try it!

Examples

  1. title might be a rhyme FILTER(REGEX(?title, "^\\w*(\\w{3})(\\W+\\w*\\1)+$", "i") && !REGEX(?title, "^(\\w+)(\\W+\\1)+$", "i")).
  2. title is an alliteration FILTER(REGEX(STR(?title), "^(\\p{L})\\w+(?:\\W+\\1\\w+){2,}$", "i")).

PS: A single \ is used as an escape symbol in strings, so \\ is used to indicate a single \.

REPLACE
[edit | edit source]

The function REPLACE( string, pattern, replacement, flag ) returns the string after replacing all occurrences of pattern in string with replacement. pattern is interpreted the same way as in REGEX. The replacement can contain $n or ${name}, which are replaced by the corresponding numbered or named capture group in the pattern.[1][2] An optional flag affects the regular expression pattern, just as with the flag argument to REGEX(). For example, REPLACE( "ABCDEFGH", "DEF", "_def_" ) returns "ABC_def_GH". REPLACE( "ABCDEFGH", "[AEIOU]", "" ) removes all the vowels from the original string.

Functions on numbers

[edit | edit source]

The function ABS( number ) returns the absolute value of a number. For instance, ABS( -1 ) returns 1.

ROUND
[edit | edit source]

The function ROUND( number ) returns the rounded value of a number. For instance, ROUND( 1.4 ) returns 1, while ROUND( 1.6 ) returns 2.

The function CEIL( number ) returns the largest number (round up). For instance, both CEIL( 1.4 ) and CEIL( 1.6 ) return 2.

FLOOR
[edit | edit source]

The function FLOOR( number ) returns the smallest number (round down). For instance, both FLOOR( 1.4 ) and FLOOR( 1.6 ) return 1.

The function RAND( ) returns a random value between 0 and 1. For instance, RAND( ) returns 0.7156405780739334.

COUNT, MIN, MAX, AVG and SUM
[edit | edit source]

The functions COUNT, MIN, MAX, AVG, and SUM can only be used as Aggregate functions.

# average age of painters by century
SELECT ?century (AVG(?age) AS ?average_age) (ROUND(AVG(?age)) AS ?rounded_average_age)
WHERE 
{
  ?item wdt:P31 wd:Q5.          # is a human
  ?item wdt:P106 wd:Q1028181.   # occupation painter
  ?item wdt:P569 ?born.
  ?item wdt:P570 ?died.
  FILTER( ?died > ?born ).
  BIND( (?died - ?born)/365.2425 AS ?age )
  BIND( FLOOR(YEAR(?born)/100)*100 AS ?century ) 
}
GROUP BY ?century
ORDER BY ?century

Try it!

For instance, BIND( MAX( ?var1, ?var2 ) AS ?max) does NOT work. Instead, use the expression BIND( IF( ?var1>?var2, ?var1, ?var2 ) AS ?max).

Functions on dates and times

[edit | edit source]

The function NOW( ) returns the current date and time.

With constructions like NOW() + "P1D"^^xsd:duration it is possible to add or subtract days from the current date.
NOW() + "P1M"^^xsd:duration will add 1 month.
NOW() + "P1Y"^^xsd:duration will add 1 year.
You can add or subtract any combination of years, months, days and even hours, minutes and seconds using "P1Y1M1DT0H0M0.000S"^^xsd:duration.

YEAR, MONTH and DAY
[edit | edit source]

The function YEAR( datevalue ) returns the year of the date value. The functions MONTH and DAY return the month or day.

HOURS, MINUTES and SECONDS
[edit | edit source]

The function HOURS( datevalue ) returns the hour of the date value. The functions MINUTES and SECONDS returns the minutes or seconds.
Currently Wikidata does not hold date values in hour, minutes or seconds.

TIMEZONE and TZ
[edit | edit source]

The function TIMEZONE( datevalue ) returns the time zone of the date value. Currently in Wikidata all dates have a TIMEZONE of "PT0S" for UTC. Other valid values would be between -PT14H and PT14H indicating the time offset in hours.

The function TZ( datevalue ) returns the time zone as a simple literal of the date value. Currently in Wikidata all dates have a TZ of "Z" for UTC.

Example with dates

# Query to find all musicians who have already died 
# calculate their age (full years) at death
# count how many of them died at each age
#defaultView:LineChart
SELECT ?age  (COUNT (DISTINCT ?a) AS ?count) 
WHERE {
        ?a wdt:P31 wd:Q5.                   #instance of human
        ?a wdt:P106/wdt:P279 wd:Q639669.    #occupation a subclass of musician
        ?a p:P569/psv:P569 ?birth_date_node. 
        ?a p:P570/psv:P570 ?death_date_node.
        ?birth_date_node wikibase:timeValue ?birth_date.
        ?death_date_node wikibase:timeValue ?death_date.
        BIND( YEAR(?death_date) - YEAR(?birth_date) - 
              IF(MONTH(?death_date)<MONTH(?birth_date) || 
                 (MONTH(?death_date)=MONTH(?birth_date) && DAY(?death_date)<DAY(?birth_date)),1,0) AS ?age )
        # calculate the age, precisely to the day (times and timezones ignored)
        FILTER(?age > 10 && ?age < 100).    #ignore outliers, several of which are probably errors
}
GROUP BY ?age
ORDER BY ?age

Try it!

Functions on coordinates

[edit | edit source]
geof:distance
[edit | edit source]

The function geof:distance returns distance between two points, in kilometers.

Example usage:

# distance between 2 cities
SELECT ?city1 ?city1Label ?location1 ?city2 ?city2Label ?location2 ?dist 
WHERE
{
     VALUES ?city1 { wd:Q84 }. # London 
     VALUES ?city2 { wd:Q90 }. # Paris
     ?city1 wdt:P625 ?location1.
     ?city2 wdt:P625 ?location2.
     BIND(geof:distance(?location1, ?location2) as ?dist) 
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }  
}

Try it!

geof:longitude and geof:latitude
[edit | edit source]

The functions geof:longitude and geof:latitude convert a well-known text literal (<http://www.opengis.net/ont/geosparql#wktLiteral>) of a Point geometry to its longitude and latitude, respectively.

Example usage:

# Museums in Barcelona with coordinates
SELECT ?item ?itemLabel ?coord ?lon ?lat
WHERE
{
   ?item wdt:P131 wd:Q1492;   # in the administrative territory of Barcelona
         wdt:P31 wd:Q33506;   # is a museum
         wdt:P625 ?coord.
   BIND(geof:longitude(?coord) AS ?lon)
   BIND(geof:latitude(?coord)  AS ?lat)
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
}

Try it!

wikibase:geoLongitude and wikibase:geoLatitude
[edit | edit source]

It is also possible to retrieve the longitude and latitude of a coordinate node using wikibase:geoLongitude and wikibase:geoLatitude qualifiers. See here for an explanation.

# Museums in Barcelona with coordinates
SELECT ?item ?itemLabel ?coord ?lon ?lat
WHERE
{
   ?item wdt:P131 wd:Q1492;   # in the administrative territory of Barcelona
         wdt:P31 wd:Q33506;   # is a museum
         p:P625 [
            ps:P625 ?coord;
            psv:P625 [
               wikibase:geoLongitude ?lon;
               wikibase:geoLatitude  ?lat; 
               ]
          ].
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
}

Try it!

AUTO_COORDINATES
[edit | edit source]

The code BIND("[AUTO_COORDINATES]" as ?loc) returns the current location, if you allow your browser to use it.

# Drinking establishments near me
SELECT DISTINCT ?pub ?pubLabel ?dist
WHERE
{
  BIND("[AUTO_COORDINATES]" as ?loc)  .
  SERVICE wikibase:around {
      ?pub wdt:P625 ?location .
      bd:serviceParam wikibase:center ?loc. 
      bd:serviceParam wikibase:radius "1" .
  }
  FILTER EXISTS { ?pub wdt:P31/wdt:P279* wd:Q5307737 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
  BIND(geof:distance(?loc, ?location) as ?dist)
} ORDER BY ?dist

Try it!

Aggregate functions

[edit | edit source]

See Aggregate functions for COUNT, MIN, MAX, SUM, AVG, SAMPLE and GROUP_CONCAT

References

[edit | edit source]
  1. a b The Wikidata Query Service uses a Wikimedia fork of Blazegraph that depends on OpenJDK 8. Other instances may run different versions of Java with different levels of Unicode support.
  2. QLever uses the \n syntax instead, which you need to escape as \\n inside a string literal.


Federated query

Federated query

[edit | edit source]

Federated query is the ability to take a query and provide solutions based on information from many different sources.

A building block is the ability to have one query be able to issue a query on another SPARQL endpoint during query execution.

Example

[edit | edit source]

Federated query comparing Wikidata and UK Parliament database, to show differences of more than 10km in locations of UK parliamentary constituencies (source Wikidata:Wikidata:Status updates/2019 06 10)

# compare lat/long of Parliament and Wikidata constituency records
#defaultView:Map{"hide":["?line"]}
PREFIX parliament:<https://id.parliament.uk/schema/>

SELECT DISTINCT ?constituency ?parlcoord ?item ?itemLabel ?wdcoord ?dist ?line WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  SERVICE <https://api.parliament.uk/sparql> 
    { ?constituency parliament:constituencyGroupHasConstituencyArea ?area .
      ?area parliament:latitude ?lat . ?area parliament:longitude ?long . 
      bind(SUBSTR(str(?constituency),26) as ?parlid) . }
  BIND(concat("Point(",str(?long)," ",str(?lat),")") as ?parlcoord) 
  # get constituencies from Parliament with coordinates
  ?item wdt:P6213 ?parlid . ?item wdt:P31 wd:Q27971968 . ?item wdt:P625 ?wdcoord . 
  # now get them from Wikidata with coordinates
  BIND(geof:distance(?parlcoord, ?wdcoord) as ?dist) . filter (?dist >= 10)
  # now find out the distance (in kms)
  ?item p:P625 ?statementnode. ?statementnode psv:P625 ?valuenode.
  ?valuenode wikibase:geoLatitude ?wikilat . ?valuenode wikibase:geoLongitude ?wikilon.
  BIND(CONCAT('LINESTRING (', STR(?wikilon), ' ', STR(?wikilat), ',', STR(?long), ' ', STR(?lat), ')') AS ?str) .
  BIND(STRDT(?str, geo:wktLiteral) AS ?line) 
}

Try it!

References

[edit | edit source]


Subqueries

SPARQL allows one SELECT query to be nested inside another. The inner SELECT query is called a subquery and is evaluated first. The subquery result variable(s) can then be used in the outer SELECT query.

Simplest example:

SELECT ?x ?y WHERE {
  VALUES ?x { 1 2 3 4 }
  {
    SELECT ?y WHERE { VALUES ?y { 5 6 7 8 }  }
  }  # \subQuery
} # \mainQuery

Try it!

The example below calculates the population of each country in the world, expressing the population as a percentage of the world's total population. In order to calculate the world's total population, it uses a subquery.

SELECT ?countryLabel ?population (round(?population/?worldpopulation*1000)/10 AS ?percentage)
WHERE {
  ?country wdt:P31 wd:Q3624078;    # is a sovereign state
           wdt:P1082 ?population.

  { 
    # subquery to determine ?worldpopulation
    SELECT (sum(?population) AS ?worldpopulation)
    WHERE { 
      ?country wdt:P31 wd:Q3624078;    # is a sovereign state
               wdt:P1082 ?population. 
    }
  }

  SERVICE wikibase:label {bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}
ORDER BY desc(?population)

Try it!

The syntax of a query with a subquery is shown below. A subquery is basically the same as a simple query and is enclosed within { brackets }.

SELECT  ... query result variables ...
WHERE 
{
        ... query pattern ...

        { # subquery
          SELECT  ... subquery result variables ...
          WHERE 
          {
                  ... subquery pattern ...
          }
                  ... optional subquery modifiers ...
        } # end of subquery

}
        ... optional query modifiers ...

Subqueries can be used, often with a LIMIT, to avoid a query timeout by fractioning the task. As an example, this query is timing out :

#100 humans with exactly 6 months between their month of birthday and their month of death.
SELECT DISTINCT ?itemLabel ?item WHERE {
  ?item wdt:P31 wd:Q5 ;
        p:P569/psv:P569 [wikibase:timePrecision ?datePrecision1; wikibase:timeValue ?naissance] ;
        p:P570/psv:P570 [wikibase:timePrecision ?datePrecision2; wikibase:timeValue ?mort ].
  filter(?datePrecision1>10)
  filter(?datePrecision2>10)
  
  bind(month(?mort) - month(?naissance) as ?mois)
  bind(day(?mort) - day(?naissance) as ?jour)
  
  filter(abs(?mois) = 6)
  filter(?jour = 0)

  SERVICE wikibase:label {bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}
ORDER BY ?itemLabel
LIMIT 100

Try it!

but the same query putting the limit on the selected items in a subquery and the label service outside it didn't timeout :

#100 humans with exactly 6 months between their month of birthday and their month of death.
SELECT DISTINCT ?itemLabel ?item WHERE {
  {
    SELECT DISTINCT ?item WHERE {
      ?item wdt:P31 wd:Q5 ; 
            p:P569/psv:P569 [wikibase:timePrecision ?datePrecision1; wikibase:timeValue ?naissance] ;
            p:P570/psv:P570 [wikibase:timePrecision ?datePrecision2; wikibase:timeValue ?mort ].

      filter(?datePrecision1>10)
      filter(?datePrecision2>10)

      bind(month(?mort) - month(?naissance) as ?mois)
      bind(day(?mort) - day(?naissance) as ?jour)
      filter(abs(?mois) = 6)
      filter(?jour = 0)
    }
    LIMIT 100
  }

  SERVICE wikibase:label {bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}
ORDER BY ?itemLabel

Try it!


Templates

Within Wikidata Query Service Query Helper the comment #TEMPLATE can build a simple template where the user can choose one or more variables to change a query without needing to know the SPARQL query language.

See below an example to select presidents and their spouses from any country:

#TEMPLATE={"template":"Presidents of ?country and their spouses","variables":{"?country":{"query":" SELECT ?id WHERE { ?id wdt:P31 wd:Q6256 . }"} } }

SELECT ?p ?pLabel ?ppicture ?w ?wLabel ?wpicture WHERE {
  BIND(wd:Q30 AS ?country)        # United States of America
  ?country (p:P6/ps:P6) ?p.       # Head of government
  ?p wdt:P26 ?w.                  # Spouse
  OPTIONAL {
    ?p wdt:P18 ?ppicture.
    ?w wdt:P18 ?wpicture.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Try it!

The "template" contains the texts and one ore more variables.
The "variables" list the variables and optionally a "query" to select possible values, in this case ?id is instance of country.

If no query is needed the syntax is "?var1":{} In that case that might be
  #TEMPLATE={"template":"Presidents of ?country and their spouses","variables":{"?country":{} } }

Mind: BIND(wd:Q30 AS ?country) is used as a default for the variable ?country.

References

[edit | edit source]


WIKIDATA Qualifiers, References and Ranks

Wikidata datamodel

The data on WIKIDATA contains more info than only the triples. For a full description see Wikidata:Glossary.

Let us look at Douglas Adams (Q42) and where he is educated at (P69).

Qualifiers

[edit | edit source]

Let us list the education of Douglas Adams and the qualifiers Start time end End time:

SELECT ?education ?educationLabel ?starttime ?endtime
WHERE
{
     wd:Q42 p:P69 ?statement.
     ?statement ps:P69 ?education.
     ?statement pq:P580 ?starttime.
     ?statement pq:P582 ?endtime.
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?starttime

Try it!

The prefix p: points not to the object, but to a statement node. This node then is the subject of other triples.
The prefix ps: within the statement node retrieves the object.
The prefix pq: within the statement node retrieves the qualifier information.

The code can be abbreviated a lot with the [ ] syntax by eliminating the variable ?statement.

SELECT ?education ?educationLabel ?starttime ?endtime
WHERE
{
     wd:Q42 p:P69 [ps:P69 ?education;
                   pq:P580 ?starttime;
                   pq:P582 ?endtime;
                  ].
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?starttime

Try it!

References

[edit | edit source]

Let us list the education of Douglas Adams and the stated in (P248) reference:

SELECT ?education ?educationLabel ?ref ?refLabel
WHERE
{
     wd:Q42 p:P69 ?statement.
     ?statement ps:P69 ?education.
     ?statement prov:wasDerivedFrom ?refnode.
     ?refnode   pr:P248 ?ref.
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

The prefix p: points not to the object, but to a statement node. This node then is the subject of other triples.
The prefix ps: within the statement node retrieves the object.
The prov:wasDerivedFrom within the statement node points to a new reference node.
The prefix pr: within the reference node retrieves the reference information.

The code can be abbreviated a lot with the [ ] syntax by eliminating the variables ?statement and ?refnode.

SELECT ?education ?educationLabel ?ref ?refLabel
WHERE
{
     wd:Q42 p:P69 [ ps:P69 ?education;
                    prov:wasDerivedFrom 
                    [ pr:P248 ?ref; 
                    ]
                  ].
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

You might have noticed that only one of the 2 educations is listed in the queries above. To list both we need to introduce OPTIONAL{ }. As this can only be used with full sentences we need to use the full expanded syntax with triples as short sentence:

SELECT ?education ?educationLabel ?ref ?refLabel
WHERE
{
     wd:Q42 p:P69 ?statement.
     ?statement ps:P69 ?education.
     OPTIONAL{ ?statement prov:wasDerivedFrom ?refnode.
              ?refnode   pr:P248 ?ref.
             }
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Ranks

[edit | edit source]

Let us list the education of Douglas Adams and the ranks of them:

SELECT ?education ?educationLabel ?rank
WHERE
{
     wd:Q42 p:P69 ?statement.
     ?statement ps:P69 ?education.
     ?statement wikibase:rank ?rank.
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

The prefix p: points not to the object, but to a statement node. This node then is the subject of other triples.
The prefix ps: within the statement node retrieves the object.
The wikibase:rank within the statement node retrieves the rank information.

The code can be abbreviated a lot with the [ ] syntax by eliminating the variable ?statement.

SELECT ?education ?educationLabel ?rank
WHERE
{
     wd:Q42 p:P69 [ps:P69 ?education;
                   wikibase:rank ?rank;
                  ].
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

An example with different ranks is the (historcal) countries in which Berlin lied.

# Berlins countries and ranking
SELECT ?country ?countryLabel ?rank
WHERE
{
     wd:Q64 p:P17 [ps:P17 ?country;
                   wikibase:rank ?rank;
                  ].
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

The 3 possible values for ranks are wikibase:PreferredRank, wikibase:NormalRank and wikibase:DeprecatedRank

Compare this with the normal triples, which will select only the value(s) with the highest rank. In this case only the Preferred Rank value Germany (Q183).

# Berlins countries via normal triples
SELECT ?country ?countryLabel
WHERE
{
     wd:Q64 wdt:P17 ?country.
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Ignore ranks in queries

[edit | edit source]

Ranks might cause unexpected results. For example, consider this query, that will give you back all Dutch municipalities that share a border with Alphen aan den Rijn (Q213246):

select ?muni ?muniLabel where {
  ?muni wdt:P31 wd:Q2039348;
        wdt:P47 wd:Q213246.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

This will not show Boskoop (Q894442), because its rank for municipality of the Netherlands (Q2039348) is 'normal' while the other three values for instance of (P31) are 'preferred'. To see Boskoop in the previous query as well, rewrite it like this:

select ?muni ?muniLabel where {
  ?muni p:P31 [ps:P31 wd:Q2039348];
        wdt:P47 wd:Q213246.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Or use a property path to shorten the query like this:

select ?muni ?muniLabel where {
  ?muni p:P31/ps:P31 wd:Q2039348;
        wdt:P47 wd:Q213246.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Summary

[edit | edit source]
Example
Statements wd:Q42 wdt:P69 wd:Q691283.
or wd:Q42 p:P69 ?s. ?s ps:P69 wd:Q691283.
or wd:Q42 p:P69 [ ps:P69 wd:Q691283 ].
Wikidata datamodel
Rank wd:Q42 p:P69 [ wikibase:rank ?rank ].
Qualifier wd:Q42 p:P69 [ pq:P580 ?qualifier ].
Reference wd:Q42 p:P69 [ prov:wasDerivedFrom [ pr:P248 ?ref ] ].
SPARQL data representation, as used by Wikidata Query Service Wikibase RDF mapping diagram

A full query of Douglas Adams education, with rank, qualifiers and references could look like

# Douglas Adams education, with rank, qualifiers and references
SELECT ?education ?educationLabel ?rank ?starttime ?endtime ?ref ?refLabel
WHERE
{
     wd:Q42 p:P69 ?statement.
     ?statement ps:P69 ?education.
     # rank
     ?statement wikibase:rank ?rank.
     # qualifiers
     OPTIONAL{ ?statement pq:P580 ?starttime. }
     OPTIONAL{ ?statement pq:P582 ?endtime.   }
     # references
     OPTIONAL{ ?statement prov:wasDerivedFrom ?refnode.
               ?refnode   pr:P248 ?ref.
             }
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?starttime

Try it!

References

[edit | edit source]


WIKIDATA Language links and Badges

[edit | edit source]
#Female scientists with most number of sitelinks (but not English Wikipedia)
SELECT ?item ?itemLabel (COUNT(DISTINCT ?languagelink) AS ?linkcount) (SAMPLE( ?languagelink ) AS ?sample ) 
WHERE {
    ?item wdt:P31 wd:Q5.            # is a human 
    ?item wdt:P21 wd:Q6581072.      # gender female
    ?item wdt:P106 wd:Q901.         # occupation scientist
    ?languagelink schema:about ?item.
  FILTER NOT EXISTS {
    ?enlanguagelink schema:about ?item.
    ?enlanguagelink schema:inLanguage "en". 
    ?enlanguagelink schema:isPartOf <https://en.wikipedia.org/>
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de,es,ar,fr" }
}
GROUP BY ?item ?itemLabel
ORDER BY DESC(?linkcount)

Try it!

The code schema:about links an item to the language link.
The code schema:inLanguage within the language link retrieves the language.
The code schema:isPartOf within the language link retrieves the wikimedia project, like <https://en.wikipedia.org/>.

Badges

[edit | edit source]

Badges are a kind of marker attached to a language link, which could identify, for example, that the article is a featured article badge (Q17437796) on a specific site. They do not describe the external entity but the page on the specific site.

Here a query that ranks the wikimedia projects badges.

SELECT ?wiki ?badge ?badgeLabel (COUNT(DISTINCT ?languagelink) AS ?count) (SAMPLE(?item) AS ?sample)
WHERE {
  ?languagelink schema:isPartOf ?wiki.
  ?languagelink wikibase:badge ?badge.
  ?languagelink schema:about ?item.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                          ?badge rdfs:label ?badgeLabel.
                         }

}
GROUP BY ?wiki ?badge ?badgeLabel
ORDER BY DESC(?count)
LIMIT 100

Try it!

The code schema:isPartOf (within the language link) retrieves the wikimedia project, like <https://en.wikipedia.org/>.
The code wikibase:badge (within the language link) retrieves the badge, like featured article badge (Q17437796).
The code schema:about links an item to the language link.


WIKIDATA Precision, Units and Coordinates

Wikidata data values

The data on WIKIDATA contains more info than only the triples. For a full description see Wikidata:Glossary.

Values on Wikidata often have additional info like precision, units etc. Wikidata's solution for almost everything is more triples. And it means more prefixes.

Entities

[edit | edit source]

For entities there is no additional information.

Strings

[edit | edit source]

For strings there is no additional information.

# examples of dates, precision, time zones and calendars
SELECT ?time ?timeprecision ?timezone ?timecalendar ?timecalendarLabel
WHERE
{
     { wd:Q5598  p:P569/psv:P569 ?timenode. }  # Jul 15, 1606
     UNION 
     { wd:Q220   p:P571/psv:P571 ?timenode. } # 13 April 753 BCE
     UNION 
     { wd:Q1     p:P580/psv:P580 ?timenode. } # 13798 million years BCE
  
     ?timenode wikibase:timeValue         ?time.
     ?timenode wikibase:timePrecision     ?timeprecision.
     ?timenode wikibase:timeTimezone      ?timezone.
     ?timenode wikibase:timeCalendarModel ?timecalendar.
  
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

The prefix p: points to a statement node.
The prefix psv: within a statement node retrieves a time node.
The wikibase:timeValue within the time node retrieves the time.
The wikibase:timePrecision within a time node retrieves the precision of the time.

The codes for precision are 0: billion years, 1: hundred million years, 3: million years, 4: hundred thousand years, 5: ten thousand years, 6: millennium, 7: century, 8: decade, 9: year, 10: month, 11: day, 12: hour, 13: minute, 14: second.

The wikibase:timeTimezone within a time node retrieves the timezone, as an offset from UTC in minutes.
The wikibase:timeCalendarModel within a time node retrieves the calendar, a common used value is proleptic Gregorian calendar (Q1985727).


Comment on filtering on dates.
On filtering dates the code ^^xsd:dateTime should be added, like:

FILTER("2015-01-01"^^xsd:dateTime <= ?dob && ?dob < "2016-01-01"^^xsd:dateTime).

Monolingual texts

[edit | edit source]

For monolingual text there is no additional information. The text is represented as a string literal with language tag. It has only simple value.

#Countries in European Union with native name and language
SELECT ?country ?countryLabel ?nativename ?language
{
  wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
  OPTIONAL { ?country wdt:P1705 ?nativename.
              BIND( LANG(?nativename) AS ?language). }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?language)

Try it!

Coordinates

[edit | edit source]
# Museums in Barcelona with coordinates
SELECT ?item ?itemLabel ?coord ?lon ?lat
WHERE
{
 ?item wdt:P131 wd:Q1492.   # in the administrative territory of Barcelona
 ?item wdt:P31 wd:Q33506.   # is a museum
 ?item p:P625 ?coordinate.
 ?coordinate ps:P625 ?coord.
 ?coordinate psv:P625 ?coordinate_node.
 ?coordinate_node wikibase:geoLongitude ?lon.
 ?coordinate_node wikibase:geoLatitude ?lat.  
 SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
}

Try it!

The prefix p: points to a statement node.
The prefix ps: within a statement node for a coordinate retrieves the full coordinate, like Point(2.1749 41.3834).
The prefix psv: within a statement node retrieves a coordinate node.
The wikibase:geoLongitude within the coordinate node retrieves the longitude value.
The wikibase:geoLatitude within a coordinate node retrieves the latitude value.
The wikibase:geoGlobe within a coordinate node retrieves the globe object. For coordinates on earth it will be Earth (Q2).
The wikibase:geoPrecision within a coordinate node retrieves the precision of the coordinate values, measured in degrees. Multiply by 111000 to convert to meters.

Here an example of mountains not located on Earth.

# Mountains, with coordinates, not located on Earth
SELECT ?item ?name ?coord ?lon ?lat ?globe ?globeLabel
{
   ?item wdt:P31 wd:Q8502;                 # is a mountain
         p:P625 [
           ps:P625 ?coord;
           psv:P625 [
             wikibase:geoLongitude ?lon;
             wikibase:geoLatitude ?lat;
             wikibase:geoGlobe ?globe;
           ] ;
         ]
  FILTER ( ?globe != wd:Q2 )              # globe is not earth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" .
                           ?item  rdfs:label ?name.
                           ?globe rdfs:label ?globeLabel.
                         }
}
ORDER BY ?globeLabel ?name

Try it!

Quantities

[edit | edit source]
# Longest rivers in the USA
SELECT ?item ?itemLabel ?length ?unitLabel ?lowerbound ?upperbound ?precision ?length2 ?conversion ?length_in_m 
WHERE
{
  ?item          wdt:P31/wdt:P279*           wd:Q4022.    # rivers
  ?item          wdt:P17                     wd:Q30.      # country USA
  ?item          p:P2043                     ?stmnode.    # length
  ?stmnode       psv:P2043                   ?valuenode.
  ?valuenode     wikibase:quantityAmount     ?length.
  ?valuenode     wikibase:quantityUnit       ?unit.
  ?valuenode     wikibase:quantityLowerBound ?lowerbound.
  ?valuenode     wikibase:quantityUpperBound ?upperbound.
  BIND((?upperbound-?lowerbound)/2 AS ?precision).
  BIND(IF(?precision=0, ?length, (CONCAT(str(?length), "±", str(?precision)))) AS ?length2). 

  # conversion to SI unit
  ?unit          p:P2370                 ?unitstmnode.   # conversion to SI unit
  ?unitstmnode   psv:P2370               ?unitvaluenode. 
  ?unitvaluenode wikibase:quantityAmount ?conversion.
  ?unitvaluenode wikibase:quantityUnit   wd:Q11573.      # meter
  BIND(?length * ?conversion AS ?length_in_m).
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} 
ORDER BY DESC(?length_in_m)
LIMIT 10

Try it!

The prefix p: points to a statement node.
The prefix psv: within a statement node retrieves a value node.
The wikibase:quantityAmount within the value node retrieves a quantity value.
The wikibase:quantityUnit within a value node retrieves a unit. Not all quantities have units.
The wikibase:quantityLowerBound and wikibase:quantityUpperBound can be used to indicate the precision.

Within the Unit you can retrieve the statement for converting the unit to the SI unit etc. See in the example the length of the Mississippi River is in miles, and can be converted to SI units. The conversion could also be needed if some lengths are in meters and some in kilometers to so.

Some variables are used only for demonstration. Without those the query will be:

# Longest rivers in the USA
SELECT ?item ?itemLabel ?length2 ?unitLabel ?length_in_m 
WHERE
{
  ?item          wdt:P31/wdt:P279*           wd:Q4022.    # rivers
  ?item          wdt:P17                     wd:Q30.      # country USA
  ?item          p:P2043                     ?stmnode.    # length
  ?stmnode       psv:P2043                   ?valuenode.
  ?valuenode     wikibase:quantityAmount     ?length.
  ?valuenode     wikibase:quantityUnit       ?unit.
  ?valuenode     wikibase:quantityLowerBound ?lowerbound.
  ?valuenode     wikibase:quantityUpperBound ?upperbound.
  BIND((?upperbound-?lowerbound)/2 AS ?precision).
  BIND(IF(?precision=0, ?length, (CONCAT(str(?length), "±", str(?precision)))) AS ?length2). 

  # conversion to SI unit
  ?unit          p:P2370                 ?unitstmnode.   # conversion to SI unit
  ?unitstmnode   psv:P2370               ?unitvaluenode. 
  ?unitvaluenode wikibase:quantityAmount ?conversion.
  ?unitvaluenode wikibase:quantityUnit   wd:Q11573.      # meter
  BIND(?length * ?conversion AS ?length_in_m).
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} 
ORDER BY DESC(?length_in_m)
LIMIT 10

Try it!

The code can be abbreviated a lot with the [ ] syntax by eliminating the variables for the nodes and unused variables.

# Longest rivers in the USA
SELECT ?item ?itemLabel ?length2 ?unitLabel ?length_in_m 
WHERE
{
  ?item  wdt:P31/wdt:P279* wd:Q4022.    # rivers
  ?item  wdt:P17           wd:Q30.      # country USA
  ?item  p:P2043/psv:P2043 [            # length
     wikibase:quantityAmount     ?length;
     wikibase:quantityUnit       ?unit;
     wikibase:quantityLowerBound ?lowerbound;
     wikibase:quantityUpperBound ?upperbound;
  ]
  BIND((?upperbound-?lowerbound)/2 AS ?precision).
  BIND(IF(?precision=0, ?length, (CONCAT(str(?length), "±", str(?precision)))) AS ?length2). 

  # conversion to SI unit
  ?unit p:P2370/psv:P2370 [                # conversion to SI unit
     wikibase:quantityAmount ?conversion;
     wikibase:quantityUnit wd:Q11573;      # meter
  ]
  BIND(?length * ?conversion AS ?length_in_m).
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} 
ORDER BY DESC(?length_in_m)
LIMIT 10

Try it!

Normalised units

[edit | edit source]

Instead of converting by SPARQL code as above the units can be converted automatically by using prefix psn: (normalised) instead of psv:.

# Longest rivers in the USA, normalised units
SELECT ?item ?itemLabel ?length ?unitLabel ?lowerbound ?upperbound ?precision ?length2
WHERE
{
  ?item          wdt:P31/wdt:P279*           wd:Q4022.    # rivers
  ?item          wdt:P17                     wd:Q30.      # country USA
  ?item          p:P2043                     ?stmnode.    # length
  ?stmnode       psn:P2043                   ?valuenode.  # normalised value
  ?valuenode     wikibase:quantityAmount     ?length.
  ?valuenode     wikibase:quantityUnit       ?unit.
  ?valuenode     wikibase:quantityLowerBound ?lowerbound.
  ?valuenode     wikibase:quantityUpperBound ?upperbound.
  BIND((?upperbound-?lowerbound)/2 AS ?precision).
  BIND(IF(?precision=0, ?length, (CONCAT(str(?length), "±", str(?precision)))) AS ?length2). 
    
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} 
ORDER BY DESC(?length)
LIMIT 10

Try it!

The prefix psv: within a statement node retrieves a value node.
The prefix psn: within a statement node retrieves a normalised value node. Note that a query with psn: will give you only the values with units presents in this list.[1]

Notice that the units changed from km and miles into meters, and that the values are all calculated accordingly. Also the lower- and upper-bound values change accordingly. The precision now is in meters.

Normalized quantity values are value nodes that are parallel to the original data nodes but represented in base units. They are connected to their parent nodes by predicates with prefix having "v" replaced with "n" - i.e. psn:, prn: (for references) and pqn: (for qualifiers).

The code can be abbreviated a lot with the [ ] syntax by eliminating the variables for the nodes and unused variables.

# Longest rivers in the USA, normalised units
SELECT ?item ?itemLabel ?length2 ?unitLabel
WHERE
{
  ?item  wdt:P31/wdt:P279* wd:Q4022.    # rivers
  ?item  wdt:P17           wd:Q30.      # country USA
  ?item  p:P2043/psn:P2043 [            # length, normalised
     wikibase:quantityAmount     ?length;
     wikibase:quantityUnit       ?unit;
     wikibase:quantityLowerBound ?lowerbound;
     wikibase:quantityUpperBound ?upperbound;
  ]
  BIND((?upperbound-?lowerbound)/2 AS ?precision).
  BIND(IF(?precision=0, ?length, (CONCAT(str(?length), "±", str(?precision)))) AS ?length2). 

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} 
ORDER BY DESC(?length)
LIMIT 10

Try it!

Summary

[edit | edit source]
item Statement node Value node
p:Pxxx
psv:Pxxx
or psn:Pxxx (normalised)
Time values
wikibase:timeValue
wikibase:timePrecision
wikibase:timeTimezone
wikibase:timeCalendarModel
Coordinate values
wikibase:geoLongitude
wikibase:geoLatitude
wikibase:geoGlobe
wikibase:geoPrecision
Quantity values
wikibase:quantityAmount
wikibase:quantityUnit
wikibase:quantityLowerBound
wikibase:quantityUpperBound
wikibase:quantityAmount

SPARQL data representation, as used by Wikidata Query Service


WIKIDATA Lexicographical data

The data on WIKIDATA contains more info than only triples with concepts: Q-items are related to a thing or an idea. Since 2018, Wikidata has also stored a new type of data: words, phrases and sentences, in many languages, described in many languages. This information is stored in new types of entities, called Lexemes (L), Forms (F) and Senses (S).

Comment This chapter is not yet complete. Please help expand this.

Glossary SPARQL code

A Lexeme is a lexical element of a language, such as a word, a phrase, or a prefix (see Lexeme on Wikipedia). Lexemes are Entities in the sense of the Wikibase data model. A Lexeme is described using the following information:

  • An ID. Lexemes have IDs starting with an "L" followed by a natural number in decimal notation, e.g. L3746552. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Lexeme.
  • A Lemma for use as a human readable representation of the lexeme, e.g. "run".
  • The Language to which the lexeme belongs. This is a reference to a concrete Item, e.g. English (Q1860).
  • The Lexical category to which the lexeme belongs. This is given as a reference to a concrete Item, e.g. adjective (Q34698).
  • A list of Lexeme Statements to describe properties of the lexeme that are not specific to a Form or Sense (e.g. derived from or grammatical gender or syntactic function)

?l a ontolex:LexicalEntry .
?l wikibase:lemma ?word .
?l dct:language wd:Q1860 . # English
?l wikibase:lexicalCategory ?category .

  • A list of Forms, typically one for each relevant combination of grammatical features, such as 2nd person / singular / past tense. A Form is described using the following information:
    • An ID. Forms have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "F", followed by a natural number in decimal notation: e.g. L3746552-F7
    • A representation, spelling out the Form as a string.
    • A list of grammatical features that define for which syntactic role the given form applies. These are given as references to a concrete Items, e.g. participle (Q814722) for participle.
    • A list of Form Statements further describing the Form or its relations to other Forms or Items (e.g. IPA transcription (P898), pronunciation audio, rhymes with, used until, used in region)

?l ontolex:lexicalForm ?form .
?form a ontolex:Form .
?form ontolex:representation ?word .
?form wikibase:grammaticalFeature ?feat .

  • A list of Senses, describing the different meanings of the lexeme (e.g. "financial institution" and "edge of a body of water" for the English noun bank). A sense is described using the following information:
    • An ID. Senses have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "S", followed by a natural number in decimal notation: e.g. L3746552-S4. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Sense.
    • A Gloss, defining the meaning of the Sense using natural language.
    • A list of Sense Statements further describing the Sense and its relations to Senses and Items (e.g. translation, synonym, antonym, connotation, register, denotes, evokes).

?l ontolex:sense ?sense .
?sense a ontolex:LexicalSense .
?sense skos:definition ?gloss .
FILTER(LANG(?gloss) = "sv")

Prefixes

[edit | edit source]

Prefixes used only for Lexicograpical data are:

PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dct: <http://purl.org/dc/terms/>

Try it!

Examples

[edit | edit source]

Get swedish gloss of specific lexeme

[edit | edit source]
SELECT  ?sense ?gloss
WHERE {
      VALUES ?l {wd:L35455}.   # Swedish noun "vara"
      ?l ontolex:sense ?sense.
      ?sense skos:definition ?gloss.
      # Get only the swedish gloss, exclude otherwise
      FILTER(LANG(?gloss) = "sv")
 }

Try it!

Get senses of a specific lexeme that has P5137 (item for this sense)

[edit | edit source]
SELECT ?sense ?gloss
WHERE {
      VALUES ?l {wd:L39751}.   # Swedish adjective "smaklös"
      ?l ontolex:sense ?sense.
      ?sense skos:definition ?gloss.
      # Exclude lexemes without a linked QID from at least one sense
      ?sense wdt:P5137 [].     # has P5137 (item for this sense)
}

Try it!

Lexemes describing a color

[edit | edit source]
# By Vesihiisi
SELECT ?l ?lemma ?languageLabel  WHERE {
  ?l a ontolex:LexicalEntry; 
       dct:language ?language; 
       wikibase:lemma ?lemma .
  ?l wdt:P31 wd:Q376431.   # color term
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?languageLabel

Try it!

Words per language

[edit | edit source]

Here is an overview of the number of words per language

SELECT  (?language AS ?label) (COUNT(*) AS ?count) 
WHERE {
   ?l a ontolex:LexicalEntry ; wikibase:lemma ?word .
   BIND( LANG(?word) AS ?language ) 
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} 
GROUP BY ?language 
ORDER BY DESC(?count)

Try it!

English and American English

[edit | edit source]

This query lists all words that are different in English and American English

SELECT  ?l ?english ?american
WHERE {
      ?l wikibase:lemma ?english .  FILTER(LANG(?english)="en-gb")
      ?l wikibase:lemma ?american . FILTER(LANG(?american)="en")
      FILTER(?english!=?american)
}
ORDER BY ?english

Try it!

Overview of Lexical categories

[edit | edit source]

Here is an overview of the most used Lexical categories in English:

SELECT ?categoryLabel (COUNT(*) AS ?count) 
WHERE {
   ?l a ontolex:LexicalEntry ; wikibase:lemma ?word ; wikibase:lexicalCategory ?category; dct:language ?language.
   ?language wdt:P218 'en'
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} 
GROUP BY ?categoryLabel
ORDER BY DESC(?count)

Try it!

Longest words

[edit | edit source]

Here as example a list of the longest words in English

SELECT DISTINCT ?l ?word ?len 
WHERE {
  {
   ?l a ontolex:LexicalEntry ; dct:language wd:Q1860 ; wikibase:lemma ?word .
   BIND(strlen(?word) as ?len)  
  } UNION {
   ?l a ontolex:LexicalEntry ; dct:language wd:Q1860 ; ontolex:lexicalForm/ontolex:representation ?word .
   BIND(strlen(?word) as ?len)  
  }
} 
order by DESC(?len) 
LIMIT 20

Try it!

Adjectives

[edit | edit source]

This example shows (English) adjectives and their positive, comparative and superlative degrees. By changing VALUES ?language { wd:Q1860 } this query can be changed into any language.

# adjectives
SELECT DISTINCT ?l ?word (GROUP_CONCAT(DISTINCT ?subfeatLabel; SEPARATOR=", ") AS ?subfeatures) 
        (GROUP_CONCAT(DISTINCT ?positive;    SEPARATOR=", ") AS ?Positive)
        (GROUP_CONCAT(DISTINCT ?comparative; SEPARATOR=", ") AS ?Comparative)
        (GROUP_CONCAT(DISTINCT ?superlative; SEPARATOR=", ") AS ?Superlative)
WHERE {
   VALUES ?language { wd:Q1860 } # English
  
   ?l a ontolex:LexicalEntry ; wikibase:lemma ?word; wikibase:lexicalCategory wd:Q34698 .      # adjective
   ?l dct:language ?language.

OPTIONAL {      
   ?l ontolex:lexicalForm ?form1 .
   ?form1 ontolex:representation ?positive ;       wikibase:grammaticalFeature wd:Q3482678 .   # positive
   OPTIONAL { ?form1 wikibase:grammaticalFeature ?subfeat . FILTER(?subfeat != wd:Q3482678 ) } 
}

   ?l ontolex:lexicalForm ?form2 .
   ?form2 ontolex:representation ?comparative ;    wikibase:grammaticalFeature wd:Q14169499 .   # comparative
   OPTIONAL { ?form2 wikibase:grammaticalFeature ?subfeat . FILTER(?subfeat != wd:Q14169499 ) } 

   ?l ontolex:lexicalForm ?form3 .
   ?form3 ontolex:representation ?superlative ;    wikibase:grammaticalFeature wd:Q1817208 .    # superlative
   OPTIONAL { ?form3 wikibase:grammaticalFeature ?subfeat . FILTER(?subfeat != wd:Q1817208 ) } 
 
   # use ?word if ?positive is blank
   BIND(IF(BOUND(?positive),?positive,?word) AS ?positive).
  
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". 
                           ?subfeat rdfs:label ?subfeatLabel.
                          }
}
GROUP BY ?word ?l
ORDER BY ?word ?l
LIMIT 20000

Try it!

Verbs

[edit | edit source]

This example shows (English) verbs and their conjugations. This query is very complex because conjugations in Wikidata are modeled very complex. By changing VALUES ?language { "en" } this query can be changed into any language. Currently only a few verbs are conjugated.

# verbs
SELECT ?l ?word (GROUP_CONCAT(DISTINCT ?subfeatLabel; SEPARATOR=", ") AS ?subfeatures) 
          ?single1 ?single2 ?single3 ?plural1 ?plural2 ?plural3
WHERE {
   VALUES ?language { "en" }
  
   ?l a ontolex:LexicalEntry ; wikibase:lemma ?word; wikibase:lexicalCategory ?category .
   FILTER(?category = wd:Q24905 ) # verb
   FILTER(LANG(?word) = ?language)

OPTIONAL { 
   ?l ontolex:lexicalForm ?form1 .
   { ?form1 ontolex:representation ?single1 ; wikibase:grammaticalFeature wd:Q51929218 .  # first-person singular
   } UNION 
   { ?form1 ontolex:representation ?single1 ; wikibase:grammaticalFeature wd:Q21714344 .  # first person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q146786 .   }               # without plural
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q51929154 . }               # without plural person
   } UNION
   { ?form1 ontolex:representation ?single1 ; wikibase:grammaticalFeature wd:Q51929131 .  # singular person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person 
   } UNION
   { ?form1 ontolex:representation ?single1 ; wikibase:grammaticalFeature wd:Q110786 .    # singular
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person
   }
   FILTER(LANG(?single1) = ?language )
   OPTIONAL { ?form1 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929218 && ?subfeat != wd:Q21714344 )   # not first-person singular / first person
             FILTER(?subfeat != wd:Q51929131 && ?subfeat != wd:Q110786 )     # not singular person / singular
             FILTER(?subfeat != wd:Q51929049 && ?subfeat != wd:Q51929074 ) } # not second person / third person
   }  
OPTIONAL { 
   ?l ontolex:lexicalForm ?form2 .
   { ?form2 ontolex:representation ?single2 ; wikibase:grammaticalFeature wd:Q51929369 .  # second-person singular
   } UNION 
   { ?form2 ontolex:representation ?single2 ; wikibase:grammaticalFeature wd:Q51929049 .  # second person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q146786 .   }               # without plural
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q51929154 . }               # without plural person
   } UNION
   { ?form2 ontolex:representation ?single2 ; wikibase:grammaticalFeature wd:Q51929131 .  # singular person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person 
   } UNION
   { ?form2 ontolex:representation ?single2 ; wikibase:grammaticalFeature wd:Q110786 .    # singular
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person
   }
   FILTER(LANG(?single2) = ?language )
   OPTIONAL { ?form2 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929369 && ?subfeat != wd:Q51929049 )   # not second-person singular / second person
             FILTER(?subfeat != wd:Q51929131 && ?subfeat != wd:Q110786 )     # not singular person / singular
             FILTER(?subfeat != wd:Q21714344 && ?subfeat != wd:Q51929074 ) } # not first person / third person
   }
OPTIONAL { 
   ?l ontolex:lexicalForm ?form3 .
   { ?form3 ontolex:representation ?single3 ; wikibase:grammaticalFeature wd:Q51929447 .  # third-person singular
   } UNION 
   { ?form3 ontolex:representation ?single3 ; wikibase:grammaticalFeature wd:Q51929074 .  # third person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q146786 .   }               # without plural
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q51929154 . }               # without plural person
   } UNION
   { ?form3 ontolex:representation ?single3 ; wikibase:grammaticalFeature wd:Q51929131 .  # singular person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person 
   } UNION
   { ?form3 ontolex:representation ?single3 ; wikibase:grammaticalFeature wd:Q110786 .    # singular
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person
   }
   FILTER(LANG(?single3) = ?language )        
   OPTIONAL { ?form3 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929447 && ?subfeat != wd:Q51929074 )   # not third-person singular / third person
             FILTER(?subfeat != wd:Q51929131 && ?subfeat != wd:Q110786 )     # not singular person / singular
             FILTER(?subfeat != wd:Q21714344 && ?subfeat != wd:Q51929049 ) } # not first person / second person
   }
OPTIONAL { 
   ?l ontolex:lexicalForm ?form4 .
   { ?form4 ontolex:representation ?plural1 ; wikibase:grammaticalFeature wd:Q51929290 .  # first-person plural
   } UNION 
   { ?form4 ontolex:representation ?plural1 ; wikibase:grammaticalFeature wd:Q21714344 .  # first person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q110786 . }                 # without singular
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q51929131 . }               # without singular person
   } UNION
   { ?form4 ontolex:representation ?plural1 ; wikibase:grammaticalFeature wd:Q51929154 .  # plural person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person
   } UNION
   { ?form4 ontolex:representation ?plural1 ; wikibase:grammaticalFeature wd:Q146786 .    # plural
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person
   }
   FILTER(LANG(?plural1) = ?language )
   OPTIONAL { ?form4 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929290 && ?subfeat != wd:Q21714344 )     # not first-person plural / first person
             FILTER(?subfeat != wd:Q51929154 && ?subfeat != wd:Q146786 )       # not plural person / plural
             FILTER(?subfeat != wd:Q51929049 && ?subfeat != wd:Q51929074 ) }   # not second person / third person
   }
OPTIONAL { 
   ?l ontolex:lexicalForm ?form5 .
   { ?form5 ontolex:representation ?plural2 ; wikibase:grammaticalFeature wd:Q51929403 . # second-person plural
   } UNION 
   { ?form5 ontolex:representation ?plural2 ; wikibase:grammaticalFeature wd:Q51929049 . # second person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q110786 . }                # without singular
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q51929131 . }              # without singular person
   } UNION
   { ?form5 ontolex:representation ?plural2 ; wikibase:grammaticalFeature wd:Q51929154 . # plural person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q21714344 . }              # without first person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q51929049 . }              # without second person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q51929074 . }              # without third person
   } UNION
   { ?form5 ontolex:representation ?plural2 ; wikibase:grammaticalFeature wd:Q146786 .   # plural
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q21714344 . }              # without first person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q51929049 . }              # without second person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q51929074 . }              # without third person
   }
   FILTER(LANG(?plural2) = ?language )
   OPTIONAL { ?form5 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929403 && ?subfeat != wd:Q51929049 )     # not second-person plural / second person
             FILTER(?subfeat != wd:Q51929154 && ?subfeat != wd:Q146786 )       # not plural person / plural
             FILTER(?subfeat!= wd:Q21714344 && ?subfeat != wd:Q51929074 ) }    # not first person / third person
   }
OPTIONAL { 
   ?l ontolex:lexicalForm ?form6 .
   { ?form6 ontolex:representation ?plural3 ; wikibase:grammaticalFeature wd:Q51929517 . # third-person plural
   } UNION 
   { ?form6 ontolex:representation ?plural3 ; wikibase:grammaticalFeature wd:Q51929074 . # third person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q110786 . }                # without singular
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q51929131 . }              # without singular person
   } UNION
   { ?form6 ontolex:representation ?plural3 ; wikibase:grammaticalFeature wd:Q51929154 . # plural person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q21714344 . }              # without first person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q51929049 . }              # without second person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q51929074 . }              # without third person
   } UNION
   { ?form6 ontolex:representation ?plural3 ; wikibase:grammaticalFeature wd:Q146786 .   # plural
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q21714344 . }              # without first person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q51929049 . }              # without second person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q51929074 . }              # without third person
   }
   FILTER(LANG(?plural3) = ?language )        
   OPTIONAL { ?form6 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929517 && ?subfeat != wd:Q51929074 )     # not third-person plural / third person
             FILTER(?subfeat != wd:Q51929154 && ?subfeat != wd:Q146786 )       # not plural person / plural
             FILTER(?subfeat != wd:Q21714344 && ?subfeat != wd:Q51929049 ) }   # not first person / second person
   }
  
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". 
                           ?subfeat rdfs:label ?subfeatLabel.
                          }
}
GROUP BY ?l ?word ?single1 ?single2 ?single3 ?plural1 ?plural2 ?plural3
ORDER BY ?word ?single1 ?single2 ?single3 ?plural1 ?plural2 ?plural3
LIMIT 20000

Try it!

Articles in all languages

[edit | edit source]

This example shows articles in many languages.

# articles in all languages
SELECT ?l ?language ?categoryLabel ?word ?subfeatures 
     (GROUP_CONCAT(DISTINCT ?masculine; SEPARATOR=", ") AS ?Masculine ) 
     (GROUP_CONCAT(DISTINCT ?feminine;  SEPARATOR=", ") AS ?Feminine ) 
     (GROUP_CONCAT(DISTINCT ?neuter;    SEPARATOR=", ") AS ?Neuter ) 
WHERE {
SELECT ?l ?language ?categoryLabel ?word (GROUP_CONCAT(DISTINCT ?subfeatLabel; SEPARATOR=", ") AS ?subfeatures) 
     ?masculine ?feminine ?neuter 
WHERE {
   VALUES ?categories { wd:Q103184 wd:Q2865743 wd:Q3813849 } # article or definite article or indefinite article
 
   ?l a ontolex:LexicalEntry ; wikibase:lemma ?word; wikibase:lexicalCategory ?category .
   FILTER(?category = ?categories ) # article or definite article or indefinite article
   BIND(LANG(?word) as ?language)

OPTIONAL { 
   ?l ontolex:lexicalForm ?form1 .
   ?form1 ontolex:representation ?masculine ; wikibase:grammaticalFeature wd:Q499327 . # masculine
   FILTER(LANG(?masculine) = ?language )
   OPTIONAL { ?form1 wikibase:grammaticalFeature ?subfeat . FILTER(?subfeat != wd:Q499327 ) } 
   }
 
OPTIONAL { 
   ?l ontolex:lexicalForm ?form2 .
   ?form2 ontolex:representation ?feminine ; wikibase:grammaticalFeature wd:Q1775415 . # feminine
   FILTER(LANG(?feminine) = ?language )
   OPTIONAL { ?form2 wikibase:grammaticalFeature ?subfeat . FILTER(?subfeat != wd:Q1775415 ) } 
   }  
  
OPTIONAL { 
   ?l ontolex:lexicalForm ?form3 .
   ?form3 ontolex:representation ?neuter ; wikibase:grammaticalFeature wd:Q1775461 . # neuter
   FILTER(LANG(?neuter) = ?language )
   OPTIONAL { ?form3 wikibase:grammaticalFeature ?subfeat . FILTER(?subfeat != wd:Q1775461 ) } 
   }  
   
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". 
                           ?subfeat  rdfs:label ?subfeatLabel.
                           ?category rdfs:label ?categoryLabel.
                          }
}
GROUP BY ?language ?l ?categoryLabel ?word ?masculine ?feminine ?neuter
ORDER BY ?language ?categoryLabel ?subfeatures ?word ?masculine ?feminine ?neuter 
}
GROUP BY ?language ?l ?categoryLabel ?word ?subfeatures
ORDER BY ?language ?l ?categoryLabel ?word ?subfeatures ?masculine ?feminine ?neuter

Try it!

External tools

[edit | edit source]

See Wikidata:Tools/Lexicographical data for a list of external tools for Lexicographical data.

References

[edit | edit source]


Wikimedia Commons Query Service

Wikimedia Commons images use structured data too, see Structured Data on Commons (SDoC) . With Wikimedia Commons Query Service (WCQS) the data can be queried. This is available per 1/2/2022 at https://commons-query.wikimedia.org/ It was previously available in beta phase at https://wcqs-beta.wmflabs.org/.

Find below an example of Depictions of Douglas Adams shown as image grid.

#Show images of Douglas Adams in an image grid
#defaultView:ImageGrid
SELECT ?file ?image WHERE {
  ?file wdt:P180 wd:Q42 .
  ?file schema:url ?image.
}

Try it!

Mind: use |project=sdc where sdc =Structured Data on Commons, the default is |project=wd for Wikidata.

See more examples on WCQS itself.

The query below shows all properties of an image:

# all properties of an image 
SELECT DISTINCT ?file ?predicate ?pLabel ?o ?oLabel
WHERE {
  VALUES ?file { sdc:M107651852 }    # an example image
{ ?file ?predicate ?o.
  BIND( IRI(REPLACE( STR(?predicate),"prop/direct/","entity/" )) AS ?p). 
}
UNION
{ ?file ?predicate1 [ ?predicate ?o ].  # qualifiers   
  BIND( IRI((REPLACE( REPLACE( STR(?predicate), "(direct/|statement/|value/|value-normalized/|qualifier/|reference/)", ""),"prop/","entity/"))) AS ?p).  
}
  FILTER( CONTAINS( STR(?predicate), "/prop/direct/") || CONTAINS( STR(?predicate), "/prop/qualifier/") || CONTAINS( STR(?predicate), "/prop/reference/") 
       || CONTAINS( STR(?predicate), "schema.org") || CONTAINS( STR(?predicate), "w3.org") ) 
  SERVICE <https://query.wikidata.org/sparql> {
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
        ?p rdfs:label ?pLabel .
        ?o rdfs:label ?oLabel .
    }
  }
}
LIMIT 100

Try it!

This reveals some hidden statements like: date Modified / width / height / contentSize (bytes). The query below gives an example how to use it.

# show hidden statements
SELECT ?file ?instance_of ?instance_ofLabel ?inception_date ?dateModified ?width ?height ?contentSize
WHERE { 
  VALUES ?file { sdc:M107651852 }    # an example image
  OPTIONAL{ ?file wdt:P31 ?instance_of. }
  OPTIONAL{ ?file wdt:P571 ?inception_date. }
  OPTIONAL{ ?file schema:dateModified ?dateModified. }
  OPTIONAL{ ?file schema:width ?width. }
  OPTIONAL{ ?file schema:height ?height. }
  OPTIONAL{ ?file schema:contentSize ?contentSize. }
  SERVICE <https://query.wikidata.org/sparql> {
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . 
           ?instance_of rdfs:label ?instance_ofLabel .
    }
  }
}

Try it!

This example with combination of SERVICE - mwapi shows Depicts statements with Dutch labels, of files in one Commons category

SELECT ?file ?title ?depicts ?depicts_label
WITH
{ SELECT ?file ?title
  WHERE
  { SERVICE wikibase:mwapi
    {
      bd:serviceParam wikibase:api "Generator" .
      bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
      bd:serviceParam mwapi:gcmtitle "Category:Historia Naturalis van Rudolf II" .
      bd:serviceParam mwapi:generator "categorymembers" .
      bd:serviceParam mwapi:gcmtype "file" .
      bd:serviceParam mwapi:gcmlimit "max" .
      ?title wikibase:apiOutput mwapi:title .
      ?pageid wikibase:apiOutput "@pageid" .
    }
    BIND (URI(CONCAT('https://commons.wikimedia.org/entity/M', ?pageid)) AS ?file)
  }
} AS %get_files
WHERE
{  INCLUDE %get_files
  ?file wdt:P180 ?depicts .
  service <https://query.wikidata.org/sparql> {
    OPTIONAL {?depicts rdfs:label ?depicts_label FILTER (lang(?depicts_label) = 'nl') } 
    }
}

Try it!

References

[edit | edit source]


Views

This page helps you to understand the various possibilities to display results on Wikidata Query. The query example page contains many good examples you can try!

After running a query, you can choose with the "Display" button which result view you want to display. Depending on your query and the data type, some of the views will be available.

#Countries in European Union with Flags and Population
# to be displayed as
# 1) Table
# 2) Image Grid
# 3) Map
# 4) Bar Chart
# 5) Bubble Chart
SELECT ?country ?countryLabel (MAX(?population) AS ?Population) (SAMPLE(?flag_image) AS ?Flag) (MAX(?coordinate) AS ?coor) 
WHERE {
  wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
  OPTIONAL { ?country wdt:P41 ?flag_image. }
  OPTIONAL { ?country wdt:P36 ?capital. }
  OPTIONAL { ?country wdt:P1082 ?population. }
  OPTIONAL { ?capital wdt:P625  ?coordinate. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                           ?country rdfs:label ?countryLabel             
                         }
}
GROUP BY ?country ?countryLabel
ORDER BY DESC(?Population)
Try it!
Display button. Try the query to the left as Table, Image Grid, Map, Bar Chart and Bubble Chart

Default View

[edit | edit source]

The default view can be changed for each SPARQL query by using a comment

#defaultView:[NAME]

For example:

#defaultView:ImageGrid

Table (default)

[edit | edit source]

Shows the result data as table. It is the default view for results and can display every data type.

This will map every variable from the SPARQL select clause to a formatted table column.

Image Grid

[edit | edit source]

The #defaultView:ImageGrid shows the result data as a grid with images.

Every grid item has a big picture and opens the gallery when clicking on it.

Contents an item is the formatted row data.

Variable Data Type Mapping Description
* Commons Media Grid Item Image columns will create a grid item.

Rows containing no image will be ignored.

Options:

Option name Type Description
hide One or more variable names (strings, starting with ?), single value or array Don’t show these variables in the result.
#Even more cats, with pictures
#added before 2016-10
#defaultView:ImageGrid
SELECT ?itemDescription ?itemLabel ?item ?pic
WHERE
{
	?item wdt:P31 wd:Q146 . 
	OPTIONAL {
		?item wdt:P18 ?pic
	} 
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
Try it!
Image Grid Try Out

The #defaultView:Map shows the result data as map with OpenStreetMap data.

Variable Data Type Mapping Description
* Coordinate Map Marker Columns will create a marker on the map.
Content of that marker is the formatted row data.
?layer Map Marker Layer Creates a colored map layer that allows filtering.
?rgb Color Color Map Marker Example 000000 for black color.

Options:

specified using #defaultView:Map{"hide":["?foo", "?bar"],"layer":"?bar"} using the array if there are multiple variables.

Option name Type Description
hide One or more variable names (strings, starting with ?), single value or array Don’t show these variables in the result.
layer One or more variable names (strings, starting with ?), single value or array Use these variables for the map layer, instead of the default ?layerLabel, ?layer.
markercluster boolean or object
  • If the value is an object, pass these options to Leaflet.markercluster, a map plugin for clustering nearby map markers. (If the value is the empty object {}, Leaflet.markercluster is enabled with its default options.) For example #defaultView:Map{"markercluster": {"maxClusterRadius": 1}}
  • If the value is true (default for queries without map layers), Leaflet.markercluster is enabled with a special set of options that doesn’t cluster any nearby map markers (it only “spiderfies” markers with exactly identical coordinates) and attempts to replicate the default map look as much as possible. The code is #defaultView:Map{"markercluster": "true"}
  • If the value is false (default for queries with map layers), Leaflet.markercluster is not enabled.


#Big cities, grouped into map layers by population
#defaultView:Map
SELECT DISTINCT ?city ?cityLabel (SAMPLE(?location) AS ?location) (MAX(?population) AS ?population) (SAMPLE(?layer) AS ?layer)
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515;
        wdt:P625 ?location;
        wdt:P1082 ?population.
  FILTER(?population >= 500000).
  BIND(
    IF(?population < 1000000, "<1M",
    IF(?population < 2000000, "1M-2M",
    IF(?population < 5000000, "2M-5M",
    IF(?population < 10000000, "5M-10M",
    IF(?population < 20000000, "10M-20M",
    ">20M")))))
    AS ?layer).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?city ?cityLabel
ORDER BY ?population
Try it!
Map with layers Try Out
#Airports in Belgium
#added before 2016-10
#List of airports in Belgium
#defaultView:Map
SELECT DISTINCT ?airport ?airportLabel ?coor ?range
WHERE
{
  ?airport wdt:P31 wd:Q1248784 ;
          ?range wd:Q31;
          wdt:P625 ?coor.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!
Map of airports in Belgium Try Out

Technically, any well-known text literal can be displayed, not just a point coordinate.

Some Geoshapes are available via property geoshape (P3896). See this example for the countries.

#Geoshapes of counties
#defaultView:Map
SELECT DISTINCT ?country ?countryLabel ?geoshape (?countryLabel AS ?layer )
WHERE
{
  ?country wdt:P31 wd:Q3624078.
  ?country wdt:P3896 ?geoshape.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Geometry values can also be constructed in SPARQL using functions like CONCAT() and STRDT() – see this map of Czech railway lines for an example of LINESTRING() geometry.

#defaultView:Map
SELECT ?line ?coord1 ?coord2 ?comm1 ?comm1Label ?comm2 ?comm2Label WHERE {
  ?comm1 p:P31/ps:P31/wdt:P279* wd:Q55488 ; wdt:P17 wd:Q213 ; wdt:P625 ?coord1 ; wdt:P197 ?comm2 .
  MINUS { ?comm1 p:P31 [ ps:P31 wd:Q55488 ; pq:P582 [] ] } .
  MINUS { ?comm2 p:P31 [ ps:P31 wd:Q55488 ; pq:P582 [] ] } .
  MINUS { ?comm1 p:P31 [ ps:P31 wd:Q928830 ] } .
  MINUS { ?comm2 p:P31 [ ps:P31 wd:Q928830 ] } .
  ?comm2 wdt:P625 ?coord2 .
  ?comm1 p:P625 [ ps:P625 []; psv:P625 [ wikibase:geoLongitude ?coord1lon; wikibase:geoLatitude ?coord1lat; ] ] .
  ?comm2 p:P625 [ ps:P625 []; psv:P625 [ wikibase:geoLongitude ?coord2lon; wikibase:geoLatitude ?coord2lat; ] ] .
  BIND(CONCAT('LINESTRING (', STR(?coord1lon), ' ', STR(?coord1lat), ',', STR(?coord2lon), ' ', STR(?coord2lat), ')') AS ?str) .
  BIND(STRDT(?str, geo:wktLiteral) AS ?line) .
  service wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}

Try it!

Line, Area, Bar and Scatter Chart

[edit | edit source]

#defaultView:LineChart, #defaultView:AreaChart, #defaultView:BarChart and #defaultView:ScatterChart shows one of these chart types.

Variable Data Type Mapping Description
* Number
Label
DateTime
X-Axis
Y-Axis
First will be X-axis, second Y-axis
* Label Category / Legend Item (optional) Will only be used after the above are mapped
* Label Storyboard / Animation (optional) Will only be used after the above are mapped

LineChart

[edit | edit source]
#Number of paintings per decade
#defaultView:LineChart
SELECT ?decade (COUNT(?decade) AS ?Paintings) 
WHERE {
  ?item wdt:P31 wd:Q3305213 . # instance of painting
  ?item wdt:P571 ?inception .
  BIND( year(?inception) as ?year ). 
  BIND( ROUND(?year/10)*10 as ?decade ) .
  FILTER( ?year > 1400)
} GROUP BY ?decade
ORDER BY ?decade
Try it!
#Population of countries sharing a border with Germany
#defaultView:LineChart
SELECT ?country  ?year ?population ?countryLabel WHERE {
  {
    SELECT ?country ?year (AVG(?population) AS ?population) WHERE {
      {
        SELECT ?country (str(YEAR(?date)) AS ?year) ?population WHERE {
          ?country wdt:P47 wd:Q183.
          ?country p:P1082 ?populationStatement.
          ?populationStatement ps:P1082 ?population.
          ?populationStatement pq:P585 ?date.
        }
      }
    }
    GROUP BY ?country ?year
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!
Line Chart Try Out

AreaChart

[edit | edit source]
#Cancer deaths per year and cancer type
#defaultView:AreaChart
SELECT ?cid (str(SAMPLE(?year)) AS ?year )  (COUNT(*) AS ?count)  ?cause WHERE {
  ?pid wdt:P31 wd:Q5.
  ?pid wdt:P509 ?cid.
  ?cid wdt:P279* wd:Q12078.
  OPTIONAL {
    ?cid rdfs:label ?cause.
    FILTER((LANG(?cause)) = "en")
  }
  ?pid wdt:P570 ?_date_of_death. 
  BIND ( YEAR(?_date_of_death) AS ?year )
  FILTER( BOUND(?cause)  )
  FILTER( BOUND(?year)  )
  FILTER( ?year > 1960 )
}
GROUP BY ?cid ?cause ?year
Try it!
Area Chart Try Out

BarChart

[edit | edit source]
#Battles per year per continent and country last 80 years (animated)
#defaultView:BarChart
SELECT (SAMPLE(?_continentLabel) AS ?contient) (COUNT(?_country) AS ?battles) (SAMPLE(?_countryLabel) AS ?_countryLabel) (SAMPLE(?year) AS ?year)  WHERE {
  ?subj wdt:P31 wd:Q178561.
  OPTIONAL { ?subj wdt:P580 ?d1. }
  OPTIONAL { ?subj wdt:P585 ?d2. }
  OPTIONAL { ?subj wdt:P582 ?d3. }
  BIND(IF(!BOUND(?d1), IF(!BOUND(?d2), ?d3, ?d2), ?d1) AS ?date)
  BIND(str(YEAR(?date)) AS ?year)
  ?subj wdt:P276 ?_loc.
  ?_loc wdt:P17 ?_country.
  ?_country wdt:P30 ?_continent.
  ?_country rdfs:label ?_countryLabel.
  ?_continent rdfs:label ?_continentLabel.
  BIND((NOW()) - ?date AS ?distance)
  FILTER(BOUND(?year))
  FILTER((LANG(?_countryLabel)) = "en")
  FILTER((LANG(?_continentLabel)) = "en")
  FILTER((0 <= ?distance) && (?distance < 31 * 12 * 80))
}
GROUP BY ?year ?_country
ORDER BY ?_countryLabel
Try it!
Bar Chart (Animated)Try Out

ScatterChart

[edit | edit source]
# Scatter Chart river watershed area in km² plotted over length in km
PREFIX wikibaseβ: <http://wikiba.se/ontology-beta#>
#defaultView:ScatterChart
# river watershed area in km² plotted over length in km, using new normalized units support
# (rivers shorter than 500 km excluded to avoid crashing the browser when trying to render too many results)
SELECT ?river (SAMPLE(?length) AS ?length) (SAMPLE(?area) AS ?area) ?riverLabel WHERE {
  ?river wdt:P31/wdt:P279* wd:Q355304;
         p:P2043/psn:P2043/(wikibase:quantityAmount|wikibaseβ:quantityAmount) ?lengthInMetres;
         p:P2053/psn:P2053/(wikibase:quantityAmount|wikibaseβ:quantityAmount) ?areaInSquareMetres.
  BIND(?lengthInMetres / 1000 AS ?length).
  BIND(?areaInSquareMetres / 1000000 AS ?area).
  FILTER(?length > 500).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?river ?riverLabel
Try it!
#Number of films by year and genre
#defaultView:ScatterChart
SELECT   ?year  (COUNT(?_genre) AS ?count ) (SAMPLE(?_genreLabel) AS ?label )  (?year as ?year_shown) WHERE {
  ?item wdt:P31 wd:Q11424.
  ?item wdt:P577 ?_publication_date.
  ?item wdt:P136 ?_genre.
  ?_genre rdfs:label ?_genreLabel.
  BIND(str(YEAR(?_publication_date)) AS ?year)
  FILTER((LANG(?_genreLabel)) = "en")
  FILTER (?_publication_date >= "2000-00-00T00:00:00Z"^^xsd:dateTime)
}
GROUP BY ?_genreLabel ?year
HAVING (?count > 30)
Try it!
Scatter Chart (animated) Try Out

Bubble Chart

[edit | edit source]

The #defaultView:BubbleChart shows a bubble chart where the size and the color of the bubble can configured.

When an item is provided a bubble is linked to Wikidata.org.

Variable Data Type Mapping
* Label Bubble Label
* Number Bubble Size
* Item Bubble Link (optional)
?rgb Color Bubble Color (optional)
#Overall causes of death ranking
#added before 2016-10
#defaultView:BubbleChart
SELECT ?cid ?cause (count(*) as ?count)
WHERE
{
	?pid wdt:P31 wd:Q5 .
	?pid wdt:P509 ?cid .
	OPTIONAL {
		?cid rdfs:label ?cause 
		filter (lang(?cause) = "en")
	}
}
GROUP BY ?cid ?cause
ORDER BY DESC(?count) ASC(?cause)
Try it!
Bubble Chart Try Out

Tree Map

[edit | edit source]

#defaultView:TreeMap shows a zoomable tree map.

Variable Data Type Mapping Description
* Label Label 1st label in a row will be 1st level in tree map.
2nd label will be 2nd level.
3rd label will be 3rd level etc..
* Item Link (optional)
#Popular television series (in treeview)
#added before 2016-10
#defaultView:TreeMap
SELECT ?show ?showLabel  ?season ?seasonLabel ?episode ?episodeLabel
WHERE 
{
	{	BIND(wd:Q886 as ?show) .      # The Simpsons
		?season wdt:P361 ?show .
		?episode wdt:P361 ?season .
	} UNION 
	{	BIND(wd:Q16538 as ?show) .    # South Park
		?season wdt:P361 ?show .
  		?episode wdt:P361 ?season .
	} UNION
	{	BIND(wd:Q147235 as ?show) .   # How I Met Your Mother 
		?season wdt:P361 ?show .
  		?episode wdt:P361 ?season .
	} 
	SERVICE wikibase:label {  bd:serviceParam wikibase:language "en" }
}
Try it!
Tree Map Try Out

#defaultView:Tree shows an expandable tree.

Variable Data Type Mapping Description
* Item Node ID First item will be a root node
The following items in the same row will be nested.
* Label Node Label (optional) Labels will be appended to current node.
* Number Node Label (optional)
* Commons Media Item image
#defaultView:Tree
SELECT ?continent ?continentFlag ?continentLabel ?country ?countryLabel ?countryFlag ?region ?regionLabel ?regionFlag ?city ?cityLabel ( 'Population:' as ?popLabel )  ?pop  ?cityImage 
WHERE
{
  {
    SELECT * WHERE {
      ?continent wdt:P31 wd:Q5107.
      ?country wdt:P30 ?continent.
      ?country wdt:P31 wd:Q6256.
      ?country wdt:P150 ?region.
      OPTIONAL {
        ?continent wdt:P242 ?continentFlag.
        ?country wdt:P41 ?countryFlag.
        ?region wdt:P41 ?regionFlag.
      }
      OPTIONAL {
        ?region wdt:P36 ?city.
        ?city wdt:P31 wd:Q515.
        ?city wdt:P1082 ?pop.
        ?city wdt:P18 ?cityImage.
      }
    }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!
Tree Try Out

Timeline

[edit | edit source]

#defaultView:Timeline shows an explorable timeline.

Variable Data Type Mapping Description
* DateTime Item Point in Time
Item End Point in Time
First date in a row sets the item position in the timeline.
If a second date is present a start and end date will be set.
* Commons Media Item image

Options:

Option name Type Description
hide One or more variable names (strings, starting with ?), single value or array Don’t show these variables in the result.
#defaultView:Timeline
SELECT ?item ?itemLabel ?launchdate (SAMPLE(?image) AS ?image)
WHERE
{
    ?item wdt:P31 wd:Q26529 .      # Space probe
    ?item wdt:P619 ?launchdate .
    OPTIONAL { ?item wdt:P18 ?image }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?item ?itemLabel ?launchdate
Try it!
Timeline (of an other query) Try Out

Dimensions

[edit | edit source]

#defaultView:Dimensions shows dimension and their links.

It also allows filtering them by selecting an axis fraction.

Variable Data Type Mapping Description
* Label Dimension Every column in the result row is mapped to a dimension
* Number
* DateTime
#defaultView:Dimensions
SELECT ?elementLabel ?_boiling_point ?_melting_point ?_electronegativity ?_density ?_mass WHERE {
  ?element wdt:P31 wd:Q11344.
  ?element wdt:P2102 ?_boiling_point.
  ?element wdt:P2101 ?_melting_point.
  ?element wdt:P1108 ?_electronegativity.
  ?element wdt:P2054 ?_density.
  ?element wdt:P2067 ?_mass. 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
Try it!
Dimensions Try Out

Graph

[edit | edit source]

Shows an interactive graph that allow hierarchically ordering and exploring items.

Variable Data Type Mapping Description
* Item Node ID First item in a row will have an arrow pointing to the items in the same row
* Label Node Label (optional)
* Commons Media Node Image (optional)
?rgb Color Node Color (optional)
* Number Node Size (optional)
#US presidents and spouses
#defaultView:Graph
SELECT ?p ?pLabel ?ppicture ?w ?wLabel ?wpicture
WHERE 
{
   wd:Q30 p:P6/ps:P6 ?p .
   ?p wdt:P26 ?w .
   OPTIONAL{
    ?p wdt:P18 ?ppicture .
    ?w wdt:P18 ?wpicture .
   }
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!
Graph of US presidents and spouses Try Out
#Children of Genghis Khan
#added before 2016-10
#defaultView:Graph
PREFIX gas: <http://www.bigdata.com/rdf/gas#>
SELECT ?item ?itemLabel ?pic ?linkTo ?linkToLabel
WHERE
{
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" ;
                gas:in wd:Q720 ;
                gas:traversalDirection "Forward" ;
                gas:out ?item ;
                gas:out1 ?depth ;
                gas:maxIterations 4 ;
                gas:linkType wdt:P40 .
  }
  OPTIONAL { ?item wdt:P40 ?linkTo }
  OPTIONAL { ?item wdt:P18 ?pic }
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en,ru,ja,zh,az,ca" }
}
Try it!
Graph of children of Genghis Khan Try Out

Additional Information

[edit | edit source]

RDF Data Type Mapping

[edit | edit source]
RDF Type RDF Data Type Constraint Data Type
URI String starts with http://www.wikidata.org/entity/Q Entity
URI String starts with http://commons.wikimedia.org/wiki/Special:FilePath Commons Media
Literal http://www.w3.org/2001/XMLSchema#dateTime DateTime
Literal http://www.opengis.net/ont/geosparql#wktLiteral Coordinate
Literal

http://www.w3.org/2001/XMLSchema#double
http://www.w3.org/2001/XMLSchema#float
http://www.w3.org/2001/XMLSchema#decimal
http://www.w3.org/2001/XMLSchema#integer
http://www.w3.org/2001/XMLSchema#long
http://www.w3.org/2001/XMLSchema#int
http://www.w3.org/2001/XMLSchema#short
http://www.w3.org/2001/XMLSchema#nonNegativeInteger
http://www.w3.org/2001/XMLSchema#positiveInteger
http://www.w3.org/2001/XMLSchema#unsignedLong
http://www.w3.org/2001/XMLSchema#unsignedInt
http://www.w3.org/2001/XMLSchema#unsignedShort
http://www.w3.org/2001/XMLSchema#nonPositiveInteger
http://www.w3.org/2001/XMLSchema#negativeInteger

Number
Literal Has no RDF Data Type Label
Literal Has property xml:lang Label
Literal String matches /^[0-9A-F]{6}$/ Color

Formatting

[edit | edit source]

There is special formatting for the following data types:

Data Type Format
Entity Link to Wikidata.org
Label is shortened to wd:Qxx
Explore an item (magnify icon)
DateTime MM DD, YYYY
Tooltip shows raw ISO timestamp
Commons Media Link to Commons.Wikimedia.org
Displaying Gallery ( image icon )

Mobile / Responsive

[edit | edit source]

Result views are made to work on different devices and resolutions. Their appearance may change depending on the space they can use.

Embed Mode

[edit | edit source]

Result views can be linked or embedded via iframes. You can get the regarding link or code by clicking Link in the above menu. See the difference: link to a query and link to a result

or the embed code:

<iframe style="width:80vw; height:50vh;" scrolling="yes" frameborder="0" src="http://tinyurl.com/zwf4k2w">

Download results

[edit | edit source]
Download results buttons

There are several ways to export the results of a query.

JSON, TSV and CSV are three different open-standard formats to download the data.

You can also download some of the graphic views as a SVG file: this feature works with all the views except table, image grid, timeline, graph builder, map, and graph.

Default View

[edit | edit source]

The default view can be changed for each SPARQL query by using a comment:

#defaultView:[NAME]

For example:

#defaultView:ImageGrid

Result view options

[edit | edit source]

Result views can be configured with various options, which are specified in JSON (Q2063) format, either following the defaultView comment or following a separate view comment.

#defaultView:Map{"layer": "?typeLabel"}
#view:ImageGrid{"hide": "?coords"}

External tools

[edit | edit source]

See Wikidata:Tools/Visualize data for a list of external tools to visualize data.

References

[edit | edit source]


FAQ

The most popular questions-and-answers can be obtained through external links:


  1. You can see, by example, that these 2 queries don't give exactly the same values. Some values are missing in the second one.
    #Humans with height > 2,25 m
    SELECT ?taillem ?item
    WHERE
    {?item wdt:P31 wd:Q5 ;
           p:P2048 [psv:P2048 ?t ] .
      ?t wikibase:quantityAmount ?taille .
      ?t wikibase:quantityUnit/p:P2370/psv:P2370 [wikibase:quantityAmount ?conversion ; wikibase:quantityUnit wd:Q11573] .
      BIND(?taille * ?conversion AS ?taillem).
     filter(?taillem > 2.25)
     }
    order by desc (?taillem)
    
    Try it!
    #Humans with height > 2,25 m
    SELECT ?taille ?item
    WHERE
    {?item wdt:P31 wd:Q5 ;
           p:P2048/psn:P2048 [wikibase:quantityAmount ?taille ].
     filter(?taille > 2.25)
     }
    order by desc (?taille)
    
    Try it!