XPath

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/XPath

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Basic Syntax

Basic XPath Syntax

Expressions that start with a forward slash "/" are called absolute expressions. They start at the root of the document. All other expressions are relative to the current position within an XML document.

Expressions are built by chaining step expressions, separated by slashes, in the form

  step[predicate]/step[predicate]/step[predicate]

You can think of the predicate as a filter or conditional expression that serves the same purpose as a WHERE clause in SQL.
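As a rough illustration of how a chain of steps and predicates narrows a result, the sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The sample data and the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not the real books.xml file.
doc = ET.fromstring(
    "<books>"
    "<book><title>XQuery</title><format>wikibook</format></book>"
    "<book><title>XSLT</title><format>print</format></book>"
    "</books>"
)

# Two steps: "book" filtered by a predicate, then "title".
# The predicate plays the role of: WHERE format = 'wikibook'
titles = [t.text for t in doc.findall("./book[format='wikibook']/title")]
print(titles)  # ['XQuery']
```

The path is written relative to the books element because ElementTree evaluates expressions from the element it is called on.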

Sample XML file

Many of the examples use a "books" example such as the following:

http://raw.github.com/dmccreary/learn-xquery/master/data/books.xml

In general the books file has the following structure:

<books>
  <book>
    <title>XQuery</title>
    <format>wikibook</format>
  </book>
</books>

Basic XPath Expressions

The root document node

Note that a forward slash on its own (/) returns the document root node, not the books element.

The root node that contains all the books:

 /books
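The difference between the document root and the root element can be seen in any DOM-style API. The following is a minimal sketch using Python's standard xml.dom.minidom module; the one-book document is an assumption made for the example.

```python
from xml.dom import minidom

doc = minidom.parseString("<books><book><title>XQuery</title></book></books>")

# "/" corresponds to the document node, which has no tag name of its own.
is_document_node = doc.nodeType == doc.DOCUMENT_NODE

# "/books" corresponds to the single element child of the document node.
root_element = doc.documentElement

print(is_document_node)      # True
print(root_element.tagName)  # books
```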

All book elements:

 /books/book
 //book

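The first expression spells out the full path from the root. The second uses the // abbreviation (descendant-or-self) to select book elements at any level of the document.

Note that the first expression is usually faster on unindexed XML, because // has to visit the whole tree, whereas an indexed native XML database can often answer the second form more quickly.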

A count of the number of books:

  count(//book)

All the book titles:

  //book/title

The second book in the collection:

  //book[2]

The title of the second book:

  //book[2]/title

The third author of the second book:

  //book[2]/author[3]

All books with the format "wikibook":

  //book[format='wikibook']

A list of all the publishers:

  //publisher

A distinct list of the publishers (duplicates removed; distinct-values() requires XPath 2.0 or later):

  distinct-values(//publisher)

Books that have at least one list price over 30:

  //book[list-price > 30]
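Several of these expressions can be tried out with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree implements a small subset of XPath, so functions such as count() and distinct-values() and the numeric comparison are emulated in plain Python here. The three-book document is made up for the example and is not the real books.xml file.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the element names follow the expressions in this section.
root = ET.fromstring("""
<books>
  <book>
    <title>XQuery</title><publisher>Wikibooks</publisher>
    <list-price>0</list-price>
  </book>
  <book>
    <title>XPath Basics</title><publisher>Example Press</publisher>
    <author>A</author><author>B</author><author>C</author>
    <list-price>35.50</list-price>
  </book>
  <book>
    <title>XSLT Notes</title><publisher>Example Press</publisher>
    <list-price>29.99</list-price>
  </book>
</books>
""")

books = root.findall(".//book")                       # //book
book_count = len(books)                               # count(//book)
second_title = root.findtext("./book[2]/title")       # //book[2]/title
third_author = root.findtext("./book[2]/author[3]")   # //book[2]/author[3]

# distinct-values(//publisher): duplicates are removed in Python,
# keeping the order in which the publishers first appear.
publishers = list(dict.fromkeys(p.text for p in root.iter("publisher")))

# //book[list-price > 30]: the numeric comparison is also done in Python.
expensive = [
    b.findtext("title")
    for b in books
    if any(float(p.text) > 30 for p in b.findall("list-price"))
]
```

A full XPath engine would evaluate the original expressions directly; the emulation only shows what each one is expected to return.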

XPath abbreviations

. represents the current node

.. represents the parent of the current node

@ abbreviates the attribute axis (@id is short for attribute::id)

$ marks a variable reference

[n] selects the n-th node matched by the step it follows (short for [position() = n])

ancestor::div represents the set of ancestor div elements (the parent, its parent, and so on)

normalize-space(firstname)="Paul" matches Paul regardless of leading or trailing whitespace

boolean(string($myvar)) checks for empty strings (it is false when the string is empty)

/ represents the root node of the document

@* represents all attributes of the current node

-Return all attributes and child nodes of the current node using a union (text nodes are already covered by node(), so text() is listed only for emphasis):

@*|node()|text()

-Return all of a node's siblings using a union of the preceding-sibling and following-sibling axes:

preceding-sibling::node() | following-sibling::node()

-Return the following siblings of a specific type (append [1] to keep only the nearest one):

//div/following-sibling::h3

-Check string value of current node

[. = "Matthew Bob"]

-Node identity can be checked with the count() function. The union of two node-sets contains no duplicates, so if both sets hold the same nodes, the count of their union equals the count of either set (1 when each set is a single node). For example, the following expression returns true because both paths select the same books element:

count(/bk:books | /bk:books/bk:book[1]/parent::*) = 1
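The same idea can be sketched in Python, where a set of element objects behaves like a node-set: adding a node that is already present does not change the size. The bk: namespace prefix is left out, and the parent_of lookup table is a helper introduced for this example because ElementTree elements do not store a reference to their parent.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<books><book/><book/></books>")

# Hypothetical helper: map every child element to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}

first_book = root.find("./book[1]")
a = {root}                     # /books
b = {parent_of[first_book]}    # /books/book[1]/parent::*

# A union removes duplicates, so a size of 1 means both sets hold the same node.
same_node = len(a | b) == 1

# Two different book elements give a union of size 2.
all_books = root.findall("./book")
different_nodes = len({all_books[0]} | {all_books[1]}) == 2
```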

CSS Equivalents

:disabled Equivalent

//*[@disabled] represents :disabled

:checked Equivalent

//*[@checked] represents :checked

:selected Equivalent

//*[@selected] represents :selected

:text Equivalent

//*[@type="text"] represents :text

:contains Equivalent

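//*[contains(.,"you")] represents :contains("you") (using . tests the whole text content of the element, whereas text() would test only its first text node in XPath 1.0)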

Attribute Contains Equivalent

//p[contains(@me,"you")] represents p[me*="you"]

Starts with Equivalent

//p[starts-with(@me,"you")] represents p[me^="you"]

Hyphen-Separated Prefix Equivalent

//p[@me="you" or starts-with(@me,"you-")] represents p[me|="you"]

Ends with Equivalent

//p[substring(@me,string-length(@me)-2)="you"] represents p[me$="you"]
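The offset 2 in this expression is the length of "you" minus one, because XPath counts string positions from 1. A small Python sketch of the arithmetic, with function names made up for the example:

```python
def xpath_substring(s: str, start: int) -> str:
    """Emulate XPath 1.0 substring(s, start) for integer start positions."""
    # XPath positions are 1-based; positions below 1 behave like position 1.
    return s[max(start, 1) - 1:]


def ends_with(value: str, suffix: str) -> bool:
    # start = string-length(value) - string-length(suffix) + 1
    start = len(value) - len(suffix) + 1
    return xpath_substring(value, start) == suffix


print(ends_with("thank you", "you"))  # True
print(ends_with("your", "you"))       # False
```

For a suffix of another length, the offset in the XPath expression has to be adjusted in the same way. XPath 2.0 and later provide an ends-with() function that avoids the calculation.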

Whitespace-Separated Word Equivalent

//p[contains(concat(" ",@me, " ")," you ")] represents p[me~="you"]

Negation Attribute Equivalent

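//p[not(@me="you")] represents p[me!="you"] (a jQuery extension rather than standard CSS; unlike @me!="you", this form also matches p elements that have no me attribute)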

ID Equivalent

//p[@id="me"] represents p#me

:not Equivalent

//p[not(@id="me")] represents p:not(#me)

Class Equivalent

//p[contains(concat(" ", @class, " "), " me ")] represents p.me
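The padding with spaces is what keeps the test from matching partial class names. A minimal Python sketch of the same check follows; has_class is a name made up for the example.

```python
def has_class(class_attr: str, name: str) -> bool:
    # Mirrors contains(concat(" ", @class, " "), concat(" ", name, " "))
    return f" {name} " in f" {class_attr} "


# A plain substring test gives a false positive: "home" contains "me".
naive_match = "me" in "home page"

print(naive_match)                      # True
print(has_class("home page", "me"))     # False
print(has_class("big me bold", "me"))   # True
```

This assumes class names are separated by single spaces. If tabs or newlines may occur, wrapping @class in normalize-space() first makes the XPath version more robust.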

Descendant Equivalent

//div//p represents div p

Child Equivalent

//div/p represents div > p

Adjacent Sibling Equivalent

//h1/following-sibling::*[1][self::div] represents h1 + div

General Sibling Equivalent

//h1/following-sibling::div represents h1 ~ div
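The difference between the adjacent (+) and general (~) sibling combinators can be sketched in Python. The tiny document and the following_siblings helper are assumptions made for this example; ElementTree has no sibling axes, so the helper emulates following-sibling::* by slicing the parent's child list.

```python
import xml.etree.ElementTree as ET

body = ET.fromstring("<body><h1/><p/><div id='a'/><h2/><div id='b'/></body>")


def following_siblings(parent, node):
    """Emulate following-sibling::* for a child of parent."""
    kids = list(parent)
    return kids[kids.index(node) + 1:]


h1 = body.find("h1")
following = following_siblings(body, h1)

# h1 ~ div: every following sibling that is a div.
general = [e.get("id") for e in following if e.tag == "div"]

# h1 + div: only the immediately following sibling, and only if it is a div.
adjacent = [e.get("id") for e in following[:1] if e.tag == "div"]

print(general)   # ['a', 'b']
print(adjacent)  # []
```

Here the adjacent result is empty because the element directly after h1 is a p, not a div.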

CSS3 / jQuery Equivalents

hasChildNodes query

//*[count(child::node() ) > 0] represents hasChildNodes

:root query

/*[1] represents :root

:first-child query

//*[1] represents :first-child

:last-child query

//*[last()] represents :last-child

:only-child query

//*[last()=1] represents :only-child

:empty query

//*[not(*) and not(text())] represents :empty (count(*) = 0 alone would also match elements that contain only text)

:nth-child(n) query

//*[position() = n] represents :nth-child(n)

:nth-child(odd) query

//*[(position() mod 2)=1] represents :nth-child(odd)

:nth-child(even) query

//*[(position() mod 2)=0] represents :nth-child(even)

:nth-last-child(n) query

//*[position() = last() - n + 1] represents :nth-last-child(n)

:nth-of-type(n) query

//p[n] represents :nth-of-type(n)

:nth-last-of-type(n) query

//p[position() = last() - n + 1] represents p:nth-last-of-type(n)
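The positional tests are easier to follow with concrete numbers. The sketch below stands in for five sibling elements with a plain Python list and emulates position() and last(); the select helper and the sample values are assumptions made for the example.

```python
items = ["a", "b", "c", "d", "e"]   # five sibling elements
last = len(items)                   # plays the role of last()


def select(test):
    """Keep the items whose 1-based position passes the test."""
    return [item for position, item in enumerate(items, start=1) if test(position)]


n = 2
nth_child = select(lambda position: position == n)             # :nth-child(2)
odd = select(lambda position: position % 2 == 1)               # :nth-child(odd)
even = select(lambda position: position % 2 == 0)              # :nth-child(even)
nth_last = select(lambda position: position == last - n + 1)   # :nth-last-child(2)

print(nth_child)  # ['b']
print(odd)        # ['a', 'c', 'e']
print(even)       # ['b', 'd']
print(nth_last)   # ['d']
```

Counting from the end uses last() - n + 1 because the last item is at position last(), the one before it at last() - 1, and so on.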

:first-of-type query

//p[1] represents p:first-of-type

:last-of-type query

//p[last()] represents :last-of-type

:only-of-type query

//p[last()=1] represents p:only-of-type

-moz-any/-webkit-any query

//*[local-name()='h1' or local-name()='h4']//* represents :-moz-any(h1,h4) *

DOM Equivalents

//mytag represents getElementsByTagName

//*[contains(concat(" ", normalize-space(@class), " "), concat(" ", $myclass, " "))] represents getElementsByClassName

child::node() represents childNodes

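preceding-sibling::node()[1] represents previousSibling (use * instead of node() for previousElementSibling)

following-sibling::node()[1] represents nextSibling (use * instead of node() for nextElementSibling)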

./following-sibling::* represents generalSibling

Conditional Logic Equivalents

-A predicate is like a SQL where clause

-A pipe is like a SQL union clause

-An axis is like a SQL t1.col = t2.col

-Use a predicate with a boolean variable check as an if statement

-Use a pipe with a tagname search as a range checker

//h2 | //h3 | //h4

-Use a pipe with a negation predicate variable check as an else statement

//var[1] | //var[not //var]

-Use a repeating axis to skip levels in the tree to retrieve nodes at every other branch

child::*/child::*

-Use variables to store individual checks within complex and/or conditional tests

-Use variables within loops to store iteration dependent variables and separate the logic from the output

-Use string-length to test the existence of functions

-Use separate tests like (a and c) or (b and c) instead of nested conditions like ((a or b) and c)

-Use string(.), local-name(.),string-length(concat(., '') ), number(.), and boolean(.)[boolean(.)] to test node values, names and existence

-Use //node()[local-name(.) = $myvar] to test for the existence of form values

-Use //node()[local-name(.) = $myvar][boolean(node())] to skip empty form values

-Always skip empty nodes when debugging

-Use local-name(.)[boolean(.)] to test for empty tags in the context node

-Use boolean(@*[not(local-name() = 'aa' or local-name() = 'bb')]) to filter out known attributes

-Use {boolean(string($myvar))} to test interpolated variables

-Use boolean(following::*[1] or following::. or following::self::*) to test closing tag failures

-Use count(//*[count(*) = 1]) to count nodes with only one child

-Use (table1 | table2)[col=val]/* to do a join

-Use *[not(@*)] to return nodes that have no attributes

-Use number(.) - number(.) to suppress number values

-Use substring(., 0, string-length(.) ) to suppress string values

-Use substring('0', 1, not($myvar) ) to set an undefined variable to zero

-Use normalize-space($myvar) to collapse tab characters and other whitespace in lists

-Use translate($myvar, $ABCvar, $abcvar) with variables storing A-Z and a-z to ignore case for node queries

-Use string-length instead of contains when checking for list position to make logic data independent

-Use use-attribute-sets to share boilerplate arguments with multiple elements, such as table or list rows

-Use * or self::* whenever selecting to return a nodeset or single node

-Use string(.) to get a node from an XML file called with the document function

-Use different delimiters between each list item to make substring-before/after logic more readable

-Use string concatenation of node/class names with counter/node values to generate ID attributes

-Use a variable to store the previous index value with a comma to make substring-after work with recursion

-Use a predicate with the boolean value of the node as a guard operator

$myvar[$myvar]

-Use XSL nodes to store local values and test XML nodes in predicates with a pipe to emulate a default operator $dynamic[$var] | $default[not($var)]

-Use the child axis instead of slash for grouping child nodes

child::*[self::boy or self::girl]

-Avoid (*) because it walks the tree before testing child nodes

-Use node() instead of . when searching all elements using //

-Use a predicate check of the generate-id of the node itself and a node variable to do intersect and except set operations

-Use concat with a node and a dummy param to check for node existence

-Use string(number(.)) != 'NaN' to check that a node holds a numeric value

-Use not($a=$b) instead of $a !=$b when comparing variables that contain more than one node

-Use newlines after parentheses to avoid leaving one open ended

-Use predicates to check if form field names exist in the XML doc

-Use qname to get the namespace binding of a tag

-Use id() to get generate-id values instead of variable interpolation
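The tip about preferring not($a=$b) to $a != $b follows from the way XPath 1.0 compares node-sets: a comparison is true if it holds for at least one pair of nodes. A minimal Python sketch of these semantics, with lists of strings standing in for the string values of the nodes (the function names are made up for the example):

```python
def xpath_eq(a, b):
    # $a = $b: true if ANY value in a equals ANY value in b
    return any(x == y for x in a for y in b)


def xpath_ne(a, b):
    # $a != $b: true if ANY value in a differs from ANY value in b
    return any(x != y for x in a for y in b)


a = ["red", "green"]
b = ["green", "blue"]

print(xpath_eq(a, b))      # True: "green" occurs in both
print(xpath_ne(a, b))      # True: "red" differs from "green"
print(not xpath_eq(a, b))  # False: the sets do share a value
```

Both = and != are true for the same pair of sets here, so only not($a=$b) answers the question of whether the two sets have no value in common.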

SQL Equivalents

-XPath cannot do join-like queries, but can do union, intersection, subset, and difference like SQL

-XPath supports set operations like SQL using variations of the Union operation and the count function:

a UNION b: $a | $b

b UNION c: $b | $c

a INTERSECTION b: $a[count(.|$b) = count($b)]

a INTERSECTION c: $a[count(.|$c) = count($c)]

(Intersection takes the union of each node in $a with $b; if the count does not change, the node is already in $b, so the result is the set of nodes in $a that are also in $b)

a DIFFERENCE b: $a[count(.|$b) != count($b)]

a DIFFERENCE c: $a[count(.|$c) != count($c)]

(Difference keeps each node in $a whose union with $b or $c is larger than $b or $c alone, and so returns the set of nodes in $a that are not in $b or $c)

a SYM DIFFERENCE b: $a[count(. | $b) != count($b)] | $b[count(. | $a) != count($a)]

(Symmetrical difference takes the union of the differences from both sides and returns the set of nodes that belong to exactly one of $a and $b)

a SUBSET OF b: count($b | $a) = count($b) and count($b) > count($a)

b SUBSET OF a: count($b | $a) = count($a) and count($a) > count($b)

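(Subset means that the union of $a with $b has the same number of nodes as the larger set; the second comparison requires the containing set to be strictly larger, so these tests check for a proper subset)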
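The counting trick behind these expressions can be checked with a short Python sketch. Strings stand in for nodes and Python sets emulate the duplicate-free union; the function names and sample values are assumptions made for this example.

```python
def intersection(a, b):
    # $a[count(. | $b) = count($b)]
    return [x for x in a if len({x} | set(b)) == len(set(b))]


def difference(a, b):
    # $a[count(. | $b) != count($b)]
    return [x for x in a if len({x} | set(b)) != len(set(b))]


def symmetric_difference(a, b):
    # Union of the two one-sided differences.
    return difference(a, b) + difference(b, a)


def proper_subset(a, b):
    # count($b | $a) = count($b) and count($b) > count($a)
    return len(set(a) | set(b)) == len(set(b)) and len(set(b)) > len(set(a))


a = ["n1", "n2", "n3"]
b = ["n2", "n3", "n4"]

print(intersection(a, b))          # ['n2', 'n3']
print(difference(a, b))            # ['n1']
print(symmetric_difference(a, b))  # ['n1', 'n4']
print(proper_subset(["n2"], b))    # True
```

An XPath union returns its nodes in document order, whereas the list concatenation in symmetric_difference simply appends the second difference to the first.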

XPath can be embedded in an XPointer to make a smart URL:

http://www.abcpub.co.uk/sitemap.xml#xpointer(//url)
