XML - Managing Data Exchange/XPath

From Wikibooks, open books for an open world






Learning objectives

  • Be able to conceptualize an XML document as a node tree
  • Refer to groups of elements in an XML document
  • Understand the differences between abbreviated and unabbreviated XPath syntax
  • Understand the differences between absolute and relative paths
  • Be able to use XPath predicates and functions to refine an XPath's node-set

Introduction

In the previous chapters you learned the basic concepts of XSL and how to refer to nodes in an XML document when performing an XSL transformation. The straightforward syntax you have used so far to refer to nodes is already XPath, but the language has many more functions and capabilities, which you will learn in this chapter. As you come to understand how a path language is used to refer to nodes in an XML document, your understanding of XML as a tree structure will fall into place. This chapter contains examples that demonstrate many of the common uses of XPath; for the full XPath specification, see the latest version of the standard at:

http://www.w3.org/TR/xpath

XSL uses XPath heavily.

When you copy a file or ‘cd’ into a directory at a command prompt, you often type something along the lines of ‘/home/darnell/’ to refer to folders. This lets you change into or refer to folders anywhere in your computer’s file system. XML has a similar way of referring to elements in an XML document. This special syntax is called XPath, which is short for XML Path Language.

XPath is a language for finding information in an XML document. XPath is used to navigate through elements and attributes in an XML document.

XPath, although used for referring to nodes in an XML tree, is not itself written in XML. This was a wise choice on the part of the W3C, because trying to specify path information in XML would be very cumbersome. Any characters that form XML syntax would need to be escaped so that they are not confused with XML when being processed. XPath is also very succinct, allowing you to select nodes in the XML tree with a great degree of specificity without being unnecessarily verbose.

XML as a tree structure

A great benefit of XML is that the document itself describes the structure of its data. If you have researched your family history, you have probably come across a family tree. At the top of the tree is some early ancestor and at the bottom are the most recent children.

With a tree structure you can see which children belong to which parents, which grandchildren belong to which grandparents and many other relationships.

The neat thing about XML is that it also fits nicely into this tree structure, often referred to as an XML Tree.

Understanding node relationships

We will use the following example to demonstrate the different node relationships.

<bookstore>
	<book>
		<title>Less Than Zero</title>
		<author>Bret Easton Ellis</author>
		<year>1985</year>
		<price>13.95</price>
	</book>
</bookstore>

Parent
Each element and attribute has one parent.
The book element is the parent of the title, author, year, and price elements.
Children
Element nodes may have zero, one or more children.
The title, author, year, and price elements are all children of the book element.
Siblings
Nodes that have the same parent.
The title, author, year, and price elements are all siblings.
Ancestors
A node's parent, parent's parent, etc.
The ancestors of the title element are the book element and the bookstore element.
Descendants
A node's children, children's children, etc.
The descendants of the bookstore element are the book, title, author, year, and price elements.

It is also useful to keep thinking of an XML file as serialized text, the way you would view it in an XML editor, so that you can understand the concepts of preceding and following nodes. A node precedes another if it comes before that node in document order; likewise, a node follows another if it comes after that node in document order. Ancestors and descendants are not considered to be either preceding or following a node. This concept will come in handy later when discussing axes.
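
The relationships above can be checked with a short script. This is only a sketch, assuming Python's standard xml.etree.ElementTree module; the helper names (parent_of, ancestors, descendants, siblings) are made up for this illustration and are not part of XPath or of the library. It models elements only, so the root node that sits above <bookstore> in the XPath data model does not appear among the ancestors.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book>
    <title>Less Than Zero</title>
    <author>Bret Easton Ellis</author>
    <year>1985</year>
    <price>13.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)  # the <bookstore> element

# ElementTree elements keep no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """Element ancestors of node, nearest first."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found

def descendants(node):
    """Every element below node, in document order."""
    return [n for n in node.iter() if n is not node]

def siblings(node):
    """The other children of node's parent."""
    parent = parent_of.get(node)
    return [] if parent is None else [n for n in parent if n is not node]

title = root.find("book/title")
book = parent_of[title]

print(book.tag)                             # -> book
print([c.tag for c in book])                # -> ['title', 'author', 'year', 'price']
print([s.tag for s in siblings(title)])     # -> ['author', 'year', 'price']
print([a.tag for a in ancestors(title)])    # -> ['book', 'bookstore']
print([d.tag for d in descendants(root)])   # -> ['book', 'title', 'author', 'year', 'price']
```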

Abbreviated vs. Unabbreviated XPath syntax

XPath was created so that nodes can be referred to very succinctly, while retaining the ability to search on many options. Most uses of XPath involve searching for child nodes, parent nodes, or attribute nodes of a particular node. Because these uses are so common, an abbreviated syntax can be used to refer to these commonly searched nodes. Following is an XML document that simulates a tree (the kind with leaves and branches). It will be used to demonstrate the different types of syntax.

<?xml version="1.0" encoding="UTF-8"?>
<trunk name="the_trunk">
    <bigBranch name="bb1" thickness="thick">
        <smallBranch name="sb1">
            <leaf name="leaf1" color="brown" />
            <leaf name="leaf2" weight="50" />
            <leaf name="leaf3" />
        </smallBranch>
        <smallBranch name="sb2">
            <leaf name="leaf4" weight="90" />
            <leaf name="leaf5" color="purple" />
        </smallBranch>
    </bigBranch>
    <bigBranch name="bb2">
        <smallBranch name="sb3">
            <leaf name="leaf6" />
        </smallBranch>
        <smallBranch name="sb4">
            <leaf name="leaf7" />
            <leaf name="leaf8" />
            <leaf name="leaf9" color="black" />
            <leaf name="leaf10" weight="100" />
        </smallBranch>
    </bigBranch>
</trunk>

Exhibit 9.2: tree.xml – Example XML page

Following are a few examples of XPath location paths in English, Abbreviated XPath, then Unabbreviated XPath.

Selection 1:

English: All <leaf> elements in this document that are children of <smallBranch> elements that are children of <bigBranch> elements, that are children of the trunk, which is a child of the root.
Abbreviated: /trunk/bigBranch/smallBranch/leaf
Unabbreviated: /child::trunk/child::bigBranch/child::smallBranch/child::leaf

Selection 2:

English: The <bigBranch> elements with ‘name’ attribute equal to ‘bb3’ that are children of the trunk element, which is a child of the root.
Abbreviated: /trunk/bigBranch[@name='bb3']
Unabbreviated: /child::trunk/child::bigBranch[attribute::name='bb3']

Notice how the predicate in the previous example specifies which bigBranch elements we want. This narrows the search down to only the bigBranch nodes that satisfy the predicate. The predicate is the part of the XPath expression that is in square brackets. In this case, the predicate asks for bigBranch nodes whose ‘name’ attribute is set to ‘bb3’. (Exhibit 9.2 contains only the branches bb1 and bb2, so against that document this selection is empty; writing ‘bb2’ instead selects the second branch.)
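
Selections 1 and 2 can be tried out with a few lines of Python. This is a minimal sketch, assuming the standard xml.etree.ElementTree module, which understands only a subset of the abbreviated syntax and evaluates paths relative to an element. For that reason the leading /trunk step is dropped and the paths are evaluated from the <trunk> element itself. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Copy of tree.xml (Exhibit 9.2) without the XML declaration.
TREE_XML = """<trunk name="the_trunk">
  <bigBranch name="bb1" thickness="thick">
    <smallBranch name="sb1">
      <leaf name="leaf1" color="brown"/>
      <leaf name="leaf2" weight="50"/>
      <leaf name="leaf3"/>
    </smallBranch>
    <smallBranch name="sb2">
      <leaf name="leaf4" weight="90"/>
      <leaf name="leaf5" color="purple"/>
    </smallBranch>
  </bigBranch>
  <bigBranch name="bb2">
    <smallBranch name="sb3">
      <leaf name="leaf6"/>
    </smallBranch>
    <smallBranch name="sb4">
      <leaf name="leaf7"/>
      <leaf name="leaf8"/>
      <leaf name="leaf9" color="black"/>
      <leaf name="leaf10" weight="100"/>
    </smallBranch>
  </bigBranch>
</trunk>"""

trunk = ET.fromstring(TREE_XML)

# Selection 1: /trunk/bigBranch/smallBranch/leaf, written relative to <trunk>.
leaves = trunk.findall("bigBranch/smallBranch/leaf")
print(len(leaves))  # -> 10

# Selection 2: the predicate keeps only branches whose name attribute matches.
bb3 = trunk.findall("bigBranch[@name='bb3']")
bb2 = trunk.findall("bigBranch[@name='bb2']")
print(bb3)                           # -> [] (no branch is named bb3)
print([b.get("name") for b in bb2])  # -> ['bb2']
```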

The last two examples assume we want to specify the path from the root. Let’s now assume that we are specifying the path from a <smallBranch> node.

Selection 3:

English: The parent node of the current <smallBranch>. (Notice that this selection is relative to a <smallBranch>.)
Abbreviated: ..
Unabbreviated: parent::node()
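
As a rough illustration of Selection 3, the sketch below again assumes Python's xml.etree.ElementTree, which supports the abbreviated .. step inside a path. Because the library has no notion of a "current" node on its own, the path first selects the <smallBranch> named sb1 and then steps to its parent. The document is an abridged copy of tree.xml.

```python
import xml.etree.ElementTree as ET

# Abridged copy of tree.xml (Exhibit 9.2); the smallBranch sb4 is omitted.
TREE_XML = """<trunk name="the_trunk">
  <bigBranch name="bb1" thickness="thick">
    <smallBranch name="sb1">
      <leaf name="leaf1" color="brown"/>
      <leaf name="leaf2" weight="50"/>
      <leaf name="leaf3"/>
    </smallBranch>
    <smallBranch name="sb2">
      <leaf name="leaf4" weight="90"/>
      <leaf name="leaf5" color="purple"/>
    </smallBranch>
  </bigBranch>
  <bigBranch name="bb2">
    <smallBranch name="sb3">
      <leaf name="leaf6"/>
    </smallBranch>
  </bigBranch>
</trunk>"""

trunk = ET.fromstring(TREE_XML)

# .//smallBranch[@name='sb1'] picks the context <smallBranch>;
# the final .. step then moves to its parent, as parent::node() would.
parents = trunk.findall(".//smallBranch[@name='sb1']/..")
print([(p.tag, p.get("name")) for p in parents])  # -> [('bigBranch', 'bb1')]
```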

When using the unabbreviated syntax, you may notice that each step names parent or child followed by two colons (::). Each of these names is called an axis. You will learn more about axes shortly.

This is also a good time to explain the concept of a location path. A location path is the series of location steps taken to reach the node or nodes being selected. Location steps are the parts of an XPath expression separated by / characters. Each one is a single step on the way to finding the nodes you would like to select.

A location step consists of three parts: an axis (child, parent, descendant, etc.), a node test (the name of a node, or a node-type test such as node() or text()), and zero or more predicates (tests on the retrieved nodes that narrow the results, eliminating nodes that do not pass the predicate’s test).

So, in a location path, each location step returns a node-set. If there are further steps on the path after a location step, the next step is evaluated for every node in that node-set, and the results are combined.
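
The way one step feeds the next can be sketched by hand. The code below is an illustration, not how an XPath processor is actually implemented: child_step is a hypothetical helper that applies a child::name step to every node in the current node-set, and the document is an abridged copy of tree.xml parsed with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Abridged copy of tree.xml (Exhibit 9.2); the smallBranch sb4 is omitted.
TREE_XML = """<trunk name="the_trunk">
  <bigBranch name="bb1" thickness="thick">
    <smallBranch name="sb1">
      <leaf name="leaf1" color="brown"/>
      <leaf name="leaf2" weight="50"/>
      <leaf name="leaf3"/>
    </smallBranch>
    <smallBranch name="sb2">
      <leaf name="leaf4" weight="90"/>
      <leaf name="leaf5" color="purple"/>
    </smallBranch>
  </bigBranch>
  <bigBranch name="bb2">
    <smallBranch name="sb3">
      <leaf name="leaf6"/>
    </smallBranch>
  </bigBranch>
</trunk>"""

def child_step(node_set, name):
    """Apply the step child::name to every node in node_set and merge the results."""
    result = []
    for node in node_set:
        result.extend(child for child in node if child.tag == name)
    return result

trunk = ET.fromstring(TREE_XML)

# /child::trunk/child::bigBranch/child::smallBranch/child::leaf, one step at a time.
after_trunk = [trunk]                                   # result of child::trunk
after_big = child_step(after_trunk, "bigBranch")        # 2 nodes
after_small = child_step(after_big, "smallBranch")      # 3 nodes
after_leaf = child_step(after_small, "leaf")            # 6 nodes

print(len(after_big), len(after_small), len(after_leaf))  # -> 2 3 6

# The hand-rolled steps select the same elements as the library's path search.
print(after_leaf == trunk.findall("bigBranch/smallBranch/leaf"))  # -> True
```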

Relative vs. Absolute paths

When specifying a path with XPath, there are times when you will already be ‘in’ a node. But other times, you will want to select nodes starting from the root node. XPath lets you do both. If you have ever worked with websites in HTML, it works the same way as referring to other files in HTML hyperlinks. In HTML, you can specify an Absolute Path for the hyperlink, describing where another page is with the server name, folders, and filename all in the URL. Or, if you are referring to another file on the same site, you need not enter the server name or all of the path information. This is called a Relative Path. The concept can be applied similarly in XPath.

You can tell the difference by whether there is a ‘/’ character at the beginning of the XPath expression. If so, the path is being specified from the root, which makes it an Absolute Path. But if there is no ‘/’ at the beginning of the path, you are specifying a Relative Path, which describes where the other nodes are relative to the context node, or the node for which the next step is being taken.
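
To see why the context node matters, the following sketch evaluates one and the same relative path from two different context nodes. It assumes Python's xml.etree.ElementTree and an abridged copy of tree.xml; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Abridged copy of tree.xml (Exhibit 9.2); the smallBranch sb4 is omitted.
TREE_XML = """<trunk name="the_trunk">
  <bigBranch name="bb1" thickness="thick">
    <smallBranch name="sb1">
      <leaf name="leaf1" color="brown"/>
      <leaf name="leaf2" weight="50"/>
      <leaf name="leaf3"/>
    </smallBranch>
    <smallBranch name="sb2">
      <leaf name="leaf4" weight="90"/>
      <leaf name="leaf5" color="purple"/>
    </smallBranch>
  </bigBranch>
  <bigBranch name="bb2">
    <smallBranch name="sb3">
      <leaf name="leaf6"/>
    </smallBranch>
  </bigBranch>
</trunk>"""

trunk = ET.fromstring(TREE_XML)
bb1, bb2 = trunk.findall("bigBranch")

# A relative path has no leading '/', so its result depends on the context node.
relative_path = "smallBranch/leaf"
from_bb1 = [leaf.get("name") for leaf in bb1.findall(relative_path)]
from_bb2 = [leaf.get("name") for leaf in bb2.findall(relative_path)]

print(from_bb1)  # -> ['leaf1', 'leaf2', 'leaf3', 'leaf4', 'leaf5']
print(from_bb2)  # -> ['leaf6']
```

An absolute path such as /trunk/bigBranch/smallBranch/leaf, by contrast, would select all six leaves no matter which node is the context node.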

Below is an XSL stylesheet (Exhibit 9.3) for use with our tree.xml file above (Exhibit 9.2).

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/>

    <!-- Example of an absolute path. The element '/child::trunk'
         is being specified from the root node. -->
    <xsl:template match="/child::trunk">
        <html>
            <head>
                <title>XPath Tree Tests</title>
            </head>
            <body>
                <!-- Example of a relative path. The <for-each> xsl statement will
                     execute for every <bigBranch> node in the
                     'current' node, which is the <trunk> node. -->
                <xsl:for-each select="child::bigBranch">
                    <xsl:call-template name="print_out" />
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>

    <xsl:template name="print_out">
        <xsl:value-of select="attribute::name" /> <br />
    </xsl:template>
</xsl:stylesheet>

Exhibit 9.3: xsl_tree.xsl – Example of both a relative and absolute path

Four types of XPath location paths

In the last two sections you learned two distinctions between location paths: unabbreviated vs. abbreviated and relative vs. absolute. Combining these two concepts can be helpful when talking about XPath location paths. Not to mention, it could make you sound really smart in front of your friends when you say things like:

  1. Abbreviated Relative Location Paths: use of abbreviated syntax while specifying a relative path.
  2. Abbreviated Absolute Location Paths: use of abbreviated syntax while specifying an absolute path.
  3. Unabbreviated Relative Location Paths: use of unabbreviated syntax while specifying a relative path.
  4. Unabbreviated Absolute Location Paths: use of unabbreviated syntax while specifying an absolute path.

This four-way distinction is mentioned here only because it could come in handy while reading the specification or other texts on the subject.

XPath axes

In XPath, some node selections can only be expressed with the unabbreviated syntax. In those cases, you name an axis explicitly in each location step on your way through the location path.

From any node in the tree, there are 13 axes along which you can step. They are as follows:

Axis                  Meaning
ancestor::            Ancestors of the current node (its parent, its parent's parent, etc.) up to the root node
ancestor-or-self::    The current node and its ancestors up to the root node
attribute::           Attributes of the current node
child::               Immediate children of the current node
descendant::          Children of the current node, their children, etc.
descendant-or-self::  The current node and all of its descendants
following::           Nodes after the current node in document order (excluding its descendants)
following-sibling::   Siblings that come after the current node
namespace::           Namespace nodes of the current node
parent::              Immediate parent of the current node
preceding::           Nodes before the current node in document order (excluding its ancestors)
preceding-sibling::   Siblings that come before the current node
self::                The current node
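
Python's xml.etree.ElementTree does not understand the unabbreviated axis names, so the sketch below walks several of the axes by hand to make their meaning concrete. Every function name here is hypothetical and simply mirrors the axis it imitates. Only element nodes are modeled (no attribute, namespace, or text nodes), and results are listed in document order, whereas XPath numbers positions on the reverse axes starting from the node nearest the context node. The document is an abridged copy of tree.xml.

```python
import xml.etree.ElementTree as ET

# Abridged copy of tree.xml (Exhibit 9.2); the smallBranch sb4 is omitted.
TREE_XML = """<trunk name="the_trunk">
  <bigBranch name="bb1" thickness="thick">
    <smallBranch name="sb1">
      <leaf name="leaf1" color="brown"/>
      <leaf name="leaf2" weight="50"/>
      <leaf name="leaf3"/>
    </smallBranch>
    <smallBranch name="sb2">
      <leaf name="leaf4" weight="90"/>
      <leaf name="leaf5" color="purple"/>
    </smallBranch>
  </bigBranch>
  <bigBranch name="bb2">
    <smallBranch name="sb3">
      <leaf name="leaf6"/>
    </smallBranch>
  </bigBranch>
</trunk>"""

trunk = ET.fromstring(TREE_XML)
doc_order = list(trunk.iter())  # every element, in document order
parent_of = {child: parent for parent in doc_order for child in parent}

def ancestor(node):
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found  # nearest ancestor first

def descendant(node):
    return [n for n in node.iter() if n is not node]

def following_sibling(node):
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]

def preceding_sibling(node):
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[:siblings.index(node)]

def following(node):
    skip = set(node.iter())  # the node itself and its descendants
    return [n for n in doc_order[doc_order.index(node) + 1:] if n not in skip]

def preceding(node):
    skip = set(ancestor(node))  # ancestors are not on the preceding axis
    return [n for n in doc_order[:doc_order.index(node)] if n not in skip]

def names(nodes):
    return [n.get("name") for n in nodes]

sb2 = trunk.find(".//smallBranch[@name='sb2']")  # the context node
print(names(ancestor(sb2)))           # -> ['bb1', 'the_trunk']
print(names(descendant(sb2)))         # -> ['leaf4', 'leaf5']
print(names(preceding_sibling(sb2)))  # -> ['sb1']
print(names(following_sibling(sb2)))  # -> []
print(names(preceding(sb2)))          # -> ['sb1', 'leaf1', 'leaf2', 'leaf3']
print(names(following(sb2)))          # -> ['bb2', 'sb3', 'leaf6']
```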

XPath predicates and functions

Sometimes, you may want to use a predicate in an XPath Location Path to further filter your selection. Normally, you would get a set of nodes from a location path. A predicate is a small expression that gets evaluated for each node in a set of nodes. If the expression evaluates to ‘false’, then the node is not included in the selection. An example is as follows:

//p[@class='alert']

In the preceding example, every <p> element in the document is checked to see whether its ‘class’ attribute is set to ‘alert’. Only those <p> elements with a ‘class’ attribute with value ‘alert’ are included in the set of nodes for this location path.
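
A predicate behaves like a filter applied to each candidate node. The sketch below shows this on a small, made-up page (the markup and its text are invented for the illustration), assuming Python's xml.etree.ElementTree. The library needs the leading dot, so //p[@class='alert'] is written as .//p[@class='alert'].

```python
import xml.etree.ElementTree as ET

# A made-up page for the illustration; it is not one of the chapter's exhibits.
PAGE_XML = """<html>
  <body>
    <p class="alert">Disk almost full</p>
    <div>
      <p>Nothing to report</p>
      <p class="alert">Backup failed</p>
    </div>
    <p class="note">Next check at noon</p>
  </body>
</html>"""

page = ET.fromstring(PAGE_XML)

# The predicate form: keep only <p> elements whose class attribute is 'alert'.
alerts = page.findall(".//p[@class='alert']")

# The same selection written as an explicit filter over every <p> element.
filtered = [p for p in page.iter("p") if p.get("class") == "alert"]

print([p.text for p in alerts])  # -> ['Disk almost full', 'Backup failed']
print(alerts == filtered)        # -> True
```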

The following example uses a function, which can be used in a predicate to get information about the context node.

/book/chapter[position()=3]

This example selects only the chapter of the book in the third position. So, for something to be returned, the <book> element must have at least three <chapter> children.
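
Positional predicates can be tried with the sketch below, which assumes Python's xml.etree.ElementTree and two made-up <book> documents. The library does not accept position()=3 directly, so the example uses chapter[3], which in XPath is shorthand for chapter[position()=3]; it also accepts [last()].

```python
import xml.etree.ElementTree as ET

# Made-up documents for the illustration.
long_book = ET.fromstring(
    "<book><chapter>One</chapter><chapter>Two</chapter>"
    "<chapter>Three</chapter><chapter>Four</chapter></book>"
)
short_book = ET.fromstring(
    "<book><chapter>One</chapter><chapter>Two</chapter></book>"
)

# chapter[3] abbreviates chapter[position()=3]; positions start at 1.
third = long_book.findall("chapter[3]")
final = long_book.findall("chapter[last()]")

print([c.text for c in third])  # -> ['Three']
print([c.text for c in final])  # -> ['Four']

# With fewer than three chapters the selection is simply empty.
print(short_book.findall("chapter[3]"))  # -> []
```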

Also notice that the position function returns a number. There are many functions in the XPath specification. For a complete list, see the W3C specification at http://www.w3.org/TR/xpath#corelib

Here are a few more functions that may be helpful:

number last() – the number of nodes in the current node set, which is also the position of its last node

number position() – position of the context node being tested

number count(node-set) – the number of nodes in a node-set

boolean starts-with(string, string) – returns true if the first argument starts with the second

boolean contains(string, string) – returns true if the first argument contains the second

number sum(node-set) – the sum of the numeric values of the nodes in the node-set

number floor(number) – the number, rounded down to the nearest integer

number ceiling(number) – the number, rounded up to the nearest integer

number round(number) – the number, rounded to the nearest integer
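
To give a feel for what these functions compute, here are rough Python counterparts applied to a small, made-up <order> document. This is a sketch only: xml.etree.ElementTree does not evaluate these XPath functions itself, so each one is imitated with ordinary Python, and xpath_round is a hypothetical helper. One difference worth noting is that XPath's round() resolves ties toward positive infinity, while Python's built-in round() rounds ties to the nearest even integer.

```python
import math
import xml.etree.ElementTree as ET

# A made-up document for the illustration.
order = ET.fromstring(
    "<order>"
    "<item sku='A-100'>13.95</item>"
    "<item sku='B-200'>7.50</item>"
    "<item sku='A-300'>2.5</item>"
    "</order>"
)
items = order.findall("item")

item_count = len(items)                        # count(item)
total = sum(float(i.text) for i in items)      # sum(item)

# item[starts-with(@sku, 'A-')] and item[contains(@sku, '200')]
a_items = [i.get("sku") for i in items if i.get("sku").startswith("A-")]
has_200 = [i.get("sku") for i in items if "200" in i.get("sku")]

def xpath_round(x):
    """Round to the nearest integer; ties go toward positive infinity, as in XPath."""
    return math.floor(x + 0.5)

print(item_count)                       # -> 3
print(a_items)                          # -> ['A-100', 'A-300']
print(has_200)                          # -> ['B-200']
print(math.floor(2.5), math.ceil(2.5))  # floor(2.5) and ceiling(2.5) -> 2 3
print(xpath_round(2.5), round(2.5))     # XPath-style vs. Python built-in -> 3 2
```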

Example

The following XML documents, XSD schemas, and XSL stylesheet are meant to help you put together everything you have learned in this chapter using real-life data. As you study this example, notice how XPath is used in the stylesheet to select specific information from the documents and to shape the output.

Below is an XML document (Exhibit 9.4)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="movies.xsl" type="text/xsl" media="screen"?>
<movieCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="movies.xsd">

<movie>
    <movieTitle>Meet the Parents</movieTitle>
    <movieSynopsis>
    Greg Focker is head over heels in love with his girlfriend Pam, and is ready to
    pop the big question. When his attempt to propose is thwarted by a phone call
    with the news that Pam's younger sister is getting married, Greg realizes that
    the key to Pam's hand in marriage lies with her formidable father.
    </movieSynopsis>
    <role>
        <roleIDREF>bs1</roleIDREF>
        <roleType>Lead Actor</roleType>
    </role>
    <role>
        <roleIDREF>tp1</roleIDREF>
        <roleType>Lead Actress</roleType>
    </role>
    <role>
        <roleIDREF>rd1</roleIDREF>
        <roleType>Lead Actor</roleType>
    </role>
    <role>
        <roleIDREF>bd1</roleIDREF>
        <roleType>Supporting Actress</roleType>
    </role>
</movie>

<movie>
    <movieTitle>Elf</movieTitle>
    <movieSynopsis>
    One Christmas Eve, a long time ago, a small baby at an orphanage crawled into
    Santa’s bag of toys, only to go undetected and accidentally carried back to Santa’s
    workshop in the North Pole. Though he was quickly taken under the wing of a surrogate
    father and raised to be an elf, as he grows to be three sizes larger than everyone else,
    it becomes clear that Buddy will never truly fit into the elf world. What he needs is
    to find his real family. This holiday season, Buddy decides to find his true place in the
    world and sets off for New York City to track down his roots.
    </movieSynopsis>
    <role>
        <roleIDREF>wf1</roleIDREF>
        <roleType>Lead Actor</roleType>
    </role>
    <role>
        <roleIDREF>jc1</roleIDREF>
        <roleType>Supporting Actor</roleType>
    </role>
    <role>
        <roleIDREF>zd1</roleIDREF>
        <roleType>Lead Actress</roleType>
    </role>
    <role>
        <roleIDREF>ms1</roleIDREF>
        <roleType>Supporting Actress</roleType>
    </role>
</movie>

<castMember>
    <castMemberID>rd1</castMemberID>
    <castFirstName>Robert</castFirstName>
    <castLastName>De Niro</castLastName>
    <castSSN>489-32-5984</castSSN>
    <castGender>male</castGender>
</castMember>

<castMember>
    <castMemberID>bs1</castMemberID>
    <castFirstName>Ben</castFirstName>
    <castLastName>Stiller</castLastName>
    <castSSN>590-59-2774</castSSN>
    <castGender>male</castGender>
</castMember>

<castMember>
    <castMemberID>tp1</castMemberID>
    <castFirstName>Teri</castFirstName>
    <castLastName>Polo</castLastName>
    <castSSN>099-37-8765</castSSN>
    <castGender>female</castGender>
</castMember>

<castMember>
    <castMemberID>bd1</castMemberID>
    <castFirstName>Blythe</castFirstName>
    <castLastName>Danner</castLastName>
    <castSSN>273-44-8690</castSSN>
    <castGender>female</castGender>
</castMember>

<castMember>
    <castMemberID>wf1</castMemberID>
    <castFirstName>Will</castFirstName>
    <castLastName>Ferrell</castLastName>
    <castSSN>383-56-2095</castSSN>
    <castGender>male</castGender>
</castMember>

<castMember>
    <castMemberID>jc1</castMemberID>
    <castFirstName>James</castFirstName>
    <castLastName>Caan</castLastName>
    <castSSN>389-49-3029</castSSN>
    <castGender>male</castGender>
</castMember>

<castMember>
    <castMemberID>zd1</castMemberID>
    <castFirstName>Zooey</castFirstName>
    <castLastName>Deschanel</castLastName>
    <castSSN>309-49-4005</castSSN>
    <castGender>female</castGender>
</castMember>

<castMember>
    <castMemberID>ms1</castMemberID>
    <castFirstName>Mary</castFirstName>
    <castLastName>Steenburgen</castLastName>
    <castSSN>988-43-4950</castSSN>
    <castGender>female</castGender>
</castMember>

</movieCollection>

Exhibit 9.4: movies_xpath.xml

Below is the second XML document (Exhibit 9.5)

<?xml version="1.0" encoding="UTF-8"?>

<cities xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="cities.xsd">

<city>
    <cityID>c2</cityID>
    <cityName>Mandal</cityName>
    <cityPopulation>13840</cityPopulation>
    <cityCountry>Norway</cityCountry>
    <tourismDescription>A small town with a big atmosphere.  Mandal provides comfort
away from normal luxuries.
    </tourismDescription>
    <capitalCity>c3</capitalCity>
</city>

<city>
    <cityID>c3</cityID>
    <cityName>Oslo</cityName>
    <cityPopulation>533050</cityPopulation>
    <cityCountry>Norway</cityCountry>
    <tourismDescription>Oslo is the capital of Norway for many reasons.
    It is also the capital location for tourism.  The culture, shopping,
    and attractions can all be experienced in Oslo.  Just remember
    to bring your wallet.
    </tourismDescription>
</city>

</cities>

Exhibit 9.5: cities_xpath.xml
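
Predicates can also test the value of a child element instead of an attribute. The sketch below applies that idea to an abridged copy of the cities document, assuming Python's xml.etree.ElementTree (which supports the [tag='text'] form). It also follows the capitalCity reference from Mandal to the city whose cityID it names. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Abridged copy of Exhibit 9.5; the schema attributes and some elements are omitted.
CITIES_XML = """<cities>
  <city>
    <cityID>c2</cityID>
    <cityName>Mandal</cityName>
    <cityCountry>Norway</cityCountry>
    <capitalCity>c3</capitalCity>
  </city>
  <city>
    <cityID>c3</cityID>
    <cityName>Oslo</cityName>
    <cityCountry>Norway</cityCountry>
  </city>
</cities>"""

cities = ET.fromstring(CITIES_XML)

# city[cityName='Oslo']/cityCountry: the predicate tests a child element's text.
oslo_country = cities.findtext("city[cityName='Oslo']/cityCountry")

# Follow Mandal's capitalCity reference to the city with that cityID.
capital_id = cities.findtext("city[cityName='Mandal']/capitalCity")
capital_name = cities.findtext(f"city[cityID='{capital_id}']/cityName")

print(oslo_country)              # -> Norway
print(capital_id, capital_name)  # -> c3 Oslo
```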

Below is the Movies schema (Exhibit 9.6)

<?xml version="1.0" encoding="UTF-8"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

  <!--Movie Collection-->

  <xsd:element name="movieCollection">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="movie" type="movieDetails" minOccurs="1" maxOccurs="unbounded"/>
        <xsd:element name="castMember" type="castDetails" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!--This contains the movie details.-->

  <xsd:complexType name="movieDetails">
    <xsd:sequence>
      <xsd:element name="movieTitle" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="movieSynopsis" type="xsd:string"/>
      <xsd:element name="role" type="roleDetails" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!--This contains the role details.-->

  <xsd:complexType name="roleDetails">
    <xsd:sequence>
       <xsd:element name="roleIDREF" type="xsd:IDREF"/>
       <xsd:element name="roleType" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:simpleType name="ssnType">
       <xsd:restriction base="xsd:string">
           <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
       </xsd:restriction>
   </xsd:simpleType>

 <xsd:complexType name="castDetails">
    <xsd:sequence>
       <xsd:element name="castMemberID" type="xsd:ID"/>
       <xsd:element name="castFirstName" type="xsd:string"/>
       <xsd:element name="castLastName" type="xsd:string"/>
       <xsd:element name="castSSN" type="ssnType"/>
       <xsd:element name="castGender" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>

Exhibit 9.6: movies.xsd
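
The ssnType restriction in the schema can be approximated outside a schema validator. This is a sketch in Python's re module, not a substitute for XSD validation: an xsd:pattern must match the whole value, which is why fullmatch is used here. The function name is_valid_ssn is made up for the illustration.

```python
import re

# Same regular expression as the xsd:pattern in ssnType.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")

def is_valid_ssn(value):
    """Return True when the entire value matches the ssnType pattern."""
    return SSN_PATTERN.fullmatch(value) is not None

print(is_valid_ssn("489-32-5984"))   # -> True
print(is_valid_ssn("489-32-598"))    # -> False (last group too short)
print(is_valid_ssn("x489-32-5984"))  # -> False (extra leading character)
```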

Below is the Cities schema (Exhibit 9.7)

<?xml version="1.0" encoding="UTF-8"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
attributeFormDefault="unqualified">

<xsd:element name="cities">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="city" type="cityType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:complexType name="cityType">
  <xsd:sequence>
    <xsd:element name="cityID" type="xsd:ID"/>
     <xsd:element name="cityName" type="xsd:string"/>
     <xsd:element name="cityPopulation" type="xsd:integer"/>
     <xsd:element name="cityCountry" type="xsd:string"/>
     <xsd:element name="tourismDescription" type="xsd:string"/>
     <xsd:element name="capitalCity" type="xsd:IDREF" minOccurs="0" maxOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>
</xsd:schema>

Exhibit 9.7: cities.xsd

Below is the XSL stylesheet (Exhibit 9.8)

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="castList" match="castMember" use="castMemberID"/>
<xsl:output method="html"/>

<!-- example of using an abbreviated absolute path to pull info
from cities_xpath.xml for the city "Oslo" specifically -->

<!-- specify absolute path to select cityName and assign it the variable "city" -->
<xsl:variable name="city" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/cityName" />

<!-- specify absolute path to select cityCountry and assign it the variable "country" -->
<xsl:variable name="country" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/cityCountry" />

<!-- specify absolute path to select tourismDescription and assign it the variable "description" -->
<xsl:variable name="description" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/tourismDescription" />

<xsl:template match="/">
<html>
    <head>
        <title>Movie Collection</title>
    </head>
    <body>
        <h2>Movie Collection</h2>
    <xsl:apply-templates select="movieCollection"/>
    </body>
</html>
</xsl:template>
<xsl:template match="movieCollection">

<!-- let's say we just want to see the actors. -->
<!--
<xsl:for-each select="movie">
<hr />
<br />
<b><xsl:text>Movie Title: </xsl:text></b>
<xsl:value-of select="movieTitle"/>
<br />
<br />
<b><xsl:text>Movie Synopsis: </xsl:text></b>
<xsl:value-of select="movieSynopsis"/>
<br />
<br />-->

<!-- actor info begins here. -->
<b><xsl:text>Cast: </xsl:text></b>
<br />
<!-- specify an abbreviated relative path here for "role."
NOTE: there is no predicate in this one; it's just a path. -->

<xsl:for-each select="movie/role">
<xsl:sort select="key('castList',roleIDREF)/castLastName"/>
<xsl:number value="position()" format="&#xa; 0. " />
<xsl:value-of select="key('castList',roleIDREF)/castFirstName"/>
<xsl:text>   </xsl:text>
<xsl:value-of select="key('castList',roleIDREF)/castLastName"/>
<xsl:text>,   </xsl:text>
<xsl:value-of select="roleType"/>
<br />
<xsl:value-of select="key('castList',roleIDREF)/castGender"/>
<xsl:text>,   </xsl:text>
<xsl:value-of select="key('castList',roleIDREF)/castSSN"/>
<br />
<br />
</xsl:for-each>
<!--
</xsl:for-each>-->
<hr />

<!--calling the variables -->

<span style="color:red;">
<p><b>Travel Advertisement</b></p>

<!-- reference the city, followed by a comma, and then the country -->
<p><xsl:value-of select="$city" />, <xsl:value-of select="$country" /></p>

<!-- reference the description -->
<xsl:value-of select="$description" />

</span>
</xsl:template>
</xsl:stylesheet>

Exhibit 9.8: movies.xsl
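
The stylesheet relies on xsl:key and the key() function to look up the cast member that each roleIDREF points to. The idea can be sketched in Python as a dictionary lookup. This is only an analogy under stated assumptions: it uses xml.etree.ElementTree on an abridged copy of the movie document, and cast_list and rows are illustrative names rather than anything produced by an XSLT processor.

```python
import xml.etree.ElementTree as ET

# Abridged copy of Exhibit 9.4: one movie, three roles, three cast members.
MOVIES_XML = """<movieCollection>
  <movie>
    <movieTitle>Elf</movieTitle>
    <role><roleIDREF>wf1</roleIDREF><roleType>Lead Actor</roleType></role>
    <role><roleIDREF>jc1</roleIDREF><roleType>Supporting Actor</roleType></role>
    <role><roleIDREF>zd1</roleIDREF><roleType>Lead Actress</roleType></role>
  </movie>
  <castMember>
    <castMemberID>wf1</castMemberID>
    <castFirstName>Will</castFirstName>
    <castLastName>Ferrell</castLastName>
  </castMember>
  <castMember>
    <castMemberID>jc1</castMemberID>
    <castFirstName>James</castFirstName>
    <castLastName>Caan</castLastName>
  </castMember>
  <castMember>
    <castMemberID>zd1</castMemberID>
    <castFirstName>Zooey</castFirstName>
    <castLastName>Deschanel</castLastName>
  </castMember>
</movieCollection>"""

collection = ET.fromstring(MOVIES_XML)

# Plays the part of <xsl:key name="castList" match="castMember" use="castMemberID"/>.
cast_list = {m.findtext("castMemberID"): m for m in collection.iter("castMember")}

rows = []
for role in collection.findall("movie/role"):         # select="movie/role"
    member = cast_list[role.findtext("roleIDREF")]    # key('castList', roleIDREF)
    rows.append((
        member.findtext("castLastName"),
        member.findtext("castFirstName"),
        role.findtext("roleType"),
    ))

rows.sort(key=lambda row: row[0])  # like xsl:sort on castLastName

for last, first, role_type in rows:
    print(f"{first} {last}, {role_type}")
```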

Summary


Throughout the chapter we have learned many of the features and capabilities of the XML Path Language. You should now have a good understanding of node relationships through the use of the XML tree structure. A location path can be written in abbreviated or unabbreviated syntax; both forms express the same kind of selection, although some axes are available only in the unabbreviated form. A location path is also either relative or absolute: a relative path is evaluated starting from the context node, while an absolute path begins with ‘/’ and is evaluated starting from the root. These two distinctions can be combined to give four types of XPath location paths: Abbreviated Relative, Abbreviated Absolute, Unabbreviated Relative, and Unabbreviated Absolute. If further filtering is required, predicates in square brackets and XPath functions such as position() and count() can be used to narrow a node-set. When used correctly, XPath can be a very powerful tool for working with XML.