XPath Expressions
As you saw in the basic examples section, the findnodes() method takes an XPath expression and finds nodes in the DOM that match the expression. There are two ways to call the findnodes() method:
- on the object representing the whole document, or
- on an element from the DOM - the element on which you call the method is called the context element
If your XPath expression starts with a ‘/’ then the search will start at the top-most element in the document, even if you call findnodes() on a different context element.
Start your XPath expression with ‘.’ to search down through the children of the context element.
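The difference between the two starting points can be seen in a minimal sketch like the one below. The two-movie document and its id values are made up for illustration and are not the tutorial's sample file.

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;

# Hypothetical two-movie document (an assumption, not the sample playlist).
my $xml = <<'XML';
<playlist>
  <movie id="m1"><title>First</title></movie>
  <movie id="m2"><title>Second</title></movie>
</playlist>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Use the second <movie> element as the context element.
my ($second) = $dom->findnodes('//movie[@id="m2"]');

# Relative path: only the children of the context element are searched.
my @relative = map { $_->textContent } $second->findnodes('./title');

# Absolute path: the search starts at the top of the document,
# even though the method is called on the same context element.
my @absolute = map { $_->textContent } $second->findnodes('/playlist/movie/title');

say "relative: @relative";
say "absolute: @absolute";
```

In this sketch the relative expression should find only the title inside the context element, while the absolute expression finds both titles.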
The remainder of this section simply includes examples of XPath expressions and descriptions of what they match.
Note
You can try out different XPath expressions in the XPath sandbox. The sandbox doesn’t actually use Perl or libxml; it simply uses JavaScript to access the XPath engine built into your browser. However, the expression matching should work just as it would in your Perl scripts.
/playlist
Match the top-most element of the document if (and only if) it is a <playlist> element.
//title
Match every <title> element in the document.
//movie/title
Match every <title> element that is the direct child of a <movie> element.
./title
Match every <title> element that is the direct child of the context element, e.g.:
    foreach my $movie ($dom->findnodes('//movie')) {
        say 'Title: ', $movie->findvalue('./title');
    }
//title/..
Match any element which is the parent of a <title> element.
/*
Match the top-most element of the document regardless of the element name.
//person/@role
Match the attribute named role on every <person> element.
//person/@*
Match every attribute on every <person> element.
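Expressions that end in an attribute step return attribute nodes rather than elements. The sketch below shows one way to read their names and values; the <cast> fragment and the attribute values are invented for illustration.

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;

# Hypothetical fragment: one person has a role attribute, one does not.
my $xml = '<cast><person name="A" role="X"/><person name="B"/></cast>';
my $dom = XML::LibXML->load_xml(string => $xml);

# Each match is an attribute node; value() returns its string value.
my @roles = map { $_->value } $dom->findnodes('//person/@role');

# @* matches attributes of any name; nodeName() gives the attribute name.
my @attr_names = map { $_->nodeName } $dom->findnodes('//person/@*');

say "roles: @roles";
say 'attribute count: ', scalar @attr_names;
```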
//person[@role]
Match every <person> element that has an attribute named role.
//*[@url]
Match every element that has an attribute named url.
//*[@*]
Match every element that has an attribute of any name.
/playlist//*[not(@*)]
Match every element that is a descendant of the top-level <playlist> element and which does not have any attributes.
//movie[@id="tt0307479"]
Match every <movie> element that has an attribute named id with the value tt0307479.
//movie[not(@id="tt0307479")]
Match every <movie> element that does not have an attribute named id with the value tt0307479 (including elements that do not have an id attribute at all).
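The parenthetical remark matters when comparing not(@id="...") with the != operator. The following sketch, using a made-up document with three <movie> elements (one without an id), suggests how the two forms differ.

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;

# Hypothetical document: two movies with ids, one without.
my $xml = '<playlist><movie id="a"/><movie id="b"/><movie/></playlist>';
my $dom = XML::LibXML->load_xml(string => $xml);

# not(@id="a") is true for the movie with id "b" and for the movie
# that has no id attribute at all.
my @negated = $dom->findnodes('//movie[not(@id="a")]');

# @id!="a" is only true when an id attribute exists and differs from "a".
my @unequal = $dom->findnodes('//movie[@id!="a"]');

say 'not(@id="a") matched: ', scalar @negated;
say '@id!="a" matched:     ', scalar @unequal;
```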
//*[@id="tt0307479"]
Match every element that has an attribute named id with the value tt0307479.
//movie[@id="tt0307479"]//synopsis
Match every <synopsis> element within every <movie> element that has an attribute named id with the value tt0307479.
//person[position()=2]
Match the second <person> element among the <person> children of each parent element. Note that the first element is at position 1 not 0.
//person[2]
This is simply a shorthand form of the position()=2 expression above.
//person[position()<3]
Match the first two <person> elements among the <person> children of each parent element.
//person[last()]
Match the last <person> element among the <person> children of each parent element.
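Position predicates are counted separately for each parent element, which a small experiment can illustrate. In the sketch below the two <cast> elements and the names A to E are assumptions made for the example. The parenthesised form (//person)[2] is included only as a contrast: it applies the predicate to the whole result set instead.

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;

# Hypothetical document with two separate groups of <person> elements.
my $xml = <<'XML';
<playlist>
  <cast><person name="A"/><person name="B"/><person name="C"/></cast>
  <cast><person name="D"/><person name="E"/></cast>
</playlist>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Helper (hypothetical name) returning the name attribute of each match.
sub names_for {
    my ($xpath) = @_;
    return map { $_->getAttribute('name') } $dom->findnodes($xpath);
}

my @second  = names_for('//person[2]');            # second within each <cast>
my @first2  = names_for('//person[position()<3]'); # first two within each <cast>
my @last    = names_for('//person[last()]');       # last within each <cast>
my @overall = names_for('(//person)[2]');          # second in the whole document

say "second:  @second";
say "first2:  @first2";
say "last:    @last";
say "overall: @overall";
```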
//cast[count(person)=3]
Match every <cast> element which contains exactly 3 <person> elements.
//*[name()='genre']
Match every element with the name genre - exactly equivalent to //genre.
//*[starts-with(name(), 'running')]
Match every element with a name starting with the word running.
//person[contains(@name, 'Matt')]
Match every <person> element that has an attribute named name which contains the text Matt anywhere in the attribute value.
//person[contains(@name, 'matt')]
Same as above except for the casing of the text to match. Matching is case-sensitive.
//person[not(contains(@name, 'e'))]
Match every <person> element that has an attribute named name which does not contain the letter e anywhere in the attribute value.
//person[starts-with(@name, 'K')]
Match every <person> element that has an attribute named name with a value that starts with the letter K.
//director/text()
Match every text node which is a direct child of a <director> element.
//cast/text()
Match every text node which is a direct child of a <cast> element.
You might imagine that this would not match anything, since in the sample document the <cast> elements contain only <person> elements. But if you look carefully, you’ll see that in between each <person> element there is some whitespace - a newline after the preceding element and then some spaces at the start of the next line. This whitespace is text and is therefore matched.
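The whitespace-only text nodes can be counted with a short script. This is a sketch against a made-up, indented <cast> fragment and assumes the parser's default options, which keep blank text nodes. The extra predicate [normalize-space()] is one way to skip text nodes that contain nothing but whitespace.

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;

# Hypothetical fragment, indented the way a hand-edited file often is.
my $xml = <<'XML';
<cast>
  <person name="A"/>
  <person name="B"/>
</cast>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Every run of whitespace between the tags inside <cast> is a text node.
my @text = $dom->findnodes('//cast/text()');

# normalize-space() returns an empty string for whitespace-only text,
# and an empty string counts as false in a predicate.
my @non_blank = $dom->findnodes('//cast/text()[normalize-space()]');

say 'text nodes:      ', scalar @text;
say 'non-blank nodes: ', scalar @non_blank;
```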
//person[contains(@name,'Matt')]/parent::*
Match the parent of every <person> element which contains Matt in the name attribute. (You could also use /.. for the parent). The syntax parent::* means any element on the parent axis.
//person[contains(@name,'Matt')]/ancestor::movie
Match every <movie> element which is an ancestor of a <person> element which contains Matt in the name attribute. The syntax ancestor::movie means any <movie> element on the ancestor axis.
//genre[text()='drama']/following-sibling::*
Match every element of any name which is a sibling of a <genre> element whose complete text content is drama and which follows that element in document order.
//genre[text()='drama']/following-sibling::genre
Match every <genre> element which is a sibling of a <genre> element whose complete text content is drama and which follows that element in document order.
//genre[text()='drama']/preceding-sibling::genre
Match every <genre> element which is a sibling of a <genre> element whose complete text content is drama and which comes before that element in document order.
//movie[@id="tt0112384"]/following::title
Match every <title> element which comes after a <movie> element with tt0112384 as the value of the id attribute. Note that ‘after’ means after the closing tag so a <title> element inside the matching <movie> would not be included.
//movie[.//score/text() < 7.5]
Match every <movie> element which contains a <score> element with text content numerically less than 7.5.
//movie[.//score/text() > 8.0]//synopsis
Match every <synopsis> element in every <movie> element which contains a <score> element with text content numerically greater than 8.0.
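A numeric comparison like this can be tried with a few lines of Perl. The sketch below uses an invented two-movie document; the titles and scores are assumptions chosen so that each expression matches exactly one movie.

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;

# Hypothetical document with one low-scoring and one high-scoring movie.
my $xml = <<'XML';
<playlist>
  <movie><title>Low</title><rating><score>6.9</score></rating></movie>
  <movie><title>High</title><rating><score>8.4</score></rating></movie>
</playlist>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# The text of each <score> is converted to a number before comparing.
my @below = map { $_->textContent }
    $dom->findnodes('//movie[.//score/text() < 7.5]/title');
my @above = map { $_->textContent }
    $dom->findnodes('//movie[.//score/text() > 8.0]/title');

say "below 7.5: @below";
say "above 8.0: @above";
```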
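//director | //genre
Match every element which is a <director> or a <genre>. The | operator returns the union of the two node sets; the or operator is a boolean operator and would produce a true/false result rather than a set of nodes.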
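In XPath 1.0, two node sets are combined with the | (union) operator, and the combined result comes back in document order. A minimal sketch, assuming a made-up <movie> fragment:

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;

# Hypothetical fragment containing both kinds of element plus a <title>.
my $xml = '<movie><title>T</title><director>D</director>'
        . '<genre>drama</genre><genre>crime</genre></movie>';
my $dom = XML::LibXML->load_xml(string => $xml);

# Union of every <director> and every <genre>; <title> is not included.
my @names = map { $_->nodeName } $dom->findnodes('//director | //genre');

say "matched: @names";
```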
//person[contains(@name, 'Bill') and contains(@role, 'Fred')]
Match every <person> element which contains Bill in the name attribute and contains Fred in the role attribute.
//person[@name='Kevin Bacon']/../person[@name!='Kevin Bacon']
Find every person who has played alongside Kevin Bacon. First find every <person> element with a name attribute equal to Kevin Bacon. Then find the parent of each matching element and look for its child <person> elements with a name attribute which is not equal to Kevin Bacon.
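The two-step navigation described above can be sketched as follows. The document and the co-star names are invented for the example; only the XPath expression comes from the text.

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;

# Hypothetical playlist: the first cast includes Kevin Bacon, the second does not.
my $xml = <<'XML';
<playlist>
  <movie>
    <cast>
      <person name="Kevin Bacon"/>
      <person name="Alex Example"/>
      <person name="Sam Sample"/>
    </cast>
  </movie>
  <movie>
    <cast>
      <person name="Pat Other"/>
    </cast>
  </movie>
</playlist>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Step up to the shared parent, then back down to the other <person> children.
my $query = q{//person[@name='Kevin Bacon']/../person[@name!='Kevin Bacon']};
my @costars = map { $_->getAttribute('name') } $dom->findnodes($query);

say 'Co-stars: ', join(', ', @costars);
```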
XPath Functions
Some of the examples above used XPath functions. It’s worth noting that the underlying libxml2 library only supports XPath version 1.0 and there are no plans to support 2.0.
XPath 1.0 does not include the lower-case() or upper-case() functions, so nasty workarounds like this are required if you need case-insensitive matching:
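    # translate() maps each upper-case letter in @name to the lower-case
    # letter at the same position in the second string.
    my $query = q{
        //person[
            contains(
                translate(
                    @name,
                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                    'abcdefghijklmnopqrstuvwxyz'
                ),
                'matt'
            )
        ]
    };
    foreach my $person ($dom->findnodes($query)) {
        say "Person: $person->{name}";
    }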
Alternatively, you can use the Perl API to register custom XPath functions.
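As a rough sketch of that approach, the example below registers a Perl callback through XML::LibXML::XPathContext and its registerFunction() method. The function name lc, the sample document and the names in it are all assumptions made for illustration. The expression passes string(@name) so that the callback receives a plain string rather than a node list.

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical document with differently cased names.
my $xml = <<'XML';
<cast>
  <person name="Matt Example"/>
  <person name="MATTHEW Sample"/>
  <person name="Pat Other"/>
</cast>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Custom functions are registered on an XPath context object,
# and queries that use them must be run through that same object.
my $xpc = XML::LibXML::XPathContext->new($dom);

# 'lc' is a name chosen for this sketch; it is not a built-in XPath function.
$xpc->registerFunction('lc', sub {
    my ($text) = @_;
    return lc "$text";    # stringify defensively, then lower-case
});

my $query   = q{//person[contains(lc(string(@name)), 'matt')]};
my @matches = map { $_->getAttribute('name') } $xpc->findnodes($query);

say "Person: $_" for @matches;
```

Compared with the translate() workaround, this keeps the expression short, at the cost of tying the query to a context object on which the function has been registered.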