XPath Expressions

As you saw in the basic examples section, the findnodes() method takes an XPath expression and finds nodes in the DOM that match the expression. There are two ways to call the findnodes() method:

on the object representing the whole document, or
on an element from the DOM - the element on which you call the method is called the context element

If your XPath expression starts with a ‘/’, the search will start at the root of the document, even if you call findnodes() on a different context element.

Start your XPath expression with ‘.’ to search down through the children of the context element.
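The effect of a leading ‘/’ versus a leading ‘.’ can be checked with a short script. This is a sketch that parses a tiny made-up document (not the sample playlist used elsewhere in this section) and calls findnodes() on the second <movie> element:

```perl
use strict;
use warnings;
use XML::LibXML;

# A tiny made-up document, not the tutorial's sample playlist
my $dom = XML::LibXML->load_xml(string => <<'XML');
<playlist>
  <movie id="m1"><title>First Film</title></movie>
  <movie id="m2"><title>Second Film</title></movie>
</playlist>
XML

# Use the second <movie> element as the context element
my ($movie) = $dom->findnodes('/playlist/movie[2]');

# Leading '/': the search starts from the document root, ignoring the context
my @all_titles = $movie->findnodes('//title');

# Leading '.': the search is relative to the context element
my @own_titles = $movie->findnodes('./title');

print scalar(@all_titles), "\n";            # 2
print $own_titles[0]->textContent, "\n";    # Second Film
```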

The remainder of this section simply includes examples of XPath expressions and descriptions of what they match.

Note

You can try out different XPath expressions in the XPath sandbox. The sandbox doesn’t actually use Perl or libxml; it simply uses JavaScript to access the XPath engine built into your browser. However, the expression matching should work just as it would in your Perl scripts.

/playlist

Match the top-most element of the document if (and only if) it is a <playlist> element.

//title

Match every <title> element in the document.

//movie/title

Match every <title> element that is the direct child of a <movie> element.

./title

Match every <title> element that is the direct child of the context element, e.g.:

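# $dom is the parsed playlist document from the earlier examples
foreach my $movie ($dom->findnodes('//movie')) {
    # './title' is evaluated relative to $movie, the context element
    say 'Title: ', $movie->findvalue('./title');
}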

//title/..

Match any element which is the parent of a <title> element.

/*

Match the top-most element of the document regardless of the element name.

//person/@role

Match the attribute named role on every <person> element.

//person/@*

Match every attribute on every <person> element.
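When an XPath expression selects attributes, findnodes() returns attribute nodes (XML::LibXML::Attr objects) rather than elements. A minimal sketch, using a made-up two-person document, of reading their values and names:

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample: two cast members, one without a role attribute
my $dom = XML::LibXML->load_xml(string => <<'XML');
<cast>
  <person name="Matt Example" role="Hero"/>
  <person name="Kim Example"/>
</cast>
XML

# Each match is an XML::LibXML::Attr node; ->value gives its text
my @roles = map { $_->value } $dom->findnodes('//person/@role');

# '@*' matches every attribute, whatever its name
my @names = map { $_->nodeName } $dom->findnodes('//person/@*');

print "@roles\n";   # Hero
print join(',', sort @names), "\n";
```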

//person[@role]

Match every <person> element that has an attribute named role.

//*[@url]

Match every element that has an attribute named url.

//*[@*]

Match every element that has an attribute of any name.

/playlist//*[not(@*)]

Match every element that is a descendant of the top-level <playlist> element and which does not have any attributes.

//movie[@id="tt0307479"]

Match every <movie> element that has an attribute named id with the value tt0307479.

//movie[not(@id="tt0307479")]

Match every <movie> element that does not have an attribute named id with the value tt0307479 (including elements that do not have an id attribute at all).
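The note about elements with no id attribute is where not(@id="…") differs from @id!="…". This sketch counts the matches for both forms against a made-up document whose third <movie> has no id at all:

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample: the third <movie> has no id attribute at all
my $dom = XML::LibXML->load_xml(string => <<'XML');
<playlist>
  <movie id="tt0307479"/>
  <movie id="tt0000001"/>
  <movie/>
</playlist>
XML

# not(@id="...") is true whenever the comparison is false, which
# includes the case where there is no id attribute to compare
my $not_eq = $dom->findnodes('//movie[not(@id="tt0307479")]')->size;

# @id!="..." can only be true if an id attribute exists
my $ne = $dom->findnodes('//movie[@id!="tt0307479"]')->size;

print "$not_eq $ne\n";   # 2 1
```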

//*[@id="tt0307479"]

Match every element that has an attribute named id with the value tt0307479.

//movie[@id="tt0307479"]//synopsis

Match every <synopsis> element within every <movie> element that has an attribute named id with the value tt0307479.

//person[position()=2]

Match every <person> element that is the second <person> child of its parent element. Note that positions are counted from 1, not 0.

//person[2]

This is simply a shorthand form of the position()=2 expression above.

//person[position()<3]

Match the first two <person> children of each parent element.

//person[last()]

Match the last <person> child of each parent element.
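Because the predicate in //person[2] applies to the child step, positions restart for every parent. The sketch below shows this on a made-up document with two <cast> elements, and contrasts it with (//person)[2], where the parentheses apply the predicate to the whole result set. The names_for() helper is a hypothetical convenience for this example only:

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample: two <cast> elements with their own <person> children
my $dom = XML::LibXML->load_xml(string => <<'XML');
<playlist>
  <cast><person name="A"/><person name="B"/><person name="C"/></cast>
  <cast><person name="D"/><person name="E"/></cast>
</playlist>
XML

# Hypothetical helper: comma-separated names of all matching <person> elements
sub names_for {
    my ($xpath) = @_;
    return join ',', map { $_->getAttribute('name') } $dom->findnodes($xpath);
}

for my $xpath ('//person[2]', '//person[position()<3]',
               '//person[last()]', '(//person)[2]') {
    print "$xpath => ", names_for($xpath), "\n";
}
# //person[2] => B,E
# //person[position()<3] => A,B,D,E
# //person[last()] => C,E
# (//person)[2] => B
```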

//cast[count(person)=3]

Match every <cast> element that has exactly 3 <person> child elements.

//*[name()='genre']

Match every element with the name genre - exactly equivalent to //genre.

//*[starts-with(name(), 'running')]

Match every element with a name starting with the word running.

//person[contains(@name, 'Matt')]

Match every <person> element that has an attribute named name which contains the text Matt anywhere in the attribute value.

//person[contains(@name, 'matt')]

Same as above except for the casing of the text to match. Because matching is case-sensitive, this will not match a name attribute that contains Matt with a capital M.

//person[not(contains(@name, 'e'))]

Match every <person> element that has an attribute named name which does not contain the letter e anywhere in the attribute value.

//person[starts-with(@name, 'K')]

Match every <person> element that has an attribute named name with a value that starts with the letter K.

//director/text()

Match every text node which is a direct child of a <director> element.

//cast/text()

Match every text node which is a direct child of a <cast> element. You might imagine that this would not match anything, since in the sample document the <cast> elements contain only <person> elements. But if you look carefully, you’ll see that in between each <person> element there is some whitespace - a newline after the preceding element and then some spaces at the start of the next line. This whitespace is text and is therefore matched.
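A small made-up document is enough to see those whitespace text nodes with XML::LibXML's default parser settings. This sketch also filters them out with normalize-space(), which returns an empty string (and therefore false) for whitespace-only text:

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample: <cast> holds only <person> elements plus indentation
my $dom = XML::LibXML->load_xml(string => <<'XML');
<cast>
  <person name="A"/>
  <person name="B"/>
</cast>
XML

# The newline-plus-spaces runs around each <person> are text nodes
my @ws = $dom->findnodes('//cast/text()');

# normalize-space() of whitespace-only text is '', so this predicate
# keeps only text nodes that have some real content
my @real = $dom->findnodes('//cast/text()[normalize-space()]');

printf "%d whitespace nodes, %d with content\n", scalar @ws, scalar @real;
# 3 whitespace nodes, 0 with content
```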

//person[contains(@name,'Matt')]/parent::*

Match the parent of every <person> element which contains Matt in the name attribute. (You could also use /.. for the parent.) The syntax parent::* means any element on the parent axis.

//person[contains(@name,'Matt')]/ancestor::movie

Match every <movie> element which is an ancestor of a <person> element which contains Matt in the name attribute. The syntax ancestor::movie means any <movie> element on the ancestor axis.
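The parent and ancestor axes can be compared side by side. In this sketch, run against a made-up two-movie document, parent::* and .. return the very same <cast> node, while ancestor::movie climbs past it to the enclosing <movie>:

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample document
my $dom = XML::LibXML->load_xml(string => <<'XML');
<playlist>
  <movie id="m1"><cast><person name="Matt A"/></cast></movie>
  <movie id="m2"><cast><person name="Kim B"/></cast></movie>
</playlist>
XML

my $person = q{//person[contains(@name,'Matt')]};

# parent::* and .. select the same node: the enclosing <cast>
my ($p1) = $dom->findnodes("$person/parent::*");
my ($p2) = $dom->findnodes("$person/..");

# ancestor::movie skips <cast> and stops at the enclosing <movie>
my ($movie) = $dom->findnodes("$person/ancestor::movie");

print $p1->nodeName, ' ', ($p1->isSameNode($p2) ? 'same' : 'different'), "\n";  # cast same
print $movie->getAttribute('id'), "\n";   # m1
```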

//genre[text()='drama']/following-sibling::*

Match every element, whatever its name, that is a sibling of a <genre> element whose complete text content is drama and that follows that element in document order.

//genre[text()='drama']/following-sibling::genre

Match every <genre> element that is a sibling of a <genre> element whose complete text content is drama and that follows that element in document order.

//genre[text()='drama']/preceding-sibling::genre

Match every <genre> element that is a sibling of a <genre> element whose complete text content is drama and that comes before that element in document order.

//movie[@id="tt0112384"]/following::title

Match every <title> element that comes after a <movie> element with tt0112384 as the value of the id attribute. Note that ‘after’ means after the closing tag, so a <title> element inside the matching <movie> would not be included.
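The exclusion of descendants is easy to confirm. This sketch uses a made-up document with one <title> inside the matching <movie> and one after it, and contrasts following:: with descendant::, the axis that reaches inside the element:

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample: one <title> inside the matching movie, one after it
my $dom = XML::LibXML->load_xml(string => <<'XML');
<playlist>
  <movie id="tt0112384"><title>Inside</title></movie>
  <movie id="m2"><title>After</title></movie>
</playlist>
XML

# following:: skips the context node's own descendants
my @following = map { $_->textContent }
    $dom->findnodes('//movie[@id="tt0112384"]/following::title');

# descendant:: is the axis that looks inside the element instead
my @inside = map { $_->textContent }
    $dom->findnodes('//movie[@id="tt0112384"]/descendant::title');

print "@following | @inside\n";   # After | Inside
```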

//movie[.//score/text() < 7.5]

Match every <movie> element which contains a <score> element with text content numerically less than 7.5.

//movie[.//score/text() > 8.0]//synopsis

Match every <synopsis> element in every <movie> element which contains a <score> element with text content numerically greater than 8.0.
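The comparisons really are numeric: when a node-set is compared with a number using < or >, XPath 1.0 converts each node's text to a number. In this sketch the made-up third movie has a score of 10, which counts as greater than 8.0 even though "10" would sort before "8.0" as a string:

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample; note the score of 10 in the third movie
my $dom = XML::LibXML->load_xml(string => <<'XML');
<playlist>
  <movie id="m1"><score>8.2</score><synopsis>One</synopsis></movie>
  <movie id="m2"><score>7.1</score><synopsis>Two</synopsis></movie>
  <movie id="m3"><score>10</score><synopsis>Three</synopsis></movie>
</playlist>
XML

# Each <score> text node is converted to a number before comparing
my @low = map { $_->getAttribute('id') }
    $dom->findnodes('//movie[.//score/text() < 7.5]');
my @synopses = map { $_->textContent }
    $dom->findnodes('//movie[.//score/text() > 8.0]//synopsis');

print "@low\n";        # m2
print "@synopses\n";   # One Three
```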

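//director | //genre

Match every element which is a <director> or a <genre>. The | operator forms the union of the two node-sets; writing or instead would give a single boolean result (true or false) rather than a list of nodes.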
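In XPath 1.0, | is the union operator for node-sets, while or is a boolean operator. This sketch, run against a made-up <movie> fragment, shows that the union comes back from findnodes() in document order, and that passing the or form to findvalue() yields the string true rather than any nodes:

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample fragment
my $dom = XML::LibXML->load_xml(string => <<'XML');
<movie>
  <genre>drama</genre>
  <director>Ann Example</director>
  <genre>comedy</genre>
</movie>
XML

# '|' merges both node-sets into one list, in document order
my @names = map { $_->nodeName } $dom->findnodes('//director | //genre');
print "@names\n";   # genre director genre

# 'or' only tests whether either side is non-empty
print $dom->findvalue('//director or //genre'), "\n";   # true
```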

//person[contains(@name, 'Bill') and contains(@role, 'Fred')]

Match every <person> element which contains Bill in the name attribute and contains Fred in the role attribute.

//person[@name='Kevin Bacon']/../person[@name!='Kevin Bacon']

Find every person who has played alongside Kevin Bacon. First find every <person> element with a name attribute equal to Kevin Bacon. Then find the parent of each matching element and look for its child <person> elements with a name attribute which is not equal to Kevin Bacon.
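Run from Perl, the same query can feed a list of co-star names. A sketch against a made-up three-movie document, in which Kevin Bacon appears in the first and third casts only:

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample document
my $dom = XML::LibXML->load_xml(string => <<'XML');
<playlist>
  <movie><cast>
    <person name="Kevin Bacon"/><person name="Actor One"/>
  </cast></movie>
  <movie><cast>
    <person name="Actor Two"/>
  </cast></movie>
  <movie><cast>
    <person name="Actor Three"/><person name="Kevin Bacon"/>
  </cast></movie>
</playlist>
XML

# Step up from each Kevin Bacon <person> to its <cast>, then back down
# to the other <person> children of that same <cast>
my $query = q{//person[@name='Kevin Bacon']/../person[@name!='Kevin Bacon']};
my @co_stars = map { $_->getAttribute('name') } $dom->findnodes($query);

print join(', ', @co_stars), "\n";   # Actor One, Actor Three
```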

XPath Functions

Some of the examples above used XPath functions. It’s worth noting that the underlying libxml2 library only supports XPath version 1.0 and there are no plans to support 2.0.

XPath 1.0 does not include the lower-case() or upper-case() functions, so nasty workarounds like this are required if you need case-insensitive matching:

my $query = q{
    //person[
        contains(
            translate(
                @name,
                'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                'abcdefghijklmnopqrstuvwxyz'
            ),
            'matt'
        )
    ]
};

foreach my $person ($dom->findnodes($query)) {
    # Hash-style access on an element returns the named attribute's value
    say "Person: $person->{name}";
}

Alternatively, you can use the Perl API to register custom XPath functions, via the registerFunction() method of XML::LibXML::XPathContext.
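As a sketch of that approach, the script below registers a Perl callback as an XPath function and uses it for case-insensitive matching. The function name lowercase and the sample document are made up for this example; lowercase is not part of XPath 1.0, and queries that use it must be run through the XPathContext object rather than $dom:

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample document
my $dom = XML::LibXML->load_xml(string => <<'XML');
<cast>
  <person name="Matt Example"/>
  <person name="MATTHEW Sample"/>
  <person name="Kim Example"/>
</cast>
XML

my $xpc = XML::LibXML::XPathContext->new($dom);

# 'lowercase' is a hypothetical name chosen for this example.
# Node-set arguments arrive as XML::LibXML::NodeList objects;
# other argument types are simply stringified.
$xpc->registerFunction('lowercase', sub {
    my ($arg) = @_;
    my $text = ref($arg) && $arg->isa('XML::LibXML::NodeList')
        ? $arg->to_literal->value
        : "$arg";
    return lc $text;
});

# Custom functions are only visible to queries run through $xpc
my @matches = map { $_->getAttribute('name') }
    $xpc->findnodes(q{//person[contains(lowercase(@name), 'matt')]});

print join(', ', @matches), "\n";   # Matt Example, MATTHEW Sample
```

Compared with the translate() workaround, the callback keeps the query short and handles any characters Perl's lc() understands, at the cost of tying the XPath expression to Perl code.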