Working with XML Namespaces

Using the findnodes() method as described in the basic examples section doesn’t work when the XML document uses ‘namespaces’. This section describes the extra steps you need to take to work with namespaces in XML.

XML ‘namespaces’ allow you to build documents using elements from more than one vocabulary. For example, one XML document might include both SVG elements to describe a drawing and Dublin Core elements to define metadata about the drawing. The two vocabularies are defined by separate bodies: the W3C and the DCMI respectively. Associating each element in your document with a namespace allows a processor to tell apart elements from different vocabularies that happen to share the same name.

The scripts in this section use the SVG document xml-libxml.svg, which starts like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
   xmlns="http://www.w3.org/2000/svg"
   xmlns:svg="http://www.w3.org/2000/svg"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:cc="http://creativecommons.org/ns#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:xlink="http://www.w3.org/1999/xlink"
   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
   width="1031.3961"
   height="278.02112"
   id="svg2"
   sodipodi:version="0.32"
   inkscape:version="0.48.4 r9939"
   sodipodi:docname="xml-libxml.svg"
   inkscape:output_extension="org.inkscape.output.svg.inkscape"
   version="1.0"
   inkscape:export-filename="/home/grant/Desktop/xml-libxml.png"
   inkscape:export-xdpi="79.860001"
   inkscape:export-ydpi="79.860001">

Because the top-level <svg> element uses xmlns="http://www.w3.org/2000/svg" to declare a default namespace, every other element will be in that namespace unless the element name includes a prefix for a different namespace, or unless an element declares a different default namespace for itself and its children.

The first child element in the document is a <title> element with no namespace prefix, so it is associated with the default namespace URI: http://www.w3.org/2000/svg.

  <title id="title5798">Example SVG File</title>

A later section of the document includes a <title> element with the dc: namespace prefix, so it is associated with the URI: http://purl.org/dc/elements/1.1/.

        <dc:title>XML::LibXML Logo</dc:title>
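The association between each element and its namespace URI can be checked directly from Perl. The following is a minimal sketch that uses a hypothetical cut-down document (an assumption made so the example is self-contained, not the real xml-libxml.svg) and asks each element whose local name is title for its namespace URI:

```perl
use 5.010;
use strict;
use warnings;

use XML::LibXML;

# Hypothetical cut-down document for illustration only;
# it is not the real xml-libxml.svg file.
my $xml = <<'XML';
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title id="title5798">Example SVG File</title>
  <metadata>
    <dc:title>XML::LibXML Logo</dc:title>
  </metadata>
</svg>
XML

my $dom  = XML::LibXML->load_xml(string => $xml);
my $root = $dom->documentElement;

# getElementsByLocalName() ignores namespaces, so it finds both
# <title> and <dc:title>. Record the namespace URI of each one.
my %uri_for;
foreach my $el ($root->getElementsByLocalName('title')) {
    $uri_for{ $el->nodeName } = $el->namespaceURI;
    say $el->nodeName, ' => ', $el->namespaceURI;
}
```

The unprefixed <title> should report the SVG URI inherited from the default namespace declaration, while <dc:title> should report the Dublin Core URI.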

You can confirm using the XPath sandbox that the XPath expression //title does not match either of the <title> elements in the test document:

//title

You can also use the following Perl code to confirm that findnodes() does not return any matches for the XPath expression //title:

my $match_count = $dom->findnodes('//title')->size;
say "XPath: //title  Matching node count: $match_count";

Output:

XPath: //title  Matching node count: 0

When an element in a document is associated with a namespace URI, it will only match an element name in an XPath expression if that name has a prefix associated with the same namespace URI. However, it’s important to stress that it’s not the prefix that is being matched, but the URI associated with the prefix.

Using the XPath sandbox, you can confirm that if we register the ‘Dublin Core’ namespace URI with the prefix dc, the XPath expression //dc:title will match the <title> element in the <metadata> section:

//dc:title

If we instead register the same URI with the prefix dublin, we can match the same element using the dublin prefix in our XPath:

//dublin:title

In order to associate namespace prefixes in XPath expressions with namespace URIs, we need to use an XML::LibXML::XPathContext object. This is a multi-step process:

  1. create an XPathContext object associated with the document you want to search
  2. register a prefix and associated URI for each namespace you want to include in your query
  3. call the findnodes() method on the XPathContext object rather than directly on the DOM object

use 5.010;    # enables say()
use strict;
use warnings;

use XML::LibXML;
use XML::LibXML::XPathContext;

my $filename = 'xml-libxml.svg';
my $dom = XML::LibXML->load_xml(location => $filename);

my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs('vg',  'http://www.w3.org/2000/svg');
$xpc->registerNs('dub', 'http://purl.org/dc/elements/1.1/');

my($match1) = $xpc->findnodes('//vg:title');
say 'XPath: //vg:title   Matched: ', $match1;

my($match2) = $xpc->findnodes('//dub:title');
say 'XPath: //dub:title  Matched: ', $match2;

Output:

XPath: //vg:title   Matched: <title id="title5798">Example SVG File</title>
XPath: //dub:title  Matched: <dc:title>XML::LibXML Logo</dc:title>

You’ll recall from earlier examples that you can search within a node by calling findnodes() on the element node (rather than the document) and using an XPath expression like ./child, where the dot refers to the context node. However, when you’re dealing with namespaces that won’t work, because you need to call findnodes() on the XPathContext object. The solution is to pass findnodes() a second argument after the XPath expression: the element to use as the context node.

use 5.010;    # enables say()
use strict;
use warnings;

use XML::LibXML;
use XML::LibXML::XPathContext;

my $filename = 'xml-libxml.svg';
my $dom = XML::LibXML->load_xml(location => $filename, no_blanks => 1);

my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs('svg', 'http://www.w3.org/2000/svg');
$xpc->registerNs('dc',  'http://purl.org/dc/elements/1.1/');

my($metadata) = $xpc->findnodes('//svg:metadata') or die "No metadata";

foreach my $el ($xpc->findnodes('.//dc:*', $metadata)) {
    my $name  = $el->localname;
    my $value = $el->to_literal or next;
    say "$name=$value";
}

Output:

format=image/svg+xml
title=XML::LibXML Logo
creator=Grant McLean
date=2016-03-26
subject=perlxmllibxml
description=An SVG file created as an example for parsing XML with namespaces.

One small feature of that script worth noting is the use of $el->localname to get the name of the element without the namespace prefix. The more commonly used $el->nodeName method does include the namespace prefix, as it appears in the document.
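The difference between these accessors can be seen side by side. This is a minimal sketch using a hypothetical one-element fragment (an assumption for illustration, not taken from xml-libxml.svg) that also shows the related prefix and namespaceURI methods:

```perl
use 5.010;
use strict;
use warnings;

use XML::LibXML;

# Hypothetical fragment for illustration only. There is no whitespace
# between the tags, so firstChild is the <dc:creator> element itself.
my $xml = '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        . '<dc:creator>Grant McLean</dc:creator>'
        . '</metadata>';

my $dom = XML::LibXML->load_xml(string => $xml);
my $el  = $dom->documentElement->firstChild;

say 'nodeName:     ', $el->nodeName;      # name as written, with prefix
say 'localname:    ', $el->localname;     # name without the prefix
say 'prefix:       ', $el->prefix;        # prefix used in the document
say 'namespaceURI: ', $el->namespaceURI;  # URI the prefix is bound to
```

Because the prefix is chosen by whoever wrote the document, code that needs to identify an element reliably would normally compare localname and namespaceURI rather than nodeName.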