Working with XML Namespaces
Calling the findnodes() method as described in the basic examples section won’t find elements when the XML document uses ‘namespaces’. This section describes the extra steps you need to take to work with namespaces in XML.
XML ‘namespaces’ allow you to build documents using elements from more than one vocabulary. For example, one XML document might include both SVG elements to describe a drawing and Dublin Core elements to define metadata about the drawing. The two vocabularies are defined by separate bodies: the W3C and the DCMI respectively. Associating each element in your document with a namespace allows a processor to distinguish between elements from different vocabularies that happen to use the same element name.
The scripts in this section use the SVG document xml-libxml.svg, which starts like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="1031.3961"
height="278.02112"
id="svg2"
sodipodi:version="0.32"
inkscape:version="0.48.4 r9939"
sodipodi:docname="xml-libxml.svg"
inkscape:output_extension="org.inkscape.output.svg.inkscape"
version="1.0"
inkscape:export-filename="/home/grant/Desktop/xml-libxml.png"
inkscape:export-xdpi="79.860001"
inkscape:export-ydpi="79.860001">
Because the top-level <svg> element uses xmlns="http://www.w3.org/2000/svg" to declare a default namespace, every other element will be in that namespace unless the element name includes a prefix for a different namespace, or unless an element declares a different default namespace for itself and its children.
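To see this scoping rule in action, here is a minimal sketch that parses a small, made-up document (the element names and the example.com namespace URIs are hypothetical and are not part of xml-libxml.svg) and prints the namespace URI each element ends up in. It uses the localname and namespaceURI node methods, and the XPath wildcard //*, which matches elements in any namespace.

```perl
use strict;
use warnings;
use 5.010;    # enables say()
use XML::LibXML;

# Hypothetical sample document (not xml-libxml.svg): the root declares a
# default namespace, and <other> declares a different one for its subtree
my $xml = <<'XML';
<root xmlns="http://example.com/a">
  <child/>
  <other xmlns="http://example.com/b">
    <grandchild/>
  </other>
</root>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# '//*' matches every element, whatever namespace it is in
my %ns_of;
foreach my $el ($dom->findnodes('//*')) {
    $ns_of{ $el->localname } = $el->namespaceURI;
    say $el->localname, ' => ', $el->namespaceURI;
}
```

Here <root> and <child> end up in http://example.com/a, while <other> and <grandchild> end up in http://example.com/b, because <other> redeclares the default namespace for itself and its children.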
The first child element in the document is a <title> element with no namespace prefix, so it is associated with the default namespace URI http://www.w3.org/2000/svg.
<title id="title5798">Example SVG File</title>
A later section of the document includes a <title> element with the dc: namespace prefix, so it is associated with the URI http://purl.org/dc/elements/1.1/.
<dc:title>XML::LibXML Logo</dc:title>
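The following sketch makes the difference between the two <title> elements visible. It assumes a hypothetical, cut-down copy of the relevant parts of xml-libxml.svg embedded as a string, picks out every element whose local name is title, and prints each element's name as written alongside its namespace URI.

```perl
use strict;
use warnings;
use 5.010;    # enables say()
use XML::LibXML;

# Hypothetical cut-down stand-in for xml-libxml.svg, keeping only the
# two <title> elements discussed above
my $xml = <<'XML';
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Example SVG File</title>
  <metadata>
    <dc:title>XML::LibXML Logo</dc:title>
  </metadata>
</svg>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Filter on the local name in Perl, so the namespace doesn't affect the search
my @titles = grep { $_->localname eq 'title' } $dom->findnodes('//*');
foreach my $el (@titles) {
    # nodeName keeps the prefix as written; namespaceURI is what really counts
    say $el->nodeName, ' => ', $el->namespaceURI;
}
```

Both elements share the local name title, but the first reports http://www.w3.org/2000/svg and the second http://purl.org/dc/elements/1.1/.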
You can confirm using the XPath sandbox that the XPath expression //title does not match either of the <title> elements in the test document:
//title
You can also use the following Perl code to confirm that findnodes() does not return any matches for the XPath expression //title:
# Assumes $dom was loaded from xml-libxml.svg as in the earlier examples
my $match_count = $dom->findnodes('//title')->size;
say "XPath: //title Matching node count: $match_count";
Output:
XPath: //title Matching node count: 0
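When an element in a document is associated with a namespace URI, it will only match an XPath expression that includes a prefix associated with the same namespace URI. An unprefixed name such as title in //title always refers to an element in no namespace, because XPath 1.0 does not apply the document’s default namespace to names in an expression. It’s important to stress that it’s not the prefix that is being matched, but the URI associated with the prefix.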
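A quick way to see the unprefixed-name rule in XML::LibXML is to run the same expression against two tiny documents, one with no namespace and one with a default namespace. This is only a sketch; both documents are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;    # enables say()
use XML::LibXML;

# Two small hypothetical documents that differ only in whether the root
# declares a default namespace
my $plain = XML::LibXML->load_xml(
    string => '<doc><title>A</title></doc>');
my $in_ns = XML::LibXML->load_xml(
    string => '<doc xmlns="http://www.w3.org/2000/svg"><title>A</title></doc>');

# The unprefixed name test 'title' only matches elements in no namespace
my $plain_count = $plain->findnodes('//title')->size;
my $in_ns_count = $in_ns->findnodes('//title')->size;

say "No namespace:      //title matches $plain_count node(s)";
say "Default namespace: //title matches $in_ns_count node(s)";
```

The first count is 1 and the second is 0, even though the two <title> elements look identical in the source.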
Using the XPath sandbox, you can confirm that if we register the ‘Dublin Core’ namespace URI with the prefix dc, the XPath expression //dc:title will match the <dc:title> element in the <metadata> section:
//dc:title
However, if we register the same URI with the prefix dublin instead, then we can match the same element using the dublin prefix in our XPath:
//dublin:title
In order to associate namespace prefixes in XPath expressions with namespace URIs, we need to use an XML::LibXML::XPathContext object. This is a multi-step process:
- create an XPathContext object associated with the document you want to search
- register a prefix and associated URI for each namespace you want to include in your query
- call the findnodes() method on the XPathContext object rather than directly on the DOM object
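use strict;
use warnings;
use 5.010;    # enables say()
use XML::LibXML;
use XML::LibXML::XPathContext;
my $filename = 'xml-libxml.svg';
my $dom = XML::LibXML->load_xml(location => $filename);
# Step 1: create an XPathContext for the document
my $xpc = XML::LibXML::XPathContext->new($dom);
# Step 2: register a prefix for each namespace URI. The prefixes don't
# have to match the ones used in the document; only the URIs matter.
$xpc->registerNs('vg', 'http://www.w3.org/2000/svg');
$xpc->registerNs('dub', 'http://purl.org/dc/elements/1.1/');
# Step 3: call findnodes() on the XPathContext, not on $dom
my($match1) = $xpc->findnodes('//vg:title');
say 'XPath: //vg:title Matched: ', $match1;
my($match2) = $xpc->findnodes('//dub:title');
say 'XPath: //dub:title Matched: ', $match2;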
Output:
XPath: //vg:title Matched: <title id="title5798">Example SVG File</title>
XPath: //dub:title Matched: <dc:title>XML::LibXML Logo</dc:title>
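The example above uses the prefixes vg and dub, which don’t appear anywhere in the document. As a sketch of the sandbox experiment with dc and dublin, the following registers both prefixes for the Dublin Core URI on one XPathContext and checks that they select the very same node. The small inline document is a hypothetical stand-in for the <metadata> section.

```perl
use strict;
use warnings;
use 5.010;    # enables say()
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical stand-in for the <metadata> section of xml-libxml.svg
my $xml = '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        . '<dc:title>XML::LibXML Logo</dc:title></metadata>';
my $dom = XML::LibXML->load_xml(string => $xml);

my $dc_uri = 'http://purl.org/dc/elements/1.1/';
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs('dc',     $dc_uri);
$xpc->registerNs('dublin', $dc_uri);   # same URI, a different prefix

my ($via_dc)     = $xpc->findnodes('//dc:title');
my ($via_dublin) = $xpc->findnodes('//dublin:title');

# Both expressions resolve to the same element, because only the URI matters
say 'Same node: ', ($via_dc->isSameNode($via_dublin) ? 'yes' : 'no');
say 'Text: ', $via_dc->textContent;
```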
You’ll recall from earlier examples that you can search within a node by calling findnodes() on the element node (rather than the document) and using an XPath expression like ./child, where the dot refers to the context node. However, that approach won’t work with namespaces: the prefixes are registered on the XPathContext object, so you need to call findnodes() on that object instead. The solution is to pass findnodes() a second argument after the XPath expression, which is the node to use as the context node:
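use strict;
use warnings;
use 5.010;    # enables say()
use XML::LibXML;
use XML::LibXML::XPathContext;
my $filename = 'xml-libxml.svg';
# no_blanks drops ignorable whitespace-only text nodes between elements
my $dom = XML::LibXML->load_xml(location => $filename, no_blanks => 1);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs('svg', 'http://www.w3.org/2000/svg');
$xpc->registerNs('dc', 'http://purl.org/dc/elements/1.1/');
# Find the <metadata> element (in the SVG namespace)
my($metadata) = $xpc->findnodes('//svg:metadata') or die "No metadata";
# The second argument makes $metadata the context node for '.'
foreach my $el ($xpc->findnodes('.//dc:*', $metadata)) {
    my $name = $el->localname;            # element name without prefix
    my $value = $el->to_literal or next;  # skip elements with no text
    say "$name=$value";
}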
Output:
format=image/svg+xml
title=XML::LibXML Logo
creator=Grant McLean
date=2016-03-26
subject=perlxmllibxml
description=An SVG file created as an example for parsing XML with namespaces.
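Passing the context node as the second argument is what the script above does. As an alternative sketch, XML::LibXML::XPathContext also has a setContextNode() method that changes the context node for all subsequent queries on that object. The inline document below is a hypothetical miniature of xml-libxml.svg, with an extra <dc:title> placed outside <metadata> to show that .// only searches beneath the context node.

```perl
use strict;
use warnings;
use 5.010;    # enables say()
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical miniature of xml-libxml.svg; the <dc:title> outside
# <metadata> is made up to show the effect of the context node
my $xml = <<'XML';
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata>
    <dc:title>XML::LibXML Logo</dc:title>
    <dc:date>2016-03-26</dc:date>
  </metadata>
  <dc:title>Outside metadata</dc:title>
</svg>
XML

my $dom = XML::LibXML->load_xml(string => $xml, no_blanks => 1);
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs('svg', 'http://www.w3.org/2000/svg');
$xpc->registerNs('dc',  'http://purl.org/dc/elements/1.1/');

my ($metadata) = $xpc->findnodes('//svg:metadata') or die "No metadata";

# Make <metadata> the context node, so '.' in later queries refers to it
$xpc->setContextNode($metadata);
my @names = map { $_->localname } $xpc->findnodes('.//dc:*');
say join ', ', @names;
```

This prints "title, date"; the outer <dc:title> is not matched. One trade-off: setContextNode() changes state on the shared $xpc object, so later relative queries also start from <metadata>, whereas passing the node as an argument keeps each call independent.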
One small feature of that script worth noting is the use of $el->localname to get the name of the element without the namespace prefix. The more commonly used $el->nodeName method includes the namespace prefix as it appears in the document (for example, dc:title rather than title).
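To compare the naming methods side by side, this sketch prints the results of nodeName, localname, prefix and namespaceURI for a dc:title element in a tiny inline document, which is a hypothetical stand-in for the real file.

```perl
use strict;
use warnings;
use 5.010;    # enables say()
use XML::LibXML;

# Hypothetical one-element document using the Dublin Core namespace
my $dom = XML::LibXML->load_xml(
    string => '<doc xmlns:dc="http://purl.org/dc/elements/1.1/">'
            . '<dc:title>XML::LibXML Logo</dc:title></doc>');

my $el = $dom->documentElement->firstChild;    # the <dc:title> element
say 'nodeName:     ', $el->nodeName;        # dc:title
say 'localname:    ', $el->localname;       # title
say 'prefix:       ', $el->prefix;          # dc
say 'namespaceURI: ', $el->namespaceURI;    # http://purl.org/dc/elements/1.1/
```

Only namespaceURI is stable across documents; the prefix, and therefore nodeName, depends on whatever the document author chose.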