Xerces::DOM - A Perl module for parsing XML documents with the W3C DOM API.
# Here is a simple example script to count the elements in an
# XML document. The document file name is passed on the
# command line.
use Xerces::DOM;

my $file = $ARGV[0];
my $parser = DOM::Parser->new();
$parser->parse($file);
my $document = $parser->getDocument();
my $element_count = $document->getElementsByTagName("*")->getLength();
print "$file: ($element_count elems)\n";
This module reads and parses XML files using the W3C standard DOM APIs. It is built on the industry standard Apache Xerces C++ Parser (formerly IBM XML4C). It checks documents for well-formedness and can optionally validate them against a corresponding DTD.
DOM objects are exposed as Perl 5 objects with APIs modeled after the W3C standard Java language binding (the only official language binding at the time this module was developed). The parser has full Unicode support and can read various standard encodings. Strings are represented internally as UTF-16 and are accessed through the Perl APIs as UTF-8 strings if you are using this module with the experimental Perl 5 build 5.00560 with Unicode support. Use the utf8 pragma to enable UTF-8 strings in your scripts, e.g.
use utf8;
DOM::Node
None
Returns a string.
Returns true (1) or false (0).
Returns a string.
DOM::Text
None
None
DOM::Node
None
Returns a string.
Returns an integer.
Returns a string. May throw a DOM exception.
May throw a DOM exception.
May throw a DOM exception.
May throw a DOM exception.
May throw a DOM exception.
May throw a DOM exception.
DOM::CharacterData
None
None
DOM::Node
Returns a new DOM::Document.
Returns a new DOM::Entity.
Returns a new DOM::Element.
May throw an exception.
Returns a new DOM::DocumentFragment.
Returns a new DOM::Text.
Returns a new DOM::Comment.
Returns a new DOM::CDATASection.
Returns a new DOM::DocumentType.
Returns a new DOM::Notation.
Returns a new DOM::ProcessingInstruction.
May throw a DOM exception.
Returns a new DOM::Attr.
May throw a DOM exception.
Returns a new DOM::EntityReference.
May throw a DOM exception.
Returns a DOM::DocumentType.
Returns a DOM::DOMImplementation.
Returns a DOM::Element.
Returns a DOM::NodeList.
Returns a DOM::Node.
DOM::Node
None
None
DOM::Node
None
Returns a string.
Returns a DOM::NamedNodeMap.
Returns a DOM::NamedNodeMap.
None
Returns a DOM::DOMImplementation.
Returns true (1) or false (0).
DOM::Node
None
Returns a string.
Returns a string.
Returns a DOM::Attr.
Returns a DOM::NodeList.
May throw a DOM exception.
Returns a DOM::Attr.
May throw a DOM exception.
Returns a DOM::Attr.
May throw a DOM exception.
May throw a DOM exception.
DOM::Node
None
Returns a string.
Returns a string.
Returns a string.
DOM::Node
None
None
None
None
Returns a DOM::Node.
May throw a DOM exception.
Returns an integer.
Returns a DOM::Node.
Returns an integer.
Removes a DOM::Node.
May throw a DOM exception.
None
None
Returns a string.
Returns a string.
May throw a DOM exception.
Returns an integer.
Returns a DOM::Node.
Returns a DOM::NodeList.
Returns a DOM::Node.
Returns a DOM::Node.
Returns a DOM::Node.
Returns a DOM::Node.
Returns a DOM::NamedNodeMap.
Returns a DOM::Document.
Returns a DOM::Node.
Returns a DOM::Node.
May throw a DOM exception.
Returns a DOM::Node.
May throw a DOM exception.
Returns a DOM::Node.
May throw a DOM exception.
Returns a DOM::Node.
May throw a DOM exception.
Returns true (1) or false (0).
Returns true (1) or false (0).
May throw a DOM exception.
None
None
Returns a DOM::Node.
Returns an integer.
DOM::Node
None
Returns a string.
Returns a string.
DOM::Node
None
Returns a string.
Returns a string.
May throw a DOM exception.
DOM::CharacterData
None
Returns a DOM::Text.
May throw a DOM exception.
Xerces DOM methods never return NULL object references, i.e. you never have to call defined() to check them. However, in certain error situations the objects returned represent NULL objects. You can test for this condition by calling the DOM::Node->isNull() method.
Exceptions are caught from the underlying C++ parser and are rethrown using croak calls (see Carp). All exceptions are thrown as string messages that include one of the standard DOM exception identifiers, such as INDEX_SIZE_ERR.
Exception strings also include the DOM class and method from which the exception was thrown, and the line number of your calling script, e.g.
INDEX_SIZE_ERR in DOM::Text::splitText at YourScript.pl line 11
You can let exceptions float to the top as error messages or you can catch and handle exceptions as follows:
eval {
    # your code goes here
};
if ( $@ ) {
    # exception caught
    # exception message in $@ as a string
}
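Because exceptions arrive as plain strings, a handler usually has to pick the message apart itself. The following is a minimal sketch of that idea, not part of the module: split_text_stub is a hypothetical stand-in that dies with a message in the format shown above (so the example runs without Xerces::DOM installed), and parse_dom_exception and its field names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DOM call that fails. It dies with a fixed
# message in the documented format instead of calling the real module.
sub split_text_stub {
    die "INDEX_SIZE_ERR in DOM::Text::splitText at YourScript.pl line 11\n";
}

# Split an exception string into its documented parts.
# Returns an empty list when the string does not match the format.
sub parse_dom_exception {
    my ($message) = @_;
    return unless $message =~ /^(\w+_ERR) in ([\w:]+)::(\w+) at (.+) line (\d+)/;
    return (code => $1, class => $2, method => $3, file => $4, line => $5);
}

my %error;
eval { split_text_stub(); 1 } or %error = parse_dom_exception($@);

if (%error) {
    print "code=$error{code} class=$error{class} method=$error{method}\n";
}
```

Dispatching on the identifier (the code field here) rather than on the whole message keeps a handler independent of the file name and line number that are appended to the string.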
You can get complete stack backtraces in exception messages by invoking the Perl interpreter with the -MCarp=verbose option, e.g.
perl -MCarp=verbose script.pl
The Xerces Perl DOM is built on the Unicode-enabled Xerces C++ XML Parser. The C++ parser can read various Unicode encodings and represents strings internally in UTF-16, but you pass strings in and out of methods as standard Perl strings. If you plan to take advantage of Unicode, make sure to do two things: run a Perl build with Unicode support (such as the experimental build 5.00560 mentioned earlier), and enable the utf8 pragma in your scripts with use utf8.
When you do these things, all strings that you pass in and out will be UTF-8 strings, and the Xerces Perl DOM will automatically convert them to and from UTF-16 internally.
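To make the size difference between the two encodings concrete, the sketch below converts a short string to UTF-8 and UTF-16 and back. It is an illustration only and rests on an assumption: it uses the Encode module, which ships with later Perl releases than the one this module targets, whereas the Xerces Perl DOM performs its conversion inside the C++ layer rather than through these calls.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Ten characters, two of them non-ASCII (written as escapes so the
# example does not depend on how this file itself is encoded).
my $text = "na\x{EF}ve caf\x{E9}";

# UTF-8 bytes, as the string would appear in a file.
my $utf8_bytes = encode('UTF-8', $text);

# UTF-16 (big-endian, no byte-order mark): two bytes per character here,
# because every character is in the Basic Multilingual Plane.
my $utf16_bytes = encode('UTF-16BE', $text);

# Round trip back to a Perl character string.
my $round_trip = decode('UTF-16BE', $utf16_bytes);

printf "characters:   %d\n", length $text;
printf "UTF-8 bytes:  %d\n", length $utf8_bytes;
printf "UTF-16 bytes: %d\n", length $utf16_bytes;
print  "round trip ok\n" if $round_trip eq $text;
```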
You create DOM::Parser and DOM::Document objects using their respective static new methods. All other DOM objects are created by calling the corresponding static factory methods on DOM::Document, or are returned from various other methods on DOM objects. This is standard DOM stuff, but I thought I'd mention it so you don't pull your hair out looking for new methods on the other objects.
Basically, you need not worry about this. All underlying C++ and Perl memory is managed for you. When you no longer hold any references to a Xerces DOM object, its destructor is called automatically and cleans up all associated memory.
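The timing of that cleanup follows Perl's ordinary reference counting. As a minimal sketch, using a hypothetical Demo::Handle class in place of a real Xerces DOM object, the destructor runs only when the last reference goes away:

```perl
use strict;
use warnings;

package Demo::Handle;   # hypothetical class, not part of Xerces::DOM

my @log;                # records destructor calls for inspection

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

# Perl calls DESTROY when the last reference to the object goes away.
sub DESTROY {
    my ($self) = @_;
    push @log, "destroyed $self->{name}";
}

sub log_entries { return @log }

package main;

my $first  = Demo::Handle->new('document');
my $second = $first;            # second reference to the same object

undef $first;                   # one reference remains: no cleanup yet
my $after_first = scalar Demo::Handle::log_entries();

undef $second;                  # last reference gone: DESTROY runs now
my $after_second = scalar Demo::Handle::log_entries();

print "after first undef: $after_first, after second undef: $after_second\n";
```

In practice this means a DOM object stays alive as long as any variable, array, or hash in your script still refers to it.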
Tom Watson <rtwatson@us.ibm.com> wrote version 1.0 and submitted it to the XML Apache project <http://xml.apache.org>, where you can contribute to future versions and where the corresponding C++ and Java parsers are also developed as open source projects.