NAME

Xerces::DOM - A Perl module for parsing XML documents with the W3C DOM API.


SYNOPSIS

        # Here is a simple example script to count the elements in an
        # XML document.  The document file name is passed on the
        # command line.

        use Xerces::DOM;

        my $file = $ARGV[0];
        my $parser = new DOM::Parser();
        $parser->parse($file);
        my $document = $parser->getDocument();
        my $element_count = $document->getElementsByTagName("*")->getLength();
        print "$file: ($element_count elems)\n";


DESCRIPTION

This module reads and parses XML files using the W3C standard DOM APIs. It is built on the industry standard Apache Xerces C++ Parser (formerly IBM XML4C). It checks documents for well-formedness and can optionally validate them against a corresponding DTD file.

DOM objects are exposed as Perl5 objects with APIs modeled after the W3C standard Java language binding (the only official language binding when this module was developed). The parser has full Unicode support and can read various standard encodings. Strings are represented internally as UTF-16 and are accessed through the Perl APIs as UTF-8 strings, provided you are using this module with the experimental Perl5 build 5.00560 with Unicode support. Use the utf8 pragma to enable UTF-8 strings in your scripts, e.g.

        use utf8;


CLASSES


DOM::Attr

Base Class:

DOM::Node

Class Methods:

None

Instance Methods:
getName ()

Returns a string.

getSpecified ()

Returns true (1) or false (0).

getValue ()

Returns a string.

setValue (string value)
setSpecified (int specified)


DOM::CDATASection

Base Class:

DOM::Text

Class Methods:

None

Instance Methods:

None


DOM::CharacterData

Base Class:

DOM::Node

Class Methods:

None

Instance Methods:
getData ()

Returns a string.

getLength ()

Returns an integer.

substringData (int offset, int count)

Returns a string. May throw a DOM exception.

appendData (string data)

May throw a DOM exception.

insertData (int offset, string data)

May throw a DOM exception.

deleteData (int offset, int count)

May throw a DOM exception.

replaceData (int offset, int count, string arg)

May throw a DOM exception.

setData (string data)

May throw a DOM exception.


DOM::Comment

Base Class:

DOM::CharacterData

Class Methods:

None

Instance Methods:

None


DOM::Document

Base Class:

DOM::Node

Class Methods:
createDocument ()

Returns a new DOM::Document.

Instance Methods:
createEntity (string name)

Returns a new DOM::Entity.

createElement (string tag_name)

Returns a new DOM::Element.

May throw a DOM exception.

createDocumentFragment ()

Returns a new DOM::DocumentFragment.

createTextNode (string data)

Returns a new DOM::Text.

createComment (string data)

Returns a new DOM::Comment.

createCDATASection (string data)

Returns a new DOM::CDATASection.

createDocumentType (string name)

Returns a new DOM::DocumentType.

createNotation (string name)

Returns a new DOM::Notation.

createProcessingInstruction (string target, string data)

Returns a new DOM::ProcessingInstruction.

May throw a DOM exception.

createAttribute (string name)

Returns a new DOM::Attr.

May throw a DOM exception.

createEntityReference (string name)

Returns a new DOM::EntityReference.

May throw a DOM exception.

getDoctype ()

Returns a DOM::DocumentType.

getImplementation ()

Returns a DOM::DOMImplementation.

getDocumentElement()

Returns a DOM::Element.

getElementsByTagName(string tag_name)

Returns a DOM::NodeList.

importNode (DOM::Node source, bool deep)

Returns a DOM::Node.


DOM::DocumentFragment

Base Class:

DOM::Node

Class Methods:

None

Instance Methods:

None


DOM::DocumentType

Base Class:

DOM::Node

Class Methods:

None

Instance Methods:
getName ()

Returns a string.

getEntities ()

Returns a DOM::NamedNodeMap.

getNotations ()

Returns a DOM::NamedNodeMap.


DOM::DOMImplementation

Base Class:

None

Class Methods:
getImplementation ()

Returns a DOM::DOMImplementation.

Instance Methods:
hasFeature (string feature, string version)

Returns true (1) or false (0).


DOM::Element

Base Class:

DOM::Node

Class Methods:

None

Instance Methods:
getTagName ()

Returns a string.

getAttribute (string name)

Returns a string.

getAttributeNode (string name)

Returns a DOM::Attr.

getElementsByTagName (string name)

Returns a DOM::NodeList.

setAttribute (string name, string value)

May throw a DOM exception.

setAttributeNode (DOM::Attr newAttr)

Returns a DOM::Attr.

May throw a DOM exception.

removeAttributeNode (DOM::Attr oldAttr)

Returns a DOM::Attr.

May throw a DOM exception.

normalize ()
removeAttribute (string name)

May throw a DOM exception.


DOM::Entity

Base Class:

DOM::Node

Class Methods:

None

Instance Methods:
getPublicId ()

Returns a string.

getSystemId ()

Returns a string.

getNotationName ()

Returns a string.

setNotationName (string name)
setPublicId (string id)
setSystemId (string id)


DOM::EntityReference

Base Class:

DOM::Node

Class Methods:

None

Instance Methods:

None


DOM::NamedNodeMap

Base Class:

None

Class Methods:

None

Instance Methods:
setNamedItem (DOM::Node arg)

Returns a DOM::Node.

May throw a DOM exception.

item (int index)

Returns a DOM::Node.

getNamedItem (string name)

Returns a DOM::Node.

getLength ()

Returns an integer.

removeNamedItem (string name)

Removes and returns a DOM::Node.

May throw a DOM exception.


DOM::Node

Base Class:

None

Class Constants:
ELEMENT_NODE = 1
ATTRIBUTE_NODE = 2
TEXT_NODE = 3
CDATA_SECTION_NODE = 4
ENTITY_REFERENCE_NODE = 5
ENTITY_NODE = 6
PROCESSING_INSTRUCTION_NODE = 7
COMMENT_NODE = 8
DOCUMENT_NODE = 9
DOCUMENT_TYPE_NODE = 10
DOCUMENT_FRAGMENT_NODE = 11
NOTATION_NODE = 12

Class Methods:

None

Instance Methods:
getNodeName ()

Returns a string.

getNodeValue ()

Returns a string.

May throw a DOM exception.

getNodeType ()

Returns an integer.

getParentNode ()

Returns a DOM::Node.

getChildNodes ()

Returns a DOM::NodeList.

getFirstChild ()

Returns a DOM::Node.

getLastChild ()

Returns a DOM::Node.

getPreviousSibling ()

Returns a DOM::Node.

getNextSibling ()

Returns a DOM::Node.

getAttributes ()

Returns a DOM::NamedNodeMap.

getOwnerDocument ()

Returns a DOM::Document.

cloneNode (boolean deep)

Returns a DOM::Node.

insertBefore (DOM::Node new_child, DOM::Node ref_child)

Returns a DOM::Node.

May throw a DOM exception.

replaceChild (DOM::Node new_child, DOM::Node old_child)

Returns a DOM::Node.

May throw a DOM exception.

removeChild (DOM::Node old_child)

Returns a DOM::Node.

May throw a DOM exception.

appendChild (DOM::Node new_child)

Returns a DOM::Node.

May throw a DOM exception.

hasChildNodes ()

Returns true (1) or false (0).

isNull ()

Returns true (1) or false (0).

setNodeValue (string node_value)

May throw a DOM exception.
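
The integer returned by getNodeType() corresponds to one of the class constants listed above. The following is a minimal sketch of turning that integer into a readable name. The %NODE_TYPE_NAME table and the node_type_name() helper are hypothetical names introduced here for illustration and are not part of the module; the values are copied from the constants list.

```perl
use strict;
use warnings;

# Hypothetical lookup table; values copied from the DOM::Node class constants.
my %NODE_TYPE_NAME = (
    1  => 'ELEMENT_NODE',
    2  => 'ATTRIBUTE_NODE',
    3  => 'TEXT_NODE',
    4  => 'CDATA_SECTION_NODE',
    5  => 'ENTITY_REFERENCE_NODE',
    6  => 'ENTITY_NODE',
    7  => 'PROCESSING_INSTRUCTION_NODE',
    8  => 'COMMENT_NODE',
    9  => 'DOCUMENT_NODE',
    10 => 'DOCUMENT_TYPE_NODE',
    11 => 'DOCUMENT_FRAGMENT_NODE',
    12 => 'NOTATION_NODE',
);

# Hypothetical helper: map a getNodeType() result to its constant name.
sub node_type_name {
    my ($type) = @_;
    return exists $NODE_TYPE_NAME{$type}
        ? $NODE_TYPE_NAME{$type}
        : "UNKNOWN($type)";
}

print node_type_name(1), "\n";
print node_type_name(8), "\n";
```

In a real script the argument would come from a call such as $node->getNodeType().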


DOM::NodeList

Base Class:

None

Class Methods:

None

Instance Methods:
item (int index)

Returns a DOM::Node.

getLength ()

Returns an integer.


DOM::Notation

Base Class:

DOM::Node

Class Methods:

None

Instance Methods:
getPublicId ()

Returns a string.

getSystemId ()

Returns a string.

setPublicId (string id)
setSystemId (string id)


DOM::ProcessingInstruction

Base Class:

DOM::Node

Class Methods:

None

Instance Methods:
getTarget ()

Returns a string.

getData ()

Returns a string.

setData (string data)

May throw a DOM exception.


DOM::Text

Base Class:

DOM::CharacterData

Class Methods:

None

Instance Methods:
splitText (int offset)

Returns a DOM::Text.

May throw a DOM exception.


OTHER DETAILS


Null References

Xerces DOM methods never return NULL object references, i.e. you never have to call defined() to check them. However, in certain error situations the objects returned represent NULL objects. You can test for this condition by calling the DOM::Node->isNull() method.
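
As a rough sketch of how a loop might test isNull() instead of defined(), the code below walks a chain of siblings until it reaches a null object. StandInNode is a hypothetical stand-in class written only so the sketch runs without the module; setNext() is not a DOM method. The sketch also assumes, purely for illustration, that stepping past the last sibling is one of the situations that yields a null object, which the text above does not state.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DOM node; it mimics only the methods used below.
package StandInNode;

sub new {
    my ($class, %args) = @_;
    my $self = {
        name => $args{name},
        next => undef,
        null => $args{null} ? 1 : 0,
    };
    return bless $self, $class;
}

sub getNodeName { return $_[0]{name} }
sub isNull      { return $_[0]{null} }
sub setNext     { $_[0]{next} = $_[1] }   # not a DOM method; builds the chain

# Assumption: return a "null object" rather than undef at the end of the chain.
sub getNextSibling {
    my ($self) = @_;
    return $self->{next} || StandInNode->new(null => 1);
}

package main;

# Collect sibling names, stopping at the null object rather than at undef.
sub sibling_names {
    my ($node) = @_;
    my @names;
    while (!$node->isNull()) {
        push @names, $node->getNodeName();
        $node = $node->getNextSibling();
    }
    return @names;
}

my $first  = StandInNode->new(name => 'a');
my $second = StandInNode->new(name => 'b');
$first->setNext($second);

print join(',', sibling_names($first)), "\n";
```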


Exception Handling

Exceptions are caught from the underlying C++ parser and rethrown using croak calls (see Carp). All exceptions are thrown as string messages that include one of the following standard DOM identifiers.

"INDEX_SIZE_ERR"
"DOMSTRING_SIZE_ERR"
"HIERARCHY_REQUEST_ERR"
"WRONG_DOCUMENT_ERR"
"INVALID_CHARACTER_ERR"
"NO_DATA_ALLOWED_ERR"
"NO_MODIFICATION_ALLOWED_ERR"
"NOT_FOUND_ERR"
"NOT_SUPPORTED_ERR"
"INUSE_ATTRIBUTE_ERR"
"Unknown exception"

Exception strings also include the DOM class and method from which the exception was thrown, and the line number of your calling script, e.g.

        INDEX_SIZE_ERR in DOM::Text::splitText
                at YourScript.pl line 11

You can let exceptions float to the top as error messages or you can catch and handle exceptions as follows:

        eval {
                # your code goes here
        };

        if ( $@ ) {
                # exception caught
                # exception message in $@ as a string
        }
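
Because the exception arrives as a plain string, a handler usually has to pick the identifier out of the message itself. The sketch below shows one way this might be done. Both stand_in_split_text() and parse_dom_error() are hypothetical helpers introduced here; the first only imitates the message format shown above so that the example runs without the module.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in that fails the way the text describes: a string
# message carrying the DOM identifier plus the class and method.
sub stand_in_split_text {
    croak "INDEX_SIZE_ERR in DOM::Text::splitText";
}

# Hypothetical helper: pull the identifier and origin out of the message.
sub parse_dom_error {
    my ($message) = @_;
    if ($message =~ /\b([A-Z_]+_ERR)\s+in\s+([\w:]+)/) {
        return (code => $1, origin => $2);
    }
    return (code => 'Unknown exception', origin => undef);
}

my %error;
eval {
    stand_in_split_text();
    1;                          # reached only if nothing was thrown
} or do {
    %error = parse_dom_error($@);
};

print "$error{code} raised by $error{origin}\n" if %error;
```

Ending the eval block with a true value and handling the failure in "or do" avoids relying on $@ alone to detect that an exception occurred.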

You can get complete stack crawls in exception messages by invoking the Perl interpreter with the "-MCarp=verbose" option, e.g.

        perl -MCarp=verbose script.pl


Unicode Strings

The Xerces Perl DOM is built on the Unicode-enabled Xerces C++ XML Parser. The C++ parser can read various Unicode encodings and represents strings internally in UTF-16, but you pass strings in and out of methods as standard Perl strings. If you plan to take advantage of Unicode, make sure to do two things:

  1. Use the experimental Perl5 build 5.00560 with Unicode support.
  2. Use the "use utf8" pragma in your scripts.

When you do these things, all strings that you pass in and out will be UTF-8 strings, and the Xerces Perl DOM will automatically convert them to and from UTF-16 internally.
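
To make the difference between the two encodings concrete, here is a small illustration of the same string encoded as UTF-8 and as UTF-16. It is not the module's conversion code. It assumes a later Perl that ships the core Encode module, which postdates the experimental build named above, and the variable names are chosen only for this sketch.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Four characters; the last one is U+00E9 (e with acute accent).
my $text = "caf\x{e9}";

# The same characters as UTF-8 bytes and as UTF-16 (little-endian) bytes.
my $utf8_bytes  = encode('UTF-8',    $text);
my $utf16_bytes = encode('UTF-16LE', $text);

printf "characters: %d, UTF-8 bytes: %d, UTF-16 bytes: %d\n",
    length($text), length($utf8_bytes), length($utf16_bytes);

# Decoding the UTF-16 bytes gives back the original character string.
my $round_trip = decode('UTF-16LE', $utf16_bytes);
print "round trip ok\n" if $round_trip eq $text;
```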


Object Creation

You create DOM::Parser and DOM::Document objects using their respective static new methods. All other DOM objects are created by calling the corresponding factory methods on a DOM::Document object, or are returned from various other methods on DOM objects. This is standard DOM stuff, but I thought I'd mention it so you don't pull your hair out looking for new methods on the other objects.


Object Destruction

Basically, you need not worry about this. Both the underlying C++ memory and the Perl memory are managed for you. When you no longer hold any references to a Xerces DOM object, its destructor is called automatically and it cleans up all associated memory.
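
The behavior described here rests on Perl's reference counting: an object's DESTROY method runs once the last reference to it goes away. The sketch below demonstrates that general mechanism with a hypothetical StandInHandle class; it does not show the module's actual destructor, and the counter exists only to make the timing visible.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a wrapper object that owns an underlying resource.
package StandInHandle;

my $live = 0;                      # number of objects not yet destroyed

sub new        { my ($class) = @_; $live++; return bless {}, $class }
sub DESTROY    { $live-- }         # called by Perl when the last reference is dropped
sub live_count { return $live }

package main;

my @log;
{
    my $handle = StandInHandle->new();
    my $alias  = $handle;          # second reference to the same object
    undef $handle;                 # one reference remains, so nothing is destroyed yet
    push @log, StandInHandle->live_count();
}                                  # $alias goes out of scope; DESTROY runs here
push @log, StandInHandle->live_count();

print "live objects: @log\n";
```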


AUTHORS

Tom Watson <rtwatson@us.ibm.com> wrote version 1.0 and submitted it to the XML Apache project <http://xml.apache.org>, where you can contribute to future versions and where the corresponding C++ and Java parsers are also developed as open source projects.