Apache Cocoon can use an entity resolution mechanism.
External entities (e.g. Document Type Definitions (DTDs), character entity
sets, XML sub-documents) are resources that are declared by an XML instance
document but exist as separate objects. An entity catalog assists with
entity management and the resolution of entities to accessible resources.
It also reduces the need for expensive and failure-prone network
retrieval of the required resources.
|
"Entities" represent the physical structure of an XML instance document,
whereas "elements" represent the logical structure. The complete entity
structure of the document defines which pieces need to be incorporated to
build the final document. Those entities are objects retrieved from some
accessible place, e.g. the local file system, the local network, a remote
network, or a database. Example entities are: DTDs, XML sub-documents,
sets of character entities to represent symbols and other glyphs, and image
files.
So how are you going to define the accessible location of all those pieces?
How will you ensure that those resources are reliably available? Entity
resolution catalogs to the rescue. These are simple standards-based
plain-text files to map public identifiers and system identifiers to local
or other resources.
Do you wonder why we cannot use the sitemap to resolve these resources?
This is because the resolution of all entities that compose the XML
document is under the direct control of the guts of the parser and the XML
structure. The parser has no choice: it must incorporate all of the defined
pieces. If it cannot retrieve them, then it is broken and reports an error.
With the powerful catalog support there are no such problems. This document
provides the following sections to explain Cocoon's capability for
resolving entities ...
|
The following article eloquently describes the need for all parsers and
XML frameworks to be capable of utilising entity resolvers.
"If You Can Name It, You Can Claim It!"
by Norman Walsh. Please read that document, then return here to apply
entity catalogs to Cocoon.
(Note: That article and its Java classes evolved to become the Sun
resolver.zip Java package that has been added to Cocoon.
A more recent version of the article is available with the Sun download
(see below). The API javadocs for the resolver have further information.
However, you do not need to know the gory details to understand catalogs
and configure them.)
|
A default catalog and some base entities (e.g. ISO*.pen character
entity sets) are included in the Cocoon distribution at
webapps/cocoon/resources/entities/
and the default catalog is loaded automatically when Cocoon starts.
If you suspect problems, then you can raise the level of the
verbosity property (to 2 or 3) and watch the messages going
to standard output when Cocoon starts and operates. You would also do
this to detect any misconfiguration of your own catalogs.
|
The SAX Parser interface provides an entityResolver
hook to allow an application to resolve the external entities. The Sun
Microsystems Java code "com.sun.resolver" provides a
Catalog Resolver. This is incorporated into Cocoon via
org.apache.cocoon.components.resolver.
Default configuration is achieved via
org.apache.cocoon.components.resolver.ResolverImpl.java
which initialises the catalog resolver and loads a default system catalog.
ResolverImpl.java enables local
configuration by applying properties from the
CatalogManager.properties file and then further configuration
from cocoon.xconf parameters.
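The entityResolver hook described above can be illustrated with a minimal
sketch. This is not Cocoon's ResolverImpl and not the Sun catalog resolver.
It is a stand-alone class (MapEntityResolver, a hypothetical name) that
answers lookups from an in-memory map keyed by public identifier, using only
the standard org.xml.sax.EntityResolver interface and JAXP. The public
identifier, the URL and the entity content used in main are made up for
illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical stand-alone resolver; not part of Cocoon or the Sun resolver package.
class MapEntityResolver implements EntityResolver {
    // Maps a public identifier to the text of the entity it names.
    private final Map<String, String> contentByPublicId = new HashMap<>();

    void register(String publicId, String content) {
        contentByPublicId.put(publicId, content);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String content = (publicId == null) ? null : contentByPublicId.get(publicId);
        if (content == null) {
            // Returning null tells the parser to fall back to its default
            // behaviour, i.e. retrieve the system identifier itself.
            return null;
        }
        InputSource source = new InputSource(new StringReader(content));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    // Parses the XML text with the given resolver installed and returns
    // the text content of the root element.
    static String parseText(String xml, EntityResolver resolver) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        MapEntityResolver resolver = new MapEntityResolver();
        // Made-up public identifier and DTD content, for illustration only.
        resolver.register("-//EXAMPLE//DTD Demo//EN", "<!ENTITY greeting \"hello\">");
        String xml = "<!DOCTYPE doc PUBLIC \"-//EXAMPLE//DTD Demo//EN\" "
                + "\"http://example.invalid/demo.dtd\"><doc>&greeting;</doc>";
        System.out.println(parseText(xml, resolver));
    }
}
```

Because the resolver supplies the DTD content itself, the parser never
attempts to fetch the remote system identifier. A catalog resolver plays the
same role, except that it looks up the mapping in catalog files instead of
a hard-coded map.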
|
Assistance is required with the following outstanding development issues ...
- 5) ? What other default entities need to be shipped with the Cocoon
distribution? We already have some character entity sets (ISO*.pen).
Probably also need the documentation DTDs.
- 7) Some core Cocoon FIXME notes can now be addressed by catalog ...
- the first FIXME note in document-1.0.dtd re how to include
entities without hardwiring
- there are various other hard-coded pathnames to XML resources
- this needs further investigation after basic catalog support is
fully settled
|
- OASIS Catalogs (TR 9401:1995 Entity Management) are plain-text files
with a simple delimited format. There is also a new standard being
developed for XML Catalogs, using an XML-based structured plain-text file
(gee :-). Links to both standards are provided below. Both catalog formats
can currently be used with this entity resolver. However, the latter
standard is not yet settled, so OASIS TR 9401 catalogs will suffice.
- There has been a recent flood of XML tools - unfortunately, many do not
implement entity resolution (other than by brute-force retrieval), so
those tools are crippled and cannot be used for serious XML processing.
Please ensure that you choose proper XML tools
for the preparation and validation of your XML instance documents.
- The default catalog that is shipped with the Cocoon distribution is
deliberately basic. You will need to supplement it with your own catalog
devised to suit your particular needs.
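To make the idea of a TR 9401 style catalog in the notes above more concrete,
here is a minimal sketch of how PUBLIC and SYSTEM entries map identifiers to
local resources. It is not the parser used by the Sun resolver. SimpleCatalog
is a hypothetical class, and it makes simplifying assumptions: only
double-quoted literals, only upper-case PUBLIC and SYSTEM keywords, comments
delimited by "--", and no support for other catalog entry types. A system
identifier match is tried first, then the public identifier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified reader for PUBLIC/SYSTEM catalog entries.
class SimpleCatalog {
    // Either a "-- ... --" comment, or an entry of the form
    //   PUBLIC "identifier" "resource"   or   SYSTEM "identifier" "resource"
    private static final Pattern TOKEN = Pattern.compile(
            "--.*?--|(PUBLIC|SYSTEM)\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"",
            Pattern.DOTALL);

    private final Map<String, String> publicIds = new LinkedHashMap<>();
    private final Map<String, String> systemIds = new LinkedHashMap<>();

    static SimpleCatalog parse(String text) {
        SimpleCatalog catalog = new SimpleCatalog();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String keyword = m.group(1);
            if (keyword == null) {
                continue; // matched a comment, nothing to record
            }
            if (keyword.equals("PUBLIC")) {
                catalog.publicIds.put(m.group(2), m.group(3));
            } else {
                catalog.systemIds.put(m.group(2), m.group(3));
            }
        }
        return catalog;
    }

    // Returns the mapped resource, or null when the catalog has no entry.
    String resolve(String publicId, String systemId) {
        if (systemId != null && systemIds.containsKey(systemId)) {
            return systemIds.get(systemId);
        }
        if (publicId != null && publicIds.containsKey(publicId)) {
            return publicIds.get(publicId);
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up identifiers and paths, for illustration only.
        String text = "-- demo catalog --\n"
                + "PUBLIC \"-//EXAMPLE//DTD Demo//EN\" \"dtd/demo.dtd\"\n"
                + "SYSTEM \"http://example.invalid/other.dtd\" \"dtd/other.dtd\"\n";
        SimpleCatalog catalog = SimpleCatalog.parse(text);
        System.out.println(catalog.resolve("-//EXAMPLE//DTD Demo//EN", null));
    }
}
```

A real catalog resolver does considerably more (relative path resolution,
delegation to other catalogs, identifier normalisation), so treat this only
as a picture of the mapping that a catalog expresses.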
|
Most XML documents that we would want to serve with Cocoon already
exist in another information system. The XML document instances
declare their Document Type Definition (DTD) as an external file.
This external DTD also includes entity sets such as ISOnum, ISOlat1, etc.
Also the DTD declaration has a Formal Public Identifier and a System
Identifier which points to a remote URL. These XML instance documents cannot
be altered to make workaround solutions like
../dtd/document-1.0.dtd.
Entity management is effected by providing a standards-based mechanism to
resolve public identifiers and system identifiers to local filenames or
other identifiers or even to other remote network resources. So references
to external DTDs, sets of character entities such as mathematical symbols,
fragments of XML documents, complete sub-documents, non-XML data chunks
(like images), etc. can all be centrally managed and resolved locally.
|
Here are some links to documents which extol entity management: