This document presents Apache Cocoon's dynamic markup language
framework and its use in implementing XSP:
|
XSP and Cocoon Generators
The Programming Language Processor
A Cocoon ProgrammingLanguage processor exposes the following methods:
- load. Load a program from a file in a given directory, compiling it, if necessary, using a given encoding
- instantiate. Create a new instance of a previously loaded program
- unload. Discard a previously loaded program, performing any necessary cleanup
- getSourceExtension. Return the canonical source file extension used by this programming language
- getCodeFormatter. Return an (optional) instance of CodeFormatter used to beautify source code written in this programming language
- quoteString. Escape a string constant according to the programming language rules
A default implementation (AbstractProgrammingLanguage) is
provided that extends AbstractNamedComponent
and retrieves language-related sitemap parameters.
load and unload are passed a file/directory
pair used to locate the program.
The baseDirectory should be an absolute pathname
pointing to the top-level directory (also known as repository)
containing the program file.
The filename is a path, relative to the
baseDirectory, pointing to the program file.
Source program filenames are built by concatenating the repository's
baseDirectory name, the given filename,
the dot extension separator and the language-specific source or
object extension. The cross-platform
File.separator is used to ensure portability.
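The concatenation described above can be sketched as follows. This is only an illustration: the class name SourcePathSketch, the method buildPath and the sample repository path are hypothetical and are not part of Cocoon's API.

```java
import java.io.File;

// Hypothetical helper, not Cocoon code: shows how a program path might be
// assembled from the pieces named in the text.
class SourcePathSketch {
    /** Joins the repository directory, the relative filename and an extension. */
    static String buildPath(String baseDirectory, String filename, String extension) {
        // File.separator keeps the result portable; the dot is always the
        // extension separator, whatever the programming language.
        return baseDirectory + File.separator + filename + "." + extension;
    }

    public static void main(String[] args) {
        // The filename carries subdirectories but no extension.
        String relative = "docs" + File.separator + "samples" + File.separator + "hello";
        System.out.println(buildPath("/var/cocoon/repository", relative, "java"));
    }
}
```

The same helper would serve for object files by passing the object extension instead of the source extension.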
 |
The filename must not contain any
source or object extension. It may, though, contain subdirectories
depending on its position within the repository tree. Also,
programming languages must define a source extension
even when their actual compilers/interpreters do not enforce this. This
is also true of object extensions for compiled languages.
Furthermore, the dot character is always used as the
extension separator.
|
Finally, the (optional) encoding argument specifies
how the source program file contents are encoded. This argument can be
null to specify the platform's default encoding.
|
Currently, programs returned by the load operation are
"plain" Java Objects and are not required to implement
any interface or to extend any particular class.
 |
This may change in the future so that the loaded program may be
required to provide dependency information (for automatic reloading)
as well as source code information (for debugging purposes).
|
Compiled programs attempt to locate the object program first.
If found, it's loaded in a language-specific way and then returned to
the calling environment.
Failing that, the source file is located and the language-specific
compiler is invoked prior to actual
program loading.
Of course, it is an error for the source program file not to exist as
a readable, regular operating system file.
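The lookup order described above might be sketched like this. The class LoadOrderSketch, its Action enum and the method chooseAction are hypothetical names used only for illustration; the actual loading and compilation steps are language-specific and are left out.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical sketch, not Cocoon code: decides what a compiled language
// processor has to do before it can hand back a program.
class LoadOrderSketch {
    enum Action { LOAD_OBJECT, COMPILE_THEN_LOAD }

    static Action chooseAction(File objectFile, File sourceFile) throws FileNotFoundException {
        // The object program is tried first.
        if (objectFile.isFile() && objectFile.canRead()) {
            return Action.LOAD_OBJECT;
        }
        // Failing that, the source must exist as a readable, regular file.
        if (!sourceFile.isFile() || !sourceFile.canRead()) {
            throw new FileNotFoundException("No readable source file: " + sourceFile);
        }
        return Action.COMPILE_THEN_LOAD;
    }
}
```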
|
When a previously loaded program is no longer needed (or becomes
"outdated" as explained below) the language processor may need to
perform cleanup actions, such as releasing memory or (in the case
of Java-like compiled languages)
reinstantiating the class loader.
Loaded programs may become outdated as a consequence of events external
to the programming language processor. In a server pages environment,
this is the result of the source XML document (or any of the files
it depends on) having changed on disk.
The base class AbstractProgrammingLanguage implements
unload as final: it deletes the unloaded
source program file and delegates actual unloading to
method doUnload.
Method doUnload is not defined as
abstract in order to relieve interpreted subclasses
from having to implement an empty method when no cleanup is
required.
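A minimal sketch of this final-method-plus-hook arrangement follows. The class name UnloadSketch and the two-argument unload signature are assumptions made for illustration; the real signatures in AbstractProgrammingLanguage may differ.

```java
import java.io.File;

// Hypothetical sketch, not Cocoon code: a final unload that always removes
// the source file and then delegates to an overridable hook.
abstract class UnloadSketch {
    /** Final, so subclasses cannot skip the file deletion step. */
    public final void unload(Object program, File sourceFile) {
        sourceFile.delete();   // result ignored: the file may already be gone
        doUnload(program);
    }

    /** Empty rather than abstract, so interpreted languages need not override it. */
    protected void doUnload(Object program) {
    }
}
```

A subclass that needs cleanup overrides only doUnload; one that needs none inherits the empty default.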
 |
Currently, only the program object is being passed
to unload . It may be possible for some interpreted
languages to also require knowing what file the program was originally
loaded from. In this case, instantiation should take place through
the program object itself, rather than through the language processor
(see Program Instantiation below)
|
|
The program object returned by load must
act as a factory capable of creating program instance
objects on demand.
Currently, instantiation is performed by the language processor
given a previously loaded program.
Compiled programs use a language-specified
class loader to create
a new program instance.
 |
For compiled languages, it is possible to guarantee that a
generated program implements a given interface or extends a
given class. For interpreted languages, though, it may be
necessary to pass an additional prototype object
to load so as to ensure that created instances conform
to the behavior expected of a given Java type.
|
|
All languages are required to return a source extension.
This extension is used to locate source files for subsequent
interpretation or compilation.
|
Method quoteString applies the programming language string
constant escaping rules to its input argument.
This method exists to assist markup language code generators in
escaping Text XML nodes.
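For the Java language, string constant escaping could look roughly like the sketch below. The class QuoteStringSketch is hypothetical, and the set of characters it escapes is an assumption about what a Java implementation would need rather than a copy of Cocoon's code.

```java
// Hypothetical sketch, not Cocoon code: turns arbitrary text into a Java
// string literal, including the surrounding double quotes.
class QuoteStringSketch {
    static String quoteString(String text) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;  // embedded quote
                case '\\': out.append("\\\\"); break;  // backslash
                case '\n': out.append("\\n");  break;  // line feed
                case '\r': out.append("\\r");  break;  // carriage return
                case '\t': out.append("\\t");  break;  // tab
                default:   out.append(c);
            }
        }
        return out.append('"').toString();
    }
}
```

A code generator can then emit the returned literal directly into generated source without risking an unterminated string.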
|
|
Compiled languages extend the ProgrammingLanguage
abstraction by introducing the notions of compilation
and object extension.
A base implementation (CompiledProgrammingLanguage)
is provided that adds the following protected variables and
abstract/overridable methods:
- Variable compilerClass. Used to create instances of the language's compiler
- Variable deleteSources. Used to state whether intermediate source files should be deleted after successful compilation
- Method getObjectExtension. Used to build object filenames
- Method loadProgram. Used to perform actual program load after source and (possibly) object files have been located
- Method doUnload. Used to perform cleanup after program unloading
 |
Object files are not required to be Java class files.
It's up to the compiled programming language processor to handle
object files.
|
Compiled programming languages must specify their preferred compiler
as a sitemap parameter:
<component-instance name="java"
    class="org.apache.cocoon.components.language.programming.java.JavaLanguage">
  . . .
  <parameter name="compiler"
      value="org.apache.cocoon.components.language.programming.java.Jikes"/>
  . . .
</component-instance>
All compiled languages are required to return an object extension.
This extension is used to locate object files for subsequent loading.
|
Concrete compiled programming languages must implement the abstract
method loadProgram to actually load an object
program resulting from compilation.
|
Compilation is delegated to a sitemap-specified
LanguageCompiler instance, as explained below.
|
Interface
LanguageCompiler
defines the
initialization and behavior for all compilers.
Methods exposed by this interface are:
-
setFile . Used to specify the source file to
be compiled. This should be an absolute filename
-
setSource . Used to specify the directory where
dependent source files (if any) are stored
-
setDestination . Used to specify the directory where
the generated object files should be placed
-
setClasspath . Used to specify the class loading
path used by the compiler. While this option is named after
Java's classpath system variable, its semantics are
language-independent
-
setEncoding . Used to specify the encoding used
by the input source file
-
compile . The compiler's workhorse (boolean)
-
getErrors . Used to retrieve a list of compilation
error messages should compilation fail
Error messages produced by the compiler must be collected and
massaged by the LanguageCompiler in order to
wrap each of them as a CompilerError instance.
Class CompilerError exposes the following methods:
-
getFile . Returns the program filename originating
the error
-
isError . Asserts whether the error is a severe
error or simply a warning
-
getStartLine . Returns the starting line of the
offending code
-
getStartColumn . Returns the starting column (within
the starting line) of the offending code
-
getEndLine . Returns the ending line of the
offending code
-
getEndColumn . Returns the ending column (within
the ending line) of the offending code
-
getMessage . Returns the actual error message text
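To illustrate how a caller might consume these accessors, here is a small stand-in value class with the same getter names. CompilerErrorSketch, its constructor and the describe helper are hypothetical; only the getter names come from the list above, and their return types are assumed.

```java
// Hypothetical stand-in, not Cocoon's CompilerError: mirrors the getters
// listed above so that error reporting can be demonstrated.
class CompilerErrorSketch {
    private final String file;
    private final boolean error;
    private final int startLine, startColumn, endLine, endColumn;
    private final String message;

    CompilerErrorSketch(String file, boolean error, int startLine, int startColumn,
                        int endLine, int endColumn, String message) {
        this.file = file;
        this.error = error;
        this.startLine = startLine;
        this.startColumn = startColumn;
        this.endLine = endLine;
        this.endColumn = endColumn;
        this.message = message;
    }

    String getFile()     { return file; }
    boolean isError()    { return error; }
    int getStartLine()   { return startLine; }
    int getStartColumn() { return startColumn; }
    int getEndLine()     { return endLine; }
    int getEndColumn()   { return endColumn; }
    String getMessage()  { return message; }

    /** Formats one entry as file:line:column: severity: message. */
    static String describe(CompilerErrorSketch e) {
        return e.getFile() + ":" + e.getStartLine() + ":" + e.getStartColumn()
                + ": " + (e.isError() ? "error" : "warning") + ": " + e.getMessage();
    }
}
```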
|
For the Java language, 2 pluggable compilers are available:
-
Javac. A wrapper to Sun's builtin compiler
-
Jikes. A wrapper to IBM's Jikes compiler
Both of these compilers are based on
AbstractJavaCompiler .
|
Since Rhino Javascript provides its own compiler (jsc), and no
alternative exists, class JavascriptLanguage doesn't use the compiler
class initialized by CompiledProgrammingLanguage.
|
|
CompiledProgrammingLanguage extends the default
implementation provided by
AbstractProgrammingLanguage
by deleting the object program file and
delegating actual unloading to the
doUnload method.
Method doUnload provides an empty default implementation
that can be overridden by derived compiled languages should unloading
cleanup be actually required.
For Java-based compiled languages (i.e., those using
class files as their object format), unloading implies
reinstantiating their class loader
so that it "forgets" previously loaded classes and can
pick up any class file updates made since their last load.
This is a commonly used workaround for the (somewhat buggy)
standard Java class loader, which doesn't provide an
explicit method for reloading class files.
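The class loader replacement described above can be sketched with the standard java.net.URLClassLoader. The wrapper class ReloadingLoaderSketch and its method names are hypothetical; Cocoon's own class loader is not reproduced here.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch, not Cocoon code: "unloading" is done by throwing the
// current loader away and creating a fresh one over the same repository.
class ReloadingLoaderSketch {
    private final URL[] repository;
    private ClassLoader loader;

    ReloadingLoaderSketch(File repositoryDir) throws MalformedURLException {
        this.repository = new URL[] { repositoryDir.toURI().toURL() };
        this.loader = newLoader();
    }

    private ClassLoader newLoader() {
        return new URLClassLoader(repository, ReloadingLoaderSketch.class.getClassLoader());
    }

    synchronized Class<?> loadClass(String name) throws ClassNotFoundException {
        return loader.loadClass(name);
    }

    /** The new loader has no memory of classes defined by the old one. */
    synchronized void unload() {
        loader = newLoader();
    }

    synchronized ClassLoader current() {
        return loader;
    }
}
```

Classes defined by the discarded loader remain usable by objects that still reference them; only new lookups go through the fresh loader and therefore see updated class files.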
|
|
Interpreted languages for which a Java-based interpreter exists
are supported by means of IBM's outstanding
Bean Scripting Framework
(BSF).
Currently, BSF supports:
- Mozilla Rhino
- NetRexx
- Jacl
- JPython
- VBScript (Win32 only)
- JScript (Win32 only)
- PerlScript (Win32 only)
- BML (Not applicable to server pages)
- LotusXSL (Not applicable to server pages)
 |
Interpreted language support is still unimplemented!
While BSF is extremely easy to use and very stable, there's still
a challenge in writing code-generation logicsheets for each of these
languages; this task requires familiarity with XSP internals, XSLT
and, above all, the programming language at hand.
|
 |
Despite being supported by BSF, Rhino Javascript is separately
supported by Cocoon as a compiled language in order to take
advantage of automatic class reloading and persistent class file
storage.
|
 |
Since ProgramGenerator clients will typically require
that program instances implement a given interface or extend a given
class, method instantiate in interface
ProgrammingLanguage may need to be augmented with a
prototype interface that can be used by each language
processor to ensure that the program instance can act as a Java
object of the given type.
|
|
The Markup Language Processor
A Cocoon MarkupLanguage processor exposes the following methods:
- getEncoding. Return the encoding to be used in program generation and compilation, or null to use the platform's default encoding
- generateCode. Given a DOM Document written in a given markup language, generate an equivalent program in a given programming language
A base markup language processor implementation is provided in
class
AbstractMarkupLanguage .
This class extends
AbstractNamedComponent
to set the markup language's
associated namespace using the following required parameters:
-
prefix .
The markup language's namespace prefix
-
uri .
The markup language's namespace URI
<component-instance name="xsp"
    class="org.apache.cocoon.components.language.markup.xsp.XSPMarkupLanguage">
  <parameter name="prefix" value="xsp"/>
  <parameter name="uri" value="http://xml.apache.org/xsp"/>
</component-instance>
AbstractMarkupLanguage adds a number of
abstract/overridable methods that must be implemented by concrete
markup language processors:
-
preprocessDocument .
Augment the input DOM Document to prepare it for
simpler, faster logicsheet-based code generation
-
getLogicsheets .
Return the list of logicsheets declared in the input document
according to the syntax of the markup language at hand
-
addDependency .
Add a dependency on an external file. This is used to inform
the concrete markup language processor about XML documents
included by means of XInclude as well as any intervening
logicsheet
 |
AbstractMarkupLanguage is currently tied to
logicsheets as the only means of generating source
code. While logicsheets provide a very powerful means for
code generation, good design dictates that the actual code
generation mechanism should be decoupled from the dynamic
markup language abstraction.
|
 |
The current code generation strategy is DOM-based. In principle,
this is adequate because document preprocessing may need random
access to document nodes. Code generation is being reconsidered,
however, to overcome this and make it possible to reuse Cocoon's
SAX-based filtering pipeline.
|
All markup languages must provide a way to declare the XML
document's encoding so that it is preserved during code generation,
beautifying and compilation.
This is required for proper i18n support, where the default
encoding usually replaces "exotic" characters with question marks.
 |
Ideally, it should be possible to determine the source XML document's
encoding from its declaring
<?xml?> processing instruction. Unfortunately,
XML parsers (both DOM and SAX) don't seem to provide access to it,
thus forcing server pages authors to redundantly specify it.
|
|
A logicsheet is an XML filter used to translate user-defined
dynamic markup into equivalent code embedding directives for a given
markup language.
Logicsheets lie at the core of XSP's promise to separate logic from
content and presentation: they make dynamic content generation
capabilities available to content authors not familiar with (and
not interested in) programming.
For a detailed description of logicsheets, see
Logicsheet Concepts.
Logicsheets are represented in class
Logicsheet .
This
class exposes the following methods:
-
setInputSource .
Set the InputSource pointing to the XSLT
stylesheet to be used for dynamic tag transformation
-
apply .
Apply the stylesheet to a given document
Logicsheet takes care of preserving all namespaces
defined in the input document. This is necessary when multiple
logicsheets are applied and multiple namespaces are used in the
input document.
 |
Currently, Logicsheet is a concrete class. It should
be redefined as an interface in order to decouple it from the use
of XSLT stylesheets. Again, while stylesheets are the "obvious" way
to implement logicsheets, a user-supplied XML filter may also be
used in some cases.
The current implementation uses an ugly
hack where a Xalan stylesheet processor is used to perform
the transformation without an intervening stylesheet processor
wrapping abstraction.
|
|
As explained in
Logicsheet Concepts,
a logicsheet is typically associated with a single object type whose
methods it wraps to make them available as
markup commands.
Markup commands related to a given object type are grouped under a
single namespace.
Class
NamedLogicsheet
extends Logicsheet
to associate it with a namespace. This class exposes the following
additional methods:
-
setPrefix .
To set the logicsheet's namespace prefix
-
getPrefix .
To retrieve the logicsheet's namespace prefix
-
setUri .
To set the logicsheet's namespace URI
-
getUri .
To retrieve the logicsheet's namespace URI
Named logicsheets are used as
builtin logicsheets
by AbstractMarkupLanguage
to preload logicsheets and make them accessible
to dynamic XML documents without explicit declaration.
This feature relieves page authors from the need to explicitly
declare commonly used logicsheets in their documents. A builtin
logicsheet is applied automatically if the document declares
the logicsheet's namespace URI.
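Selecting builtin logicsheets by namespace URI, as described above, might be sketched as follows. The class BuiltinLogicsheetSketch, its methods and the idea of identifying logicsheets by a plain name are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Cocoon code: a registry of preloaded logicsheets
// keyed by namespace URI rather than by prefix.
class BuiltinLogicsheetSketch {
    // Insertion order is kept so that selection order is predictable.
    private final Map<String, String> byUri = new LinkedHashMap<>();

    void register(String namespaceUri, String logicsheetName) {
        byUri.put(namespaceUri, logicsheetName);
    }

    /** Returns the builtin logicsheets whose URI the document declares. */
    List<String> select(Collection<String> declaredUris) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, String> entry : byUri.entrySet()) {
            if (declaredUris.contains(entry.getKey())) {
                selected.add(entry.getValue());
            }
        }
        return selected;
    }
}
```

Keying on the URI means a page may bind any prefix it likes to a builtin namespace and still have the logicsheet applied.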
 |
The current AbstractMarkupLanguage implementation
wrongly binds named logicsheets based on their namespace
prefix instead of their URI!
|
|
Logicsheet Code Generators
Logicsheets translate dynamic tags to equivalent code-embedding
directives expressed in the markup language at hand. They do not,
however, actually emit the final source code program.
Code generation as such (i.e., the final production of a string
containing a source program written in a programming language) is
the responsibility of class LogicsheetCodeGenerator .
Class
LogicsheetCodeGenerator
exposes the following methods:
-
addLogicsheet .
Add a logicsheet to the generator's logicsheet list.
Logicsheets are applied in the order of their addition.
-
generateCode .
Return a string containing a source program resulting from
successively applying added logicsheets.
Though "regular" logicsheets as such do not emit source code,
LogicsheetCodeGenerator expects its last
stylesheet to produce a single element containing only
a text node.
This final, programming language-specific logicsheet is
responsible for actually expanding code-embedding directives
into source code.
For each supported target programming language, markup languages
must provide a core logicsheet.
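The successive application of logicsheets can be sketched with the standard JAXP transformation API. This is not Cocoon's LogicsheetCodeGenerator: the class LogicsheetChainSketch is hypothetical, it takes stylesheets as strings for brevity, and it uses whatever XSLT processor JAXP finds rather than Xalan specifically.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not Cocoon code: applies stylesheets in order of
// addition and returns the text held by the final root element.
class LogicsheetChainSketch {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final List<Templates> logicsheets = new ArrayList<>();

    void addLogicsheet(String xslt) throws Exception {
        logicsheets.add(factory.newTemplates(new StreamSource(new StringReader(xslt))));
    }

    String generateCode(Document input) throws Exception {
        Node current = input;
        for (Templates sheet : logicsheets) {
            DOMResult result = new DOMResult();   // a new Document is created for us
            sheet.newTransformer().transform(new DOMSource(current), result);
            current = result.getNode();
        }
        // The last stylesheet is expected to leave one element wrapping the program text.
        return ((Document) current).getDocumentElement().getTextContent();
    }

    /** Convenience parser; namespace awareness matters for XSLT over DOM. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```

In this arrangement the last stylesheet added plays the role of the core logicsheet: earlier ones rewrite dynamic tags into code-embedding elements, and the final one flattens those elements into source text.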
 |
LogicsheetCodeGenerator is currently implemented as a
class. It should be defined as an interface in order to decouple
the code generator abstraction from its logicsheet-based implementation.
This would allow alternative code-generation strategies to
be plugged in.
|
|
Markup Language Definition
|
So far, programming and markup languages have been described
in general, without explicitly referring to the XSP language.
This section describes how the above described framework is
used to implement XSP in particular. For a description of
logicsheet authoring requirements for XSP in Java, see
XSLT Logicsheets and XSP for Java.
 |
The XSP syntax is being revised to allow for the omission of the
root <xsp:page> element. This is convenient
for the (typical) case in which all logic has been conveniently
placed in logicsheets so that XSP pages do not need to embed any
code. In this case, there should be no need for the
<xsp:page> element.
|
Method getEncoding is implemented by class
XSPMarkupLanguage
by retrieving the attribute named
encoding in the root <xsp:page> element.
 |
In the absence of an <xsp:page> root element, the
encoding will be retrieved from an attribute named
xsp:encoding present in the "user" root element.
|
|
XSPMarkupLanguage preprocesses its input document
by:
-
Setting the root element
file-name attribute to the
base filename of its input source.
-
Setting the root element
file-path attribute to the
base directory name of its input source.
-
Setting the root element
creation-date attribute to the
current system time
-
Escaping text nodes according to the rules dictated by the
target programming language. This excludes text nodes enclosed
in
<xsp:logic> and <xsp:expr>
elements, as they are to be output as code.
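The first three preprocessing steps amount to stamping attributes on the root element, which might be sketched as below. The class PreprocessSketch and the method stampRoot are hypothetical. Treating the "base filename" as the name without its extension and storing the time as a millisecond count are assumptions; the text does not specify either.

```java
import java.io.File;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch, not Cocoon code: records where and when the input
// document was processed as attributes of its root element.
class PreprocessSketch {
    static void stampRoot(Document document, File source, long now) {
        Element root = document.getDocumentElement();
        String name = source.getName();
        int dot = name.lastIndexOf('.');
        // Assumption: the base filename excludes the extension.
        root.setAttribute("file-name", dot > 0 ? name.substring(0, dot) : name);
        String parent = source.getParent();
        root.setAttribute("file-path", parent == null ? "" : parent);
        // Assumption: the creation date is stored as milliseconds since the epoch.
        root.setAttribute("creation-date", String.valueOf(now));
    }
}
```

Passing the timestamp in, instead of reading the clock inside the method, keeps the step easy to test.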
 |
A feature to be added is collecting all text nodes under the document's
root element and replacing them by references to their relative index
position. This will allow for the generation of
contentHandler.characters method calls that reference
char arrays instead of constant String 's.
In addition to saving execution time, this will result in decreased
program size because common substrings can be output by "reusing"
their containing character arrays along with their corresponding
offsets and lengths.
|
|
File dependencies passed to XSPMarkupLanguage by
its AbstractMarkupLanguage superclass are stored
in top-level <xsp:dependency> elements.
These elements are used by XSP code-generation logicsheets to
populate the File array defined by the generated
classes' AbstractServerPage superclass.
|
XSP for Java currently provides only 2 builtin logicsheets:
request and response , associated
with their corresponding Cocoon counterparts.
 |
A mechanism is needed for Cocoon to pass additional objects
to XSP pages. In particular, for the servlet execution
environment, access to servlet objects is a must.
|
|
|
The
ProgramGenerator
interface exposes a single
load method that takes as arguments a File
pointing to a source XML document, as well as a markup and
programming language name pair.
This method is responsible for locating, loading and instantiating
a program derived from the given source document. Failing this,
the program is generated and stored in an external, persistent
repository.
Once instantiated, the program is kept in an in-memory cache for
speeding up subsequent requests.
For each request, the source XML document is checked for changes
and the program instance is queried for dependency changes so that
the program can be automatically regenerated and reloaded if needed.
This default behavior can be disabled by means of a sitemap
parameter.
 |
Currently, the program instance (as opposed to the
program object itself) is queried for invalidating changes.
This should change as a consequence of defining a separate
Program abstraction as part of the upcoming
addition of debugging support.
|
A default implementation of ProgramGenerator
is provided that uses a
FilesystemStore
as
repository:
ProgramGeneratorImpl .
FilesystemStore is an implementation of the
Store interface that uses a filesystem,
hierarchical directory as its persistence
mechanism.
 |
FilesystemStore implements Store
directly. A higher-level interface (PersistentStore )
should be defined to accommodate other sensible persistent
storage mechanisms such as relational databases or object
databases like
Ozone.
|
FilesystemStore expects the String
representation of its keys to be filenames
relative to its directory root.
Objects returned by FilesystemStore's
get method are Files pointing to
their corresponding entries (or null if their
associated file doesn't exist).
FilesystemStore stores Java objects according
to the following rules:
- null values generate empty directories
- String values are dumped to text files
- All other Objects are serialized
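The three storage rules can be sketched in a few lines. StoreRulesSketch is a hypothetical class, not the real FilesystemStore; writing text as UTF-8 and creating missing parent directories are assumptions made so that the example is complete.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch, not Cocoon code: stores values under a root directory
// following the null / String / other-Object rules.
class StoreRulesSketch {
    static void store(File root, String key, Object value) throws IOException {
        File target = new File(root, key);
        if (value == null) {
            // null values generate empty directories
            Files.createDirectories(target.toPath());
            return;
        }
        File parent = target.getParentFile();
        if (parent != null) {
            Files.createDirectories(parent.toPath());
        }
        if (value instanceof String) {
            // String values are dumped to text files (UTF-8 is an assumption)
            Files.write(target.toPath(), ((String) value).getBytes(StandardCharsets.UTF_8));
        } else {
            // all other objects are serialized
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(target))) {
                out.writeObject(value);
            }
        }
    }

    /** Returns the File for a key, or null when nothing is stored under it. */
    static File get(File root, String key) {
        File target = new File(root, key);
        return target.exists() ? target : null;
    }
}
```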
|
Unless the auto-reload sitemap option is in effect,
ProgramGeneratorImpl will check whether program
instances implement interface Modifiable in order
to assert whether they should be regenerated and reloaded.
Method load uses its markupLanguageName and
programmingLanguage arguments to retrieve the corresponding
NamedComponent
instances.
In server pages mode, these parameters are set by the calling
ServerPagesGenerator from parameters passed via
the sitemap <process> section.
The appropriate MarkupLanguage and
ProgrammingLanguage instances are used to generate and
load a program for which an instance is created and then returned to
the calling environment.
|
|
In order to support pluggable markup and programming languages,
a new abstraction was added to Cocoon's arch
core interfaces:
NamedComponent .
Interface NamedComponent is simply an extension to
Component
that exposes a getName()
method.
NamedComponents belong to a collection of components
sharing the same Java type and are individually identified by a
name unique within each collection.
A
NamedComponentManager
is a component responsible
for storing and locating NamedComponent instances.
This interface exposes the following methods:
-
getComponent . Retrieve a NamedComponent
instance given its type and name .
-
getTypes . Return an Enumeration of all
known NamedComponent types.
-
getComponents . Return an Enumeration of
all NamedComponents within a given type .
A default implementation is provided for this interface:
NamedComponentManagerImpl .
Class
AbstractNamedComponent
provides a base implementation
for NamedComponent that extends
Configurable .
This class exposes the following methods:
-
setConfiguration .
Retrieve named-component sitemap configuration values
converting parameter name/value pairs into Parameters
passed to subclasses for easier initialization
-
setParameters .
An empty method to be overridden by subclasses for parameter-based
initialization
-
setAdditionalConfiguration .
An empty method to be overridden by subclasses when parameter-based
initialization is not sufficient because there are nested
configuration elements in the corresponding sitemap entry
-
getRequiredParameter .
A static convenience method that returns a named parameter as
a String throwing an
IllegalArgumentException
if the parameter was not specified in the sitemap configuration
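The required-parameter check described in the last item might look like the sketch below. A plain Map stands in for the Parameters object, whose API is not shown in this document; the class RequiredParameterSketch and the exact method signature are therefore assumptions.

```java
import java.util.Map;

// Hypothetical sketch, not Cocoon code: fails fast when a sitemap parameter
// that a component cannot work without is missing.
class RequiredParameterSketch {
    static String getRequiredParameter(Map<String, String> parameters, String name) {
        String value = parameters.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing required parameter '" + name + "'");
        }
        return value;
    }
}
```

A markup language processor could use such a helper to read its prefix and uri parameters and reject an incomplete sitemap entry at configuration time.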
|
XSP Sitemap Configuration
|
|