XML serialization and parsing support
Juneau supports converting arbitrary POJOs to and from XML using ultra-efficient serializers and parsers.
The XML serializer converts POJOs directly to XML without the need for intermediate DOM objects.
Likewise, the XML parser uses a StAX parser and creates POJOs directly without intermediate DOM objects.
Unlike frameworks such as JAXB, Juneau does not require POJO classes to be annotated to produce and consume
XML.
For example, it can serialize and parse instances of any of the following POJO types:
- Primitive and simple types (e.g. String, Integer, Boolean, Float).
- Collections and maps (e.g. HashSet, TreeMap) containing anything on this list.
- Classes that convert to and from Strings (e.g. classes containing toString(), fromString(), valueOf(), constructor(String)).
In addition to the types shown above, Juneau lets you define transforms that convert
non-standard object and property types to serializable forms (e.g. Calendars to and from ISO8601 strings,
or byte[] arrays to and from base-64 encoded strings).
These transforms can be associated with serializers/parsers, or can be associated with classes or bean
properties through type and method annotations.
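The idea behind such transforms can be sketched in plain Java. The sketch below is not Juneau's transform API; the class and method names are hypothetical, and it assumes Java 8 or later for java.time and java.util.Base64. It only shows the two round-trip conversions mentioned above.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Base64;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper, not part of Juneau: converts non-serializable-looking
// types to and from plain strings, which is what a transform boils down to.
public class TransformSketch {

    // Calendar -> ISO8601 instant string in UTC.
    static String calendarToIso(Calendar c) {
        return DateTimeFormatter.ISO_INSTANT.format(c.toInstant());
    }

    // ISO8601 instant string -> Calendar in UTC.
    static Calendar isoToCalendar(String s) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        c.setTimeInMillis(Instant.parse(s).toEpochMilli());
        return c;
    }

    // byte[] -> base-64 encoded string.
    static String bytesToBase64(byte[] b) {
        return Base64.getEncoder().encodeToString(b);
    }

    // base-64 encoded string -> byte[].
    static byte[] base64ToBytes(String s) {
        return Base64.getDecoder().decode(s);
    }

    public static void main(String[] args) {
        Calendar c = isoToCalendar("2001-07-04T12:08:56Z");
        System.out.println(calendarToIso(c));
        System.out.println(bytesToBase64(new byte[] {1, 2, 3}));
    }
}
```

A serializer that knows about such a pair of functions can write the string form and a parser can restore the original type from it.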
Refer to POJO Categories for a complete definition of supported POJOs.
While annotations are not required to produce or consume XML, several XML annotations are provided for handling namespaces and fine-tuning the format of the XML produced.
The Juneau XML serialization and parsing support does not require any external prerequisites. It only requires Java 1.6 or above.
The example shown here is from the Address Book resource located in the juneau-examples-rest microservice project.
The POJO model consists of a List of Person beans, with each Person containing zero or more Address beans.
When you point a browser at /sample/addressBook, the POJO is rendered as HTML:
By appending ?Accept=mediaType&plainText=true to the URL, you can view the data in the various supported XML formats:
In addition to serializing POJOs to XML, Juneau includes support for serializing the POJO metamodel to XML Schema, with support for multiple namespaces.
{@link org.apache.juneau.xml.XmlSerializer} is the class used to convert POJOs to XML.
{@link org.apache.juneau.xml.XmlDocSerializer} is a subclass that adds an XML declaration to the output
before the POJO is serialized.
The XML serializer includes many configurable settings.
Static reusable instances of XML serializers are provided with commonly-used settings:
In addition, DTO beans are provided that use the XML serializer and parser for the following languages:
Refer to the package-level Javadocs for more information about those formats.
The examples shown in this document will use single-quote, readable settings.
For brevity, the examples will use public fields instead of getters/setters to reduce the size of the examples.
In the real world, you'll typically want to use standard bean getters and setters.
To start off simple, we'll begin with the following simplified bean and build upon it.
The following code shows how to convert this to simple XML (no namespaces):
Side note: Serializers can also be created by cloning existing serializers:
The code above produces the following output:
The first thing you may notice is how the bean instance is represented by the element <object>.
When objects have no name associated with them, Juneau provides a default generalized name that maps to the
equivalent JSON data type.
Some cases when objects do not have names:
The generalized name reflects the JSON-equivalent data type.
Juneau produces JSON-equivalent XML, meaning any valid JSON document can be losslessly converted into an XML
equivalent.
In fact, all of the Juneau serializers and parsers are built upon this JSON-equivalence.
The following examples show how different data types are represented in XML. They mirror how the data structures are represented in JSON.
The representation of loose (not a direct bean property value) simple types are shown below:
Data type | JSON example | XML |
---|---|---|
string | ||
boolean | ||
integer | 123 | |
float | 1.23 | |
null |
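The mapping from a loose Java value to its generalized element name can be sketched as follows. This is a minimal illustration, not Juneau's implementation: the class and method names are hypothetical, and the element names are taken from the JSON-equivalent type names this document lists as reserved (string, number, boolean, object, array, null).

```java
import java.util.Collection;
import java.util.HashMap;

// Hypothetical helper: picks the JSON-equivalent name for a value that has
// no name of its own (e.g. the root object or a collection entry).
public class GeneralizedNames {

    static String elementNameFor(Object o) {
        if (o == null) return "null";
        if (o instanceof CharSequence) return "string";
        if (o instanceof Number) return "number";   // integers and floats share one name
        if (o instanceof Boolean) return "boolean";
        if (o instanceof Collection || o.getClass().isArray()) return "array";
        return "object";                            // maps and beans
    }

    public static void main(String[] args) {
        System.out.println(elementNameFor("foo"));
        System.out.println(elementNameFor(123));
        System.out.println(elementNameFor(new String[] {"a"}));
        System.out.println(elementNameFor(new HashMap<String, Object>()));
    }
}
```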
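Loose maps and beans use the element <object>.
A type attribute is added to a bean property or map entry when its type cannot be inferred through reflection (e.g. an Object or superclass/interface value type).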
Data type | JSON example | XML |
---|---|---|
Map<String,String> | { k1: | |
Map<String,Number> | { k1: 123, k2: 1.23, k3: | |
Map<String,Object> | { k1: | |
Loose collections and arrays use the element <array>.
Data type | JSON example | XML |
---|---|---|
String[] | [ | |
Number[] | [ 123, 1.23, | |
Object[] | [ | |
String[][] | [ [ | |
| [ 123 ] | |
| [ | |
List<String> | [ | |
List<Number> | [ 123, 1.23, | |
List<Object> | [ | |
Data type | JSON example | XML |
---|---|---|
|
{
a: |
Data type | JSON example | XML |
---|---|---|
|
{
a: {
k1: |
Just because Juneau allows you to serialize ordinary POJOs to XML doesn't mean you are limited to just
JSON-equivalent XML.
Several annotations are provided in the org.apache.juneau.xml.annotation package for customizing the output.
The {@link org.apache.juneau.annotation.Bean#typeName() @Bean.typeName()} annotation can be used to override the Juneau default name on bean elements. Type names serve two distinct purposes:
Data type | JSON example | Without annotation | With annotation |
---|---|---|---|
|
{
a: |
On bean properties, a
In the following example, a type attribute is used on property 'b' but not property 'a' since
'b' is of type Object and therefore the bean class cannot be inferred.
Java | Without annotation | With annotation |
---|---|---|
|
string, number, boolean, object, array, and null are reserved keywords that cannot be used as type names.
Beans with type names are often used in conjunction with the {@link org.apache.juneau.annotation.Bean#beanDictionary() @Bean.beanDictionary()} and {@link org.apache.juneau.annotation.BeanProperty#beanDictionary() @BeanProperty.beanDictionary()} annotations so that the beans can be resolved at parse time. These annotations are not necessary during serialization, but are needed during parsing in order to resolve the bean types.
The following examples show how type names are used under various circumstances.
Pay special attention to when
Java | XML |
---|---|
Bean type names are also used for resolution when abstract fields are used. The following examples show how they are used in a variety of circumstances.
Java | XML |
---|---|
On a side note, characters that cannot be represented in XML 1.0 are encoded using a simple encoding.
Note in the examples below, some characters such as
Java | XML |
---|---|
While it's true that these characters CAN be represented in XML 1.1, it's impossible to parse XML 1.1 text in Java without the XML containing an XML declaration. Unfortunately, this, and the uselessness of the {@link javax.xml.stream.XMLInputFactory#IS_REPLACING_ENTITY_REFERENCES} setting in Java forced us to make some hard design decisions that may not be the most elegant.
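To make the idea of "a simple encoding" concrete, here is a minimal sketch of one possible scheme. The `_xHHHH_` notation, the class name, and the method names are assumptions made for illustration; the exact format Juneau emits is not shown in the text above. The character ranges come from the XML 1.0 `Char` production.

```java
// Hypothetical escaper: replaces code points that XML 1.0 cannot represent
// with an assumed _xHHHH_ notation (hex code point between underscores).
public class Xml10Escaper {

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("_x%04X_", cp));
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Backspace (0x08) and form feed (0x0C) are not legal XML 1.0 characters.
        System.out.println(escape("a\bb\fc"));
    }
}
```

This sketch is one-directional. A reversible scheme would also need to escape input text that already looks like an escape sequence.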
The {@link org.apache.juneau.xml.annotation.Xml#childName() @Xml.childName()} annotation can be used to specify the name of XML child elements for bean properties of type collection or array.
Data type | JSON example | Without annotation | With annotation |
---|---|---|---|
|
{
a: [ |
||
|
{ a: [123,456] } |
The {@link org.apache.juneau.xml.annotation.Xml#format() @Xml.format()} annotation can be used to tweak
the XML format of a POJO.
The value is set to an enum value of type {@link org.apache.juneau.xml.annotation.XmlFormat}.
This annotation can be applied to both classes and bean properties.
The {@link org.apache.juneau.xml.annotation.XmlFormat#ATTR} format can be applied to bean properties to
serialize them as XML attributes instead of elements.
Note that this only supports properties of simple types (e.g. strings, numbers, booleans).
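The difference between the two output shapes can be shown with the JDK's own StAX writer. This sketch does not use Juneau or its annotations; the class and method names are hypothetical, and the wrapper element name "object" is only borrowed from the default bean element name described earlier.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo: writes one simple property either as a child element
// (the default shape) or as an XML attribute (the ATTR shape).
public class AttrVsElement {

    static String write(String name, String value, boolean asAttribute) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        w.writeStartElement("object");
        if (asAttribute) {
            // Attributes can only carry simple text values.
            w.writeAttribute(name, value);
        } else {
            w.writeStartElement(name);
            w.writeCharacters(value);
            w.writeEndElement();
        }
        w.writeEndElement();
        w.close();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write("a", "foo", false));
        System.out.println(write("a", "foo", true));
    }
}
```

Because an attribute value is a single string, this also illustrates why only simple property types fit the attribute form.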
Data type | JSON example | Without annotation | With annotation |
---|---|---|---|
|
{
a: |
The {@link org.apache.juneau.xml.annotation.XmlFormat#ATTRS} format can be applied to bean classes to force all bean properties to be serialized as XML attributes instead of child elements.
Data type | JSON example | Without annotation | With annotation |
---|---|---|---|
|
{
a: |
The {@link org.apache.juneau.xml.annotation.XmlFormat#ELEMENT} format can be applied to bean properties to override the {@link org.apache.juneau.xml.annotation.XmlFormat#ATTRS} format applied on the bean class.
Data type | JSON example | Without annotation | With annotation |
---|---|---|---|
|
{
a: |
The {@link org.apache.juneau.xml.annotation.XmlFormat#ATTRS} format can be applied to a single bean
property of type Map<String,Object> to denote arbitrary XML attribute values on the element.
These can be mixed with other {@link org.apache.juneau.xml.annotation.XmlFormat#ATTR} annotated
properties, but there must not be an overlap in bean property names and map keys.
Data type | JSON example | Without annotation | With annotation |
---|---|---|---|
|
{
a: {
k1: |
The {@link org.apache.juneau.xml.annotation.XmlFormat#COLLAPSED} format can be applied to bean properties
of type array/Collection.
This causes the child objects to be serialized directly inside the bean element.
This format must be used in conjunction with {@link org.apache.juneau.xml.annotation.Xml#childName()}
to differentiate which collection the values came from if you plan on parsing the output back into beans.
Note that child names must not conflict with other property names.
Data type | JSON example | Without annotation | With annotation |
---|---|---|---|
|
{
a: [ |
The {@link org.apache.juneau.xml.annotation.XmlFormat#ELEMENTS} format can be applied to a single bean
property of either a simple type or array/Collection.
It allows free-form child elements to be formed.
All other properties on the bean MUST be serialized as attributes.
Data type | JSON example | With annotation |
---|---|---|
|
{
a: |
|
|
{
a: |
The {@link org.apache.juneau.xml.annotation.XmlFormat#MIXED} format is similar to
{@link org.apache.juneau.xml.annotation.XmlFormat#ELEMENTS} except element names on primitive types
(string/number/boolean/null) are stripped from the output.
This format is particularly useful when combined with bean dictionaries to produce mixed content.
The bean dictionary isn't used during serialization, but it is needed during parsing to resolve bean
types.
The {@link org.apache.juneau.xml.annotation.XmlFormat#MIXED_PWS} format is identical to {@link org.apache.juneau.xml.annotation.XmlFormat#MIXED} except whitespace characters are preserved in the output.
Data type | JSON example | Without annotations | With annotations |
---|---|---|---|
|
{
a: [
|
Whitespace (tabs and newlines) is not added to MIXED child nodes in readable-output mode. This helps ensure that strings in the serialized output can be losslessly parsed back into their original forms when they contain whitespace characters. If the {@link javax.xml.stream.XMLInputFactory#IS_REPLACING_ENTITY_REFERENCES} setting were not useless in Java, we could support lossless readable XML for MIXED content. But as of Java 8, it still does not work.
XML suffers from other deficiencies as well that affect MIXED content.
For example,
The examples below show how whitespace is handled under various circumstances:
Data type | XML |
---|---|
It should be noted that when using
The {@link org.apache.juneau.xml.annotation.XmlFormat#TEXT} format is similar to
{@link org.apache.juneau.xml.annotation.XmlFormat#MIXED} except it's meant for solitary objects that
get serialized as simple child text nodes.
Any object that can be serialized to a String can be used.
The {@link org.apache.juneau.xml.annotation.XmlFormat#TEXT_PWS} format is the same except whitespace is
preserved in the output.
Data type | JSON example | Without annotations | With annotations |
---|---|---|---|
|
{
a: |
The {@link org.apache.juneau.xml.annotation.XmlFormat#XMLTEXT} format is similar to
{@link org.apache.juneau.xml.annotation.XmlFormat#TEXT} except it's meant for strings containing XML
that should be serialized as-is to the document.
Any object that can be serialized to a String can be used.
During parsing, the element content gets parsed with the rest of the document and then re-serialized to
XML before being set as the property value.
This process may not be perfect (e.g. double quotes may be replaced by single quotes).
Data type | JSON example | With TEXT annotation | With XMLTEXT annotation |
---|---|---|---|
|
{
a: |
Let's go back to the example of our original Person bean class:
However, this time we'll leave namespaces enabled on the serializer:
Now when we run this code, we'll see namespaces added to our output:
This isn't too exciting yet since we haven't specified any namespaces.
Therefore, everything is defined under the default Juneau namespace.
Namespaces can be defined at the following levels:
It's typically best to specify the namespaces used at the package level.
We'll do that here for the package containing our test code.
We're defining four namespaces in this package and designating
Take special note that the
Other XML annotations are also modeled after JAXB.
However, since many of the features of JAXB are already implemented for all serializers and parsers
at a higher level through various general annotations such as {@link org.apache.juneau.annotation.Bean}
and {@link org.apache.juneau.annotation.BeanProperty} it was decided to maintain separate Juneau XML
annotations instead of reusing JAXB annotations.
This may change in some future implementation, but for now it was decided that having separate Juneau XML
annotations was less confusing.
On our bean class, we'll specify to use the
Now when we serialize the bean, we get the following:
We can simplify the output by setting the default namespace on the serializer so that all the elements do not need to be prefixed:
This produces the following equivalent where the elements don't need prefixes since they're already in the default document namespace:
One important property on the XML serializer class is
{@link org.apache.juneau.xml.XmlSerializer#XML_autoDetectNamespaces XML_autoDetectNamespaces}.
This property tells the serializer to make a first pass over the data structure to look for namespaces
defined on classes and bean properties.
In high-performance environments, you may want to consider disabling auto-detection and providing your
own explicit list of namespaces to the serializer to avoid this scanning step.
The following code will produce the same output as before, but will perform slightly better since it avoids this pre-scan step.
The {@link org.apache.juneau.annotation.Bean @Bean} and {@link org.apache.juneau.annotation.BeanProperty @BeanProperty}
annotations are used to customize the behavior of beans across the entire framework.
In addition to using them to identify the resource URI for the bean shown above, they have various other
uses:
For example, we now add a birthDate property and associate a transform with it that converts
it to an ISO8601 date-time string in GMT time.
By default, Calendars are treated as beans by the framework, which is usually not how you want
them serialized.
Using transforms, we can convert them to standardized string forms.
Next, we alter our code to pass in the birthdate:
Now when we rerun the sample code, we'll get the following:
Another useful feature is the {@link org.apache.juneau.annotation.Bean#propertyNamer()} annotation that
allows you to plug in your own logic for determining bean property names.
The {@link org.apache.juneau.PropertyNamerDLC} is an example of an alternate property namer.
It converts bean property names to lowercase-dashed format.
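The conversion itself is easy to picture. The sketch below is a hypothetical stand-alone function, not the PropertyNamerDLC implementation, and its treatment of runs of capitals (kept together as one word) is an assumption.

```java
// Hypothetical converter from camel-case bean property names to
// lowercase-dashed names (e.g. birthDate -> birth-date).
public class DashedLowerCaseNamer {

    static String toDashedLowerCase(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isUpperCase(c)) {
                // Start a new word only when the previous character was not upper case,
                // so a run of capitals such as "URL" stays together.
                if (i > 0 && !Character.isUpperCase(name.charAt(i - 1))) {
                    sb.append('-');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDashedLowerCase("birthDate"));
        System.out.println(toDashedLowerCase("addressBookUri"));
    }
}
```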
In our example, let's add a list-of-beans property to our sample class:
The Address class has the following properties defined:
Next, add some quick-and-dirty code to add an address to our person bean:
Now when we run the sample code, we get the following:
Juneau provides the {@link org.apache.juneau.xml.XmlSchemaSerializer} class for generating XML-Schema
documents that describe the output generated by the {@link org.apache.juneau.xml.XmlSerializer} class.
This class shares the same properties as XmlSerializer.
Since the XML output differs based on settings on the XML serializer class, the XML-Schema serializer
class must have the same property values as the XML serializer class it describes.
To help facilitate creating an XML Schema serializer with the same properties as the corresponding
XML serializer, the {@link org.apache.juneau.xml.XmlSerializer#getSchemaSerializer()} method
has been added.
XML-Schema requires a separate file for each namespace.
Unfortunately, this does not mesh well with the Juneau serializer architecture, which serializes to single writers.
To get around this limitation, the schema serializer will produce a single output, but with multiple
schema documents separated by the null character ('\u0000').
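Splitting such a combined output back into individual schema documents is a one-liner in plain Java. The class and method names below are hypothetical, and the input strings are placeholders rather than real serializer output.

```java
// Hypothetical helper: splits a null-separated string into its parts.
public class SchemaSplitter {

    // "\0" is the Java escape for the null character (U+0000).
    static String[] splitSchemas(String combined) {
        return combined.split("\0");
    }

    public static void main(String[] args) {
        String combined = "<schema-one/>" + '\0' + "<schema-two/>";
        for (String doc : splitSchemas(combined)) {
            System.out.println(doc);
        }
    }
}
```

Each resulting string can then be written to its own file, one per namespace.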
Let's start with an example where everything is in the same namespace.
We'll use the classes from before, but remove the references to namespaces.
Since we have not defined a default namespace, everything is defined under the default Juneau namespace.
The code for creating our POJO model and generating XML Schema is shown below:
Now if we add in some namespaces, we'll see how multiple namespaces are handled.
The schema consists of 4 documents separated by a null character.
For convenience, the {@link org.apache.juneau.xml.XmlSchemaSerializer#getValidator(SerializerSession,Object)} method is provided to create a {@link javax.xml.validation.Validator} using the input from the serialize method.
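For readers unfamiliar with javax.xml.validation, the sketch below shows how a Validator is typically built from a schema document and applied to XML using only JDK classes. It does not call the Juneau method above; the class name, the hand-written schema, and the sample documents are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo of schema validation with the JDK's javax.xml.validation API.
public class SchemaValidationSketch {

    // A small hand-written schema: <person> must contain <name> then <id>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='person'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='id' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the XML validates; false if the schema or the XML is rejected.
    static boolean isValid(String xsd, String xml) throws IOException {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isValid(XSD, "<person><name>John Smith</name><id>1</id></person>"));
        System.out.println(isValid(XSD, "<person><id>1</id></person>"));
    }
}
```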
The XML serializer is designed to be used against POJO tree structures.
It expects the POJO model to contain no loops (e.g. children with references to parents).
If you try to serialize models with loops, you will usually cause a StackOverflowError to be thrown
(if {@link org.apache.juneau.serializer.Serializer#SERIALIZER_maxDepth} is not reached first).
If you still want to use the XML serializer on such models, Juneau provides the
{@link org.apache.juneau.serializer.Serializer#SERIALIZER_detectRecursions} setting.
It tells the serializer to look for instances of an object in the current branch of the tree and skip
serialization when a duplicate is encountered.
For example, let's make a POJO model out of the following classes:
Now we create a model with a loop and serialize the results.
What we end up with is the following, which does not serialize the contents of the c field:
Without recursion detection enabled, this would cause a stack-overflow error.
Recursion detection introduces a performance penalty of around 20%.
For this reason the setting is disabled by default.
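The branch-based check described above can be sketched with an identity set that tracks the objects currently being serialized. This is an illustration of the technique under assumed names (Node, serialize), not Juneau's internal code, and the output format is simplified.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical tree writer that skips any object already on the current branch.
public class RecursionDetectionSketch {

    static class Node {
        final String name;
        final List<Node> children = new ArrayList<Node>();
        Node(String name) { this.name = name; }
    }

    static String serialize(Node root) {
        StringBuilder sb = new StringBuilder();
        // Identity semantics: two distinct but equal objects are not treated as a loop.
        Set<Node> branch = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        write(root, branch, sb);
        return sb.toString();
    }

    private static void write(Node n, Set<Node> branch, StringBuilder sb) {
        if (!branch.add(n)) {
            return; // already being serialized higher up this branch: skip it
        }
        sb.append('<').append(n.name).append('>');
        for (Node child : n.children) {
            write(child, branch, sb);
        }
        sb.append("</").append(n.name).append('>');
        branch.remove(n); // leaving the branch, so the object may appear again elsewhere
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        a.children.add(b);
        b.children.add(c);
        c.children.add(a); // loop back to the root
        System.out.println(serialize(a));
    }
}
```

The extra set lookup on every object is the kind of bookkeeping that accounts for the performance cost of enabling detection.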
See the following classes for all configurable properties that can be used on this serializer:
{@link org.apache.juneau.xml.XmlParser} is the class used to parse Juneau-generated XML back into POJOs.
A static reusable instance of XmlParser is also provided for convenience:
Let's build upon the previous example and parse the generated XML back into the original bean.
We start with the XML that was generated.
This code produced the following:
The code to convert this back into a bean is:
We print it out to JSON to show that all the data has been preserved:
{
id: 1,
name:
The XML parser is not limited to parsing back into the original bean classes.
If the bean classes are not available on the parsing side, the parser can also be used to parse into a
generic model consisting of Maps, Collections, and primitive objects.
You can parse into any Map type (e.g. HashMap, TreeMap), but
using {@link org.apache.juneau.ObjectMap} is recommended since it has many convenience methods
for converting values to various types.
The same is true when parsing collections. You can use any Collection (e.g. HashSet, LinkedList)
or array (e.g. Object[], String[], String[][]), but using {@link org.apache.juneau.ObjectList} is recommended.
When the map or list type is not specified, or is the abstract Map, Collection,
or List types, the parser will use ObjectMap and ObjectList by default.
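To give a feel for what parsing into a generic model means, the sketch below reads a small XML document into nested Maps using the JDK's StAX reader. It is a simplified stand-in with hypothetical names, not the Juneau parser: it ignores attributes, keeps all leaf values as strings, and lets a repeated child name overwrite the earlier entry.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical streaming parser: elements with children become Maps,
// elements without children become their text content.
public class GenericStaxSketch {

    static Map<String, Object> parse(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        // Advance to the root element.
        while (r.next() != XMLStreamConstants.START_ELEMENT) {
            // skip comments, processing instructions, whitespace
        }
        Map<String, Object> root = new LinkedHashMap<String, Object>();
        root.put(r.getLocalName(), readElement(r));
        r.close();
        return root;
    }

    // Expects the reader on START_ELEMENT; consumes events through the matching END_ELEMENT.
    private static Object readElement(XMLStreamReader r) throws XMLStreamException {
        Map<String, Object> children = new LinkedHashMap<String, Object>();
        StringBuilder text = new StringBuilder();
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                children.put(name, readElement(r));
            } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                text.append(r.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                break;
            }
        }
        return children.isEmpty() ? text.toString() : children;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(parse("<person><name>John Smith</name><id>1</id></person>"));
    }
}
```

No DOM tree is built at any point; values are placed into the target Maps as the events stream by.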
See the following classes for all configurable properties that can be used on this parser: