
Simple Data Language

One of the frequent criticisms of J2EE is: too much XML. That is, every small aspect of a J2EE application requires a large XML deployment descriptor, either written by hand or generated from code in some way. What's interesting is what's in those XML files: configuration data consisting of simple strings and identifiers.

XML is overkill for these purposes: it's a markup language, designed to add semantic meaning to documents that normally have a literal meaning (that is, documents that are supposed to be read primarily by people, not by programs). Like many technologies, its intended use has been co-opted (to what degree is debatable). XML for real documents such as XHTML or SVG makes sense. The complexity of SOAP mandates an industrial-strength syntax to express its complex structure. But for the majority of uses of XML within the J2EE stack, it is simply far more complex than necessary.

The complexity comes at some cost: XML is very verbose, a tangle of punctuation (such as <, > and quotes) and repetition (start tags and end tags). Even experienced developers often need to take a bit of time to visually and mentally parse an XML snippet.

Through HiveMind release 1.0-alpha-4, HiveMind was as guilty as the next project in XML usage. HiveMind module deployment descriptors would, at least, centralize the XML concerning a service, and enforce some amount of uniformity.

Release 1.0-alpha-5 introduces Simple Data Language (SDL), an alternative to the use of XML in HiveMind. XML will continue to be supported as a first-class citizen, but in HiveMind there is no longer a compelling reason to use it!

Goals

The goals of SDL are to provide the bare essentials needed for a hierarchical data language, but keep it spare and readable. Unnecessary typing is to be avoided, so the use of quotes is made optional wherever possible. SDL syntax should be reasonably obvious to an interested observer.

Examples

Before getting bogged down in a formal specification for SDL, a few simple examples will explain just about everything. Compare the following two HiveMind module deployment descriptors, which express identical information:

Traditional XML Format:

<?xml version="1.0"?>

<module id="some.module" version="1.0.0">
  <configuration id="ControlPipeline">
    <schema>
      <element name="processor">
       
         <attribute name="name" required="true"/>
         <attribute name="service-id" required="true" translator="service"/>
         <attribute name="before"/>
         <attribute name="after"/>
         
         <conversion class="some.module.PipelineContribution">
           <map property="controlService" attribute="service-id"/>
         </conversion>
      
      </element>
    </schema>
  </configuration>
</module>	
 

SDL format:

module (id=some.module version="1.0.0")
{
  configuration (id=ControlPipeline)
  {
    schema 
    {
      element (name=processor) 
      {
        attribute (name=name required=true) 
        attribute (name=service-id required=true translator=service) 
        attribute (name=before) 
        attribute (name=after)
        
        conversion (class=some.module.PipelineContribution)
        {
          map (property=controlService attribute=service-id)
        }
      }
    }
  }
}
 

Some observations:

  • SDL uses open and close braces to denote containment of elements within another element
  • Attributes, as a list of name-value pairs, are placed in parentheses following the element name
  • Elements without attributes can omit the parentheses (example: schema)
  • Elements that do not contain other elements can omit the open and close braces denoting their body (example: attribute)
  • Most common strings do not have to be quoted
  • All whitespace not inside quotes is ignored

Whitespace

All whitespace (outside of literals) is ignored. Whitespace is considered to be:

  • Spaces
  • Tabs
  • Newlines
  • Carriage Returns

Comments

Comments are in the format traditional in Java and C:

// This is a comment that extends to the end of the current line. 

/* This is a multiline 
   comment. */ 
 

Comments may appear anywhere in an SDL document (except within quoted strings) and are always ignored.
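
The comment rules above can be sketched as a small scanner. This is only an illustration in Python, not HiveMind's actual parser; the function name strip_comments is hypothetical, and the sketch does not try to decide how comment markers inside extended literals should be treated, since the text above only exempts quoted strings.

```python
def strip_comments(text: str) -> str:
    """Remove // and /* */ comments, leaving quoted strings untouched."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Copy a quoted string verbatim, stepping over backslash escapes
            # so that an escaped quote does not end the string early.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            out.append(text[i:j + 1])
            i = j + 1
        elif text.startswith("//", i):
            # Line comment: skip up to (not including) the line break.
            while i < n and text[i] not in "\r\n":
                i += 1
        elif text.startswith("/*", i):
            # Block comment: skip through the closing */ (or to end of input).
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, strip_comments('v="a // b" // c') keeps the quoted value and drops only the trailing comment.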

Element and Attribute Names

Element and attribute names must be simple ids. They must start with a letter (or underscore) and may contain only letters, digits, underscores and dashes. They may not be enclosed in quotes.
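
As a rough check of the naming rule, the following Python sketch expresses it as a regular expression. It assumes "letter" means an ASCII letter (the TO DO list below suggests Unicode is not yet covered); the names SIMPLE_ID and is_simple_id are illustrative only.

```python
import re

# Assumption: "letter" is limited to ASCII letters.
# First character: letter or underscore; the rest: letters, digits,
# underscores and dashes.
SIMPLE_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def is_simple_id(name: str) -> bool:
    """Return True if name is usable as an element or attribute name."""
    return SIMPLE_ID.fullmatch(name) is not None
```

Under this reading, service-id and _x1 are valid names, while 1abc, -x and some.module are not.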

Literal Values

Attribute values may be literal values. Literal values are considered one of the following:

  • simple ids
  • complex ids -- a sequence of simple ids separated by periods
  • segmented ids -- a sequence of complex ids separated by colons
  • numeric values
  • symbol references
  • quoted strings
  • extended literals

Complex ids have the same format as Java class and interface names (but can, additionally, contain dash characters which are not allowed in Java).

Numeric values consist of an optional sign (+ or -) followed by an integer or decimal value. In the future, a more expansive definition may be provided.
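
To make the unquoted literal forms concrete, here is a hedged Python sketch that classifies a value as a simple id, complex id, segmented id or number. The patterns are assumptions drawn from the descriptions above (ASCII letters only, and a decimal that may start with a period); classify is a hypothetical helper, not part of HiveMind.

```python
import re

# Assumed patterns, built up from the prose definitions.
SIMPLE = r"[A-Za-z_][A-Za-z0-9_-]*"
COMPLEX = rf"{SIMPLE}(?:\.{SIMPLE})*"        # simple ids joined by periods
SEGMENTED = rf"{COMPLEX}(?::{COMPLEX})*"     # complex ids joined by colons
NUMBER = r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)"     # optional sign, integer or decimal

KINDS = (
    ("simple id", SIMPLE),
    ("complex id", COMPLEX),
    ("segmented id", SEGMENTED),
    ("number", NUMBER),
)


def classify(literal: str) -> str:
    """Return the narrowest unquoted literal form that matches, or 'other'."""
    for kind, pattern in KINDS:
        if re.fullmatch(pattern, literal):
            return kind
    return "other"
```

With these patterns, some.module.PipelineContribution is a complex id and -3.14 is a number, while 1.0.0 matches none of them, which would be consistent with the version attribute being quoted in the module example above.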

Symbol references allow Ant-style symbols to be used directly in SDL. Example:

. . .
  set-service (service-id=${symbol.for.service-id})
. . .	 

Support for Ant-style symbols is a convenience (the same syntax is used heavily within HiveMind). There is no difference between ${symbol.for.service-id} and "${symbol.for.service-id}"; both will be processed identically.

Quoted strings are similar to Java string literals. All whitespace within the string is retained as-is, including line breaks. A subset of the Java escape codes is currently supported:

  • \t (tab)
  • \n (newline)
  • \r (carriage return)
  • \" (quote)
  • \\ (backslash)

Any other sequence is passed through normally (unescaped).
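
The escape handling might be sketched as follows. This Python illustration assumes that "passed through" means an unrecognized sequence keeps both its backslash and the following character; that reading is an assumption, and unescape is a hypothetical name.

```python
# The five escape codes listed above, keyed by the character after the backslash.
ESCAPES = {"t": "\t", "n": "\n", "r": "\r", '"': '"', "\\": "\\"}


def unescape(body: str) -> str:
    """Interpret escape codes in the text between the quotes of a string."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in ESCAPES:
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            # Assumption: an unrecognized sequence such as \q is left as-is,
            # backslash included.
            out.append(ch)
            i += 1
    return "".join(out)
```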

Extended literals have a different syntax:

. . .
  description =
<< A long, multiline string
that may contain "quoted" sections. >>	
. . .

Extended literals may contain any character sequence (except >>). Escape sequences in extended literals are not interpreted. All whitespace within the delimiters is retained.
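
A minimal Python sketch of reading an extended literal, assuming the value is simply everything between the first << and the next >>, with no escape processing and no trimming. The helper name read_extended_literal is hypothetical.

```python
from typing import Optional, Tuple


def read_extended_literal(text: str, start: int = 0) -> Optional[Tuple[str, int]]:
    """Return (value, index just past '>>'), or None if no '<<' is found."""
    open_at = text.find("<<", start)
    if open_at == -1:
        return None
    close_at = text.find(">>", open_at + 2)
    if close_at == -1:
        raise ValueError("unterminated extended literal")
    # Everything between the delimiters is kept verbatim: whitespace,
    # line breaks, quotes and backslashes included.
    return text[open_at + 2:close_at], close_at + 2
```

Applied to the example above, the value starts with the space after << and ends with the space before >>, and the embedded "quoted" text is returned unchanged.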

Literal Gotcha

The body of an element may contain literal text data, just as with XML. Unlike XML, whitespace between literals is completely removed, and adjacent literals are joined into a single value. Thus the following are equivalent:

first
{ 
  "NowIsTheTime" 
}
second
{
  "Now" "Is" <<The>> <<Time>>
}

This applies to all forms of literals, including numbers. The following are identical:

pi1
{ 
  3.14159
}
pi2
{
  3 .14 159
} 
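
The concatenation behavior shown in these examples can be sketched in Python. This is a simplification under stated assumptions: it recognizes only quoted strings, extended literals and numeric literals, silently skips anything else (including child elements), and does not interpret escape codes. The names TOKEN and body_content are illustrative.

```python
import re

# One alternative per literal form; exactly one group is non-empty per match.
TOKEN = re.compile(
    r'"((?:[^"\\]|\\.)*)"'                 # quoted string (escapes left as-is)
    r'|<<(.*?)>>'                          # extended literal
    r'|([+-]?(?:\d+(?:\.\d+)?|\.\d+))',    # numeric literal (assumed pattern)
    re.DOTALL,
)


def body_content(body: str) -> str:
    """Join the literals found in an element body, dropping the whitespace between them."""
    parts = []
    for quoted, extended, number in TOKEN.findall(body):
        parts.append(quoted or extended or number)
    return "".join(parts)
```

Under these assumptions, the bodies of first and second both yield NowIsTheTime, and the bodies of pi1 and pi2 both yield 3.14159.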

Inside the body of an element, simple ids are interpreted as elements, not as string literals. In the following example, root1 and root2 have the same structure (each contains three children and no content). leaf contains no children, and its content is child1child2child3.

root1 
{
  child1 {}
  child2 {}
  child3 {}
}
root2
{
  child1 child2 child3
}
leaf
{
  "child1" "child2" "child3"
}	

TO DO

  • Expand the definition of "character" to properly include Unicode
  • Add Unicode escape patterns in quoted literals
  • Expand the definition of numeric literal to include all Java literals