Apache UIMA Sandbox v2.3.0 Release Notes

Contents

1. What is UIMA?
2. What is the Apache UIMA annotator package?
3. Major Changes in this Release
4. How to Get Involved
5. How to Report Issues
6. List of JIRA Issues Fixed in this Release

1. What is UIMA?

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.

Apache UIMA is an Apache-licensed open source implementation of the UIMA specification (that specification is, in turn, being developed concurrently by a technical committee within OASIS, a standards organization). We invite and encourage you to participate in both the implementation and specification efforts.

UIMA is a component framework for analysing unstructured content such as text, audio and video. It comprises an SDK and tooling for composing and running analytic components written in Java and C++, with some support for Perl, Python and TCL.
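
The decomposition into components described above can be illustrated with a small, self-contained Java sketch. It does not use the UIMA API: the Component interface, the Document and Annotation classes, and the two example stages are hypothetical stand-ins, chosen only to show how each stage reads the shared data and adds its own results for later stages to use.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {

    /** A typed span of text; a hypothetical stand-in for a UIMA annotation. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Shared data that flows between components (assumption: not the real UIMA CAS). */
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }
    }

    /** Hypothetical component interface; the real framework defines its own. */
    interface Component {
        void process(Document doc);
    }

    /** Marks each run of non-whitespace characters as a "Token". */
    static final class WhitespaceTokens implements Component {
        public void process(Document doc) {
            int start = -1;
            for (int i = 0; i <= doc.text.length(); i++) {
                boolean boundary = i == doc.text.length()
                        || Character.isWhitespace(doc.text.charAt(i));
                if (!boundary && start < 0) {
                    start = i;
                }
                if (boundary && start >= 0) {
                    doc.annotations.add(new Annotation("Token", start, i));
                    start = -1;
                }
            }
        }
    }

    /** Marks capitalized tokens as "Name"; relies on tokens added by an earlier stage. */
    static final class CapitalizedNames implements Component {
        public void process(Document doc) {
            List<Annotation> names = new ArrayList<>();
            for (Annotation a : doc.annotations) {
                if (a.type.equals("Token")
                        && Character.isUpperCase(doc.text.charAt(a.begin))) {
                    names.add(new Annotation("Name", a.begin, a.end));
                }
            }
            doc.annotations.addAll(names);
        }
    }

    /** Runs the components in order over one document. */
    static Document run(String text, List<Component> pipeline) {
        Document doc = new Document(text);
        for (Component c : pipeline) {
            c.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("alice met Bob in Paris",
                List.of(new WhitespaceTokens(), new CapitalizedNames()));
        for (Annotation a : doc.annotations) {
            System.out.println(a.type + " " + doc.text.substring(a.begin, a.end));
        }
    }
}
```

In this sketch the order of the stages matters: the name stage finds nothing unless the tokenizer has already run. In UIMA itself, the framework manages the components and the data flow between them, and each component also supplies an XML descriptor, which this sketch omits.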

2. What is the Apache UIMA annotator package?

The Apache UIMA annotator package is an add-on package for the base UIMA release. The add-on package contains annotator components developed for Apache UIMA. The add-on package fits the Apache UIMA directory structure and adds a directory called "addons/annotator" that contains the following annotator components:
- DictionaryAnnotator
- RegularExpressionAnnotator
- Tagger
- WhitespaceTokenizer
- Bean Scripting Framework (BSF) BSFAnnotator
- ConceptMapper
- ConfigurableFeatureExtractor
- Lucas - an interface to using UIMA with Lucene
- OpenCalaisAnnotator - a sample annotator using the OpenCalais Service
- SnowballAnnotator - an annotator making use of the snowball stemmers
- TikaAnnotator - an annotator using the Tika project text extractors

Additionally, the package contains components for packaging annotators and for accessing annotators as a simple REST service. These are:
- PearPackagingAntTask
- SimpleServer

Finally, there is an addon to the base UIMA:
- FsVariables

Each component has separate LICENSE and NOTICE files; some also have a Readme and other documentation (in docs/). Documentation is also available on the UIMA website, in the Sandbox area.

3. Major Changes in this Release

The Apache UIMA annotator package release version 2.3.0 adds the following components to the previously released components:
- Bean Scripting Framework (BSF) BSFAnnotator
- ConceptMapper
- ConfigurableFeatureExtractor
- Lucas - an interface to using UIMA with Lucene
- OpenCalaisAnnotator - a sample annotator using the OpenCalais Service
- SnowballAnnotator - an annotator making use of the snowball stemmers
- TikaAnnotator - an annotator using the Tika project text extractors

The PearPackagingMavenPlugin has been moved to the base UIMA release package.

The XMLBeans support has been migrated to version 2.4.0, and all of the projects now use the Maven XMLBeans plugin to generate the XML parsers.

Finally, there is an addon to the base UIMA:
- FsVariables

For a list of all JIRA issues fixed in the current Sandbox release, please refer to section 6, "List of JIRA Issues Fixed in this Release".

4. How to Get Involved

The Apache UIMA project really needs and appreciates any contributions, including documentation help, source code and feedback. If you are interested in contributing, please visit http://incubator.apache.org/uima/get-involved.html.

5. How to Report Issues

The Apache UIMA project uses JIRA for issue tracking. Please report any issues you find at http://issues.apache.org/jira/browse/uima

6. List of JIRA Issues Fixed in this Release

Release Notes - UIMA - Version 2.3S

Bug

Improvement

New Feature

Task

Wish