Apache UIMA Sandbox v2.2.2 Release Notes
-----------------------------------------------------------------------

CONTENTS

1. What is UIMA?
2. What is the Apache UIMA annotator package?
3. Major Changes in this Release
4. How to Get Involved
5. How to Report Issues
6. List of JIRA Issues Fixed in this Release

1. What is UIMA?

Unstructured Information Management applications are software systems
that analyze large volumes of unstructured information in order to
discover knowledge that is relevant to an end user. UIMA is a framework
and SDK for developing such applications. An example UIM application
might ingest plain text and identify entities, such as persons, places,
organizations; or relations, such as works-for or located-at.

UIMA enables such an application to be decomposed into components, for
example "language identification" -> "language specific segmentation" ->
"sentence boundary detection" -> "entity detection (person/place names
etc.)". Each component must implement interfaces defined by the
framework and must provide self-describing metadata via XML descriptor
files. The framework manages these components and the data flow between
them. Components are written in Java or C++; the data that flows between
components is designed for efficient mapping between these languages.

UIMA additionally provides capabilities to wrap components as network
services, and can scale to very large volumes by replicating processing
pipelines over a cluster of networked nodes.

Apache UIMA is an Apache-licensed open source implementation of the UIMA
specification (that specification is, in turn, being developed
concurrently by a technical committee within OASIS, a standards
organization). We invite and encourage you to participate in both the
implementation and specification efforts.

UIMA is a component framework for analysing unstructured content such as
text, audio and video.
It comprises an SDK and tooling for composing and running analytic
components written in Java and C++, with some support for Perl, Python
and Tcl.

2. What is the Apache UIMA annotator package?

The Apache UIMA annotator package is an add-on package for the base UIMA
release. It contains annotator components developed for Apache UIMA. The
package fits the Apache UIMA directory structure and adds a directory
called "addons/annotator" that contains the following annotator
components:

  - DictionaryAnnotator
  - RegularExpressionAnnotator
  - Tagger
  - WhitespaceTokenizer

3. Major Changes in this Release

The Apache UIMA annotator package release version 2.2.2 is the first
release of this package. The package contains the following components:

  - DictionaryAnnotator
  - RegularExpressionAnnotator
  - Tagger
  - WhitespaceTokenizer
  - SimpleServer
  - PearPackagingAntTask
  - PearPackagingMavenPlugin

For a list of all JIRA issues fixed in the current Sandbox release,
please refer to section 6, "List of JIRA Issues Fixed in this Release".

4. How to Get Involved

The Apache UIMA project needs and appreciates contributions of all
kinds, including documentation help, source code and feedback. If you
are interested in contributing, please visit
http://incubator.apache.org/uima/get-involved.html.

5. How to Report Issues

The Apache UIMA project uses JIRA for issue tracking. Please report any
issues you find at http://issues.apache.org/jira/browse/uima.

6. List of JIRA Issues Fixed in this Release

Release Notes - UIMA - Version 2.2.2

** Bug
    * [UIMA-444] - Sandbox projects build only when in same directory as uimaj projects
    * [UIMA-588] - fix RegularExpressionAnnotator tests - add type priorities to get the same results for all JVMs
    * [UIMA-612] - add License and Notice files
    * [UIMA-613] - remove compiler warnings after moving to Java 1.5
    * [UIMA-614] - remove compiler warnings after moving to Java 1.5
    * [UIMA-617] - change POM to work with Java 1.5
    * [UIMA-620] - switch concept file parsing from File to InputStream
    * [UIMA-621] - change the way to add the compiled sources to the PEAR package
    * [UIMA-625] - update DictionaryAnnotator message catalog
    * [UIMA-646] - remove classpath as required argument for the PEAR packaging plugin
    * [UIMA-653] - allow feature normalization also on non-String based features
    * [UIMA-725] - case sensitive dictionaries do not work correctly
    * [UIMA-757] - Tagger throws ClassCastException
    * [UIMA-760] - add regex annotator performance test
    * [UIMA-762] - rename xmltypes.jar
    * [UIMA-765] - fix email address regex - escape "-" in regular expression
    * [UIMA-768] - rename xmltypes.jar
    * [UIMA-773] - Some files missing license headers
    * [UIMA-775] - fix Findbugs issues
    * [UIMA-776] - fix Findbugs issues
    * [UIMA-778] - fix Findbugs issues
    * [UIMA-795] - dictionaries created by DictionaryCreator cannot be used
    * [UIMA-803] - change whitespace character definition
    * [UIMA-804] - change default multi token separator from \t to |
    * [UIMA-808] - SimpleServerServlet throws NullPointerException if no parameter was specified in doGet or doPost
    * [UIMA-812] - Dictionary annotator does not work with several dictionaries in a single descriptor
    * [UIMA-819] - rename all shipment jars to start with uima-
    * [UIMA-820] - fix classpath entry "null;" if no classpath was specified
    * [UIMA-827] - fix NPE for integer based feature values that are null
    * [UIMA-834] - replace special characters with XML entities when
      generating dictionaries
    * [UIMA-855] - java.lang.ArrayIndexOutOfBoundsException in Tagger
    * [UIMA-864] - update version from 2.2.2-incubating-SNAPSHOT to 2.2.2-incubating
    * [UIMA-882] - rename Tagger XML descriptor and add license header
    * [UIMA-883] - Missing license headers in JCas files and some XML files of simple server
    * [UIMA-887] - minor update for money detection regular expression
    * [UIMA-909] - Model files are contained twice in pear file
    * [UIMA-917] - Simple Server test annotator misses last character in text
    * [UIMA-940] - change way of deleting files in the PearPackagingMavenPlugin
    * [UIMA-942] - Regex performance test doesn't run on Linux
    * [UIMA-943] - DictionaryAnnotator tests don't run on Linux
    * [UIMA-944] - simple server notice file contains redundant uima reference
    * [UIMA-945] - move LICENSE and NOTICE files to toplevel dir for annotator package
    * [UIMA-947] - Documentation: resources path in web.xml incorrect
    * [UIMA-953] - "\" in regex variables are not escaped
    * [UIMA-970] - update annotator package release files
    * [UIMA-971] - minor documentation updates for PearPackagingMavenPlugin
    * [UIMA-972] - fix jar file name in pear ant task documentation
    * [UIMA-973] - annotator package jars do not have correct Manifest information

** Improvement
    * [UIMA-350] - add performance test for WhitespaceTokenizer
    * [UIMA-550] - Sandbox components: use UIMA artifacts from the repository
    * [UIMA-577] - split up the Sandbox documentation build
    * [UIMA-590] - change the way the RegularExpressionAnnotator loads the configuration files
    * [UIMA-592] - add feature value normalization for RegEx Annotator
    * [UIMA-594] - update RegexAnnotator with custom annotation validator
    * [UIMA-602] - add PEAR packaging task for RegexAnnotator
    * [UIMA-610] - minor documentation updates - added some real world examples
    * [UIMA-615] - update DictionaryBuilder tests to work with XML dictionary formats
    * [UIMA-618] - add documentation infrastructure for the DictionaryAnnotator
    * [UIMA-631] - Switch dictionary file parsing from File input to InputStream
    * [UIMA-634] - improve DictionaryAnnotator exception handling
    * [UIMA-635] - add documentation for the PearPackagingMavenPlugin
    * [UIMA-637] - add multi-word separator configuration for the DictionaryAnnotator
    * [UIMA-644] - update RegexAnnotator tests after test coverage analysis
    * [UIMA-647] - add DictionaryAnnotator tests
    * [UIMA-666] - update feature normalization interface - add additional information
    * [UIMA-691] - add DictionaryCreator command line
    * [UIMA-696] - build documentation automatically during the component build
    * [UIMA-717] - minor performance improvements
    * [UIMA-719] - Current Version of the HMM Tagger
    * [UIMA-728] - add money amount detection for regex annotator - use match group names
    * [UIMA-753] - Some improvements in the algorithm, structural changes as well as docbook update
    * [UIMA-758] - Make the tagger runtime read its properties from the descriptor, not a properties file
    * [UIMA-763] - Automatically build PEAR file for Tagger
    * [UIMA-779] - Some modifications in the tagger code (esp.
      in the implementation of the SuffixTree.EDGE class)
    * [UIMA-791] - Patch containing some improvements
    * [UIMA-806] - use Java NumberFormat to convert string numbers to float or integer
    * [UIMA-840] - change uima-as version to 2.2.2-incubating-SNAPSHOT (match uimaj version), add script to update version
    * [UIMA-877] - Reverse multiple copyright statements in docbooks, per request at previous release vote
    * [UIMA-918] - Fix version number in sandbox docs

** New Feature
    * [UIMA-95] - add sandbox infrastructure
    * [UIMA-151] - Add project for uima whitespace tokenizer implementation
    * [UIMA-384] - create a pear packaging ant task
    * [UIMA-539] - implement UIMA RegularExpressionAnnotator
    * [UIMA-555] - add documentation for the RegularExpressionAnnotator
    * [UIMA-595] - add Rule to the RegexAnnotator to detect credit card numbers
    * [UIMA-600] - add new DictionaryAnnotator implementation
    * [UIMA-601] - initial import of the PEAR packaging maven plugin
    * [UIMA-603] - update Sandbox documentation build
    * [UIMA-604] - Create HMM POS project in the sandbox
    * [UIMA-605] - UIMA Sandbox tagger initial code drop
    * [UIMA-642] - allow RegularExpressionAnnotator to match on featurePath values
    * [UIMA-645] - minor code updates for WhitespaceTokenizer
    * [UIMA-651] - add regex variables to the concept file syntax
    * [UIMA-669] - update WhitespaceTokenizer to be sofa aware
    * [UIMA-685] - Create documentation for SimpleServer
    * [UIMA-692] - allow DictionaryAnnotator to match on featurePath values
    * [UIMA-695] - allow DictionaryAnnotator to filter the inputMatch annotations
    * [UIMA-697] - add DictionaryAnnotator documentation
    * [UIMA-724] - allow match group names for regular expressions
    * [UIMA-770] - add PEAR build to WhitespaceTokenizer POM
    * [UIMA-771] - call documentation build from POM
    * [UIMA-772] - add new Sandbox-dist project that contains the Sandbox build
    * [UIMA-884] - Add default output abilities to simple server
    * [UIMA-907] - Add SimpleServer to sandbox distribution

** Task
    * [UIMA-682] - update Sandbox components to work on the new uimaj-2.2.1-incubating release