Tika Change Log

Release 0.1-incubating - 12/27/2007

1. TIKA-5 - Port Metadata Framework from Nutch (mattmann)
2. TIKA-11 - Consolidate test classes into a src/test/java directory tree (mattmann)
3. TIKA-15 - Utils.print does not print a Content having no value (jukka)
4. TIKA-19 - org.apache.tika.TestParsers fails (bdelacretaz)
5. TIKA-16 - Issues with data files used for testing by TestParsers (bdelacretaz)
6. TIKA-14 - MimeTypeUtils.getMimeType() returns the default mime type for .odt (Open Office) file (bdelacretaz)
7. TIKA-12 - Add URL capability to MimeTypesUtils (jukka)
8. TIKA-13 - Fix obsolete package names in config.xml (siren)
9. TIKA-10 - Remove MimeInfoException catch clauses and import from TestParsers (siren)
10. TIKA-8 - Replaced the jmimeinfo dependency with a trivial mime type detector (jukka)
11. TIKA-7 - Added the Lius Lite code. Added missing dependencies to POM (jukka)
12. TIKA-18 - "Office" interface should be renamed "MSOffice" (mattmann)
13. TIKA-23 - Decouple Parser from ParserConfig (jukka)
14. TIKA-6 - Port Nutch (or better) MimeType detection system into Tika (J. Charron & mattmann)
15. TIKA-25 - Removed hardcoded reference to C:\oo.xml in OpenOfficeParser (K. Bennett & jukka)
16. TIKA-17 - Need to support URL's for input resources. (K. Bennett & mattmann)
17. TIKA-22 - Remove @author tags from the java source (mattmann)
18. TIKA-21 - Simplified configuration code (jukka)
19. TIKA-17 - Rename all "Lius" classes to be "Tika" classes (jukka)
20. TIKA-30 - Added utility constructors to TikaConfig (K. Bennett & jukka)
21. TIKA-28 - Rename config.xml to tika-config.xml or similar (mattmann)
22. TIKA-26 - Use Map instead of List (jukka)
23. TIKA-31 - protected Parser.parse(InputStream stream, Iterable contents) (jukka & K. Bennett)
24. TIKA-36 - A convenience method for getting a document's content's text would be helpful (K. Bennett & mattmann)
25. TIKA-33 - Stateless parsers (jukka)
26. TIKA-38 - TXTParser adds a space to the content it reads from a file (K. Bennett & ridabenjelloun)
27. TIKA-35 - Extract MsOffice properties, use RereadableInputStream developed by K. Bennett (ridabenjelloun & K. Bennett)
28. TIKA-39 - Excel parsing improvements (siren & ridabenjelloun)
29. TIKA-34 - Provide a method that will return a default configuration (TikaConfig) (K. Bennett & mattmann)
30. TIKA-42 - Content class needs (String, String, String) constructor (K. Bennett)
31. TIKA-43 - Parser interface (jukka)
32. TIKA-47 - Remove TikaLogger (jukka)
33. TIKA-46 - Use Metadata in Parser (jukka & mattmann)
34. TIKA-48 - Merge MS Extractors and Parsers (jukka)
35. TIKA-45 - RereadableInputStream needs to be able to read to the end of the original stream on first rewind. (K. Bennett)
36. TIKA-41 - Resource files occur twice in jar file. (jukka)
37. TIKA-49 - Some files have old-style license headers, fixed (Robert Burrell Donkin & bdelacretaz)
38. TIKA-51 - Leftover temp files after running Tika tests, fixed (bdelacretaz)
39. TIKA-40 - Tika needs to support diverse character encodings (jukka)
40. TIKA-55 - ParseUtils.getParser() method variants should have consistent parameter orders (K. Bennett)
41. TIKA-52 - RereadableInputStream needs to support not closing the input stream it wraps. (K. Bennett via bdelacretaz)
42. TIKA-53 - XHTML SAX events from parsers (jukka)
43. TIKA-57 - Rename org.apache.tika.ms to org.apache.tika.parser.ms (jukka)
44. TIKA-62 - Use TikaConfig.getDefaultConfig() instead of a hardcoded config path in TestParsers (jukka)
45. TIKA-58 - Replace jtidy html parser with nekohtml based parser (siren)
46. TIKA-60 - Rename Microsoft parser classes (jukka)
47. TIKA-63 - Avoid multiple passes over the input stream in Microsoft parsers (jukka)
48. TIKA-66 - Use Java 5 features in org.apache.tika.mime (jukka)
49. TIKA-56 - Mime type detection fails with upper case file extensions such as "PDF" (mattmann)
50. TIKA-65 - Add encode detection support for HTML parser (siren)
51. TIKA-68 - Add dummy parser classes to be used as sentinels (jukka)
52. TIKA-67 - Add an auto-detecting Parser implementation (jukka)
53. TIKA-70 - Better MIME information for the Open Document formats (jukka)
54. TIKA-71 - Remove ParserConfig and ParserFactory (jukka)
55. TIKA-83 - Create a org.apache.tika.sax package for SAX utilities (jukka)
56. TIKA-84 - Add MimeTypes.getMimeType(InputStream) (jukka)
57. TIKA-85 - Add glob patterns from the ASF svn:eol-style documentation (jukka)
58. TIKA-100 - Structured PDF parsing (jukka)
59. TIKA-101 - Improve site and build (mattmann)
60. TIKA-102 - Parser implementations loading a large amount of content into a single String could be problematic (Niall Pemberton)
61. TIKA-107 - Remove use of assertions for argument checking (Niall Pemberton)
62. TIKA-104 - Add utility methods to throw IOException with the cause initialized (jukka & Niall Pemberton)
63. TIKA-106 - Remove dependency on Jakarta ORO - use JDK 1.4 Regex (Niall Pemberton)
64. TIKA-111 - Missing license headers (jukka)