public class EncodingDetector
extends java.lang.Object
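Broadly this encompasses two functions, which are distinctly separate:
- Auto-detecting a set of clues from the content (autoDetectClues).
- Making a best guess at the encoding from the clues collected so far (guessEncoding).
A caller will often have some extra information about what the encoding might be (e.g. from the HTTP header or HTML meta-tags; these are often wrong but still potentially useful clues). The types of clues may differ from caller to caller. Thus a typical calling sequence is:
- Call autoDetectClues to generate the auto-detected clues.
- Add the caller's own clues with addClue.
- Call guessEncoding to obtain the most probable encoding.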
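The collect-then-guess sequence can be modeled with a small self-contained sketch. This is a toy stand-in, not the real EncodingDetector: the class name ClueSequenceSketch, the "highest confidence wins" rule, and the use of -1 for an unknown confidence are all assumptions made for illustration. The real class works on Configuration and Content objects, which are left out here so that the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the add-clues-then-guess calling sequence. */
public class ClueSequenceSketch {

    /** Assumed marker for "no confidence given"; not taken from the real class. */
    public static final int UNKNOWN_CONFIDENCE = -1;

    /** One hint: the charset name, where it came from, and how much it is trusted. */
    private static final class Clue {
        final String value;
        final String source;
        final int confidence;

        Clue(String value, String source, int confidence) {
            this.value = value;
            this.source = source;
            this.confidence = confidence;
        }
    }

    private final List<Clue> clues = new ArrayList<>();

    /** Records a clue with an explicit confidence; empty values are ignored. */
    public void addClue(String value, String source, int confidence) {
        if (value == null || value.isEmpty()) {
            return;
        }
        clues.add(new Clue(value, source, confidence));
    }

    /** Records a clue whose confidence is unknown. */
    public void addClue(String value, String source) {
        addClue(value, source, UNKNOWN_CONFIDENCE);
    }

    /** Forgets every clue recorded so far. */
    public void clearClues() {
        clues.clear();
    }

    /** Toy rule: the clue with the highest confidence wins, else the default. */
    public String guessEncoding(String defaultValue) {
        Clue best = null;
        for (Clue clue : clues) {
            if (best == null || clue.confidence > best.confidence) {
                best = clue;
            }
        }
        return best == null ? defaultValue : best.value;
    }

    public static void main(String[] args) {
        ClueSequenceSketch detector = new ClueSequenceSketch();
        // Stand-in for the auto-detection step: pretend a detector found this.
        detector.addClue("windows-1252", "detected", 35);
        // Caller-specific clue, e.g. taken from an HTTP header.
        detector.addClue("UTF-8", "header", 50);
        // Final step: ask for the best guess, with a fallback.
        System.out.println(detector.guessEncoding("ISO-8859-1")); // prints UTF-8
    }
}
```

Keeping clue collection separate from the final guess lets each caller contribute whatever hints it has without changing the guessing step.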
Modifier and Type | Field and Description |
---|---|
static java.lang.String | MIN_CONFIDENCE_KEY |
static int | NO_THRESHOLD |
Constructor and Description |
---|
EncodingDetector(Configuration conf) |
Modifier and Type | Method and Description |
---|---|
void | addClue(java.lang.String value, java.lang.String source) |
void | addClue(java.lang.String value, java.lang.String source, int confidence) |
void | autoDetectClues(Content content, boolean filter) |
void | clearClues() Clears all clues. |
java.lang.String | guessEncoding(Content content, java.lang.String defaultValue) Guess the encoding with the previously specified list of clues. |
static void | main(java.lang.String[] args) |
static java.lang.String | parseCharacterEncoding(java.lang.String contentType) Parse the character encoding from the specified content type header. |
static java.lang.String | resolveEncodingAlias(java.lang.String encoding) |
public static final int NO_THRESHOLD
public static final java.lang.String MIN_CONFIDENCE_KEY
public EncodingDetector(Configuration conf)
public void autoDetectClues(Content content, boolean filter)
public void addClue(java.lang.String value, java.lang.String source, int confidence)
public void addClue(java.lang.String value, java.lang.String source)
public java.lang.String guessEncoding(Content content, java.lang.String defaultValue)
content - Content instance
defaultValue - Default encoding to return if no encoding can be detected with enough confidence. Note that this will not be normalized with resolveEncodingAlias(java.lang.String)
public void clearClues()
public static java.lang.String resolveEncodingAlias(java.lang.String encoding)
public static java.lang.String parseCharacterEncoding(java.lang.String contentType)
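Parse the character encoding from the specified content type header. If the content type is null, or there is no explicit character encoding, null is returned.
contentType - a content type header
public static void main(java.lang.String[] args) throws java.io.IOException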
java.io.IOException
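As an illustration of the kind of header parsing that parseCharacterEncoding describes, here is a minimal sketch that pulls the charset parameter out of a Content-Type value and returns null when there is none. It is not the library's implementation: the class CharsetParamSketch, the method parseCharset, and the regular expression are assumptions chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts the charset parameter from a Content-Type value. */
public class CharsetParamSketch {

    // Matches charset=VALUE or charset="VALUE", case-insensitively.
    // The value stops at a quote, a semicolon, or whitespace.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*\"?([^\";\\s]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in the header, or null if the header is null or names none. */
    public static String parseCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(parseCharset("text/html; charset=UTF-8")); // prints UTF-8
        System.out.println(parseCharset("text/html"));                // prints null
    }
}
```

A value obtained this way is only a clue. Per the note on guessEncoding, a default value is returned without alias normalization, so callers that need a canonical name would pass the result through resolveEncodingAlias themselves.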
Copyright © 2018 The Apache Software Foundation