java.lang.Object
  net.nutch.analysis.lang.NGramProfile
This class runs an n-gram analysis over submitted text; the results can be used for automatic language identification. The similarity calculation is experimental. You have been warned. Methods are provided to build new n-gram profiles.
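To make the idea of an n-gram analysis concrete, here is a minimal, self-contained sketch of counting character n-grams in a single word. It does not reproduce the internals of NGramProfile: the class name `NGramSketch`, the method `countNGrams`, the `_` padding character, and the lower-casing step are all assumptions made for illustration. Only the idea of a minimum and maximum n-gram length comes from the constructor documented on this page.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class NGramSketch {

    /**
     * Counts every character n-gram of length minLen..maxLen in one word.
     * The word is lower-cased and padded with '_' on both sides so that
     * word boundaries show up in the n-grams (an assumption of this sketch,
     * not something the NGramProfile documentation states).
     */
    public static Map<String, Integer> countNGrams(String word, int minLen, int maxLen) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String padded = "_" + word.toLowerCase(Locale.ROOT) + "_";
        for (int n = minLen; n <= maxLen; n++) {
            // Slide a window of width n over the padded word.
            for (int i = 0; i + n <= padded.length(); i++) {
                counts.merge(padded.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints the n-gram counts for a short sample word.
        System.out.println(countNGrams("aba", 1, 2));
    }
}
```

A profile for a whole text would accumulate these counts over every word; how NGramProfile tokenizes and normalizes its counts is not described here, so that part is left out.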
Field Summary
static Logger | LOG
Constructor Summary
NGramProfile(String name) | Construct a new ngram profile
NGramProfile(String name, int minlen, int maxlen) | Construct a new ngram profile
Method Summary
void | addFromToken(Token t) | Add ngrams from a token to this profile
void | addNGrams(StringBuffer word) | Add ngrams from a single word to this profile
void | analyze(StringBuffer text) | Analyze a piece of text
static NGramProfile | createNgramProfile(String name, InputStream is, String encoding) | Create a new language profile from a (preferably quite large) text file
String | getName() |
float | getSimilarity(NGramProfile another) | Calculate a score for how well two NGramProfiles match each other
Vector | getSorted() | Return sorted vector of ngrams (sort done by 1.
void | load(InputStream is) | Load an ngram profile from an InputStream (assumes UTF-8 encoded content)
static void | main(String[] args) | Main method, used for testing only
protected void | normalize() | Normalize profile
void | save(OutputStream os) | Write the NGramProfile content to an OutputStream, encoded as UTF-8
void | setName(String name) |
String | toString() | Return the ngram profile as text
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail

public static final Logger LOG

Constructor Detail

public NGramProfile(String name)
Parameters:
name - Name of profile

public NGramProfile(String name, int minlen, int maxlen)
Parameters:
name - Name of profile
minlen - min length of ngram sequences
maxlen - max length of ngram sequences

Method Detail

public void addFromToken(Token t)
Parameters:
t - Token to be added

public void analyze(StringBuffer text)
Parameters:
text - the text to be analyzed

protected void normalize()

public void addNGrams(StringBuffer word)
Parameters:
word -

public Vector getSorted()

public String toString()

public float getSimilarity(NGramProfile another)
Parameters:
another - ngram profile to compare against
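
This page does not say how getSimilarity computes its score, and the class description marks it as experimental. As a stand-in, the sketch below compares two n-gram count maps with cosine similarity. This is a swapped-in, well-known measure chosen for illustration, not the formula NGramProfile uses; the class `SimilaritySketch` and its methods `bigrams` and `cosine` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class SimilaritySketch {

    /** Counts character bigrams in a text (hypothetical helper for this sketch). */
    public static Map<String, Integer> bigrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            counts.merge(text.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Cosine similarity between two count maps: 1.0 for identical
     * distributions, 0.0 when no n-gram is shared or a map is empty.
     */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double va = e.getValue();
            normA += va * va;
            Integer vb = b.get(e.getKey());
            if (vb != null) {
                dot += va * vb;   // only shared n-grams contribute
            }
        }
        for (int vb : b.values()) {
            normB += (double) vb * vb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;           // avoid dividing by zero for empty profiles
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosine(bigrams("the cat sat"), bigrams("the hat sat")));
    }
}
```

With a measure of this kind, language identification amounts to scoring an unknown text's profile against each reference profile and picking the best match. Whether a higher or lower value of getSimilarity means a closer match is not stated on this page, so check the implementation before relying on it.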

public void load(InputStream is) throws IOException
Throws:
IOException

public static NGramProfile createNgramProfile(String name, InputStream is, String encoding)
Parameters:
name - name of profile
is -
encoding - encoding of stream

public void save(OutputStream os) throws IOException
Parameters:
os - Stream to output to
Throws:
IOException
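
Because save writes UTF-8 and load assumes UTF-8, both sides need an explicit charset rather than the platform default. The sketch below shows that pattern on a plain map of n-gram counts. The on-disk layout used here (one `ngram count` pair per line) is an assumption invented for the example, since the real NGramProfile file format is not described on this page; `ProfileIoSketch` and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ProfileIoSketch {

    /** Writes one "ngram count" line per entry, always encoded as UTF-8. */
    public static void save(Map<String, Integer> counts, OutputStream os) throws IOException {
        Writer w = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            w.write(e.getKey() + " " + e.getValue() + "\n");
        }
        w.flush(); // leave closing the stream to the caller
    }

    /** Reads the same hypothetical format back, decoding as UTF-8. */
    public static Map<String, Integer> load(InputStream is) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        BufferedReader r = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        String line;
        while ((line = r.readLine()) != null) {
            int sep = line.lastIndexOf(' '); // the count follows the last space
            if (sep <= 0) {
                continue;                    // skip blank or malformed lines
            }
            counts.put(line.substring(0, sep), Integer.parseInt(line.substring(sep + 1)));
        }
        return counts;
    }

    /** Saves to memory and loads again, to check that nothing is lost. */
    public static Map<String, Integer> roundTrip(Map<String, Integer> counts) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            save(counts, out);
            return load(new ByteArrayInputStream(out.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Fixing the charset on both ends matters most for profiles of languages with non-ASCII characters, where a mismatched default encoding would silently change the stored n-grams.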

public static void main(String[] args)
Parameters:
args -

public String getName()

public void setName(String name)
Parameters:
name - The name to set.