java.lang.Object
  extended by org.apache.mailet.GenericMailet
      extended by org.apache.james.transport.mailets.BayesianAnalysis
Spam detection mailet using Bayesian analysis techniques.
Sets an email message header indicating the probability that an email message is spam.
Based upon the principles described in "A Plan for Spam" by Paul Graham, extended with the refinements from Paul Graham's "Better Bayesian Filtering".
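The combining step from "A Plan for Spam" can be illustrated with a short sketch. Each token gets an individual spam probability. Only the most "interesting" tokens (those farthest from a neutral 0.5) are kept, and their probabilities are combined as P = (p1·…·pn) / (p1·…·pn + (1 − p1)·…·(1 − pn)). The class name GrahamCombiner is hypothetical. The 15-token limit and the 0.01/0.99 clamp follow Graham's essay and are assumptions here. This is not the code of BayesianAnalyzer.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of Graham's probability-combination rule (not the James implementation). */
class GrahamCombiner {

    /** Number of "most interesting" tokens combined (15 in "A Plan for Spam"). */
    static final int MAX_INTERESTING_TOKENS = 15;

    /** Combines per-token spam probabilities into one message-level probability. */
    static double combine(double[] tokenProbabilities) {
        double[] interesting = Arrays.stream(tokenProbabilities)
                // Clamp to [0.01, 0.99] so a single token can never force 0 or 1.
                .map(p -> Math.min(0.99, Math.max(0.01, p)))
                .boxed()
                // Most interesting first: largest distance from the neutral 0.5.
                .sorted(Comparator.comparingDouble((Double p) -> Math.abs(p - 0.5)).reversed())
                .limit(MAX_INTERESTING_TOKENS)
                .mapToDouble(Double::doubleValue)
                .toArray();

        double productSpam = 1.0; // p1 * ... * pn
        double productHam = 1.0;  // (1 - p1) * ... * (1 - pn)
        for (double p : interesting) {
            productSpam *= p;
            productHam *= 1.0 - p;
        }
        // An empty token list yields 1 / (1 + 1) = 0.5, i.e. "no evidence".
        return productSpam / (productSpam + productHam);
    }
}
```

With two tokens at 0.99, the combined value is about 0.9999. This shows how a few strong indicators dominate the result.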
The analysis capabilities are based on token frequencies (the Corpus) learned through a training process (see BayesianAnalysisFeeder) and stored in a JDBC database.
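The Corpus can be pictured as per-token counts, kept separately for ham and spam. The sketch below is a minimal in-memory version. It assumes whitespace tokenization (the actual tokenizer is not described on this page) and uses plain HashMaps in place of the JDBC tables. TokenCorpus is a hypothetical name. Tokens are counted case-sensitively, matching the case-sensitive token field of the ham and spam tables.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the ham and spam token tables. */
class TokenCorpus {
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Map<String, Integer> spamCounts = new HashMap<>();

    /** Splits on whitespace (an assumption) and counts each token case-sensitively. */
    void train(String text, boolean isSpam) {
        Map<String, Integer> target = isSpam ? spamCounts : hamCounts;
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) { // a leading blank produces one empty token
                target.merge(token, 1, Integer::sum);
            }
        }
    }

    int hamCount(String token) {
        return hamCounts.getOrDefault(token, 0);
    }

    int spamCount(String token) {
        return spamCounts.getOrDefault(token, 0);
    }
}
```

Because no case folding is applied, "FREE" and "free" accumulate separate counts.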
After a training session, the Corpus must be rebuilt from the database in order to acquire the new frequencies.
Every 10 minutes, a special thread in this mailet checks whether the feeder has changed the database, and rebuilds the corpus if necessary.
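The reload policy above can be sketched with a single-threaded scheduler. This is a hypothetical illustration, not the mailet's internal thread. The LongSupplier stands in for whatever query reports the database's last update time, and the Runnable stands in for the corpus rebuild. Both are assumptions. The getter name mirrors the documented getLastCorpusLoadTime(), but the class CorpusReloader is invented.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a periodic "reload the corpus if the database changed" check. */
class CorpusReloader {
    private final LongSupplier lastDatabaseUpdate; // hypothetical: last change time in the DB (ms)
    private final Runnable rebuildCorpus;           // hypothetical: reloads token frequencies
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "corpus-reloader");
                thread.setDaemon(true); // do not keep the JVM alive for this thread
                return thread;
            });
    private volatile long lastCorpusLoadTime; // 0 until the first load

    CorpusReloader(LongSupplier lastDatabaseUpdate, Runnable rebuildCorpus) {
        this.lastDatabaseUpdate = lastDatabaseUpdate;
        this.rebuildCorpus = rebuildCorpus;
    }

    /** Rebuilds only if the database changed after the last load; returns true if it rebuilt. */
    synchronized boolean checkAndReload() {
        if (lastDatabaseUpdate.getAsLong() <= lastCorpusLoadTime) {
            return false;
        }
        // Record the start time, so changes made during the rebuild trigger the next reload.
        long loadStarted = System.currentTimeMillis();
        rebuildCorpus.run();
        lastCorpusLoadTime = loadStarted;
        return true;
    }

    /** Runs the check every 10 minutes; a failed check must not cancel the schedule. */
    void start() {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                checkAndReload();
            } catch (RuntimeException e) {
                // Real code would log this; swallowing it keeps later checks running.
            }
        }, 10, 10, TimeUnit.MINUTES);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    long getLastCorpusLoadTime() {
        return lastCorpusLoadTime;
    }
}
```

scheduleWithFixedDelay is used instead of scheduleAtFixedRate. This way a slow rebuild never overlaps the next check.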
An org.apache.james.spam.probability mail attribute will be created, containing the computed spam probability as a Double.
A message header named by the headerName parameter will be created, containing the same probability in floating-point representation.
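The two outputs described above can be sketched without the Mailet API. In this hypothetical example, plain maps stand in for the Mail attributes and the MIME headers. SpamAnnotator and probabilityOf are invented names. Returning 0.0 for an unscored mail is an assumption made only for illustration. The attribute name and the default header name come from this page.

```java
import java.util.Map;

/** Hypothetical sketch: maps stand in for Mail attributes and MimeMessage headers. */
class SpamAnnotator {
    static final String PROBABILITY_ATTRIBUTE = "org.apache.james.spam.probability";

    static void annotate(Map<String, Object> attributes, Map<String, String> headers,
                         String headerName, double probability) {
        // Attribute: the probability boxed as java.lang.Double, for downstream processing.
        attributes.put(PROBABILITY_ATTRIBUTE, Double.valueOf(probability));
        // Header: the same value as floating-point text.
        headers.put(headerName, Double.toString(probability));
    }

    /** Reads the attribute back; 0.0 if the mail was never scored (an assumption). */
    static double probabilityOf(Map<String, Object> attributes) {
        Object value = attributes.get(PROBABILITY_ATTRIBUTE);
        return value instanceof Double ? (Double) value : 0.0;
    }
}
```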
Sample configuration:
<mailet match="All" class="BayesianAnalysis">
  <repositoryPath>db://maildb</repositoryPath>
  <!--
    Set this to the header name to add with the spam probability
    (default is "X-MessageIsSpamProbability").
  -->
  <headerName>X-MessageIsSpamProbability</headerName>
  <!--
    Set this to true if you want to ignore messages coming from local senders
    (default is false).
    By local sender we mean a return-path with a local server part (server listed
    in <servernames> in config.xml).
  -->
  <ignoreLocalSender>true</ignoreLocalSender>
  <!--
    Set this to the maximum message size (in bytes) that a message may have
    to be considered spam (default is 100000).
  -->
  <maxSize>100000</maxSize>
</mailet>
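The documented parameters and their defaults can be collected in one place. The following is a hypothetical sketch in which a Map<String, String> stands in for GenericMailet.getInitParameter. This page does not state a default for repositoryPath, so the sketch treats it as required. That choice is an assumption, as is the class name BayesianAnalysisConfig.

```java
import java.util.Map;

/** Hypothetical holder for the documented BayesianAnalysis parameters and their defaults. */
final class BayesianAnalysisConfig {
    final String repositoryPath;
    final String headerName;
    final boolean ignoreLocalSender;
    final int maxSize;

    private BayesianAnalysisConfig(String repositoryPath, String headerName,
                                   boolean ignoreLocalSender, int maxSize) {
        this.repositoryPath = repositoryPath;
        this.headerName = headerName;
        this.ignoreLocalSender = ignoreLocalSender;
        this.maxSize = maxSize;
    }

    /** Reads parameters from a map standing in for getInitParameter(name). */
    static BayesianAnalysisConfig from(Map<String, String> params) {
        String repositoryPath = params.get("repositoryPath");
        if (repositoryPath == null || repositoryPath.isEmpty()) {
            // Assumption: no default is documented, so the sketch rejects a missing value.
            throw new IllegalArgumentException("repositoryPath is required");
        }
        String headerName = params.getOrDefault("headerName", "X-MessageIsSpamProbability");
        boolean ignoreLocalSender =
                Boolean.parseBoolean(params.getOrDefault("ignoreLocalSender", "false").trim());
        int maxSize = Integer.parseInt(params.getOrDefault("maxSize", "100000").trim());
        return new BayesianAnalysisConfig(repositoryPath, headerName, ignoreLocalSender, maxSize);
    }
}
```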
The spam probability is prepended to the subject if it is greater than 0.1 (10%).
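The subject rule can be sketched as a pure function. The threshold (strictly greater than 0.1) comes from this page. The exact prefix text the mailet writes is not documented here, so the "[NN% spam]" format below is hypothetical, as is the class name SubjectTagger.

```java
import java.util.Locale;

/** Hypothetical sketch of the "prepend probability to subject above 10%" rule. */
class SubjectTagger {
    static final double PREFIX_THRESHOLD = 0.1;

    static String tagSubject(String subject, double probability) {
        String original = subject == null ? "" : subject;
        if (probability <= PREFIX_THRESHOLD) {
            return original; // at or below 10%: subject left unchanged
        }
        // Hypothetical prefix format: percentage rounded to a whole number.
        return String.format(Locale.ROOT, "[%.0f%% spam] %s", probability * 100.0, original);
    }
}
```

Locale.ROOT keeps the number formatting stable regardless of the server's default locale.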
The required tables are created automatically if they do not already exist (see sqlResources.xml). The token field in both the ham and spam tables is case sensitive.
See Also: BayesianAnalysisFeeder, BayesianAnalyzer, JDBCBayesianAnalyzer
Constructor Summary
BayesianAnalysis()
Method Summary
long      getLastCorpusLoadTime()    Getter for property lastCorpusLoadTime.
String    getMailetInfo()            Return a string describing this mailet.
int       getMaxSize()               Getter for property maxSize.
void      init()                     Mailet initialization routine.
void      service(Mail mail)         Scans the mail and determines the spam probability.
void      setMaxSize(int maxSize)    Setter for property maxSize.
Methods inherited from class org.apache.mailet.GenericMailet:
destroy, getInitParameter, getInitParameter, getInitParameterNames, getMailetConfig, getMailetContext, getMailetName, init, log, log
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public BayesianAnalysis()
Method Detail
public String getMailetInfo()
Specified by: getMailetInfo in interface Mailet
Overrides: getMailetInfo in class GenericMailet
public int getMaxSize()
public void setMaxSize(int maxSize)
Parameters: maxSize - New value of property maxSize.
public long getLastCorpusLoadTime()
public void init() throws MessagingException
Overrides: init in class GenericMailet
Throws: MessagingException - if a problem arises
public void service(Mail mail) throws MessagingException
Specified by: service in interface Mailet
Specified by: service in class GenericMailet
Parameters: mail - The Mail message to be scanned.
Throws: MessagingException - if a problem arises
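Before scoring, service() has to apply the ignoreLocalSender and maxSize settings. The sketch below shows one way to express those pre-checks. It follows the documented definition of a local sender: a return-path whose server part is listed in <servernames>. It makes three assumptions, all of them illustrative rather than confirmed. First, oversized messages are simply not scored, since this page does not say what happens to them. Second, a null or empty ("<>") return-path counts as non-local. Third, the server names are supplied in lower case. AnalysisGate is a hypothetical name.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the pre-checks implied by ignoreLocalSender and maxSize. */
class AnalysisGate {

    /** True if the return-path's domain is one of the (lower-case) local server names. */
    static boolean isLocalSender(String returnPath, Set<String> serverNames) {
        if (returnPath == null) {
            return false; // assumption: a missing sender is treated as non-local
        }
        String address = returnPath.trim();
        if (address.startsWith("<") && address.endsWith(">")) {
            address = address.substring(1, address.length() - 1); // strip angle brackets
        }
        int at = address.lastIndexOf('@');
        if (at < 0 || at == address.length() - 1) {
            return false; // "<>" or no server part
        }
        return serverNames.contains(address.substring(at + 1).toLowerCase(Locale.ROOT));
    }

    /** Decides whether a message should be scored at all. */
    static boolean shouldAnalyze(String returnPath, long messageSizeBytes, Set<String> serverNames,
                                 boolean ignoreLocalSender, int maxSize) {
        if (ignoreLocalSender && isLocalSender(returnPath, serverNames)) {
            return false;
        }
        // Assumption: messages above maxSize bytes are skipped rather than scored.
        return messageSizeBytes <= maxSize;
    }
}
```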