java.lang.Object
  org.apache.mahout.classifier.sgd.TrainNewsGroups
public final class TrainNewsGroups
Reads and trains an adaptive logistic regression model on the 20 newsgroups data. The first command line argument gives the path of the directory holding the training data. The optional second argument, leakType, defines which classes of features to use. Importantly, leakType controls whether a synthetic date is injected into the data as a target leak and if so, how.
The value of leakType % 3 determines whether the target leak is injected according to the following table:
leakType % 3 | Leak injected |
---|---|
0 | No leak injected. |
1 | Synthetic date injected in MMM-yyyy format. This is a single token and a perfect target leak, since each newsgroup is given a different month. |
2 | Synthetic date injected in dd-MMM-yyyy HH:mm:ss format. The day varies, so there are more leak symbols to learn. Ultimately this is just as big a leak as case 1. |
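The leak formats in the table above can be illustrated with a short sketch. This is not the code used by TrainNewsGroups: the class name `LeakDateSketch`, the method `leakToken`, the base year 2000, the UTC time zone, and the rule "newsgroup i is shifted i months from January" are all assumptions made for illustration. Only the two date patterns and the `leakType % 3` selection come from the description above.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Mahout.
final class LeakDateSketch {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    /**
     * Returns the synthetic date token for one document, or "" when
     * leakType % 3 selects no leak.
     * Assumption: newsgroup i is assigned the month i months after January 2000.
     */
    static String leakToken(int leakType, int newsgroupIndex, int dayOfMonth) {
        String pattern;
        switch (leakType % 3) {
            case 1:
                pattern = "MMM-yyyy";             // one token per newsgroup
                break;
            case 2:
                pattern = "dd-MMM-yyyy HH:mm:ss"; // day varies, so more tokens
                break;
            default:
                return "";                        // no leak injected
        }
        Calendar cal = new GregorianCalendar(UTC, Locale.ENGLISH);
        cal.clear();
        cal.set(2000, Calendar.JANUARY, dayOfMonth, 0, 0, 0);
        cal.add(Calendar.MONTH, newsgroupIndex);

        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setTimeZone(UTC);
        return format.format(cal.getTime());
    }
}
```

Under these assumptions, `LeakDateSketch.leakToken(1, 0, 5)` yields `Jan-2000` and `LeakDateSketch.leakToken(2, 1, 5)` yields `05-Feb-2000 00:00:00`. In case 1 every document of a newsgroup shares one token, while in case 2 the token also depends on the day, which is why more leak symbols have to be learned.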
leakType also determines what other text will be indexed. If leakType is greater than or equal to 6, then neither headers nor text body will be used for features and the leak is the only source of data. If leakType is at least 3 but less than 6, then only subject words will be used as features. If leakType is less than 3, then both subject and body text will be used as features.
A leakType of 0 gives no leak and all textual features.
See the following table for a summary of commonly used values for leakType:
leakType | Leak? | Subject? | Body? |
---|---|---|---|
0 | no | yes | yes |
1 | MMM-yyyy | yes | yes |
2 | dd-MMM-yyyy | yes | yes |
3 | no | yes | no |
4 | MMM-yyyy | yes | no |
5 | dd-MMM-yyyy | yes | no |
6 | no | no | no |
7 | MMM-yyyy | no | no |
8 | dd-MMM-yyyy | no | no |
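The rules behind this table can be written as three small predicates. The sketch below is a hedged reading of the description, not the actual TrainNewsGroups source: the class `LeakTypeSketch` and its method names are hypothetical, and the leak column uses the Java pattern spelling `MMM`.

```java
// Hypothetical decoder for leakType, not part of Mahout.
final class LeakTypeSketch {

    /** Leak column: chosen by leakType % 3. */
    static String leak(int leakType) {
        switch (leakType % 3) {
            case 1:
                return "MMM-yyyy";
            case 2:
                return "dd-MMM-yyyy";
            default:
                return "no";
        }
    }

    /** Subject words are used as features while leakType is below 6. */
    static boolean usesSubject(int leakType) {
        return leakType < 6;
    }

    /** Body text is used as features only while leakType is below 3. */
    static boolean usesBody(int leakType) {
        return leakType < 3;
    }

    /** Formats one row in the same layout as the summary table. */
    static String row(int leakType) {
        return String.format("%d | %s | %s | %s |",
                leakType,
                leak(leakType),
                usesSubject(leakType) ? "yes" : "no",
                usesBody(leakType) ? "yes" : "no");
    }
}
```

Calling `LeakTypeSketch.row(n)` for n from 0 to 8 produces the nine leak, subject, and body combinations listed in the table. For example, `row(4)` gives `4 | MMM-yyyy | yes | no |`.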
Method Summary | |
---|---|
static void | main(String[] args) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static void main(String[] args) throws IOException
Throws: IOException