org.apache.mahout.ga.watchmaker.cd.hadoop
Class DatasetSplit

java.lang.Object
  extended by org.apache.mahout.ga.watchmaker.cd.hadoop.DatasetSplit

public class DatasetSplit
extends java.lang.Object

Separates the input data into a training set and a testing set.
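The split can be pictured with a short plain-Java sketch. It is not Mahout's code and does not use Hadoop. It assumes each line is assigned independently with a seeded java.util.Random, going to the training set when a uniform draw falls below the threshold. The class `SplitSketch` and its `split` method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of a seeded random train/test split (not Mahout's code).
 * Assumption: one uniform draw per line; draw < threshold means "training".
 */
public class SplitSketch {

  /** Holds the two halves of the split. */
  public static final class Split {
    public final List<String> training = new ArrayList<>();
    public final List<String> testing = new ArrayList<>();
  }

  public static Split split(List<String> lines, long seed, double threshold) {
    if (threshold < 0.0 || threshold > 1.0) {
      throw new IllegalArgumentException("threshold must be in [0, 1]");
    }
    Random rng = new Random(seed); // same seed, same sequence of draws
    Split result = new Split();
    for (String line : lines) {
      // nextDouble() is uniform in [0, 1), so roughly threshold * N lines land in training.
      if (rng.nextDouble() < threshold) {
        result.training.add(line);
      } else {
        result.testing.add(line);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      lines.add("record-" + i);
    }
    Split s = split(lines, 42L, 0.7);
    System.out.println("training=" + s.training.size() + " testing=" + s.testing.size());
  }
}
```

Because every line gets its own draw, the training fraction only approximates the threshold; it is not exact for a given dataset.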


Nested Class Summary
static class DatasetSplit.DatasetTextInputFormat
          A TextInputFormat (org.apache.hadoop.mapred.TextInputFormat) that uses a RndLineRecordReader as its RecordReader.
static class DatasetSplit.RndLineRecordReader
          A LineRecordReader that skips some lines of the input.
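A reader that skips lines can be sketched in plain Java. This is a hypothetical stand-in, not RndLineRecordReader itself: it wraps a java.io.BufferedReader instead of Hadoop's LineRecordReader, and the selection rule (draw < threshold means "training") is an assumption. It shows why a shared seed matters: a training pass and a testing pass that re-seed identically see the same draws and therefore pick complementary lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Random;

/** Hypothetical line reader that returns only the lines of one side of the split. */
public class SkippingLineReader {
  private final BufferedReader in;
  private final Random rng;
  private final double threshold;
  private final boolean training;

  public SkippingLineReader(BufferedReader in, long seed, double threshold, boolean training) {
    this.in = in;
    this.rng = new Random(seed);
    this.threshold = threshold;
    this.training = training;
  }

  /** Returns the next line that belongs to this reader's set, or null at end of input. */
  public String next() throws IOException {
    String line;
    while ((line = in.readLine()) != null) {
      // Draw exactly once per input line so that two readers built with the
      // same seed consume identical draws and select complementary lines.
      boolean inTrainingSet = rng.nextDouble() < threshold;
      if (inTrainingSet == training) {
        return line;
      }
      // Otherwise skip the line and keep reading.
    }
    return null;
  }

  public static void main(String[] args) throws IOException {
    String data = "a\nb\nc\nd\ne\nf\n";
    for (boolean training : new boolean[] {true, false}) {
      SkippingLineReader reader = new SkippingLineReader(
          new BufferedReader(new StringReader(data)), 7L, 0.5, training);
      StringBuilder picked = new StringBuilder();
      for (String line = reader.next(); line != null; line = reader.next()) {
        picked.append(line);
      }
      System.out.println((training ? "training: " : "testing:  ") + picked);
    }
  }
}
```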
 
Constructor Summary
DatasetSplit(double threshold)
           
DatasetSplit(org.apache.hadoop.mapred.JobConf conf)
           
DatasetSplit(long seed, double threshold)
           
 
Method Summary
 long getSeed()
           
 double getThreshold()
           
 boolean isTraining()
           
 void setTraining(boolean training)
           
 void storeJobParameters(org.apache.hadoop.mapred.JobConf conf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DatasetSplit

public DatasetSplit(long seed,
                    double threshold)
Parameters:
seed - seed of the random number generator used to split the data
threshold - fraction of the total dataset that will be used for training

DatasetSplit

public DatasetSplit(double threshold)

DatasetSplit

public DatasetSplit(org.apache.hadoop.mapred.JobConf conf)

Method Detail

getSeed

public long getSeed()

getThreshold

public double getThreshold()

isTraining

public boolean isTraining()

setTraining

public void setTraining(boolean training)

storeJobParameters

public void storeJobParameters(org.apache.hadoop.mapred.JobConf conf)
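The pairing of storeJobParameters(JobConf) with the DatasetSplit(JobConf) constructor suggests a round trip: the driver writes the split parameters into the job configuration, and each task rebuilds an equivalent object from it. The sketch below illustrates that pattern under stated assumptions: java.util.Properties stands in for Hadoop's JobConf so the example runs without Hadoop, and the class `SplitParams` and the key names are hypothetical, not Mahout's actual keys.

```java
import java.util.Properties;

/** Hypothetical sketch of storing split parameters and rebuilding them on the task side. */
public class SplitParams {
  static final String SEED_KEY = "split.seed";           // hypothetical key name
  static final String THRESHOLD_KEY = "split.threshold"; // hypothetical key name
  static final String TRAINING_KEY = "split.training";   // hypothetical key name

  final long seed;
  final double threshold;
  boolean training = true;

  SplitParams(long seed, double threshold) {
    this.seed = seed;
    this.threshold = threshold;
  }

  /** Rebuilds the parameters from the configuration; throws if a key is missing. */
  SplitParams(Properties conf) {
    this.seed = Long.parseLong(conf.getProperty(SEED_KEY));
    this.threshold = Double.parseDouble(conf.getProperty(THRESHOLD_KEY));
    this.training = Boolean.parseBoolean(conf.getProperty(TRAINING_KEY));
  }

  /** Writes every field the tasks need into the configuration. */
  void storeJobParameters(Properties conf) {
    conf.setProperty(SEED_KEY, Long.toString(seed));
    conf.setProperty(THRESHOLD_KEY, Double.toString(threshold));
    conf.setProperty(TRAINING_KEY, Boolean.toString(training));
  }

  public static void main(String[] args) {
    SplitParams driverSide = new SplitParams(12345L, 0.9);
    driverSide.training = false; // e.g. configure a job that reads the testing set
    Properties conf = new Properties();
    driverSide.storeJobParameters(conf);

    SplitParams taskSide = new SplitParams(conf);
    System.out.println(taskSide.seed + " " + taskSide.threshold + " " + taskSide.training);
    // prints "12345 0.9 false"
  }
}
```

Storing the seed, and not only the threshold, is what lets independent tasks reproduce the same draws.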


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.