org.apache.mahout.classifier.sgd
Class CsvRecordFactory

java.lang.Object
  extended by org.apache.mahout.classifier.sgd.CsvRecordFactory
All Implemented Interfaces:
RecordFactory

public class CsvRecordFactory
extends java.lang.Object
implements RecordFactory

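Converts CSV data lines to vectors. Use of this class proceeds in a few steps: construct the factory with the name of the target variable and a map of predictor types; optionally define the target categories with defineTargetCategories or maxTargetValue; pass the header line to firstLine so that the target and predictor columns can be located; then call processLine on each data line.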


Constructor Summary
CsvRecordFactory(java.lang.String targetName, java.util.Map<java.lang.String,java.lang.String> typeMap)
          Construct a parser for CSV lines that encodes the parsed data in vector form.
 
Method Summary
 void defineTargetCategories(java.util.List<java.lang.String> values)
          Defines the possible values of the target variable and thus their encoding.
 void firstLine(java.lang.String line)
          Processes the first line of a file (which should contain the variable names).
 java.lang.Iterable<java.lang.String> getPredictors()
          Returns a list of the names of the predictor variables.
 java.util.List<java.lang.String> getTargetCategories()
           
 java.util.Map<java.lang.String,java.util.Set<java.lang.Integer>> getTraceDictionary()
           
 CsvRecordFactory includeBiasTerm(boolean useBias)
           
 CsvRecordFactory maxTargetValue(int max)
          Defines the number of target variable categories, but allows this parser to pick encodings for them as they appear.
 int processLine(java.lang.String line, Vector featureVector)
          Decodes a single line of CSV data and records the target and predictor variables.
 boolean usesFirstLineAsSchema()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CsvRecordFactory

public CsvRecordFactory(java.lang.String targetName,
                        java.util.Map<java.lang.String,java.lang.String> typeMap)
Construct a parser for CSV lines that encodes the parsed data in vector form.

Parameters:
targetName - The name of the target variable.
typeMap - A map describing the types of the predictor variables.
Method Detail

defineTargetCategories

public void defineTargetCategories(java.util.List<java.lang.String> values)
Defines the possible values of the target variable and thus their encoding. Any value of the target variable that is not present in this list is given the encoding of the last member of the list.

Specified by:
defineTargetCategories in interface RecordFactory
Parameters:
values - The values the target variable can have.
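
The fallback rule described above can be illustrated with a minimal, self-contained sketch. It is not Mahout's implementation: the class name FixedTargetEncoder is hypothetical, and it assumes that each value is encoded as its position in the list, which the description implies but does not state outright.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in (not Mahout code) for a fixed list of target categories.
class FixedTargetEncoder {
    private final Map<String, Integer> codes = new HashMap<>();
    private final int fallback;

    FixedTargetEncoder(List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("at least one target value is required");
        }
        for (int i = 0; i < values.size(); i++) {
            // Assumption: a value's encoding is its position in the list.
            codes.putIfAbsent(values.get(i), i);
        }
        // Values missing from the list share the encoding of the last member.
        fallback = codes.get(values.get(values.size() - 1));
    }

    int encode(String value) {
        return codes.getOrDefault(value, fallback);
    }
}
```

With the list ("spam", "ham", "other"), this sketch encodes an unlisted value such as "unseen" the same way as "other", so placing a catch-all category last keeps unknown values from being confused with a real one.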

maxTargetValue

public CsvRecordFactory maxTargetValue(int max)
Defines the number of target variable categories, but allows this parser to pick encodings for them as they appear.

Specified by:
maxTargetValue in interface RecordFactory
Parameters:
max - The number of categories that will be expected. Once this many have been seen, all others will get the encoding max-1.
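
As a rough illustration of picking encodings as values appear, the following self-contained sketch assigns codes in order of first appearance and maps everything beyond the limit to max-1. The class OnTheFlyTargetEncoder is hypothetical and only mirrors the rule stated above; assigning codes in order of first appearance is an assumption, and the actual Mahout implementation may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in (not Mahout code) for encodings chosen on the fly.
class OnTheFlyTargetEncoder {
    private final Map<String, Integer> codes = new HashMap<>();
    private final int max;

    OnTheFlyTargetEncoder(int max) {
        if (max < 1) {
            throw new IllegalArgumentException("max must be at least 1");
        }
        this.max = max;
    }

    int encode(String value) {
        Integer known = codes.get(value);
        if (known != null) {
            return known;               // value already has an encoding
        }
        if (codes.size() < max) {
            int next = codes.size();    // assumption: next free code, in order of appearance
            codes.put(value, next);
            return next;
        }
        return max - 1;                 // limit reached: all further values share max-1
    }
}
```

Note that in this sketch the last category to receive its own code also absorbs every later unseen value, since both map to max-1.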

usesFirstLineAsSchema

public boolean usesFirstLineAsSchema()
Specified by:
usesFirstLineAsSchema in interface RecordFactory

firstLine

public void firstLine(java.lang.String line)
Processes the first line of a file (which should contain the variable names). The target and predictor column numbers are set from the names on this line.

Specified by:
firstLine in interface RecordFactory
Parameters:
line - Header line for the file.
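
To make the role of the header line concrete, here is a minimal sketch of resolving column numbers from variable names. It is an illustration only: HeaderSchema and its fields are hypothetical names, and the plain split on commas is an assumption that ignores quoting, which a real CSV parser would need to handle.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in (not Mahout code): locates the target and predictor columns.
class HeaderSchema {
    final int targetColumn;
    final Map<String, Integer> predictorColumns = new LinkedHashMap<>();

    HeaderSchema(String headerLine, String targetName, Set<String> predictorNames) {
        // Assumption: simple comma-separated names with no quoted fields.
        String[] names = headerLine.split(",");
        int target = -1;
        for (int i = 0; i < names.length; i++) {
            String name = names[i].trim();
            if (name.equals(targetName)) {
                target = i;                         // column holding the target variable
            } else if (predictorNames.contains(name)) {
                predictorColumns.put(name, i);      // column holding a known predictor
            }
        }
        if (target < 0) {
            throw new IllegalArgumentException("target column not found: " + targetName);
        }
        targetColumn = target;
    }
}
```

For the header "age,color,label" with target "label" and predictors "age" and "color", the sketch records the target at column 2 and the predictors at columns 0 and 1; columns that are neither are ignored.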

processLine

public int processLine(java.lang.String line,
                       Vector featureVector)
Decodes a single line of CSV data. As a side effect, the predictor features are added into featureVector; the encoded value of the target variable is returned.

Specified by:
processLine in interface RecordFactory
Parameters:
line - The raw data.
featureVector - Where to fill in the features. Should be zeroed before calling processLine.
Returns:
The value of the target variable.
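
Because features are added into the vector rather than overwriting it, a reused vector that is not zeroed carries values over from the previous line. The toy sketch below shows the effect with a plain double[] standing in for Mahout's Vector; AdditiveFeatures and its methods are hypothetical and do not reproduce the real feature encoding.

```java
import java.util.Arrays;

// Hypothetical stand-in (not Mahout code) showing why the vector should be zeroed.
class AdditiveFeatures {
    // Adds each value into the vector at the same index, without clearing it first.
    static void addFeatures(double[] values, double[] featureVector) {
        for (int i = 0; i < values.length; i++) {
            featureVector[i] += values[i];
        }
    }

    // Zeroes the reusable vector before adding, as the parameter note recommends.
    static double[] encodeFresh(double[] values, double[] reusable) {
        Arrays.fill(reusable, 0.0);
        addFeatures(values, reusable);
        return reusable;
    }
}
```

Calling addFeatures twice on the same array with the values {1.0, 2.0} leaves {2.0, 4.0} in it, whereas encodeFresh yields {1.0, 2.0} each time.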

getPredictors

public java.lang.Iterable<java.lang.String> getPredictors()
Returns a list of the names of the predictor variables.

Specified by:
getPredictors in interface RecordFactory
Returns:
A list of variable names.

getTraceDictionary

public java.util.Map<java.lang.String,java.util.Set<java.lang.Integer>> getTraceDictionary()
Specified by:
getTraceDictionary in interface RecordFactory

includeBiasTerm

public CsvRecordFactory includeBiasTerm(boolean useBias)
Specified by:
includeBiasTerm in interface RecordFactory

getTargetCategories

public java.util.List<java.lang.String> getTargetCategories()
Specified by:
getTargetCategories in interface RecordFactory


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.