org.apache.mahout.classifier.df.data
Class Dataset

java.lang.Object
  extended by org.apache.mahout.classifier.df.data.Dataset
All Implemented Interfaces:
org.apache.hadoop.io.Writable

public class Dataset
extends Object
implements org.apache.hadoop.io.Writable

Contains information about the attributes.


Nested Class Summary
static class Dataset.Attribute
          Attribute types
 
Method Summary
 boolean equals(Object obj)
           
 int[] getIgnored()
           
 double getLabel(Instance instance)
           
 int getLabelId()
           
 String getLabelString(double code)
          Returns the label value in the data. This method can be used when the criterion variable is a categorical attribute.
 int hashCode()
           
 boolean isNumerical(int attr)
          Returns whether the given attribute is numerical.
 int labelCode(String label)
          Returns the code used to represent the label value in the data
 String[] labels()
           
static Dataset load(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path path)
          Loads the dataset from a file
 int nbAttributes()
           
 int nbInstances()
           
 int nblabels()
           
 int nbValues(int attr)
           
static Dataset read(DataInput in)
           
 void readFields(DataInput in)
           
 int valueOf(int attr, String token)
          Converts a token to its corresponding int code for a given attribute
 void write(DataOutput out)
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

nbValues

public int nbValues(int attr)

labels

public String[] labels()

nblabels

public int nblabels()

getLabelId

public int getLabelId()

getLabel

public double getLabel(Instance instance)

nbInstances

public int nbInstances()

labelCode

public int labelCode(String label)
Returns the code used to represent the label value in the data

Parameters:
label - label's value to code
Returns:
label's code

getLabelString

public String getLabelString(double code)
Returns the label value in the data. This method can be used when the criterion variable is a categorical attribute.

Parameters:
code - label's code
Returns:
label's value
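
The relationship between labelCode and getLabelString can be illustrated with a small stand-in class. This is only a sketch, not the Mahout implementation: it assumes that a label's code is its index in the array returned by labels(), and that an unknown label maps to -1. The class name LabelCodeSketch and both conventions are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical stand-in for illustration only; this is not the Mahout Dataset class. */
class LabelCodeSketch {
    private final String[] labels;

    LabelCodeSketch(String... labels) {
        this.labels = labels.clone();
    }

    /** Assumption: the code of a label is its index in the labels array, or -1 if absent. */
    int labelCode(String label) {
        return Arrays.asList(labels).indexOf(label);
    }

    /** Assumption: the double code is truncated to an index into the labels array. */
    String getLabelString(double code) {
        return labels[(int) code];
    }

    public static void main(String[] args) {
        LabelCodeSketch ds = new LabelCodeSketch("no", "yes");
        int code = ds.labelCode("yes");
        System.out.println(code + " -> " + ds.getLabelString(code));
    }
}
```

Under these assumptions the two methods are inverses for every known label, which is why getLabelString is only meaningful when the label attribute is categorical.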

valueOf

public int valueOf(int attr,
                   String token)
Converts a token to its corresponding int code for a given attribute

Parameters:
attr - attribute's index
token - token to convert
Returns:
token's int code

getIgnored

public int[] getIgnored()

nbAttributes

public int nbAttributes()
Returns:
number of attributes

isNumerical

public boolean isNumerical(int attr)
Returns whether the given attribute is numerical.

Parameters:
attr - index of the attribute to check
Returns:
true if the attribute is numerical
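
To show how isNumerical and valueOf might be combined when turning a row of text tokens into numbers, here is a minimal sketch. It is not Mahout code: the class TokenEncodingSketch, the use of a null entry to mark a numerical attribute, and the encode helper are all hypothetical, and the int code of a categorical token is assumed to be its index in that attribute's value list.

```java
import java.util.Arrays;

/** Hypothetical stand-in for illustration only; this is not the Mahout Dataset class. */
class TokenEncodingSketch {
    // Assumption: values[attr] == null marks a numerical attribute;
    // otherwise it lists the categorical values of that attribute.
    private final String[][] values;

    TokenEncodingSketch(String[][] values) {
        this.values = values;
    }

    boolean isNumerical(int attr) {
        return values[attr] == null;
    }

    /** Assumption: a categorical token's code is its index in the value list, or -1 if absent. */
    int valueOf(int attr, String token) {
        if (isNumerical(attr)) {
            throw new IllegalArgumentException("attribute " + attr + " is numerical");
        }
        return Arrays.asList(values[attr]).indexOf(token);
    }

    /** Hypothetical helper: parses numerical tokens and codes categorical ones. */
    double[] encode(String[] tokens) {
        double[] row = new double[tokens.length];
        for (int attr = 0; attr < tokens.length; attr++) {
            row[attr] = isNumerical(attr)
                ? Double.parseDouble(tokens[attr])
                : valueOf(attr, tokens[attr]);
        }
        return row;
    }

    public static void main(String[] args) {
        TokenEncodingSketch ds = new TokenEncodingSketch(
            new String[][] { null, { "red", "green", "blue" } });
        System.out.println(Arrays.toString(ds.encode(new String[] { "3.5", "blue" })));
    }
}
```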

equals

public boolean equals(Object obj)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object

load

public static Dataset load(org.apache.hadoop.conf.Configuration conf,
                           org.apache.hadoop.fs.Path path)
                    throws IOException
Loads the dataset from a file

Throws:
IOException

read

public static Dataset read(DataInput in)
                    throws IOException
Throws:
IOException

readFields

public void readFields(DataInput in)
                throws IOException
Specified by:
readFields in interface org.apache.hadoop.io.Writable
Throws:
IOException

write

public void write(DataOutput out)
           throws IOException
Specified by:
write in interface org.apache.hadoop.io.Writable
Throws:
IOException
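
The read, readFields and write methods follow the usual Writable pattern: write emits the fields to a DataOutput, and readFields restores them in the same order from a DataInput. The sketch below demonstrates that pattern using only java.io. It does not reproduce the actual serialized layout of Dataset; the class WritableRoundTripSketch, its three fields, and the roundTrip helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in; the field layout is illustrative, not Mahout's format. */
class WritableRoundTripSketch {
    int labelId;
    int nbInstances;
    String[] labels = new String[0];

    void write(DataOutput out) throws IOException {
        out.writeInt(labelId);
        out.writeInt(nbInstances);
        out.writeInt(labels.length);
        for (String label : labels) {
            out.writeUTF(label);
        }
    }

    /** Reads the fields in exactly the order that write emitted them. */
    void readFields(DataInput in) throws IOException {
        labelId = in.readInt();
        nbInstances = in.readInt();
        labels = new String[in.readInt()];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = in.readUTF();
        }
    }

    /** Static factory in the style of read(DataInput): create, then readFields. */
    static WritableRoundTripSketch read(DataInput in) throws IOException {
        WritableRoundTripSketch ds = new WritableRoundTripSketch();
        ds.readFields(in);
        return ds;
    }

    /** Hypothetical helper: serializes to a byte array and reads the copy back. */
    static WritableRoundTripSketch roundTrip(WritableRoundTripSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.write(out);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return read(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        WritableRoundTripSketch a = new WritableRoundTripSketch();
        a.labelId = 4;
        a.nbInstances = 150;
        a.labels = new String[] { "a", "b" };
        System.out.println(roundTrip(a).nbInstances);
    }
}
```

A static read method of this shape saves callers from constructing an empty instance themselves before calling readFields.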


Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.