org.apache.mahout.df.data
Class Dataset

java.lang.Object
  extended by org.apache.mahout.df.data.Dataset
All Implemented Interfaces:
org.apache.hadoop.io.Writable

public class Dataset
extends java.lang.Object
implements org.apache.hadoop.io.Writable

Contains information about the attributes of a dataset.


Nested Class Summary
static class Dataset.Attribute
          Attribute types
 
Constructor Summary
protected Dataset(Dataset.Attribute[] attrs, java.util.List<java.lang.String>[] values, int nbInstances)
          Should only be called by a DataLoader
 
Method Summary
protected static int countAttributes(Dataset.Attribute[] attrs)
          Counts the number of attributes, except IGNORED and LABEL
 boolean equals(java.lang.Object obj)
           
 int[] getIgnored()
           
 java.lang.String getLabel(int code)
           
 int getLabelId()
           
 int hashCode()
           
 boolean isNumerical(int attr)
          Is this a numerical attribute?
 int labelCode(java.lang.String label)
          Returns the code used to represent the label value in the data
 java.lang.String[] labels()
           
static Dataset load(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path path)
          Loads the dataset from a file
 int nbAttributes()
           
 int nbInstances()
           
 int nblabels()
           
static Dataset read(java.io.DataInput in)
           
 void readFields(java.io.DataInput in)
           
 int valueOf(int attr, java.lang.String token)
          Converts a token to its corresponding int code for a given attribute
 void write(java.io.DataOutput out)
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Dataset

protected Dataset(Dataset.Attribute[] attrs,
                  java.util.List<java.lang.String>[] values,
                  int nbInstances)
Should only be called by a DataLoader

Parameters:
attrs - description of the attributes
values - distinct values for all CATEGORICAL attributes
nbInstances - number of instances in the dataset
Method Detail

labels

public java.lang.String[] labels()

nblabels

public int nblabels()

getLabelId

public int getLabelId()

nbInstances

public int nbInstances()

labelCode

public int labelCode(java.lang.String label)
Returns the code used to represent the label value in the data

Parameters:
label - label value to encode
Returns:
label's code
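
The relationship between labelCode and getLabel can be illustrated with a small stand-alone sketch. It is not the Mahout implementation: the class LabelCodeSketch is hypothetical, and it assumes that a label's code is simply its index in the label table, with -1 for an unknown label.

```java
import java.util.Arrays;

class LabelCodeSketch {
  // Hypothetical label table; in the real class the labels come from the loaded data.
  private final String[] labels;

  LabelCodeSketch(String[] labels) {
    this.labels = labels.clone();
  }

  // Assumption: the code of a label is its index in the table, -1 if the label is unknown.
  int labelCode(String label) {
    return Arrays.asList(labels).indexOf(label);
  }

  // Inverse mapping: from a code back to the label value.
  String getLabel(int code) {
    return labels[code];
  }

  public static void main(String[] args) {
    LabelCodeSketch ds = new LabelCodeSketch(new String[] {"no", "yes"});
    int code = ds.labelCode("yes");
    System.out.println(code + " -> " + ds.getLabel(code));
  }
}
```

Under that assumption, getLabel(labelCode(x)) returns x for every known label x.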

getLabel

public java.lang.String getLabel(int code)

valueOf

public int valueOf(int attr,
                   java.lang.String token)
Converts a token to its corresponding int code for a given attribute

Parameters:
attr - attribute's index
token - token to convert
Returns:
int code of the token for the given attribute
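
A minimal sketch of a token-to-code conversion for categorical attributes follows. It is self-contained and does not use Mahout; the class ValueOfSketch and its per-attribute value lists are hypothetical, and it assumes the code is the position of the token among the attribute's distinct values. How the real method treats numerical attributes or unknown tokens is not documented on this page.

```java
import java.util.Arrays;
import java.util.List;

class ValueOfSketch {
  // values.get(attr) holds the distinct tokens of attribute attr (hypothetical layout).
  private final List<List<String>> values;

  ValueOfSketch(List<List<String>> values) {
    this.values = values;
  }

  // Assumption: code = index of the token in the attribute's value list, -1 when absent.
  int valueOf(int attr, String token) {
    return values.get(attr).indexOf(token);
  }

  public static void main(String[] args) {
    ValueOfSketch ds = new ValueOfSketch(Arrays.asList(
        Arrays.asList("sunny", "rainy"),
        Arrays.asList("low", "medium", "high")));
    System.out.println(ds.valueOf(1, "high"));
  }
}
```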

getIgnored

public int[] getIgnored()

countAttributes

protected static int countAttributes(Dataset.Attribute[] attrs)
Counts the number of attributes, except IGNORED and LABEL

Returns:
number of attributes that are not IGNORED or LABEL
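
The counting rule described above can be sketched as follows. The enum below is a stand-in for Dataset.Attribute: its constant names are taken from this page, but the enum and the class CountAttributesSketch are assumptions made for illustration, not the library source.

```java
class CountAttributesSketch {
  // Stand-in for Dataset.Attribute (assumed constants, named after this page).
  enum Attribute { IGNORED, NUMERICAL, CATEGORICAL, LABEL }

  // Counts every attribute that is neither IGNORED nor LABEL.
  static int countAttributes(Attribute[] attrs) {
    int count = 0;
    for (Attribute attr : attrs) {
      if (attr != Attribute.IGNORED && attr != Attribute.LABEL) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    Attribute[] attrs = {
      Attribute.IGNORED, Attribute.NUMERICAL, Attribute.CATEGORICAL, Attribute.LABEL
    };
    System.out.println(countAttributes(attrs)); // prints 2
  }
}
```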

nbAttributes

public int nbAttributes()
Returns:
number of attributes

isNumerical

public boolean isNumerical(int attr)
Is this a numerical attribute?

Parameters:
attr - index of the attribute to check
Returns:
true if the attribute is numerical

equals

public boolean equals(java.lang.Object obj)
Overrides:
equals in class java.lang.Object

hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object

load

public static Dataset load(org.apache.hadoop.conf.Configuration conf,
                           org.apache.hadoop.fs.Path path)
                    throws java.io.IOException
Loads the dataset from a file

Throws:
java.io.IOException

read

public static Dataset read(java.io.DataInput in)
                    throws java.io.IOException
Throws:
java.io.IOException

readFields

public void readFields(java.io.DataInput in)
                throws java.io.IOException
Specified by:
readFields in interface org.apache.hadoop.io.Writable
Throws:
java.io.IOException

write

public void write(java.io.DataOutput out)
           throws java.io.IOException
Specified by:
write in interface org.apache.hadoop.io.Writable
Throws:
java.io.IOException
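
The write / readFields / read trio follows the usual Writable pattern: write serializes the fields to a DataOutput, readFields restores them in the same order from a DataInput, and a static read creates an instance and delegates to readFields. The sketch below shows that pattern using only java.io, so it runs without Hadoop. The class TinyRecord and its two fields are hypothetical and do not reflect the actual serialized layout of Dataset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class WritableRoundTripSketch {
  // Hypothetical record: NOT the real field layout of Dataset.
  static class TinyRecord {
    String[] labels = new String[0];
    int nbInstances;

    // Serializes the fields in a fixed order.
    void write(DataOutput out) throws IOException {
      out.writeInt(labels.length);
      for (String label : labels) {
        out.writeUTF(label);
      }
      out.writeInt(nbInstances);
    }

    // Restores the fields in exactly the order used by write.
    void readFields(DataInput in) throws IOException {
      int n = in.readInt();
      labels = new String[n];
      for (int i = 0; i < n; i++) {
        labels[i] = in.readUTF();
      }
      nbInstances = in.readInt();
    }

    // Static factory: creates an empty instance and fills it from the input.
    static TinyRecord read(DataInput in) throws IOException {
      TinyRecord record = new TinyRecord();
      record.readFields(in);
      return record;
    }
  }

  // Writes the record to an in-memory buffer and reads it back.
  static TinyRecord roundTrip(TinyRecord src) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        src.write(out);
      }
      try (DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return TinyRecord.read(in);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    TinyRecord src = new TinyRecord();
    src.labels = new String[] {"no", "yes"};
    src.nbInstances = 42;
    TinyRecord copy = roundTrip(src);
    System.out.println(Arrays.toString(copy.labels) + " " + copy.nbInstances);
  }
}
```

The important property is symmetry: any field written by write must be read back by readFields in the same order and with the matching DataInput method.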


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.