org.apache.mahout.df.data
Class DataLoader

java.lang.Object
  extended by org.apache.mahout.df.data.DataLoader

public class DataLoader
extends java.lang.Object

Converts the input data to a Vector array using the information given by the Dataset.
Generates one Vector for each line of the input data, and adds an IGNORED first attribute that contains a unique id for each instance: the line number of the instance in the input data.
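
The per-line conversion described above can be sketched without Mahout as follows. This is a minimal illustration only, not the DataLoader implementation: the class LineVectorSketch, the method toVectors, the comma separator, the purely numeric input, and the zero-based line numbering are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; it does not use or reproduce Mahout code.
final class LineVectorSketch {

  /**
   * Converts each comma-separated numeric line into an array whose first
   * element is the line number (assumed zero-based), followed by the values.
   */
  static List<double[]> toVectors(String[] lines) {
    List<double[]> vectors = new ArrayList<>();
    for (int lineNumber = 0; lineNumber < lines.length; lineNumber++) {
      String[] tokens = lines[lineNumber].split(",");
      double[] vector = new double[tokens.length + 1];
      vector[0] = lineNumber; // extra first attribute: unique id of the instance
      for (int i = 0; i < tokens.length; i++) {
        vector[i + 1] = Double.parseDouble(tokens[i].trim());
      }
      vectors.add(vector);
    }
    return vectors;
  }

  public static void main(String[] args) {
    for (double[] vector : toVectors(new String[] {"1.5,2.0", "3.0,4.5"})) {
      System.out.println(Arrays.toString(vector));
    }
  }
}
```

Because the id occupies position 0, every original attribute is shifted by one position in the resulting array.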


Method Summary
protected static Data constructData(Dataset.Attribute[] attrs, java.util.List<Instance> vectors, java.util.List<java.lang.String>[] values)
          Constructs the data
static Dataset generateDataset(java.lang.String descriptor, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)
          Generates the Dataset by parsing the entire data
static Dataset generateDataset(java.lang.String descriptor, java.lang.String[] data)
          Generates the Dataset by parsing the entire data
static Data loadData(Dataset dataset, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path fpath)
          Loads the data from a file
static Data loadData(Dataset dataset, java.lang.String[] data)
          Loads the data from a String array
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

loadData

public static Data loadData(Dataset dataset,
                            org.apache.hadoop.fs.FileSystem fs,
                            org.apache.hadoop.fs.Path fpath)
                     throws java.io.IOException
Loads the data from a file

Parameters:
dataset - Dataset that describes the data
fs - file system
fpath - data file path
Returns:
the loaded Data
Throws:
java.io.IOException - if any problem is encountered

loadData

public static Data loadData(Dataset dataset,
                            java.lang.String[] data)
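Loads the data from a String array

Parameters:
dataset - Dataset that describes the data
data - input lines, one instance per array element
Returns:
the loaded Data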


generateDataset

public static Dataset generateDataset(java.lang.String descriptor,
                                      org.apache.hadoop.fs.FileSystem fs,
                                      org.apache.hadoop.fs.Path path)
                               throws DescriptorException,
                                      java.io.IOException
Generates the Dataset by parsing the entire data

Parameters:
descriptor - attributes description
fs - file system
path - data path
Throws:
DescriptorException
java.io.IOException

generateDataset

public static Dataset generateDataset(java.lang.String descriptor,
                                      java.lang.String[] data)
                               throws DescriptorException
Generates the Dataset by parsing the entire data

Parameters:
descriptor - attributes description
data - input lines to parse
Throws:
DescriptorException
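
One plausible reason for parsing the entire data is that the distinct values of a categorical attribute are only known once every line has been read; this page does not state that, so treat it as an assumption. The sketch below shows such a full pass in isolation. DistinctValuesSketch, distinctValues, and the comma separator are hypothetical and are not part of the Mahout API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; it does not use or reproduce Mahout code.
final class DistinctValuesSketch {

  /** Scans every line and returns the distinct tokens of one column, in first-seen order. */
  static List<String> distinctValues(String[] lines, int column) {
    List<String> values = new ArrayList<>();
    for (String line : lines) {
      String token = line.split(",")[column].trim();
      if (!values.contains(token)) {
        values.add(token); // keep only the first occurrence of each value
      }
    }
    return values;
  }

  public static void main(String[] args) {
    String[] data = {"1.0,red,yes", "2.5,green,no", "0.3,red,no"};
    System.out.println(distinctValues(data, 1));
  }
}
```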

constructData

protected static Data constructData(Dataset.Attribute[] attrs,
                                    java.util.List<Instance> vectors,
                                    java.util.List<java.lang.String>[] values)
Constructs the data

Parameters:
attrs - attributes description
vectors - data elements
values - used to convert CATEGORICAL attributes to Integer
Returns:
the constructed Data
Throws:
java.lang.RuntimeException - if no LABEL is found in the attributes description
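
The two behaviours documented for constructData, converting CATEGORICAL values to integers and failing when no LABEL is present, can be sketched as follows. This is an illustration under stated assumptions, not the Mahout code: the enum Kind is a hypothetical stand-in for Dataset.Attribute, and using the position of a value in its list as the integer code is an assumed encoding.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; it does not use or reproduce Mahout code.
final class ConstructDataSketch {

  /** Hypothetical stand-in for Dataset.Attribute. */
  enum Kind { IGNORED, NUMERICAL, CATEGORICAL, LABEL }

  /** Returns the index of the first LABEL attribute, or throws if there is none. */
  static int labelIndex(Kind[] attrs) {
    for (int i = 0; i < attrs.length; i++) {
      if (attrs[i] == Kind.LABEL) {
        return i;
      }
    }
    throw new RuntimeException("No LABEL found in the attributes description");
  }

  /** Encodes a categorical token as its position in the list of known values (assumed encoding). */
  static int encode(List<String> knownValues, String token) {
    int index = knownValues.indexOf(token);
    if (index < 0) {
      throw new IllegalArgumentException("Unknown categorical value: " + token);
    }
    return index;
  }

  public static void main(String[] args) {
    Kind[] attrs = {Kind.NUMERICAL, Kind.CATEGORICAL, Kind.LABEL};
    System.out.println(labelIndex(attrs));
    System.out.println(encode(Arrays.asList("red", "green"), "green"));
  }
}
```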


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.