java.lang.Object
  org.apache.mahout.df.data.Data
public class Data
Holds a list of vectors and their corresponding Dataset. Contains various operations that deal with the vectors (subset, count, ...).
Constructor Summary | |
---|---|
Data(Dataset dataset,
java.util.List<Instance> instances)
|
Method Summary | |
---|---|
Data |
bagging(java.util.Random rng)
If the data has N cases, samples N cases at random, with replacement. |
Data |
bagging(java.util.Random rng,
boolean[] sampled)
If the data has N cases, samples N cases at random, with replacement. |
Data |
clone()
|
boolean |
contains(Instance v)
|
void |
countLabels(int[] counts)
Counts the number of occurrences of each label value |
boolean |
equals(java.lang.Object obj)
|
int[] |
extractLabels()
Extracts the labels of all instances |
static int[] |
extractLabels(Dataset dataset,
org.apache.hadoop.fs.FileSystem fs,
org.apache.hadoop.fs.Path path)
Extracts the labels of all instances from a data file |
Instance |
get(int index)
Returns the element at the specified position |
Dataset |
getDataset()
|
int |
hashCode()
|
boolean |
identicalLabel()
Checks if all the vectors have identical label values |
int |
indexof(Instance v)
|
boolean |
isEmpty()
|
boolean |
isIdentical()
Checks if all the vectors have identical attribute values |
int |
majorityLabel(java.util.Random rng)
Finds the majority label, breaking ties randomly |
Data |
rsplit(java.util.Random rng,
int subsize)
Splits the data in two: returns one part, and this object keeps the rest of the data. |
Data |
rsubset(java.util.Random rng,
double ratio)
|
int |
size()
|
Data |
subset(Condition condition)
|
double[] |
values(int attr)
Finds all distinct values of a given attribute |
Methods inherited from class java.lang.Object |
---|
finalize, getClass, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Data(Dataset dataset, java.util.List<Instance> instances)
Method Detail |
---|
public int size()
public boolean isEmpty()
public boolean contains(Instance v)
v
- element whose presence in this list is to be tested
public int indexof(Instance v)
v
- element to search for
public Instance get(int index)
index
- index of element to return
java.lang.IndexOutOfBoundsException
- if the index is out of range
public Data subset(Condition condition)
public Data rsubset(java.util.Random rng, double ratio)
rng
- Random number generator
ratio
- [0,1]
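This page does not say how ratio is applied. One plausible reading, shown here purely as an assumption, is that each instance is kept independently with probability ratio, so the subset size is only approximately ratio times the original size. The class RandomSubsetSketch and its randomSubset method are hypothetical names and work on a plain list rather than on Data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Mahout code: a generic list stands in for Data.
final class RandomSubsetSketch {

  /**
   * Keeps each item independently with probability ratio.
   * ASSUMPTION: this per-item coin flip is one possible meaning of "ratio";
   * the documented contract only states that ratio lies in [0,1].
   */
  static <T> List<T> randomSubset(List<T> items, Random rng, double ratio) {
    if (ratio < 0.0 || ratio > 1.0) {
      throw new IllegalArgumentException("ratio must be in [0,1]: " + ratio);
    }
    List<T> subset = new ArrayList<>();
    for (T item : items) {
      // nextDouble() returns a value in [0,1), so ratio 0 keeps nothing
      // and ratio 1 keeps everything.
      if (rng.nextDouble() < ratio) {
        subset.add(item);
      }
    }
    return subset;
  }
}
```

Under this reading the original order of the kept items is preserved and no item appears twice, which distinguishes it from bagging.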
public Data bagging(java.util.Random rng)
rng
- Random number generator
public Data bagging(java.util.Random rng, boolean[] sampled)
rng
- Random number generator
sampled
- indicates which instances have been sampled
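The two bagging methods describe sampling N cases with replacement, optionally recording which instances were drawn. A minimal sketch of that idea follows, on a plain list rather than on Data. The class BaggingSketch and its bag method are hypothetical names, and the sketch assumes sampled has the same length as the input and starts out all false.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Mahout code: a generic list stands in for Data.
final class BaggingSketch {

  /**
   * Draws items.size() elements uniformly at random, with replacement.
   * sampled[i] is set to true when items.get(i) was drawn at least once.
   * ASSUMPTION: sampled.length == items.size() and all flags start as false.
   */
  static <T> List<T> bag(List<T> items, Random rng, boolean[] sampled) {
    int n = items.size();
    List<T> bag = new ArrayList<>(n);
    for (int draw = 0; draw < n; draw++) {
      int index = rng.nextInt(n); // the same index may come up more than once
      bag.add(items.get(index));
      sampled[index] = true;
    }
    return bag;
  }

  public static void main(String[] args) {
    List<Integer> items = Arrays.asList(10, 20, 30, 40, 50);
    boolean[] sampled = new boolean[items.size()];
    List<Integer> bag = bag(items, new Random(42L), sampled);
    System.out.println("bag     = " + bag);
    System.out.println("sampled = " + Arrays.toString(sampled));
  }
}
```

The bag always has the same size as the input, but some items typically appear several times and others not at all. The entries of sampled that remain false identify the instances that were never drawn.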
public Data rsplit(java.util.Random rng, int subsize)
rng
- Random number generator
public boolean isIdentical()
public boolean identicalLabel()
public double[] values(int attr)
attr
-
public Data clone()
Overrides: clone in class java.lang.Object
public boolean equals(java.lang.Object obj)
Overrides: equals in class java.lang.Object
public int hashCode()
Overrides: hashCode in class java.lang.Object
public int[] extractLabels()
public static int[] extractLabels(Dataset dataset, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path) throws java.io.IOException
dataset
-
fs
- file system
path
- data path
java.io.IOException
public int majorityLabel(java.util.Random rng)
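Finding a majority label with random tie-breaking can be sketched as a tally followed by a single pass over the counts. The example below is an illustration only, not the Mahout implementation. It assumes labels are integer codes in the range 0 to counts.length - 1, and the class MajorityLabelSketch and its static methods are hypothetical names.

```java
import java.util.Random;

// Hypothetical sketch, not Mahout code: labels are plain int codes.
final class MajorityLabelSketch {

  /**
   * Adds one to counts[label] for every label.
   * ASSUMPTION: counts is zero-initialized and long enough for every label code.
   */
  static void countLabels(int[] labels, int[] counts) {
    for (int label : labels) {
      counts[label]++;
    }
  }

  /**
   * Returns the label code with the highest count. When several labels share
   * the highest count, each of them is returned with equal probability.
   * Returns -1 if counts is empty.
   */
  static int majorityLabel(int[] counts, Random rng) {
    int best = -1;
    int bestCount = -1;
    int ties = 0; // how many labels share bestCount so far
    for (int label = 0; label < counts.length; label++) {
      if (counts[label] > bestCount) {
        best = label;
        bestCount = counts[label];
        ties = 1;
      } else if (counts[label] == bestCount) {
        ties++;
        // Replace the current choice with probability 1/ties, which leaves
        // every tied label equally likely at the end of the pass.
        if (rng.nextInt(ties) == 0) {
          best = label;
        }
      }
    }
    return best;
  }
}
```

For example, counts of {1, 1, 3} always yield label 2, while counts of {4, 1, 4} yield label 0 or label 2 depending on the random generator.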
public void countLabels(int[] counts)
counts
- will contain the results; expected to be initialized to 0
public Dataset getDataset()