org.apache.mahout.math.jet.stat.quantile
Class QuantileFinderFactory

java.lang.Object
  extended by org.apache.mahout.math.jet.stat.quantile.QuantileFinderFactory

Deprecated. Until unit tests are in place, this class is unsupported.

@Deprecated
public class QuantileFinderFactory
extends java.lang.Object


Method Summary
protected static long[] known_N_compute_B_and_K_quick(long N, double epsilon)
          Deprecated. Computes the number of buffers and number of values per buffer such that quantiles can be determined with a guaranteed approximation error no more than epsilon.
protected static long[] known_N_compute_B_and_K_slow(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate)
          Deprecated. Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.
static long[] known_N_compute_B_and_K(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate)
          Deprecated. Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.
static DoubleQuantileFinder newDoubleQuantileFinder(boolean known_N, long N, double epsilon, double delta, int quantiles, RandomEngine generator)
          Deprecated. Returns a quantile finder that minimizes the amount of memory needed under the user-provided constraints.
static DoubleArrayList newEquiDepthPhis(int quantiles)
          Deprecated. Convenience method that computes phi's for equi-depth histograms.
protected static long[] unknown_N_compute_B_and_K_raw(double epsilon, double delta, int quantiles)
          Deprecated. Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.
static long[] unknown_N_compute_B_and_K(double epsilon, double delta, int quantiles)
          Deprecated. Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

known_N_compute_B_and_K

public static long[] known_N_compute_B_and_K(long N,
                                             double epsilon,
                                             double delta,
                                             int quantiles,
                                             double[] returnSamplingRate)
Deprecated. 
Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability. Assumes that quantiles are to be computed over N values. The required sampling rate is computed and stored in the first element of the provided returnSamplingRate array, which must therefore have a length of at least 1.

Parameters:
N - the number of values over which quantiles shall be computed (e.g. 10^6).
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
delta - the probability that the approximation error is more than epsilon (e.g. 0.0001) (0 <= delta <= 1). To avoid probabilistic answers, set delta=0.0.
quantiles - the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.
returnSamplingRate - output parameter, a double[1] where the sampling rate is to be filled in.
Returns:
long[2] - long[0]=the number of buffers, long[1]=the number of elements per buffer, returnSamplingRate[0]=the required sampling rate.
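The output-parameter convention described above can be sketched as follows. This is a minimal, self-contained illustration that does not call Mahout: the Sizer interface and the EXACT_SIZER stand-in are hypothetical, and the stand-in simply keeps all N values in a single buffer with a sampling rate of 1.0 rather than reproducing the factory's actual computation.

```java
final class SizingResultSketch {

    /** Hypothetical interface with the same parameter list as known_N_compute_B_and_K. */
    interface Sizer {
        long[] compute(long n, double epsilon, double delta, int quantiles, double[] returnSamplingRate);
    }

    /**
     * Hypothetical stand-in, NOT Mahout's algorithm: one buffer that holds all
     * n values, with every value kept (sampling rate 1.0).
     */
    static final Sizer EXACT_SIZER = (n, epsilon, delta, quantiles, returnSamplingRate) -> {
        if (returnSamplingRate == null || returnSamplingRate.length < 1) {
            throw new IllegalArgumentException("returnSamplingRate must have length >= 1");
        }
        returnSamplingRate[0] = 1.0;   // the output parameter is filled in place
        return new long[] {1L, n};     // {number of buffers, elements per buffer}
    };

    /** Total number of values held by all buffers together. */
    static long totalBufferedValues(long[] bAndK) {
        return Math.multiplyExact(bAndK[0], bAndK[1]);
    }

    public static void main(String[] args) {
        double[] samplingRate = new double[1];   // the caller allocates the output slot
        long[] bAndK = EXACT_SIZER.compute(1_000_000L, 0.0, 0.0, 100, samplingRate);
        System.out.println("buffers=" + bAndK[0]
                + ", elementsPerBuffer=" + bAndK[1]
                + ", samplingRate=" + samplingRate[0]
                + ", totalValues=" + totalBufferedValues(bAndK));
    }
}
```

The caller allocates the double[1] before the call and reads element 0 afterwards; the returned long[2] carries only the buffer count and the buffer size.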

known_N_compute_B_and_K_quick

protected static long[] known_N_compute_B_and_K_quick(long N,
                                                      double epsilon)
Deprecated. 
Computes the number of buffers and number of values per buffer such that quantiles can be determined with a guaranteed approximation error no more than epsilon. Assumes that quantiles are to be computed over N values.

Parameters:
N - the anticipated number of values over which quantiles shall be determined.
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
Returns:
long[2] - long[0]=the number of buffers, long[1]=the number of elements per buffer.

known_N_compute_B_and_K_slow

protected static long[] known_N_compute_B_and_K_slow(long N,
                                                     double epsilon,
                                                     double delta,
                                                     int quantiles,
                                                     double[] returnSamplingRate)
Deprecated. 
Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability. Assumes that quantiles are to be computed over N values. The required sampling rate is computed and stored in the first element of the provided returnSamplingRate array, which must therefore have a length of at least 1.

Parameters:
N - the anticipated number of values over which quantiles shall be computed (e.g. 10^6).
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
delta - the probability that the approximation error is more than epsilon (e.g. 0.0001) (0 <= delta <= 1). To avoid probabilistic answers, set delta=0.0.
quantiles - the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.
returnSamplingRate - a double[1] where the sampling rate is to be filled in.
Returns:
long[2] - long[0]=the number of buffers, long[1]=the number of elements per buffer, returnSamplingRate[0]=the required sampling rate.

newDoubleQuantileFinder

public static DoubleQuantileFinder newDoubleQuantileFinder(boolean known_N,
                                                           long N,
                                                           double epsilon,
                                                           double delta,
                                                           int quantiles,
                                                           RandomEngine generator)
Deprecated. 
Returns a quantile finder that minimizes the amount of memory needed under the user-provided constraints. Many applications don't know in advance how many elements quantiles are to be computed over. However, some of them can give an upper limit, which helps the factory choose a quantile finder with minimal memory requirements. For example, if you select values from a database and fill them into histograms, you probably don't know how many values you will fill, but you probably do know that you will fill at most S elements, the size of your database.

Parameters:
known_N - specifies whether the number of elements over which quantiles are to be computed is known or not.
N - if known_N==true, the number of elements over which quantiles are to be computed. If known_N==false, the upper limit on the number of elements over which quantiles are to be computed. If such an upper limit is unknown a priori, set N = Long.MAX_VALUE.
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
delta - the probability that the approximation error is more than epsilon (e.g. 0.0001) (0 <= delta <= 1). To avoid probabilistic answers, set delta=0.0.
quantiles - the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.
generator - a uniform random number generator. Set this parameter to null to use a default generator.
Returns:
the quantile finder minimizing memory requirements under the given constraints.
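To make the epsilon constraint concrete, the sketch below checks whether a candidate value is an acceptable answer for a given phi. It assumes the usual rank-based reading of the approximation error (a returned element may be off by at most epsilon * N ranks from the true phi-quantile), which this page does not spell out. The class and method names are hypothetical and independent of Mahout.

```java
final class EpsilonApproxSketch {

    /**
     * Returns true if candidate occurs in data and at least one of the
     * 1-based ranks it occupies lies within [(phi - epsilon) * N, (phi + epsilon) * N].
     * Assumption: the approximation error is measured in ranks relative to N.
     */
    static boolean isEpsilonApproximate(double[] data, double phi, double epsilon, double candidate) {
        long less = 0;
        long lessOrEqual = 0;
        for (double v : data) {
            if (v < candidate) less++;
            if (v <= candidate) lessOrEqual++;
        }
        if (lessOrEqual == less) {
            return false; // candidate does not occur in the data
        }
        double n = data.length;
        double lowestAllowed = (phi - epsilon) * n;
        double highestAllowed = (phi + epsilon) * n;
        // candidate occupies the 1-based ranks less+1 .. lessOrEqual
        return less + 1 <= highestAllowed && lessOrEqual >= lowestAllowed;
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1; // values 1.0 .. 1000.0, so value == rank
        }
        System.out.println(isEpsilonApproximate(data, 0.5, 0.001, 500.0)); // true: exact median rank
        System.out.println(isEpsilonApproximate(data, 0.5, 0.001, 510.0)); // false: 10 ranks off, 1 allowed
        System.out.println(isEpsilonApproximate(data, 0.5, 0.02, 510.0));  // true: 20 ranks allowed
    }
}
```

Under this reading, epsilon=0.001 on 1000 values tolerates an answer that is one rank away from the true median, which is why smaller epsilon values require more memory.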

newEquiDepthPhis

public static DoubleArrayList newEquiDepthPhis(int quantiles)
Deprecated. 
Convenience method that computes phi's for equi-depth histograms. The result is simply the list of values i / (double)quantiles for i = 1, 2, ..., quantiles-1.

Returns:
the equi-depth phi's
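The formula above can be sketched in a few lines. This is a stand-alone illustration that returns a java.util.List instead of Mahout's DoubleArrayList; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class EquiDepthPhisSketch {

    /** Returns i / (double) quantiles for i = 1 .. quantiles-1. */
    static List<Double> equiDepthPhis(int quantiles) {
        List<Double> phis = new ArrayList<>();
        for (int i = 1; i < quantiles; i++) {
            phis.add(i / (double) quantiles);
        }
        return phis;
    }

    public static void main(String[] args) {
        // Quartile boundaries: three phi values split the data into four equal-depth bins.
        System.out.println(equiDepthPhis(4)); // prints [0.25, 0.5, 0.75]
    }
}
```

Note that quantiles bins need only quantiles-1 boundaries, so the list is empty for quantiles=1.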

unknown_N_compute_B_and_K

public static long[] unknown_N_compute_B_and_K(double epsilon,
                                               double delta,
                                               int quantiles)
Deprecated. 
Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.

Parameters:
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
delta - the probability that the approximation error is more than epsilon (e.g. 0.0001) (0 <= delta <= 1). To get exact results, set delta=0.0.
quantiles - the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.
Returns:
long[4] - long[0]=the number of buffers, long[1]=the number of elements per buffer, long[2]=the tree height where sampling shall start, long[3]==1 if precomputing is better, otherwise 0.
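Because the four slots of the returned array carry different meanings, a caller may prefer to unpack them into named fields. The holder class below is a hypothetical sketch and not part of Mahout; the input array in main contains placeholder values chosen for illustration, not values computed by the factory.

```java
final class UnknownNSizing {
    final long buffers;              // long[0]
    final long elementsPerBuffer;    // long[1]
    final long samplingStartHeight;  // long[2]
    final boolean precomputeBetter;  // long[3] == 1

    private UnknownNSizing(long buffers, long elementsPerBuffer,
                           long samplingStartHeight, boolean precomputeBetter) {
        this.buffers = buffers;
        this.elementsPerBuffer = elementsPerBuffer;
        this.samplingStartHeight = samplingStartHeight;
        this.precomputeBetter = precomputeBetter;
    }

    /** Unpacks a long[4] laid out as documented for unknown_N_compute_B_and_K. */
    static UnknownNSizing fromArray(long[] result) {
        if (result == null || result.length != 4) {
            throw new IllegalArgumentException("expected a long[4]");
        }
        return new UnknownNSizing(result[0], result[1], result[2], result[3] == 1L);
    }

    public static void main(String[] args) {
        long[] raw = {8L, 500L, 3L, 1L}; // placeholder values, NOT produced by Mahout
        UnknownNSizing sizing = fromArray(raw);
        System.out.println("buffers=" + sizing.buffers
                + ", elementsPerBuffer=" + sizing.elementsPerBuffer
                + ", samplingStartHeight=" + sizing.samplingStartHeight
                + ", precomputeBetter=" + sizing.precomputeBetter);
    }
}
```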

unknown_N_compute_B_and_K_raw

protected static long[] unknown_N_compute_B_and_K_raw(double epsilon,
                                                      double delta,
                                                      int quantiles)
Deprecated. 
Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability. You never need to call this method. It is only for curious users wanting to gain some insight into the workings of the algorithms.

Parameters:
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
delta - the probability that the approximation error is more than epsilon (e.g. 0.0001) (0 <= delta <= 1). To get exact results, set delta=0.0.
quantiles - the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.
Returns:
long[4] - long[0]=the number of buffers, long[1]=the number of elements per buffer, long[2]=the tree height where sampling shall start, long[3]==1 if precomputing is better, otherwise 0.


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.