Class EmpiricalDistributionImpl

  • All Implemented Interfaces:
    java.io.Serializable, EmpiricalDistribution

    public class EmpiricalDistributionImpl
    extends java.lang.Object
    implements java.io.Serializable, EmpiricalDistribution
    Implements the EmpiricalDistribution interface. This implementation uses what amounts to the Variable Kernel Method with Gaussian smoothing:

    Digesting the input file

    1. Pass the file once to compute min and max.
    2. Divide the range from min to max into binCount "bins."
    3. Pass the data file again, computing bin counts and univariate statistics (mean, std dev.) for each of the bins.
    4. Divide the interval (0,1) into subintervals associated with the bins, with the length of a bin's subinterval proportional to its count.

    Generating random values from the distribution

    1. Generate a uniformly distributed value in (0,1).
    2. Select the subinterval to which the value belongs.
    3. Generate a random Gaussian value with mean = mean of the associated bin and std dev = std dev of the associated bin.
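
The two phases above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Commons Math source: the class name EmpiricalSketch and its fields are hypothetical, the bins are assumed to have equal width, and the data is held in an array rather than read from a file.

```java
import java.util.Random;

/** Illustrative sketch of the digest and generate phases; not the library implementation. */
final class EmpiricalSketch {
    final double min;
    final double max;
    final double delta;                   // bin width (equal-width bins are an assumption)
    final long[] counts;                  // number of values per bin
    final double[] means;                 // per-bin mean
    final double[] stdDevs;               // per-bin sample standard deviation
    final double[] generatorUpperBounds;  // cumulative bin probabilities

    EmpiricalSketch(double[] data, int binCount) {
        if (data.length == 0 || binCount < 1) {
            throw new IllegalArgumentException("need data and at least one bin");
        }
        // Pass 1: compute min and max.
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double x : data) {
            lo = Math.min(lo, x);
            hi = Math.max(hi, x);
        }
        min = lo;
        max = hi;
        delta = (max - min) / binCount;

        // Pass 2: accumulate count, sum and sum of squares for each bin.
        counts = new long[binCount];
        double[] sum = new double[binCount];
        double[] sumSq = new double[binCount];
        for (double x : data) {
            int b = binIndex(x, min, delta, binCount);
            counts[b]++;
            sum[b] += x;
            sumSq[b] += x * x;
        }
        means = new double[binCount];
        stdDevs = new double[binCount];
        generatorUpperBounds = new double[binCount];
        long running = 0;
        for (int b = 0; b < binCount; b++) {
            long n = counts[b];
            if (n > 0) {
                means[b] = sum[b] / n;
            }
            if (n > 1) {
                double var = (sumSq[b] - n * means[b] * means[b]) / (n - 1);
                stdDevs[b] = Math.sqrt(Math.max(0.0, var));
            }
            // Subinterval length is proportional to the bin count.
            running += n;
            generatorUpperBounds[b] = (double) running / data.length;
        }
        generatorUpperBounds[binCount - 1] = 1.0;
    }

    /** Bins are [min, ub0], (ub0, ub1], ..., so a value on a boundary goes to the lower bin. */
    static int binIndex(double x, double min, double delta, int binCount) {
        if (delta == 0.0) {
            return 0;
        }
        int b = (int) Math.ceil((x - min) / delta) - 1;
        return Math.max(0, Math.min(b, binCount - 1));
    }

    /** Draws u uniformly, selects the subinterval containing u, then samples a Gaussian. */
    double nextValue(Random rng) {
        double u = rng.nextDouble();
        for (int b = 0; b < counts.length; b++) {
            if (u <= generatorUpperBounds[b]) {
                return means[b] + stdDevs[b] * rng.nextGaussian();
            }
        }
        throw new IllegalStateException("no subinterval selected");
    }
}
```

The sum-of-squares variance formula keeps the sketch short; a production implementation would more likely use a numerically stable running update.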

    USAGE NOTES:

    • The binCount is set by default to 1000. A good rule of thumb is to set the bin count to approximately the number of entries in the input file divided by 10.
    • The input file must be a plain text file containing one valid numeric entry per line.
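
As a rough illustration of both usage notes, the sketch below parses lines that each hold one numeric entry and derives a bin count from the rule of thumb. The class InputFormatSketch and its methods are hypothetical helpers, not part of the library, and rejecting every non-numeric line is an assumption about how strict the format is.

```java
import java.util.List;

/** Hypothetical helpers illustrating the usage notes; not part of Commons Math. */
final class InputFormatSketch {

    /** Parses one numeric entry per line; an invalid line raises NumberFormatException. */
    static double[] parseLines(List<String> lines) {
        double[] values = new double[lines.size()];
        for (int i = 0; i < values.length; i++) {
            values[i] = Double.parseDouble(lines.get(i).trim());
        }
        return values;
    }

    /** Rule of thumb: roughly one bin per 10 entries, never fewer than 1 bin. */
    static int suggestedBinCount(int entryCount) {
        return Math.max(1, entryCount / 10);
    }
}
```

The lines could come from java.nio.file.Files.readAllLines, and the resulting count could then be passed to the EmpiricalDistributionImpl(int binCount) constructor.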

    See Also:
    Serialized Form
    • Constructor Summary

      Constructors 
      Constructor Description
      EmpiricalDistributionImpl()
      Creates a new EmpiricalDistribution with the default bin count.
      EmpiricalDistributionImpl(int binCount)
      Creates a new EmpiricalDistribution with the specified bin count.
    • Method Summary

      Modifier and Type Method Description
      int getBinCount()
      Returns the number of bins.
      java.util.List<SummaryStatistics> getBinStats()
      Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins.
      double[] getGeneratorUpperBounds()
      Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution.
      double getNextValue()
      Generates a random value from this distribution.
      StatisticalSummary getSampleStats()
      Returns a StatisticalSummary describing this distribution.
      double[] getUpperBounds()
      Returns a fresh copy of the array of upper bounds for the bins.
      boolean isLoaded()
      Property indicating whether or not the distribution has been loaded.
      void load(double[] in)
      Computes the empirical distribution from the provided array of numbers.
      void load(java.io.File file)
      Computes the empirical distribution from the input file.
      void load(java.net.URL url)
      Computes the empirical distribution using data read from a URL.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • EmpiricalDistributionImpl

        public EmpiricalDistributionImpl()
        Creates a new EmpiricalDistribution with the default bin count.
      • EmpiricalDistributionImpl

        public EmpiricalDistributionImpl(int binCount)
        Creates a new EmpiricalDistribution with the specified bin count.
        Parameters:
        binCount - number of bins
    • Method Detail

      • load

        public void load(double[] in)
        Computes the empirical distribution from the provided array of numbers.
        Specified by:
        load in interface EmpiricalDistribution
        Parameters:
        in - the input data array
      • load

        public void load(java.net.URL url)
                  throws java.io.IOException
        Computes the empirical distribution using data read from a URL.
        Specified by:
        load in interface EmpiricalDistribution
        Parameters:
        url - URL of the input file
        Throws:
        java.io.IOException - if an IO error occurs
      • load

        public void load(java.io.File file)
                  throws java.io.IOException
        Computes the empirical distribution from the input file.
        Specified by:
        load in interface EmpiricalDistribution
        Parameters:
        file - the input file
        Throws:
        java.io.IOException - if an IO error occurs
      • getNextValue

        public double getNextValue()
                            throws java.lang.IllegalStateException
        Generates a random value from this distribution.
        Specified by:
        getNextValue in interface EmpiricalDistribution
        Returns:
        the random value.
        Throws:
        java.lang.IllegalStateException - if the distribution has not been loaded
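
The load-before-use contract can be illustrated with a small stand-in class. LoadGuardSketch is hypothetical and only mirrors the documented behavior of isLoaded() and the IllegalStateException; it cycles through the loaded values instead of sampling, purely to keep the example short.

```java
/** Hypothetical stand-in showing the load-before-use contract; not the library class. */
final class LoadGuardSketch {
    private double[] data;  // null until load() has been called
    private int next;

    boolean isLoaded() {
        return data != null;
    }

    void load(double[] in) {
        if (in.length == 0) {
            throw new IllegalArgumentException("input array is empty");
        }
        data = in.clone();
        next = 0;
    }

    /** Placeholder for sampling: returns the loaded values in order, wrapping around. */
    double getNextValue() {
        if (!isLoaded()) {
            throw new IllegalStateException("distribution not loaded");
        }
        double v = data[next];
        next = (next + 1) % data.length;
        return v;
    }
}
```

A caller that cannot guarantee a prior load can check isLoaded() first rather than catching the exception.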
      • getSampleStats

        public StatisticalSummary getSampleStats()
        Returns a StatisticalSummary describing this distribution.
        Preconditions:
        • the distribution must be loaded before invoking this method
        Specified by:
        getSampleStats in interface EmpiricalDistribution
        Returns:
        the sample statistics
        Throws:
        java.lang.IllegalStateException - if the distribution has not been loaded
      • getBinCount

        public int getBinCount()
        Returns the number of bins.
        Specified by:
        getBinCount in interface EmpiricalDistribution
        Returns:
        the number of bins.
      • getUpperBounds

        public double[] getUpperBounds()

        Returns a fresh copy of the array of upper bounds for the bins. Bins are:
        [min, upperBounds[0]], (upperBounds[0], upperBounds[1]], ..., (upperBounds[binCount-2], upperBounds[binCount-1] = max].

        Note: In versions 1.0-2.0 of commons-math, this method incorrectly returned the array of probability generator upper bounds now returned by getGeneratorUpperBounds().
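
To make the bin layout concrete, here is a minimal sketch that builds such an array and looks up the bin for a value. It assumes equal-width bins between min and max; BinBoundsSketch and its methods are hypothetical names, not library API.

```java
/** Hypothetical illustration of the bin layout described for getUpperBounds(). */
final class BinBoundsSketch {

    /** Upper bounds of binCount equal-width bins; the last bound is exactly max. */
    static double[] upperBounds(double min, double max, int binCount) {
        double delta = (max - min) / binCount;
        double[] bounds = new double[binCount];
        for (int i = 0; i < binCount - 1; i++) {
            bounds[i] = min + delta * (i + 1);
        }
        bounds[binCount - 1] = max;
        return bounds;
    }

    /** Index of the bin containing x, or -1 if x exceeds the last upper bound. */
    static int binOf(double x, double[] bounds) {
        for (int i = 0; i < bounds.length; i++) {
            if (x <= bounds[i]) {
                return i;  // bins are closed on the right
            }
        }
        return -1;
    }
}
```

Because every bin is closed on the right, a value equal to an upper bound belongs to the lower of the two adjacent bins.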

        Specified by:
        getUpperBounds in interface EmpiricalDistribution
        Returns:
        array of bin upper bounds
        Since:
        2.1
      • getGeneratorUpperBounds

        public double[] getGeneratorUpperBounds()

        Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. Subintervals correspond to bins with lengths proportional to bin counts.

        In versions 1.0-2.0 of commons-math, this array was (incorrectly) returned by getUpperBounds().
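
As a sketch of how such an array relates to bin counts and how it can be used to pick a bin, the code below accumulates counts into cumulative probabilities and then locates a uniform value with a binary search. GeneratorBoundsSketch is a hypothetical name, and the binary search is an illustrative choice rather than a statement about the library's lookup strategy.

```java
/** Hypothetical illustration of generator upper bounds; not the library implementation. */
final class GeneratorBoundsSketch {

    /** Cumulative fractions of the total count; the last entry is exactly 1.0. */
    static double[] fromCounts(long[] binCounts) {
        long total = 0;
        for (long c : binCounts) {
            total += c;
        }
        double[] bounds = new double[binCounts.length];
        long running = 0;
        for (int i = 0; i < binCounts.length; i++) {
            running += binCounts[i];
            bounds[i] = (double) running / total;
        }
        bounds[bounds.length - 1] = 1.0;
        return bounds;
    }

    /** First index i with u <= bounds[i]; empty bins (zero-length subintervals) are skipped. */
    static int selectBin(double u, double[] bounds) {
        int lo = 0;
        int hi = bounds.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (u <= bounds[mid]) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }
}
```

With counts {2, 0, 6, 2} the bounds are {0.2, 0.2, 0.8, 1.0}, so the empty second bin has a zero-length subinterval and is never selected.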

        Returns:
        array of upper bounds of subintervals used in data generation
        Since:
        2.1
      • isLoaded

        public boolean isLoaded()
        Property indicating whether or not the distribution has been loaded.
        Specified by:
        isLoaded in interface EmpiricalDistribution
        Returns:
        true if the distribution has been loaded