Interface UnknownDistributionChiSquareTest

  • All Superinterfaces:
    ChiSquareTest
    All Known Implementing Classes:
    ChiSquareTestImpl

    public interface UnknownDistributionChiSquareTest
    extends ChiSquareTest
    An interface for Chi-Square tests for unknown distributions.

    Two-sample tests are used when the distribution is not known a priori but is provided by one sample; the second sample is compared against the first.

    Since:
    1.2
    • Method Detail

      • chiSquareDataSetsComparison

        double chiSquareDataSetsComparison(long[] observed1,
                                           long[] observed2)
                                    throws java.lang.IllegalArgumentException

        Computes a Chi-Square two-sample test statistic comparing bin frequency counts in observed1 and observed2. The sums of frequency counts in the two samples are not required to be the same. The formula used to compute the test statistic is

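        $$\chi^2 = \sum_i \frac{\left(K \cdot \mathrm{observed1}[i] - \mathrm{observed2}[i] / K\right)^2}{\mathrm{observed1}[i] + \mathrm{observed2}[i]}$$
        where
        $$K = \sqrt{\frac{\sum_i \mathrm{observed2}[i]}{\sum_i \mathrm{observed1}[i]}}$$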

        This statistic can be used to perform a Chi-Square test evaluating the null hypothesis that both sets of observed counts follow the same distribution.
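        A small worked instance of the formula, with counts chosen only for illustration: observed1 = (10, 20, 30) and observed2 = (30, 30, 60), so the sample totals are 60 and 120.

        $$K = \sqrt{\frac{120}{60}} = \sqrt{2}, \qquad K \cdot \mathrm{observed1}[i] - \frac{\mathrm{observed2}[i]}{K} = \frac{2\,\mathrm{observed1}[i] - \mathrm{observed2}[i]}{\sqrt{2}}$$
        $$\chi^2 = \frac{(20 - 30)^2 / 2}{40} + \frac{(40 - 30)^2 / 2}{50} + \frac{(60 - 60)^2 / 2}{90} = \frac{50}{40} + \frac{50}{50} + 0 = 2.25$$

        The third bin, where observed2 is exactly twice observed1 (the ratio of the totals), contributes nothing; this is the sense in which K compensates for unequal sample sizes.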

        Preconditions:

        • Observed counts must be non-negative.
        • Observed counts for a specific bin must not both be zero.
        • Observed counts for a specific sample must not all be 0.
        • The arrays observed1 and observed2 must have the same length and their common length must be at least 2.

        If any of the preconditions are not met, an IllegalArgumentException is thrown.
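        A minimal sketch of the statistic together with the preconditions listed above, written without the library so it can run on its own. The class TwoSampleChiSquareSketch, its statistic method and the exception messages are hypothetical; the sketch simply follows the formula term by term, computing K from the two sample totals.

```java
/**
 * Hypothetical helper, not the library's implementation: computes the
 * two-sample Chi-Square statistic and enforces the documented preconditions.
 */
final class TwoSampleChiSquareSketch {

    private TwoSampleChiSquareSketch() {}

    static double statistic(long[] observed1, long[] observed2) {
        // Precondition: same length, and at least 2 bins.
        if (observed1.length < 2 || observed1.length != observed2.length) {
            throw new IllegalArgumentException("arrays must have the same length, at least 2");
        }
        long sum1 = 0;
        long sum2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            // Precondition: counts are non-negative.
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("observed counts must be non-negative");
            }
            // Precondition: the two counts for a bin are not both zero.
            if (observed1[i] == 0 && observed2[i] == 0) {
                throw new IllegalArgumentException("both observed counts are zero in bin " + i);
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        // Precondition: neither sample consists only of zeros.
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException("observed counts of a sample must not all be zero");
        }
        // K = sqrt(sum(observed2) / sum(observed1)) rescales for unequal sample totals.
        double k = Math.sqrt((double) sum2 / sum1);
        double chiSquare = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            double dev = k * observed1[i] - observed2[i] / k;
            chiSquare += dev * dev / (observed1[i] + observed2[i]);
        }
        return chiSquare;
    }
}
```

        Proportional samples such as {10, 20} and {20, 40} give a statistic of 0 (up to rounding), because K absorbs the difference in sample size.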

        Parameters:
        observed1 - array of observed frequency counts of the first data set
        observed2 - array of observed frequency counts of the second data set
        Returns:
        chiSquare statistic
        Throws:
        java.lang.IllegalArgumentException - if preconditions are not met
      • chiSquareTestDataSetsComparison

        double chiSquareTestDataSetsComparison(long[] observed1,
                                               long[] observed2)
                                        throws java.lang.IllegalArgumentException,
                                               MathException

        Returns the observed significance level, or p-value, associated with a Chi-Square two-sample test comparing bin frequency counts in observed1 and observed2.

        The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the same distribution.

        See chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the test statistic. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.
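        The p-value is the upper-tail probability of a Chi-Square distribution with (common length - 1) degrees of freedom, evaluated at the statistic. The library's own numerical routine is not shown here; this sketch swaps in a self-contained evaluation of the regularized upper incomplete gamma function Q(df / 2, statistic / 2) (series expansion plus continued fraction, with a Lanczos log-gamma). The class ChiSquarePValueSketch, its method names and its numerical constants are assumptions, and MathException is not modeled.

```java
/**
 * Hypothetical helper, not the library's implementation: upper-tail
 * Chi-Square probability P(X >= chiSquare) for X ~ ChiSquare(df),
 * computed as the regularized upper incomplete gamma Q(df / 2, chiSquare / 2).
 */
final class ChiSquarePValueSketch {

    private static final int MAX_ITERATIONS = 1000;
    private static final double EPSILON = 1e-15;
    private static final double TINY = 1e-300;

    // Lanczos coefficients (g = 7, n = 9); adequate for x >= 0.5, i.e. df >= 1.
    private static final double[] LANCZOS = {
        0.99999999999980993, 676.5203681218851, -1259.1392167224028,
        771.32342877765313, -176.61502916214059, 12.507343278686905,
        -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7
    };

    private ChiSquarePValueSketch() {}

    /** p-value for a statistic with the given degrees of freedom (bins - 1). */
    static double pValue(double chiSquare, int degreesOfFreedom) {
        if (degreesOfFreedom < 1) {
            throw new IllegalArgumentException("degrees of freedom must be at least 1");
        }
        return regularizedGammaQ(degreesOfFreedom / 2.0, chiSquare / 2.0);
    }

    static double logGamma(double x) {
        x -= 1.0;
        double sum = LANCZOS[0];
        for (int i = 1; i < LANCZOS.length; i++) {
            sum += LANCZOS[i] / (x + i);
        }
        double t = x + 7.5;
        return 0.5 * Math.log(2 * Math.PI) + (x + 0.5) * Math.log(t) - t + Math.log(sum);
    }

    static double regularizedGammaQ(double a, double x) {
        if (x <= 0.0) {
            return 1.0; // the whole distribution lies at or above 0
        }
        double prefactor = Math.exp(-x + a * Math.log(x) - logGamma(a));
        if (x < a + 1.0) {
            // Series for the lower function P(a, x); then Q = 1 - P.
            double ap = a;
            double term = 1.0 / a;
            double sum = term;
            for (int n = 1; n <= MAX_ITERATIONS; n++) {
                ap += 1.0;
                term *= x / ap;
                sum += term;
                if (Math.abs(term) < Math.abs(sum) * EPSILON) {
                    break;
                }
            }
            return 1.0 - sum * prefactor;
        }
        // Continued fraction for Q(a, x), evaluated with the modified Lentz method.
        double b = x + 1.0 - a;
        double c = 1.0 / TINY;
        double d = 1.0 / b;
        double h = d;
        for (int i = 1; i <= MAX_ITERATIONS; i++) {
            double an = -i * (i - a);
            b += 2.0;
            d = an * d + b;
            if (Math.abs(d) < TINY) {
                d = TINY;
            }
            c = b + an / c;
            if (Math.abs(c) < TINY) {
                c = TINY;
            }
            d = 1.0 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1.0) < EPSILON) {
                break;
            }
        }
        return prefactor * h;
    }
}
```

        For 2 degrees of freedom the result reduces to exp(-statistic / 2), which makes a convenient sanity check: a statistic of 2.25 on three bins gives a p-value of about 0.325.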

        Preconditions:
        • Observed counts must be non-negative.
        • Observed counts for a specific bin must not both be zero.
        • Observed counts for a specific sample must not all be 0.
        • The arrays observed1 and observed2 must have the same length and their common length must be at least 2.

        If any of the preconditions are not met, an IllegalArgumentException is thrown.

        Parameters:
        observed1 - array of observed frequency counts of the first data set
        observed2 - array of observed frequency counts of the second data set
        Returns:
        p-value
        Throws:
        java.lang.IllegalArgumentException - if preconditions are not met
        MathException - if an error occurs computing the p-value
      • chiSquareTestDataSetsComparison

        boolean chiSquareTestDataSetsComparison(long[] observed1,
                                                long[] observed2,
                                                double alpha)
                                         throws java.lang.IllegalArgumentException,
                                                MathException

        Performs a Chi-Square two-sample test comparing two binned data sets. The test evaluates the null hypothesis that the two lists of observed counts conform to the same frequency distribution, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.

        See chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the Chi-Square statistic used in the test. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.

        Preconditions:
        • Observed counts must be non-negative.
        • Observed counts for a specific bin must not both be zero.
        • Observed counts for a specific sample must not all be 0.
        • The arrays observed1 and observed2 must have the same length and their common length must be at least 2.
        • 0 < alpha < 0.5

        If any of the preconditions are not met, an IllegalArgumentException is thrown.
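        The boolean variant amounts to comparing the p-value with alpha. A minimal sketch of that decision step, assuming the rule "reject when the p-value is strictly below alpha" and enforcing the documented range 0 < alpha < 0.5; the class ChiSquareDecisionSketch and its method are hypothetical names, and the p-value is taken as an input rather than recomputed.

```java
/**
 * Hypothetical helper: decision rule for the two-sample test, assuming
 * rejection when the observed significance level is strictly below alpha.
 */
final class ChiSquareDecisionSketch {

    private ChiSquareDecisionSketch() {}

    static boolean rejectNullHypothesis(double pValue, double alpha) {
        // Documented precondition: 0 < alpha < 0.5.
        if (!(alpha > 0.0 && alpha < 0.5)) {
            throw new IllegalArgumentException("alpha must satisfy 0 < alpha < 0.5: " + alpha);
        }
        // Rejecting with confidence 1 - alpha means the p-value falls below alpha.
        return pValue < alpha;
    }
}
```

        For example, a p-value of about 0.325 does not lead to rejection at alpha = 0.05, while a p-value of 0.01 does.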

        Parameters:
        observed1 - array of observed frequency counts of the first data set
        observed2 - array of observed frequency counts of the second data set
        alpha - significance level of the test
        Returns:
        true iff null hypothesis can be rejected with confidence 1 - alpha
        Throws:
        java.lang.IllegalArgumentException - if preconditions are not met
        MathException - if an error occurs performing the test