Class SweetSpotSimilarity


  • public class SweetSpotSimilarity
    extends DefaultSimilarity

    A similarity with a lengthNorm that provides for a "plateau" of equally good lengths, and tf helper functions.

    For lengthNorm, a min/max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm drops off according to a sqrt function.

    For tf, baselineTf and hyperbolicTf functions are provided; subclasses can choose between them.

    See Also:
    A Gnuplot file used to generate some of the visualizations referenced from each function.
    • Constructor Detail

      • SweetSpotSimilarity

        public SweetSpotSimilarity()
    • Method Detail

      • setBaselineTfFactors

        public void setBaselineTfFactors(float base,
                                         float min)
        Sets the baseline and minimum function variables for baselineTf.
        See Also:
        baselineTf(float)
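The shape of baselineTf is not spelled out on this page. The sketch below assumes (check baselineTf(float) to confirm) that the curve returns base for every frequency up to min, then grows as sqrt(freq + base^2 - min), which meets the flat part at freq == min, and that a frequency of 0 maps to 0. BaselineTfSketch is a hypothetical standalone class, not Lucene's code.

```java
// Hypothetical standalone sketch of a baselineTf-style curve; not Lucene's code.
// Assumption: flat at `base` up to `min`, sqrt growth above it, and 0 for freq == 0.
final class BaselineTfSketch {
    private BaselineTfSketch() {}

    static float baselineTf(float freq, float base, float min) {
        if (freq == 0.0f) {
            return 0.0f; // a term that does not occur contributes nothing
        }
        if (freq <= min) {
            return base; // the flat "baseline" region
        }
        // For a non-negative base, this equals sqrt(base * base) == base at freq == min,
        // so the two pieces join without a jump.
        return (float) Math.sqrt(freq + base * base - min);
    }

    public static void main(String[] args) {
        // base = 1.0, min = 4.0: frequencies 1..4 all score 1.0, then the curve rises.
        for (int freq = 0; freq <= 8; freq++) {
            System.out.println(freq + " -> " + baselineTf(freq, 1.0f, 4.0f));
        }
    }
}
```

With base = 1.0 and min = 4.0, a frequency of 7 gives sqrt(7 + 1 - 4) = 2.0.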
      • setHyperbolicTfFactors

        public void setHyperbolicTfFactors(float min,
                                           float max,
                                           double base,
                                           float xoffset)
        Sets the function variables for the hyperbolicTf function.
        Parameters:
        min - the minimum tf value to ever be returned (default: 0.0)
        max - the maximum tf value to ever be returned (default: 2.0)
        base - the base value to be used in the exponential for the hyperbolic function (default: 1.3)
        xoffset - the midpoint of the hyperbolic function (default: 10.0)
        See Also:
        hyperbolicTf(float)
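The four parameters describe an S-shaped curve bounded by min and max and centred on xoffset. The sketch below assumes the curve is a hyperbolic tangent, (base^x - base^-x) / (base^x + base^-x) with x = freq - xoffset, rescaled from (-1, 1) onto (min, max), and that a frequency of 0 maps to 0. It computes the ratio as Math.tanh(x * ln(base)), which is the same value but cannot overflow. HyperbolicTfSketch is hypothetical, not Lucene's code.

```java
// Hypothetical standalone sketch of a hyperbolicTf-style curve; not Lucene's code.
// Assumption: a hyperbolic tangent rescaled to run from `min` to `max`,
// centred on `xoffset`, with `base` as the exponential base:
//   (base^x - base^-x) / (base^x + base^-x) == tanh(x * ln(base)).
final class HyperbolicTfSketch {
    private HyperbolicTfSketch() {}

    // Defaults listed for setHyperbolicTfFactors.
    static final float MIN = 0.0f;
    static final float MAX = 2.0f;
    static final double BASE = 1.3;
    static final float XOFFSET = 10.0f;

    static float hyperbolicTf(float freq, float min, float max, double base, float xoffset) {
        if (freq == 0.0f) {
            return 0.0f; // assumed: an absent term contributes nothing
        }
        double x = freq - xoffset;                 // the curve's midpoint sits at xoffset
        double t = Math.tanh(x * Math.log(base)); // in [-1, 1]
        return min + (float) ((max - min) / 2.0 * (t + 1.0)); // map [-1, 1] onto [min, max]
    }

    public static void main(String[] args) {
        for (float freq : new float[] {1f, 5f, 10f, 15f, 30f}) {
            System.out.println(freq + " -> " + hyperbolicTf(freq, MIN, MAX, BASE, XOFFSET));
        }
    }
}
```

With the defaults, a frequency equal to xoffset (10) lands exactly halfway between min and max, at 1.0, and large frequencies approach but never exceed 2.0.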
      • setLengthNormFactors

        public void setLengthNormFactors(int min,
                                         int max,
                                         float steepness,
                                         boolean discountOverlaps)
        Sets the default function variables used by lengthNorm when no field-specific variables have been set.
        See Also:
        computeLengthNorm(int)
      • lengthNorm

        public float lengthNorm(FieldInvertState state)
        Implemented as state.getBoost() * computeLengthNorm(numTokens), where numTokens excludes overlap tokens when discountOverlaps is true, either as the default or for this specific field.
        Overrides:
        lengthNorm in class DefaultSimilarity
        Parameters:
        state - statistics of the current field (such as length, boost, etc.)
        Returns:
        an index-time normalization value
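How lengthNorm combines the boost with computeLengthNorm can be sketched without Lucene. FieldStats below is a hypothetical stand-in for FieldInvertState that holds only a token count, a count of overlap tokens (tokens with a position increment of 0, such as injected synonyms) and the field boost; min, max and steepness are passed explicitly rather than read from setLengthNormFactors.

```java
// Hypothetical standalone sketch of the lengthNorm described above; not Lucene's code.
final class LengthNormSketch {
    private LengthNormSketch() {}

    // Hypothetical stand-in for the statistics FieldInvertState provides.
    static final class FieldStats {
        final int length;     // number of tokens indexed for the field
        final int numOverlap; // tokens with a position increment of 0 (e.g. synonyms)
        final float boost;    // index-time boost of the field

        FieldStats(int length, int numOverlap, float boost) {
            this.length = length;
            this.numOverlap = numOverlap;
            this.boost = boost;
        }
    }

    // 1/sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1), as documented for computeLengthNorm.
    static float computeLengthNorm(int numTerms, int min, int max, float steepness) {
        int distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0f));
    }

    // boost * computeLengthNorm(numTokens), optionally ignoring overlap tokens.
    static float lengthNorm(FieldStats state, int min, int max, float steepness, boolean discountOverlaps) {
        int numTokens = discountOverlaps ? state.length - state.numOverlap : state.length;
        return state.boost * computeLengthNorm(numTokens, min, max, steepness);
    }
}
```

For a field with 12 tokens, 3 of them overlaps, and a boost of 2.0, a plateau of 3 to 9 terms with steepness 0.5 gives 2.0 when overlaps are discounted (9 tokens, inside the plateau) and 1.0 when they are counted (12 tokens, norm 1/sqrt(4) = 0.5).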
      • computeLengthNorm

        public float computeLengthNorm(int numTerms)
        Implemented as: 1/sqrt(steepness * (abs(x - min) + abs(x - max) - (max - min)) + 1), where x is numTerms.

        This degrades to 1/sqrt(x) (for x >= 1) when min and max are both 1 and steepness is 0.5.

        TODO: a potential optimization is to return 1.0f directly when numTerms is between min and max.

        See Also:
        setLengthNormFactors(int, int, float, boolean), An SVG visualization of this function
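The formula can be checked numerically. The sketch below transcribes it directly, adds a variant with the early return suggested by the TODO, and prints the plateau for min = 3, max = 5, steepness = 0.5. ComputeLengthNormSketch and the chosen values are illustrative only, not Lucene's code.

```java
// Hypothetical standalone sketch of computeLengthNorm; not Lucene's code.
final class ComputeLengthNormSketch {
    private ComputeLengthNormSketch() {}

    // Direct transcription of 1/sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1).
    static float computeLengthNorm(int numTerms, int min, int max, float steepness) {
        int distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0f));
    }

    // The TODO's shortcut, assuming min <= max: inside [min, max] the distance
    // term is 0, so the norm is exactly 1.0f and the sqrt can be skipped.
    static float computeLengthNormShortcut(int numTerms, int min, int max, float steepness) {
        if (numTerms >= min && numTerms <= max) {
            return 1.0f;
        }
        return computeLengthNorm(numTerms, min, max, steepness);
    }

    public static void main(String[] args) {
        // Plateau from 3 to 5 terms with steepness 0.5.
        for (int n = 1; n <= 9; n++) {
            System.out.println(n + " -> " + computeLengthNorm(n, 3, 5, 0.5f));
        }
    }
}
```

Lengths 3 to 5 all score 1.0; a 9-term field scores 1/sqrt(0.5 * 8 + 1) = 1/sqrt(5), and a 1-term field scores 1/sqrt(3). Setting min = max = 1 with steepness 0.5 reproduces 1/sqrt(x), for example 0.5 at x = 4.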
      • tf

        public float tf(float freq)
        Delegates to baselineTf.
        Overrides:
        tf in class DefaultSimilarity
        Parameters:
        freq - the frequency of a term within a document
        Returns:
        a score factor based on a term's within-document frequency
        See Also:
        baselineTf(float)
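Because tf simply delegates, a subclass can pick the other curve by overriding tf. The sketch below shows that design with hypothetical, Lucene-free stand-ins: SweetSpotTfSketch plays the role of the base class and HyperbolicSweetSpotSketch overrides tf to call hyperbolicTf. The curve bodies repeat the assumptions stated for baselineTf and hyperbolicTf, and the initial baseline values (base = 0, min = 0) are chosen for this sketch only; with them the baseline curve reduces to sqrt(freq), the same as DefaultSimilarity's tf.

```java
// Hypothetical, Lucene-free stand-ins illustrating the delegation design; not Lucene's code.
class SweetSpotTfSketch {
    protected float tfBase = 0.0f; // assumed initial values for this sketch only
    protected float tfMin = 0.0f;

    // Default behaviour: delegate to baselineTf.
    public float tf(float freq) {
        return baselineTf(freq);
    }

    // Assumed shape: flat at tfBase up to tfMin, then sqrt growth; 0 for freq == 0.
    public float baselineTf(float freq) {
        if (freq == 0.0f) {
            return 0.0f;
        }
        return freq <= tfMin ? tfBase : (float) Math.sqrt(freq + tfBase * tfBase - tfMin);
    }

    // Assumed tanh shape using the documented defaults: min 0.0, max 2.0, base 1.3, xoffset 10.0.
    public float hyperbolicTf(float freq) {
        if (freq == 0.0f) {
            return 0.0f;
        }
        double t = Math.tanh((freq - 10.0) * Math.log(1.3));
        return 0.0f + (float) ((2.0 - 0.0) / 2.0 * (t + 1.0));
    }
}

// A subclass that prefers the bounded, S-shaped curve over the baseline.
class HyperbolicSweetSpotSketch extends SweetSpotTfSketch {
    @Override
    public float tf(float freq) {
        return hyperbolicTf(freq);
    }
}
```

The trade-off is visible at high frequencies: the baseline curve keeps growing (sqrt(1000) is about 31.6), while the hyperbolic one never exceeds its max of 2.0.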