Class Similarity

    • Constructor Detail

      • Similarity

        public Similarity()
        Sole constructor. (For invocation by subclass constructors, typically implicit.)
    • Method Detail

      • coord

        public float coord(int overlap,
                           int maxOverlap)
        Hook to integrate coordinate-level matching.

        By default this is disabled (returns 1), because with most modern models it only skews performance. Some implementations, such as TFIDFSimilarity, override it.

        Parameters:
        overlap - the number of query terms matched in the document
        maxOverlap - the total number of terms in the query
        Returns:
        a score factor based on term overlap with the query
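
        The two behaviours described for coord can be illustrated with a small stand-alone sketch. CoordSketch and its method names are hypothetical and not part of Lucene. The overlap-fraction formula is one common choice in TF-IDF style scoring and is shown here as an assumption, not as the exact code of any particular implementation.

```java
// Hypothetical stand-alone sketch; it does not extend or call Lucene classes.
final class CoordSketch {

    /** Default behaviour described above: coordination is disabled. */
    static float disabledCoord(int overlap, int maxOverlap) {
        return 1f;
    }

    /** Assumed TF-IDF style variant: the fraction of query terms that matched. */
    static float fractionalCoord(int overlap, int maxOverlap) {
        if (maxOverlap <= 0) {
            return 1f; // guard chosen for this sketch only
        }
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // A document matching 2 of 4 query terms.
        System.out.println(disabledCoord(2, 4));
        System.out.println(fractionalCoord(2, 4));
    }
}
```

        With the fractional variant, a document that matches every query term keeps its full score, while partial matches are scaled down in proportion to the terms they miss.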
      • queryNorm

        public float queryNorm(float valueForNormalization)
        Computes the normalization value for a query, given the sum of the normalized weights (Similarity.SimWeight.getValueForNormalization()) of each of the query terms. This value is passed back to the weight (Similarity.SimWeight.normalize(float, float)) of each query term, providing a hook that attempts to make scores from different queries comparable.

        By default this is disabled (returns 1), but some implementations such as TFIDFSimilarity override this.

        Parameters:
        valueForNormalization - the sum of the term normalization values
        Returns:
        a normalization factor for query weights
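
        As a rough illustration of the hook, the sketch below contrasts the disabled default with an inverse-square-root normalization. QueryNormSketch is hypothetical and not part of Lucene. Treating valueForNormalization as a sum of squared term weights is an assumption made for this example.

```java
// Hypothetical stand-alone sketch; it does not extend or call Lucene classes.
final class QueryNormSketch {

    /** Default behaviour described above: query normalization is disabled. */
    static float disabledQueryNorm(float valueForNormalization) {
        return 1f;
    }

    /** Assumed variant: 1 / sqrt(sum of squared term weights). */
    static float inverseSqrtQueryNorm(float valueForNormalization) {
        return (float) (1.0 / Math.sqrt(valueForNormalization));
    }

    public static void main(String[] args) {
        // Two term weights, 3 and 4, give a sum of squares of 25.
        float sumOfSquares = 3f * 3f + 4f * 4f;
        float norm = inverseSqrtQueryNorm(sumOfSquares);
        // Each weight is multiplied by the returned factor.
        System.out.println(3f * norm);
        System.out.println(4f * norm);
    }
}
```

        Under this assumption the normalized weights form a unit-length vector, which is what makes scores from queries with different numbers of terms easier to compare.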
      • computeNorm

        public abstract long computeNorm(FieldInvertState state)
        Computes the normalization value for a field, given the accumulated state of term processing for this field (see FieldInvertState).

        Matches in longer fields are less precise, so implementations of this method usually set smaller values when state.getLength() is large, and larger values when state.getLength() is small.

        Parameters:
        state - current processing state for this field
        Returns:
        computed norm value
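
        The relationship between field length and norm can be sketched as follows. NormSketch is hypothetical and not part of Lucene. The 1 / sqrt(length) formula and the storage of raw float bits in the returned long are assumptions chosen for illustration; they do not reproduce the encoding of any real implementation.

```java
// Hypothetical stand-alone sketch; it does not extend or call Lucene classes.
final class NormSketch {

    /** Assumed length normalization: longer fields get smaller values. */
    static float lengthNorm(int fieldLength) {
        int length = Math.max(1, fieldLength); // avoid division by zero
        return (float) (1.0 / Math.sqrt(length));
    }

    /** Packs the float into a long by keeping its raw 32 bits (sketch only). */
    static long encode(float norm) {
        return Float.floatToIntBits(norm) & 0xFFFFFFFFL;
    }

    /** Reverses encode(). */
    static float decode(long stored) {
        return Float.intBitsToFloat((int) stored);
    }

    /** Stand-in for computeNorm, taking the field length directly. */
    static long computeNorm(int fieldLength) {
        return encode(lengthNorm(fieldLength));
    }

    public static void main(String[] args) {
        System.out.println(decode(computeNorm(4)));
        System.out.println(decode(computeNorm(100)));
    }
}
```

        In a real subclass the length would come from state.getLength() rather than from a plain int argument.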
      • computeWeight

        public abstract Similarity.SimWeight computeWeight(float queryBoost,
                                                           CollectionStatistics collectionStats,
                                                           TermStatistics... termStats)
        Computes any collection-level weight (e.g. IDF, average document length) needed for scoring a query.
        Parameters:
        queryBoost - the query-time boost.
        collectionStats - collection-level statistics, such as the number of tokens in the collection.
        termStats - term-level statistics, such as the document frequency of a term across the collection.
        Returns:
        SimWeight object with the information this Similarity needs to score a query.
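
        The kind of precomputation computeWeight performs can be sketched with plain numbers in place of CollectionStatistics and TermStatistics. WeightSketch, its fields, and its parameters are hypothetical and not part of Lucene. The IDF formula, 1 + ln(docCount / (docFreq + 1)), is a classic TF-IDF style choice, and summing the IDF over several terms is a further assumption of this sketch.

```java
// Hypothetical stand-alone sketch; it does not extend or call Lucene classes.
final class WeightSketch {
    final float queryBoost;
    final float idfSum;          // summed IDF of all query terms
    final float avgFieldLength;  // average number of tokens per document

    private WeightSketch(float queryBoost, float idfSum, float avgFieldLength) {
        this.queryBoost = queryBoost;
        this.idfSum = idfSum;
        this.avgFieldLength = avgFieldLength;
    }

    /** Assumed IDF: rarer terms (small docFreq) receive larger values. */
    static float idfOf(long docFreq, long docCount) {
        return (float) (Math.log(docCount / (double) (docFreq + 1)) + 1.0);
    }

    /** Gathers the collection-level values once, before any document is scored. */
    static WeightSketch compute(float queryBoost, long docCount,
                                long sumTotalTermFreq, long... docFreqs) {
        float idfSum = 0f;
        for (long docFreq : docFreqs) {
            idfSum += idfOf(docFreq, docCount);
        }
        float avg = docCount > 0 ? (float) sumTotalTermFreq / docCount : 1f;
        return new WeightSketch(queryBoost, idfSum, avg);
    }

    /** Query-level weight that a scorer would later combine with per-document data. */
    float value() {
        return queryBoost * idfSum;
    }

    public static void main(String[] args) {
        // 1000 documents, 50000 tokens in total, one term found in 10 documents.
        WeightSketch w = compute(1.5f, 1000L, 50000L, 10L);
        System.out.println(w.value());
        System.out.println(w.avgFieldLength);
    }
}
```

        Computing these values once per query keeps the per-document scoring loop free of collection-wide lookups.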