Class SimilarityBase
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.SimilarityBase
- Direct Known Subclasses: DFRSimilarity, IBSimilarity, LMSimilarity
public abstract class SimilarityBase extends Similarity
A subclass of Similarity that provides a simplified API for its descendants. Subclasses are only required to implement the score(org.apache.lucene.search.similarities.BasicStats, float, float) and toString() methods. Implementing explain(Explanation, BasicStats, int, float, float) is optional, since SimilarityBase already provides a basic explanation of the score and the term frequency. However, implementers of a subclass are encouraged to include as much detail about the scoring method as possible.
Note: multi-word queries such as phrase queries are scored differently than in Lucene's default ranking algorithm. Whereas the default algorithm "fakes" an IDF value for the phrase as a whole (since it does not know it), this class instead scores phrases as a summation of the individual term scores.
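The phrase-scoring note can be illustrated with a minimal, Lucene-free sketch. Everything here is hypothetical: `PhraseScoreSketch`, `TermStatsSketch`, and the toy `termScore` formula are stand-ins and not part of Lucene, and the sketch assumes each term is scored with the same phrase frequency and document length before the results are summed.

```java
/** Illustrative only: none of these names belong to Lucene. */
final class PhraseScoreSketch {

    /** Hypothetical per-term statistics (a stand-in, not Lucene's BasicStats). */
    static final class TermStatsSketch {
        final long docFreq;      // number of documents containing the term
        final long numberOfDocs; // number of documents in the collection

        TermStatsSketch(long docFreq, long numberOfDocs) {
            this.docFreq = docFreq;
            this.numberOfDocs = numberOfDocs;
        }
    }

    /** Toy per-term score: an IDF-like weight times length-normalized frequency. */
    static double termScore(TermStatsSketch stats, double freq, double docLen) {
        double idfLike = Math.log(1.0 + (double) stats.numberOfDocs / stats.docFreq);
        return idfLike * freq / docLen;
    }

    /** Phrase score: the sum of each term's score for the same freq and docLen. */
    static double phraseScore(TermStatsSketch[] terms, double phraseFreq, double docLen) {
        double sum = 0.0;
        for (TermStatsSketch term : terms) {
            sum += termScore(term, phraseFreq, docLen);
        }
        return sum;
    }

    public static void main(String[] args) {
        TermStatsSketch[] phrase = {
            new TermStatsSketch(10, 1000),
            new TermStatsSketch(200, 1000)
        };
        System.out.println(phraseScore(phrase, 2.0, 50.0));
    }
}
```

No phrase-level IDF is estimated in this sketch; rarer terms simply contribute larger summands.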
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer, Similarity.SimWeight
Constructor Summary
| Constructor | Description |
| --- | --- |
| SimilarityBase() | Sole constructor. |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| long | computeNorm(FieldInvertState state) | Encodes the document length in the same way as TFIDFSimilarity. |
| Similarity.SimWeight | computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats) | Compute any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query. |
| boolean | getDiscountOverlaps() | Returns true if overlap tokens are discounted from the document's length. |
| static double | log2(double x) | Returns the base two logarithm of x. |
| void | setDiscountOverlaps(boolean v) | Determines whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm. |
| Similarity.SimScorer | simScorer(Similarity.SimWeight stats, AtomicReaderContext context) | Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index. |
| abstract java.lang.String | toString() | Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well. |
Methods inherited from class org.apache.lucene.search.similarities.Similarity
coord, queryNorm
Method Detail
-
setDiscountOverlaps
public void setDiscountOverlaps(boolean v)
Determines whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms.
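The effect of this flag on document length can be sketched without Lucene. The class `OverlapLengthSketch` and its method are hypothetical names used only for illustration; the sketch assumes the length is a plain token count and that an overlap token is any token whose position increment is 0 (for example, a synonym injected at the same position). It does not reproduce how the norm is subsequently encoded.

```java
/** Illustrative only: a stand-in for the length computation, not Lucene code. */
final class OverlapLengthSketch {

    /**
     * Counts the tokens that contribute to a field's length.
     * positionIncrements[i] is the position increment of the i-th token;
     * an increment of 0 marks an overlap token.
     */
    static int effectiveLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = 0;
        for (int increment : positionIncrements) {
            boolean overlap = increment == 0;
            // Skip overlap tokens only when discounting is enabled.
            if (!(discountOverlaps && overlap)) {
                length++;
            }
        }
        return length;
    }

    public static void main(String[] args) {
        int[] increments = {1, 0, 1, 1, 0}; // five tokens, two of them overlaps
        System.out.println(effectiveLength(increments, true));
        System.out.println(effectiveLength(increments, false));
    }
}
```

With discounting enabled, a field that gains synonyms at analysis time is not treated as longer than the same field without them.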
-
getDiscountOverlaps
public boolean getDiscountOverlaps()
Returns true if overlap tokens are discounted from the document's length.
- See Also: setDiscountOverlaps(boolean)
-
computeWeight
public final Similarity.SimWeight computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats)
Description copied from class: Similarity
Compute any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.
- Specified by: computeWeight in class Similarity
- Parameters:
  queryBoost - the query-time boost.
  collectionStats - collection-level statistics, such as the number of tokens in the collection.
  termStats - term-level statistics, such as the document frequency of a term across the collection.
- Returns: SimWeight object with the information this Similarity needs to score a query.
-
simScorer
public Similarity.SimScorer simScorer(Similarity.SimWeight stats, AtomicReaderContext context) throws java.io.IOException
Description copied from class: Similarity
Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index.
- Specified by: simScorer in class Similarity
- Parameters:
  stats - collection information from Similarity.computeWeight(float, CollectionStatistics, TermStatistics...)
  context - segment of the inverted index to be scored.
- Returns: SimScorer for scoring documents across context
- Throws: java.io.IOException - if there is a low-level I/O error
-
toString
public abstract java.lang.String toString()
Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
- Overrides: toString in class java.lang.Object
-
computeNorm
public long computeNorm(FieldInvertState state)
Encodes the document length in the same way as TFIDFSimilarity.
- Specified by: computeNorm in class Similarity
- Parameters:
  state - current processing state for this field
- Returns: computed norm value
-
log2
public static double log2(double x)
Returns the base two logarithm of x.
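A base two logarithm can be derived from the natural logarithm with the change-of-base identity log2(x) = ln(x) / ln(2). The following is a minimal standalone sketch of that identity using only `java.lang.Math`; the class name `Log2Sketch` is hypothetical, and the sketch is not claimed to be the library's own implementation.

```java
/** Illustrative only: change-of-base computation of a base two logarithm. */
final class Log2Sketch {

    // ln(2), computed once so each call needs a single Math.log evaluation.
    private static final double LOG_2 = Math.log(2.0);

    /** Returns the base two logarithm of x via ln(x) / ln(2). */
    static double log2(double x) {
        return Math.log(x) / LOG_2;
    }

    public static void main(String[] args) {
        System.out.println(log2(8.0));
        System.out.println(log2(0.5));
    }
}
```

Because the result passes through floating-point division, values that are exact powers of two may differ from the integer exponent by a rounding error, so comparisons should use a small tolerance.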