Class CommonTermsQuery
java.lang.Object
    org.apache.lucene.search.Query
        org.apache.lucene.queries.CommonTermsQuery
- All Implemented Interfaces:
java.lang.Cloneable
public class CommonTermsQuery extends Query
A query that executes high-frequency terms in an optional sub-query to prevent slow queries caused by "common" terms such as stopwords. This query builds two queries from the added terms: low-frequency terms are added to a required boolean clause and high-frequency terms are added to an optional boolean clause. The optional clause is only executed if the required "low-frequency" clause matches.
Scores produced by this query will differ slightly from those of a plain BooleanQuery scorer, mainly due to differences in the number of leaf queries in the required boolean clause. In most cases, high-frequency terms are unlikely to contribute significantly to the document score unless at least one of the low-frequency terms is matched.
Where applicable, this query can improve query execution times significantly. CommonTermsQuery has several advantages over stopword filtering at index or query time: a term is "classified" based on its actual document frequency in the index, so slow queries can be prevented even across domains without specialized stopword files.
Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e., all high-frequency terms must match in order to match a document.
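The matching rule described above can be modeled without Lucene on the classpath. The following is a minimal sketch, not Lucene source code: the class and method names are hypothetical, terms are plain strings, and it assumes the low-frequency clause uses SHOULD with no minimum-should-match setting. It ignores scoring entirely and only shows which documents match.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone model; none of these names are Lucene classes.
class CommonTermsMatchSketch {

    /**
     * Returns true if a document containing docTerms would match, assuming
     * lowFreqOccur = SHOULD and no minimum-should-match settings (assumptions
     * of this sketch).
     */
    static boolean matches(Set<String> docTerms, List<String> lowFreq, List<String> highFreq) {
        if (lowFreq.isEmpty()) {
            // Only high-frequency terms: plain conjunction, every term must match.
            return docTerms.containsAll(highFreq);
        }
        // Required clause: at least one low-frequency term must be present.
        // High-frequency terms are optional and do not affect matching here.
        for (String term : lowFreq) {
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("the", "quick", "fox"));
        // Low-frequency term "fox" is present, so the document matches.
        System.out.println(matches(doc, Arrays.asList("fox"), Arrays.asList("the")));
        // Only the high-frequency term "the" is present, so it does not match.
        System.out.println(matches(doc, Arrays.asList("dog"), Arrays.asList("the")));
        // Query has only high-frequency terms; "a" is missing, so no match.
        System.out.println(matches(doc, Collections.<String>emptyList(), Arrays.asList("the", "a")));
    }
}
```

In this model a document that contains only common terms is never visited through the optional clause, which is where the speed-up described above comes from.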
-
-
Constructor Summary
Constructors (Constructor, Description):
CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency)
    Creates a new CommonTermsQuery
CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency, boolean disableCoord)
    Creates a new CommonTermsQuery
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
void add(Term term)
    Adds a term to the CommonTermsQuery.
void collectTermContext(IndexReader reader, java.util.List<AtomicReaderContext> leaves, TermContext[] contextArray, Term[] queryTerms)
boolean equals(java.lang.Object obj)
void extractTerms(java.util.Set<Term> terms)
    Expert: adds all terms occurring in this query to the terms set.
float getHighFreqMinimumNumberShouldMatch()
    Gets the minimum number of the optional high-frequency BooleanClauses which must be satisfied.
float getLowFreqMinimumNumberShouldMatch()
    Gets the minimum number of the optional low-frequency BooleanClauses which must be satisfied.
int hashCode()
boolean isCoordDisabled()
    Returns true iff Similarity.coord(int,int) is disabled in scoring for the high and low frequency query instance.
Query rewrite(IndexReader reader)
    Expert: called to re-write queries into primitive queries.
void setHighFreqMinimumNumberShouldMatch(float min)
    Specifies a minimum number of the high-frequency optional BooleanClauses which must be satisfied in order to produce a match on the high-frequency terms query part.
void setLowFreqMinimumNumberShouldMatch(float min)
    Specifies a minimum number of the low-frequency optional BooleanClauses which must be satisfied in order to produce a match on the low-frequency terms query part.
java.lang.String toString(java.lang.String field)
    Prints a query to a string, with field assumed to be the default field and omitted.
-
-
-
Constructor Detail
-
CommonTermsQuery
public CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency)
Creates a new CommonTermsQuery
- Parameters:
highFreqOccur - BooleanClause.Occur used for high frequency terms
lowFreqOccur - BooleanClause.Occur used for low frequency terms
maxTermFrequency - a value in [0..1) (or absolute number >=1) representing the maximum threshold of a term's document frequency to be considered a low frequency term.
- Throws:
java.lang.IllegalArgumentException - if BooleanClause.Occur.MUST_NOT is passed as lowFreqOccur or highFreqOccur
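The two interpretations of maxTermFrequency (a fraction of the index below 1, an absolute document count at or above 1) can be sketched as follows. This is a simplified model rather than Lucene source code: the class and method names are hypothetical, and rounding the fractional threshold up with Math.ceil is an assumption of the sketch.

```java
// Hypothetical stand-alone model; not Lucene source code.
class TermFrequencyThresholdSketch {

    /**
     * Classifies a term as high frequency.
     *
     * @param docFreq          number of documents that contain the term
     * @param maxDoc           number of documents in the index
     * @param maxTermFrequency fraction in [0..1) or absolute count >= 1
     */
    static boolean isHighFrequency(int docFreq, int maxDoc, float maxTermFrequency) {
        if (maxTermFrequency >= 1f) {
            // Absolute threshold: compare the document frequency directly.
            return docFreq > maxTermFrequency;
        }
        // Fractional threshold: scale by the index size.
        // Rounding up is an assumption of this sketch.
        int limit = (int) Math.ceil(maxTermFrequency * (float) maxDoc);
        return docFreq > limit;
    }
}
```

For example, with 1000 documents and maxTermFrequency = 0.25f, a term found in 500 documents is classified as high frequency, while a term found in 250 documents stays in the low-frequency clause.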
-
CommonTermsQuery
public CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency, boolean disableCoord)
Creates a new CommonTermsQuery
- Parameters:
highFreqOccur - BooleanClause.Occur used for high frequency terms
lowFreqOccur - BooleanClause.Occur used for low frequency terms
maxTermFrequency - a value in [0..1) (or absolute number >=1) representing the maximum threshold of a term's document frequency to be considered a low frequency term.
disableCoord - disables Similarity.coord(int,int) in scoring for the low / high frequency sub-queries
- Throws:
java.lang.IllegalArgumentException - if BooleanClause.Occur.MUST_NOT is passed as lowFreqOccur or highFreqOccur
-
-
Method Detail
-
add
public void add(Term term)
Adds a term to the CommonTermsQuery
- Parameters:
term - the term to add
-
rewrite
public Query rewrite(IndexReader reader) throws java.io.IOException
Description copied from class: Query
Expert: called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys.
-
collectTermContext
public void collectTermContext(IndexReader reader, java.util.List<AtomicReaderContext> leaves, TermContext[] contextArray, Term[] queryTerms) throws java.io.IOException
- Throws:
java.io.IOException
-
isCoordDisabled
public boolean isCoordDisabled()
Returns true iff Similarity.coord(int,int) is disabled in scoring for the high and low frequency query instance. The top level query will always disable coords.
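To illustrate what disabling the coordination factor changes, here is a minimal sketch. It assumes the classic default definition of the factor, overlap / maxOverlap, and models a boolean score as a plain sum of matched clause scores. The class and method names are hypothetical, and other scoring components (such as query normalization) are deliberately left out.

```java
// Hypothetical stand-alone model; not Lucene source code.
class CoordSketch {

    /** Coordination factor, assuming the classic overlap / maxOverlap definition. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /**
     * Sums the scores of the clauses that matched and, unless coord is
     * disabled, scales the sum by the fraction of clauses that matched.
     */
    static float score(float[] matchedClauseScores, int totalClauses, boolean disableCoord) {
        float sum = 0f;
        for (float s : matchedClauseScores) {
            sum += s;
        }
        if (disableCoord) {
            return sum;
        }
        return sum * coord(matchedClauseScores.length, totalClauses);
    }
}
```

With two of four clauses matching at a score of 1.0 each, this model yields 1.0 when coord is applied and 2.0 when it is disabled.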
-
setLowFreqMinimumNumberShouldMatch
public void setLowFreqMinimumNumberShouldMatch(float min)
Specifies a minimum number of the low-frequency optional BooleanClauses which must be satisfied in order to produce a match on the low-frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the low-frequency clause, or a number >=1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.
- Parameters:
min - the number of optional clauses that must match
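The fraction-or-absolute convention for min can be sketched as a small helper that turns the float into a clause count. This is a simplified model, not Lucene source code: the class and method names are hypothetical, and rounding the fractional case to the nearest integer is an assumption of the sketch.

```java
// Hypothetical stand-alone model; not Lucene source code.
class MinShouldMatchSketch {

    /**
     * Resolves a minimum-should-match setting to a number of clauses.
     *
     * @param min         fraction in [0..1) of the optional clauses, or absolute count >= 1
     * @param numOptional number of optional clauses actually present in the sub-query
     */
    static int resolve(float min, int numOptional) {
        if (min >= 1.0f || min == 0.0f) {
            // Absolute count (or the default of 0): use the value as-is.
            return (int) min;
        }
        // Fraction of the optional clauses; rounding to nearest is an assumption.
        return Math.round(min * numOptional);
    }
}
```

Under these assumptions, resolve(0.5f, 4) requires 2 clauses, resolve(2f, 5) requires 2 clauses regardless of how many are present, and resolve(0f, 3) requires none.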
-
getLowFreqMinimumNumberShouldMatch
public float getLowFreqMinimumNumberShouldMatch()
Gets the minimum number of the optional low-frequency BooleanClauses which must be satisfied.
-
setHighFreqMinimumNumberShouldMatch
public void setHighFreqMinimumNumberShouldMatch(float min)
Specifies a minimum number of the high-frequency optional BooleanClauses which must be satisfied in order to produce a match on the high-frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the high-frequency clause, or a number >=1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.
- Parameters:
min - the number of optional clauses that must match
-
getHighFreqMinimumNumberShouldMatch
public float getHighFreqMinimumNumberShouldMatch()
Gets the minimum number of the optional high-frequency BooleanClauses which must be satisfied.
-
extractTerms
public void extractTerms(java.util.Set<Term> terms)
Description copied from class: Query
Expert: adds all terms occurring in this query to the terms set. Only works if this query is in its rewritten form.
- Overrides:
extractTerms in class Query
-
toString
public java.lang.String toString(java.lang.String field)
Description copied from class: Query
Prints a query to a string, with field assumed to be the default field and omitted.
-
-