Package org.apache.lucene.analysis.query
Class QueryAutoStopWordAnalyzer
- java.lang.Object
-
- org.apache.lucene.analysis.Analyzer
-
- org.apache.lucene.analysis.AnalyzerWrapper
-
- org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer
-
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public final class QueryAutoStopWordAnalyzer extends AnalyzerWrapper
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries. For very large indexes the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38 million doc index which had a term in around 50% of docs and was causing TermQueries for this term to take 2 seconds.
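The idea described above can be sketched in plain Java. This is only an illustration, not Lucene code: the class `AutoStopWordSketch`, its method names, and the in-memory "index" (a list of per-document term sets) are all hypothetical stand-ins for the wrapped analyzer and the IndexReader.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: plain Java, no Lucene classes involved.
final class AutoStopWordSketch {

    // Counts, for each term, how many documents contain it at least once.
    static Map<String, Integer> documentFrequencies(List<Set<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Set<String> doc : docs) {
            for (String term : doc) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Collects terms whose share of documents is strictly greater than
    // maxPercentDocs (a fraction between 0.0 and 1.0).
    static Set<String> stopWords(List<Set<String>> docs, float maxPercentDocs) {
        Set<String> stop = new HashSet<>();
        int numDocs = docs.size();
        for (Map.Entry<String, Integer> e : documentFrequencies(docs).entrySet()) {
            if ((float) e.getValue() / numDocs > maxPercentDocs) {
                stop.add(e.getKey());
            }
        }
        return stop;
    }

    // Drops the identified stop words from already-analyzed query tokens.
    static List<String> filterQuery(List<String> tokens, Set<String> stop) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stop.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

In this sketch the expensive term never reaches the query at all, which is the kind of protection the class description refers to.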
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
-
-
Field Summary
Fields
static float defaultMaxDocFreqPercent
-
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
-
Constructor Summary
Constructors
QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent
QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, float maxPercentDocs)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, int maxDocFreq)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq
QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, java.util.Collection<java.lang.String> fields, float maxPercentDocs)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, java.util.Collection<java.lang.String> fields, int maxDocFreq)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
-
Method Summary
Methods
Term[] getStopWords()
Provides information on which stop words have been identified for all fields
java.lang.String[] getStopWords(java.lang.String fieldName)
Provides information on which stop words have been identified for a field
-
Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper
getOffsetGap, getPositionIncrementGap, initReader
-
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getReuseStrategy, tokenStream, tokenStream
-
-
-
-
Field Detail
-
defaultMaxDocFreqPercent
public static final float defaultMaxDocFreqPercent
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader) throws java.io.IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent
- Parameters:
matchVersion
- Version to be used in StopFilter
delegate
- Analyzer whose TokenStream will be filtered
indexReader
- IndexReader to identify the stopwords from
- Throws:
java.io.IOException
- Can be thrown while reading from the IndexReader
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, int maxDocFreq) throws java.io.IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq
- Parameters:
matchVersion
- Version to be used in StopFilter
delegate
- Analyzer whose TokenStream will be filtered
indexReader
- IndexReader to identify the stopwords from
maxDocFreq
- Document frequency above which a term is treated as a stopword
- Throws:
java.io.IOException
- Can be thrown while reading from the IndexReader
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, float maxPercentDocs) throws java.io.IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
- Parameters:
matchVersion
- Version to be used in StopFilter
delegate
- Analyzer whose TokenStream will be filtered
indexReader
- IndexReader to identify the stopwords from
maxPercentDocs
- The maximum percentage (between 0.0 and 1.0) of index documents which contain a term, after which the word is considered to be a stop word
- Throws:
java.io.IOException
- Can be thrown while reading from the IndexReader
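The maxPercentDocs and maxDocFreq variants express the same kind of threshold in two units. A minimal sketch of how a fraction could be turned into an absolute document count is shown here. It assumes the fraction is multiplied by the number of documents and truncated to an integer; that rounding, the range check, and the names `DocFreqThresholdSketch`, `toMaxDocFreq`, and `isStopWord` are assumptions of this example, not documented behavior.

```java
// Illustrative sketch only: relates a fractional threshold to an absolute one.
final class DocFreqThresholdSketch {

    // Converts a fraction of the index (0.0 to 1.0) into a document count.
    // Truncation toward zero is an assumption made for this sketch.
    static int toMaxDocFreq(int numDocs, float maxPercentDocs) {
        if (maxPercentDocs < 0.0f || maxPercentDocs > 1.0f) {
            // Rejecting out-of-range values is this sketch's own choice.
            throw new IllegalArgumentException("maxPercentDocs must be between 0.0 and 1.0");
        }
        return (int) (numDocs * maxPercentDocs);
    }

    // A term counts as a stop word only when its document frequency is
    // strictly greater than the threshold, matching the "greater than" wording.
    static boolean isStopWord(int docFreq, int maxDocFreq) {
        return docFreq > maxDocFreq;
    }
}
```

For example, under these assumptions `toMaxDocFreq(1000, 0.5f)` yields 500, so a term found in exactly 500 documents would not be treated as a stop word, while one found in 501 would.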
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, java.util.Collection<java.lang.String> fields, float maxPercentDocs) throws java.io.IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
- Parameters:
matchVersion
- Version to be used in StopFilter
delegate
- Analyzer whose TokenStream will be filtered
indexReader
- IndexReader to identify the stopwords from
fields
- Selection of fields to calculate stopwords for
maxPercentDocs
- The maximum percentage (between 0.0 and 1.0) of index documents which contain a term, after which the word is considered to be a stop word
- Throws:
java.io.IOException
- Can be thrown while reading from the IndexReader
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, java.util.Collection<java.lang.String> fields, int maxDocFreq) throws java.io.IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
- Parameters:
matchVersion
- Version to be used in StopFilter
delegate
- Analyzer whose TokenStream will be filtered
indexReader
- IndexReader to identify the stopwords from
fields
- Selection of fields to calculate stopwords for
maxDocFreq
- Document frequency above which a term is treated as a stopword
- Throws:
java.io.IOException
- Can be thrown while reading from the IndexReader
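Restricting the calculation to a selection of fields means each field gets its own stop word set, and unselected fields get none. The following plain-Java sketch illustrates that under stated assumptions: documents are modelled as maps from field name to the distinct terms in that field, and `PerFieldStopWordSketch` and `stopWordsPerField` are hypothetical names, not part of Lucene.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: per-field stop words from an in-memory model.
final class PerFieldStopWordSketch {

    // Each document maps a field name to the distinct terms in that field.
    // Only the requested fields are examined; a term is a stop word for a
    // field when more than maxDocFreq documents contain it in that field.
    static Map<String, Set<String>> stopWordsPerField(
            List<Map<String, Set<String>>> docs,
            Collection<String> fields,
            int maxDocFreq) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String field : fields) {
            Map<String, Integer> df = new HashMap<>();
            for (Map<String, Set<String>> doc : docs) {
                Set<String> terms = doc.getOrDefault(field, Collections.emptySet());
                for (String term : terms) {
                    df.merge(term, 1, Integer::sum);
                }
            }
            Set<String> stop = new TreeSet<>();
            for (Map.Entry<String, Integer> e : df.entrySet()) {
                if (e.getValue() > maxDocFreq) {
                    stop.add(e.getKey());
                }
            }
            result.put(field, stop);
        }
        return result;
    }
}
```

Note that the same term can be a stop word in one field and an ordinary term in another, because frequencies are counted per field.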
-
-
Method Detail
-
getStopWords
public java.lang.String[] getStopWords(java.lang.String fieldName)
Provides information on which stop words have been identified for a field
- Parameters:
fieldName
- The field for which stop words identified in "addStopWords" method calls will be returned
- Returns:
- the stop words identified for a field
-
getStopWords
public Term[] getStopWords()
Provides information on which stop words have been identified for all fields
- Returns:
- the stop words (as terms)
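The two getStopWords overloads differ in shape: one returns plain strings for a single field, the other returns field-qualified terms across all fields. The sketch below mimics that difference in plain Java. It is not the Lucene implementation: `StopWordLookupSketch` is hypothetical, "field:text" strings stand in for Term objects, and returning an empty array for an unknown field is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: two views over a per-field stop word map.
final class StopWordLookupSketch {

    // Sorted copies keep the output order deterministic.
    private final Map<String, Set<String>> stopWordsPerField = new TreeMap<>();

    StopWordLookupSketch(Map<String, Set<String>> source) {
        for (Map.Entry<String, Set<String>> e : source.entrySet()) {
            stopWordsPerField.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
    }

    // Stop words for one field as plain strings; empty if the field is unknown
    // (an assumption of this sketch).
    String[] getStopWords(String fieldName) {
        Set<String> stop = stopWordsPerField.get(fieldName);
        return stop == null ? new String[0] : stop.toArray(new String[0]);
    }

    // Stop words for all fields, each qualified as "field:text" to stand in
    // for a (field, text) term pair.
    List<String> getAllStopWords() {
        List<String> all = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : stopWordsPerField.entrySet()) {
            for (String text : e.getValue()) {
                all.add(e.getKey() + ":" + text);
            }
        }
        return all;
    }
}
```

The field-qualified form matters because the same text may have been identified as a stop word in one field only.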
-
-