Package org.apache.lucene.analysis.cz
Class CzechAnalyzer
- java.lang.Object
  - org.apache.lucene.analysis.Analyzer
    - org.apache.lucene.analysis.util.StopwordAnalyzerBase
      - org.apache.lucene.analysis.cz.CzechAnalyzer
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public final class CzechAnalyzer extends StopwordAnalyzerBase
Analyzer for the Czech language. Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.
You must specify the required Version compatibility when creating CzechAnalyzer:
- As of 3.1, words are stemmed with CzechStemFilter.
- As of 2.9, StopFilter preserves position increments.
- As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).
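A minimal sketch of the analyzer in use, assuming a Lucene 4.x release (consistent with the org.apache.lucene.analysis.util package on this page) with the analyzers-common module on the classpath. Version.LUCENE_CURRENT is only a placeholder for the match version you actually target; the class name CzechAnalyzeDemo, its helper methods, the field name "body", and the sample sentence are illustrative. The loop follows the standard TokenStream contract: reset(), incrementToken() until it returns false, end(), then close().

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cz.CzechAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

final class CzechAnalyzeDemo {

  // Collects the terms the analyzer would index for `text` (the field name is arbitrary here).
  static List<String> analyze(Analyzer analyzer, String text) {
    List<String> terms = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                 // mandatory before the first incrementToken()
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();                   // end-of-stream bookkeeping; close() runs via try-with-resources
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }

  // Hypothetical convenience wrapper: default stop words, stemming on (matchVersion >= 3.1).
  static List<String> analyzeDefault(String text) {
    // LUCENE_CURRENT is a placeholder; real code should pin a concrete Version constant.
    try (CzechAnalyzer analyzer = new CzechAnalyzer(Version.LUCENE_CURRENT)) {
      return analyze(analyzer, text);
    }
  }

  public static void main(String[] args) {
    System.out.println(analyzeDefault("Praha je hlavní město České republiky."));
  }
}
```

The printed terms are lowercased, stemmed, and stripped of any words found in the default stop set; the exact stems depend on CzechStemFilter, so check your own output rather than expecting particular forms.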
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer:
Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

Field Summary
Fields
Modifier and Type | Field | Description
static java.lang.String | DEFAULT_STOPWORD_FILE | File containing default Czech stopwords.

Fields inherited from class org.apache.lucene.analysis.Analyzer:
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

Constructor Summary
Constructors
Constructor | Description
CzechAnalyzer(Version matchVersion) | Builds an analyzer with the default stop words (getDefaultStopSet()).
CzechAnalyzer(Version matchVersion, CharArraySet stopwords) | Builds an analyzer with the given stop words.
CzechAnalyzer(Version matchVersion, CharArraySet stopwords, CharArraySet stemExclusionTable) | Builds an analyzer with the given stop words and a set of words to be excluded from the CzechStemFilter.

Method Summary
Modifier and Type | Method | Description
static CharArraySet | getDefaultStopSet() | Returns a set of default Czech stopwords.

Methods inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase:
getStopwordSet

Methods inherited from class org.apache.lucene.analysis.Analyzer:
close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, tokenStream, tokenStream

Field Detail
DEFAULT_STOPWORD_FILE
public static final java.lang.String DEFAULT_STOPWORD_FILE
File containing default Czech stopwords.
See Also:
Constant Field Values

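DEFAULT_STOPWORD_FILE only names a resource. The sketch below assumes (this page does not confirm it) that the file ships next to CzechAnalyzer in the same jar, is UTF-8 encoded, and marks comments with a leading '#', so a relative Class.getResourceAsStream lookup finds it. CzechStopwordFileDemo and readDefaultStopwordFile are hypothetical names; for normal use, getDefaultStopSet() is the documented way to obtain the parsed set.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.cz.CzechAnalyzer;

final class CzechStopwordFileDemo {

  // Reads the bundled stopword resource, skipping blank lines and '#' comment lines.
  static List<String> readDefaultStopwordFile() {
    // Assumption: a relative name resolves to org/apache/lucene/analysis/cz/<DEFAULT_STOPWORD_FILE>.
    InputStream in = CzechAnalyzer.class.getResourceAsStream(CzechAnalyzer.DEFAULT_STOPWORD_FILE);
    if (in == null) {
      throw new IllegalStateException("resource not found: " + CzechAnalyzer.DEFAULT_STOPWORD_FILE);
    }
    List<String> words = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        String word = line.trim();
        if (!word.isEmpty() && !word.startsWith("#")) {
          words.add(word);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return words;
  }

  public static void main(String[] args) {
    List<String> words = readDefaultStopwordFile();
    System.out.println(CzechAnalyzer.DEFAULT_STOPWORD_FILE + ": " + words.size() + " entries");
    if (!words.isEmpty()) {
      // Sanity check against the parsed default set (expected, not guaranteed, to agree).
      System.out.println("first entry in default set: "
          + CzechAnalyzer.getDefaultStopSet().contains(words.get(0)));
    }
  }
}
```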
Constructor Detail
CzechAnalyzer
public CzechAnalyzer(Version matchVersion)
Builds an analyzer with the default stop words (getDefaultStopSet()).
Parameters:
matchVersion - Lucene version to match; see the Version compatibility notes in the class description above.

CzechAnalyzer
public CzechAnalyzer(Version matchVersion, CharArraySet stopwords)
Builds an analyzer with the given stop words.
Parameters:
matchVersion - Lucene version to match; see the Version compatibility notes in the class description above.
stopwords - a stopword set

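A sketch of the two-argument constructor, under the same assumptions as elsewhere (Lucene 4.x, Version.LUCENE_CURRENT as a placeholder). The set passed here replaces the default Czech list rather than extending it. CzechCustomStopwordsDemo and analyzeWithStopwords are hypothetical names, and the token loop is repeated so the block compiles on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cz.CzechAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

final class CzechCustomStopwordsDemo {

  // Collects the terms produced by the analyzer for `text`.
  static List<String> analyze(Analyzer analyzer, String text) {
    List<String> terms = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }

  // Hypothetical helper: the given words become the ONLY stop words (defaults are not kept).
  static List<String> analyzeWithStopwords(String text, String... stopwords) {
    // ignoreCase = true; tokens are also lowercased before stop filtering.
    CharArraySet stops = new CharArraySet(Version.LUCENE_CURRENT, Arrays.asList(stopwords), true);
    try (CzechAnalyzer analyzer = new CzechAnalyzer(Version.LUCENE_CURRENT, stops)) {
      return analyze(analyzer, text);
    }
  }

  public static void main(String[] args) {
    System.out.println(analyzeWithStopwords("Praha je hlavní město.", "praha", "město"));
  }
}
```

To keep the defaults and add a few words, build the custom set from a copy of getDefaultStopSet() instead of from scratch.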
CzechAnalyzer
public CzechAnalyzer(Version matchVersion, CharArraySet stopwords, CharArraySet stemExclusionTable)
Builds an analyzer with the given stop words and a set of words to be excluded from the CzechStemFilter.
Parameters:
matchVersion - Lucene version to match; see the Version compatibility notes in the class description above.
stopwords - a stopword set
stemExclusionTable - a stemming exclusion set

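Words in stemExclusionTable reach the index unstemmed; in the 4.x sources this is done by marking matching tokens as keywords, which CzechStemFilter skips, and it only matters when matchVersion is 3.1 or later (when stemming was introduced). Under the same Lucene 4.x assumptions, the hypothetical helper below keeps the default stop words and excludes whatever words it is given. Entries should be lowercase, because the exclusion check runs after lowercasing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cz.CzechAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

final class CzechStemExclusionDemo {

  // Collects the terms produced by the analyzer for `text`.
  static List<String> analyze(Analyzer analyzer, String text) {
    List<String> terms = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }

  // Hypothetical helper: default stop words, but the listed (lowercase) words bypass the stemmer.
  static List<String> analyzeWithExclusions(String text, String... keepUnstemmed) {
    CharArraySet exclusions =
        new CharArraySet(Version.LUCENE_CURRENT, Arrays.asList(keepUnstemmed), false);
    try (CzechAnalyzer analyzer =
        new CzechAnalyzer(Version.LUCENE_CURRENT, CzechAnalyzer.getDefaultStopSet(), exclusions)) {
      return analyze(analyzer, text);
    }
  }

  public static void main(String[] args) {
    String text = "Hrady a zámky";
    System.out.println("no exclusions: " + analyzeWithExclusions(text));
    System.out.println("hrady excluded: " + analyzeWithExclusions(text, "hrady"));
  }
}
```

Comparing the two printed lines shows the effect: with the exclusion, hrady survives verbatim, while without it the stemmer is expected to shorten it.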
Method Detail
getDefaultStopSet
public static final CharArraySet getDefaultStopSet()
Returns a set of default Czech stopwords.
Returns:
a set of default Czech stopwords
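getDefaultStopSet() is static, so the default list can be inspected or reused without building an analyzer. Because the two-argument constructor replaces the defaults, one way to extend them (sketched here under the same Lucene 4.x assumptions) is to copy the set with CharArraySet.copy and add extra words. defaultsPlus is a hypothetical helper, and "lucene" is just a stand-in word.

```java
import org.apache.lucene.analysis.cz.CzechAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

final class CzechDefaultStopSetDemo {

  // Hypothetical helper: copies the defaults into a new, modifiable set and adds extra words,
  // leaving the shared default set itself untouched.
  static CharArraySet defaultsPlus(String... extra) {
    CharArraySet set = CharArraySet.copy(Version.LUCENE_CURRENT, CzechAnalyzer.getDefaultStopSet());
    for (String word : extra) {
      set.add(word);
    }
    return set;
  }

  public static void main(String[] args) {
    CharArraySet defaults = CzechAnalyzer.getDefaultStopSet();
    System.out.println("default stop words: " + defaults.size());

    CharArraySet extended = defaultsPlus("lucene");
    System.out.println("extended: " + extended.size() + ", contains lucene: " + extended.contains("lucene"));

    // getStopwordSet() is inherited from StopwordAnalyzerBase and reports the set in use.
    try (CzechAnalyzer analyzer = new CzechAnalyzer(Version.LUCENE_CURRENT, extended)) {
      System.out.println("analyzer uses " + analyzer.getStopwordSet().size() + " stop words");
    }
  }
}
```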