public final class NGramTokenFilter extends TokenFilter

Tokenizes the input into n-grams of the given size(s). This filter:
- handles supplementary characters correctly,
- emits all n-grams for the same token at the same position,
- does not modify offsets,
- sorts n-grams by their offset in the original token first, then increasing length (meaning that "abc" will give "a", "ab", "abc", "b", "bc", "c").
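The ordering described above can be sketched in plain Java. This is a hypothetical helper, not part of Lucene, and it simplifies by indexing chars rather than code points (Lucene handles supplementary characters correctly):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper reproducing the documented ordering: grams are
// emitted by start offset in the original token first, then by
// increasing length. Not the Lucene implementation.
public class NGramOrder {
    public static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        // Walk offsets left to right; at each offset emit increasing lengths.
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abc", 1, 3)); // [a, ab, abc, b, bc, c]
    }
}
```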
You can make this filter use the old behavior by providing a version < Version.LUCENE_44 in the constructor, but this is not recommended as it will lead to broken TokenStreams that will cause highlighting bugs.
If you were using this TokenFilter to perform partial highlighting, this won't work anymore since this filter doesn't update offsets. You should modify your analysis chain to use NGramTokenizer, and potentially override NGramTokenizer.isTokenChar(int) to perform pre-tokenization.
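The pre-tokenization idea can be illustrated in plain Java: keep only characters that pass an isTokenChar-style predicate and split on everything else, which is what overriding NGramTokenizer.isTokenChar(int) accomplishes inside the tokenizer. This sketch is a stand-in, not the Lucene API; the letters-only predicate is an assumption for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of pre-tokenization: split the input wherever a character fails
// an isTokenChar-style test, mirroring NGramTokenizer.isTokenChar(int).
// Plain-Java illustration only, not the Lucene implementation.
public class PreTokenize {
    // Analogous to overriding NGramTokenizer.isTokenChar(int); here only
    // letters count as token characters (an illustrative assumption).
    static boolean isTokenChar(int codePoint) {
        return Character.isLetter(codePoint);
    }

    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (isTokenChar(cp)) {
                current.appendCodePoint(cp);   // extend the current token
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // boundary: flush the token
                current.setLength(0);
            }
        });
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("foo-bar 42baz")); // [foo, bar, baz]
    }
}
```

Each resulting token would then be n-grammed individually, so grams never span a boundary character.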
Constructors

NGramTokenFilter(Version version, TokenStream input)
    Creates NGramTokenFilter with default min and max n-grams.
NGramTokenFilter(Version version, TokenStream input, int minGram, int maxGram)
    Creates NGramTokenFilter with given min and max n-grams.
All Methods  Instance Methods  Concrete Methods

boolean incrementToken()
    Advances the stream to the next token; returns true if a token is available, false at end of stream.
void reset()
    This method is called by a consumer before it begins consumption using incrementToken().
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public NGramTokenFilter(Version version, TokenStream input, int minGram, int maxGram)

Creates NGramTokenFilter with given min and max n-grams.
public final boolean incrementToken() throws IOException

Advances the stream to the next token, returning true if a token is available and false at end of stream.
public void reset() throws IOException

Description copied from class: TokenFilter
This method is called by a consumer before it begins consumption using incrementToken().
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.
NOTE: The default implementation chains the call to the input TokenStream, so be sure to call super.reset() when overriding this method.
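The reset contract above can be sketched with minimal stand-in classes (not the Lucene AttributeSource machinery): a filter's reset() must both clear its own state and chain to its wrapped input, which is why overriding implementations must call super.reset():

```java
// Minimal sketch of the reset() contract: a stateful filter clears its own
// state and chains reset() to its input, so the whole chain can be reused
// as if freshly created. Stand-in classes, not the Lucene API.
public class ResetChain {
    static class TokenSource {
        private final String[] tokens;
        private int pos = 0;
        TokenSource(String... tokens) { this.tokens = tokens; }
        String next() { return pos < tokens.length ? tokens[pos++] : null; }
        void reset() { pos = 0; } // back to a clean state
    }

    static class BaseFilter {
        final TokenSource input;
        BaseFilter(TokenSource input) { this.input = input; }
        String next() { return input.next(); }
        void reset() { input.reset(); } // default behavior: chain to input
    }

    static class UpperFilter extends BaseFilter {
        private int emitted = 0; // filter-local state that must be cleared
        UpperFilter(TokenSource input) { super(input); }
        @Override String next() {
            String t = input.next();
            if (t == null) return null;
            emitted = emitted + 1;
            return t.toUpperCase();
        }
        @Override void reset() {
            super.reset(); // chain first, as the NOTE above requires
            emitted = 0;   // then clear our own state
        }
        int emitted() { return emitted; }
    }

    public static void main(String[] args) {
        UpperFilter f = new UpperFilter(new TokenSource("a", "b"));
        StringBuilder first = new StringBuilder();
        for (String t = f.next(); t != null; t = f.next()) first.append(t);
        f.reset(); // reuse the same chain, just as if it had been created fresh
        StringBuilder second = new StringBuilder();
        for (String t = f.next(); t != null; t = f.next()) second.append(t);
        System.out.println(first + " " + second); // AB AB
    }
}
```

Skipping the super.reset() call in UpperFilter.reset() would leave the wrapped source at end-of-stream, so the second consumption pass would produce nothing.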