Class Tokenizer
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.Tokenizer
- All Implemented Interfaces: java.io.Closeable, java.lang.AutoCloseable
- Direct Known Subclasses: CharTokenizer, ChineseTokenizer, CJKTokenizer, ClassicTokenizer, KeywordTokenizer, Lucene43EdgeNGramTokenizer, Lucene43NGramTokenizer, NGramTokenizer, PathHierarchyTokenizer, PatternTokenizer, ReversePathHierarchyTokenizer, StandardTokenizer, UAX29URLEmailTokenizer, WikipediaTokenizer
public abstract class Tokenizer extends TokenStream
A Tokenizer is a TokenStream whose input is a Reader.
This is an abstract class; subclasses must override TokenStream.incrementToken().
NOTE: Subclasses overriding TokenStream.incrementToken() must call AttributeSource.clearAttributes() before setting attributes.
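The following minimal, dependency-free Java model illustrates why clearAttributes() comes first. It is not real Lucene code. TermState and WhitespaceTokenizerModel are hypothetical names. TermState stands in for the attributes a real subclass would register with addAttribute(), and its clearState() method plays the role of AttributeSource.clearAttributes(). Each incrementToken() call wipes the previous token's values before filling in the next ones.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a tokenizer's attributes (term text plus start offset). */
final class TermState {
    final StringBuilder term = new StringBuilder();
    int startOffset = -1;

    /** Plays the role of AttributeSource.clearAttributes() in this model. */
    void clearState() {
        term.setLength(0);
        startOffset = -1;
    }
}

/** Simplified whitespace tokenizer following the incrementToken() contract; not Lucene code. */
final class WhitespaceTokenizerModel {
    final TermState state = new TermState();
    private final Reader input;
    private int offset = 0; // number of chars consumed so far

    WhitespaceTokenizerModel(Reader input) {
        this.input = input;
    }

    /** Fills {@code state} and returns true if a token was found, false at end of input. */
    boolean incrementToken() throws IOException {
        state.clearState(); // always clear before setting anything
        int c = input.read();
        while (c != -1 && Character.isWhitespace(c)) { // skip separators
            offset++;
            c = input.read();
        }
        if (c == -1) {
            return false; // exhausted: state stays cleared
        }
        state.startOffset = offset;
        while (c != -1 && !Character.isWhitespace(c)) {
            state.term.append((char) c);
            offset++;
            c = input.read();
        }
        if (c != -1) {
            offset++; // account for the separator that ended the token
        }
        return true;
    }

    /** Collects "term@startOffset" strings, wrapping IOException for convenience. */
    static List<String> tokens(String text) {
        WhitespaceTokenizerModel t = new WhitespaceTokenizerModel(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (t.incrementToken()) {
                out.add(t.state.term + "@" + t.state.startOffset);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

In this model, dropping the clearState() call would leave "world" and its offset in place after the final, unsuccessful call. More generally, a tokenizer that sets only some of its attributes for a given token would otherwise carry stale values over from the previous token.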
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
Method Summary
Modifier and Type | Method | Description
void | close() | Releases resources associated with this stream.
void | reset() | This method is called by a consumer before it begins consumption using TokenStream.incrementToken().
void | setReader(java.io.Reader input) | Expert: Set a new reader on the Tokenizer.
Methods inherited from class org.apache.lucene.analysis.TokenStream
end, incrementToken
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
Method Detail
close
public void close() throws java.io.IOException
Releases resources associated with this stream.
If you override this method, always call super.close(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on reuse).
NOTE: The default implementation closes the input Reader, so be sure to call super.close() when overriding this method.
- Specified by: close in interface java.lang.AutoCloseable
- Specified by: close in interface java.io.Closeable
- Overrides: close in class TokenStream
- Throws: java.io.IOException
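The super.close() rule can be sketched with a small stdlib-only model. TokenizerModel, ChainingTokenizer, ForgetfulTokenizer and CloseDemo are hypothetical names. The model assumes only what the text states: the default close() closes the input Reader and returns the object to a reusable state. In the model, setReader() refuses a new Reader until that reset has happened. The two subclasses differ only in whether their close() chains to super.close().

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical, simplified model of the close/reuse contract; not Lucene's Tokenizer. */
abstract class TokenizerModel {
    protected Reader input; // null means "closed, ready for setReader()"

    public final void setReader(Reader r) {
        if (input != null) {
            throw new IllegalStateException("close() call missing");
        }
        input = r;
    }

    /** Default behavior in this model: close the input Reader and forget it. */
    public void close() throws IOException {
        if (input != null) {
            input.close();
        }
        input = null;
    }
}

/** Releases its own resource and chains to super.close(), so it stays reusable. */
class ChainingTokenizer extends TokenizerModel {
    boolean ownResourceReleased = false;

    @Override
    public void close() throws IOException {
        ownResourceReleased = true; // subclass-specific cleanup
        super.close();              // base class closes the Reader and resets its state
    }
}

/** Forgets super.close(): the Reader stays open and reuse fails. */
class ForgetfulTokenizer extends TokenizerModel {
    @Override
    public void close() {
        // BUG (intentional, for illustration): super.close() is never called
    }
}

final class CloseDemo {
    /** Sets a reader, closes, then tries a second reader; returns true if reuse worked. */
    static boolean reusableAfterClose(TokenizerModel t) {
        t.setReader(new StringReader("first"));
        try {
            t.close();
        } catch (IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
        try {
            t.setReader(new StringReader("second"));
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

The text above says only that Tokenizer throws IllegalStateException on reuse. Making setReader() the call that throws is a choice made for this model.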
setReader
public final void setReader(java.io.Reader input) throws java.io.IOException
Expert: Set a new reader on the Tokenizer. Typically, an analyzer (in its tokenStream method) will use this to reuse a previously created tokenizer.
- Throws: java.io.IOException
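The reuse pattern behind setReader() can also be modeled in plain Java. ReusableTokenizerModel, which splits its input on commas, and ToyAnalyzer are hypothetical stand-ins for a Tokenizer and an Analyzer. The analyzer builds its tokenizer once. For each new text, it passes in a fresh Reader, calls reset(), calls incrementToken() until it returns false, and then calls close(). This sketch makes two assumptions of its own: the Reader given to setReader() becomes active only when reset() runs, and setReader() fails if the previous stream was not closed. Lucene's TokenStream.end() step is omitted for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer model that emits each comma-separated field as a token. */
final class ReusableTokenizerModel {
    private Reader pending; // set by setReader(), activated by reset()
    private Reader input;   // reader currently being consumed
    private boolean inUse;  // true from setReader() until close()
    String term;            // per-token state

    public void setReader(Reader r) {
        if (inUse) {
            throw new IllegalStateException("close() call missing");
        }
        pending = r;
        inUse = true;
    }

    public void reset() {
        input = pending;
        pending = null;
    }

    public boolean incrementToken() throws IOException {
        term = null; // clear per-token state first
        if (input == null) {
            throw new IllegalStateException("reset() call missing");
        }
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1 && c != ',') {
            sb.append((char) c);
        }
        if (c == -1 && sb.length() == 0) {
            return false; // end of input
        }
        term = sb.toString();
        return true;
    }

    public void close() throws IOException {
        if (input != null) {
            input.close();
        }
        if (pending != null) {
            pending.close();
        }
        input = null;
        pending = null;
        inUse = false;
    }
}

/** Hypothetical analyzer-like holder: one tokenizer instance, reused for every text. */
final class ToyAnalyzer {
    private final ReusableTokenizerModel tokenizer = new ReusableTokenizerModel();

    List<String> analyze(String text) {
        tokenizer.setReader(new StringReader(text)); // reuse instead of constructing anew
        List<String> terms = new ArrayList<>();
        try {
            try {
                tokenizer.reset();
                while (tokenizer.incrementToken()) {
                    terms.add(tokenizer.term);
                }
            } finally {
                tokenizer.close(); // makes the next setReader() call legal
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }
}
```

Closing in a finally block keeps the tokenizer reusable even when consumption fails partway through.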
reset
public void reset() throws java.io.IOException
Description copied from class: TokenStream
This method is called by a consumer before it begins consumption using TokenStream.incrementToken().
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.
If you override this method, always call super.reset(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on further usage).
- Overrides: reset in class TokenStream
- Throws: java.io.IOException
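The following self-contained model shows both halves of the reset() contract. It is not Lucene's implementation. BaseTokenizerModel, NumberingTokenizer, BrokenResetTokenizer and ResetDemo are hypothetical names. NumberingTokenizer keeps per-stream state (a position counter) and restores it in reset() after chaining to super.reset(). BrokenResetTokenizer resets the counter but skips super.reset(). As a result, in this model the base class never activates the Reader, and the first incrementToken() call throws IllegalStateException.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical, simplified base class; not Lucene's Tokenizer. */
abstract class BaseTokenizerModel {
    private Reader pending;  // supplied via setReader()
    protected Reader input;  // only non-null after the base reset() has run

    public final void setReader(Reader r) {
        pending = r;
    }

    /** Base reset(): activates the reader supplied via setReader(). */
    public void reset() throws IOException {
        input = pending;
        pending = null;
    }

    public abstract boolean incrementToken() throws IOException;

    /** Reads one char, failing loudly if the base reset() never ran. */
    protected int readChar() throws IOException {
        if (input == null) {
            throw new IllegalStateException("reset() call missing or super.reset() not called");
        }
        return input.read();
    }
}

/** Emits one token per character, numbered from 1; the counter is per-stream state. */
class NumberingTokenizer extends BaseTokenizerModel {
    String term;
    int position; // must restart at 0 for every new input

    @Override
    public boolean incrementToken() throws IOException {
        term = null;
        int c = readChar();
        if (c == -1) {
            return false;
        }
        position++;
        term = (char) c + "#" + position;
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset(); // let the base class activate the pending Reader
        position = 0;  // then restore subclass state, "as if created fresh"
    }
}

/** Resets its counter but forgets super.reset(), so no Reader is ever activated. */
class BrokenResetTokenizer extends NumberingTokenizer {
    @Override
    public void reset() {
        position = 0; // BUG (intentional, for illustration): super.reset() not called
    }
}

final class ResetDemo {
    /** Runs one consume cycle over text and returns the tokens joined by spaces. */
    static String run(NumberingTokenizer t, String text) {
        t.setReader(new StringReader(text));
        StringBuilder out = new StringBuilder();
        try {
            t.reset();
            while (t.incrementToken()) {
                out.append(t.term).append(' ');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString().trim();
    }
}
```

NumberingTokenizer restores its counter in reset(), so a single instance numbers each new input from 1. That behavior is what "as if they had been created fresh" asks of stateful implementations.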