Package org.apache.lucene.analysis.standard
Fast, general-purpose grammar-based tokenizers.
The org.apache.lucene.analysis.standard package contains three fast grammar-based tokenizers constructed with JFlex:

StandardTokenizer: as of Lucene 3.1, implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. Unlike UAX29URLEmailTokenizer, it does not keep URLs and email addresses as single tokens; it splits them into tokens according to the UAX#29 word break rules. StandardAnalyzer includes StandardTokenizer, StandardFilter, LowerCaseFilter and StopFilter. When the Version specified in the constructor is lower than 3.1, the ClassicTokenizer implementation is invoked.

ClassicTokenizer: this class was named StandardTokenizer prior to Lucene 3.1. Its tokenization rules are not based on the Unicode Text Segmentation algorithm. ClassicAnalyzer includes ClassicTokenizer, ClassicFilter, LowerCaseFilter and StopFilter.

UAX29URLEmailTokenizer: implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. In addition, URLs and email addresses are tokenized according to the relevant RFCs. UAX29URLEmailAnalyzer includes UAX29URLEmailTokenizer, StandardFilter, LowerCaseFilter and StopFilter.
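
The analyzers described above share one shape: a tokenizer followed by a lowercasing step and a stop-word filter. The following is a minimal plain-Java sketch of that shape, not Lucene code. The class name AnalysisChainSketch, both regular expressions, and the five-word stop list are hypothetical stand-ins chosen for illustration; the patterns are far cruder than the JFlex grammars and do not reproduce the UAX#29 rules or the RFC-based URL and email rules. The sketch only contrasts splitting an email address into pieces with keeping it as a single token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy tokenizer + lowercase + stop-word chain. Illustration only, not the Lucene API. */
class AnalysisChainSketch {

    // Assumption: runs of letters/digits stand in for "split at word boundaries".
    static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Assumption: a naive email shape is tried first, then the plain word pattern.
    static final Pattern EMAIL_OR_WORD = Pattern.compile(
        "[\\p{L}\\p{N}._-]+@[\\p{L}\\p{N}-]+(?:\\.[\\p{L}\\p{N}-]+)+|[\\p{L}\\p{N}]+");

    // Tiny illustrative stop list (not Lucene's English stop-word set).
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "the", "to", "of"));

    /** Tokenizes with the given pattern, lowercases each token, drops stop words. */
    static List<String> analyze(String text, Pattern tokenPattern) {
        List<String> terms = new ArrayList<>();
        Matcher m = tokenPattern.matcher(text);
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT); // lowercasing step
            if (!STOP_WORDS.contains(term)) {                  // stop-word step
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "Mail the report to Bob@Example.com";
        System.out.println(analyze(text, WORD));          // email split into pieces
        System.out.println(analyze(text, EMAIL_OR_WORD)); // email kept whole
    }
}
```

With the sample sentence, the first call prints [mail, report, bob, example, com] and the second prints [mail, report, bob@example.com]. The real tokenizers will differ in detail from this toy output, since they follow the UAX#29 word break rules rather than a letters-and-digits pattern.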
Interface Summary

Interface                        Description
StandardTokenizerInterface       Internal interface for supporting versioned grammars.

Class Summary

Class                            Description
ClassicAnalyzer                  Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
ClassicFilter                    Normalizes tokens extracted with ClassicTokenizer.
ClassicFilterFactory             Factory for ClassicFilter.
ClassicTokenizer                 A grammar-based tokenizer constructed with JFlex.
ClassicTokenizerFactory          Factory for ClassicTokenizer.
StandardAnalyzer                 Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
StandardFilter                   Normalizes tokens extracted with StandardTokenizer.
StandardFilterFactory            Factory for StandardFilter.
StandardTokenizer                A grammar-based tokenizer constructed with JFlex.
StandardTokenizerFactory         Factory for StandardTokenizer.
StandardTokenizerImpl            This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
UAX29URLEmailAnalyzer            Filters UAX29URLEmailTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
UAX29URLEmailTokenizer           This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailTokenizerFactory    Factory for UAX29URLEmailTokenizer.
UAX29URLEmailTokenizerImpl       This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.