Class StopFilter

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public final class StopFilter
    extends FilteringTokenFilter
    Removes stop words from a token stream.

    You must specify the required Version compatibility when creating StopFilter:

    • As of 3.1, StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords, and position increments are preserved.
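The effect of preserving position increments can be illustrated with a small plain-Java model. This is only a sketch: it does not call Lucene, and the names StopFilterSketch, Token, and filter are hypothetical, introduced here for illustration. It assumes every input token advances the position by one, and that the increment of each removed token is added to the next surviving token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of stop-word removal; it does not use any Lucene classes.
class StopFilterSketch {

    // A surviving token and its position increment relative to the previous surviving token.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Drops terms found in stopWords and carries the skipped positions over
    // to the next surviving token, so the gap left by removed words stays visible.
    static List<Token> filter(List<String> terms, Set<String> stopWords) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped += 1; // assumption: each input token advances one position
            } else {
                out.add(new Token(term, 1 + skipped));
                skipped = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of"));
        List<String> terms = Arrays.asList("the", "lord", "of", "the", "rings");
        System.out.println(filter(terms, stopWords)); // prints [lord(+2), rings(+3)]
    }
}
```

In this model a phrase query for "lord rings" would not match as adjacent terms, because the increment of 3 on "rings" records that two words were removed between them.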
    • Constructor Detail

      • StopFilter

        public StopFilter​(Version matchVersion,
                          TokenStream in,
                          CharArraySet stopWords)
        Constructs a filter that removes from the input TokenStream every word contained in the given stop set.
        Parameters:
        matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the stop set if Version > 3.0. See above for details.
        in - Input stream
        stopWords - A CharArraySet representing the stopwords.
        See Also:
        makeStopSet(Version, java.lang.String...)
    • Method Detail

      • makeStopSet

        public static CharArraySet makeStopSet​(Version matchVersion,
                                               java.lang.String... stopWords)
        Builds a Set from an array of stop words, suitable for passing to the StopFilter constructor. Building the set separately allows it to be created once and cached when an Analyzer is constructed.
        Parameters:
        matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
        stopWords - An array of stopwords
        See Also:
        makeStopSet(Version, java.lang.String[], boolean) passing false to ignoreCase
      • makeStopSet

        public static CharArraySet makeStopSet​(Version matchVersion,
                                               java.util.List<?> stopWords)
        Builds a Set from a List of stop words, suitable for passing to the StopFilter constructor. Building the set separately allows it to be created once and cached when an Analyzer is constructed.
        Parameters:
        matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
        stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
        Returns:
        A Set (CharArraySet) containing the words
        See Also:
        makeStopSet(Version, java.util.List, boolean) passing false to ignoreCase
      • makeStopSet

        public static CharArraySet makeStopSet​(Version matchVersion,
                                               java.lang.String[] stopWords,
                                               boolean ignoreCase)
        Creates a stopword set from the given stopword array.
        Parameters:
        matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
        stopWords - An array of stopwords
        ignoreCase - If true, all words are lower cased first.
        Returns:
        A Set (CharArraySet) containing the words
      • makeStopSet

        public static CharArraySet makeStopSet​(Version matchVersion,
                                               java.util.List<?> stopWords,
                                               boolean ignoreCase)
        Creates a stopword set from the given stopword list.
        Parameters:
        matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
        stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
        ignoreCase - If true, all words are lower cased first.
        Returns:
        A Set (CharArraySet) containing the words
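
The ignoreCase behavior and the "toString()-able list" input described above can be sketched in plain Java. This is a simplified model under stated assumptions, not Lucene's implementation: it uses a java.util.HashSet instead of CharArraySet, omits the matchVersion parameter, and the names StopSetSketch and isStopWord are hypothetical. It assumes char[] elements are read as their characters and every other element is converted with toString().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified model of building a stop set; it does not use Lucene's CharArraySet.
class StopSetSketch {

    // Converts each element to text (char[] by its characters, anything else
    // via toString()) and lower-cases it first when ignoreCase is true.
    static Set<String> makeStopSet(List<?> stopWords, boolean ignoreCase) {
        Set<String> set = new HashSet<>();
        for (Object word : stopWords) {
            String text = (word instanceof char[])
                    ? new String((char[]) word)
                    : String.valueOf(word);
            set.add(ignoreCase ? text.toLowerCase(Locale.ROOT) : text);
        }
        return set;
    }

    // Lookup that applies the same case handling used when the set was built.
    static boolean isStopWord(Set<String> set, String term, boolean ignoreCase) {
        return set.contains(ignoreCase ? term.toLowerCase(Locale.ROOT) : term);
    }

    public static void main(String[] args) {
        List<?> words = Arrays.<Object>asList("The", "AND".toCharArray(), new StringBuilder("Of"));
        Set<String> folded = makeStopSet(words, true);
        Set<String> exact = makeStopSet(words, false);
        System.out.println(isStopWord(folded, "THE", true));  // prints true
        System.out.println(isStopWord(exact, "the", false));  // prints false
    }
}
```

With ignoreCase set to false the set only matches the exact spelling it was built from, which is the behavior the two-argument makeStopSet overloads select.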