Class Analyzer

    • Method Detail

      • tokenStream

        public final TokenStream tokenStream(java.lang.String fieldName,
                                             java.io.Reader reader)
                                      throws java.io.IOException
        Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.

        This method uses createComponents(String, Reader) to obtain an instance of Analyzer.TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through Analyzer.TokenStreamComponents.setReader(Reader).

        NOTE: After calling this method, the consumer must follow the workflow described in TokenStream to properly consume its contents. See the Analysis package documentation for some examples demonstrating this.

        NOTE: If your data is available as a String, use tokenStream(String, String) which reuses a StringReader-like instance internally.

        Parameters:
        fieldName - the name of the field the created TokenStream is used for
        reader - the reader the stream's source reads from
        Returns:
        TokenStream for iterating the analyzed content of reader
        Throws:
        AlreadyClosedException - if the Analyzer is closed.
        java.io.IOException - if an i/o error occurs.
        See Also:
        tokenStream(String, String)
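
        The reuse behavior described above (create the components once, store them, then reset them through setReader on later calls) can be illustrated with a toy model. This is only a sketch in plain Java: ReuseSketch, Components, readAll and createCalls are hypothetical stand-ins, not the real Analyzer or Analyzer.TokenStreamComponents, and the model is single-threaded.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Toy model of the reuse pattern; every class here is a hypothetical
// stand-in, not part of the Lucene API.
class ReuseSketch {

    // Stand-in for the stored components: holds the current input.
    static final class Components {
        private Reader reader;

        // Points the existing components at a new input.
        void setReader(Reader reader) {
            this.reader = reader;
        }

        // Stand-in for consuming the sink: reads all remaining characters.
        String readAll() {
            try {
                StringBuilder sb = new StringBuilder();
                int c;
                while ((c = reader.read()) != -1) {
                    sb.append((char) c);
                }
                return sb.toString();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    private Components stored;  // components kept between calls
    int createCalls = 0;        // counts how often components were built

    private Components createComponents(Reader reader) {
        createCalls++;
        Components components = new Components();
        components.setReader(reader);
        return components;
    }

    Components tokenStream(Reader reader) {
        if (stored == null) {
            stored = createComponents(reader);  // first call: build and store
        } else {
            stored.setReader(reader);           // later calls: reset and reuse
        }
        return stored;
    }

    public static void main(String[] args) {
        ReuseSketch analyzer = new ReuseSketch();
        Components first = analyzer.tokenStream(new StringReader("one"));
        System.out.println(first.readAll());
        Components second = analyzer.tokenStream(new StringReader("two"));
        System.out.println(second.readAll());
        System.out.println(first == second);       // same stored instance
        System.out.println(analyzer.createCalls);  // built only once
    }
}
```

        In this model the second call returns the same object as the first, which is why a stream obtained from an earlier call should be fully consumed before the next call is made.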
      • tokenStream

        public final TokenStream tokenStream(java.lang.String fieldName,
                                             java.lang.String text)
                                      throws java.io.IOException
        Returns a TokenStream suitable for fieldName, tokenizing the contents of text.

        This method uses createComponents(String, Reader) to obtain an instance of Analyzer.TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through Analyzer.TokenStreamComponents.setReader(Reader).

        NOTE: After calling this method, the consumer must follow the workflow described in TokenStream to properly consume its contents. See the Analysis package documentation for some examples demonstrating this.

        Parameters:
        fieldName - the name of the field the created TokenStream is used for
        text - the String the stream's source reads from
        Returns:
        TokenStream for iterating the analyzed content of text
        Throws:
        AlreadyClosedException - if the Analyzer is closed.
        java.io.IOException - if an i/o error occurs (may rarely happen for strings).
        See Also:
        tokenStream(String, Reader)
      • getPositionIncrementGap

        public int getPositionIncrementGap(java.lang.String fieldName)
        Invoked before indexing an IndexableField instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between IndexableField instances using the same field name. The default position increment gap is 0. With a position increment gap of 0 and the typical default token position increment of 1, all terms in a field, including those in different IndexableField instances, occupy successive positions, so that, for instance, an exact PhraseQuery can match across IndexableField instance boundaries.
        Parameters:
        fieldName - IndexableField name being indexed.
        Returns:
        position increment gap, added to the next token emitted from tokenStream(String,Reader). This value must be >= 0.
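
        The position arithmetic can be worked through with a small model. This is a sketch in plain Java, not Lucene code: PositionGapSketch and positions are hypothetical names, and it assumes whitespace tokenization with a position increment of 1 for every token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model, not Lucene code: assumes whitespace tokenization and a
// position increment of 1 for every token.
class PositionGapSketch {

    // Returns the position of each token across all values of one field.
    static List<Integer> positions(List<String> fieldValues, int positionIncrementGap) {
        List<Integer> result = new ArrayList<>();
        int position = -1;           // no token indexed yet
        boolean termsAdded = false;  // true once the field has at least one term
        for (String value : fieldValues) {
            boolean firstTokenOfValue = true;
            for (String token : value.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;  // an empty value yields no tokens
                }
                int increment = 1;
                // The gap is added to the first token of a later value only.
                if (firstTokenOfValue && termsAdded) {
                    increment += positionIncrementGap;
                }
                position += increment;
                result.add(position);
                firstTokenOfValue = false;
                termsAdded = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("quick brown", "fox jumps");
        System.out.println(positions(values, 0));    // prints [0, 1, 2, 3]
        System.out.println(positions(values, 100));  // prints [0, 1, 102, 103]
    }
}
```

        With a gap of 0, "brown" and "fox" sit at adjacent positions 1 and 2, so an exact phrase can match across the two values. With a gap of 100 they sit at positions 1 and 102, so it cannot.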
      • getOffsetGap

        public int getOffsetGap(java.lang.String fieldName)
        Like getPositionIncrementGap(java.lang.String), but for Token offsets. By default this returns 1. This method is only called if the field produced at least one token for indexing.
        Parameters:
        fieldName - the field just indexed
        Returns:
        offset gap, added to the next token emitted from tokenStream(String,Reader). This value must be >= 0.
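
        The same kind of arithmetic applies to offsets. The following sketch is plain Java, not Lucene code: OffsetGapSketch and offsets are hypothetical names, tokens are assumed to be whitespace-separated words, and the final offset of each value is assumed to equal its length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model, not Lucene code: tokens are whitespace-separated words and
// the final offset of each value is assumed to be its length.
class OffsetGapSketch {

    // Returns {startOffset, endOffset} for each token across all values of one field.
    static List<int[]> offsets(List<String> fieldValues, int offsetGap) {
        Pattern word = Pattern.compile("\\S+");
        List<int[]> result = new ArrayList<>();
        int base = 0;  // offset at which the current value starts
        for (String value : fieldValues) {
            Matcher m = word.matcher(value);
            boolean producedToken = false;
            while (m.find()) {
                result.add(new int[] { base + m.start(), base + m.end() });
                producedToken = true;
            }
            base += value.length();
            // The gap is applied only if this value produced at least one token.
            if (producedToken) {
                base += offsetGap;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> result = offsets(Arrays.asList("quick brown", "fox jumps"), 1);
        // prints [[0, 5], [6, 11], [12, 15], [16, 21]]
        System.out.println(Arrays.deepToString(result.toArray()));
    }
}
```

        With the default gap of 1, the second value starts at offset 12 rather than 11, as if a single separator character sat between the two values.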
      • close

        public void close()
        Frees persistent resources used by this Analyzer.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable