Class ParsingReader

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, java.lang.Readable

    public class ParsingReader
    extends java.io.Reader
    Reader for the text content of a given binary stream. This class uses a background parsing task with a Parser (AutoDetectParser by default) to parse the text content from a given input stream. The BodyContentHandler class and a pipe are used to convert the push-based SAX event stream into the pull-based character stream defined by the Reader interface.
    Since:
    Apache Tika 0.2
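The push-to-pull conversion described above can be illustrated without Tika. What follows is a minimal sketch that uses only the JDK's PipedWriter and PipedReader; it is not ParsingReader's actual implementation. The class PushToPullSketch and its methods pushChunks, open, and readAll are hypothetical names, and pushChunks merely stands in for a SAX-style callback source.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.Consumer;

final class PushToPullSketch {

    /** Hypothetical push-style source: invokes the handler once per text chunk. */
    static void pushChunks(List<String> chunks, Consumer<String> handler) {
        for (String chunk : chunks) {
            handler.accept(chunk);
        }
    }

    /** Returns a Reader whose content is produced by a background thread. */
    static Reader open(List<String> chunks) throws IOException {
        PipedReader reader = new PipedReader();
        PipedWriter writer = new PipedWriter(reader);
        Thread producer = new Thread(() -> {
            // Closing the writer on exit signals end-of-stream to the reader.
            try (PipedWriter w = writer) {
                pushChunks(chunks, chunk -> {
                    try {
                        w.write(chunk); // push side: callback writes into the pipe
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            } catch (IOException | UncheckedIOException e) {
                // The read end was closed early; the producer simply stops.
            }
        }, "push-to-pull-sketch");
        producer.setDaemon(true);
        producer.start();
        return reader;
    }

    /** Pull side: drains the reader with ordinary read calls, then closes it. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // prints "Hello, pipe world"
        System.out.println(readAll(open(List.of("Hello, ", "pipe ", "world"))));
    }
}
```

The caller only ever sees a plain Reader; the callback-driven producer stays hidden behind the pipe.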
    • Constructor Summary

      Constructors 
      Constructor Description
      ParsingReader​(java.io.File file)
      Creates a reader for the text content of the given file.
      ParsingReader​(java.io.InputStream stream)
      Creates a reader for the text content of the given binary stream.
      ParsingReader​(java.io.InputStream stream, java.lang.String name)
      Creates a reader for the text content of the given binary stream with the given name.
      ParsingReader​(java.nio.file.Path path)
      Creates a reader for the text content of the file at the given path.
      ParsingReader​(Parser parser, java.io.InputStream stream, Metadata metadata, ParseContext context)
      Creates a reader for the text content of the given binary stream with the given document metadata.
      ParsingReader​(Parser parser, java.io.InputStream stream, Metadata metadata, ParseContext context, java.util.concurrent.Executor executor)
      Creates a reader for the text content of the given binary stream with the given document metadata.
    • Method Summary

      Modifier and Type Method Description
      void close()
      Closes the read end of the pipe.
      int read​(char[] cbuf, int off, int len)
      Reads parsed text from the pipe connected to the parsing thread.
      • Methods inherited from class java.io.Reader

        mark, markSupported, nullReader, read, read, read, ready, reset, skip, transferTo
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ParsingReader

        public ParsingReader​(java.io.InputStream stream)
                      throws java.io.IOException
        Creates a reader for the text content of the given binary stream.
        Parameters:
        stream - binary stream
        Throws:
        java.io.IOException - if the document cannot be parsed
      • ParsingReader

        public ParsingReader​(java.io.InputStream stream,
                             java.lang.String name)
                      throws java.io.IOException
        Creates a reader for the text content of the given binary stream with the given name.
        Parameters:
        stream - binary stream
        name - document name
        Throws:
        java.io.IOException - if the document cannot be parsed
      • ParsingReader

        public ParsingReader​(java.nio.file.Path path)
                      throws java.io.IOException
        Creates a reader for the text content of the file at the given path.
        Parameters:
        path - path of the file to read
        Throws:
        java.io.FileNotFoundException - if the given file does not exist
        java.io.IOException - if the document cannot be parsed
      • ParsingReader

        public ParsingReader​(java.io.File file)
                      throws java.io.FileNotFoundException,
                             java.io.IOException
        Creates a reader for the text content of the given file.
        Parameters:
        file - file to read
        Throws:
        java.io.FileNotFoundException - if the given file does not exist
        java.io.IOException - if the document cannot be parsed
        See Also:
        ParsingReader(Path)
      • ParsingReader

        public ParsingReader​(Parser parser,
                             java.io.InputStream stream,
                             Metadata metadata,
                             ParseContext context)
                      throws java.io.IOException
        Creates a reader for the text content of the given binary stream with the given document metadata. The given parser is used for parsing. A new background thread is started for the parsing task.

        The created reader will be responsible for closing the given stream. The stream and any associated resources will be closed at or before the time when the close() method is called on this reader.

        Parameters:
        parser - parser instance
        stream - binary stream
        metadata - document metadata
        context - parsing context
        Throws:
        java.io.IOException - if the document cannot be parsed
      • ParsingReader

        public ParsingReader​(Parser parser,
                             java.io.InputStream stream,
                             Metadata metadata,
                             ParseContext context,
                             java.util.concurrent.Executor executor)
                      throws java.io.IOException
        Creates a reader for the text content of the given binary stream with the given document metadata. The given parser is used for the parsing task, which is run with the given executor. The executor must run the parsing task asynchronously in a separate thread, because the current thread must return to the caller, which then consumes the parsed text through the Reader interface.

        The created reader will be responsible for closing the given stream. The stream and any associated resources will be closed at or before the time when the close() method is called on this reader.

        Parameters:
        parser - parser instance
        stream - binary stream
        metadata - document metadata
        context - parsing context
        executor - executor for the parsing task
        Throws:
        java.io.IOException - if the document cannot be parsed
        Since:
        Apache Tika 0.4
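To illustrate why the executor has to be asynchronous, here is a JDK-only sketch of the same hand-off pattern. It is an assumption-laden illustration, not Tika code: ExecutorSketch, open, and extract are hypothetical names, and the 16-character pipe buffer is chosen only to make blocking easy to provoke.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ExecutorSketch {

    /** Hands the producing task to the executor and returns the read end at once. */
    static Reader open(String text, Executor executor) throws IOException {
        PipedReader reader = new PipedReader(16); // deliberately small buffer
        PipedWriter writer = new PipedWriter(reader);
        executor.execute(() -> {
            try (PipedWriter w = writer) {
                // Blocks whenever the buffer is full, until the consumer reads.
                w.write(text);
            } catch (IOException e) {
                // The read end was closed early; stop producing.
            }
        });
        return reader;
    }

    /** Runs the producer on a separate thread and drains the reader on this one. */
    static String extract(String text) throws IOException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try (Reader reader = open(text, pool)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[8];
            int n;
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch, passing a same-thread executor such as `Runnable::run` to `open` with more than 16 characters of text would block inside `execute`: the write waits for a consumer, and the only candidate consumer is the thread doing the writing.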
    • Method Detail

      • read

        public int read​(char[] cbuf,
                        int off,
                        int len)
                 throws java.io.IOException
        Reads parsed text from the pipe connected to the parsing thread. Fails if the parsing thread has thrown an exception.
        Specified by:
        read in class java.io.Reader
        Parameters:
        cbuf - character buffer
        off - start offset within the buffer
        len - maximum number of characters to read
        Throws:
        java.io.IOException - if the parsing thread has failed or if for some reason the pipe does not work properly
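The failure behaviour of read can be sketched with JDK classes alone. The example below is a hypothetical illustration rather than Tika's implementation: FailingReaderSketch and rethrowFailure are invented names, the "parse failure" is simulated, and checking the recorded failure a second time at end-of-stream is a design choice of this sketch to make the outcome deterministic.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;

final class FailingReaderSketch extends Reader {

    private final PipedReader pipe = new PipedReader();
    private volatile Throwable failure; // set by the background task

    FailingReaderSketch(String partialText) throws IOException {
        PipedWriter writer = new PipedWriter(pipe);
        Thread task = new Thread(() -> {
            try {
                writer.write(partialText);
                throw new IllegalStateException("simulated parse failure");
            } catch (Throwable t) {
                failure = t; // record first, then close, so readers see it at EOF
            } finally {
                try {
                    writer.close();
                } catch (IOException ignored) {
                    // nothing useful to do in this sketch
                }
            }
        });
        task.setDaemon(true);
        task.start();
    }

    /** Rethrows the background failure, wrapping non-IO problems in an IOException. */
    private void rethrowFailure() throws IOException {
        Throwable t = failure;
        if (t instanceof IOException) {
            throw (IOException) t;
        }
        if (t != null) {
            throw new IOException("background task failed", t);
        }
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        rethrowFailure();
        int n = pipe.read(cbuf, off, len);
        if (n == -1) {
            rethrowFailure(); // do not report a clean EOF after a failure
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        pipe.close();
    }
}
```

A caller that drains this reader receives an IOException whose cause is the simulated IllegalStateException, either on a later read call or when end-of-stream is reached.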
      • close

        public void close()
                   throws java.io.IOException
        Closes the read end of the pipe. If the parsing thread is still running, the next write to the pipe will fail and cause the thread to stop, so there is no need to terminate the thread explicitly.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Specified by:
        close in class java.io.Reader
        Throws:
        java.io.IOException - if the pipe cannot be closed
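The shutdown mechanism described for close can be observed with a plain JDK pipe. This is a minimal sketch under stated assumptions, not Tika code: CloseSketch and producerStopsAfterClose are hypothetical names, and the endless producer stands in for a parsing thread that still has text to write.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.util.concurrent.atomic.AtomicBoolean;

final class CloseSketch {

    /** Returns true if the endless producer stopped after the read end was closed. */
    static boolean producerStopsAfterClose() throws IOException, InterruptedException {
        PipedReader reader = new PipedReader();
        PipedWriter writer = new PipedWriter(reader);
        AtomicBoolean stoppedByWriteFailure = new AtomicBoolean(false);

        Thread producer = new Thread(() -> {
            try {
                while (true) {
                    writer.write("more text "); // would run forever on its own
                }
            } catch (IOException e) {
                // A write failed because the read end is closed.
                stoppedByWriteFailure.set(true);
            }
        });
        producer.setDaemon(true);
        producer.start();

        char[] buf = new char[32];
        reader.read(buf, 0, buf.length); // consume a little text
        reader.close();                  // no interrupt and no stop flag needed
        producer.join(10_000);           // a blocked writer may take about a second to notice
        return stoppedByWriteFailure.get() && !producer.isAlive();
    }
}
```

The consumer never touches the producer thread directly; closing the read end is the only signal it sends.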