Class ParserPostProcessor

  • All Implemented Interfaces:
    java.io.Serializable, Parser

    public class ParserPostProcessor
    extends ParserDecorator
    Parser decorator that post-processes the results from a decorated parser. The post-processing takes care of filling in the "fulltext", "summary", and "outlinks" metadata entries based on the full text content returned by the decorated parser.
    See Also:
    Serialized Form
    • Constructor Detail

      • ParserPostProcessor

        public ParserPostProcessor(Parser parser)
        Creates a post-processing decorator for the given parser.
        Parameters:
        parser - the parser to be decorated
    • Method Detail

      • parse

        public void parse(java.io.InputStream stream,
                          org.xml.sax.ContentHandler handler,
                          Metadata metadata,
                          ParseContext context)
                   throws java.io.IOException,
                          org.xml.sax.SAXException,
                          TikaException
        Forwards the call to the decorated parser, then post-processes the results by filling in the "fulltext", "summary", and "outlinks" metadata entries from the extracted text.
        Specified by:
        parse in interface Parser
        Overrides:
        parse in class ParserDecorator
        Parameters:
        stream - the document stream (input)
        handler - handler for the XHTML SAX events (output)
        metadata - document metadata (input and output)
        context - parse context
        Throws:
        java.io.IOException - if the document stream could not be read
        org.xml.sax.SAXException - if the SAX events could not be processed
        TikaException - if the document could not be parsed
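
The forward-then-post-process behaviour can be illustrated with a small stand-alone sketch. It does not use the Tika classes and is not the library's implementation: TextParser, PostProcessingParser, the 40-character summary length, and the link pattern are all hypothetical choices made for illustration. Only the three metadata keys come from the description of this class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostProcessSketch {

    /** Hypothetical stand-in for a parser: returns the extracted text. */
    public interface TextParser {
        String parse(String document, Map<String, List<String>> metadata);
    }

    /** Decorator that forwards to the wrapped parser, then adds derived entries. */
    public static class PostProcessingParser implements TextParser {
        // Assumed values for this sketch; not taken from the Tika documentation.
        private static final int SUMMARY_LENGTH = 40;
        private static final Pattern LINK = Pattern.compile("https?://[^\\s<>\"]+");

        private final TextParser delegate;

        public PostProcessingParser(TextParser delegate) {
            this.delegate = delegate;
        }

        @Override
        public String parse(String document, Map<String, List<String>> metadata) {
            // Step 1: let the decorated parser do the actual parsing.
            String text = delegate.parse(document, metadata);

            // Step 2: derive the extra metadata entries from the full text.
            set(metadata, "fulltext", text);
            set(metadata, "summary",
                    text.substring(0, Math.min(text.length(), SUMMARY_LENGTH)));

            Matcher matcher = LINK.matcher(text);
            while (matcher.find()) {
                metadata.computeIfAbsent("outlinks", k -> new ArrayList<>())
                        .add(matcher.group());
            }
            return text;
        }

        /** Replaces any existing values for the key with a single value. */
        private static void set(Map<String, List<String>> metadata,
                                String key, String value) {
            List<String> values = new ArrayList<>();
            values.add(value);
            metadata.put(key, values);
        }
    }

    public static void main(String[] args) {
        // A trivial "parser" that just trims the input text.
        TextParser plain = (doc, md) -> doc.trim();
        TextParser parser = new PostProcessingParser(plain);

        Map<String, List<String>> metadata = new LinkedHashMap<>();
        parser.parse("  See https://tika.apache.org and http://example.com/docs for details.  ",
                metadata);

        System.out.println("summary:  " + metadata.get("summary"));
        System.out.println("outlinks: " + metadata.get("outlinks"));
    }
}
```

The decorator leaves the wrapped parser untouched, so callers see the same parse result plus the extra entries. With the real class, the equivalent wiring would be to pass an existing Parser to the ParserPostProcessor constructor and read the entries back from the Metadata object after calling parse.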