public abstract class AbstractOOXMLExtractor extends java.lang.Object implements OOXMLExtractor
Tika extractors decorate POI extractors so that the parsed content of
documents is returned as a sequence of XHTML SAX events. Subclasses must
implement the buildXHTML(XHTMLContentHandler) method, which populates the
XHTMLContentHandler object received as a parameter.
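The division of labour described above resembles a template method: the base class drives the event stream and the subclass fills in the body. The following is a minimal, self-contained sketch of that idea, not the actual Tika implementation. `ExtractorSketch`, `SketchExtractor`, `ParagraphExtractor`, and `render` are hypothetical names, a plain `org.xml.sax.ContentHandler` stands in for `XHTMLContentHandler`, and the assumption that `getXHTML` calls `buildXHTML` between the opening and closing of the document is an illustration rather than something this page states.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ExtractorSketch {

    /** Hypothetical stand-in for the abstract base class (not the Tika class). */
    static abstract class SketchExtractor {
        static final String XHTML = "http://www.w3.org/1999/xhtml";

        // Template method: emits the document envelope and delegates the body.
        public final void getXHTML(ContentHandler handler) throws SAXException {
            handler.startDocument();
            start(handler, "html");
            start(handler, "body");
            buildXHTML(handler); // hook implemented by each subclass
            end(handler, "body");
            end(handler, "html");
            handler.endDocument();
        }

        // Subclasses populate the handler with the document-specific content.
        protected abstract void buildXHTML(ContentHandler xhtml) throws SAXException;

        static void start(ContentHandler h, String name) throws SAXException {
            h.startElement(XHTML, name, name, new AttributesImpl());
        }

        static void end(ContentHandler h, String name) throws SAXException {
            h.endElement(XHTML, name, name);
        }
    }

    /** Hypothetical subclass that emits one <p> element per paragraph. */
    static class ParagraphExtractor extends SketchExtractor {
        private final String[] paragraphs;

        ParagraphExtractor(String... paragraphs) {
            this.paragraphs = paragraphs;
        }

        @Override
        protected void buildXHTML(ContentHandler xhtml) throws SAXException {
            for (String p : paragraphs) {
                start(xhtml, "p");
                char[] chars = p.toCharArray();
                xhtml.characters(chars, 0, chars.length);
                end(xhtml, "p");
            }
        }
    }

    /** Runs an extractor and records the SAX events as a compact string. */
    static String render(SketchExtractor extractor) {
        StringBuilder out = new StringBuilder();
        try {
            extractor.getXHTML(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    out.append('<').append(local).append('>');
                }

                @Override
                public void endElement(String uri, String local, String qName) {
                    out.append("</").append(local).append('>');
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <html><body><p>Hello</p><p>World</p></body></html>
        System.out.println(render(new ParagraphExtractor("Hello", "World")));
    }
}
```

Keeping the envelope in the base class means every subclass produces a well-formed event sequence without repeating the start and end calls.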
Constructor and Description |
---|
AbstractOOXMLExtractor(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor) |
Modifier and Type | Method and Description |
---|---|
org.apache.poi.ooxml.POIXMLDocument | getDocument() Returns the opened document. |
MetadataExtractor | getMetadataExtractor() POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI. |
void | getXHTML(org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context) Parses the document into a sequence of XHTML SAX events sent to the given content handler. |
public AbstractOOXMLExtractor(ParseContext context, org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor)
public org.apache.poi.ooxml.POIXMLDocument getDocument()
Description copied from interface: OOXMLExtractor
Returns the opened document.
Specified by: getDocument in interface OOXMLExtractor
See Also: OOXMLExtractor.getDocument()
public MetadataExtractor getMetadataExtractor()
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
Specified by: getMetadataExtractor in interface OOXMLExtractor
See Also: OOXMLExtractor.getMetadataExtractor()
public void getXHTML(org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context) throws org.xml.sax.SAXException, org.apache.xmlbeans.XmlException, java.io.IOException, TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
Specified by: getXHTML in interface OOXMLExtractor
Throws:
org.xml.sax.SAXException
org.apache.xmlbeans.XmlException
java.io.IOException
TikaException
See Also: OOXMLExtractor.getXHTML(ContentHandler, Metadata, ParseContext)
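On the receiving side, the content handler passed to getXHTML sees ordinary SAX callbacks. The sketch below shows one way such a handler might collect the text inside the body element. `BodyTextCollector` and `demo` are hypothetical names, and the events are fired by hand here so the example runs without Tika or POI on the classpath. With a real extractor the same handler would presumably be passed as the first argument of getXHTML.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that keeps only the text found inside <body>. */
public class BodyTextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int bodyDepth = 0; // greater than 0 while inside <body>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (bodyDepth > 0 || "body".equals(localName)) {
            bodyDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (bodyDepth > 0) {
            bodyDepth--;
            if ("p".equals(localName)) {
                text.append('\n'); // one line per paragraph
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (bodyDepth > 0) {
            text.append(ch, start, length);
        }
    }

    public String getText() {
        return text.toString();
    }

    // Fires a start tag, optional text, and the matching end tag.
    private static void element(DefaultHandler h, String name, String content) throws SAXException {
        h.startElement("", name, name, new AttributesImpl());
        if (content != null) {
            h.characters(content.toCharArray(), 0, content.length());
        }
        h.endElement("", name, name);
    }

    /** Simulates the event stream an extractor might send; not real Tika output. */
    static String demo() {
        BodyTextCollector h = new BodyTextCollector();
        try {
            h.startDocument();
            h.startElement("", "html", "html", new AttributesImpl());
            h.startElement("", "head", "head", new AttributesImpl());
            element(h, "title", "ignored");
            h.endElement("", "head", "head");
            h.startElement("", "body", "body", new AttributesImpl());
            element(h, "p", "First");
            element(h, "p", "Second");
            h.endElement("", "body", "body");
            h.endElement("", "html", "html");
            h.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return h.getText();
    }

    public static void main(String[] args) {
        System.out.print(demo()); // prints "First" and "Second" on separate lines
    }
}
```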
Copyright © 2010 - 2023 Adobe. All Rights Reserved