Class AbstractOOXMLExtractor
- java.lang.Object
  - org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- All Implemented Interfaces: OOXMLExtractor
- Direct Known Subclasses: POIXMLTextExtractorDecorator, SXSLFPowerPointExtractorDecorator, SXWPFWordExtractorDecorator, XPSExtractorDecorator, XSLFPowerPointExtractorDecorator, XSSFExcelExtractorDecorator, XWPFWordExtractorDecorator
public abstract class AbstractOOXMLExtractor extends java.lang.Object implements OOXMLExtractor
Base class for all Tika OOXML extractors. Tika extractors decorate POI extractors so that the parsed content of documents is returned as a sequence of XHTML SAX events. Subclasses must implement the buildXHTML(XHTMLContentHandler) method, which populates the XHTMLContentHandler object received as a parameter.
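The division of labor described above (a base class that drives the SAX event stream, and subclasses that fill in the body) can be sketched with the JDK's own SAX types. This is only an illustration: SketchExtractor, ParagraphExtractor, TextCollector and SketchDemo are hypothetical stand-ins, not Tika classes, and a plain org.xml.sax.ContentHandler is assumed in place of Tika's XHTMLContentHandler.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the base class; NOT Tika's AbstractOOXMLExtractor.
abstract class SketchExtractor {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Emits the document skeleton and delegates the body to the subclass.
    final void getXHTML(ContentHandler handler) throws SAXException {
        handler.startDocument();
        start(handler, "html");
        start(handler, "body");
        buildXHTML(handler);
        end(handler, "body");
        end(handler, "html");
        handler.endDocument();
    }

    // Subclasses populate the handler with the document content.
    protected abstract void buildXHTML(ContentHandler handler) throws SAXException;

    static void start(ContentHandler h, String name) throws SAXException {
        h.startElement(XHTML, name, name, new AttributesImpl());
    }

    static void end(ContentHandler h, String name) throws SAXException {
        h.endElement(XHTML, name, name);
    }

    static void text(ContentHandler h, String s) throws SAXException {
        char[] chars = s.toCharArray();
        h.characters(chars, 0, chars.length);
    }
}

// Hypothetical subclass: emits one <p> element per paragraph.
class ParagraphExtractor extends SketchExtractor {
    private final String[] paragraphs;

    ParagraphExtractor(String... paragraphs) {
        this.paragraphs = paragraphs;
    }

    @Override
    protected void buildXHTML(ContentHandler h) throws SAXException {
        for (String p : paragraphs) {
            start(h, "p");
            text(h, p);
            end(h, "p");
        }
    }
}

// Consumer side: collects character data, one line per closed <p>.
class TextCollector extends DefaultHandler {
    final StringBuilder out = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("p".equals(localName)) {
            out.append('\n');
        }
    }
}

class SketchDemo {
    // Runs the sketch extractor and returns the collected text.
    static String render(String... paragraphs) {
        TextCollector collector = new TextCollector();
        try {
            new ParagraphExtractor(paragraphs).getXHTML(collector);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("Hello", "World"));
    }
}
```

In this sketch the caller only ever sees SAX events, so the same handler would work regardless of which subclass produced the content. That is the property the decorator arrangement above is meant to provide.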

Constructor Summary
- AbstractOOXMLExtractor(ParseContext context, POIXMLTextExtractor extractor)

Method Summary
- POIXMLDocument getDocument(): Returns the opened document.
- MetadataExtractor getMetadataExtractor(): POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
- void getXHTML(org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context): Parses the document into a sequence of XHTML SAX events sent to the given content handler.

Constructor Detail
AbstractOOXMLExtractor
public AbstractOOXMLExtractor(ParseContext context, POIXMLTextExtractor extractor)

Method Detail
getDocument
public POIXMLDocument getDocument()
Description copied from interface: OOXMLExtractor
Returns the opened document.
- Specified by: getDocument in interface OOXMLExtractor
- See Also: OOXMLExtractor.getDocument()

getMetadataExtractor
public MetadataExtractor getMetadataExtractor()
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
- Specified by: getMetadataExtractor in interface OOXMLExtractor
- See Also: OOXMLExtractor.getMetadataExtractor()

getXHTML
public void getXHTML(org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context) throws org.xml.sax.SAXException, XmlException, java.io.IOException, TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
- Specified by: getXHTML in interface OOXMLExtractor
- Throws: org.xml.sax.SAXException, XmlException, java.io.IOException, TikaException
- See Also: OOXMLExtractor.getXHTML(ContentHandler, Metadata, ParseContext)