Class AbstractOOXMLExtractor
- java.lang.Object
 - org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- All Implemented Interfaces:
 OOXMLExtractor
- Direct Known Subclasses:
 POIXMLTextExtractorDecorator, SXSLFPowerPointExtractorDecorator, SXWPFWordExtractorDecorator, XPSExtractorDecorator, XSLFPowerPointExtractorDecorator, XSSFExcelExtractorDecorator, XWPFWordExtractorDecorator
public abstract class AbstractOOXMLExtractor extends java.lang.Object implements OOXMLExtractor
Base class for all Tika OOXML extractors. Tika extractors decorate POI extractors so that the parsed content of documents is returned as a sequence of XHTML SAX events. Subclasses must implement the buildXHTML(XHTMLContentHandler) method, which populates the XHTMLContentHandler object received as a parameter.
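The division of labor described above, with a base class that drives the SAX event stream and a subclass hook that fills in the body, can be sketched using JDK classes alone. Every type below (ExtractorSketch, BaseExtractor, ParagraphExtractor, EventLog) is a hypothetical stand-in, and the hook takes a plain org.xml.sax.ContentHandler rather than Tika's XHTMLContentHandler. It illustrates the pattern under those assumptions and is not Tika's implementation.

```java
import java.util.Arrays;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch of the base-class pattern; none of these types are Tika classes. */
public class ExtractorSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Stand-in for the abstract base: owns the document wrapper, delegates the body. */
    abstract static class BaseExtractor {
        public final void getXHTML(ContentHandler handler) throws SAXException {
            handler.startDocument();
            handler.startElement(XHTML, "body", "body", new AttributesImpl());
            buildXHTML(handler); // subclass-specific content goes here
            handler.endElement(XHTML, "body", "body");
            handler.endDocument();
        }

        /** Hook that each concrete extractor must implement. */
        protected abstract void buildXHTML(ContentHandler handler) throws SAXException;
    }

    /** Stand-in subclass: emits one p element per paragraph of text. */
    static class ParagraphExtractor extends BaseExtractor {
        private final List<String> paragraphs;

        ParagraphExtractor(List<String> paragraphs) {
            this.paragraphs = paragraphs;
        }

        @Override
        protected void buildXHTML(ContentHandler handler) throws SAXException {
            for (String text : paragraphs) {
                handler.startElement(XHTML, "p", "p", new AttributesImpl());
                handler.characters(text.toCharArray(), 0, text.length());
                handler.endElement(XHTML, "p", "p");
            }
        }
    }

    /** Records the events it receives as a flat tag string, for inspection. */
    static class EventLog extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Runs the stand-in extractor and returns the recorded event stream. */
    static String render(List<String> paragraphs) {
        try {
            EventLog log = new EventLog();
            new ParagraphExtractor(paragraphs).getXHTML(log);
            return log.out.toString();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(Arrays.asList("first", "second")));
    }
}
```

Marking getXHTML as final in the sketch keeps the wrapper events in one place, so a subclass only decides what goes inside the body.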
Constructor Summary
Constructors
- AbstractOOXMLExtractor(ParseContext context, POIXMLTextExtractor extractor)
Method Summary
Methods (modifier and type, method, description)
- POIXMLDocument getDocument(): Returns the opened document.
- MetadataExtractor getMetadataExtractor(): POIXMLTextExtractor.getMetadataTextExtractor() not yet supported for OOXML by POI.
- void getXHTML(org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context): Parses the document into a sequence of XHTML SAX events sent to the given content handler.
Constructor Detail
AbstractOOXMLExtractor
public AbstractOOXMLExtractor(ParseContext context, POIXMLTextExtractor extractor)
 
Method Detail
getDocument
public POIXMLDocument getDocument()
Description copied from interface: OOXMLExtractor
Returns the opened document.
- Specified by:
 getDocument in interface OOXMLExtractor
- See Also:
 OOXMLExtractor.getDocument()
 
getMetadataExtractor
public MetadataExtractor getMetadataExtractor()
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor.getMetadataTextExtractor() not yet supported for OOXML by POI.
- Specified by:
 getMetadataExtractor in interface OOXMLExtractor
- See Also:
 OOXMLExtractor.getMetadataExtractor()
 
getXHTML
public void getXHTML(org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context) throws org.xml.sax.SAXException, XmlException, java.io.IOException, TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
- Specified by:
 getXHTML in interface OOXMLExtractor
- Throws:
 org.xml.sax.SAXException
 XmlException
 java.io.IOException
 TikaException
- See Also:
 OOXMLExtractor.getXHTML(ContentHandler, Metadata, ParseContext)
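Because getXHTML delivers its output as SAX events, the caller decides what to do with them by supplying the handler. The following is a minimal sketch of such a handler, one that gathers plain text and counts paragraphs. TextCollectorSketch and TextCollector are hypothetical names, and the events here come from the JDK SAX parser reading a literal XHTML string, an assumption made so the example runs without Tika on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler sketch; the event source is the JDK SAX parser, not Tika. */
public class TextCollectorSketch {

    /** Collects character data and ends each paragraph with a newline. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int paragraphs = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("p".equals(localName)) {
                paragraphs++;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("p".equals(localName)) {
                text.append('\n');
            }
        }
    }

    /** Parses the given XHTML string and returns the populated collector. */
    static TextCollector collect(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so localName is populated
            TextCollector collector = new TextCollector();
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), collector);
            return collector;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p>Hello</p><p>World</p></body></html>";
        TextCollector collector = collect(xhtml);
        System.out.print(collector.text);
        System.out.println("paragraphs: " + collector.paragraphs);
    }
}
```

In principle the same handler could be passed as the first argument to getXHTML, since that parameter is typed as org.xml.sax.ContentHandler and DefaultHandler implements that interface.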
 