Class OOXMLWordAndPowerPointTextHandler
java.lang.Object
 - org.xml.sax.helpers.DefaultHandler
  - org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler

All Implemented Interfaces:
 org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class OOXMLWordAndPowerPointTextHandler extends org.xml.sax.helpers.DefaultHandler

This class is intended to handle anything that might contain IBodyElements: the main document, headers, footers, notes, slides, etc. It does not generally check for namespaces, and it can be applied to both PPTX and DOCX for text extraction. It can also be used to scrape content from charts, although it currently ignores formula (<c:f/>) elements. It does not work with .xlsx or .vsdx.

TODO: move this into POI?

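The remark about not checking namespaces can be illustrated with a minimal, stand-alone sketch. It is not Tika's implementation and does not use XWPFBodyContentsHandler. The class name LocalNameTextCollector and its extract helper are hypothetical, and the choice to collect text from elements whose local name is "t" and to break lines at "p" is an assumption made for illustration. The sketch uses only the JDK's SAX classes and matches on the local name alone, so the same handler picks up w:t runs from DOCX markup and a:t runs from PPTX markup.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Tika or POI.
class LocalNameTextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inText = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Only the local name is compared; the namespace URI is deliberately ignored,
        // so w:t (WordprocessingML) and a:t (DrawingML) are treated alike.
        if ("t".equals(localName)) {
            inText = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("t".equals(localName)) {
            inText = false;
        } else if ("p".equals(localName)) {
            // Assumption: a paragraph end in either vocabulary becomes a newline.
            text.append('\n');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text outside a "t" element (for example inside a formula element) is dropped.
        if (inText) {
            text.append(ch, start, length);
        }
    }

    String getText() {
        return text.toString();
    }

    // Parses an XML string and returns the collected text.
    static String extract(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required so that localName is populated
            LocalNameTextCollector handler = new LocalNameTextCollector();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.getText();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Passing a WordprocessingML fragment or a DrawingML fragment to LocalNameTextCollector.extract yields the run text in both cases, which is the practical effect of ignoring the namespace. The real class reports events through its XWPFBodyContentsHandler callback rather than building a string.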
Nested Class Summary
Nested Classes
Modifier and Type    Class
static class         OOXMLWordAndPowerPointTextHandler.EditType
static interface     OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler

Field Summary
Fields
Modifier and Type          Field
static java.lang.String    W_NS

Constructor Summary
Constructors
Constructor
OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, java.util.Map<java.lang.String,java.lang.String> hyperlinks)
OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, java.util.Map<java.lang.String,java.lang.String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)

Method Summary
Modifier and Type    Method
void                 characters(char[] ch, int start, int length)
void                 endDocument()
void                 endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
void                 endPrefixMapping(java.lang.String prefix)
void                 ignorableWhitespace(char[] ch, int start, int length)
void                 startDocument()
void                 startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
void                 startPrefixMapping(java.lang.String prefix, java.lang.String uri)

Field Detail
W_NS
public static final java.lang.String W_NS
See Also:
 Constant Field Values

Constructor Detail
OOXMLWordAndPowerPointTextHandler
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, java.util.Map<java.lang.String,java.lang.String> hyperlinks)

OOXMLWordAndPowerPointTextHandler
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, java.util.Map<java.lang.String,java.lang.String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)

Method Detail
startDocument
public void startDocument() throws org.xml.sax.SAXException
Specified by:
 startDocument in interface org.xml.sax.ContentHandler
Overrides:
 startDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
 org.xml.sax.SAXException

endDocument
public void endDocument() throws org.xml.sax.SAXException
Specified by:
 endDocument in interface org.xml.sax.ContentHandler
Overrides:
 endDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
 org.xml.sax.SAXException

startPrefixMapping
public void startPrefixMapping(java.lang.String prefix, java.lang.String uri) throws org.xml.sax.SAXException
Specified by:
 startPrefixMapping in interface org.xml.sax.ContentHandler
Overrides:
 startPrefixMapping in class org.xml.sax.helpers.DefaultHandler
Throws:
 org.xml.sax.SAXException

endPrefixMapping
public void endPrefixMapping(java.lang.String prefix) throws org.xml.sax.SAXException
Specified by:
 endPrefixMapping in interface org.xml.sax.ContentHandler
Overrides:
 endPrefixMapping in class org.xml.sax.helpers.DefaultHandler
Throws:
 org.xml.sax.SAXException

startElement
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException
Specified by:
 startElement in interface org.xml.sax.ContentHandler
Overrides:
 startElement in class org.xml.sax.helpers.DefaultHandler
Throws:
 org.xml.sax.SAXException

endElement
public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) throws org.xml.sax.SAXException
Specified by:
 endElement in interface org.xml.sax.ContentHandler
Overrides:
 endElement in class org.xml.sax.helpers.DefaultHandler
Throws:
 org.xml.sax.SAXException

characters
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
Specified by:
 characters in interface org.xml.sax.ContentHandler
Overrides:
 characters in class org.xml.sax.helpers.DefaultHandler
Throws:
 org.xml.sax.SAXException

ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws org.xml.sax.SAXException
Specified by:
 ignorableWhitespace in interface org.xml.sax.ContentHandler
Overrides:
 ignorableWhitespace in class org.xml.sax.helpers.DefaultHandler
Throws:
 org.xml.sax.SAXException