Package org.apache.poi.xwpf.extractor
Class XWPFWordExtractor
- java.lang.Object
  - org.apache.poi.extractor.POITextExtractor
    - org.apache.poi.ooxml.extractor.POIXMLTextExtractor
      - org.apache.poi.xwpf.extractor.XWPFWordExtractor
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public class XWPFWordExtractor extends POIXMLTextExtractor
Helper class to extract text from an OOXML Word (.docx) file.
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
static XWPFRelation[] SUPPORTED_TYPES
-
Constructor Summary
Constructors (Constructor, Description):
XWPFWordExtractor(OPCPackage container)
XWPFWordExtractor(XWPFDocument document)
-
Method Summary
Methods (Modifier and Type, Method, Description):
void appendBodyElementText(java.lang.StringBuilder text, IBodyElement e)
void appendParagraphText(java.lang.StringBuilder text, XWPFParagraph paragraph)
java.lang.String getText()
  Retrieves all the text from the document.
static void main(java.lang.String[] args)
void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
  Sets whether phonetic runs are concatenated during extraction.
void setFetchHyperlinks(boolean fetch)
  Sets whether hyperlink targets are fetched along with the text content. By default only the hyperlink label is output, not its target.
-
Methods inherited from class org.apache.poi.ooxml.extractor.POIXMLTextExtractor
close, getCoreProperties, getCustomProperties, getDocument, getExtendedProperties, getMetadataTextExtractor, getPackage
-
Methods inherited from class org.apache.poi.extractor.POITextExtractor
setFilesystem
-
-
-
-
Field Detail
-
SUPPORTED_TYPES
public static final XWPFRelation[] SUPPORTED_TYPES
-
-
Constructor Detail
-
XWPFWordExtractor
public XWPFWordExtractor(OPCPackage container) throws XmlException, OpenXML4JException, java.io.IOException
- Throws:
XmlException
OpenXML4JException
java.io.IOException
-
XWPFWordExtractor
public XWPFWordExtractor(XWPFDocument document)
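As a minimal sketch of how the two constructors might be used, the example below builds a small .docx in memory and extracts its text once through an XWPFDocument and once through an OPCPackage. The class name `ExtractorConstructors`, its helper methods, and the sample text are hypothetical and exist only for illustration. Because the extractor is Closeable, it is wrapped in try-with-resources here.

```java
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class ExtractorConstructors {

    /** Builds a one-paragraph .docx in memory (the content is made up for this sketch). */
    static byte[] sampleDocx() throws Exception {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.createParagraph().createRun().setText("Hello from POI");
            doc.write(out);
            return out.toByteArray();
        }
    }

    /** Uses the XWPFWordExtractor(XWPFDocument) constructor. */
    static String viaDocument(byte[] docx) throws Exception {
        XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            return extractor.getText();
        }
    }

    /** Uses the XWPFWordExtractor(OPCPackage) constructor, which declares checked exceptions. */
    static String viaPackage(byte[] docx) throws Exception {
        OPCPackage pkg = OPCPackage.open(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(pkg)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = sampleDocx();
        System.out.print(viaDocument(docx));
        System.out.print(viaPackage(docx));
    }
}
```

The document-based constructor is convenient when an XWPFDocument is already open for other work. The package-based one avoids building the document yourself but requires handling XmlException, OpenXML4JException and IOException.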
-
-
Method Detail
-
main
public static void main(java.lang.String[] args) throws java.lang.Exception
- Throws:
java.lang.Exception
-
setFetchHyperlinks
public void setFetchHyperlinks(boolean fetch)
Sets whether hyperlink targets are fetched along with the text content. By default only the hyperlink label is output, not its target.
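The effect of this flag can be sketched by extracting the same document twice, once with the default and once with hyperlink fetching enabled. The class `HyperlinkExtraction`, its helpers, and the sample link are hypothetical. The sketch also assumes a POI version in which `XWPFParagraph.createHyperlinkRun(String)` is available for building the test document. The exact way the target is rendered in the output is not specified here, so the example only checks whether the URL appears.

```java
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFHyperlinkRun;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class HyperlinkExtraction {

    /** Builds a .docx with one paragraph containing a labelled hyperlink (sample data). */
    static byte[] sampleDocx() throws Exception {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            XWPFParagraph paragraph = doc.createParagraph();
            paragraph.createRun().setText("See ");
            // Assumption: createHyperlinkRun(String) exists in the POI version in use.
            XWPFHyperlinkRun link = paragraph.createHyperlinkRun("https://poi.apache.org/");
            link.setText("Apache POI");
            doc.write(out);
            return out.toByteArray();
        }
    }

    /** Extracts the text with hyperlink fetching switched on or off. */
    static String extract(byte[] docx, boolean fetchHyperlinks) throws Exception {
        XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            extractor.setFetchHyperlinks(fetchHyperlinks);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = sampleDocx();
        System.out.print(extract(docx, false)); // label only
        System.out.print(extract(docx, true));  // label plus link target
    }
}
```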
-
setConcatenatePhoneticRuns
public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
Sets whether phonetic runs are concatenated during extraction. Default is true.
- Parameters:
concatenatePhoneticRuns
-
-
getText
public java.lang.String getText()
Description copied from class: POITextExtractor
Retrieves all the text from the document. How cells, paragraphs etc are separated in the text is implementation specific - see the javadocs for a specific project for details.
- Specified by:
getText in class POITextExtractor
- Returns:
All the text from the document
-
appendBodyElementText
public void appendBodyElementText(java.lang.StringBuilder text, IBodyElement e)
-
appendParagraphText
public void appendParagraphText(java.lang.StringBuilder text, XWPFParagraph paragraph)
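Since both append methods are public and write into a caller-supplied StringBuilder, they can plausibly be used to extract only part of a document rather than everything that getText() returns. The sketch below illustrates this under that assumption: one helper collects table text only, another collects just the first body paragraph. The class `SelectiveExtraction`, its helpers, and the sample content are hypothetical.

```java
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class SelectiveExtraction {

    /** Builds a .docx with one paragraph followed by a 1x2 table (sample data). */
    static byte[] sampleDocx() throws Exception {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.createParagraph().createRun().setText("Intro paragraph");
            XWPFTable table = doc.createTable(1, 2);
            table.getRow(0).getCell(0).setText("A1");
            table.getRow(0).getCell(1).setText("B1");
            doc.write(out);
            return out.toByteArray();
        }
    }

    /** Appends only the table body elements, skipping paragraphs. */
    static String tablesOnly(byte[] docx) throws Exception {
        XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            StringBuilder text = new StringBuilder();
            for (IBodyElement element : doc.getBodyElements()) {
                if (element instanceof XWPFTable) {
                    extractor.appendBodyElementText(text, element);
                }
            }
            return text.toString();
        }
    }

    /** Appends the text of the first body paragraph only. */
    static String firstParagraph(byte[] docx) throws Exception {
        XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            StringBuilder text = new StringBuilder();
            XWPFParagraph paragraph = doc.getParagraphs().get(0);
            extractor.appendParagraphText(text, paragraph);
            return text.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = sampleDocx();
        System.out.print(tablesOnly(docx));
        System.out.print(firstParagraph(docx));
    }
}
```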
-
-