Package org.apache.poi.extractor
Class POIOLE2TextExtractor
java.lang.Object
  org.apache.poi.extractor.POITextExtractor
    org.apache.poi.extractor.POIOLE2TextExtractor
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
Direct Known Subclasses:
EventBasedExcelExtractor, ExcelExtractor, HPSFPropertiesExtractor, OutlookTextExtactor, PowerPointExtractor, PublisherTextExtractor, VisioTextExtractor, Word6Extractor, WordExtractor
public abstract class POIOLE2TextExtractor extends POITextExtractor
Common parent for OLE2-based text extractors of POI documents, such as .doc and .xls. You will typically find the implementation of a given format's text extractor under org.apache.poi.[format].extractor.
See Also:
ExcelExtractor, PowerPointExtractor, VisioTextExtractor, WordExtractor
Constructor Summary
Constructor                                   Description
POIOLE2TextExtractor(POIDocument document)    Creates a new text extractor for the given document.
Method Summary
Modifier and Type             Method                        Description
DocumentSummaryInformation    getDocSummaryInformation()    Returns the document information metadata for the document.
POIDocument                   getDocument()                 Returns the underlying POIDocument.
POITextExtractor              getMetadataTextExtractor()    Returns an HPSF-powered text extractor for the document properties metadata, such as title and author.
DirectoryEntry                getRoot()                     Returns the underlying DirectoryEntry of this document.
SummaryInformation            getSummaryInformation()       Returns the summary information metadata for the document.
Methods inherited from class org.apache.poi.extractor.POITextExtractor
close, getText, setFilesystem
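Because the class implements Closeable and inherits close and getText, an extractor can be managed with try-with-resources. The following is a minimal sketch of that lifecycle, not POI code: `DemoExtractor` and `readAndClose` are hypothetical stand-ins that mirror only close() and getText(), so the snippet compiles without POI on the classpath. With POI available, a concrete subclass such as WordExtractor would take the stand-in's place.

```java
import java.io.Closeable;

public class ExtractorLifecycle {
    /** Hypothetical stand-in mirroring only close() and getText(); not a POI class. */
    static class DemoExtractor implements Closeable {
        private final String text;
        private boolean closed;

        DemoExtractor(String text) {
            this.text = text;
        }

        String getText() {
            if (closed) {
                throw new IllegalStateException("extractor already closed");
            }
            return text;
        }

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Reads the text and guarantees close() runs, even if getText() throws. */
    static String readAndClose(DemoExtractor extractor) {
        try (DemoExtractor e = extractor) {
            return e.getText();
        }
    }
}
```

Reading the text inside the try block and returning a plain String keeps the extractor from being used after it has been closed.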
Constructor Detail
POIOLE2TextExtractor
public POIOLE2TextExtractor(POIDocument document)
Creates a new text extractor for the given document.
Parameters:
document - The POIDocument to use in this extractor.
Method Detail
getDocSummaryInformation
public DocumentSummaryInformation getDocSummaryInformation()
Returns the document information metadata for the document.
Returns:
The Document Summary Information, or null if it could not be read for this document.
-
getSummaryInformation
public SummaryInformation getSummaryInformation()
Returns the summary information metadata for the document.
Returns:
The Summary Information for the document, or null if it could not be read for this document.
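Both summary getters are documented to return null when the metadata could not be read, so callers should check the result before dereferencing it. A minimal sketch of that check follows. `Summary`, `Extractor`, and `titleOrDefault` are hypothetical stand-ins, not POI types: they mirror only getSummaryInformation() and a getTitle() accessor, so the snippet compiles without POI on the classpath.

```java
public class SummaryNullCheck {
    /** Hypothetical stand-in for the summary object; mirrors only getTitle(). */
    interface Summary {
        String getTitle();
    }

    /** Hypothetical stand-in for the extractor; mirrors only getSummaryInformation(). */
    interface Extractor {
        Summary getSummaryInformation();
    }

    /** Returns the title, or the fallback when the summary or the title is missing. */
    static String titleOrDefault(Extractor extractor, String fallback) {
        // Documented above: null when the summary could not be read.
        Summary summary = extractor.getSummaryInformation();
        if (summary == null) {
            return fallback;
        }
        // Assumption: an individual property may also be unset.
        String title = summary.getTitle();
        return title != null ? title : fallback;
    }
}
```

The same guard would apply to getDocSummaryInformation(), which carries the same null-return note.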
-
getMetadataTextExtractor
public POITextExtractor getMetadataTextExtractor()
Returns an HPSF-powered text extractor for the document properties metadata, such as title and author.
Specified by:
getMetadataTextExtractor in class POITextExtractor
Returns:
An instance of POITextExtractor that can extract metadata.
-
getRoot
public DirectoryEntry getRoot()
Returns the underlying DirectoryEntry of this document.
Returns:
The DirectoryEntry that is associated with the POIDocument of this extractor.
-
getDocument
public POIDocument getDocument()
Returns the underlying POIDocument.
Specified by:
getDocument in class POITextExtractor
Returns:
The underlying POIDocument.