Package org.apache.poi.extractor
Class POIOLE2TextExtractor
java.lang.Object
  org.apache.poi.extractor.POITextExtractor
    org.apache.poi.extractor.POIOLE2TextExtractor
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
Direct Known Subclasses:
EventBasedExcelExtractor, ExcelExtractor, HPSFPropertiesExtractor, OutlookTextExtactor, PowerPointExtractor, PublisherTextExtractor, VisioTextExtractor, Word6Extractor, WordExtractor
public abstract class POIOLE2TextExtractor extends POITextExtractor
Common parent for OLE2-based text extractors of POI documents, such as .doc and .xls. You will typically find the implementation of a given format's text extractor under org.apache.poi.[format].extractor.
See Also:
ExcelExtractor, PowerPointExtractor, VisioTextExtractor, WordExtractor
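As a rough illustration of how a concrete subclass might be used through this common parent type, the sketch below builds a one-cell .xls workbook in memory and extracts its text. It assumes the main POI jar is on the classpath and that ExcelExtractor lives in org.apache.poi.hssf.extractor, following the package pattern described above. The class name ExtractTextSketch, the method extractSampleText and the cell contents are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.extractor.POIOLE2TextExtractor;
import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class ExtractTextSketch {

    // Builds a one-cell workbook in memory and returns the text that the
    // extractor produces for it. Both resources are closed automatically.
    static String extractSampleText() {
        try (HSSFWorkbook workbook = new HSSFWorkbook();
             POIOLE2TextExtractor extractor = new ExcelExtractor(workbook)) {
            workbook.createSheet("Sheet1").createRow(0).createCell(0).setCellValue("hello");
            return extractor.getText();
        } catch (IOException e) {
            // Hypothetical error policy for this sketch: rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(extractSampleText());
    }
}
```

Because the variable is typed as POIOLE2TextExtractor, the same calling code should work with any of the subclasses listed above; only the constructor call would change.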
Constructor Summary
Constructor                                  Description
POIOLE2TextExtractor(POIDocument document)   Creates a new text extractor for the given document.
Method Summary
Modifier and Type            Method                       Description
DocumentSummaryInformation   getDocSummaryInformation()   Returns the document information metadata for the document.
POIDocument                  getDocument()                Returns the underlying POIDocument.
POITextExtractor             getMetadataTextExtractor()   Returns an HPSF-powered text extractor for the document properties metadata, such as title and author.
DirectoryEntry               getRoot()                    Returns the underlying DirectoryEntry of this document.
SummaryInformation           getSummaryInformation()      Returns the summary information metadata for the document.
Methods inherited from class org.apache.poi.extractor.POITextExtractor
close, getText, setFilesystem
Constructor Detail
POIOLE2TextExtractor
public POIOLE2TextExtractor(POIDocument document)
Creates a new text extractor for the given document.
Parameters:
document - The POIDocument to use in this extractor.
Method Detail
getDocSummaryInformation
public DocumentSummaryInformation getDocSummaryInformation()
Returns the document information metadata for the document.
Returns:
The Document Summary Information, or null if it could not be read for this document.
getSummaryInformation
public SummaryInformation getSummaryInformation()
Returns the summary information metadata for the document.
Returns:
The Summary Information for the document, or null if it could not be read for this document.
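Since this method may return null, callers usually need a guard before reading individual properties. The following is a minimal sketch of one way to do that, assuming POI is on the classpath. The class SummaryTitleSketch, the helper titleOrDefault and the fallback value are hypothetical names introduced for this example.

```java
import java.io.IOException;

import org.apache.poi.extractor.POIOLE2TextExtractor;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class SummaryTitleSketch {

    // Returns the document title, or the fallback when either the summary
    // information or the title property inside it is missing.
    static String titleOrDefault(POIOLE2TextExtractor extractor, String fallback) {
        SummaryInformation summary = extractor.getSummaryInformation();
        if (summary == null) {
            return fallback;
        }
        String title = summary.getTitle();
        return title != null ? title : fallback;
    }

    public static void main(String[] args) throws IOException {
        try (HSSFWorkbook workbook = new HSSFWorkbook()) {
            // A new in-memory workbook has no property sets until they are created.
            workbook.createInformationProperties();
            workbook.getSummaryInformation().setTitle("Report");
            try (POIOLE2TextExtractor extractor = new ExcelExtractor(workbook)) {
                System.out.println(titleOrDefault(extractor, "untitled"));
            }
        }
    }
}
```

The same guard pattern would apply to getDocSummaryInformation(), which is documented with the same null return.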
getMetadataTextExtractor
public POITextExtractor getMetadataTextExtractor()
Returns an HPSF-powered text extractor for the document properties metadata, such as title and author.
Specified by:
getMetadataTextExtractor in class POITextExtractor
Returns:
an instance of POITextExtractor that can extract metadata.
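One possible use of the metadata extractor is to dump all document properties as plain text alongside the main content. The sketch below assumes POI is on the classpath. The class MetadataTextSketch and the helper metadataText are hypothetical, and the exact layout of the returned text is not specified here, so the example does not rely on it.

```java
import java.io.IOException;

import org.apache.poi.extractor.POIOLE2TextExtractor;
import org.apache.poi.extractor.POITextExtractor;
import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class MetadataTextSketch {

    // Returns the document properties (title, author and so on) rendered as text.
    static String metadataText(POIOLE2TextExtractor extractor) {
        POITextExtractor metadata = extractor.getMetadataTextExtractor();
        return metadata.getText();
    }

    public static void main(String[] args) throws IOException {
        try (HSSFWorkbook workbook = new HSSFWorkbook()) {
            // Create the property sets so that there is a title to report.
            workbook.createInformationProperties();
            workbook.getSummaryInformation().setTitle("Report");
            try (POIOLE2TextExtractor extractor = new ExcelExtractor(workbook)) {
                System.out.print(metadataText(extractor));
            }
        }
    }
}
```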
getRoot
public DirectoryEntry getRoot()
Returns the underlying DirectoryEntry of this document.
Returns:
the DirectoryEntry that is associated with the POIDocument of this extractor.
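The returned DirectoryEntry can be iterated to inspect the streams stored in the OLE2 container. The sketch below lists the top-level entry names, assuming POI is on the classpath. The class RootEntriesSketch and its two helpers are hypothetical. The null check is a defensive assumption for documents that were created in memory and never loaded from an OLE2 file, which is why the sample saves and reopens the workbook first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.extractor.POIOLE2TextExtractor;
import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.Entry;

public class RootEntriesSketch {

    // Collects the names of the top-level entries under the extractor's root.
    static List<String> entryNames(POIOLE2TextExtractor extractor) {
        List<String> names = new ArrayList<>();
        DirectoryEntry root = extractor.getRoot();
        if (root == null) {
            return names; // assumption: no backing OLE2 directory is available
        }
        for (Entry entry : root) {
            names.add(entry.getName());
        }
        return names;
    }

    // Writes a small workbook to memory and reopens it, so that the reopened
    // document is backed by an OLE2 directory, then lists its entries.
    static List<String> sampleEntryNames() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (HSSFWorkbook fresh = new HSSFWorkbook()) {
                fresh.createSheet("Sheet1");
                fresh.write(out);
            }
            try (HSSFWorkbook reopened =
                         new HSSFWorkbook(new ByteArrayInputStream(out.toByteArray()));
                 POIOLE2TextExtractor extractor = new ExcelExtractor(reopened)) {
                return entryNames(extractor);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        sampleEntryNames().forEach(System.out::println);
    }
}
```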
getDocument
public POIDocument getDocument()
Returns the underlying POIDocument.
Specified by:
getDocument in class POITextExtractor
Returns:
the underlying POIDocument