Package org.apache.poi.hssf.extractor
Class ExcelExtractor
java.lang.Object
  org.apache.poi.extractor.POITextExtractor
    org.apache.poi.extractor.POIOLE2TextExtractor
      org.apache.poi.hssf.extractor.ExcelExtractor
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, org.apache.poi.ss.extractor.ExcelExtractor
public class ExcelExtractor extends POIOLE2TextExtractor implements org.apache.poi.ss.extractor.ExcelExtractor
A text extractor for Excel files in the binary .xls format. Returns the textual content of the file, suitable for indexing by something like Lucene, but not really intended for display to the user.
To turn an Excel file into CSV or a similar format, see the XLS2CSVmra example.
See Also:
XLS2CSVmra
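A minimal end-to-end sketch, assuming Apache POI 4.x with the HSSF module on the classpath. The names `XlsToText`, `extractText` and `sampleXls` are illustrative, not part of POI; `sampleXls` only builds a tiny workbook in memory so the example runs without an input file. With the default options, the sheet name appears in the output, and in current HSSF versions the cells of a row are separated by tabs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

final class XlsToText {

    /** Parses a binary .xls stream and returns its text using the default extractor options. */
    static String extractText(InputStream in) throws IOException {
        // HSSFWorkbook parses the OLE2 container; the extractor walks its sheets on getText().
        try (HSSFWorkbook wb = new HSSFWorkbook(in);
             ExcelExtractor extractor = new ExcelExtractor(wb)) {
            return extractor.getText();
        }
    }

    /** Hypothetical helper: serializes a one-sheet, one-row workbook to .xls bytes. */
    static byte[] sampleXls() throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            HSSFSheet sheet = wb.createSheet("Greetings");
            HSSFRow row = sheet.createRow(0);
            row.createCell(0).setCellValue("Hello");
            row.createCell(1).setCellValue("World");
            wb.write(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Extract the file named on the command line, or fall back to the in-memory sample.
        try (InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(sampleXls())) {
            System.out.println(extractText(in));
        }
    }
}
```

Both the workbook and the extractor are closed through try-with-resources; for a real file this also releases the underlying stream.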
Constructor Summary
ExcelExtractor(HSSFWorkbook wb)
ExcelExtractor(DirectoryNode dir)
ExcelExtractor(POIFSFileSystem fs)
Method Summary
Modifier and Type | Method | Description
static java.lang.String | _extractHeaderFooter(HeaderFooter hf) |
java.lang.String | getText() | Retrieves all the text from the document.
static void | main(java.lang.String[] args) | Command line extractor.
void | setFormulasNotResults(boolean formulasNotResults) | Should we return the formula itself, and not the result it produces? Default is false.
void | setIncludeBlankCells(boolean includeBlankCells) | Should blank cells be output? Default is to only output cells that are present in the file and are non-blank.
void | setIncludeCellComments(boolean includeCellComments) | Should cell comments be included? Default is false.
void | setIncludeHeadersFooters(boolean includeHeadersFooters) | Should headers and footers be included in the output? Default is true.
void | setIncludeSheetNames(boolean includeSheetNames) | Should sheet names be included? Default is true.
Methods inherited from class org.apache.poi.extractor.POIOLE2TextExtractor
getDocSummaryInformation, getDocument, getMetadataTextExtractor, getRoot, getSummaryInformation
Methods inherited from class org.apache.poi.extractor.POITextExtractor
close, setFilesystem
Constructor Detail
ExcelExtractor
public ExcelExtractor(HSSFWorkbook wb)
ExcelExtractor
public ExcelExtractor(POIFSFileSystem fs) throws java.io.IOException
Throws:
java.io.IOException
ExcelExtractor
public ExcelExtractor(DirectoryNode dir) throws java.io.IOException
Throws:
java.io.IOException
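The three constructors differ mainly in how much of the file has already been parsed. The sketch below assumes POI 4.x; `ExtractorConstructors` and its methods are hypothetical names. It feeds the same in-memory .xls bytes through each constructor, and all three should produce identical text. A `DirectoryNode` is the natural entry point when the workbook is embedded inside another OLE2 document; here the root directory of the file system stands in for that case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

final class ExtractorConstructors {

    /** Hypothetical helper: a single-cell .xls held in memory. */
    static byte[] sampleXls() throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            wb.createSheet("Data").createRow(0).createCell(0).setCellValue("42 widgets");
            wb.write(out);
            return out.toByteArray();
        }
    }

    /** From an already-parsed workbook, e.g. one also read or edited through the usermodel API. */
    static String fromWorkbook(byte[] xls) throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook(new ByteArrayInputStream(xls));
             ExcelExtractor ex = new ExcelExtractor(wb)) {
            return ex.getText();
        }
    }

    /** From the raw OLE2 container; the extractor builds the workbook itself. */
    static String fromFileSystem(byte[] xls) throws IOException {
        try (POIFSFileSystem fs = new POIFSFileSystem(new ByteArrayInputStream(xls));
             ExcelExtractor ex = new ExcelExtractor(fs)) {
            return ex.getText();
        }
    }

    /** From a directory inside a container; here simply the file system's root. */
    static String fromDirectory(byte[] xls) throws IOException {
        try (POIFSFileSystem fs = new POIFSFileSystem(new ByteArrayInputStream(xls));
             ExcelExtractor ex = new ExcelExtractor(fs.getRoot())) {
            return ex.getText();
        }
    }
}
```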
Method Detail
main
public static void main(java.lang.String[] args) throws java.io.IOException
Command line extractor.
Parameters:
args - the command line parameters
Throws:
java.io.IOException - if the file can't be read or contains errors
setIncludeSheetNames
public void setIncludeSheetNames(boolean includeSheetNames)
Description copied from interface: ExcelExtractor
Should sheet names be included? Default is true.
Specified by:
setIncludeSheetNames in interface ExcelExtractor
Parameters:
includeSheetNames - true if the sheet names should be included
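A sketch of toggling sheet names, assuming POI 4.x; `SheetNamesDemo` is a hypothetical name. The workbook is built in memory with two sheets so the effect is visible: with `true`, each sheet's text is preceded by its name; with `false`, only the cell text remains.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

final class SheetNamesDemo {

    /** Extracts a two-sheet in-memory workbook, with or without sheet names. */
    static String extract(boolean includeSheetNames) throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook();
             ExcelExtractor extractor = new ExcelExtractor(wb)) {
            wb.createSheet("Inventory").createRow(0).createCell(0).setCellValue("bolts");
            wb.createSheet("Suppliers").createRow(0).createCell(0).setCellValue("Acme");

            // Default is true; false drops the sheet-name lines from the output.
            extractor.setIncludeSheetNames(includeSheetNames);
            return extractor.getText();
        }
    }
}
```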
setFormulasNotResults
public void setFormulasNotResults(boolean formulasNotResults)
Description copied from interface: ExcelExtractor
Should we return the formula itself, and not the result it produces? Default is false.
Specified by:
setFormulasNotResults in interface ExcelExtractor
Parameters:
formulasNotResults - true if the formula itself is returned
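A sketch contrasting the two formula modes, assuming POI 4.x; `FormulaDemo` is a hypothetical name. Judging by the HSSF source, the extractor appears to report the result cached in the file rather than recalculating it, so the sketch calls `evaluateAll()` first to store results, much as Excel does when it saves a file.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

final class FormulaDemo {

    /** Extracts a row holding 1, 2 and =SUM(A1:B1), showing either formulas or results. */
    static String extract(boolean formulasNotResults) throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook();
             ExcelExtractor extractor = new ExcelExtractor(wb)) {
            HSSFRow row = wb.createSheet("Totals").createRow(0);
            row.createCell(0).setCellValue(1.0);
            row.createCell(1).setCellValue(2.0);
            row.createCell(2).setCellFormula("SUM(A1:B1)");

            // Compute and cache formula results in the cells themselves.
            wb.getCreationHelper().createFormulaEvaluator().evaluateAll();

            // true: emit the formula text; false (default): emit the cached result.
            extractor.setFormulasNotResults(formulasNotResults);
            return extractor.getText();
        }
    }
}
```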
setIncludeCellComments
public void setIncludeCellComments(boolean includeCellComments)
Description copied from interface: ExcelExtractor
Should cell comments be included? Default is false.
Specified by:
setIncludeCellComments in interface ExcelExtractor
Parameters:
includeCellComments - true if cell comments should be included
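A sketch of including cell comments, assuming POI 4.x; `CommentDemo` is a hypothetical name. In HSSF, a comment is created through the sheet's drawing patriarch and then attached to a cell. The exact wording the extractor wraps around the comment is an implementation detail, so the sketch only relies on the comment text appearing when the flag is on.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFComment;
import org.apache.poi.hssf.usermodel.HSSFPatriarch;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.ClientAnchor;
import org.apache.poi.ss.usermodel.CreationHelper;

final class CommentDemo {

    /** Extracts a one-cell sheet whose cell carries a comment. */
    static String extract(boolean includeCellComments) throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook();
             ExcelExtractor extractor = new ExcelExtractor(wb)) {
            HSSFSheet sheet = wb.createSheet("Review");
            HSSFCell cell = sheet.createRow(0).createCell(0);
            cell.setCellValue("Budget");

            // The anchor positions the comment box; here columns B..D, rows 1..4.
            CreationHelper helper = wb.getCreationHelper();
            ClientAnchor anchor = helper.createClientAnchor();
            anchor.setCol1(1);
            anchor.setRow1(0);
            anchor.setCol2(3);
            anchor.setRow2(3);

            HSSFPatriarch patriarch = sheet.createDrawingPatriarch();
            HSSFComment comment = patriarch.createCellComment(anchor);
            comment.setString(helper.createRichTextString("Check Q3 figures"));
            comment.setAuthor("Alice");
            cell.setCellComment(comment);

            // Default is false, so comments are skipped unless requested.
            extractor.setIncludeCellComments(includeCellComments);
            return extractor.getText();
        }
    }
}
```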
setIncludeBlankCells
public void setIncludeBlankCells(boolean includeBlankCells)
Should blank cells be output? Default is to only output cells that are present in the file and are non-blank.
Parameters:
includeBlankCells - true if blank cells should be included
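A sketch of the blank-cell option, assuming POI 4.x; `BlankCellsDemo` is a hypothetical name. The row has values in columns A and C but no cell in column B. In the HSSF source, a missing cell contributes no separator by default, so the gap collapses; with blanks included, an empty field keeps column C in its own position.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

final class BlankCellsDemo {

    /** Extracts a sparse row (A1 = "a", B1 missing, C1 = "c"). */
    static String extract(boolean includeBlankCells) throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook();
             ExcelExtractor extractor = new ExcelExtractor(wb)) {
            HSSFRow row = wb.createSheet("Sparse").createRow(0);
            row.createCell(0).setCellValue("a");
            // Column B (index 1) is never created, leaving a gap in the row.
            row.createCell(2).setCellValue("c");

            extractor.setIncludeSheetNames(false); // keep only cell text
            extractor.setIncludeBlankCells(includeBlankCells);
            return extractor.getText();
        }
    }
}
```

Including blank cells is useful when downstream code splits each line on tabs and expects column positions to line up.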
setIncludeHeadersFooters
public void setIncludeHeadersFooters(boolean includeHeadersFooters)
Description copied from interface: ExcelExtractor
Should headers and footers be included in the output? Default is true.
Specified by:
setIncludeHeadersFooters in interface ExcelExtractor
Parameters:
includeHeadersFooters - true if headers and footers should be included
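A sketch of the header/footer option, assuming POI 4.x; `HeaderFooterDemo` is a hypothetical name. The sheet is given a centered header and a right-aligned footer; since the default is `true`, these print-layout strings appear in the extracted text unless the flag is switched off.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

final class HeaderFooterDemo {

    /** Extracts a sheet that has a page header and footer. */
    static String extract(boolean includeHeadersFooters) throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook();
             ExcelExtractor extractor = new ExcelExtractor(wb)) {
            HSSFSheet sheet = wb.createSheet("Report");
            sheet.getHeader().setCenter("Quarterly report");
            sheet.getFooter().setRight("Confidential");
            sheet.createRow(0).createCell(0).setCellValue("Revenue");

            extractor.setIncludeHeadersFooters(includeHeadersFooters);
            return extractor.getText();
        }
    }
}
```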
getText
public java.lang.String getText()
Description copied from class: POITextExtractor
Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the Javadocs for a specific project for details.
Specified by:
getText in interface ExcelExtractor
Specified by:
getText in class POITextExtractor
Returns:
All the text from the document
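Because the separators are documented as implementation-specific, the following is only a sketch based on observed HSSF behavior (tabs between cells, a newline after each row), assuming POI 4.x; `TextLayoutDemo` is a hypothetical name. With sheet names and headers/footers switched off, each output line corresponds to one row. For dependable structured output, the usermodel API or the XLS2CSVmra example is the safer route.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

final class TextLayoutDemo {

    /** Splits getText() output into rows and cells, relying on the observed tab/newline layout. */
    static List<String[]> rows() throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook();
             ExcelExtractor extractor = new ExcelExtractor(wb)) {
            HSSFSheet sheet = wb.createSheet("People");
            HSSFRow header = sheet.createRow(0);
            header.createCell(0).setCellValue("name");
            header.createCell(1).setCellValue("city");
            HSSFRow data = sheet.createRow(1);
            data.createCell(0).setCellValue("Ada");
            data.createCell(1).setCellValue("London");

            // Leave only cell text so every line corresponds to one row.
            extractor.setIncludeSheetNames(false);
            extractor.setIncludeHeadersFooters(false);

            List<String[]> rows = new ArrayList<>();
            for (String line : extractor.getText().split("\n")) {
                rows.add(line.split("\t", -1)); // -1 keeps empty trailing fields
            }
            return rows;
        }
    }
}
```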
_extractHeaderFooter
public static java.lang.String _extractHeaderFooter(HeaderFooter hf)
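This method has no description. The leading underscore suggests it is mainly an internal helper, but it is public and static, so it can format a single header or footer on its own. A sketch, assuming POI 4.x, where `hf` is an `org.apache.poi.hssf.usermodel.HeaderFooter` such as the `HSSFFooter` returned by `HSSFSheet.getFooter()`; `HeaderFooterText` is a hypothetical name. Judging by the HSSF source, the left, center and right sections are joined with tabs and a non-empty result ends with a newline.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFFooter;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

final class HeaderFooterText {

    /** Formats one sheet footer with left and right sections into plain text. */
    static String footerText() throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook()) {
            HSSFFooter footer = wb.createSheet("Report").getFooter();
            footer.setLeft("Draft");
            footer.setRight("ACME Corp");
            // Static call: no extractor instance is needed.
            return ExcelExtractor._extractHeaderFooter(footer);
        }
    }
}
```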