Package org.apache.poi.ooxml.extractor
Class ExtractorFactory
- java.lang.Object
-
- org.apache.poi.ooxml.extractor.ExtractorFactory
-
public final class ExtractorFactory extends java.lang.Object
Figures out the correct POITextExtractor for your supplied document, and returns it.
Note 1 - extraction will fail for many file formats if the POI Scratchpad jar is not present on the runtime classpath.
Note 2 - for most cases you would be better off using Apache Tika rather than this class.
-
-
Field Summary
Fields
Modifier and Type, Field
static java.lang.String CORE_DOCUMENT_REL
-
Method Summary
Modifier and Type, Method, Description
static <T extends POITextExtractor> T createExtractor(java.io.File f)
static POITextExtractor createExtractor(java.io.InputStream inp)
static POITextExtractor createExtractor(OPCPackage pkg)
  Tries to determine the actual type of file and produces a matching text-extractor for it.
static <T extends POITextExtractor> T createExtractor(DirectoryNode poifsDir)
static <T extends POITextExtractor> T createExtractor(POIFSFileSystem fs)
static java.lang.Boolean getAllThreadsPreferEventExtractors()
  Should all threads prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is to use the thread level setting, which defaults to false.
static POITextExtractor[] getEmbeddedDocsTextExtractors(POIOLE2TextExtractor ext)
  Returns an array of text extractors, one for each of the embedded documents in the file (if there are any).
static POITextExtractor[] getEmbeddedDocsTextExtractors(POIXMLTextExtractor ext)
  Returns an array of text extractors, one for each of the embedded documents in the file (if there are any).
static POITextExtractor[] getEmbededDocsTextExtractors(POIOLE2TextExtractor ext)
  Deprecated. Use the method with the correct spelling of "embedded".
static POITextExtractor[] getEmbededDocsTextExtractors(POIXMLTextExtractor ext)
  Deprecated. Use the method with the correct spelling of "embedded".
static boolean getPreferEventExtractor()
  Should this thread use event based extractors if available? Checks the all-threads setting first, then the thread-specific one.
static boolean getThreadPrefersEventExtractors()
  Should this thread prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is false.
static void setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors)
  Should all threads prefer event based over usermodel based extractors? If set, will take preference over the Thread level setting.
static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
  Should this thread prefer event based over usermodel based extractors? Will only be used if the All Threads setting is null.
-
-
-
Field Detail
-
CORE_DOCUMENT_REL
public static final java.lang.String CORE_DOCUMENT_REL
- See Also:
- Constant Field Values
-
-
Method Detail
-
getThreadPrefersEventExtractors
public static boolean getThreadPrefersEventExtractors()
Should this thread prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is false.
-
getAllThreadsPreferEventExtractors
public static java.lang.Boolean getAllThreadsPreferEventExtractors()
Should all threads prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is to use the thread level setting, which defaults to false.
-
setThreadPrefersEventExtractors
public static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
Should this thread prefer event based over usermodel based extractors? Will only be used if the All Threads setting is null.
-
setAllThreadsPreferEventExtractors
public static void setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors)
Should all threads prefer event based over usermodel based extractors? If set, will take preference over the Thread level setting.
-
getPreferEventExtractor
public static boolean getPreferEventExtractor()
Should this thread use event based extractors if available? Checks the all-threads setting first, then the thread-specific one.
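The precedence described by the five methods above (a nullable all-threads setting that, when set, overrides a per-thread boolean defaulting to false) can be modelled with a few lines of standard-library Java. This is a minimal sketch for illustration only: the class `EventExtractorPreference` and its method names are hypothetical and are not POI's source code.

```java
// Hypothetical stand-in that models the documented precedence; not POI's implementation.
final class EventExtractorPreference {
    // null means "no global override": fall back to the per-thread value.
    private static volatile Boolean allThreads = null;

    // Each thread starts at false, matching the documented default.
    private static final ThreadLocal<Boolean> perThread =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    static void setAllThreads(Boolean value) {
        allThreads = value;
    }

    static void setThisThread(boolean value) {
        perThread.set(value);
    }

    // Checks the all-threads setting first, then the thread-specific one.
    static boolean resolve() {
        Boolean global = allThreads;
        return (global != null) ? global : perThread.get();
    }
}
```

With the real class, the corresponding calls would be ExtractorFactory.setThreadPrefersEventExtractors(true) to opt the current thread in, and ExtractorFactory.setAllThreadsPreferEventExtractors(null) to clear any global override so that the thread-level value applies again.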
-
createExtractor
public static <T extends POITextExtractor> T createExtractor(java.io.File f) throws java.io.IOException, OpenXML4JException, XmlException
- Throws:
java.io.IOException
OpenXML4JException
XmlException
-
createExtractor
public static POITextExtractor createExtractor(java.io.InputStream inp) throws java.io.IOException, OpenXML4JException, XmlException
- Throws:
java.io.IOException
OpenXML4JException
XmlException
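This page does not say how the InputStream variant identifies the document format. One common approach, sketched here as an assumption rather than as POI's actual logic, is to peek at the leading bytes: OLE2 compound files start with the signature D0 CF 11 E0 A1 B1 1A E1, while OOXML files are ZIP archives and start with "PK" followed by 03 04. The class `ContainerSniffer` and its `Kind` enum are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of POI.
final class ContainerSniffer {
    // ZIP_BASED only says "this is a ZIP archive"; it may or may not be OOXML.
    enum Kind { OLE2, ZIP_BASED, UNKNOWN }

    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Peeks at up to 8 bytes and rewinds, so the caller can still read the whole stream.
    static Kind sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[8];
        in.mark(head.length);
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream ended before 8 bytes were available
            }
            n += r;
        }
        in.reset();
        if (startsWith(head, n, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(head, n, ZIP_MAGIC)) return Kind.ZIP_BASED;
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] head, int len, int[] magic) {
        if (len < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((head[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }
}
```

A stream that does not support mark/reset can be wrapped in a java.io.BufferedInputStream before sniffing.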
-
createExtractor
public static POITextExtractor createExtractor(OPCPackage pkg) throws java.io.IOException, OpenXML4JException, XmlException
Tries to determine the actual type of file and produces a matching text-extractor for it.
- Parameters:
pkg - An OPCPackage.
- Returns:
A POIXMLTextExtractor for the given file.
- Throws:
java.io.IOException - If an error occurs while reading the file.
OpenXML4JException - If an error parsing the OpenXML file format is found.
XmlException - If an XML parsing error occurs.
java.lang.IllegalArgumentException - If no matching file type could be found.
-
createExtractor
public static <T extends POITextExtractor> T createExtractor(POIFSFileSystem fs) throws java.io.IOException, OpenXML4JException, XmlException
- Throws:
java.io.IOException
OpenXML4JException
XmlException
-
createExtractor
public static <T extends POITextExtractor> T createExtractor(DirectoryNode poifsDir) throws java.io.IOException, OpenXML4JException, XmlException
- Throws:
java.io.IOException
OpenXML4JException
XmlException
-
getEmbededDocsTextExtractors
@Deprecated @Removal(version="4.2") public static POITextExtractor[] getEmbededDocsTextExtractors(POIOLE2TextExtractor ext) throws java.io.IOException, OpenXML4JException, XmlException
Deprecated. Use the method with the correct spelling of "embedded".
Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). If there are no embedded documents, you'll get back an empty array. Otherwise, you'll get one open POITextExtractor for each embedded file.
- Throws:
java.io.IOException
OpenXML4JException
XmlException
-
getEmbeddedDocsTextExtractors
public static POITextExtractor[] getEmbeddedDocsTextExtractors(POIOLE2TextExtractor ext) throws java.io.IOException, OpenXML4JException, XmlException
Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). If there are no embedded documents, you'll get back an empty array. Otherwise, you'll get one open POITextExtractor for each embedded file.
- Throws:
java.io.IOException
OpenXML4JException
XmlException
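Because each returned extractor is described as open, the caller presumably needs to close every element of the array once done. The sketch below shows one way to do that using only java.io.Closeable, on the assumption that the extractor type is Closeable; the helper class `CloseAll` is hypothetical and not part of POI.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper for illustration; not part of POI.
final class CloseAll {
    // Closes every element even if some fail. The first failure is rethrown
    // after the loop, with any later failures attached as suppressed exceptions.
    static void closeAll(Closeable[] resources) throws IOException {
        IOException first = null;
        for (Closeable c : resources) {
            if (c == null) {
                continue;
            }
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

An empty array, which is what the method returns when there are no embedded documents, passes through this helper without any work.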
-
getEmbededDocsTextExtractors
@Deprecated @Removal(version="4.2") @NotImplemented public static POITextExtractor[] getEmbededDocsTextExtractors(POIXMLTextExtractor ext)
Deprecated. Use the method with the correct spelling of "embedded".
Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). If there are no embedded documents, you'll get back an empty array. Otherwise, you'll get one open POITextExtractor for each embedded file.
-
getEmbeddedDocsTextExtractors
@NotImplemented public static POITextExtractor[] getEmbeddedDocsTextExtractors(POIXMLTextExtractor ext)
Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). If there are no embedded documents, you'll get back an empty array. Otherwise, you'll get one open POITextExtractor for each embedded file.
-
-