java.lang.Object
- org.apache.poi.ooxml.extractor.ExtractorFactory

```
public final class ExtractorFactory
extends java.lang.Object
```
Determines the correct POITextExtractor for the supplied document and returns it.
Note 1: extractor creation will fail for many file formats if the POI Scratchpad jar is not present on the runtime classpath.

Note 2: for most use cases, Apache Tika is a better choice than using this class directly.

Field Summary

Modifier and Type	Field	Description
`static java.lang.String`	`CORE_DOCUMENT_REL`

Method Summary

Modifier and Type	Method	Description
`static <T extends POITextExtractor> T`	`createExtractor(java.io.File f)`
`static POITextExtractor`	`createExtractor(java.io.InputStream inp)`
`static POITextExtractor`	`createExtractor(OPCPackage pkg)`	Tries to determine the actual type of file and produces a matching text-extractor for it.
`static <T extends POITextExtractor> T`	`createExtractor(DirectoryNode poifsDir)`
`static <T extends POITextExtractor> T`	`createExtractor(POIFSFileSystem fs)`
`static java.lang.Boolean`	`getAllThreadsPreferEventExtractors()`	Should all threads prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is to use the thread level setting, which defaults to false.
`static POITextExtractor[]`	`getEmbeddedDocsTextExtractors(POIOLE2TextExtractor ext)`	Returns an array of text extractors, one for each of the embedded documents in the file (if there are any).
`static POITextExtractor[]`	`getEmbeddedDocsTextExtractors(POIXMLTextExtractor ext)`	Returns an array of text extractors, one for each of the embedded documents in the file (if there are any).
`static POITextExtractor[]`	`getEmbededDocsTextExtractors(POIOLE2TextExtractor ext)`	Deprecated. Use the correctly spelled `getEmbeddedDocsTextExtractors(POIOLE2TextExtractor)` instead.
`static POITextExtractor[]`	`getEmbededDocsTextExtractors(POIXMLTextExtractor ext)`	Deprecated. Use the correctly spelled `getEmbeddedDocsTextExtractors(POIXMLTextExtractor)` instead.
`static boolean`	`getPreferEventExtractor()`	Should this thread use event based extractors if available? Checks the all-threads setting first, then the thread-specific one.
`static boolean`	`getThreadPrefersEventExtractors()`	Should this thread prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is false.
`static void`	`setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors)`	Should all threads prefer event based over usermodel based extractors? If set, will take preference over the Thread level setting.
`static void`	`setThreadPrefersEventExtractors(boolean preferEventExtractors)`	Should this thread prefer event based over usermodel based extractors? Will only be used if the All Threads setting is null.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail
- CORE_DOCUMENT_REL
```
public static final java.lang.String CORE_DOCUMENT_REL
```
  See Also: Constant Field Values

Method Detail

getThreadPrefersEventExtractors
```
public static boolean getThreadPrefersEventExtractors()
```
Should this thread prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is false.

getAllThreadsPreferEventExtractors
```
public static java.lang.Boolean getAllThreadsPreferEventExtractors()
```
Should all threads prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is to use the thread level setting, which defaults to false.

setThreadPrefersEventExtractors
```
public static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
```
Should this thread prefer event based over usermodel based extractors? Will only be used if the All Threads setting is null.

setAllThreadsPreferEventExtractors
```
public static void setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors)
```
Should all threads prefer event based over usermodel based extractors? If set, will take preference over the Thread level setting.

getPreferEventExtractor
```
public static boolean getPreferEventExtractor()
```
Should this thread use event based extractors if available? Checks the all-threads setting first, then the thread-specific one.
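
The lookup order described for the getters and setters above can be modelled in a few lines. The sketch below is not POI's source code: the class and method names (`EventExtractorPreference`, `effectivePreference`, and so on) are hypothetical, and it assumes only the documented behaviour, namely a nullable all-threads setting that overrides a per-thread flag whenever it is set, with the per-thread flag defaulting to false.

```java
// Hypothetical model of the documented preference lookup; not part of Apache POI.
final class EventExtractorPreference {
    // Per-thread flag; every thread starts at false, matching the documented default.
    private static final ThreadLocal<Boolean> THREAD_PREF =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // All-threads override; null means "not set, defer to the thread-level flag".
    private static volatile Boolean allThreadsPref = null;

    private EventExtractorPreference() {}

    static void setThreadPrefers(boolean prefer) {
        THREAD_PREF.set(prefer);
    }

    static void setAllThreadsPrefer(Boolean prefer) {
        allThreadsPref = prefer;
    }

    // Checks the all-threads setting first, then falls back to the thread-specific one.
    static boolean effectivePreference() {
        Boolean all = allThreadsPref;
        if (all != null) {
            return all;
        }
        return THREAD_PREF.get();
    }
}
```

In this model, passing `null` to `setAllThreadsPrefer` restores the per-thread behaviour, which is why the all-threads setting is a `java.lang.Boolean` rather than a primitive `boolean`.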

createExtractor

public static <T extends POITextExtractor> T createExtractor(java.io.File f)
                                                      throws java.io.IOException,
                                                             OpenXML4JException,
                                                             XmlException

Throws: java.io.IOException, OpenXML4JException, XmlException

createExtractor

public static POITextExtractor createExtractor(java.io.InputStream inp)
                                        throws java.io.IOException,
                                               OpenXML4JException,
                                               XmlException

Throws: java.io.IOException, OpenXML4JException, XmlException

createExtractor
```
public static POITextExtractor createExtractor(OPCPackage pkg)
                                        throws java.io.IOException,
                                               OpenXML4JException,
                                               XmlException
```
Tries to determine the actual type of file and produces a matching text-extractor for it.

Parameters:

pkg - An OPCPackage.

Returns:

A POIXMLTextExtractor for the given file.

Throws:

java.io.IOException - If an error occurs while reading the file

OpenXML4JException - If an error is found while parsing the OpenXML file format.

XmlException - If an XML parsing error occurs.

java.lang.IllegalArgumentException - If no matching file type could be found.

createExtractor

public static <T extends POITextExtractor> T createExtractor(POIFSFileSystem fs)
                                                      throws java.io.IOException,
                                                             OpenXML4JException,
                                                             XmlException

Throws: java.io.IOException, OpenXML4JException, XmlException

createExtractor

public static <T extends POITextExtractor> T createExtractor(DirectoryNode poifsDir)
                                                      throws java.io.IOException,
                                                             OpenXML4JException,
                                                             XmlException

Throws: java.io.IOException, OpenXML4JException, XmlException

getEmbededDocsTextExtractors

@Deprecated
@Removal(version="4.2")
public static POITextExtractor[] getEmbededDocsTextExtractors(POIOLE2TextExtractor ext)
                                                       throws java.io.IOException,
                                                              OpenXML4JException,
                                                              XmlException

Deprecated.

Use the correctly spelled getEmbeddedDocsTextExtractors(POIOLE2TextExtractor) instead.

Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). If there are no embedded documents, you'll get back an empty array. Otherwise, you'll get one open POITextExtractor for each embedded file.

Throws: java.io.IOException, OpenXML4JException, XmlException

getEmbeddedDocsTextExtractors

public static POITextExtractor[] getEmbeddedDocsTextExtractors(POIOLE2TextExtractor ext)
                                                        throws java.io.IOException,
                                                               OpenXML4JException,
                                                               XmlException

Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). If there are no embedded documents, you'll get back an empty array. Otherwise, you'll get one open POITextExtractor for each embedded file.

Throws: java.io.IOException, OpenXML4JException, XmlException
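
Because each returned extractor is open, the caller presumably needs to close all of them once their text has been read. A minimal sketch of that cleanup follows, assuming POITextExtractor implements `java.io.Closeable` (as it does in POI 4.x). The helper class and method names are hypothetical and not part of POI. An empty array simply results in zero close calls, so the "no embedded documents" case needs no special handling.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper; not part of Apache POI.
final class EmbeddedExtractorCleanup {
    private EmbeddedExtractorCleanup() {}

    // Attempts to close every resource and returns how many closed cleanly.
    // If any close fails, the first failure is rethrown (unchecked) after all
    // resources have been attempted; later failures are attached as suppressed.
    static int closeAll(Closeable[] resources) {
        IOException first = null;
        int closed = 0;
        for (Closeable resource : resources) {
            try {
                resource.close();
                closed++;
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw new UncheckedIOException(first);
        }
        return closed;
    }
}
```

With POI on the classpath, the array returned by `getEmbeddedDocsTextExtractors(ext)` could be passed to `closeAll` directly after the text of each extractor has been read, since Java arrays are covariant.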

getEmbededDocsTextExtractors
```
@Deprecated
@Removal(version="4.2")
@NotImplemented
public static POITextExtractor[] getEmbededDocsTextExtractors(POIXMLTextExtractor ext)
```
Deprecated.
Use the correctly spelled getEmbeddedDocsTextExtractors(POIXMLTextExtractor) instead.

Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). If there are no embedded documents, you'll get back an empty array. Otherwise, you'll get one open POITextExtractor for each embedded file.

getEmbeddedDocsTextExtractors
```
@NotImplemented
public static POITextExtractor[] getEmbeddedDocsTextExtractors(POIXMLTextExtractor ext)
```
Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). If there are no embedded documents, you'll get back an empty array. Otherwise, you'll get one open POITextExtractor for each embedded file.
