Package org.apache.tika.extractor
Class EmbeddedDocumentUtil
java.lang.Object
    org.apache.tika.extractor.EmbeddedDocumentUtil

All Implemented Interfaces:
java.io.Serializable
public class EmbeddedDocumentUtil extends java.lang.Object implements java.io.Serializable

Utility class to handle common issues with embedded documents. Use it statically if all you need is the EmbeddedDocumentExtractor; otherwise, create an instance. Note: this class is not thread safe, so create one instance per thread.

See Also:
Serialized Form
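The thread-safety note above can be followed in several ways. The sketch below shows one of them, a ThreadLocal that gives each thread its own helper. StatefulHelper and PerThreadHelperSketch are hypothetical stand-ins written for this illustration, not Tika classes. A real EmbeddedDocumentUtil also needs a ParseContext, so in practice it may be simpler to create the instance inside each parse call.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: models "one non-thread-safe helper per thread" with stand-in types.
class PerThreadHelperSketch {

    // Hypothetical stand-in for a stateful helper that must not be shared across threads.
    static final class StatefulHelper {
        private static final AtomicInteger CREATED = new AtomicInteger();
        private final int id = CREATED.incrementAndGet();

        int id() {
            return id;
        }
    }

    // Each thread lazily creates its own helper and then reuses it.
    private static final ThreadLocal<StatefulHelper> HELPER =
            ThreadLocal.withInitial(StatefulHelper::new);

    static StatefulHelper current() {
        return HELPER.get();
    }

    // Starts a new thread and returns the id of the helper that thread received.
    static int idSeenByNewThread() {
        int[] seen = new int[1];
        Thread worker = new Thread(() -> seen[0] = current().id());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        int mine = current().id();
        int other = idSeenByNewThread();
        System.out.println("same thread reuses its instance: " + (mine == current().id()));
        System.out.println("new thread gets its own instance: " + (mine != other));
    }
}
```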
-
-
Constructor Summary
Constructors
EmbeddedDocumentUtil(ParseContext context)
-
Method Summary
Modifier and Type, Method, Description

TikaConfig getConfig()
    Deprecated. As of 1.17, use getTikaConfig() instead.
Detector getDetector()
static EmbeddedDocumentExtractor getEmbeddedDocumentExtractor(ParseContext context)
    This offers a uniform way to get an EmbeddedDocumentExtractor from a ParseContext.
java.lang.String getExtension(TikaInputStream is, Metadata metadata)
MimeTypes getMimeTypes()
PasswordProvider getPasswordProvider()
TikaConfig getTikaConfig()
void parseEmbedded(java.io.InputStream inputStream, org.xml.sax.ContentHandler handler, Metadata metadata, boolean outputHtml)
static void recordEmbeddedStreamException(java.lang.Throwable t, Metadata m)
static void recordException(java.lang.Throwable t, Metadata m)
boolean shouldParseEmbedded(Metadata m)
static Parser tryToFindExistingLeafParser(java.lang.Class clazz, ParseContext context)
    Tries to find an existing parser within the ParseContext.
-
-
-
Constructor Detail
-
EmbeddedDocumentUtil
public EmbeddedDocumentUtil(ParseContext context)
-
-
Method Detail
-
getEmbeddedDocumentExtractor
public static EmbeddedDocumentExtractor getEmbeddedDocumentExtractor(ParseContext context)
This offers a uniform way to get an EmbeddedDocumentExtractor from a ParseContext. As of Tika 1.15, an AutoDetectParser will automatically be added to parse embedded documents if no Parser.class is specified in the ParseContext. If you'd prefer not to parse embedded documents, set Parser.class to EmptyParser in the ParseContext.
Parameters:
context -
Returns:
EmbeddedDocumentExtractor
-
getPasswordProvider
public PasswordProvider getPasswordProvider()
-
getDetector
public Detector getDetector()
-
getMimeTypes
public MimeTypes getMimeTypes()
-
getTikaConfig
public TikaConfig getTikaConfig()
Returns:
A TikaConfig. It is looked up first in the ParseContext that was passed in during initialization; if none is found there, a new one is created via TikaConfig.getDefaultConfig(). The default config is cached so that it only has to be created once.
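The lookup order described above (context first, then a cached default) can be modeled in a few lines. This is a simplified sketch, not Tika's implementation: Config, Context, and ConfigLookupSketch are hypothetical stand-ins for TikaConfig, ParseContext, and the utility class, and the string "default" stands in for whatever TikaConfig.getDefaultConfig() would build.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: context-first lookup with a lazily cached fallback, using stand-in types.
class ConfigLookupSketch {

    // Hypothetical stand-in for a configuration object.
    static final class Config {
        final String source;

        Config(String source) {
            this.source = source;
        }
    }

    // Hypothetical, simplified model of a type-keyed context.
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            return key.cast(entries.get(key)); // cast(null) yields null
        }
    }

    private final Context context;
    private Config cachedDefault; // created at most once per instance; not synchronized

    ConfigLookupSketch(Context context) {
        this.context = context;
    }

    Config getConfig() {
        // 1. Prefer a config that the caller placed in the context.
        Config fromContext = context.get(Config.class);
        if (fromContext != null) {
            return fromContext;
        }
        // 2. Otherwise build the default once and reuse it on later calls.
        if (cachedDefault == null) {
            cachedDefault = new Config("default");
        }
        return cachedDefault;
    }

    public static void main(String[] args) {
        ConfigLookupSketch lookup = new ConfigLookupSketch(new Context());
        System.out.println("fallback is cached: " + (lookup.getConfig() == lookup.getConfig()));
    }
}
```

The unsynchronized cache field is one reason a helper like this would need to be confined to a single thread.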
-
getExtension
public java.lang.String getExtension(TikaInputStream is, Metadata metadata)
-
getConfig
@Deprecated public TikaConfig getConfig()
Deprecated. As of 1.17, use getTikaConfig() instead.
Returns:
A TikaConfig. It is looked up first in the ParseContext that was passed in during initialization; if none is found there, a new one is created via TikaConfig.getDefaultConfig().
-
recordException
public static void recordException(java.lang.Throwable t, Metadata m)
-
recordEmbeddedStreamException
public static void recordEmbeddedStreamException(java.lang.Throwable t, Metadata m)
-
shouldParseEmbedded
public boolean shouldParseEmbedded(Metadata m)
-
parseEmbedded
public void parseEmbedded(java.io.InputStream inputStream, org.xml.sax.ContentHandler handler, Metadata metadata, boolean outputHtml) throws java.io.IOException, org.xml.sax.SAXException
Throws:
java.io.IOException
org.xml.sax.SAXException
-
tryToFindExistingLeafParser
public static Parser tryToFindExistingLeafParser(java.lang.Class clazz, ParseContext context)
Tries to find an existing parser within the ParseContext. It looks inside CompositeParsers and ParserDecorators. The use case is when a parser needs to parse an internal stream that is _part_ of the document, e.g. an RTF body inside an MSG file. Can return null if the context contains no parser or the correct parser can't be found.
Parameters:
clazz - parser class to search for
context -
Returns:
-
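The search through composites and decorators described above can be pictured as a depth-first walk. The sketch below is a model under stated assumptions, not Tika's code: Node, Composite, Decorator, and the leaf classes are hypothetical stand-ins for Parser, CompositeParser, ParserDecorator, and concrete parsers, and matching with isInstance is an assumption about the matching rule.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: depth-first search for a leaf of a given class, using stand-in types.
class LeafFinderSketch {

    interface Node {}

    static final class RtfLeaf implements Node {}

    static final class PdfLeaf implements Node {}

    static final class TxtLeaf implements Node {}

    // Holds several child nodes, like a composite of parsers.
    static final class Composite implements Node {
        final List<Node> children;

        Composite(Node... children) {
            this.children = Arrays.asList(children);
        }
    }

    // Wraps exactly one node, like a decorator around a parser.
    static final class Decorator implements Node {
        final Node wrapped;

        Decorator(Node wrapped) {
            this.wrapped = wrapped;
        }
    }

    // Returns the first node that is an instance of clazz, or null if there is none.
    static Node find(Class<? extends Node> clazz, Node root) {
        if (root == null) {
            return null; // nothing registered at all
        }
        if (clazz.isInstance(root)) {
            return root;
        }
        if (root instanceof Decorator) {
            return find(clazz, ((Decorator) root).wrapped);
        }
        if (root instanceof Composite) {
            for (Node child : ((Composite) root).children) {
                Node hit = find(clazz, child);
                if (hit != null) {
                    return hit;
                }
            }
        }
        return null; // a leaf of a different class
    }

    public static void main(String[] args) {
        Node root = new Composite(new PdfLeaf(), new Decorator(new Composite(new RtfLeaf())));
        System.out.println("found RTF leaf: " + (find(RtfLeaf.class, root) != null));
        System.out.println("found TXT leaf: " + (find(TxtLeaf.class, root) != null));
    }
}
```

As with the documented method, callers of such a search need a null check before using the result.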
-