Package org.apache.tika.sax
Class AbstractRecursiveParserWrapperHandler
java.lang.Object
    org.xml.sax.helpers.DefaultHandler
        org.apache.tika.sax.AbstractRecursiveParserWrapperHandler

All Implemented Interfaces:
java.io.Serializable, org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
Direct Known Subclasses:
RecursiveParserWrapperHandler
public abstract class AbstractRecursiveParserWrapperHandler extends org.xml.sax.helpers.DefaultHandler implements java.io.Serializable
This is a special handler to be used only with the RecursiveParserWrapper. It allows for finer-grained processing of embedded documents than the legacy handlers do. Subclasses can choose how to process individual embedded documents.
See Also:
Serialized Form
-
Field Summary
| Modifier and Type | Field | Description |
| --- | --- | --- |
| static Property | CONTAINER_EXCEPTION | |
| static Property | EMBEDDED_DEPTH | |
| static Property | EMBEDDED_EXCEPTION | |
| static Property | EMBEDDED_RESOURCE_LIMIT_REACHED | |
| static Property | EMBEDDED_RESOURCE_PATH | |
| static Property | PARSE_TIME_MILLIS | |
| static Property | TIKA_CONTENT | |
| static Property | TIKA_CONTENT_HANDLER | Simple class name of the content handler |
| static Property | WRITE_LIMIT_REACHED | |
-
Constructor Summary
| Constructor | Description |
| --- | --- |
| AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory) | |
| AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources) | |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | endDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) | This is called after the full parse has completed. |
| void | endEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) | This is called after parsing each embedded document. |
| ContentHandlerFactory | getContentHandlerFactory() | |
| org.xml.sax.ContentHandler | getNewContentHandler() | |
| org.xml.sax.ContentHandler | getNewContentHandler(java.io.OutputStream os, java.nio.charset.Charset charset) | |
| boolean | hasHitMaximumEmbeddedResources() | |
| void | startEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) | This is called before parsing each embedded document. |
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
-
Field Detail
-
TIKA_CONTENT
public static final Property TIKA_CONTENT
-
TIKA_CONTENT_HANDLER
public static final Property TIKA_CONTENT_HANDLER
Simple class name of the content handler
-
PARSE_TIME_MILLIS
public static final Property PARSE_TIME_MILLIS
-
WRITE_LIMIT_REACHED
public static final Property WRITE_LIMIT_REACHED
-
EMBEDDED_RESOURCE_LIMIT_REACHED
public static final Property EMBEDDED_RESOURCE_LIMIT_REACHED
-
EMBEDDED_EXCEPTION
public static final Property EMBEDDED_EXCEPTION
-
CONTAINER_EXCEPTION
public static final Property CONTAINER_EXCEPTION
-
EMBEDDED_RESOURCE_PATH
public static final Property EMBEDDED_RESOURCE_PATH
-
EMBEDDED_DEPTH
public static final Property EMBEDDED_DEPTH
-
Constructor Detail
-
AbstractRecursiveParserWrapperHandler
public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)
-
AbstractRecursiveParserWrapperHandler
public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources)
-
Method Detail
-
getNewContentHandler
public org.xml.sax.ContentHandler getNewContentHandler()
-
getNewContentHandler
public org.xml.sax.ContentHandler getNewContentHandler(java.io.OutputStream os, java.nio.charset.Charset charset)
-
startEmbeddedDocument
public void startEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) throws org.xml.sax.SAXException
This is called before parsing each embedded document. Override this for custom behavior. Make sure to call this as super.startEmbeddedDocument(...) in your custom classes, because this method tracks the number of embedded documents.
Parameters:
contentHandler - local handler to be used on this embedded document
metadata - embedded document's metadata
Throws:
org.xml.sax.SAXException
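The reason for calling the superclass method can be illustrated with a small stand-in model. The sketch below does not use Tika: CountingBaseSketch and NameCollectingSketch are hypothetical classes, a plain Map stands in for Metadata, and the rule that a negative maximum means "no limit" is an assumption of this sketch rather than documented behavior. It only shows the general pattern of an override that keeps the base class's bookkeeping intact.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the base handler; this is NOT the Tika class.
class CountingBaseSketch {
    // Assumption of this sketch: a negative value means "no limit".
    private final int maxEmbeddedResources;
    private int embeddedResources = 0;

    CountingBaseSketch(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    // Base bookkeeping: count every embedded document that starts.
    void startEmbeddedDocument(Map<String, String> metadata) {
        embeddedResources++;
    }

    boolean hasHitMaximumEmbeddedResources() {
        return maxEmbeddedResources >= 0 && embeddedResources >= maxEmbeddedResources;
    }
}

// Hypothetical subclass that adds custom behavior but preserves the count.
class NameCollectingSketch extends CountingBaseSketch {
    final List<String> seenNames = new ArrayList<>();

    NameCollectingSketch(int maxEmbeddedResources) {
        super(maxEmbeddedResources);
    }

    @Override
    void startEmbeddedDocument(Map<String, String> metadata) {
        // Without this call the base counter would never advance.
        super.startEmbeddedDocument(metadata);
        seenNames.add(metadata.getOrDefault("name", "(unnamed)"));
    }

    // Demo helper: starts one embedded document per name and returns the handler.
    static NameCollectingSketch feed(int max, String... names) {
        NameCollectingSketch handler = new NameCollectingSketch(max);
        for (String name : names) {
            Map<String, String> metadata = new HashMap<>();
            metadata.put("name", name);
            handler.startEmbeddedDocument(metadata);
        }
        return handler;
    }
}
```

If the override omitted the super call in this model, hasHitMaximumEmbeddedResources() would never report that a non-zero limit had been reached, however many embedded documents were started.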
-
endEmbeddedDocument
public void endEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) throws org.xml.sax.SAXException
This is called after parsing each embedded document. Override this for custom behavior. This is currently a no-op.
Parameters:
contentHandler - content handler that was used on this embedded document
metadata - metadata for this embedded document
Throws:
org.xml.sax.SAXException
-
endDocument
public void endDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) throws org.xml.sax.SAXException
This is called after the full parse has completed. Override this for custom behavior. Make sure to call this as super.endDocument(...) in subclasses, because it records in the metadata whether the embedded resource maximum has been hit.
Parameters:
contentHandler - content handler that was used on the main document
metadata - metadata that was gathered for the main document
Throws:
org.xml.sax.SAXException
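The same override pattern applies at the end of the parse. The following is a minimal sketch under stated assumptions: EndDocBaseSketch and SummarySketch are hypothetical stand-ins, a Map replaces Metadata, and both metadata keys are invented for illustration. It shows a subclass adding its own entry while still letting the base method record the limit flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the base handler; this is NOT the Tika class.
class EndDocBaseSketch {
    // Invented key for illustration; not a Tika property name.
    static final String LIMIT_KEY = "sketch:embedded_resource_limit_reached";
    protected boolean limitHit;

    // Base behavior modeled here: record the limit flag when it was hit.
    void endDocument(Map<String, String> metadata) {
        if (limitHit) {
            metadata.put(LIMIT_KEY, "true");
        }
    }
}

// Hypothetical subclass that adds its own entry after the base bookkeeping.
class SummarySketch extends EndDocBaseSketch {
    SummarySketch(boolean limitHit) {
        this.limitHit = limitHit;
    }

    @Override
    void endDocument(Map<String, String> metadata) {
        super.endDocument(metadata); // keep the base class's flag in the metadata
        metadata.put("sketch:finished", "true"); // invented key for illustration
    }

    // Demo helper: runs endDocument on fresh metadata and returns it.
    static Map<String, String> run(boolean limitHit) {
        Map<String, String> metadata = new HashMap<>();
        new SummarySketch(limitHit).endDocument(metadata);
        return metadata;
    }
}
```

Calling the superclass method first means the subclass sees the metadata in the same state a caller of the base class would.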
-
hasHitMaximumEmbeddedResources
public boolean hasHitMaximumEmbeddedResources()
Returns:
whether this handler has hit the maximum embedded resources during the parse
-
getContentHandlerFactory
public ContentHandlerFactory getContentHandlerFactory()