Package org.apache.tika.sax
Class AbstractRecursiveParserWrapperHandler
- java.lang.Object
-
- org.xml.sax.helpers.DefaultHandler
-
- org.apache.tika.sax.AbstractRecursiveParserWrapperHandler
-
- All Implemented Interfaces:
java.io.Serializable, org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
- Direct Known Subclasses:
RecursiveParserWrapperHandler
public abstract class AbstractRecursiveParserWrapperHandler extends org.xml.sax.helpers.DefaultHandler implements java.io.Serializable
This is a special handler to be used only with the RecursiveParserWrapper. It allows for finer-grained processing of embedded documents than in the legacy handlers. Subclasses can choose how to process individual embedded documents.
- See Also:
- Serialized Form
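As an illustration of what "finer-grained processing" can mean, the following is a minimal, self-contained sketch of a handler that keeps one metadata record per document. It is a model, not Tika code: `TextHandler`, `CollectingHandlerModel`, and the `"content"` key are hypothetical names, and a `Map<String, String>` stands in for Tika's `Metadata` so that the example compiles with the JDK alone. Placing the container document first in the list is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical text-collecting handler; one instance is used per document.
class TextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public String toString() {
        return text.toString();
    }
}

// Hypothetical model of a subclass that keeps one metadata map per document.
class CollectingHandlerModel {
    // Hypothetical key; the real class exposes a TIKA_CONTENT property for this role.
    static final String CONTENT_KEY = "content";

    private final List<Map<String, String>> documents = new ArrayList<>();

    // Called after each embedded document: store its text next to its metadata.
    public void endEmbeddedDocument(ContentHandler contentHandler, Map<String, String> metadata) {
        Map<String, String> copy = new LinkedHashMap<>(metadata);
        copy.put(CONTENT_KEY, contentHandler.toString());
        documents.add(copy);
    }

    // Called once after the full parse: in this sketch the container goes first.
    public void endDocument(ContentHandler contentHandler, Map<String, String> metadata) {
        Map<String, String> copy = new LinkedHashMap<>(metadata);
        copy.put(CONTENT_KEY, contentHandler.toString());
        documents.add(0, copy);
    }

    public List<Map<String, String>> getDocuments() {
        return documents;
    }
}
```

Because every document gets its own content handler and its own metadata map, the text of an attachment never mixes with the text of its container.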
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
static Property CONTAINER_EXCEPTION
static Property EMBEDDED_DEPTH
static Property EMBEDDED_EXCEPTION
static Property EMBEDDED_RESOURCE_LIMIT_REACHED
static Property EMBEDDED_RESOURCE_PATH
static Property PARSE_TIME_MILLIS
static Property TIKA_CONTENT
static Property TIKA_CONTENT_HANDLER: Simple class name of the content handler
static Property WRITE_LIMIT_REACHED
-
Constructor Summary
Constructors (Constructor, Description):
AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)
AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources)
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
void endDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata): This is called after the full parse has completed.
void endEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata): This is called after parsing each embedded document.
ContentHandlerFactory getContentHandlerFactory()
org.xml.sax.ContentHandler getNewContentHandler()
org.xml.sax.ContentHandler getNewContentHandler(java.io.OutputStream os, java.nio.charset.Charset charset)
boolean hasHitMaximumEmbeddedResources()
void startEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata): This is called before parsing each embedded document.
-
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
-
-
-
-
Field Detail
-
TIKA_CONTENT
public static final Property TIKA_CONTENT
-
TIKA_CONTENT_HANDLER
public static final Property TIKA_CONTENT_HANDLER
Simple class name of the content handler
-
PARSE_TIME_MILLIS
public static final Property PARSE_TIME_MILLIS
-
WRITE_LIMIT_REACHED
public static final Property WRITE_LIMIT_REACHED
-
EMBEDDED_RESOURCE_LIMIT_REACHED
public static final Property EMBEDDED_RESOURCE_LIMIT_REACHED
-
EMBEDDED_EXCEPTION
public static final Property EMBEDDED_EXCEPTION
-
CONTAINER_EXCEPTION
public static final Property CONTAINER_EXCEPTION
-
EMBEDDED_RESOURCE_PATH
public static final Property EMBEDDED_RESOURCE_PATH
-
EMBEDDED_DEPTH
public static final Property EMBEDDED_DEPTH
-
-
Constructor Detail
-
AbstractRecursiveParserWrapperHandler
public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory)
-
AbstractRecursiveParserWrapperHandler
public AbstractRecursiveParserWrapperHandler(ContentHandlerFactory contentHandlerFactory, int maxEmbeddedResources)
-
-
Method Detail
-
getNewContentHandler
public org.xml.sax.ContentHandler getNewContentHandler()
-
getNewContentHandler
public org.xml.sax.ContentHandler getNewContentHandler(java.io.OutputStream os, java.nio.charset.Charset charset)
-
startEmbeddedDocument
public void startEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) throws org.xml.sax.SAXException
This is called before parsing each embedded document. Override this for custom behavior. Make sure to call this as super.startEmbeddedDocument(...) in subclasses, because this method tracks the number of embedded documents.
- Parameters:
contentHandler - local handler to be used on this embedded document
metadata - embedded document's metadata
- Throws:
org.xml.sax.SAXException
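The effect of skipping the superclass call can be shown with a small self-contained model. `CountingBase`, `LoggingHandler`, and `ForgetfulHandler` are hypothetical stand-ins rather than Tika classes, a `Map<String, String>` stands in for Tika's `Metadata`, and the `"resourceName"` key is only an example. The sketch assumes the base method does nothing more than increment a counter, which is what the description above implies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the abstract base class: it only counts embedded documents.
abstract class CountingBase extends DefaultHandler {
    private int embeddedResources = 0;

    public void startEmbeddedDocument(ContentHandler contentHandler, Map<String, String> metadata)
            throws SAXException {
        embeddedResources++;
    }

    public int getEmbeddedResources() {
        return embeddedResources;
    }
}

// Adds custom behavior and keeps the count intact by delegating to super.
class LoggingHandler extends CountingBase {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startEmbeddedDocument(ContentHandler contentHandler, Map<String, String> metadata)
            throws SAXException {
        // Delegate first so the base class still tracks this document.
        super.startEmbeddedDocument(contentHandler, metadata);
        seen.add(metadata.getOrDefault("resourceName", "unknown"));
    }
}

// Overrides without delegating, so the base counter never moves.
class ForgetfulHandler extends CountingBase {
    @Override
    public void startEmbeddedDocument(ContentHandler contentHandler, Map<String, String> metadata)
            throws SAXException {
        // No super call here: the embedded document goes uncounted.
    }
}
```

In this model, a handler that forgets the `super` call reports zero embedded documents no matter how many it has seen, so any limit based on that count could never trigger.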
-
endEmbeddedDocument
public void endEmbeddedDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) throws org.xml.sax.SAXException
This is called after parsing each embedded document. Override this for custom behavior. This is currently a no-op.
- Parameters:
contentHandler - content handler that was used on this embedded document
metadata - metadata for this embedded document
- Throws:
org.xml.sax.SAXException
-
endDocument
public void endDocument(org.xml.sax.ContentHandler contentHandler, Metadata metadata) throws org.xml.sax.SAXException
This is called after the full parse has completed. Override this for custom behavior. Make sure to call this as super.endDocument(...) in subclasses because this adds whether or not the embedded resource maximum has been hit to the metadata.
- Parameters:
contentHandler - content handler that was used on the main document
metadata - metadata that was gathered for the main document
- Throws:
org.xml.sax.SAXException
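The interplay between the embedded-resource limit and endDocument can be sketched as follows. This is a JDK-only model under stated assumptions, not Tika's implementation: `LimitTrackingBase` and `SummaryHandler` are hypothetical classes, a `Map<String, String>` replaces Tika's `Metadata`, the string key stands in for the EMBEDDED_RESOURCE_LIMIT_REACHED property, and treating -1 as "no limit" is an assumption of the sketch.

```java
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the base class; models only the resource limit.
abstract class LimitTrackingBase extends DefaultHandler {
    // Hypothetical key standing in for EMBEDDED_RESOURCE_LIMIT_REACHED.
    static final String LIMIT_REACHED_KEY = "embedded_resource_limit_reached";

    private final int maxEmbeddedResources; // assumption: -1 means "no limit"
    private int embeddedResources = 0;

    LimitTrackingBase(int maxEmbeddedResources) {
        this.maxEmbeddedResources = maxEmbeddedResources;
    }

    public void startEmbeddedDocument(ContentHandler handler, Map<String, String> metadata)
            throws SAXException {
        embeddedResources++;
    }

    // Records in the main document's metadata whether the limit was reached.
    public void endDocument(ContentHandler handler, Map<String, String> metadata)
            throws SAXException {
        if (hasHitMaximumEmbeddedResources()) {
            metadata.put(LIMIT_REACHED_KEY, "true");
        }
    }

    public boolean hasHitMaximumEmbeddedResources() {
        return maxEmbeddedResources > -1 && embeddedResources >= maxEmbeddedResources;
    }
}

// Hypothetical subclass: custom end-of-parse behavior plus the required super call.
class SummaryHandler extends LimitTrackingBase {
    SummaryHandler(int maxEmbeddedResources) {
        super(maxEmbeddedResources);
    }

    @Override
    public void endDocument(ContentHandler handler, Map<String, String> metadata)
            throws SAXException {
        // Without this call the limit flag would never reach the metadata.
        super.endDocument(handler, metadata);
        metadata.put("summary", "done");
    }
}
```

A caller of such a handler can then check either the metadata flag or `hasHitMaximumEmbeddedResources()` to learn that some embedded documents may have been skipped.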
-
hasHitMaximumEmbeddedResources
public boolean hasHitMaximumEmbeddedResources()
- Returns:
- whether this handler has hit the maximum embedded resources during the parse
-
getContentHandlerFactory
public ContentHandlerFactory getContentHandlerFactory()
-
-