Class SecureContentHandler

  • All Implemented Interfaces:
    org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

    public class SecureContentHandler
    extends ContentHandlerDecorator
    Content handler decorator that attempts to prevent denial of service attacks against Tika parsers.

    Currently this class simply compares the number of output characters to the number of input bytes and keeps track of the XML nesting levels. An exception is thrown if the output seems excessive compared to the input document, as this is a strong indication of a zip bomb.

    Since:
    Apache Tika 0.4
    See Also:
    TIKA-216
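    The output-versus-input check described above reduces to one comparison that only applies once the output threshold has been passed (see setOutputThreshold and setMaximumCompressionRatio below). The following is a minimal JDK-only sketch under that reading, not Tika's implementation. The class ZipBombHeuristic, its method, and the sample numbers are hypothetical. The product of input bytes and ratio is saturated instead of being allowed to overflow.

```java
/** Hypothetical illustration of the output-versus-input check; not part of Tika. */
final class ZipBombHeuristic {

    private ZipBombHeuristic() {
    }

    /**
     * Returns true when outputChars has passed outputThreshold and is also
     * greater than inputBytes * maxRatio.
     */
    static boolean isSuspicious(long inputBytes, long outputChars,
                                long outputThreshold, long maxRatio) {
        if (outputChars <= outputThreshold) {
            return false; // below the threshold nothing is flagged
        }
        long limit;
        try {
            limit = Math.multiplyExact(inputBytes, maxRatio);
        } catch (ArithmeticException overflow) {
            limit = Long.MAX_VALUE; // saturate instead of wrapping around
        }
        return outputChars > limit;
    }

    public static void main(String[] args) {
        long threshold = 1_000_000; // illustrative values only
        long ratio = 100;
        System.out.println(isSuspicious(1_000, 500_000, threshold, ratio));     // false: under threshold
        System.out.println(isSuspicious(10_000, 5_000_000, threshold, ratio));  // true: 500:1 > 100:1
        System.out.println(isSuspicious(100_000, 5_000_000, threshold, ratio)); // false: only 50:1
    }
}
```

    Keeping the threshold and the ratio as two separate conditions means a short document that begins with a highly compressible run is not flagged until it has produced a substantial amount of output.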
    • Constructor Detail

      • SecureContentHandler

        public SecureContentHandler(org.xml.sax.ContentHandler handler,
                                    TikaInputStream stream)
        Decorates the given content handler with zip bomb prevention based on the count of bytes read from the given counting input stream. The resulting decorator can be passed to a Tika parser along with the given counting input stream.
        Parameters:
        handler - the content handler to be decorated
        stream - the input stream to be parsed
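        Because the byte count comes from the stream the parser actually reads, the decorator and the parser must be given the same stream instance. The sketch below reproduces that pairing using only the JDK's SAX classes (Java 11 or later, for String.repeat). RatioGuardDemo and its nested classes CountingStream, SuspectedBombException, and RatioGuardFilter are hypothetical stand-ins for TikaInputStream and SecureContentHandler. Only characters() is guarded here.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class RatioGuardDemo {
    /** Hypothetical stand-in for TikaInputStream: counts bytes the parser reads. */
    static final class CountingStream extends FilterInputStream {
        long bytesRead; // skipped bytes are not counted in this sketch

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) bytesRead++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) bytesRead += n;
            return n;
        }
    }

    /** Lets callers tell the guard's own rejection apart from other SAX errors. */
    static final class SuspectedBombException extends SAXException {
        SuspectedBombException(String message) {
            super(message);
        }
    }

    /** Hypothetical decorator: forwards SAX events, checking output chars vs input bytes. */
    static final class RatioGuardFilter extends XMLFilterImpl {
        private final CountingStream stream;
        private final long threshold;
        private final long maxRatio;
        private long chars;

        RatioGuardFilter(ContentHandler delegate, CountingStream stream,
                         long threshold, long maxRatio) {
            this.stream = stream;
            this.threshold = threshold;
            this.maxRatio = maxRatio;
            setContentHandler(delegate); // XMLFilterImpl forwards events to this handler
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            chars += length;
            long bytes = stream.bytesRead;
            if (chars > threshold && chars > bytes * maxRatio) {
                throw new SuspectedBombException("Suspected zip bomb: " + bytes
                        + " input bytes produced " + chars + " output characters");
            }
            super.characters(ch, start, length); // pass the text on unchanged
        }
    }

    /** Parses xml through the guard and reports whether the guard rejected it. */
    static boolean rejected(String xml, long threshold, long maxRatio) throws Exception {
        // One counting stream is shared: the parser reads it, the guard inspects its count.
        CountingStream in = new CountingStream(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(
                new RatioGuardFilter(new DefaultHandler(), in, threshold, maxRatio));
        try {
            reader.parse(new InputSource(in));
            return false;
        } catch (SuspectedBombException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // 146 bytes of XML whose internal entity expands to 1,000 characters.
        String xml = "<!DOCTYPE r [<!ENTITY x \"" + "x".repeat(50) + "\">]><r>"
                + "&x;".repeat(20) + "</r>";
        System.out.println(rejected(xml, 0, 1));   // true: 1,000 chars > 146 bytes * 1
        System.out.println(rejected(xml, 0, 100)); // false: far below 100 chars per byte
    }
}
```

        The demo uses an internal XML entity as a small-scale stand-in for a compressed archive: 146 bytes of input expand to 1,000 characters of output.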
    • Method Detail

      • getOutputThreshold

        public long getOutputThreshold()
        Returns the configured output threshold.
        Returns:
        output threshold
      • setOutputThreshold

        public void setOutputThreshold(long threshold)
        Sets the threshold for output characters before the zip bomb prevention is activated. This avoids false positives in cases where an otherwise normal document for some reason starts with a highly compressible sequence of bytes.
        Parameters:
        threshold - new output threshold
      • getMaximumCompressionRatio

        public long getMaximumCompressionRatio()
        Returns the maximum compression ratio.
        Returns:
        maximum compression ratio
      • setMaximumCompressionRatio

        public void setMaximumCompressionRatio(long ratio)
        Sets the maximum ratio between output characters and input bytes. If this ratio is exceeded (after the output threshold has been reached), an exception is thrown.
        Parameters:
        ratio - new maximum compression ratio
      • getMaximumDepth

        public int getMaximumDepth()
        Returns the maximum XML element nesting level.
        Returns:
        maximum XML element nesting level
      • setMaximumPackageEntryDepth

        public void setMaximumPackageEntryDepth(int depth)
        Sets the maximum package entry nesting level. If this depth level is exceeded, an exception is thrown.
        Parameters:
        depth - maximum package entry nesting level
      • getMaximumPackageEntryDepth

        public int getMaximumPackageEntryDepth()
        Returns the maximum package entry nesting level.
        Returns:
        maximum package entry nesting level
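        Package entry nesting can be tracked separately from ordinary element nesting by remembering the element depth at which each entry opened. This JDK-only sketch assumes that entries are marked as <div class="package-entry"> elements. Treat that marker, the class PackageEntryDepthHandler, and its helpers as assumptions of the example, not as a description of Tika's internals.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler that limits how many package entries may be nested
 * inside one another. Entries are ASSUMED to be marked as
 * <div class="package-entry"> elements.
 */
class PackageEntryDepthHandler extends DefaultHandler {
    private final int maxEntryDepth;
    private final Deque<Integer> openEntries = new ArrayDeque<>(); // element depth of each open entry
    private int elementDepth;

    PackageEntryDepthHandler(int maxEntryDepth) {
        this.maxEntryDepth = maxEntryDepth;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        elementDepth++;
        if ("div".equals(qName) && "package-entry".equals(atts.getValue("class"))) {
            openEntries.push(elementDepth);
            if (openEntries.size() > maxEntryDepth) {
                throw new SAXException("Suspected zip bomb: " + openEntries.size()
                        + " levels of package entry nesting");
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Leaving the element that opened the innermost entry closes that entry.
        if (!openEntries.isEmpty() && openEntries.peek().intValue() == elementDepth) {
            openEntries.pop();
        }
        elementDepth--;
    }

    /** Builds `nesting` package entries, each one inside the previous one. */
    static String nestedEntries(int nesting) {
        String open = "<div class=\"package-entry\"><p>entry</p>";
        return "<body>" + open.repeat(nesting) + "</div>".repeat(nesting) + "</body>";
    }

    /** Parses xml with the handler; any SAX error counts as a rejection here. */
    static boolean accepts(String xml, int maxEntryDepth) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new PackageEntryDepthHandler(maxEntryDepth));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(accepts(nestedEntries(2), 2)); // true
        System.out.println(accepts(nestedEntries(3), 2)); // false
        // Sibling entries do not nest, so five of them stay at entry depth 1.
        String siblings = "<body>" + "<div class=\"package-entry\"></div>".repeat(5) + "</body>";
        System.out.println(accepts(siblings, 1)); // true
    }
}
```

        Storing the opening depth of each entry, rather than a bare counter, keeps ordinary elements inside an entry (such as the <p> above) from being mistaken for the entry's end.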
      • setMaximumDepth

        public void setMaximumDepth(int depth)
        Sets the maximum XML element nesting level. If this depth level is exceeded, an exception is thrown.
        Parameters:
        depth - maximum XML element nesting level
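        Enforcing the element nesting limit needs only a counter that startElement increments and endElement decrements. A minimal sketch using the JDK's SAX parser follows. DepthGuardHandler is hypothetical and extends DefaultHandler instead of decorating another handler. It treats a depth equal to the limit as allowed; whether Tika's own comparison is inclusive is not stated on this page.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that rejects documents nested deeper than maxDepth elements. */
class DepthGuardHandler extends DefaultHandler {
    private final int maxDepth;
    private int depth;

    DepthGuardHandler(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        depth++; // entering one more level of nesting
        if (depth > maxDepth) {
            throw new SAXException("Suspected zip bomb: " + depth
                    + " levels of XML element nesting");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--; // leaving the current element
    }

    /** Parses `nesting` nested <e> elements; any SAX error counts as a rejection here. */
    static boolean accepts(int nesting, int maxDepth) throws Exception {
        String xml = "<e>".repeat(nesting) + "</e>".repeat(nesting);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)), new DepthGuardHandler(maxDepth));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(accepts(3, 3)); // true: depth 3 is allowed
        System.out.println(accepts(4, 3)); // false: the fourth <e> exceeds the limit
    }
}
```

        Because endElement decrements the counter, sibling elements never accumulate depth; only genuine nesting counts toward the limit.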
      • throwIfCauseOf

        public void throwIfCauseOf(org.xml.sax.SAXException e)
                            throws TikaException
        Converts the given SAXException to a corresponding TikaException if it's caused by this instance detecting a zip bomb.
        Parameters:
        e - SAX exception
        Throws:
        TikaException - zip bomb exception
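        The value of throwIfCauseOf is that only exceptions raised by this handler instance are converted; other SAX errors pass through unchanged. The hypothetical sketch below shows a catch, convert, and rethrow calling pattern with JDK types only. BombDetectedException stands in for TikaException. GuardedHandler, GuardSAXException, and ThrowIfCauseOfDemo are illustrative names, not Tika classes.

```java
import org.xml.sax.SAXException;

final class ThrowIfCauseOfDemo {

    /** Hypothetical checked exception standing in for TikaException. */
    static final class BombDetectedException extends Exception {
        BombDetectedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical handler whose own SAX exceptions record which instance raised them. */
    static final class GuardedHandler {
        private static final class GuardSAXException extends SAXException {
            private final GuardedHandler owner;

            GuardSAXException(String message, GuardedHandler owner) {
                super(message);
                this.owner = owner;
            }
        }

        /** Creates the exception this handler would throw on detecting a bomb. */
        SAXException reject(String reason) {
            return new GuardSAXException("Suspected zip bomb: " + reason, this);
        }

        /** Converts e only if this instance raised it; otherwise returns normally. */
        void throwIfCauseOf(SAXException e) throws BombDetectedException {
            if (e instanceof GuardSAXException && ((GuardSAXException) e).owner == this) {
                throw new BombDetectedException(e.getMessage(), e);
            }
        }
    }

    /** Runs the catch/convert/rethrow pattern and names the exception the caller sees. */
    static String outcome(GuardedHandler handler, SAXException raised) {
        try {
            try {
                throw raised; // stands in for a parse call that failed
            } catch (SAXException e) {
                handler.throwIfCauseOf(e); // converts only this handler's own exception
                throw e;                   // everything else propagates unchanged
            }
        } catch (BombDetectedException e) {
            return "BombDetectedException";
        } catch (SAXException e) {
            return "SAXException";
        }
    }

    public static void main(String[] args) {
        GuardedHandler mine = new GuardedHandler();
        GuardedHandler other = new GuardedHandler();
        System.out.println(outcome(mine, mine.reject("too deep")));       // BombDetectedException
        System.out.println(outcome(mine, other.reject("too deep")));      // SAXException
        System.out.println(outcome(mine, new SAXException("malformed"))); // SAXException
    }
}
```

        In this sketch the owning instance is checked, not just the exception type, so a handler never claims a rejection that a different handler raised.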
      • startElement

        public void startElement(java.lang.String uri,
                                 java.lang.String localName,
                                 java.lang.String name,
                                 org.xml.sax.Attributes atts)
                          throws org.xml.sax.SAXException
        Specified by:
        startElement in interface org.xml.sax.ContentHandler
        Overrides:
        startElement in class ContentHandlerDecorator
        Throws:
        org.xml.sax.SAXException
      • endElement

        public void endElement(java.lang.String uri,
                               java.lang.String localName,
                               java.lang.String name)
                        throws org.xml.sax.SAXException
        Specified by:
        endElement in interface org.xml.sax.ContentHandler
        Overrides:
        endElement in class ContentHandlerDecorator
        Throws:
        org.xml.sax.SAXException
      • characters

        public void characters(char[] ch,
                               int start,
                               int length)
                        throws org.xml.sax.SAXException
        Specified by:
        characters in interface org.xml.sax.ContentHandler
        Overrides:
        characters in class ContentHandlerDecorator
        Throws:
        org.xml.sax.SAXException
      • ignorableWhitespace

        public void ignorableWhitespace(char[] ch,
                                        int start,
                                        int length)
                                 throws org.xml.sax.SAXException
        Specified by:
        ignorableWhitespace in interface org.xml.sax.ContentHandler
        Overrides:
        ignorableWhitespace in class ContentHandlerDecorator
        Throws:
        org.xml.sax.SAXException