Package org.apache.tika.sax
Class SafeContentHandler
- java.lang.Object
  - org.xml.sax.helpers.DefaultHandler
    - org.apache.tika.sax.ContentHandlerDecorator
      - org.apache.tika.sax.SafeContentHandler
- All Implemented Interfaces: org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
- Direct Known Subclasses: XHTMLContentHandler, XMPContentHandler
public class SafeContentHandler extends ContentHandlerDecorator
Content handler decorator that makes sure that the character events (characters(char[], int, int) or ignorableWhitespace(char[], int, int)) passed to the decorated content handler contain only valid XML characters. All invalid characters are replaced with the Unicode replacement character U+FFFD (though a subclass may change this by overriding the writeReplacement(Output) method).
The XML standard defines the following Unicode character ranges as valid XML characters:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Note that currently this class only detects those invalid characters whose UTF-16 representation fits a single char. Also, this class does not ensure that the UTF-16 encoding of incoming characters is correct.
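The single-char check described above can be illustrated with a minimal sketch. This is not Tika's implementation: the class XmlCharCheckSketch and its methods isInvalid and sanitize are hypothetical names used only for illustration. It assumes that surrogate code units are passed through unchanged, which matches the note that UTF-16 correctness is not verified.

```java
// Hypothetical helper, not part of Tika: checks single UTF-16 code units
// against the XML character ranges listed in the class description.
final class XmlCharCheckSketch {

    // Unicode replacement character used for invalid input.
    static final char REPLACEMENT = '\uFFFD';

    // Returns true if the code unit lies outside the ranges that fit a single char.
    static boolean isInvalid(char c) {
        if (c < 0x20) {
            // Only tab, line feed, and carriage return are valid control characters.
            return c != 0x9 && c != 0xA && c != 0xD;
        }
        if (Character.isSurrogate(c)) {
            // Assumption: surrogates are left alone, since they may encode
            // valid characters in the supplementary range.
            return false;
        }
        // Everything up to U+FFFD is valid; U+FFFE and U+FFFF are not.
        return c > 0xFFFD;
    }

    // Replaces every invalid code unit with the replacement character.
    static String sanitize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(isInvalid(c) ? REPLACEMENT : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A NUL character is replaced; the tab is kept.
        System.out.println(sanitize("a\u0000b\tc").equals("a\uFFFDb\tc"));
    }
}
```

Because the sketch works on code units rather than code points, an unpaired surrogate would pass through unchanged, which mirrors the limitation stated in the note.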
Constructor Summary
Constructor
SafeContentHandler(org.xml.sax.ContentHandler handler)
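The constructor takes the content handler to decorate. The decorator idea can be sketched with JDK classes only. The class FilteringHandlerSketch below is a hypothetical stand-in, not the Tika class: it forwards only the two character events and assumes the same single-char validity check as the class description. The real class also forwards the remaining SAX events through ContentHandlerDecorator.

```java
import java.util.Arrays;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of a filtering decorator; not the Tika implementation.
final class FilteringHandlerSketch extends DefaultHandler {

    private final ContentHandler delegate;

    FilteringHandlerSketch(ContentHandler delegate) {
        this.delegate = delegate;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] safe = filtered(ch, start, length);
        delegate.characters(safe, 0, safe.length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        char[] safe = filtered(ch, start, length);
        delegate.ignorableWhitespace(safe, 0, safe.length);
    }

    // Copies the requested range and replaces invalid code units with U+FFFD.
    private static char[] filtered(char[] ch, int start, int length) {
        char[] copy = Arrays.copyOfRange(ch, start, start + length);
        for (int i = 0; i < copy.length; i++) {
            if (isInvalid(copy[i])) {
                copy[i] = '\uFFFD';
            }
        }
        return copy;
    }

    // Assumption: surrogate code units are passed through unchanged.
    private static boolean isInvalid(char c) {
        if (c < 0x20) {
            return c != 0x9 && c != 0xA && c != 0xD;
        }
        return !Character.isSurrogate(c) && c > 0xFFFD;
    }

    // Sends the input through the decorator and returns what the delegate received.
    static String demo(String input) {
        StringBuilder seen = new StringBuilder();
        ContentHandler sink = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        };
        ContentHandler safe = new FilteringHandlerSketch(sink);
        char[] chars = input.toCharArray();
        try {
            safe.characters(chars, 0, chars.length);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo("ok\u0001!").equals("ok\uFFFD!"));
    }
}
```

The sketch copies the input range before filtering so that the caller's array is never modified. Whether the real class does the same is not stated here.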
-
Method Summary
Modifier and Type, Method
void characters(char[] ch, int start, int length)
void endDocument()
void endElement(java.lang.String uri, java.lang.String localName, java.lang.String name)
void ignorableWhitespace(char[] ch, int start, int length)
void startElement(java.lang.String uri, java.lang.String localName, java.lang.String name, org.xml.sax.Attributes atts)
-
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endPrefixMapping, processingInstruction, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, toString
Method Detail
-
startElement
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String name, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException
- Specified by: startElement in interface org.xml.sax.ContentHandler
- Overrides: startElement in class ContentHandlerDecorator
- Throws: org.xml.sax.SAXException
-
endElement
public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String name) throws org.xml.sax.SAXException
- Specified by: endElement in interface org.xml.sax.ContentHandler
- Overrides: endElement in class ContentHandlerDecorator
- Throws: org.xml.sax.SAXException
-
endDocument
public void endDocument() throws org.xml.sax.SAXException
- Specified by: endDocument in interface org.xml.sax.ContentHandler
- Overrides: endDocument in class ContentHandlerDecorator
- Throws: org.xml.sax.SAXException
-
characters
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
- Specified by: characters in interface org.xml.sax.ContentHandler
- Overrides: characters in class ContentHandlerDecorator
- Throws: org.xml.sax.SAXException
-
ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws org.xml.sax.SAXException
- Specified by: ignorableWhitespace in interface org.xml.sax.ContentHandler
- Overrides: ignorableWhitespace in class ContentHandlerDecorator
- Throws: org.xml.sax.SAXException