Package org.apache.tika.sax
Class SafeContentHandler
- java.lang.Object
-
- org.xml.sax.helpers.DefaultHandler
-
- org.apache.tika.sax.ContentHandlerDecorator
-
- org.apache.tika.sax.SafeContentHandler
-
- All Implemented Interfaces:
org.xml.sax.ContentHandler,org.xml.sax.DTDHandler,org.xml.sax.EntityResolver,org.xml.sax.ErrorHandler
- Direct Known Subclasses:
XHTMLContentHandler,XMPContentHandler
public class SafeContentHandler extends ContentHandlerDecorator
Content handler decorator that makes sure that the character events (characters(char[], int, int)orignorableWhitespace(char[], int, int)) passed to the decorated content handler contain only valid XML characters. All invalid characters are replaced with the Unicode replacement character U+FFFD (though a subclass may change this by overriding thewriteReplacement(Output)method).The XML standard defines the following Unicode character ranges as valid XML characters:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Note that currently this class only detects those invalid characters whose UTF-16 representation fits a single char. Also, this class does not ensure that the UTF-16 encoding of incoming characters is correct.
-
-
Constructor Summary
Constructors Constructor Description SafeContentHandler(org.xml.sax.ContentHandler handler)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description voidcharacters(char[] ch, int start, int length)voidendDocument()voidendElement(java.lang.String uri, java.lang.String localName, java.lang.String name)voidignorableWhitespace(char[] ch, int start, int length)voidstartElement(java.lang.String uri, java.lang.String localName, java.lang.String name, org.xml.sax.Attributes atts)-
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endPrefixMapping, processingInstruction, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, toString
-
-
-
-
Method Detail
-
startElement
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String name, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException- Specified by:
startElementin interfaceorg.xml.sax.ContentHandler- Overrides:
startElementin classContentHandlerDecorator- Throws:
org.xml.sax.SAXException
-
endElement
public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String name) throws org.xml.sax.SAXException- Specified by:
endElementin interfaceorg.xml.sax.ContentHandler- Overrides:
endElementin classContentHandlerDecorator- Throws:
org.xml.sax.SAXException
-
endDocument
public void endDocument() throws org.xml.sax.SAXException- Specified by:
endDocumentin interfaceorg.xml.sax.ContentHandler- Overrides:
endDocumentin classContentHandlerDecorator- Throws:
org.xml.sax.SAXException
-
characters
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException- Specified by:
charactersin interfaceorg.xml.sax.ContentHandler- Overrides:
charactersin classContentHandlerDecorator- Throws:
org.xml.sax.SAXException
-
ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws org.xml.sax.SAXException- Specified by:
ignorableWhitespacein interfaceorg.xml.sax.ContentHandler- Overrides:
ignorableWhitespacein classContentHandlerDecorator- Throws:
org.xml.sax.SAXException
-
-