Package org.apache.tika.utils
Class XMLReaderUtils
- java.lang.Object
  - org.apache.tika.utils.XMLReaderUtils
- All Implemented Interfaces:
java.io.Serializable
public class XMLReaderUtils extends java.lang.Object implements java.io.Serializable
Utility functions for reading XML. If you are doing SAX parsing, make sure to use the OfflineContentHandler to guard against XML External Entity (XXE) attacks.
- See Also:
- Serialized Form
-
Field Summary
- static int DEFAULT_MAX_ENTITY_EXPANSIONS
- static int DEFAULT_POOL_SIZE: Default size for the pool of SAX parsers and the pool of DOM builders.
-
Constructor Summary
- XMLReaderUtils()
-
Method Summary
- static org.w3c.dom.Document buildDOM(java.io.InputStream is): Builds a Document with a DocumentBuilder from the pool.
- static org.w3c.dom.Document buildDOM(java.io.InputStream is, ParseContext context): This checks context for a user-specified DocumentBuilder.
- static org.w3c.dom.Document buildDOM(java.lang.String uriString): Builds a Document with a DocumentBuilder from the pool.
- static org.w3c.dom.Document buildDOM(java.nio.file.Path path): Builds a Document with a DocumentBuilder from the pool.
- static java.lang.String getAttrValue(java.lang.String localName, org.xml.sax.Attributes atts)
- static javax.xml.parsers.DocumentBuilder getDocumentBuilder(): Returns the DOM builder specified in this parsing context.
- static javax.xml.parsers.DocumentBuilderFactory getDocumentBuilderFactory(): Returns the DOM builder factory specified in this parsing context.
- static int getMaxEntityExpansions()
- static int getPoolSize()
- static javax.xml.parsers.SAXParser getSAXParser(): Returns the SAX parser specified in this parsing context.
- static javax.xml.parsers.SAXParserFactory getSAXParserFactory(): Returns the SAX parser factory specified in this parsing context.
- static javax.xml.transform.Transformer getTransformer(): Returns a new transformer.
- static javax.xml.stream.XMLInputFactory getXMLInputFactory(): Returns the StAX input factory specified in this parsing context.
- static org.xml.sax.XMLReader getXMLReader(): Returns the XMLReader specified in this parsing context.
- static void parseSAX(java.io.InputStream is, org.xml.sax.helpers.DefaultHandler contentHandler, ParseContext context): This checks context for a user-specified SAXParser.
- static void setMaxEntityExpansions(int maxEntityExpansions): Set the maximum number of entity expansions allowable in SAX/DOM/StAX parsing.
- static void setPoolSize(int poolSize): Set the pool size for cached XML parsers.
-
Field Detail
DEFAULT_POOL_SIZE
public static final int DEFAULT_POOL_SIZE
Default size for the pool of SAX parsers and the pool of DOM builders.
- See Also:
- Constant Field Values
-
DEFAULT_MAX_ENTITY_EXPANSIONS
public static final int DEFAULT_MAX_ENTITY_EXPANSIONS
- See Also:
- Constant Field Values
-
Method Detail
setMaxEntityExpansions
public static void setMaxEntityExpansions(int maxEntityExpansions)
Set the maximum number of entity expansions allowable in SAX/DOM/StAX parsing. NOTE: A value less than or equal to zero indicates no limit. This will override the system property JAXP_ENTITY_EXPANSION_LIMIT_KEY and the DEFAULT_MAX_ENTITY_EXPANSIONS value for allowable entity expansions.
NOTE: To trigger a rebuild of the pool of parsers with this setting, the client must call setPoolSize(int) to rebuild the SAX and DOM parsers with this setting.
- Parameters:
maxEntityExpansions - maximum number of allowable entity expansions
- Since:
- Apache Tika 1.19
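A minimal sketch of the NOTE above, assuming Tika 1.19 or later is on the classpath. The class and method names (EntityLimitConfig, applyEntityLimit) are hypothetical. Passing the current getPoolSize() back to setPoolSize(int) relies on setPoolSize rebuilding the pools every time it is called, as its own description states; it applies the new limit without resizing the pools.

```java
import org.apache.tika.exception.TikaException;
import org.apache.tika.utils.XMLReaderUtils;

final class EntityLimitConfig {
    private EntityLimitConfig() {}

    /**
     * Hypothetical helper: sets a new entity-expansion limit, then rebuilds the
     * pooled SAX and DOM parsers so that pooled parsers pick up the new limit.
     */
    static void applyEntityLimit(int maxEntityExpansions) throws TikaException {
        XMLReaderUtils.setMaxEntityExpansions(maxEntityExpansions);
        // Re-submitting the current size only triggers the rebuild; the size is unchanged.
        XMLReaderUtils.setPoolSize(XMLReaderUtils.getPoolSize());
    }
}
```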
-
getXMLReader
public static org.xml.sax.XMLReader getXMLReader() throws TikaException
Returns the XMLReader specified in this parsing context. If a reader is not explicitly specified, then one is created using the specified or the default SAX parser.
- Returns:
- XMLReader
- Throws:
TikaException
- Since:
- Apache Tika 1.13
- See Also:
getSAXParser()
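A minimal sketch of counting elements with the reader, assuming Tika 1.13 or later; XmlReaderExample and countElements are hypothetical names. Because setContentHandler does not also register an EntityResolver on an XMLReader, the sketch installs the same OfflineContentHandler in both roles.

```java
import java.io.IOException;
import java.io.StringReader;
import org.apache.tika.exception.TikaException;
import org.apache.tika.sax.OfflineContentHandler;
import org.apache.tika.utils.XMLReaderUtils;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class XmlReaderExample {
    private XmlReaderExample() {}

    /** Counts the elements in an XML string using the XMLReader from XMLReaderUtils. */
    static int countElements(String xml) throws TikaException, IOException, SAXException {
        final int[] count = {0};
        DefaultHandler counter = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        // OfflineContentHandler forwards SAX events to the wrapped handler and
        // resolves external entities to empty input instead of fetching them.
        OfflineContentHandler offline = new OfflineContentHandler(counter);
        XMLReader reader = XMLReaderUtils.getXMLReader();
        reader.setContentHandler(offline);
        reader.setEntityResolver(offline);
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }
}
```

For example, countElements("<a><b/><c/></a>") counts a, b, and c.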
-
getSAXParser
public static javax.xml.parsers.SAXParser getSAXParser() throws TikaException
Returns the SAX parser specified in this parsing context. If a parser is not explicitly specified, then one is created using the specified or the default SAX parser factory.
Make sure to wrap your handler in the OfflineContentHandler to prevent XML External Entity attacks.
If you call reset() on the parser, make sure to replace the SecurityManager, which will be cleared by xerces2 on reset().
- Returns:
- SAX parser
- Throws:
TikaException - if a SAX parser could not be created
- Since:
- Apache Tika 0.8
- See Also:
getSAXParserFactory()
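A minimal sketch of the advice above, assuming Tika is on the classpath; SaxParserExample and extractText are hypothetical names. SAXParser.parse(InputStream, DefaultHandler) registers the handler for content, entity resolution, and errors, so wrapping it in OfflineContentHandler also covers external entities.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import org.apache.tika.exception.TikaException;
import org.apache.tika.sax.OfflineContentHandler;
import org.apache.tika.utils.XMLReaderUtils;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxParserExample {
    private SaxParserExample() {}

    /** Concatenates all character data in the document. */
    static String extractText(String xml) throws TikaException, IOException, SAXException {
        StringBuilder text = new StringBuilder();
        DefaultHandler collector = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParser parser = XMLReaderUtils.getSAXParser();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            // The wrapper keeps the parser from fetching external entities (XXE guard).
            parser.parse(in, new OfflineContentHandler(collector));
        }
        return text.toString();
    }
}
```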
-
getSAXParserFactory
public static javax.xml.parsers.SAXParserFactory getSAXParserFactory()
Returns the SAX parser factory specified in this parsing context. If a factory is not explicitly specified, then a default factory instance is created and returned. The default factory instance is configured to be namespace-aware, not validating, and to use secure XML processing.
Make sure to wrap your handler in the OfflineContentHandler to prevent XML External Entity attacks.
- Returns:
- SAX parser factory
- Since:
- Apache Tika 0.8
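The documented defaults can be checked on the returned factory. A small sketch, assuming the JDK's built-in JAXP implementation; the class and method names are hypothetical, and "secure XML processing" is taken to mean the standard XMLConstants.FEATURE_SECURE_PROCESSING feature.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.apache.tika.utils.XMLReaderUtils;
import org.xml.sax.SAXException;

final class SaxFactoryExample {
    private SaxFactoryExample() {}

    /** Returns true if the default factory matches the configuration described above. */
    static boolean hasDocumentedDefaults() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = XMLReaderUtils.getSAXParserFactory();
        return factory.isNamespaceAware()
                && !factory.isValidating()
                // Assumption: secure processing is exposed through this standard feature.
                && factory.getFeature(XMLConstants.FEATURE_SECURE_PROCESSING);
    }
}
```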
-
getDocumentBuilderFactory
public static javax.xml.parsers.DocumentBuilderFactory getDocumentBuilderFactory()
Returns the DOM builder factory specified in this parsing context. If a factory is not explicitly specified, then a default factory instance is created and returned. The default factory instance is configured to be namespace-aware and to apply reasonable security features.
- Returns:
- DOM parser factory
- Since:
- Apache Tika 1.13
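A minimal sketch of using the factory directly, assuming Tika 1.13 or later; the names are hypothetical. An entity resolver is set on a DocumentBuilder, not on its factory, so a builder created straight from this factory does not get the IGNORING_SAX_ENTITY_RESOLVER that getDocumentBuilder() applies. The sketch therefore installs its own lambda that resolves every external entity to empty input; this resolver is an assumption modeled on, not copied from, Tika's.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.tika.utils.XMLReaderUtils;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomFactoryExample {
    private DomFactoryExample() {}

    /** Parses the XML and returns the namespace-aware local name of the root element. */
    static String rootLocalName(String xml)
            throws ParserConfigurationException, IOException, SAXException {
        DocumentBuilderFactory factory = XMLReaderUtils.getDocumentBuilderFactory();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Hypothetical resolver: replace every external entity with empty input.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getLocalName();
    }
}
```

Because the factory is namespace-aware, a prefixed root such as x:root reports the local name root.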
-
getDocumentBuilder
public static javax.xml.parsers.DocumentBuilder getDocumentBuilder() throws TikaException
Returns the DOM builder specified in this parsing context. If a builder is not explicitly specified, then a builder instance is created and returned. The builder instance is configured to apply an IGNORING_SAX_ENTITY_RESOLVER, and it sets the ErrorHandler to null.
- Returns:
- DOM Builder
- Throws:
TikaException
- Since:
- Apache Tika 1.13
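A minimal sketch, assuming Tika 1.13 or later; DomBuilderExample and rootAttribute are hypothetical names. Note that DOM's Element.getAttribute returns an empty string, not null, when the attribute is absent.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import org.apache.tika.exception.TikaException;
import org.apache.tika.utils.XMLReaderUtils;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomBuilderExample {
    private DomBuilderExample() {}

    /** Returns the named attribute of the root element, or "" if it is absent. */
    static String rootAttribute(String xml, String name)
            throws TikaException, IOException, SAXException {
        DocumentBuilder builder = XMLReaderUtils.getDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute(name);
    }
}
```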
-
getXMLInputFactory
public static javax.xml.stream.XMLInputFactory getXMLInputFactory()
Returns the StAX input factory specified in this parsing context. If a factory is not explicitly specified, then a default factory instance is created and returned. The default factory instance is configured to be namespace-aware and to apply reasonable security using the IGNORING_STAX_ENTITY_RESOLVER.
- Returns:
- StAX input factory
- Since:
- Apache Tika 1.13
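A minimal StAX sketch, assuming Tika 1.13 or later; StaxExample and countElements are hypothetical names. XMLStreamReader has a close() method but does not implement AutoCloseable, so the sketch closes it in a finally block.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.apache.tika.utils.XMLReaderUtils;

final class StaxExample {
    private StaxExample() {}

    /** Counts start tags whose local name equals {@code localName}. */
    static int countElements(String xml, String localName) throws XMLStreamException {
        XMLInputFactory factory = XMLReaderUtils.getXMLInputFactory();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```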
-
getTransformer
public static javax.xml.transform.Transformer getTransformer() throws TikaException
Returns a new transformer. The transformer instance is configured to use secure XML processing.
- Returns:
- Transformer
- Throws:
TikaException - when the transformer cannot be created
- Since:
- Apache Tika 1.17
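A minimal sketch that serializes a DOM tree to a string, assuming Tika 1.17 or later; TransformerExample, greeting, and serialize are hypothetical names. OMIT_XML_DECLARATION is set only to keep the output short.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.tika.exception.TikaException;
import org.apache.tika.utils.XMLReaderUtils;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class TransformerExample {
    private TransformerExample() {}

    /** Builds a one-element document in memory. */
    static Document greeting(String text) throws TikaException {
        Document doc = XMLReaderUtils.getDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.setTextContent(text);
        doc.appendChild(root);
        return doc;
    }

    /** Serializes a DOM tree to a string without the XML declaration. */
    static String serialize(Document doc) throws TikaException, TransformerException {
        Transformer transformer = XMLReaderUtils.getTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```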
-
buildDOM
public static org.w3c.dom.Document buildDOM(java.io.InputStream is, ParseContext context) throws TikaException, java.io.IOException, org.xml.sax.SAXException
This checks context for a user-specified DocumentBuilder. If one is not found, this reuses a DocumentBuilder from the pool.
- Parameters:
is - InputStream to parse
context - context to use
- Returns:
- a document
- Throws:
TikaException
java.io.IOException
org.xml.sax.SAXException
- Since:
- Apache Tika 1.19
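A minimal sketch, assuming Tika 1.19 or later; ContextDomExample and childElementCount are hypothetical names. With an empty ParseContext there is no user-specified DocumentBuilder, so buildDOM reuses a pooled one. Supplying your own builder would presumably mean storing it with context.set(DocumentBuilder.class, builder); that key is an assumption, not something stated above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.tika.exception.TikaException;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.utils.XMLReaderUtils;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class ContextDomExample {
    private ContextDomExample() {}

    /** Parses UTF-8 XML and counts the element children of the root. */
    static int childElementCount(String xml) throws TikaException, IOException, SAXException {
        // Empty context: no user-specified DocumentBuilder, so the pool is used.
        ParseContext context = new ParseContext();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            Document doc = XMLReaderUtils.buildDOM(in, context);
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                // Skip text, comment, and other non-element nodes.
                if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    count++;
                }
            }
            return count;
        }
    }
}
```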
-
buildDOM
public static org.w3c.dom.Document buildDOM(java.nio.file.Path path) throws TikaException, java.io.IOException, org.xml.sax.SAXException
Builds a Document with a DocumentBuilder from the pool.
- Parameters:
path - path to parse
- Returns:
- a document
- Throws:
TikaException
java.io.IOException
org.xml.sax.SAXException
- Since:
- Apache Tika 1.19.1
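A minimal sketch of parsing a file by path, assuming Tika 1.19.1 or later; PathDomExample and rootName are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Path;
import org.apache.tika.exception.TikaException;
import org.apache.tika.utils.XMLReaderUtils;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class PathDomExample {
    private PathDomExample() {}

    /** Returns the tag name of the root element of the XML file at {@code path}. */
    static String rootName(Path path) throws TikaException, IOException, SAXException {
        Document doc = XMLReaderUtils.buildDOM(path);
        return doc.getDocumentElement().getTagName();
    }
}
```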
-
buildDOM
public static org.w3c.dom.Document buildDOM(java.lang.String uriString) throws TikaException, java.io.IOException, org.xml.sax.SAXException
Builds a Document with a DocumentBuilder from the pool.
- Parameters:
uriString - URI string to process
- Returns:
- a document
- Throws:
TikaException
java.io.IOException
org.xml.sax.SAXException
- Since:
- Apache Tika 1.19.1
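A minimal sketch of the URI-string overload, assuming Tika 1.19.1 or later; UriDomExample and countTags are hypothetical names. It passes a local file's absolute file: URI, which Path.toUri() produces.

```java
import java.io.IOException;
import java.nio.file.Path;
import org.apache.tika.exception.TikaException;
import org.apache.tika.utils.XMLReaderUtils;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class UriDomExample {
    private UriDomExample() {}

    /** Parses a local file through its file: URI and counts elements with the given tag name. */
    static int countTags(Path file, String tagName) throws TikaException, IOException, SAXException {
        Document doc = XMLReaderUtils.buildDOM(file.toUri().toString());
        return doc.getElementsByTagName(tagName).getLength();
    }
}
```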
-
buildDOM
public static org.w3c.dom.Document buildDOM(java.io.InputStream is) throws TikaException, java.io.IOException, org.xml.sax.SAXException
Builds a Document with a DocumentBuilder from the pool.
- Parameters:
is - InputStream to parse
- Returns:
- a document
- Throws:
TikaException
java.io.IOException
org.xml.sax.SAXException
- Since:
- Apache Tika 1.19.1
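A minimal sketch of the stream overload, assuming Tika 1.19.1 or later; StreamDomExample and firstText are hypothetical names. The caller owns the stream, so the sketch closes it with try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.tika.exception.TikaException;
import org.apache.tika.utils.XMLReaderUtils;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class StreamDomExample {
    private StreamDomExample() {}

    /** Returns the text of the first element with the given tag name, or null if none. */
    static String firstText(String xml, String tagName) throws TikaException, IOException, SAXException {
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            Document doc = XMLReaderUtils.buildDOM(in);
            NodeList matches = doc.getElementsByTagName(tagName);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        }
    }
}
```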
-
parseSAX
public static void parseSAX(java.io.InputStream is, org.xml.sax.helpers.DefaultHandler contentHandler, ParseContext context) throws TikaException, java.io.IOException, org.xml.sax.SAXException
This checks context for a user-specified SAXParser. If one is not found, this reuses a SAXParser from the pool.
- Parameters:
is - InputStream to parse
contentHandler - handler to use
context - context to use
- Throws:
TikaException
java.io.IOException
org.xml.sax.SAXException
- Since:
- Apache Tika 1.19
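A minimal sketch, assuming Tika 1.19 or later; ParseSaxExample and elementNames are hypothetical names. An empty ParseContext holds no user-specified SAXParser, so a pooled parser is reused, and the handler is wrapped in OfflineContentHandler as the class description advises.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.tika.exception.TikaException;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.OfflineContentHandler;
import org.apache.tika.utils.XMLReaderUtils;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class ParseSaxExample {
    private ParseSaxExample() {}

    /** Collects element names in document order. */
    static List<String> elementNames(String xml) throws TikaException, IOException, SAXException {
        List<String> names = new ArrayList<>();
        DefaultHandler collector = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // localName is empty if the parser is not namespace-aware; fall back to qName.
                names.add(localName.isEmpty() ? qName : localName);
            }
        };
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            XMLReaderUtils.parseSAX(in, new OfflineContentHandler(collector), new ParseContext());
        }
        return names;
    }
}
```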
-
setPoolSize
public static void setPoolSize(int poolSize) throws TikaException
Set the pool size for cached XML parsers. This has a side effect of locking the pool and rebuilding the pool from scratch with the most recent settings, such as MAX_ENTITY_EXPANSIONS.
- Parameters:
poolSize - size of the pool of SAX parsers and of the pool of DOM builders
- Throws:
TikaException
- Since:
- Apache Tika 1.19
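A minimal sketch of resizing the pools, assuming Tika 1.19 or later; PoolSizing and sizePoolToProcessors are hypothetical names. Sizing the pools to one parser per available processor is a heuristic assumed for this sketch, not a Tika recommendation.

```java
import org.apache.tika.exception.TikaException;
import org.apache.tika.utils.XMLReaderUtils;

final class PoolSizing {
    private PoolSizing() {}

    /** Sizes the SAX and DOM pools to the available processor count (at least 1). */
    static int sizePoolToProcessors() throws TikaException {
        int size = Math.max(1, Runtime.getRuntime().availableProcessors());
        // Locks and rebuilds the pools with the current settings.
        XMLReaderUtils.setPoolSize(size);
        return size;
    }
}
```

Since the call rebuilds the pools under a lock, it is best made once at startup rather than on a hot path.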
-
getPoolSize
public static int getPoolSize()
-
getMaxEntityExpansions
public static int getMaxEntityExpansions()
-
getAttrValue
public static java.lang.String getAttrValue(java.lang.String localName, org.xml.sax.Attributes atts)
- Parameters:
localName - local name of the attribute to look up
atts - attributes to search
- Returns:
- attribute value with that local name or null if not found
-
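A minimal sketch of getAttrValue inside a SAX handler, assuming Tika 1.19 or later; AttrValueExample and hrefs are hypothetical names. A namespace-aware parser reports an unprefixed href attribute with local name href, so the lookup matches it; a null result means the element has no such attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.tika.exception.TikaException;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.OfflineContentHandler;
import org.apache.tika.utils.XMLReaderUtils;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class AttrValueExample {
    private AttrValueExample() {}

    /** Collects href attribute values from every element, in document order. */
    static List<String> hrefs(String xml) throws TikaException, IOException, SAXException {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                String href = XMLReaderUtils.getAttrValue("href", atts);
                if (href != null) { // null: this element has no href attribute
                    found.add(href);
                }
            }
        };
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            XMLReaderUtils.parseSAX(in, new OfflineContentHandler(handler), new ParseContext());
        }
        return found;
    }
}
```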