public class Latin1StringsParser extends AbstractParser

Latin1StringsParser can be used as a fallback parser, for example:

    AutoDetectParser parser = new AutoDetectParser();
    parser.setFallback(new Latin1StringsParser());

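Roughly speaking, AutoDetectParser hands a document to its fallback parser when none of its registered parsers supports the detected media type. The sketch below models only that dispatch idea with a plain map, assuming each "parser" is a function from bytes to text. FallbackDispatchSketch, register, and printableLatin1 are hypothetical names for illustration. This is not Tika's CompositeParser code, which also considers media-type hierarchies.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of fallback dispatch; NOT Tika's CompositeParser.
// Each "parser" is modeled as a function from raw bytes to extracted text.
public class FallbackDispatchSketch {
    private final Map<String, Function<byte[], String>> parsers = new HashMap<>();
    private Function<byte[], String> fallback = bytes -> ""; // sketch default: no text

    public void register(String mediaType, Function<byte[], String> parser) {
        parsers.put(mediaType, parser);
    }

    public void setFallback(Function<byte[], String> fallback) {
        this.fallback = fallback;
    }

    // Use the parser registered for the detected media type, otherwise the fallback.
    public String parse(String detectedType, byte[] data) {
        return parsers.getOrDefault(detectedType, fallback).apply(data);
    }

    // Crude stand-in for a strings extractor: decode as ISO-8859-1 and replace
    // each run of non-printable characters with a single space.
    public static String printableLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1)
                .replaceAll("[^\\x20-\\x7E\\xA0-\\xFF]+", " ")
                .trim();
    }

    public static void main(String[] args) {
        FallbackDispatchSketch dispatcher = new FallbackDispatchSketch();
        dispatcher.register("text/plain", bytes -> new String(bytes, StandardCharsets.UTF_8));
        dispatcher.setFallback(FallbackDispatchSketch::printableLatin1);

        byte[] unknown = {0x00, 'H', 'i', 0x01, 0x02};
        // No parser is registered for this type, so the fallback handles it.
        System.out.println(dispatcher.parse("application/x-unknown", unknown));
    }
}
```

The fallback here is the last resort: specific parsers always win when they claim the media type, which is why a best-effort strings extractor fits that role.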
Currently, the parser makes a best-effort attempt to extract Latin1 strings, as used by Western European languages, encoded with the ISO-8859-1, UTF-8 or UTF-16 charsets, even when these encodings are mixed within the same file. The implementation is optimized for fast parsing and reads the input in a single pass.

Constructor and Description |
---|
Latin1StringsParser() |
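The single-pass, best-effort extraction described above can be illustrated with a simplified sketch. Latin1StringsSketch is a hypothetical class, not Tika's implementation: it only recognizes single-byte ISO-8859-1 text and ignores the UTF-8 and UTF-16 cases, and it assumes bytes 0x20-0x7E and 0xA0-0xFF are the printable ones. Its minSize argument plays the same role as the parser's minimum character-sequence size: shorter runs are discarded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Tika's implementation: single-pass extraction of
// printable ISO-8859-1 runs. UTF-8 and UTF-16 encoded text is not handled here.
public class Latin1StringsSketch {

    // Assumed printable set: ASCII 0x20-0x7E plus Latin-1 Supplement 0xA0-0xFF.
    static boolean isPrintableLatin1(int b) {
        return (b >= 0x20 && b <= 0x7E) || (b >= 0xA0 && b <= 0xFF);
    }

    // Reads the stream once; emits each printable run of at least minSize chars.
    // The stream is read but not closed; closing it is left to the caller.
    public static List<String> extract(InputStream in, int minSize) throws IOException {
        List<String> result = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            for (int i = 0; i < n; i++) {
                int b = buffer[i] & 0xFF; // unsigned byte value
                if (isPrintableLatin1(b)) {
                    run.append((char) b); // an ISO-8859-1 byte equals its Unicode code point
                } else {
                    flush(run, minSize, result); // a non-printable byte ends the run
                }
            }
        }
        flush(run, minSize, result); // emit a run that reaches the end of input
        return result;
    }

    // Convenience overload for in-memory data.
    public static List<String> extract(byte[] data, int minSize) {
        try {
            return extract(new ByteArrayInputStream(data), minSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // ByteArrayInputStream does not actually throw
        }
    }

    private static void flush(StringBuilder run, int minSize, List<String> result) {
        if (run.length() >= minSize) {
            result.add(run.toString());
        }
        run.setLength(0);
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 'a', 'b', 0x01, 'C', 'a', 'f', (byte) 0xE9, 0x00, 0x02};
        // "ab" is dropped because it is shorter than minSize = 4; "Café" is kept.
        List<String> strings = extract(data, 4);
        System.out.println(strings.size() + " string(s) found");
    }
}
```

Every byte is inspected exactly once and only the current run is buffered, which is what keeps this style of extraction fast on large binary files.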

Modifier and Type | Method | Description |
---|---|---|
int | getMinSize() | Returns the minimum size of a character sequence to be extracted. |
java.util.Set<MediaType> | getSupportedTypes(ParseContext arg0) | Returns the set of media types supported by this parser when used with the given parse context. |
void | parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context) | Parses a document stream into a sequence of XHTML SAX events. |
void | setMinSize(int minSize) | Sets the minimum size of a character sequence to be extracted. |

Methods inherited from class AbstractParser: parse
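
public int getMinSize()
Returns the minimum size of a character sequence to be extracted.

public void setMinSize(int minSize)
Sets the minimum size of a character sequence to be extracted.
Parameters:
minSize - the minimum size of a character sequence

public java.util.Set<MediaType> getSupportedTypes(ParseContext arg0)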
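Returns the set of media types supported by this parser when used with the given parse context.
Specified by: getSupportedTypes in interface Parser
Parameters:
arg0 - parse context

public void parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, Metadata metadata, ParseContext context) throws java.io.IOException, org.xml.sax.SAXException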
Description copied from interface: Parser
Parses a document stream into a sequence of XHTML SAX events.
The given document stream is consumed but not closed by this method. The responsibility for closing the stream lies with the caller.
Information about the parsing context can be passed in the context parameter. See the parser implementations for the kinds of context information they expect.
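The stream-ownership rule above can be illustrated without Tika on the classpath. In this sketch, consume is a hypothetical stand-in for parse: it reads the stream to the end and never closes it, so the caller that opened the stream closes it with try-with-resources. TrackingStream is a helper invented for this example to record whether close() was called.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch of the stream-ownership rule: consume() is a hypothetical stand-in for
// Parser.parse that reads the stream to the end but never closes it.
public class StreamOwnershipSketch {

    static long consume(InputStream stream) throws IOException {
        long count = 0;
        byte[] buffer = new byte[4096];
        int n;
        while ((n = stream.read(buffer)) != -1) {
            count += n;
        }
        return count; // note: no stream.close() here
    }

    // Helper: a stream that records whether close() was called.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() {
            closed = true; // ByteArrayInputStream.close() has no effect, so super is skipped
        }
    }

    // Returns {closed right after consume(), closed after the try block}.
    static boolean[] demo() {
        TrackingStream tracked = new TrackingStream(new byte[] {1, 2, 3});
        boolean closedAfterConsume;
        // The caller opened the stream, so the caller closes it via try-with-resources.
        try (TrackingStream in = tracked) {
            consume(in);
            closedAfterConsume = in.closed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new boolean[] {closedAfterConsume, tracked.closed};
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("closed after consume: " + result[0]
                + ", closed after try: " + result[1]);
    }
}
```

The same pattern applies when calling parse on a real file stream: open the stream in a try-with-resources statement and call the parser inside that block.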
Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
Throws:
java.io.IOException - if the document stream could not be read
org.xml.sax.SAXException - if the SAX events could not be processed
See Also:
Parser.parse(java.io.InputStream, org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata, org.apache.tika.parser.ParseContext)

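parse reports its output as XHTML SAX events on the handler argument. To illustrate the receiving side, the hypothetical TextCollectingHandler below extends the JDK's org.xml.sax.helpers.DefaultHandler and collects character data, ending a line whenever a p element closes. That element choice is an assumption for the sketch, not a statement about the exact markup Latin1StringsParser emits; real code would usually pass one of Tika's ready-made handlers, such as BodyContentHandler.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler for illustration: collects character data and ends a
// line when a "p" element closes (assumed markup, not confirmed for this parser).
public class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        text.setLength(0); // start fresh for each document
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Nothing to record when an element opens in this sketch.
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("p".equals(localName)) {
            text.append('\n');
        }
    }

    public String getText() {
        return text.toString();
    }

    public static void main(String[] args) {
        // Feed the handler events by hand, as a parser would during parse().
        String xhtml = "http://www.w3.org/1999/xhtml";
        char[] chars = "Hello world".toCharArray();
        TextCollectingHandler handler = new TextCollectingHandler();
        handler.startDocument();
        handler.startElement(xhtml, "p", "p", new AttributesImpl());
        handler.characters(chars, 0, chars.length);
        handler.endElement(xhtml, "p", "p");
        System.out.print(handler.getText());
    }
}
```

Because the handler only sees events, the same class works whether the events come from a parser or, as here, from hand-written calls in a test.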
"Copyright © 2010 - 2020 Adobe Systems Incorporated. All Rights Reserved"