public class Tika
extends java.lang.Object
Constructor | Description |
---|---|
Tika() | Creates a Tika facade using the default configuration. |
Tika(Detector detector) | Creates a Tika facade using the given detector instance, the default parser configuration, and the default Translator. |
Tika(Detector detector, Parser parser) | Creates a Tika facade using the given detector and parser instances, but the default Translator. |
Tika(Detector detector, Parser parser, Translator translator) | Creates a Tika facade using the given detector, parser, and translator instances. |
Tika(TikaConfig config) | Creates a Tika facade using the given configuration. |

Modifier and Type | Method | Description |
---|---|---|
java.lang.String | detect(byte[] prefix) | Detects the media type of the given document. |
java.lang.String | detect(byte[] prefix, java.lang.String name) | Detects the media type of the given document. |
java.lang.String | detect(java.io.File file) | Detects the media type of the given file. |
java.lang.String | detect(java.io.InputStream stream) | Detects the media type of the given document. |
java.lang.String | detect(java.io.InputStream stream, Metadata metadata) | Detects the media type of the given document. |
java.lang.String | detect(java.io.InputStream stream, java.lang.String name) | Detects the media type of the given document. |
java.lang.String | detect(java.nio.file.Path path) | Detects the media type of the file at the given path. |
java.lang.String | detect(java.lang.String name) | Detects the media type of a document with the given file name. |
java.lang.String | detect(java.net.URL url) | Detects the media type of the resource at the given URL. |
Detector | getDetector() | Returns the detector instance used by this facade. |
int | getMaxStringLength() | Returns the maximum length of strings returned by the parseToString methods. |
Parser | getParser() | Returns the parser instance used by this facade. |
Translator | getTranslator() | Returns the translator instance used by this facade. |
java.io.Reader | parse(java.io.File file) | Parses the given file and returns the extracted text content. |
java.io.Reader | parse(java.io.File file, Metadata metadata) | Parses the given file and returns the extracted text content. |
java.io.Reader | parse(java.io.InputStream stream) | Parses the given document and returns the extracted text content. |
java.io.Reader | parse(java.io.InputStream stream, Metadata metadata) | Parses the given document and returns the extracted text content. |
java.io.Reader | parse(java.nio.file.Path path) | Parses the file at the given path and returns the extracted text content. |
java.io.Reader | parse(java.nio.file.Path path, Metadata metadata) | Parses the file at the given path and returns the extracted text content. |
java.io.Reader | parse(java.net.URL url) | Parses the resource at the given URL and returns the extracted text content. |
java.lang.String | parseToString(java.io.File file) | Parses the given file and returns the extracted text content. |
java.lang.String | parseToString(java.io.InputStream stream) | Parses the given document and returns the extracted text content. |
java.lang.String | parseToString(java.io.InputStream stream, Metadata metadata) | Parses the given document and returns the extracted text content. |
java.lang.String | parseToString(java.io.InputStream stream, Metadata metadata, int maxLength) | Parses the given document and returns the extracted text content. |
java.lang.String | parseToString(java.nio.file.Path path) | Parses the file at the given path and returns the extracted text content. |
java.lang.String | parseToString(java.net.URL url) | Parses the resource at the given URL and returns the extracted text content. |
void | setMaxStringLength(int maxStringLength) | Sets the maximum length of strings returned by the parseToString methods. |
java.lang.String | toString() | |
java.lang.String | translate(java.io.InputStream text, java.lang.String targetLanguage) | Translate the given text InputStream to the given language, attempting to auto-detect the source language. |
java.lang.String | translate(java.io.InputStream text, java.lang.String sourceLanguage, java.lang.String targetLanguage) | Translate the given text InputStream to and from the given languages. |
java.lang.String | translate(java.lang.String text, java.lang.String targetLanguage) | Translate the given text String to the given language, attempting to auto-detect the source language. |
java.lang.String | translate(java.lang.String text, java.lang.String sourceLanguage, java.lang.String targetLanguage) | Translate the given text String to and from the given languages. |

public Tika(Detector detector, Parser parser)
Parameters:
detector - type detector
parser - document parser

public Tika(Detector detector, Parser parser, Translator translator)
Parameters:
detector - type detector
parser - document parser
translator - text translator

public Tika(TikaConfig config)
Parameters:
config - Tika configuration

public Tika()

public Tika(Detector detector)
Parameters:
detector - type detector

public java.lang.String detect(java.io.InputStream stream, Metadata metadata) throws java.io.IOException
The document stream can be null, in which case only the given document metadata is used for type detection.
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
Unlike in the parse(InputStream, Metadata) method, the given document metadata is not modified by this method.
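The mark-and-reset behavior described above follows the standard java.io contract. The sketch below illustrates that contract with plain JDK classes only; it is not Tika's implementation, and the class name `MarkResetSketch` and the helper `peek` are hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class MarkResetSketch {

    /**
     * Reads up to buffer.length bytes for inspection. If the stream supports
     * mark/reset, it is rewound afterwards so the caller sees it unchanged.
     * Returns the number of bytes actually read. The stream is never closed.
     */
    static int peek(InputStream stream, byte[] buffer) throws IOException {
        boolean rewindable = stream.markSupported();
        if (rewindable) {
            stream.mark(buffer.length); // remember the current position
        }
        int total = 0;
        try {
            while (total < buffer.length) {
                int n = stream.read(buffer, total, buffer.length - total);
                if (n == -1) {
                    break; // end of stream reached before the buffer was full
                }
                total += n;
            }
        } finally {
            if (rewindable) {
                stream.reset(); // back to the marked position
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "%PDF-1.7 rest of document".getBytes(StandardCharsets.US_ASCII));
        byte[] prefix = new byte[4];
        int n = peek(in, prefix);
        System.out.println(new String(prefix, 0, n, StandardCharsets.US_ASCII)); // prints "%PDF"
        System.out.println((char) in.read()); // prints "%": the stream is back at its start
    }
}
```

A plain `java.io.FileInputStream` reports `markSupported() == false`; wrapping it in a `java.io.BufferedInputStream` is a common way to get a stream that can be rewound.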
Parameters:
stream - the document stream, or null
metadata - document metadata
Throws:
java.io.IOException - if the stream can not be read

public java.lang.String detect(java.io.InputStream stream, java.lang.String name) throws java.io.IOException
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
Parameters:
stream - the document stream
name - document name
Throws:
java.io.IOException - if the stream can not be read

public java.lang.String detect(java.io.InputStream stream) throws java.io.IOException
If the document stream supports the mark feature, then the stream is marked and reset to the original position before this method returns. Only a limited number of bytes are read from the stream.
The given document stream is not closed by this method.
Parameters:
stream - the document stream
Throws:
java.io.IOException - if the stream can not be read

public java.lang.String detect(byte[] prefix, java.lang.String name)
For best results at least a few kilobytes of the document data are needed. See also the other detect() methods for better alternatives when you have more than just the document prefix available for type detection.
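When only a prefix is going to be passed to detect, the caller has to collect those first few kilobytes itself. The following is one possible sketch using JDK classes only; the helper `readPrefix`, the class name `PrefixSketch`, and the 8 KiB size are assumptions made for illustration, not values taken from Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class PrefixSketch {

    /** Assumed prefix size: "a few kilobytes", here 8 KiB. */
    static final int PREFIX_SIZE = 8 * 1024;

    /**
     * Reads at most maxBytes from the start of the stream and returns them.
     * The result is shorter than maxBytes if the stream ends early.
     */
    static byte[] readPrefix(InputStream in, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes) {
            int n = in.read(buffer, total, maxBytes - total);
            if (n == -1) {
                break; // document is shorter than the requested prefix
            }
            total += n;
        }
        return Arrays.copyOf(buffer, total); // trim unused tail
    }

    public static void main(String[] args) throws IOException {
        byte[] document = new byte[20_000]; // stand-in for real document data
        byte[] prefix = readPrefix(new ByteArrayInputStream(document), PREFIX_SIZE);
        System.out.println(prefix.length); // prints 8192
        // The prefix array is what would be handed to detect(byte[])
        // or detect(byte[], String).
    }
}
```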
Parameters:
prefix - first few bytes of the document
name - document name

public java.lang.String detect(byte[] prefix)
For best results at least a few kilobytes of the document data are needed. See also the other detect() methods for better alternatives when you have more than just the document prefix available for type detection.
Parameters:
prefix - first few bytes of the document

public java.lang.String detect(java.nio.file.Path path) throws java.io.IOException
Use the detect(String) method when you want to detect the type of the document without actually accessing the file.
Parameters:
path - the path of the file
Throws:
java.io.IOException - if the file can not be read

public java.lang.String detect(java.io.File file) throws java.io.IOException
Use the detect(String) method when you want to detect the type of the document without actually accessing the file.
Parameters:
file - the file
Throws:
java.io.IOException - if the file can not be read
See also:
detect(Path)

public java.lang.String detect(java.net.URL url) throws java.io.IOException
Use the detect(String) method when you want to detect the type of the document without actually accessing the URL.
Parameters:
url - the URL of the resource
Throws:
java.io.IOException - if the resource can not be read

public java.lang.String detect(java.lang.String name)
The given name can also be a URL or a full file path. In such cases only the file name part of the string is used for type detection.
Parameters:
name - the file name of the document

public java.lang.String translate(java.lang.String text, java.lang.String sourceLanguage, java.lang.String targetLanguage)
Parameters:
text - The text to translate.
sourceLanguage - The input text language (for example, "hi").
targetLanguage - The desired output language (for example, "fr").
See also:
Translator

public java.lang.String translate(java.lang.String text, java.lang.String targetLanguage)
Parameters:
text - The text to translate.
targetLanguage - The desired output language (for example, "en").
See also:
Translator

public java.lang.String translate(java.io.InputStream text, java.lang.String sourceLanguage, java.lang.String targetLanguage)
Parameters:
text - The text to translate.
sourceLanguage - The input text language (for example, "hi").
targetLanguage - The desired output language (for example, "fr").
See also:
Translator

public java.lang.String translate(java.io.InputStream text, java.lang.String targetLanguage)
Parameters:
text - The text to translate.
targetLanguage - The desired output language (for example, "en").
See also:
Translator

public java.io.Reader parse(java.io.InputStream stream, Metadata metadata) throws java.io.IOException
The returned reader will be responsible for closing the given stream. The stream and any associated resources will be closed at or before the time when the Reader.close() method is called.
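In practice this ownership rule means the caller closes the Reader and does not close the stream separately. The sketch below shows the pattern with JDK classes only: an `InputStreamReader` stands in for the reader that parse would return (an assumption for illustration, since it follows the same close-through rule), and `TrackingStream` and `readAll` are hypothetical helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ReaderOwnershipSketch {

    /** In-memory stream that records whether close() has been called. */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Drains the reader into a String. Does not close the reader. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] chunk = new char[1024];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            text.append(chunk, 0, n);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        TrackingStream stream = new TrackingStream("hello".getBytes(StandardCharsets.UTF_8));
        // Stand-in for the Reader returned by parse(stream, metadata).
        try (Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8)) {
            System.out.println(readAll(reader)); // prints "hello"
        }
        // Closing the reader also closed the underlying stream.
        System.out.println(stream.closed); // prints "true"
    }
}
```

Using try-with-resources on the Reader alone keeps the cleanup in one place and avoids closing the stream twice.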
Parameters:
stream - the document to be parsed
metadata - where document's metadata will be populated
Throws:
java.io.IOException - if the document can not be read or parsed

public java.io.Reader parse(java.io.InputStream stream) throws java.io.IOException
The returned reader will be responsible for closing the given stream. The stream and any associated resources will be closed at or before the time when the Reader.close() method is called.
Parameters:
stream - the document to be parsed
Throws:
java.io.IOException - if the document can not be read or parsed

public java.io.Reader parse(java.nio.file.Path path, Metadata metadata) throws java.io.IOException
Metadata information extracted from the document is returned in the supplied metadata instance.
Parameters:
path - the path of the file to be parsed
metadata - where document's metadata will be populated
Throws:
java.io.IOException - if the file can not be read or parsed

public java.io.Reader parse(java.nio.file.Path path) throws java.io.IOException
Parameters:
path - the path of the file to be parsed
Throws:
java.io.IOException - if the file can not be read or parsed

public java.io.Reader parse(java.io.File file, Metadata metadata) throws java.io.IOException
Metadata information extracted from the document is returned in the supplied metadata instance.
Parameters:
file - the file to be parsed
metadata - where document's metadata will be populated
Throws:
java.io.IOException - if the file can not be read or parsed
See also:
parse(Path)

public java.io.Reader parse(java.io.File file) throws java.io.IOException
Parameters:
file - the file to be parsed
Throws:
java.io.IOException - if the file can not be read or parsed
See also:
parse(Path)

public java.io.Reader parse(java.net.URL url) throws java.io.IOException
Parameters:
url - the URL of the resource to be parsed
Throws:
java.io.IOException - if the resource can not be read or parsed

public java.lang.String parseToString(java.io.InputStream stream, Metadata metadata) throws java.io.IOException, TikaException
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
NOTE: Unlike most other Tika methods that take an InputStream, this method will close the given stream for you as a convenience. With other methods you are still responsible for closing the stream or a wrapper instance returned by Tika.
Parameters:
stream - the document to be parsed
metadata - document metadata
Throws:
java.io.IOException - if the document can not be read
TikaException - if the document can not be parsed

public java.lang.String parseToString(java.io.InputStream stream, Metadata metadata, int maxLength) throws java.io.IOException, TikaException
To avoid unpredictable excess memory use, the returned string contains at most the first maxLength characters extracted from the input document.
NOTE: Unlike most other Tika methods that take an InputStream, this method will close the given stream for you as a convenience. With other methods you are still responsible for closing the stream or a wrapper instance returned by Tika.
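The effect of a length cap can be sketched independently of Tika. The code below is an illustration only, not the library's implementation: `readUpTo` and `MaxLengthSketch` are hypothetical names, and treating a negative limit as "no limit" mirrors the -1 convention documented for setMaxStringLength(int). Whether the maxLength parameter of this method follows the same convention is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class MaxLengthSketch {

    /**
     * Collects at most maxLength characters from the reader.
     * A negative maxLength means no limit (assumed convention).
     */
    static String readUpTo(Reader reader, int maxLength) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] chunk = new char[4096];
        while (maxLength < 0 || text.length() < maxLength) {
            // Never ask for more characters than the cap still allows.
            int want = maxLength < 0
                    ? chunk.length
                    : Math.min(chunk.length, maxLength - text.length());
            int n = reader.read(chunk, 0, want);
            if (n == -1) {
                break; // input exhausted before reaching the cap
            }
            text.append(chunk, 0, n);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readUpTo(new StringReader("abcdefghij"), 4));  // prints "abcd"
        System.out.println(readUpTo(new StringReader("abcdefghij"), -1)); // prints "abcdefghij"
    }
}
```

Stopping the read as soon as the cap is reached is what keeps memory use bounded for very large documents.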
Parameters:
stream - the document to be parsed
metadata - document metadata
maxLength - maximum length of the returned string
Throws:
java.io.IOException - if the document can not be read
TikaException - if the document can not be parsed

public java.lang.String parseToString(java.io.InputStream stream) throws java.io.IOException, TikaException
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
NOTE: Unlike most other Tika methods that take an InputStream, this method will close the given stream for you as a convenience. With other methods you are still responsible for closing the stream or a wrapper instance returned by Tika.
Parameters:
stream - the document to be parsed
Throws:
java.io.IOException - if the document can not be read
TikaException - if the document can not be parsed

public java.lang.String parseToString(java.nio.file.Path path) throws java.io.IOException, TikaException
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
Parameters:
path - the path of the file to be parsed
Throws:
java.io.IOException - if the file can not be read
TikaException - if the file can not be parsed

public java.lang.String parseToString(java.io.File file) throws java.io.IOException, TikaException
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
Parameters:
file - the file to be parsed
Throws:
java.io.IOException - if the file can not be read
TikaException - if the file can not be parsed
See also:
parseToString(Path)

public java.lang.String parseToString(java.net.URL url) throws java.io.IOException, TikaException
To avoid unpredictable excess memory use, the returned string contains at most the first getMaxStringLength() characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limit.
Parameters:
url - the URL of the resource to be parsed
Throws:
java.io.IOException - if the resource can not be read
TikaException - if the resource can not be parsed

public int getMaxStringLength()

public void setMaxStringLength(int maxStringLength)
Parameters:
maxStringLength - maximum string length, or -1 to disable this limit

public Parser getParser()

public Detector getDetector()

public Translator getTranslator()

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

Copyright © 2010 - 2020 Adobe. All Rights Reserved