Icu4jEncodingDetector (The Adobe AEM Quickstart and Web Application.)

java.lang.Object
- org.apache.tika.parser.txt.Icu4jEncodingDetector

All Implemented Interfaces: java.io.Serializable, EncodingDetector

public class Icu4jEncodingDetector
extends java.lang.Object
implements EncodingDetector

See Also: Serialized Form

Constructor Summary

Constructors
Constructor and Description
`Icu4jEncodingDetector()`

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `java.nio.charset.Charset` | `detect(java.io.InputStream input, Metadata metadata)` Detects the character encoding of the given text document, or returns `null` if the encoding cannot be detected. |
| `int` | `getMarkLimit()` |
| `boolean` | `getStripMarkup()` |
| `void` | `setMarkLimit(int markLimit)` How far into the stream to read for charset detection. |
| `void` | `setStripMarkup(boolean stripMarkup)` Whether to attempt to strip HTML-like markup from the stream before sending it to the underlying detector. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - Icu4jEncodingDetector
```
public Icu4jEncodingDetector()
```
- Method Detail
  - detect
```
public java.nio.charset.Charset detect(java.io.InputStream input,
                                       Metadata metadata)
                                throws java.io.IOException
```
    Description copied from interface: EncodingDetector
    
    Detects the character encoding of the given text document, or returns null if the encoding of the document cannot be detected.
    If the document input stream is not available, the first argument may be null. Otherwise, the detector may read bytes from the start of the stream to help in encoding detection. The given stream is guaranteed to support the mark feature, and the detector is expected to mark the stream before reading any bytes from it and to reset the stream before returning. The detector must not close the stream.
    The given input metadata is only read, not modified, by the detector.
    
    Specified by:
    
    detect in interface EncodingDetector
    
    Parameters:
    
    input - text document input stream, or null
    
    metadata - input metadata for the document
    
    Returns:
    
    detected character encoding, or null
    
    Throws:
    
    java.io.IOException - if the document input stream could not be read
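    
    The mark/reset contract described above can be sketched with JDK classes alone. The class `MarkResetSketch` and its method `detectByBom` are hypothetical names used only for illustration; they are not part of Tika and do not reflect how `Icu4jEncodingDetector` works internally. The sketch only checks for a byte order mark, but it follows the same rules: accept a null stream, mark before reading, reset before returning, and never close the stream.
```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the mark/reset contract; not Tika code.
final class MarkResetSketch {

    // Longest byte order mark checked here (the UTF-8 mark is three bytes).
    private static final int BOM_MAX = 3;

    // Returns the charset implied by a leading byte order mark, or null.
    static Charset detectByBom(InputStream input) throws IOException {
        if (input == null) {
            return null; // the contract allows a missing stream
        }
        input.mark(BOM_MAX); // mark before reading any bytes
        try {
            byte[] head = new byte[BOM_MAX];
            int n = 0;
            while (n < head.length) {
                int r = input.read(head, n, head.length - n);
                if (r < 0) {
                    break; // end of stream
                }
                n += r;
            }
            if (n >= 3 && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
                return StandardCharsets.UTF_8;
            }
            if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
                return StandardCharsets.UTF_16BE;
            }
            if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
                return StandardCharsets.UTF_16LE;
            }
            return null; // nothing recognized
        } finally {
            input.reset(); // rewind for the caller; the stream is never closed
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        InputStream in = new ByteArrayInputStream(doc);
        System.out.println(detectByBom(in)); // UTF-8
        System.out.println(in.read());       // 239: the stream was rewound
    }
}
```
    Placing the reset in a `finally` block means the caller gets the stream back at its original position even when a read fails partway through.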
  - setStripMarkup
```
@Field
public void setStripMarkup(boolean stripMarkup)
```
    Whether to attempt to strip HTML-like markup from the stream before sending it to the underlying detector. If this is set to false, the underlying detector may still apply its own stripping.
    
    Parameters:
    
    stripMarkup - whether or not to attempt to strip markup before sending the stream to the underlying detector
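    
    To give a rough idea of what stripping HTML-like markup before detection can mean, the sketch below drops every byte between `<` and `>` and leaves a single space in place of each tag. `StripMarkupSketch` and `stripTags` are hypothetical names; this is an assumption-laden illustration of the general idea, not the filter that Tika or the underlying detector actually applies.
```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of tag stripping on raw bytes; not Tika code.
final class StripMarkupSketch {

    // Replaces each <...> run with one space and keeps all other bytes.
    // An unclosed '<' causes the rest of the input to be dropped.
    static byte[] stripTags(byte[] in) {
        byte[] out = new byte[in.length]; // output is never longer than input
        int len = 0;
        boolean inTag = false;
        for (byte b : in) {
            if (b == '<') {
                inTag = true;
            } else if (b == '>' && inTag) {
                inTag = false;
                out[len++] = ' ';
            } else if (!inTag) {
                out[len++] = b;
            }
        }
        return Arrays.copyOf(out, len);
    }

    public static void main(String[] args) {
        byte[] html = "<b>hi</b>".getBytes(StandardCharsets.US_ASCII);
        String text = new String(stripTags(html), StandardCharsets.US_ASCII);
        System.out.println("[" + text + "]"); // [ hi ]
    }
}
```
    Working on bytes rather than characters matters here: the encoding is still unknown at this point, so only ASCII-range delimiters such as `<` and `>` can be recognized safely.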
  - getStripMarkup
```
public boolean getStripMarkup()
```
  - setMarkLimit
```
@Field
public void setMarkLimit(int markLimit)
```
    How far into the stream to read for charset detection. Default is 12000.
    
    Parameters:
    
    markLimit - how far into the stream to read for charset detection
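    
    The effect of a mark limit can be sketched as follows: a detector takes at most that much of the stream as its sample and then rewinds. `MarkLimitSketch` and `sample` are hypothetical names, and treating the limit as a byte count is an assumption of this sketch rather than something this page states.
```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical illustration of a bounded read-ahead; not Tika code.
final class MarkLimitSketch {

    // Reads at most markLimit bytes from the start of the stream, then rewinds.
    static byte[] sample(InputStream input, int markLimit) throws IOException {
        input.mark(markLimit); // the mark stays valid for markLimit bytes
        try {
            byte[] buf = new byte[markLimit];
            int n = 0;
            while (n < buf.length) {
                int r = input.read(buf, n, buf.length - n);
                if (r < 0) {
                    break; // stream is shorter than the limit
                }
                n += r;
            }
            return Arrays.copyOf(buf, n);
        } finally {
            input.reset(); // safe because no more than markLimit bytes were read
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[20000]);
        System.out.println(sample(in, 12000).length); // 12000
        System.out.println(in.available());           // 20000
    }
}
```
    A larger limit gives the detector more evidence at the cost of more buffering; a smaller one is cheaper but may miss distinguishing bytes that appear later in the document.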
  - getMarkLimit
```
public int getMarkLimit()
```