Package org.apache.tika.sax
Class StandardsExtractingContentHandler
- java.lang.Object
  - org.xml.sax.helpers.DefaultHandler
    - org.apache.tika.sax.ContentHandlerDecorator
      - org.apache.tika.sax.StandardsExtractingContentHandler
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
public class StandardsExtractingContentHandler extends ContentHandlerDecorator
StandardsExtractingContentHandler is a Content Handler used to extract standard references while parsing.
Field Summary
Fields
Modifier and Type          Field                  Description
static java.lang.String    STANDARD_REFERENCES
Constructor Summary
Constructors
Constructor and Description
StandardsExtractingContentHandler(org.xml.sax.ContentHandler handler, Metadata metadata)
    Creates a decorator for the given SAX event handler and Metadata object.
Method Summary
All Methods, Instance Methods, Concrete Methods
Modifier and Type, Method and Description
void characters(char[] ch, int start, int length)
    The characters method is called whenever a Parser wants to pass raw characters to the ContentHandler.
void endDocument()
    This method is called whenever the Parser is done parsing the file.
double getThreshold()
    Gets the threshold to be used for selecting the standard references found within the text based on their score.
void setThreshold(double score)
    Sets the score to be used as threshold.
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endElement, endPrefixMapping, ignorableWhitespace, processingInstruction, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, toString
Field Detail
STANDARD_REFERENCES
public static final java.lang.String STANDARD_REFERENCES
See Also:
    Constant Field Values
Method Detail
getThreshold
public double getThreshold()
Gets the threshold used to select the standard references found within the text, based on their score.
Returns:
    the score threshold used to select the standard references found within the text
setThreshold
public void setThreshold(double score)
Sets the score to be used as threshold.
Parameters:
    score - the score to be used as threshold
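The role of the threshold can be illustrated with a small sketch that uses only JDK classes. It is not the Tika implementation: the class name ThresholdFilter, the select method, the default value of 0.0, and the inclusive (>=) comparison are all assumptions made for illustration, since this page does not say how scores are computed or compared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika: keeps only scored candidates
// that reach a configurable threshold.
class ThresholdFilter {
    // Assumed default; the real handler's default is not stated on this page.
    private double threshold = 0.0;

    double getThreshold() {
        return threshold;
    }

    void setThreshold(double score) {
        this.threshold = score;
    }

    // Returns the candidates whose score is at least the threshold.
    // The inclusive comparison is an assumption.
    List<String> select(Map<String, Double> scoredCandidates) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Double> entry : scoredCandidates.entrySet()) {
            if (entry.getValue() >= threshold) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

Under these assumptions, raising the threshold trades recall for precision: fewer candidate references are kept, but those that remain carry higher scores.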
characters
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
The characters method is called whenever a Parser wants to pass raw characters to the ContentHandler. Standard references are often split across different calls to characters, depending on the specific Parser used. Therefore, all characters are simply appended to a StringBuilder, which is analyzed once the document is finished.
Specified by:
    characters in interface org.xml.sax.ContentHandler
Overrides:
    characters in class ContentHandlerDecorator
Throws:
    org.xml.sax.SAXException
endDocument
public void endDocument() throws org.xml.sax.SAXException
This method is called when the Parser has finished parsing the file. At that point, the output is checked for any standard references.
Specified by:
    endDocument in interface org.xml.sax.ContentHandler
Overrides:
    endDocument in class ContentHandlerDecorator
Throws:
    org.xml.sax.SAXException
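The buffer-then-analyze approach described for characters and endDocument can be sketched with JDK classes alone. This is a minimal illustration, not the Tika source: the class name BufferingReferenceHandler, the getReferences accessor, and the deliberately naive regular expression are assumptions, and the real handler's detection logic and scoring are not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of Tika: accumulates all character data
// and searches it only once the document has ended.
class BufferingReferenceHandler extends DefaultHandler {
    // Assumed, simplistic pattern: an organization acronym followed by a number.
    private static final Pattern REFERENCE =
            Pattern.compile("\\b(ISO|IEEE|ANSI) \\d+\\b");

    private final StringBuilder buffer = new StringBuilder();
    private final List<String> references = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only accumulate here: a reference may be split across calls,
        // so matching per call could miss it.
        buffer.append(ch, start, length);
    }

    @Override
    public void endDocument() {
        // The whole text is now available, so split references are intact.
        Matcher matcher = REFERENCE.matcher(buffer);
        while (matcher.find()) {
            references.add(matcher.group());
        }
    }

    List<String> getReferences() {
        return references;
    }
}
```

If a parser delivers "Complies with ISO 90" and "01 and IEEE 754." in two separate characters calls, matching each chunk on its own would miss the first reference, whereas the buffered text yields both once endDocument runs.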