Class StandardsExtractingContentHandler

  • All Implemented Interfaces:
    org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

    public class StandardsExtractingContentHandler
    extends ContentHandlerDecorator
    StandardsExtractingContentHandler is a ContentHandler decorator that extracts standard references from the text seen while parsing.
    • Field Detail

      • STANDARD_REFERENCES

        public static final java.lang.String STANDARD_REFERENCES
        See Also:
        Constant Field Values
    • Constructor Detail

      • StandardsExtractingContentHandler

        public StandardsExtractingContentHandler​(org.xml.sax.ContentHandler handler,
                                                 Metadata metadata)
        Creates a decorator for the given SAX event handler and Metadata object.
        Parameters:
        handler - SAX event handler to be decorated.
        metadata - Metadata object.
    • Method Detail

      • getThreshold

        public double getThreshold()
        Gets the threshold used to select the standard references found within the text, based on their score.
        Returns:
        the score threshold used to select standard references.
      • setThreshold

        public void setThreshold​(double score)
        Sets the score to be used as the threshold.
        Parameters:
        score - the score to be used as the threshold.
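
        The role of the threshold can be illustrated with a minimal sketch. Everything in it is hypothetical: ScoredReference and ThresholdFilterSketch are stand-ins invented for this example, not Tika classes, and the sketch assumes that a candidate is kept when its score is greater than or equal to the threshold. The exact comparison and score range used by the handler are not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scored standard reference; not a Tika class.
class ScoredReference {
    final String text;
    final double score;

    ScoredReference(String text, double score) {
        this.text = text;
        this.score = score;
    }
}

class ThresholdFilterSketch {
    // Assumption: a candidate is kept when its score is at least the threshold.
    static List<String> select(List<ScoredReference> candidates, double threshold) {
        List<String> kept = new ArrayList<>();
        for (ScoredReference candidate : candidates) {
            if (candidate.score >= threshold) {
                kept.add(candidate.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredReference> candidates = new ArrayList<>();
        candidates.add(new ScoredReference("ISO 9001", 0.9));
        candidates.add(new ScoredReference("A 4", 0.1));
        // Only candidates scoring at least 0.5 are kept.
        System.out.println(select(candidates, 0.5));
    }
}
```

        Under this assumption, raising the threshold trades recall for precision: fewer candidates survive, but those that do carry higher scores.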
      • characters

        public void characters​(char[] ch,
                               int start,
                               int length)
                        throws org.xml.sax.SAXException
        The characters method is called whenever a Parser passes raw characters to the ContentHandler. Because a standard reference is often split across several calls to characters, depending on the specific Parser used, all characters are appended to a StringBuilder and analyzed once the document is finished.
        Specified by:
        characters in interface org.xml.sax.ContentHandler
        Overrides:
        characters in class ContentHandlerDecorator
      • endDocument

        public void endDocument()
                         throws org.xml.sax.SAXException
        Called when the Parser has finished parsing the document. At that point the accumulated text is checked for standard references.
        Specified by:
        endDocument in interface org.xml.sax.ContentHandler
        Overrides:
        endDocument in class ContentHandlerDecorator
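
        The buffer-then-analyze strategy described for characters and endDocument can be sketched with the JDK's SAX classes alone. This is an illustration under stated assumptions, not the handler's actual implementation: BufferingReferenceHandler, its extract helper, and the toy pattern ("ISO" followed by digits) are all hypothetical, and the real detection logic is not shown on this page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler illustrating the buffering strategy; not Tika's code.
class BufferingReferenceHandler extends DefaultHandler {
    // Assumption: a toy pattern stands in for real standard-reference detection.
    private static final Pattern TOY_PATTERN = Pattern.compile("ISO\\s+\\d+");

    private final StringBuilder buffer = new StringBuilder();
    private final List<String> found = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // A reference may be split across calls, so only accumulate here.
        buffer.append(ch, start, length);
    }

    @Override
    public void endDocument() {
        // The whole text is available now, so scan it once.
        Matcher matcher = TOY_PATTERN.matcher(buffer);
        while (matcher.find()) {
            found.add(matcher.group());
        }
    }

    List<String> getFound() {
        return found;
    }

    // Convenience helper for the example: parse an XML string and return the matches.
    static List<String> extract(String xml) {
        BufferingReferenceHandler handler = new BufferingReferenceHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.getFound();
    }
}
```

        Calling BufferingReferenceHandler.extract on a document whose text puts "ISO " and "9001" in adjacent elements still finds the reference, because matching is deferred until endDocument. Matching inside each characters call would miss it.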