Class HyphenationCompoundWordTokenFilter

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class HyphenationCompoundWordTokenFilter
    extends CompoundWordTokenFilterBase
    A TokenFilter that decomposes compound words found in many Germanic languages.

    For example, "Donaudampfschiff" is decomposed into Donau, dampf, and schiff, so that a search for "schiff" alone still finds "Donaudampfschiff". The filter uses a hyphenation grammar and a word dictionary to achieve this.
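
The decomposition idea can be illustrated with a minimal, dictionary-only sketch. This is not the filter's actual algorithm: it ignores the hyphenation grammar entirely and simply tests every substring against the dictionary. The class name CompoundSplitSketch, the method subwords, and the parameter minSubwordSize are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: a brute-force dictionary lookup, NOT the Lucene implementation.
final class CompoundSplitSketch {

    // Hypothetical helper: returns every dictionary word that occurs as a
    // substring of the token, ordered by start position.
    static List<String> subwords(String token, Set<String> dictionary, int minSubwordSize) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < lower.length(); start++) {
            for (int end = start + minSubwordSize; end <= lower.length(); end++) {
                String candidate = lower.substring(start, end);
                if (dictionary.contains(candidate)) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Assumed lowercase dictionary for the example word.
        Set<String> dictionary = Set.of("donau", "dampf", "schiff");
        // prints [donau, dampf, schiff]
        System.out.println(subwords("Donaudampfschiff", dictionary, 3));
    }
}
```

A hyphenation grammar would narrow the candidate split positions to plausible syllable boundaries instead of trying every substring as this sketch does.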

    You must specify the required Version compatibility when creating CompoundWordTokenFilterBase:

    • As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
    • Method Detail

      • getHyphenationTree

        public static HyphenationTree getHyphenationTree(java.lang.String hyphenationFilename)
                                                  throws java.io.IOException
        Creates a hyphenation tree from the XML grammar file with the given name.
        Parameters:
        hyphenationFilename - the filename of the XML grammar to load
        Returns:
        An object representing the hyphenation patterns
        Throws:
        java.io.IOException - If there is a low-level I/O error.
      • getHyphenationTree

        public static HyphenationTree getHyphenationTree(java.io.File hyphenationFile)
                                                  throws java.io.IOException
        Creates a hyphenation tree from the given XML grammar file.
        Parameters:
        hyphenationFile - the file of the XML grammar to load
        Returns:
        An object representing the hyphenation patterns
        Throws:
        java.io.IOException - If there is a low-level I/O error.
      • getHyphenationTree

        public static HyphenationTree getHyphenationTree(org.xml.sax.InputSource hyphenationSource)
                                                  throws java.io.IOException
        Creates a hyphenation tree from the XML grammar that the given InputSource points to.
        Parameters:
        hyphenationSource - the InputSource pointing to the XML grammar
        Returns:
        An object representing the hyphenation patterns
        Throws:
        java.io.IOException - If there is a low-level I/O error.
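
The InputSource overload accepts any standard org.xml.sax.InputSource, so a grammar can come from a file path or from XML already held in memory. The following is a minimal sketch using only JDK classes; the class name HyphenationSourceSketch, both helper methods, and the file name de_hyph.xml are hypothetical. The call to getHyphenationTree is shown as a comment because it needs Lucene on the classpath.

```java
import java.io.StringReader;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.xml.sax.InputSource;

// Illustrative helpers for building an InputSource; not part of Lucene.
final class HyphenationSourceSketch {

    // Points the InputSource at a file by using its URI as the system identifier.
    static InputSource fromPath(Path grammarFile) {
        return new InputSource(grammarFile.toUri().toASCIIString());
    }

    // Wraps XML text that is already in memory in a character stream.
    static InputSource fromXmlString(String xml) {
        return new InputSource(new StringReader(xml));
    }

    public static void main(String[] args) {
        // "de_hyph.xml" is an assumed file name used only for illustration.
        InputSource source = fromPath(Paths.get("de_hyph.xml"));
        System.out.println(source.getSystemId());

        // With Lucene available, the source would then be passed on:
        // HyphenationTree tree =
        //     HyphenationCompoundWordTokenFilter.getHyphenationTree(source);
    }
}
```

All three overloads declare java.io.IOException, so callers need to handle or propagate it.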