Class SpellChecker

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class SpellChecker
    extends java.lang.Object
    implements java.io.Closeable

    Spell checker class (main class), initially inspired by David Spencer's code.

    Example Usage:

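      SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
      // indexDictionary takes a Dictionary, an IndexWriterConfig and a fullMerge flag.
      // To index a field of a user index:
      spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field), new IndexWriterConfig(analyzer), true);
      // To index a file containing words:
      spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")), new IndexWriterConfig(analyzer), true);
      // Ask for several suggestions; a single one is not guaranteed to be the best match.
      String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);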
     
    • Field Detail

      • F_WORD

        public static final java.lang.String F_WORD
        Field name for each word in the ngram index.
        See Also:
        Constant Field Values
    • Constructor Detail

      • SpellChecker

        public SpellChecker(Directory spellIndex,
                            StringDistance sd)
                     throws java.io.IOException
        Use the given directory as a spell checker index. The directory is created if it doesn't exist yet.
        Parameters:
        spellIndex - the spell index directory
        sd - the StringDistance measurement to use
        Throws:
        java.io.IOException - if the SpellChecker cannot open the directory
      • SpellChecker

        public SpellChecker(Directory spellIndex)
                     throws java.io.IOException
        Use the given directory as a spell checker index with a LevensteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
        Parameters:
        spellIndex - the spell index directory
        Throws:
        java.io.IOException - if the SpellChecker cannot open the directory
      • SpellChecker

        public SpellChecker(Directory spellIndex,
                            StringDistance sd,
                            java.util.Comparator<SuggestWord> comparator)
                     throws java.io.IOException
        Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results.
        Parameters:
        spellIndex - the spell index directory
        sd - the StringDistance measurement to use
        comparator - the Comparator used to sort the results
        Throws:
        java.io.IOException - if there is a problem opening the index
    • Method Detail

      • setSpellIndex

        public void setSpellIndex(Directory spellIndexDir)
                           throws java.io.IOException
        Use a different index as the spell checker index, or re-open the existing index if spellIndexDir is the same directory that was given in the constructor.
        Parameters:
        spellIndexDir - the spell directory to use
        Throws:
        AlreadyClosedException - if the SpellChecker is already closed
        java.io.IOException - if the SpellChecker cannot open the directory
      • setComparator

        public void setComparator(java.util.Comparator<SuggestWord> comparator)
        Sets the Comparator for the SuggestWordQueue.
        Parameters:
        comparator - the comparator
      • setAccuracy

        public void setAccuracy(float acc)
        Sets the accuracy, i.e. the minimum score a suggestion must reach (0 < minScore < 1); the default is DEFAULT_ACCURACY.
        Parameters:
        acc - the new accuracy
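
        The effect of an accuracy threshold can be sketched without Lucene. The example below is only an illustration: the class AccuracyFilterSketch, its methods, and the normalization of the score (1 minus edit distance divided by the longer length) are assumptions made for this sketch, not the SpellChecker implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in for a StringDistance-style score; not Lucene code.
class AccuracyFilterSketch {

    // Classic dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 means identical strings, lower means less similar.
    static float score(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / longest;
    }

    // Keep only the candidates whose score reaches the accuracy threshold.
    static List<String> filter(String word, List<String> candidates, float accuracy) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (score(word, candidate) >= accuracy) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("spelling", "spilling", "shelling", "selling");
        System.out.println(filter("speling", candidates, 0.5f));
        System.out.println(filter("speling", candidates, 0.8f));
    }
}
```

        Raising the threshold trades recall for precision: fewer candidates qualify, but those that do are closer to the input word.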
      • suggestSimilar

        public java.lang.String[] suggestSimilar(java.lang.String word,
                                                 int numSug)
                                          throws java.io.IOException
        Suggest similar words.

        Lucene fetches the most relevant n-grammed terms using its own similarity, which differs from the edit distance strategy used to pick the best-matching spell-checked word from those hits. As a result, one usually has to request several suggestions (a larger numSug) to get the true best match.

        That is, if numSug == 1, don't count on that suggestion being the best one; set this value to at least 5 for a good suggestion.

        Parameters:
        word - the word you want a spell check done on
        numSug - the number of suggested words
        Returns:
        String[] the suggested words
        Throws:
        java.io.IOException - if the underlying index throws an IOException
        AlreadyClosedException - if the SpellChecker is already closed
        See Also:
        suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
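
        The reason a single suggestion can miss the best match can be illustrated with a toy two-stage lookup. This is a sketch under stated assumptions, not Lucene's algorithm: candidates are fetched by counting shared bigrams (a stand-in for the n-gram similarity search), exactly numSug of them are kept, and only those are re-ranked by edit distance. The class TwoStageSuggestSketch and all of its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of "fetch by n-gram overlap, then re-rank by edit distance"; not Lucene code.
class TwoStageSuggestSketch {

    // All distinct two-character substrings of s.
    static Set<String> bigrams(String s) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            grams.add(s.substring(i, i + 2));
        }
        return grams;
    }

    static int sharedBigrams(String a, String b) {
        Set<String> shared = bigrams(a);
        shared.retainAll(bigrams(b));
        return shared.size();
    }

    // Classic dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static List<String> suggest(String word, List<String> dictionary, int numSug) {
        // Stage 1: keep words sharing at least one bigram, highest overlap first.
        List<String> hits = new ArrayList<>();
        for (String entry : dictionary) {
            if (sharedBigrams(word, entry) > 0) {
                hits.add(entry);
            }
        }
        hits.sort(Comparator.comparingInt((String w) -> sharedBigrams(word, w)).reversed());
        List<String> top = new ArrayList<>(hits.subList(0, Math.min(numSug, hits.size())));
        // Stage 2: re-rank only the fetched candidates by edit distance, closest first.
        top.sort(Comparator.comparingInt((String w) -> editDistance(word, w)));
        return top;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("from", "formation", "forum");
        System.out.println(suggest("form", dictionary, 1));
        System.out.println(suggest("form", dictionary, 5));
    }
}
```

        In this toy dictionary, "formation" shares the most bigrams with "form" while "forum" is the closest by edit distance, so asking for a single candidate returns the former and asking for more lets the re-ranking step surface the latter.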
      • suggestSimilar

        public java.lang.String[] suggestSimilar(java.lang.String word,
                                                 int numSug,
                                                 float accuracy)
                                          throws java.io.IOException
        Suggest similar words.

        Lucene fetches the most relevant n-grammed terms using its own similarity, which differs from the edit distance strategy used to pick the best-matching spell-checked word from those hits. As a result, one usually has to request several suggestions (a larger numSug) to get the true best match.

        That is, if numSug == 1, don't count on that suggestion being the best one; set this value to at least 5 for a good suggestion.

        Parameters:
        word - the word you want a spell check done on
        numSug - the number of suggested words
        accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
        Returns:
        String[] the suggested words
        Throws:
        java.io.IOException - if the underlying index throws an IOException
        AlreadyClosedException - if the SpellChecker is already closed
        See Also:
        suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
      • suggestSimilar

        public java.lang.String[] suggestSimilar(java.lang.String word,
                                                 int numSug,
                                                 IndexReader ir,
                                                 java.lang.String field,
                                                 SuggestMode suggestMode,
                                                 float accuracy)
                                          throws java.io.IOException
        Suggest similar words (optionally restricted to a field of an index).

        Lucene fetches the most relevant n-grammed terms using its own similarity, which differs from the edit distance strategy used to pick the best-matching spell-checked word from those hits. As a result, one usually has to request several suggestions (a larger numSug) to get the true best match.

        That is, if numSug == 1, don't count on that suggestion being the best one; set this value to at least 5 for a good suggestion.

        Parameters:
        word - the word you want a spell check done on
        numSug - the number of suggested words
        ir - the IndexReader of the user index (can be null; see the field parameter)
        field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field.
        suggestMode - the suggest mode to apply (NOTE: if ir == null and/or field == null, this is overridden with SuggestMode.SUGGEST_ALWAYS)
        accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
        Returns:
        String[] the list of suggested words, sorted by two criteria: first, the edit distance; second (only in restricted mode), the popularity of the suggested word in the field of the user index
        Throws:
        java.io.IOException - if the underlying index throws an IOException
        AlreadyClosedException - if the SpellChecker is already closed
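
        The two-criteria ordering described under Returns can be sketched with a plain java.util.Comparator. The Suggestion class below is a hypothetical stand-in (it is not Lucene's SuggestWord), and the assumption that a higher score means a smaller edit distance is made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative ordering of suggestions by score, then by frequency; not Lucene code.
class SuggestionOrderSketch {

    // Hypothetical stand-in for a suggested word with its score and field frequency.
    static final class Suggestion {
        final String word;
        final float score; // assumed: higher means closer to the input word
        final int freq;    // assumed: occurrences of the word in the user index field

        Suggestion(String word, float score, int freq) {
            this.word = word;
            this.score = score;
            this.freq = freq;
        }
    }

    // First criterion: higher score first. Second criterion: higher frequency first.
    static final Comparator<Suggestion> BY_SCORE_THEN_FREQ =
        Comparator.comparingDouble((Suggestion s) -> s.score).reversed()
            .thenComparing(Comparator.comparingInt((Suggestion s) -> s.freq).reversed());

    // Returns the words in suggestion order without modifying the input list.
    static List<String> order(List<Suggestion> suggestions) {
        List<Suggestion> sorted = new ArrayList<>(suggestions);
        sorted.sort(BY_SCORE_THEN_FREQ);
        List<String> words = new ArrayList<>();
        for (Suggestion s : sorted) {
            words.add(s.word);
        }
        return words;
    }

    public static void main(String[] args) {
        List<Suggestion> suggestions = Arrays.asList(
            new Suggestion("spilling", 0.75f, 3),
            new Suggestion("shelling", 0.75f, 40),
            new Suggestion("spelling", 0.875f, 10));
        System.out.println(order(suggestions));
    }
}
```

        Frequency only matters when two suggestions tie on the first criterion, which mirrors the "second criteria" wording above.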
      • clearIndex

        public void clearIndex()
                        throws java.io.IOException
        Removes all terms from the spell check index.
        Throws:
        java.io.IOException - If there is a low-level I/O error.
        AlreadyClosedException - if the SpellChecker is already closed
      • exist

        public boolean exist(java.lang.String word)
                      throws java.io.IOException
        Check whether the word exists in the index.
        Parameters:
        word - word to check
        Returns:
        true if the word exists in the index
        Throws:
        java.io.IOException - If there is a low-level I/O error.
        AlreadyClosedException - if the SpellChecker is already closed
      • indexDictionary

        public final void indexDictionary(Dictionary dict,
                                          IndexWriterConfig config,
                                          boolean fullMerge)
                                   throws java.io.IOException
        Indexes the data from the given Dictionary.
        Parameters:
        dict - Dictionary to index
        config - IndexWriterConfig to use
        fullMerge - whether or not the spellcheck index should be fully merged
        Throws:
        AlreadyClosedException - if the SpellChecker is already closed
        java.io.IOException - If there is a low-level I/O error.
      • close

        public void close()
                   throws java.io.IOException
        Closes the IndexSearcher used by this SpellChecker.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException - if the close operation causes an IOException
        AlreadyClosedException - if the SpellChecker is already closed
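
        Because the class implements java.io.Closeable, it can be managed with try-with-resources. The sketch below shows the pattern with a hypothetical stand-in class, FakeChecker, which is not Lucene code; it throws IllegalStateException after close() as a placeholder for AlreadyClosedException.

```java
import java.io.Closeable;
import java.io.IOException;

// Illustrates the Closeable usage pattern with a stand-in class; not Lucene code.
class CloseableUsageSketch {

    // Hypothetical checker that refuses to work once it has been closed.
    static final class FakeChecker implements Closeable {
        private boolean closed;

        boolean exist(String word) {
            ensureOpen();
            return "lucene".equals(word);
        }

        @Override
        public void close() throws IOException {
            ensureOpen();
            closed = true;
        }

        private void ensureOpen() {
            if (closed) {
                // Placeholder for the AlreadyClosedException named in the documentation.
                throw new IllegalStateException("checker is already closed");
            }
        }
    }

    // try-with-resources closes the checker exactly once, even if exist() throws.
    static boolean lookup(String word) throws IOException {
        try (FakeChecker checker = new FakeChecker()) {
            return checker.exist(word);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lookup("lucene"));
    }
}
```

        Letting try-with-resources own the lifetime avoids both leaked resources and accidental use after close.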