Class FuzzyQuery

  • All Implemented Interfaces:

    public class FuzzyQuery
    extends MultiTermQuery
    Implements the fuzzy search query. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false to the transpositions parameter.
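
    The difference between the two distance measures can be illustrated with a small dynamic-programming sketch. This is not the library's implementation (which this page does not show); the class and method names are hypothetical, and for brevity the sketch compares Java chars rather than Unicode code points.

```java
final class EditDistanceSketch {

    // Edit distance between a and b, measured in chars (an assumption of this sketch).
    // With transpositions=true, swapping two adjacent characters costs one edit
    // (optimal string alignment); with false, this is classic Levenshtein.
    static int distance(String a, String b, boolean transpositions) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete all of a's prefix
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert all of b's prefix
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), // deletion, insertion
                        d[i - 1][j - 1] + cost);                    // match or substitution
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);     // adjacent swap
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucnee", true));  // swap counted as one edit
        System.out.println(distance("lucene", "lucnee", false)); // swap counted as two edits
    }
}
```

    With transpositions enabled, "lucene" and "lucnee" are one edit apart; under classic Levenshtein the same pair needs two edits.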

    This query uses MultiTermQuery.TopTermsScoringBooleanQueryRewrite as its default rewrite method, so terms are collected and scored according to their edit distance, and only the top-scoring terms are used to build the BooleanQuery. Changing the rewrite mode for fuzzy queries is not recommended.
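
    The idea of keeping only the best candidates can be sketched in plain Java. This is a rough illustration, not the rewrite class itself: the class name, the method name, and the `limit` parameter are hypothetical, and ranking purely by edit count (ties broken alphabetically) is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TopTermsSketch {

    // Returns up to `limit` candidate terms, closest (fewest edits) first.
    // Ties are broken alphabetically so the result is deterministic.
    static List<String> topTerms(Map<String, Integer> editsByTerm, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(editsByTerm.entrySet());
        entries.sort((x, y) -> {
            int byEdits = Integer.compare(x.getValue(), y.getValue());
            return byEdits != 0 ? byEdits : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> candidates = Map.of("lucene", 0, "lucent", 1, "lucine", 1, "lupine", 2);
        System.out.println(topTerms(candidates, 3));
    }
}
```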

    This query matches terms that are at most 2 edits away. Higher distances, especially with transpositions enabled, are generally not useful and would match a significant portion of the term dictionary. If you really need them, consider using an n-gram indexing technique (such as the SpellChecker in the suggest module) instead.

    NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled distance between two terms is computed. For a term to match, the edit distance between the two terms must be less than the length of the shorter term (whether that is the input term or the candidate term). For example, FuzzyQuery on term "abcd" with maxEdits=2 will not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not match an indexed term "abc".
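
    The rule in the note can be checked with a short sketch. The helper names are hypothetical, lengths are counted in chars rather than code points, and plain Levenshtein is used for the distance; it only illustrates the stated condition and is not the query's matching code.

```java
final class ShortTermRuleSketch {

    // Classic Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when the distance is within maxEdits AND strictly less than
    // the length of the shorter of the two terms, as the note describes.
    static boolean wouldMatch(String queryTerm, String candidate, int maxEdits) {
        int edits = levenshtein(queryTerm, candidate);
        int shorter = Math.min(queryTerm.length(), candidate.length());
        return edits <= maxEdits && edits < shorter;
    }

    public static void main(String[] args) {
        System.out.println(wouldMatch("abcd", "ab", 2));   // 2 edits, shorter length 2
        System.out.println(wouldMatch("a", "abc", 2));     // 2 edits, shorter length 1
        System.out.println(wouldMatch("abcd", "abcf", 2)); // 1 edit, shorter length 4
    }
}
```

    Both examples from the note fail the check because the edit distance (2) is not less than the shorter term's length, even though it is within maxEdits.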

    • Field Detail

      • defaultTranspositions

        public static final boolean defaultTranspositions
        See Also:
        Constant Field Values
      • defaultMinSimilarity

        public static final float defaultMinSimilarity
        Deprecated.
        Pass integer edit distances instead.
        See Also:
        Constant Field Values
    • Method Detail

      • getMaxEdits

        public int getMaxEdits()
        Returns:
        the maximum edit distance allowed for this query to match.
      • getPrefixLength

        public int getPrefixLength()
        Returns the non-fuzzy prefix length. This is the number of characters at the start of a term that must be identical (not fuzzy) to the query term if the query is to match that term.
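
        The prefix requirement can be pictured with a small sketch. The class and method names are hypothetical, the comparison is done on chars rather than code points, and clamping the prefix to the query term's length is an assumption of this sketch.

```java
final class PrefixLengthSketch {

    // True when the candidate starts with the first prefixLength characters
    // of the query term. A prefixLength of 0 imposes no constraint.
    static boolean sharesRequiredPrefix(String queryTerm, String candidate, int prefixLength) {
        int required = Math.min(Math.max(prefixLength, 0), queryTerm.length());
        // regionMatches returns false if the candidate is shorter than `required`.
        return candidate.regionMatches(0, queryTerm, 0, required);
    }

    public static void main(String[] args) {
        System.out.println(sharesRequiredPrefix("lucene", "lucent", 3)); // shares "luc"
        System.out.println(sharesRequiredPrefix("lucene", "mucene", 1)); // first char differs
    }
}
```

        Only the part of the term after the required prefix is then subject to fuzzy comparison.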
      • getTranspositions

        public boolean getTranspositions()
        Returns true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
      • getTerm

        public Term getTerm()
        Returns the pattern term.
      • toString

        public java.lang.String toString(java.lang.String field)
        Description copied from class: Query
        Prints a query to a string, with field assumed to be the default field and omitted.
        Specified by:
        toString in class Query
      • equals

        public boolean equals(java.lang.Object obj)
        Overrides:
        equals in class MultiTermQuery
      • floatToEdits

        public static int floatToEdits(float minimumSimilarity,
                                       int termLen)
        Deprecated.
        Pass integer edit distances instead.
        Helper function to convert from deprecated "minimumSimilarity" fractions to raw edit distances.
        Parameters:
        minimumSimilarity - scaled similarity
        termLen - length (in Unicode code points) of the term
        Returns:
        equivalent number of maxEdits
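
        One plausible way such a conversion could work is sketched below. This is an assumption-laden illustration, not the library's code: the class name and the MAX_EDITS constant are hypothetical (the cap of 2 comes from the class description), and the handling of each input range (values of 1 or more read as raw edit counts, 0 read as exact match, fractions scaled by term length) is an assumption of the sketch.

```java
final class FloatToEditsSketch {

    // Hypothetical cap, taken from the "at most 2 edits" statement in the class description.
    static final int MAX_EDITS = 2;

    static int floatToEdits(float minimumSimilarity, int termLen) {
        if (minimumSimilarity >= 1f) {
            // Assumption: values >= 1 are already raw edit counts; cap them.
            return (int) Math.min(minimumSimilarity, (float) MAX_EDITS);
        }
        if (minimumSimilarity == 0.0f) {
            // Assumption: 0 means exact match, not an unlimited number of edits.
            return 0;
        }
        // Assumption: a fraction s allows roughly (1 - s) * termLen edits, capped.
        return Math.min((int) ((1.0 - minimumSimilarity) * termLen), MAX_EDITS);
    }

    public static void main(String[] args) {
        System.out.println(floatToEdits(0.75f, 4)); // (1 - 0.75) * 4 = 1 edit
        System.out.println(floatToEdits(0.5f, 10)); // 5 edits, capped at MAX_EDITS
    }
}
```

        New code should skip the conversion entirely and pass an integer maxEdits, as the deprecation note advises.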