Class LevenshteinDistance

  • All Implemented Interfaces:
    EditDistance<java.lang.Integer>, SimilarityScore<java.lang.Integer>

    public class LevenshteinDistance
    extends java.lang.Object
    implements EditDistance<java.lang.Integer>
    An algorithm for measuring the difference between two character sequences.

    This is the number of changes needed to change one sequence into another, where each change is a single character modification (deletion, insertion or substitution).

    This code has been adapted from Apache Commons Lang 3.3.

    Since:
    1.0
    • Constructor Summary

      Constructors 
      Constructor Description
      LevenshteinDistance()
      This returns the default instance that uses a version of the algorithm that does not use a threshold parameter.
      LevenshteinDistance​(java.lang.Integer threshold)
      If the threshold is not null, distance calculations will be limited to a maximum length.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      java.lang.Integer apply​(java.lang.CharSequence left, java.lang.CharSequence right)
      Find the Levenshtein distance between two Strings.
      static LevenshteinDistance getDefaultInstance()
      Gets the default instance.
      java.lang.Integer getThreshold()
      Gets the distance threshold.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • LevenshteinDistance

        public LevenshteinDistance()

        This returns the default instance that uses a version of the algorithm that does not use a threshold parameter.

        See Also:
        getDefaultInstance()
      • LevenshteinDistance

        public LevenshteinDistance​(java.lang.Integer threshold)

        If the threshold is not null, distance calculations will be limited to a maximum length. If the threshold is null, the unlimited version of the algorithm will be used.

        Parameters:
        threshold - If this is null then distances calculations will not be limited. This may not be negative.
    • Method Detail

      • apply

        public java.lang.Integer apply​(java.lang.CharSequence left,
                                       java.lang.CharSequence right)

        Find the Levenshtein distance between two Strings.

        A higher score indicates a greater distance.

        The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm

        Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.
        This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htm

         distance.apply(null, *)             = IllegalArgumentException
         distance.apply(*, null)             = IllegalArgumentException
         distance.apply("","")               = 0
         distance.apply("","a")              = 1
         distance.apply("aaapppp", "")       = 7
         distance.apply("frog", "fog")       = 1
         distance.apply("fly", "ant")        = 3
         distance.apply("elephant", "hippo") = 7
         distance.apply("hippo", "elephant") = 7
         distance.apply("hippo", "zzzzzzzz") = 8
         distance.apply("hello", "hallo")    = 1
         
        Specified by:
        apply in interface EditDistance<java.lang.Integer>
        Specified by:
        apply in interface SimilarityScore<java.lang.Integer>
        Parameters:
        left - the first string, must not be null
        right - the second string, must not be null
        Returns:
        result distance, or -1
        Throws:
        java.lang.IllegalArgumentException - if either String input null
      • getDefaultInstance

        public static LevenshteinDistance getDefaultInstance()
        Gets the default instance.
        Returns:
        The default instance
      • getThreshold

        public java.lang.Integer getThreshold()
        Gets the distance threshold.
        Returns:
        The distance threshold