Class NGramDistance

  • All Implemented Interfaces:
    StringDistance

    public class NGramDistance
    extends java.lang.Object
    implements StringDistance
    N-gram version of edit distance, based on the paper by Grzegorz Kondrak, "N-gram similarity and distance". Proceedings of the Twelfth International Conference on String Processing and Information Retrieval (SPIRE 2005), pp. 115-126, Buenos Aires, Argentina, November 2005. http://www.cs.ualberta.ca/~kondrak/papers/spire05.pdf

    This implementation uses the position-based optimization to compute partial matches of n-gram substrings. It adds a null-character prefix of length n-1 so that the first character is contained in the same number of n-grams as a middle character. Matches of null prefix characters against each other are discounted, so strings with no matching characters return a value of 0 (the value getDistance uses for maximally different strings).
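    To make the padding concrete, here is a small sketch (the class PaddedNGrams and its methods are hypothetical, not part of this API) that prepends n-1 '\0' characters and lists the resulting n-grams. It also counts how many n-grams cover a given character, showing that the first character and a middle character are each covered n times. Only the front is padded, so the last character is covered by a single n-gram.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the n-grams of a string after prepending
// n-1 null characters ('\0'), as described in the class overview.
class PaddedNGrams {

    static List<String> padded(String s, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            sb.append('\0'); // null-character prefix of length n-1
        }
        sb.append(s);
        String p = sb.toString();
        // A padded string of length s.length() + n - 1 yields exactly
        // s.length() n-grams, one ending at each original character.
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= p.length(); i++) {
            grams.add(p.substring(i, i + n));
        }
        return grams;
    }

    // Counts the padded n-grams that contain the character at index idx of s.
    static int gramsContaining(String s, int n, int idx) {
        int pos = idx + n - 1; // position of s[idx] inside the padded string
        int count = 0;
        for (int k = 0; k < s.length(); k++) {
            // n-gram k covers padded positions k .. k+n-1
            if (k <= pos && pos < k + n) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (String g : padded("abcd", 2)) {
            System.out.println(g.replace('\0', '_')); // _a, ab, bc, cd
        }
        System.out.println(gramsContaining("abcd", 2, 0)); // 2 (first character)
        System.out.println(gramsContaining("abcd", 2, 1)); // 2 (middle character)
    }
}
```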
    • Constructor Summary

      Constructor              Description
      NGramDistance()          Creates an N-Gram distance measure using n-grams of size 2.
      NGramDistance(int size)  Creates an N-Gram distance measure using n-grams of the specified size.
    • Method Summary

      Modifier and Type  Method and Description
      boolean            equals(java.lang.Object obj)
      float              getDistance(java.lang.String source, java.lang.String target)
                         Returns a float between 0 and 1 based on how similar the specified strings are to one another.
      int                hashCode()
      java.lang.String   toString()
      • Methods inherited from class java.lang.Object

        getClass, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • NGramDistance

        public NGramDistance(int size)
        Creates an N-Gram distance measure using n-grams of the specified size.
        Parameters:
        size - The size of the n-gram to be used to compute the string distance.
      • NGramDistance

        public NGramDistance()
        Creates an N-Gram distance measure using n-grams of size 2.
    • Method Detail

      • getDistance

        public float getDistance(java.lang.String source,
                                 java.lang.String target)
        Description copied from interface: StringDistance
        Returns a float between 0 and 1 based on how similar the specified strings are to one another. A value of 1 means the specified strings are identical; 0 means the strings are maximally different.
        Specified by:
        getDistance in interface StringDistance
        Parameters:
        source - The first string.
        target - The second string.
        Returns:
        a float between 0 and 1 based on how similar the specified strings are to one another.
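        The following is a minimal sketch of the position-based computation described in the class overview, written from that description rather than taken from this class's source. The names NGramSketch and similarity are hypothetical. It assumes two empty strings count as identical (returning 1) and that the inputs contain no '\0' characters, since '\0' is reserved for the prefix. Like getDistance, it returns a similarity: 1 for identical strings, 0 for maximally different ones.

```java
// Hypothetical sketch of Kondrak's position-based n-gram distance with an
// (n-1)-character null prefix, returned as a similarity in [0, 1].
// Assumes the inputs contain no '\0' characters.
class NGramSketch {

    static float similarity(String source, String target, int n) {
        final int sl = source.length();
        final int tl = target.length();
        if (sl == 0 && tl == 0) {
            return 1f; // assumption: two empty strings count as identical
        }

        // Source with n-1 leading null characters.
        char[] sa = new char[sl + n - 1];
        for (int k = 0; k < sa.length; k++) {
            sa[k] = k < n - 1 ? '\0' : source.charAt(k - n + 1);
        }

        float[] prev = new float[sl + 1]; // costs for target row j-1
        float[] curr = new float[sl + 1]; // costs for target row j
        for (int i = 0; i <= sl; i++) {
            prev[i] = i;
        }

        char[] tGram = new char[n]; // padded n-gram of target ending at target[j-1]
        for (int j = 1; j <= tl; j++) {
            for (int k = 0; k < n; k++) {
                int idx = j - n + k; // index into the unpadded target
                tGram[k] = idx < 0 ? '\0' : target.charAt(idx);
            }
            curr[0] = j;
            for (int i = 1; i <= sl; i++) {
                int mismatches = 0;
                int counted = n; // positions that count toward the partial cost
                for (int k = 0; k < n; k++) {
                    char sc = sa[i - 1 + k];
                    if (sc != tGram[k]) {
                        mismatches++;
                    } else if (sc == '\0') {
                        counted--; // discount prefix-against-prefix matches
                    }
                }
                float cost = (float) mismatches / counted;
                // deletion, insertion, or diagonal step with partial n-gram cost
                curr[i] = Math.min(Math.min(curr[i - 1] + 1, prev[i] + 1), prev[i - 1] + cost);
            }
            float[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // After the final swap, prev holds the last row.
        return 1f - prev[sl] / Math.max(sl, tl);
    }

    public static void main(String[] args) {
        System.out.println(similarity("ab", "ab", 2)); // 1.0
        System.out.println(similarity("ab", "cd", 2)); // 0.0
        System.out.println(similarity("lucene", "lucine", 2));
    }
}
```

        The diagonal step costs the fraction of mismatched positions between the two n-grams ending at source[i-1] and target[j-1], so partially matching n-grams cost less than a full edit. The final edit cost is normalized by the longer string's length and subtracted from 1.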
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • equals

        public boolean equals(java.lang.Object obj)
        Overrides:
        equals in class java.lang.Object
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
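        This page states only that equals, hashCode and toString are overridden, not how. The sketch below ASSUMES value semantics based solely on the n-gram size, with the no-argument constructor delegating to the documented default of 2. The class name NGramMeasure and the "ngram(n)" string format are hypothetical and illustrative only.

```java
// Hypothetical value-semantics sketch: ASSUMES equals, hashCode and
// toString depend only on the n-gram size; not taken from this class's source.
final class NGramMeasure {
    private final int n;

    NGramMeasure(int size) {
        this.n = size;
    }

    NGramMeasure() {
        this(2); // documented default n-gram size
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        return n == ((NGramMeasure) obj).n;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(n); // consistent with equals: equal sizes, equal hashes
    }

    @Override
    public String toString() {
        return "ngram(" + n + ")"; // illustrative format only
    }

    public static void main(String[] args) {
        System.out.println(new NGramMeasure().equals(new NGramMeasure(2))); // true
        System.out.println(new NGramMeasure(2).equals(new NGramMeasure(3))); // false
        System.out.println(new NGramMeasure(3)); // ngram(3)
    }
}
```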