Package opennlp.tools.ngram
Class NGramUtils
java.lang.Object
  opennlp.tools.ngram.NGramUtils

public class NGramUtils extends java.lang.Object

Utility class for ngrams. Some methods apply only to specific values of 'n', e.g. trigrams, bigrams, or unigrams.

Constructor Summary
Constructor       Description
NGramUtils()

Method Summary
Modifier and Type, Method
 Description

static double calculateBigramMLProbability(java.lang.String x0, java.lang.String x1, java.util.Collection<StringList> set)
 Calculates the probability of a bigram in a vocabulary using maximum likelihood estimation.
static double calculateBigramPriorSmoothingProbability(java.lang.String x0, java.lang.String x1, java.util.Collection<StringList> set, java.lang.Double k)
 Calculates the probability of a bigram in a vocabulary using the prior Laplace smoothing algorithm.
static double calculateLaplaceSmoothingProbability(StringList ngram, java.lang.Iterable<StringList> set, java.lang.Double k)
 Calculates the probability of an ngram in a vocabulary using the Laplace smoothing algorithm.
static double calculateMissingNgramProbabilityMass(StringList ngram, java.lang.Double discount, java.lang.Iterable<StringList> set)
 Calculates the probability of an ngram in a vocabulary using the missing probability mass algorithm.
static double calculateNgramMLProbability(StringList ngram, java.lang.Iterable<StringList> set)
 Calculates the probability of an ngram in a vocabulary using maximum likelihood estimation.
static double calculateTrigramLinearInterpolationProbability(java.lang.String x0, java.lang.String x1, java.lang.String x2, java.util.Collection<StringList> set, java.lang.Double lambda1, java.lang.Double lambda2, java.lang.Double lambda3)
 Calculates the probability of a trigram in a vocabulary using a linear interpolation algorithm.
static double calculateTrigramMLProbability(java.lang.String x0, java.lang.String x1, java.lang.String x2, java.lang.Iterable<StringList> set)
 Calculates the probability of a trigram in a vocabulary using maximum likelihood estimation.
static double calculateUnigramMLProbability(java.lang.String word, java.util.Collection<StringList> set)
 Calculates the probability of a unigram in a vocabulary using maximum likelihood estimation.
static java.util.Collection<java.lang.String[]> getNGrams(java.lang.String[] sequence, int size)
 Gets the ngrams of the given size from an input sequence of tokens.
static java.util.Collection<StringList> getNGrams(StringList sequence, int size)
 Gets the ngrams of the given size from an input sequence of tokens.
static StringList getNMinusOneTokenFirst(StringList ngram)
 Gets the (n-1)-gram of a given ngram, that is, the same ngram without its last word.
static StringList getNMinusOneTokenLast(StringList ngram)
 Gets the (n-1)-gram of a given ngram, that is, the same ngram without its first word.

Method Detail
calculateLaplaceSmoothingProbability
public static double calculateLaplaceSmoothingProbability(StringList ngram, java.lang.Iterable<StringList> set, java.lang.Double k)
Calculates the probability of an ngram in a vocabulary using the Laplace smoothing algorithm.
Parameters:
 ngram - the ngram to get the probability for
 set - the vocabulary
 k - the smoothing factor
Returns:
 the Laplace smoothing probability
See Also:
 Additive Smoothing
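As a rough illustration of the additive smoothing idea referenced above, the sketch below applies the textbook formula (count + k) / (historyCount + k * V), where V is the vocabulary size. This is an assumption about the general technique, not a statement of how NGramUtils computes its result; the class name, method name, and the explicit `vocabularySize` parameter are all hypothetical.

```java
class AdditiveSmoothingSketch {

    // Textbook additive (Laplace) smoothing: every candidate word receives k
    // pseudo-counts, so unseen ngrams get a small non-zero probability.
    // Hypothetical helper; the library's own denominator may differ.
    static double smoothed(double ngramCount, double historyCount, double k, int vocabularySize) {
        return (ngramCount + k) / (historyCount + k * vocabularySize);
    }

    public static void main(String[] args) {
        // A history seen twice, followed once each by two of four vocabulary words.
        double[] counts = {1, 1, 0, 0};
        double sum = 0.0;
        for (double c : counts) {
            sum += smoothed(c, 2, 1.0, counts.length);
        }
        // The smoothed estimates over the whole vocabulary add up to one.
        System.out.println(sum);
    }
}
```

With k = 1 this is classic add-one smoothing; smaller values of k move less probability mass toward unseen ngrams.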
 
 
calculateUnigramMLProbability
public static double calculateUnigramMLProbability(java.lang.String word, java.util.Collection<StringList> set)
Calculates the probability of a unigram in a vocabulary using maximum likelihood estimation.
Parameters:
 word - the only word in the unigram
 set - the vocabulary
Returns:
 the maximum likelihood probability
 
 
calculateBigramMLProbability
public static double calculateBigramMLProbability(java.lang.String x0, java.lang.String x1, java.util.Collection<StringList> set)
Calculates the probability of a bigram in a vocabulary using maximum likelihood estimation.
Parameters:
 x0 - first word in the bigram
 x1 - second word in the bigram
 set - the vocabulary
Returns:
 the maximum likelihood probability
 
 
calculateTrigramMLProbability
public static double calculateTrigramMLProbability(java.lang.String x0, java.lang.String x1, java.lang.String x2, java.lang.Iterable<StringList> set)
Calculates the probability of a trigram in a vocabulary using maximum likelihood estimation.
Parameters:
 x0 - first word in the trigram
 x1 - second word in the trigram
 x2 - third word in the trigram
 set - the vocabulary
Returns:
 the maximum likelihood probability
 
 
calculateNgramMLProbability
public static double calculateNgramMLProbability(StringList ngram, java.lang.Iterable<StringList> set)
Calculates the probability of an ngram in a vocabulary using maximum likelihood estimation.
Parameters:
 ngram - an ngram
 set - the vocabulary
Returns:
 the maximum likelihood probability
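The maximum likelihood methods above can be pictured with a small, self-contained sketch. It assumes the usual definition, count(ngram) / count(prefix), with the total token count as the denominator for a unigram, and it counts every contiguous occurrence in plain `String[]` sentences. How NGramUtils actually counts occurrences in a `StringList` vocabulary is not described here, so the class and method names below are hypothetical and the results may not match the library exactly.

```java
import java.util.Arrays;
import java.util.List;

class NgramMleSketch {

    // Counts contiguous occurrences of `gram` across all sentences.
    static int count(List<String[]> sentences, String[] gram) {
        int total = 0;
        for (String[] sentence : sentences) {
            for (int start = 0; start + gram.length <= sentence.length; start++) {
                String[] window = Arrays.copyOfRange(sentence, start, start + gram.length);
                if (Arrays.equals(window, gram)) {
                    total++;
                }
            }
        }
        return total;
    }

    // MLE of the last token given the preceding ones: count(ngram) / count(prefix).
    // For a unigram the denominator is the total number of tokens.
    // Returns 0 when the denominator is 0 (an assumption made for this sketch).
    static double mle(List<String[]> sentences, String... ngram) {
        if (ngram.length == 0) {
            throw new IllegalArgumentException("ngram must contain at least one token");
        }
        int denominator;
        if (ngram.length == 1) {
            denominator = sentences.stream().mapToInt(s -> s.length).sum();
        } else {
            denominator = count(sentences, Arrays.copyOfRange(ngram, 0, ngram.length - 1));
        }
        return denominator == 0 ? 0.0 : (double) count(sentences, ngram) / denominator;
    }

    public static void main(String[] args) {
        List<String[]> corpus = Arrays.asList(
            new String[] {"the", "cat", "sat"},
            new String[] {"the", "dog", "sat"});
        // count("the cat") / count("the")
        System.out.println(mle(corpus, "the", "cat"));
    }
}
```

The same helper covers the unigram, bigram, and trigram cases by varying the number of tokens passed in.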
 
 
calculateBigramPriorSmoothingProbability
public static double calculateBigramPriorSmoothingProbability(java.lang.String x0, java.lang.String x1, java.util.Collection<StringList> set, java.lang.Double k)
Calculates the probability of a bigram in a vocabulary using the prior Laplace smoothing algorithm.
Parameters:
 x0 - the first word in the bigram
 x1 - the second word in the bigram
 set - the vocabulary
 k - the smoothing factor
Returns:
 the prior Laplace smoothing probability
 
 
calculateTrigramLinearInterpolationProbability
public static double calculateTrigramLinearInterpolationProbability(java.lang.String x0, java.lang.String x1, java.lang.String x2, java.util.Collection<StringList> set, java.lang.Double lambda1, java.lang.Double lambda2, java.lang.Double lambda3)
Calculates the probability of a trigram in a vocabulary using a linear interpolation algorithm.
Parameters:
 x0 - the first word in the trigram
 x1 - the second word in the trigram
 x2 - the third word in the trigram
 set - the vocabulary
 lambda1 - trigram interpolation factor
 lambda2 - bigram interpolation factor
 lambda3 - unigram interpolation factor
Returns:
 the linear interpolation probability
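The roles of the three lambda factors can be shown with a minimal sketch of linear interpolation in its common form: a weighted sum of the trigram, bigram, and unigram estimates. The sketch takes the three estimates as plain numbers rather than computing them from a vocabulary; the class and method names are hypothetical, and the weights are conventionally chosen to be non-negative and to sum to 1.

```java
class LinearInterpolationSketch {

    // Weighted mix of three estimates for the same word:
    // lambda1 * P(x2 | x0, x1) + lambda2 * P(x2 | x1) + lambda3 * P(x2).
    static double interpolate(double pTrigram, double pBigram, double pUnigram,
                              double lambda1, double lambda2, double lambda3) {
        return lambda1 * pTrigram + lambda2 * pBigram + lambda3 * pUnigram;
    }

    public static void main(String[] args) {
        // Illustrative inputs only; they are not taken from any real corpus.
        double p = interpolate(0.5, 0.25, 0.1, 0.6, 0.3, 0.1);
        System.out.println(p);
    }
}
```

Giving the lower-order estimates some weight keeps the result above zero when the trigram itself was never observed.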
 
 
calculateMissingNgramProbabilityMass
public static double calculateMissingNgramProbabilityMass(StringList ngram, java.lang.Double discount, java.lang.Iterable<StringList> set)
Calculates the probability of an ngram in a vocabulary using the missing probability mass algorithm.
Parameters:
 ngram - the ngram
 discount - discount factor
 set - the vocabulary
Returns:
 the probability
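The description above does not spell out the algorithm. One plausible reading, offered here purely as an assumption, is the missing mass from absolute discounting: subtract `discount` from the count of each word seen after a history, and report the share of probability that is freed up as a result. The class name, method name, and the map of continuation counts in the sketch below are hypothetical and do not mirror the library's signature.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MissingMassSketch {

    // continuationCounts maps each word seen after a fixed history to its count.
    // Returns 1 - sum over seen words of max(count - discount, 0) / historyCount.
    static double missingMass(Map<String, Integer> continuationCounts, double discount) {
        int historyCount = 0;
        for (int c : continuationCounts.values()) {
            historyCount += c;
        }
        if (historyCount == 0) {
            return 1.0; // nothing observed, so all mass is unassigned
        }
        double kept = 0.0;
        for (int c : continuationCounts.values()) {
            kept += Math.max(c - discount, 0.0) / historyCount;
        }
        return 1.0 - kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> afterThe = new LinkedHashMap<>();
        afterThe.put("cat", 3);
        afterThe.put("dog", 1);
        System.out.println(missingMass(afterThe, 0.5));
    }
}
```

Under this reading, the freed mass is what a back-off model would redistribute to words never seen after the history.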
 
 
getNMinusOneTokenFirst
public static StringList getNMinusOneTokenFirst(StringList ngram)
Gets the (n-1)-gram of a given ngram, that is, the same ngram without its last word.
Parameters:
 ngram - an ngram
Returns:
 an ngram
 
 
getNMinusOneTokenLast
public static StringList getNMinusOneTokenLast(StringList ngram)
Gets the (n-1)-gram of a given ngram, that is, the same ngram without its first word.
Parameters:
 ngram - an ngram
Returns:
 an ngram
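The difference between getNMinusOneTokenFirst and getNMinusOneTokenLast is easy to mix up, so here is a small sketch of the two operations as described above, using plain `String[]` arrays instead of `StringList`. The class and method names are hypothetical, and rejecting an empty input is a choice made for this sketch only.

```java
import java.util.Arrays;

class NMinusOneSketch {

    // Keeps the first n-1 tokens, i.e. drops the last word
    // (the behaviour described for getNMinusOneTokenFirst).
    static String[] dropLast(String[] ngram) {
        requireNonEmpty(ngram);
        return Arrays.copyOfRange(ngram, 0, ngram.length - 1);
    }

    // Keeps the last n-1 tokens, i.e. drops the first word
    // (the behaviour described for getNMinusOneTokenLast).
    static String[] dropFirst(String[] ngram) {
        requireNonEmpty(ngram);
        return Arrays.copyOfRange(ngram, 1, ngram.length);
    }

    private static void requireNonEmpty(String[] ngram) {
        if (ngram.length == 0) {
            throw new IllegalArgumentException("ngram must contain at least one token");
        }
    }

    public static void main(String[] args) {
        String[] trigram = {"the", "quick", "fox"};
        System.out.println(String.join(" ", dropLast(trigram)));  // the quick
        System.out.println(String.join(" ", dropFirst(trigram))); // quick fox
    }
}
```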
 
 
getNGrams
public static java.util.Collection<StringList> getNGrams(StringList sequence, int size)
Gets the ngrams of the given size from an input sequence of tokens.
Parameters:
 sequence - a sequence of tokens
 size - the size of the resulting ngrams
Returns:
 all the possible ngrams of the given size derivable from the input sequence
 
 
getNGrams
public static java.util.Collection<java.lang.String[]> getNGrams(java.lang.String[] sequence, int size)
Gets the ngrams of the given size from an input sequence of tokens.
Parameters:
 sequence - a sequence of tokens
 size - the size of the resulting ngrams
Returns:
 all the possible ngrams of the given size derivable from the input sequence
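Extracting "all the possible ngrams of the given size" amounts to sliding a fixed-width window over the token sequence. The sketch below shows that idea on a `String[]`; it is not the library's implementation, the class and method names are hypothetical, and returning an empty list when the sequence is shorter than `size` is an assumption, since the text above does not say how that case is handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NGramWindowSketch {

    // Returns every contiguous window of `size` tokens, in order of appearance.
    static List<String[]> ngrams(String[] sequence, int size) {
        List<String[]> result = new ArrayList<>();
        if (size <= 0) {
            return result;
        }
        for (int start = 0; start + size <= sequence.length; start++) {
            result.add(Arrays.copyOfRange(sequence, start, start + size));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"the", "quick", "brown", "fox"};
        for (String[] gram : ngrams(tokens, 2)) {
            System.out.println(String.join(" ", gram));
        }
    }
}
```

A sequence of m tokens yields m - size + 1 windows, so the four tokens in the example produce three bigrams.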
 
 