public class NGramModel extends java.lang.Object implements java.lang.Iterable<StringList>
The NGramModel
can be used to create ngrams and character ngrams.
See also: StringList
Constructor and Description |
---|
NGramModel()
Initializes an empty instance.
|
NGramModel(java.io.InputStream in)
Initializes the current instance.
|
Modifier and Type | Method and Description |
---|---|
void |
add(java.lang.String chars,
int minLength,
int maxLength)
Adds character NGrams to the current instance.
|
void |
add(StringList ngram)
Adds one NGram; if it already exists, its count increases by one.
|
void |
add(StringList ngram,
int minLength,
int maxLength)
Adds NGrams up to the specified length to the current instance.
|
boolean |
contains(StringList tokens)
Checks if the given tokens are contained by the current instance.
|
void |
cutoff(int cutoffUnder,
int cutoffOver)
Deletes all ngrams which appear less often than the cutoffUnder value
or more often than the cutoffOver value.
|
boolean |
equals(java.lang.Object obj) |
int |
getCount(StringList ngram)
Retrieves the count of the given ngram.
|
int |
hashCode() |
java.util.Iterator<StringList> |
iterator()
Retrieves an
Iterator over all StringList entries. |
int |
numberOfGrams()
Retrieves the total count of all Ngrams.
|
void |
remove(StringList tokens)
Removes the specified tokens from the NGram model; they are just dropped.
|
void |
serialize(java.io.OutputStream out)
Writes the ngram instance to the given
OutputStream. |
void |
setCount(StringList ngram,
int count)
Sets the count of an existing ngram.
|
int |
size()
Retrieves the number of
StringList entries in the current instance. |
Dictionary |
toDictionary()
Creates a dictionary which contains all
StringList entries which
are in the current NGramModel. |
Dictionary |
toDictionary(boolean caseSensitive)
Creates a dictionary which contains all
StringList entries which
are in the current NGramModel. |
java.lang.String |
toString() |
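The counting and pruning behaviour summarized in the table can be illustrated without the library. What follows is a minimal sketch, assuming n-grams are modelled as `List<String>` and counts are kept in a `HashMap`. The class `CountingSketch` and all of its members are hypothetical stand-ins, not part of OpenNLP. The sketch further assumes that an unknown n-gram has a count of 0 and that counts equal to either cutoff bound are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only, NOT the OpenNLP implementation.
class CountingSketch {

    // One entry per distinct n-gram, mapped to how often it was added.
    private final Map<List<String>, Integer> counts = new HashMap<>();

    // A new n-gram starts at 1; an existing one has its count increased by one.
    void add(List<String> ngram) {
        counts.merge(new ArrayList<>(ngram), 1, Integer::sum);
    }

    // Assumption of this sketch: unknown n-grams report a count of 0.
    int getCount(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    boolean contains(List<String> ngram) {
        return counts.containsKey(ngram);
    }

    // Drops every n-gram seen fewer than cutoffUnder or more than cutoffOver times.
    // Assumption of this sketch: counts equal to a bound are kept.
    void cutoff(int cutoffUnder, int cutoffOver) {
        counts.values().removeIf(c -> c < cutoffUnder || c > cutoffOver);
    }

    public static void main(String[] args) {
        CountingSketch model = new CountingSketch();
        for (int i = 0; i < 3; i++) {
            model.add(Arrays.asList("the"));
        }
        model.add(Arrays.asList("cat"));
        System.out.println(model.getCount(Arrays.asList("the"))); // prints 3
        model.cutoff(2, 5);
        System.out.println(model.contains(Arrays.asList("cat"))); // prints false
    }
}
```

Removing through the `values()` view deletes the whole map entry, so a pruned n-gram no longer counts as contained.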
public NGramModel()
public NGramModel(java.io.InputStream in) throws java.io.IOException, InvalidFormatException
Parameters:
in -
Throws:
java.io.IOException
InvalidFormatException
public int getCount(StringList ngram)
Parameters:
ngram -

public void setCount(StringList ngram, int count)
Parameters:
ngram -
count -

public void add(StringList ngram)
Parameters:
ngram -

public void add(StringList ngram, int minLength, int maxLength)
Parameters:
ngram - the tokens to build the uni-grams, bi-grams, tri-grams, ... from.
minLength - minimal length
maxLength - maximal length

public void add(java.lang.String chars, int minLength, int maxLength)
Parameters:
chars -
minLength -
maxLength -

public void remove(StringList tokens)
Parameters:
tokens -

public boolean contains(StringList tokens)
Parameters:
tokens -

public int size()
Retrieves the number of StringList entries in the current instance.

public java.util.Iterator<StringList> iterator()
Retrieves an Iterator over all StringList entries.
Specified by:
iterator in interface java.lang.Iterable<StringList>
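The two ranged `add` overloads listed above take a minimal and a maximal length. One plausible reading is a sliding window: every contiguous run of `minLength` to `maxLength` tokens (or characters) becomes one n-gram. The sketch below illustrates that reading in plain Java. `NGramWindowSketch`, `tokenNGrams`, `charNGrams`, and `checkRange` are hypothetical names, and the sketch makes no claim about any normalization (for example case folding) the library may apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only, NOT the OpenNLP implementation.
class NGramWindowSketch {

    // Every contiguous window of minLength..maxLength tokens, shortest first.
    static List<List<String>> tokenNGrams(List<String> tokens, int minLength, int maxLength) {
        checkRange(minLength, maxLength);
        List<List<String>> result = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= tokens.size(); start++) {
                result.add(new ArrayList<>(tokens.subList(start, start + length)));
            }
        }
        return result;
    }

    // Same windowing, applied to the characters of a string.
    static List<String> charNGrams(String chars, int minLength, int maxLength) {
        checkRange(minLength, maxLength);
        List<String> result = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= chars.length(); start++) {
                result.add(chars.substring(start, start + length));
            }
        }
        return result;
    }

    private static void checkRange(int minLength, int maxLength) {
        if (minLength < 1 || maxLength < minLength) {
            throw new IllegalArgumentException("need 1 <= minLength <= maxLength");
        }
    }

    public static void main(String[] args) {
        // prints [[a], [b], [c], [a, b], [b, c]]
        System.out.println(tokenNGrams(Arrays.asList("a", "b", "c"), 1, 2));
        // prints [ab, bc, abc]
        System.out.println(charNGrams("abc", 2, 3));
    }
}
```

Windows longer than the input simply produce nothing, so a `maxLength` above the token count is harmless in this sketch.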
public int numberOfGrams()
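`size()` and `numberOfGrams()` are easy to confuse. Going by the summaries, `size()` counts distinct entries while `numberOfGrams()` is the total over all counts. The sketch below shows the difference on a plain map, assuming "total count" means the sum of the per-entry counts. `CountTotalsSketch` and its methods are hypothetical and only mirror the documented names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only, NOT the OpenNLP implementation.
class CountTotalsSketch {

    // Number of distinct n-grams.
    static int size(Map<String, Integer> counts) {
        return counts.size();
    }

    // Assumed meaning of "total count": the sum of all per-n-gram counts.
    static int numberOfGrams(Map<String, Integer> counts) {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 3);
        counts.put("cat", 1);
        System.out.println(size(counts));          // prints 2
        System.out.println(numberOfGrams(counts)); // prints 4
    }
}
```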
public void cutoff(int cutoffUnder, int cutoffOver)
Parameters:
cutoffUnder -
cutoffOver -

public Dictionary toDictionary()
Creates a dictionary which contains all StringList entries which are in the current NGramModel.
Entries which are only different in the case are merged into one.
Calling this method is the same as calling toDictionary(boolean)
with true.

public Dictionary toDictionary(boolean caseSensitive)
Creates a dictionary which contains all StringList entries which are in the current NGramModel.
Parameters:
caseSensitive - Specifies whether case distinctions should be kept in the creation of the dictionary.

public void serialize(java.io.OutputStream out) throws java.io.IOException
Writes the ngram instance to the given OutputStream.
Parameters:
out -
Throws:
java.io.IOException - if an I/O error occurs during writing

public boolean equals(java.lang.Object obj)
Overrides:
equals in class java.lang.Object
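The effect of the `caseSensitive` flag of `toDictionary(boolean)` can be pictured as follows. This is a rough sketch, assuming that merging entries "which are only different in the case" can be approximated by lower-casing every token before inserting it into a set. The real `Dictionary` class may keep the original spelling, and `DictionaryMergeSketch` and its `toDictionary` helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration only, NOT the OpenNLP implementation.
class DictionaryMergeSketch {

    // Collects distinct entries; when caseSensitive is false, entries that
    // differ only in case collapse into one (approximated by lower-casing).
    static Set<List<String>> toDictionary(Collection<List<String>> ngrams, boolean caseSensitive) {
        Set<List<String>> entries = new LinkedHashSet<>();
        for (List<String> ngram : ngrams) {
            List<String> entry = new ArrayList<>();
            for (String token : ngram) {
                entry.add(caseSensitive ? token : token.toLowerCase(Locale.ROOT));
            }
            entries.add(entry);
        }
        return entries;
    }

    public static void main(String[] args) {
        List<List<String>> ngrams = Arrays.asList(
                Arrays.asList("New", "York"),
                Arrays.asList("new", "york"),
                Arrays.asList("Boston"));
        System.out.println(toDictionary(ngrams, true).size());  // prints 3
        System.out.println(toDictionary(ngrams, false).size()); // prints 2
    }
}
```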
public java.lang.String toString()
Overrides:
toString in class java.lang.Object
public int hashCode()
Overrides:
hashCode in class java.lang.Object
Copyright © 2010 - 2020 Adobe. All Rights Reserved