Class JaspellTernarySearchTrie
- java.lang.Object
-
- org.apache.lucene.search.suggest.jaspell.JaspellTernarySearchTrie
-
public class JaspellTernarySearchTrie extends java.lang.ObjectImplementation of a Ternary Search Trie, a data structure for storingStringobjects that combines the compact size of a binary search tree with the speed of a digital search trie, and is therefore ideal for practical use in sorting and searching data.This data structure is faster than hashing for many typical search problems, and supports a broader range of useful problems and operations. Ternary searches are faster than hashing and more powerful, too.
The theory of ternary search trees was described at a symposium in 1997 (see "Fast Algorithms for Sorting and Searching Strings," by J.L. Bentley and R. Sedgewick, Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, January 1997). Algorithms in C, Third Edition, by Robert Sedgewick (Addison-Wesley, 1998) provides yet another view of ternary search trees.
-
-
Constructor Summary
Constructors Constructor Description JaspellTernarySearchTrie()Constructs an empty Ternary Search Trie.JaspellTernarySearchTrie(java.io.File file)Constructs a Ternary Search Trie and loads data from aFileinto the Trie.JaspellTernarySearchTrie(java.io.File file, boolean compression)Constructs a Ternary Search Trie and loads data from aFileinto the Trie.JaspellTernarySearchTrie(java.util.Locale locale)Constructs an empty Ternary Search Trie, specifying the Locale used for lowercasing.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description java.lang.Objectget(java.lang.CharSequence key)Retrieve the object indexed by a key.java.lang.FloatgetAndIncrement(java.lang.String key)Retrieve theFloatindexed by key, increment it by one unit and store the newFloat.org.apache.lucene.search.suggest.jaspell.JaspellTernarySearchTrie.TSTNodegetNode(java.lang.CharSequence key)Returns the node indexed by key, ornullif that node doesn't exist.java.util.List<java.lang.String>matchAlmost(java.lang.CharSequence key, int numReturnValues)Returns aListof keys that almost match the argument key.java.util.List<java.lang.String>matchAlmost(java.lang.String key)Returns aListof keys that almost match the argument key.java.util.List<java.lang.String>matchPrefix(java.lang.CharSequence prefix, int numReturnValues)Returns an alphabeticalListof all keys in the trie that begin with a given prefix.java.util.List<java.lang.String>matchPrefix(java.lang.String prefix)Returns an alphabeticalListof all keys in the trie that begin with a given prefix.intnumDataNodes()Returns the number of nodes in the trie that have non-null data.intnumNodes()Returns the total number of nodes in the trie.voidput(java.lang.CharSequence key, java.lang.Object value)Stores a value in the trie.voidremove(java.lang.String key)Removes the value indexed by key.voidsetMatchAlmostDiff(int diff)Sets the number of characters by which words can differ from target word when calling thematchAlmostmethod.voidsetNumReturnValues(int num)Sets the default maximum number of values returned from thematchPrefixandmatchAlmostmethods.
-
-
-
Constructor Detail
-
JaspellTernarySearchTrie
public JaspellTernarySearchTrie()
Constructs an empty Ternary Search Trie.
-
JaspellTernarySearchTrie
public JaspellTernarySearchTrie(java.util.Locale locale)
Constructs an empty Ternary Search Trie, specifying the Locale used for lowercasing.
-
JaspellTernarySearchTrie
public JaspellTernarySearchTrie(java.io.File file) throws java.io.IOExceptionConstructs a Ternary Search Trie and loads data from aFileinto the Trie. The file is a normal text document, where each line is of the form word TAB float.- Parameters:
file- TheFilewith the data to load into the Trie.- Throws:
java.io.IOException- A problem occured while reading the data.
-
JaspellTernarySearchTrie
public JaspellTernarySearchTrie(java.io.File file, boolean compression) throws java.io.IOExceptionConstructs a Ternary Search Trie and loads data from aFileinto the Trie. The file is a normal text document, where each line is of the form "word TAB float".- Parameters:
file- TheFilewith the data to load into the Trie.compression- If true, the file is compressed with the GZIP algorithm, and if false, the file is a normal text document.- Throws:
java.io.IOException- A problem occured while reading the data.
-
-
Method Detail
-
get
public java.lang.Object get(java.lang.CharSequence key)
Retrieve the object indexed by a key.- Parameters:
key- AStringindex.- Returns:
- The object retrieved from the Ternary Search Trie.
-
getAndIncrement
public java.lang.Float getAndIncrement(java.lang.String key)
Retrieve theFloatindexed by key, increment it by one unit and store the newFloat.- Parameters:
key- AStringindex.- Returns:
- The
Floatretrieved from the Ternary Search Trie.
-
getNode
public org.apache.lucene.search.suggest.jaspell.JaspellTernarySearchTrie.TSTNode getNode(java.lang.CharSequence key)
Returns the node indexed by key, ornullif that node doesn't exist. Search begins at root node.- Parameters:
key- AStringthat indexes the node that is returned.- Returns:
- The node object indexed by key. This object is an instance of an
inner class named
TernarySearchTrie.TSTNode.
-
matchAlmost
public java.util.List<java.lang.String> matchAlmost(java.lang.String key)
Returns aListof keys that almost match the argument key. Keys returned will have exactly diff characters that do not match the target key, where diff is equal to the last value passed in as an argument to thesetMatchAlmostDiffmethod.If the
matchAlmostmethod is called before thesetMatchAlmostDiffmethod has been called for the first time, then diff = 0.- Parameters:
key- The target key.- Returns:
- A
Listwith the results.
-
matchAlmost
public java.util.List<java.lang.String> matchAlmost(java.lang.CharSequence key, int numReturnValues)Returns aListof keys that almost match the argument key. Keys returned will have exactly diff characters that do not match the target key, where diff is equal to the last value passed in as an argument to thesetMatchAlmostDiffmethod.If the
matchAlmostmethod is called before thesetMatchAlmostDiffmethod has been called for the first time, then diff = 0.- Parameters:
key- The target key.numReturnValues- The maximum number of values returned by this method.- Returns:
- A
Listwith the results
-
matchPrefix
public java.util.List<java.lang.String> matchPrefix(java.lang.String prefix)
Returns an alphabeticalListof all keys in the trie that begin with a given prefix. Only keys for nodes having non-null data are included in theList.- Parameters:
prefix- Each key returned from this method will begin with the characters in prefix.- Returns:
- A
Listwith the results.
-
matchPrefix
public java.util.List<java.lang.String> matchPrefix(java.lang.CharSequence prefix, int numReturnValues)Returns an alphabeticalListof all keys in the trie that begin with a given prefix. Only keys for nodes having non-null data are included in theList.- Parameters:
prefix- Each key returned from this method will begin with the characters in prefix.numReturnValues- The maximum number of values returned from this method.- Returns:
- A
Listwith the results
-
numDataNodes
public int numDataNodes()
Returns the number of nodes in the trie that have non-null data.- Returns:
- The number of nodes in the trie that have non-null data.
-
numNodes
public int numNodes()
Returns the total number of nodes in the trie. The method counts nodes whether or not they have data.- Returns:
- The total number of nodes in the trie.
-
put
public void put(java.lang.CharSequence key, java.lang.Object value)Stores a value in the trie. The value may be retrieved using the key.- Parameters:
key- AStringthat indexes the object to be stored.value- The object to be stored in the Trie.
-
remove
public void remove(java.lang.String key)
Removes the value indexed by key. Also removes all nodes that are rendered unnecessary by the removal of this data.- Parameters:
key- Astringthat indexes the object to be removed from the Trie.
-
setMatchAlmostDiff
public void setMatchAlmostDiff(int diff)
Sets the number of characters by which words can differ from target word when calling thematchAlmostmethod.Arguments less than 0 will set the char difference to 0, and arguments greater than 3 will set the char difference to 3.
- Parameters:
diff- The number of characters by which words can differ from target word.
-
setNumReturnValues
public void setNumReturnValues(int num)
Sets the default maximum number of values returned from thematchPrefixandmatchAlmostmethods.The value should be set this to -1 to get an unlimited number of return values. note that the methods mentioned above provide overloaded versions that allow you to specify the maximum number of return values, in which case this value is temporarily overridden.
- Parameters:
num- The number of values that will be returned when calling the methods above.
-
-