Package opennlp.tools.tokenize
Class WhitespaceTokenizer
- java.lang.Object
  - opennlp.tools.tokenize.WhitespaceTokenizer
- All Implemented Interfaces:
Tokenizer
public class WhitespaceTokenizer extends java.lang.Object
This tokenizer uses white space to tokenize the input text. To obtain an instance of this tokenizer, use the static final INSTANCE field.
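The following is a minimal sketch of the pattern the description refers to: a stateless tokenizer exposed through a single static final field. It is not the OpenNLP source. The class name WhitespaceSplitter is hypothetical, and treating "white space" as whatever Character.isWhitespace accepts is an assumption; the library's own definition may differ. With the real class, the equivalent call is WhitespaceTokenizer.INSTANCE.tokenize(text).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for WhitespaceTokenizer, illustrating the
 * "static final INSTANCE" pattern. Not the OpenNLP implementation.
 */
public final class WhitespaceSplitter {

    // One shared instance; safe to share because the class holds no mutable state.
    public static final WhitespaceSplitter INSTANCE = new WhitespaceSplitter();

    // Private constructor so callers go through INSTANCE (an assumption of this sketch).
    private WhitespaceSplitter() {}

    /** Splits s on runs of white space, as defined by Character.isWhitespace (assumed). */
    public String[] tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, or -1 when between tokens
        for (int i = 0; i < s.length(); i++) {
            boolean ws = Character.isWhitespace(s.charAt(i));
            if (!ws && start < 0) {
                start = i; // entering a token
            } else if (ws && start >= 0) {
                tokens.add(s.substring(start, i)); // leaving a token
                start = -1;
            }
        }
        if (start >= 0) {
            tokens.add(s.substring(start)); // token that runs to the end of s
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = WhitespaceSplitter.INSTANCE.tokenize("  Hello,  world!\tBye ");
        System.out.println(String.join("|", tokens)); // prints Hello,|world!|Bye
    }
}
```

Because the tokenizer keeps no per-call state, one instance can be reused everywhere, which is why a shared static field is sufficient.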
Field Summary
Modifier and Type           Field      Description
static WhitespaceTokenizer  INSTANCE   Use this static reference to retrieve an instance of the WhitespaceTokenizer.
Method Summary
Modifier and Type   Method                            Description
java.lang.String[]  tokenize(java.lang.String s)      Splits a string into its atomic parts.
Span[]              tokenizePos(java.lang.String d)   Finds the boundaries of atomic parts in a string.
Field Detail
INSTANCE
public static final WhitespaceTokenizer INSTANCE
Use this static reference to retrieve an instance of the WhitespaceTokenizer.
Method Detail
tokenizePos
public Span[] tokenizePos(java.lang.String d)
Description copied from interface: Tokenizer
Finds the boundaries of atomic parts in a string.
Parameters:
d - The string to be tokenized.
Returns:
The Span[] with the spans (offsets into d) for each token as the individual array elements.
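To make the return value concrete, here is a minimal, self-contained sketch of what white-space-based span detection computes. It does not use OpenNLP: TokenSpan is a hypothetical stand-in for Span, and both the end-exclusive convention (so that d.substring(start, end) yields the token) and the use of Character.isWhitespace as the definition of white space are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public final class WhitespaceSpans {

    /** Hypothetical stand-in for Span: start inclusive, end exclusive (assumed). */
    public static final class TokenSpan {
        public final int start;
        public final int end;

        TokenSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ")";
        }
    }

    /** Returns one span per white-space-delimited token in d, left to right. */
    public static TokenSpan[] tokenizePos(String d) {
        List<TokenSpan> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < d.length(); i++) {
            boolean ws = Character.isWhitespace(d.charAt(i));
            if (!ws && start < 0) {
                start = i; // first character of a new token
            } else if (ws && start >= 0) {
                spans.add(new TokenSpan(start, i)); // token ends just before i
                start = -1;
            }
        }
        if (start >= 0) {
            spans.add(new TokenSpan(start, d.length())); // token runs to the end of d
        }
        return spans.toArray(new TokenSpan[0]);
    }

    public static void main(String[] args) {
        String d = " The cat  sat.";
        for (TokenSpan span : tokenizePos(d)) {
            // The offsets point back into d, so substring recovers each token.
            System.out.println(span + " " + d.substring(span.start, span.end));
        }
    }
}
```

For the input " The cat  sat." the sketch reports the spans [1..4), [5..8) and [10..14), so a run of several spaces produces no empty tokens.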
tokenize
public java.lang.String[] tokenize(java.lang.String s)
Description copied from interface: Tokenizer
Splits a string into its atomic parts.
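Unlike tokenizePos, tokenize returns the token strings themselves rather than their offsets. As a hedged illustration of that contract, the sketch below swaps in a regular-expression split instead of a character scan. RegexWhitespaceTokenizer is a hypothetical name, not part of OpenNLP, and using the regex class \s as the definition of white space is an assumption.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public final class RegexWhitespaceTokenizer {

    // One or more white-space characters; by default Java's \s means [ \t\n\x0B\f\r].
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Splits s into its white-space-delimited parts (hypothetical helper). */
    public static String[] tokenize(String s) {
        // split() yields a leading "" when s starts with white space, and [""] for "",
        // so drop empty strings to keep only real tokens.
        return Arrays.stream(WHITESPACE.split(s))
                .filter(part -> !part.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("\tTo be  or not\nto be "))); // prints [To, be, or, not, to, be]
    }
}
```

The filtering step matters: Pattern.split keeps a leading empty string when the input starts with a delimiter. Also note that \s matches only ASCII white space by default, a narrower set than Character.isWhitespace, so a regex-based and a character-scan tokenizer can disagree on inputs containing other separator characters.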