Class SimpleTokenizer

  • All Implemented Interfaces:
    Tokenizer

    public class SimpleTokenizer
    extends java.lang.Object
    Performs tokenization using character classes.
    • Constructor Summary

      Constructors 
      Constructor Description
      SimpleTokenizer()
      Deprecated.
      Use the INSTANCE field to obtain an instance instead; the constructor will be made private in a future release.
    • Method Summary

      Modifier and Type Method Description
      java.lang.String[] tokenize(java.lang.String s)
      Splits a string into its atomic parts.
      Span[] tokenizePos(java.lang.String s)
      Finds the boundaries of atomic parts in a string.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • SimpleTokenizer

        @Deprecated
        public SimpleTokenizer()
        Deprecated.
        Use the INSTANCE field to obtain an instance instead; the constructor will be made private in a future release.
    • Method Detail

      • tokenizePos

        public Span[] tokenizePos(java.lang.String s)
        Description copied from interface: Tokenizer
        Finds the boundaries of atomic parts in a string.
        Parameters:
        s - The string to be tokenized.
        Returns:
        The Span[] with the spans (offsets into s) for each token as the individual array elements.
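        Example (illustrative sketch, not the library's source): the class description says tokenization is driven by character classes, so one plausible reading is that a token boundary falls wherever the class of adjacent characters changes, with whitespace separating tokens. The class name CharClassTokenizerSketch, the four classes, and the use of {start, end} int pairs with an exclusive end in place of Span are all assumptions made for this sketch; the actual rules in SimpleTokenizer may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not use or reproduce SimpleTokenizer.
class CharClassTokenizerSketch {

    // Assumed character classes for this sketch.
    enum CharClass { WHITESPACE, ALPHABETIC, NUMERIC, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.ALPHABETIC;
        if (Character.isDigit(c)) return CharClass.NUMERIC;
        return CharClass.OTHER;
    }

    // Returns {start, end} offset pairs into s; end is exclusive (assumption).
    static List<int[]> tokenizePos(String s) {
        List<int[]> spans = new ArrayList<>();
        int start = -1;
        CharClass current = CharClass.WHITESPACE;
        for (int i = 0; i < s.length(); i++) {
            CharClass cls = classify(s.charAt(i));
            if (cls != current) {
                // A class change closes the running token, unless we were in whitespace.
                if (current != CharClass.WHITESPACE) {
                    spans.add(new int[] {start, i});
                }
                start = i;
                current = cls;
            }
        }
        // Close the final token if the string did not end in whitespace.
        if (current != CharClass.WHITESPACE) {
            spans.add(new int[] {start, s.length()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String s = "Call 911 now!";
        for (int[] span : tokenizePos(s)) {
            System.out.println(span[0] + ".." + span[1] + " " + s.substring(span[0], span[1]));
        }
    }
}
```

        For the input "Call 911 now!" this sketch yields the offset pairs 0..4, 5..8, 9..12 and 12..13, so the trailing "!" becomes its own token because it belongs to a different class than the letters before it.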
      • tokenize

        public java.lang.String[] tokenize(java.lang.String s)
        Description copied from interface: Tokenizer
        Splits a string into its atomic parts.
        Specified by:
        tokenize in interface Tokenizer
        Parameters:
        s - The string to be tokenized.
        Returns:
        The String[] with the individual tokens as the array elements.
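        Example (illustrative sketch): since tokenizePos reports each token as offsets into s, the strings returned by tokenize can be understood as the substrings of s at those offsets. The helper below is hypothetical and uses {start, end} int pairs with an exclusive end as a stand-in for Span; both the name spansToStrings and the exclusive-end convention are assumptions of this sketch.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; it does not call SimpleTokenizer.
class SpansToTokensSketch {

    // Converts {start, end} offset pairs (end exclusive, assumed) into token strings.
    static String[] spansToStrings(int[][] spans, String s) {
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = s.substring(spans[i][0], spans[i][1]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "Call 911 now!";
        int[][] spans = { {0, 4}, {5, 8}, {9, 12}, {12, 13} };
        System.out.println(Arrays.toString(spansToStrings(spans, s)));
    }
}
```

        Keeping the offsets is useful when the original text must be annotated or highlighted in place; the plain token strings are enough when only the token values matter.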