Class TokenStreamToAutomaton


  • public class TokenStreamToAutomaton
    extends java.lang.Object
    Consumes a TokenStream and creates an Automaton whose transition labels are the UTF-8 bytes (or Unicode code points, if unicodeArcs is true) of each term, taken from the TermToBytesRefAttribute. Between adjacent tokens we insert POS_SEP, and for holes (positions with no token) we insert HOLE.
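To make the labeling concrete, here is a minimal sketch that assumes a linear token stream (every position increment at least 1, every position length 1). The class LinearLabels, its Token record, and the numeric values used for POS_SEP and HOLE are hypothetical stand-ins, not Lucene API. The exact placement of POS_SEP around each HOLE is also an assumption about the layout. Trailing holes after the last token are not modeled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the labels on the single path built for a linear token
// stream. It is NOT the Lucene implementation; it only illustrates the scheme.
final class LinearLabels {
  // Placeholder values for this sketch; the real ones are
  // TokenStreamToAutomaton.POS_SEP and TokenStreamToAutomaton.HOLE.
  static final int POS_SEP = 0x1F;
  static final int HOLE = 0x1E;

  // A token reduced to its term text and position increment.
  record Token(String term, int posInc) {}

  static List<Integer> labels(List<Token> tokens) {
    List<Integer> out = new ArrayList<>();
    for (int i = 0; i < tokens.size(); i++) {
      Token t = tokens.get(i);
      if (t.posInc() < 1) {
        throw new IllegalArgumentException("linear model needs posInc >= 1");
      }
      if (i > 0) {
        out.add(POS_SEP); // separator between the previous position and this one
      }
      // Assumed layout: each skipped position contributes HOLE followed by POS_SEP.
      for (int h = 1; h < t.posInc(); h++) {
        out.add(HOLE);
        out.add(POS_SEP);
      }
      // Default mode: one label per UTF-8 byte, read as an unsigned value.
      for (byte b : t.term().getBytes(StandardCharsets.UTF_8)) {
        out.add(b & 0xFF);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // "a" at position 0, "c" at position 2: position 1 is a hole.
    List<Token> tokens = List.of(new Token("a", 1), new Token("c", 2));
    System.out.println(labels(tokens)); // [97, 31, 30, 31, 99]
  }
}
```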
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int HOLE
      We add this arc to represent a hole.
      static int POS_SEP
      We create a transition with this label between two adjacent tokens.
    • Method Summary

      Modifier and Type Method Description
      void setPreservePositionIncrements(boolean enablePositionIncrements)
      Whether to generate holes in the automaton for missing positions, true by default.
      void setUnicodeArcs(boolean unicodeArcs)
      Whether to make transition labels Unicode code points instead of UTF-8 bytes, false by default.
      Automaton toAutomaton(TokenStream in)
      Pulls the graph (including PositionLengthAttribute) from the provided TokenStream and creates the corresponding automaton, whose arcs are the bytes (or Unicode code points, if unicodeArcs is true) of each term.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • POS_SEP

        public static final int POS_SEP
        We create a transition with this label between two adjacent tokens.
        See Also:
        Constant Field Values
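The separator matters because term labels are concatenated along a path. The small sketch below uses a hypothetical SeparatorDemo class and a placeholder value for POS_SEP. It shows that, without a separator, the one-token stream "ab" and the two-token stream "a", "b" spell the same labels. With a separator, their labels differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of why a separator label is needed between tokens.
final class SeparatorDemo {
  static final int POS_SEP = 0x1F; // placeholder value, not read from Lucene

  // Concatenates the UTF-8 bytes of each term, optionally inserting POS_SEP between terms.
  static List<Integer> path(List<String> terms, boolean withSeparator) {
    List<Integer> out = new ArrayList<>();
    for (int i = 0; i < terms.size(); i++) {
      if (withSeparator && i > 0) {
        out.add(POS_SEP);
      }
      for (byte b : terms.get(i).getBytes(StandardCharsets.UTF_8)) {
        out.add(b & 0xFF);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> oneToken = List.of("ab");
    List<String> twoTokens = List.of("a", "b");
    // Without a separator the two streams yield the same label sequence.
    System.out.println(path(oneToken, false).equals(path(twoTokens, false))); // true
    // With POS_SEP between tokens they are distinguishable.
    System.out.println(path(oneToken, true).equals(path(twoTokens, true))); // false
  }
}
```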
      • HOLE

        public static final int HOLE
        We add this arc to represent a hole.
        See Also:
        Constant Field Values
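A hole is a position that no token occupies. One common source is a filter that removes tokens but keeps their positions, such as stop-word removal. The sketch below is a simplified, hypothetical model of that bookkeeping. The HoleSource class and its Token record are illustrative and are not Lucene's StopFilter. The increments of removed tokens are added to the next kept token, so "fast the car" with "the" removed yields "car" with a position increment of 2. When position increments are preserved, toAutomaton would represent the missing position with a HOLE arc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of how removing tokens can leave a hole: removed tokens are
// dropped, but their position increments are carried over to the next kept token.
final class HoleSource {
  record Token(String term, int posInc) {}

  static List<Token> removeStopWords(List<String> words, Set<String> stopWords) {
    List<Token> kept = new ArrayList<>();
    int pending = 0; // positions consumed since the last kept token
    for (String w : words) {
      pending++; // each input word occupies one position
      if (stopWords.contains(w)) {
        continue; // drop the word, keep its position in "pending"
      }
      kept.add(new Token(w, pending));
      pending = 0;
    }
    // Increments left in "pending" here belong to trailing removed words;
    // this sketch drops them.
    return kept;
  }

  public static void main(String[] args) {
    List<Token> out = removeStopWords(List.of("fast", "the", "car"), Set.of("the"));
    System.out.println(out); // [Token[term=fast, posInc=1], Token[term=car, posInc=2]]
  }
}
```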
    • Constructor Detail

      • TokenStreamToAutomaton

        public TokenStreamToAutomaton()
        Sole constructor.
    • Method Detail

      • setPreservePositionIncrements

        public void setPreservePositionIncrements(boolean enablePositionIncrements)
        Whether to generate holes in the automaton for missing positions, true by default.
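The following minimal sketch shows what this switch is assumed to do for a single token. With preservation on, an increment of n > 1 leaves n - 1 missing positions (holes). With it off, the increment is assumed to be clamped to 1, so no holes appear. PreserveIncrements and holesBefore are hypothetical names used only for illustration.

```java
// Hypothetical sketch of the setPreservePositionIncrements switch for one token
// in a linear stream. Not Lucene code.
final class PreserveIncrements {
  // Number of HOLE arcs assumed to precede a token with the given increment.
  static int holesBefore(int posInc, boolean preservePositionIncrements) {
    // When not preserving, an increment greater than 1 is treated as 1.
    int effective = preservePositionIncrements ? posInc : Math.min(posInc, 1);
    return Math.max(effective - 1, 0);
  }

  public static void main(String[] args) {
    System.out.println(holesBefore(3, true));  // 2
    System.out.println(holesBefore(3, false)); // 0
    System.out.println(holesBefore(0, false)); // 0 (stacked token, no hole either way)
  }
}
```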
      • setUnicodeArcs

        public void setUnicodeArcs(boolean unicodeArcs)
        Whether to make transition labels Unicode code points instead of UTF-8 bytes, false by default.
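The two label alphabets differ as soon as a term contains a non-ASCII character. This sketch uses only JDK calls (String.getBytes with StandardCharsets.UTF_8, and String.codePoints) to show the labels a single term would be spelled with in each mode. ArcLabels is a hypothetical helper, not part of Lucene. Code-point labels are not UTF-16 chars: a supplementary character such as U+1D11E is one code point but four UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the two label alphabets a term can be spelled in. JDK only; not Lucene code.
final class ArcLabels {
  // Default mode: one label per UTF-8 byte, as an unsigned value 0..255.
  static int[] utf8Labels(String term) {
    byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
    int[] labels = new int[bytes.length];
    for (int i = 0; i < bytes.length; i++) {
      labels[i] = bytes[i] & 0xFF;
    }
    return labels;
  }

  // unicodeArcs == true: one label per Unicode code point (not per UTF-16 char).
  static int[] codePointLabels(String term) {
    return term.codePoints().toArray();
  }

  public static void main(String[] args) {
    String term = "caf\u00e9"; // "cafe" ending in U+00E9 (e with acute accent)
    System.out.println(Arrays.toString(utf8Labels(term)));      // [99, 97, 102, 195, 169]
    System.out.println(Arrays.toString(codePointLabels(term))); // [99, 97, 102, 233]
  }
}
```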
      • toAutomaton

        public Automaton toAutomaton(TokenStream in)
                              throws java.io.IOException
        Pulls the graph (including PositionLengthAttribute) from the provided TokenStream and creates the corresponding automaton, whose arcs are the bytes (or Unicode code points, if unicodeArcs is true) of each term.
        Throws:
        java.io.IOException
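Because toAutomaton reads PositionLengthAttribute, a token can span several positions. The automaton then accepts every path through the resulting token graph. The sketch below enumerates those paths for a small, hypothetical multi-word synonym graph ("wi fi" alongside "wifi"). TokenGraphPaths is an illustrative class. Terms are shown as text rather than byte labels, POS_SEP is rendered as "|", and holes are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a token graph: each token starts at a position and spans
// posLength positions. Lists the paths an automaton built from the graph would
// accept, with POS_SEP rendered as '|'. Holes are not modeled. Not Lucene code.
final class TokenGraphPaths {
  record Token(String term, int startPos, int posLength) {}

  static List<String> paths(List<Token> tokens) {
    int end = 0;
    for (Token t : tokens) {
      if (t.posLength() < 1) {
        throw new IllegalArgumentException("posLength must be >= 1");
      }
      end = Math.max(end, t.startPos() + t.posLength());
    }
    List<String> out = new ArrayList<>();
    walk(tokens, 0, end, "", out);
    return out;
  }

  // Depth-first walk; a position with no outgoing token (a hole) yields no path.
  private static void walk(List<Token> tokens, int pos, int end, String prefix, List<String> out) {
    if (pos == end) {
      out.add(prefix);
      return;
    }
    for (Token t : tokens) {
      if (t.startPos() == pos) {
        String next = pos == 0 ? t.term() : prefix + "|" + t.term();
        walk(tokens, pos + t.posLength(), end, next, out);
      }
    }
  }

  public static void main(String[] args) {
    List<Token> graph = List.of(
        new Token("wi", 0, 1), new Token("wifi", 0, 2),
        new Token("fi", 1, 1), new Token("network", 2, 1));
    System.out.println(paths(graph)); // [wi|fi|network, wifi|network]
  }
}
```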