Class RegExp
- java.lang.Object
  - org.apache.lucene.util.automaton.RegExp
-
public class RegExp extends java.lang.Object

Regular Expression extension to Automaton.

Regular expressions are built from the following abstract syntax:

regexp        ::=  unionexp
                |
unionexp      ::=  interexp | unionexp                        (union)
                |  interexp
interexp      ::=  concatexp & interexp                       (intersection) [OPTIONAL]
                |  concatexp
concatexp     ::=  repeatexp concatexp                        (concatenation)
                |  repeatexp
repeatexp     ::=  repeatexp ?                                (zero or one occurrence)
                |  repeatexp *                                (zero or more occurrences)
                |  repeatexp +                                (one or more occurrences)
                |  repeatexp {n}                              (n occurrences)
                |  repeatexp {n,}                             (n or more occurrences)
                |  repeatexp {n,m}                            (n to m occurrences, including both)
                |  complexp
complexp      ::=  ~ complexp                                 (complement) [OPTIONAL]
                |  charclassexp
charclassexp  ::=  [ charclasses ]                            (character class)
                |  [^ charclasses ]                           (negated character class)
                |  simpleexp
charclasses   ::=  charclass charclasses
                |  charclass
charclass     ::=  charexp - charexp                          (character range, including end-points)
                |  charexp
simpleexp     ::=  charexp
                |  .                                          (any single character)
                |  #                                          (the empty language) [OPTIONAL]
                |  @                                          (any string) [OPTIONAL]
                |  " <Unicode string without double-quotes> " (a string)
                |  ( )                                        (the empty string)
                |  ( unionexp )                               (precedence override)
                |  < <identifier> >                           (named automaton) [OPTIONAL]
                |  <n-m>                                      (numerical interval) [OPTIONAL]
charexp       ::=  <Unicode character>                        (a single non-reserved character)
                |  \ <Unicode character>                      (a single character)

The productions marked [OPTIONAL] are only allowed if specified by the syntax flags passed to the RegExp constructor. The reserved characters used in the (enabled) syntax must be escaped with backslash (\) or double-quotes ("..."). (In contrast to other regexp syntaxes, this is required also in character classes.) Be aware that dash (-) has a special meaning in charclass expressions. An identifier is a string not containing right angle bracket (>) or dash (-). Numerical intervals are specified by non-negative decimal integers and include both end points, and if n and m have the same number of digits, then the conforming strings must have that length (i.e. prefixed by 0's).
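The numerical interval rule can be read as a small membership check. The following is a minimal sketch in plain Java, not Lucene code: the class name IntervalSketch and the method matches are hypothetical, and the behavior when n and m have different digit counts (any digit string whose value lies in range is accepted) is an assumption, since the text only fixes the length in the equal-width case.

```java
import java.math.BigInteger;

/**
 * Illustrative check for the <n-m> interval rule described in the text.
 * Hypothetical helper; it does not use or reproduce the Lucene implementation.
 */
final class IntervalSketch {

    /** Returns true if s conforms to the interval <n-m> under the stated assumptions. */
    static boolean matches(String s, String n, String m) {
        // Only non-negative decimal integers are considered.
        if (s.isEmpty() || !s.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return false;
        }
        // If n and m have the same number of digits, conforming strings
        // must have exactly that length (shorter values are zero-prefixed).
        if (n.length() == m.length() && s.length() != n.length()) {
            return false;
        }
        BigInteger value = new BigInteger(s);
        BigInteger lo = new BigInteger(n);
        BigInteger hi = new BigInteger(m);
        // Both end points are included.
        return value.compareTo(lo) >= 0 && value.compareTo(hi) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(matches("007", "001", "100")); // true: width 3, value 7 in range
        System.out.println(matches("7", "001", "100"));   // false: width must be 3
        System.out.println(matches("42", "5", "100"));    // true: widths differ, value in range
    }
}
```

Under this reading, <001-100> accepts 007 but not 7, while <5-100> places no length requirement on the matched digits.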
-
Field Summary
Modifier and Type  Field         Description
static int         ALL           Syntax flag, enables all optional regexp syntax.
static int         ANYSTRING     Syntax flag, enables anystring (@).
static int         AUTOMATON     Syntax flag, enables named automata (<identifier>).
static int         COMPLEMENT    Syntax flag, enables complement (~).
static int         EMPTY         Syntax flag, enables empty language (#).
static int         INTERSECTION  Syntax flag, enables intersection (&).
static int         INTERVAL      Syntax flag, enables numerical intervals (<n-m>).
static int         NONE          Syntax flag, enables no optional regexp syntax.
-
Method Summary
Modifier and Type                Method and Description
java.util.Set<java.lang.String>  getIdentifiers()
                                 Returns set of automaton identifiers that occur in this regular expression.
boolean                          setAllowMutate(boolean flag)
                                 Sets or resets allow mutate flag.
Automaton                        toAutomaton()
                                 Constructs new Automaton from this RegExp.
Automaton                        toAutomaton(java.util.Map<java.lang.String,Automaton> automata)
                                 Constructs new Automaton from this RegExp.
Automaton                        toAutomaton(AutomatonProvider automaton_provider)
                                 Constructs new Automaton from this RegExp.
java.lang.String                 toString()
                                 Constructs string from parsed regular expression.
-
Field Detail
-
INTERSECTION
public static final int INTERSECTION
Syntax flag, enables intersection (&).
See Also:
- Constant Field Values
-
COMPLEMENT
public static final int COMPLEMENT
Syntax flag, enables complement (~).
See Also:
- Constant Field Values
-
EMPTY
public static final int EMPTY
Syntax flag, enables empty language (#).
See Also:
- Constant Field Values
-
ANYSTRING
public static final int ANYSTRING
Syntax flag, enables anystring (@).
See Also:
- Constant Field Values
-
AUTOMATON
public static final int AUTOMATON
Syntax flag, enables named automata (<identifier>).
See Also:
- Constant Field Values
-
INTERVAL
public static final int INTERVAL
Syntax flag, enables numerical intervals (<n-m>).
See Also:
- Constant Field Values
-
ALL
public static final int ALL
Syntax flag, enables all optional regexp syntax.
See Also:
- Constant Field Values
-
NONE
public static final int NONE
Syntax flag, enables no optional regexp syntax.
See Also:
- Constant Field Values
-
Constructor Detail
-
RegExp
public RegExp(java.lang.String s) throws java.lang.IllegalArgumentException
Constructs new RegExp from a string. Same as RegExp(s, ALL).
Parameters:
- s - regexp string
Throws:
- java.lang.IllegalArgumentException - if an error occurred while parsing the regular expression
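Because every reserved character in the enabled syntax must be escaped, literal text usually needs preprocessing before it is embedded in a regexp string. The sketch below is one conservative way to do that in plain Java. It relies on the grammar rule that a backslash followed by any character denotes that single character, and it escapes everything that is not an ASCII letter or digit. The class RegExpEscapeSketch and the method escapeLiteral are hypothetical names, not part of Lucene.

```java
/**
 * Hypothetical helper (not a Lucene API): backslash-escapes every character
 * that is not an ASCII letter or digit, so that no reserved character is
 * left unescaped in the result.
 */
final class RegExpEscapeSketch {

    static String escapeLiteral(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 2);
        text.codePoints().forEach(cp -> {
            boolean plain = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (!plain) {
                sb.append('\\'); // escape anything that could be reserved
            }
            sb.appendCodePoint(cp);
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeLiteral("a.b@c")); // prints a\.b\@c
        System.out.println(escapeLiteral("<1-5>")); // prints \<1\-5\>
    }
}
```

The escaped text can then be concatenated into the string passed as s. Escaping more characters than strictly necessary is a deliberate simplification here: it avoids having to track which optional syntax is enabled.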
-
RegExp
public RegExp(java.lang.String s, int syntax_flags) throws java.lang.IllegalArgumentException
Constructs new RegExp from a string.
Parameters:
- s - regexp string
- syntax_flags - bitwise OR of the syntax flags for the optional constructs to be enabled
Throws:
- java.lang.IllegalArgumentException - if an error occurred while parsing the regular expression
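The syntax_flags argument is a bit mask, so individual flags are combined with | and tested with &. The sketch below shows the idea with stand-in constants in plain Java. The constant names mirror the fields documented on this page, but the numeric values are assumptions for illustration only (the actual values are on the Constant Field Values page), and SyntaxFlagsSketch and isEnabled are hypothetical names.

```java
/**
 * Stand-in constants mirroring the RegExp syntax flag names.
 * The numeric values are assumed for illustration; consult the
 * Constant Field Values page for the real ones.
 */
final class SyntaxFlagsSketch {
    static final int NONE = 0x0000;
    static final int INTERSECTION = 0x0001;
    static final int COMPLEMENT = 0x0002;
    static final int EMPTY = 0x0004;
    static final int ANYSTRING = 0x0008;
    static final int AUTOMATON = 0x0010;
    static final int INTERVAL = 0x0020;
    static final int ALL = 0xffff;

    /** True if the given flag bit is present in the combined mask. */
    static boolean isEnabled(int syntaxFlags, int flag) {
        return (syntaxFlags & flag) != 0;
    }

    public static void main(String[] args) {
        // Enable only intersection (&) and numerical intervals (<n-m>).
        int some = INTERSECTION | INTERVAL;
        System.out.println(isEnabled(some, INTERVAL));   // true
        System.out.println(isEnabled(some, COMPLEMENT)); // false

        // Enable everything except named automata.
        int allButNamed = ALL & ~AUTOMATON;
        System.out.println(isEnabled(allButNamed, AUTOMATON)); // false
    }
}
```

With Lucene on the classpath, the corresponding call would pass the real constants, for example new RegExp(s, RegExp.INTERSECTION | RegExp.INTERVAL).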
-
Method Detail
-
toAutomaton
public Automaton toAutomaton()
Constructs new Automaton from this RegExp. Same as toAutomaton(null) (empty automaton map).
-
toAutomaton
public Automaton toAutomaton(AutomatonProvider automaton_provider) throws java.lang.IllegalArgumentException
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
Parameters:
- automaton_provider - provider of automata for named identifiers
Throws:
- java.lang.IllegalArgumentException - if this regular expression uses a named identifier that is not available from the automaton provider
-
toAutomaton
public Automaton toAutomaton(java.util.Map<java.lang.String,Automaton> automata) throws java.lang.IllegalArgumentException
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
Parameters:
- automata - a map from automaton identifiers to automata (of type Automaton).
Throws:
- java.lang.IllegalArgumentException - if this regular expression uses a named identifier that does not occur in the automaton map
-
setAllowMutate
public boolean setAllowMutate(boolean flag)
Sets or resets allow mutate flag. If this flag is set, then automata construction uses mutable automata, which is slightly faster but not thread safe. By default, the flag is not set.
Parameters:
- flag - if true, the flag is set
Returns:
- previous value of the flag
-
toString
public java.lang.String toString()
Constructs string from parsed regular expression.
Overrides:
- toString in class java.lang.Object
-
getIdentifiers
public java.util.Set<java.lang.String> getIdentifiers()
Returns set of automaton identifiers that occur in this regular expression.
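To make concrete what counts as an automaton identifier, the following plain-Java sketch scans a regexp string for <identifier> occurrences. It is a simplified illustration, not the Lucene parser: it assumes the AUTOMATON and INTERVAL syntax are enabled, that reserved characters are escaped everywhere (including character classes) as the class description requires, and it treats any angle-bracket body containing a dash as a numerical interval. IdentifierScanSketch and findIdentifiers are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Simplified scanner for <identifier> occurrences; hypothetical, not Lucene code. */
final class IdentifierScanSketch {

    static Set<String> findIdentifiers(String regexp) {
        Set<String> names = new LinkedHashSet<>();
        int i = 0;
        while (i < regexp.length()) {
            char c = regexp.charAt(i);
            if (c == '\\') {
                i += 2; // backslash escape: skip the escaped character
            } else if (c == '"') {
                // Quoted string: skip to the closing double quote.
                int close = regexp.indexOf('"', i + 1);
                i = (close < 0) ? regexp.length() : close + 1;
            } else if (c == '<') {
                int close = regexp.indexOf('>', i + 1);
                if (close < 0) {
                    break; // unterminated; a real parser would reject this input
                }
                String body = regexp.substring(i + 1, close);
                if (!body.contains("-")) {
                    names.add(body); // no dash, so this is a named automaton
                }
                i = close + 1;
            } else {
                i++;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Regexp text: <word>+(\<|"<x>")<1-31><digit>
        System.out.println(findIdentifiers("<word>+(\\<|\"<x>\")<1-31><digit>"));
        // prints [word, digit]
    }
}
```

Each name found this way would need an entry in the map, or be resolvable by the provider, passed to toAutomaton, since an unavailable identifier results in an IllegalArgumentException.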
-