Class UAX29URLEmailTokenizerImpl34
- java.lang.Object
-
- org.apache.lucene.analysis.standard.std34.UAX29URLEmailTokenizerImpl34
-
- All Implemented Interfaces:
StandardTokenizerInterface
@Deprecated
public final class UAX29URLEmailTokenizerImpl34 extends java.lang.Object implements StandardTokenizerInterface
Deprecated. This class is only for exact backwards compatibility.
This class implements UAX29URLEmailTokenizer, except with a bug (https://issues.apache.org/jira/browse/LUCENE-3880) where a "mailto:" URI scheme prepended to an email address will disrupt recognition of the email address.
-
-
Field Summary
Modifier and Type   Field                   Description
static int          EMAIL_TYPE              Deprecated.
static int          HANGUL_TYPE             Deprecated.
static int          HIRAGANA_TYPE           Deprecated.
static int          IDEOGRAPHIC_TYPE        Deprecated.
static int          KATAKANA_TYPE           Deprecated.
static int          NUMERIC_TYPE            Deprecated. Numbers
static int          SOUTH_EAST_ASIAN_TYPE   Deprecated. Chars in class \p{Line_Break = Complex_Context} are from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.).
static int          URL_TYPE                Deprecated.
static int          WORD_TYPE               Deprecated. Alphanumeric sequences
static int          YYEOF                   Deprecated. This character denotes the end of file
static int          YYINITIAL               Deprecated. lexical states
-
Constructor Summary
Constructor                                       Description
UAX29URLEmailTokenizerImpl34(java.io.Reader in)   Deprecated. Creates a new scanner
-
Method Summary
Modifier and Type   Method                          Description
int                 getNextToken()                  Deprecated. Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
void                getText(CharTermAttribute t)    Deprecated. Fills CharTermAttribute with the current token text.
void                yybegin(int newState)           Deprecated. Enters a new lexical state
int                 yychar()                        Deprecated. Returns the current position.
char                yycharat(int pos)               Deprecated. Returns the character at position pos from the matched text.
void                yyclose()                       Deprecated. Closes the input stream.
int                 yylength()                      Deprecated. Returns the length of the matched text region.
void                yypushback(int number)          Deprecated. Pushes the specified amount of characters back into the input stream.
void                yyreset(java.io.Reader reader)  Deprecated. Resets the scanner to read from a new input stream.
int                 yystate()                       Deprecated. Returns the current lexical state.
java.lang.String    yytext()                        Deprecated. Returns the text matched by the current regular expression.
-
-
-
Field Detail
-
YYEOF
public static final int YYEOF
Deprecated. This character denotes the end of file
- See Also:
- Constant Field Values
-
YYINITIAL
public static final int YYINITIAL
Deprecated. lexical states
- See Also:
- Constant Field Values
-
WORD_TYPE
public static final int WORD_TYPE
Deprecated. Alphanumeric sequences
- See Also:
- Constant Field Values
-
NUMERIC_TYPE
public static final int NUMERIC_TYPE
Deprecated. Numbers
- See Also:
- Constant Field Values
-
SOUTH_EAST_ASIAN_TYPE
public static final int SOUTH_EAST_ASIAN_TYPE
Deprecated. Chars in class \p{Line_Break = Complex_Context} are from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.). Sequences of these are kept together as a single token rather than broken up, because the logic required to break them at word boundaries is too complex for UAX#29.
See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA
- See Also:
- Constant Field Values
-
IDEOGRAPHIC_TYPE
public static final int IDEOGRAPHIC_TYPE
Deprecated.
- See Also:
- Constant Field Values
-
HIRAGANA_TYPE
public static final int HIRAGANA_TYPE
Deprecated.
- See Also:
- Constant Field Values
-
KATAKANA_TYPE
public static final int KATAKANA_TYPE
Deprecated.
- See Also:
- Constant Field Values
-
HANGUL_TYPE
public static final int HANGUL_TYPE
Deprecated.
- See Also:
- Constant Field Values
-
EMAIL_TYPE
public static final int EMAIL_TYPE
Deprecated.
- See Also:
- Constant Field Values
-
URL_TYPE
public static final int URL_TYPE
Deprecated.
- See Also:
- Constant Field Values
-
-
Method Detail
-
yychar
public final int yychar()
Deprecated.
Description copied from interface: StandardTokenizerInterface
Returns the current position.
- Specified by:
yychar
in interface StandardTokenizerInterface
-
getText
public final void getText(CharTermAttribute t)
Deprecated. Fills CharTermAttribute with the current token text.
- Specified by:
getText
in interface StandardTokenizerInterface
-
yyclose
public final void yyclose() throws java.io.IOException
Deprecated. Closes the input stream.
- Throws:
java.io.IOException
-
yyreset
public final void yyreset(java.io.Reader reader)
Deprecated. Resets the scanner to read from a new input stream. Does not close the old reader. All internal variables are reset; the old input stream cannot be reused (the internal buffer is discarded and lost). Lexical state is set to ZZ_INITIAL. The internal scan buffer is resized down to its initial length if it has grown.
- Specified by:
yyreset
in interface StandardTokenizerInterface
- Parameters:
reader
- the new input stream
-
yystate
public final int yystate()
Deprecated. Returns the current lexical state.
-
yybegin
public final void yybegin(int newState)
Deprecated. Enters a new lexical state
- Parameters:
newState
- the new lexical state
-
yytext
public final java.lang.String yytext()
Deprecated. Returns the text matched by the current regular expression.
-
yycharat
public final char yycharat(int pos)
Deprecated. Returns the character at position pos from the matched text. It is equivalent to yytext().charAt(pos), but faster.
- Parameters:
pos
- the position of the character to fetch. A value from 0 to yylength()-1.
- Returns:
- the character at position pos
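The equivalence between yycharat(pos) and yytext().charAt(pos) can be illustrated with a minimal sketch. MatchWindowSketch below is a hypothetical stand-in, not the Lucene class: it assumes the scanner keeps a char buffer plus start and end indices for the current match, which is how JFlex-generated scanners are commonly laid out. The field names are illustrative only.

```java
// Hypothetical stand-in for a JFlex-style scanner; this is NOT the Lucene class.
final class MatchWindowSketch {
    private final char[] buffer;   // scan buffer (assumed layout)
    private final int startRead;   // index where the current match begins
    private final int markedPos;   // index just past the end of the current match

    MatchWindowSketch(String input, int startRead, int markedPos) {
        this.buffer = input.toCharArray();
        this.startRead = startRead;
        this.markedPos = markedPos;
    }

    /** Copies the matched region into a new String. */
    String yytext() {
        return new String(buffer, startRead, markedPos - startRead);
    }

    /** Reads one character straight from the buffer, without allocating a String. */
    char yycharat(int pos) {
        return buffer[startRead + pos];
    }

    /** Length of the matched region. */
    int yylength() {
        return markedPos - startRead;
    }

    public static void main(String[] args) {
        // Pretend the scanner has just matched "bar" inside "foo bar baz".
        MatchWindowSketch s = new MatchWindowSketch("foo bar baz", 4, 7);
        for (int pos = 0; pos < s.yylength(); pos++) {
            // Both accessors yield the same character at every valid position.
            System.out.println(s.yycharat(pos) + " == " + s.yytext().charAt(pos));
        }
    }
}
```

Under these assumptions, yycharat avoids the String copy that yytext() makes on every call, which is the likely reason the description calls it faster.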
-
yylength
public final int yylength()
Deprecated. Returns the length of the matched text region.
- Specified by:
yylength
in interface StandardTokenizerInterface
-
yypushback
public void yypushback(int number)
Deprecated. Pushes the specified number of characters back into the input stream. They will be read again by the next call of the scanning method.
- Parameters:
number
- the number of characters to be read again. This number must not be greater than yylength()!
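The pushback behaviour described above can be sketched as follows. PushbackSketch is a hypothetical illustration, not the Lucene class: it assumes that pushing characters back simply moves the end-of-match marker to the left, and it throws IllegalArgumentException when the limit is exceeded (an assumption made for this sketch; the real scanner may signal the error differently). The helper nextScanStart() is likewise invented for the example.

```java
// Hypothetical illustration of pushback semantics; this is NOT the Lucene class.
final class PushbackSketch {
    private final char[] buffer;  // scan buffer (assumed layout)
    private final int startRead;  // index where the current match begins
    private int markedPos;        // index just past the end of the current match

    PushbackSketch(String input, int startRead, int markedPos) {
        this.buffer = input.toCharArray();
        this.startRead = startRead;
        this.markedPos = markedPos;
    }

    int yylength() {
        return markedPos - startRead;
    }

    String yytext() {
        return new String(buffer, startRead, markedPos - startRead);
    }

    /** Shrinks the current match by `number` characters so they are scanned again. */
    void yypushback(int number) {
        if (number > yylength()) {
            // Assumption for this sketch: reject values larger than the match.
            throw new IllegalArgumentException("pushback value was too large");
        }
        markedPos -= number;
    }

    /** Hypothetical helper: where the next scan would begin. */
    int nextScanStart() {
        return markedPos;
    }

    public static void main(String[] args) {
        // Pretend the scanner matched "2024-" at the start of "2024-05".
        PushbackSketch s = new PushbackSketch("2024-05", 0, 5);
        s.yypushback(1); // give the '-' back to the input
        System.out.println(s.yytext());        // the match no longer includes '-'
        System.out.println(s.nextScanStart()); // the next scan starts at the '-'
    }
}
```

The limit on number follows from this layout: the marker cannot move left of the start of the current match.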
-
getNextToken
public int getNextToken() throws java.io.IOException
Deprecated. Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
- Specified by:
getNextToken
in interface StandardTokenizerInterface
- Returns:
- the next token
- Throws:
java.io.IOException
- if any I/O-Error occurs
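A caller typically drives a scanner of this shape by calling getNextToken() repeatedly until it returns YYEOF. The sketch below shows that loop with a toy scanner, TokenLoopSketch, which is hypothetical and far simpler than the real class: it only recognises runs of letters and digits. The constant values (YYEOF = -1, WORD_TYPE = 0) and the tokenize helper are assumptions for illustration; the actual values are listed under Constant Field Values.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Toy scanner used only to show the calling pattern; this is NOT the Lucene class.
final class TokenLoopSketch {
    static final int YYEOF = -1;     // assumed end-of-input sentinel
    static final int WORD_TYPE = 0;  // assumed token type value

    private final Reader in;
    private final StringBuilder text = new StringBuilder();

    TokenLoopSketch(Reader in) {
        this.in = in;
    }

    /** Returns the type of the next token, or YYEOF when the input is exhausted. */
    int getNextToken() throws IOException {
        text.setLength(0);
        int c = in.read();
        // Skip characters that cannot start a token.
        while (c != -1 && !Character.isLetterOrDigit(c)) {
            c = in.read();
        }
        if (c == -1) {
            return YYEOF;
        }
        // Accumulate letters and digits; the delimiter that ends the run is dropped.
        while (c != -1 && Character.isLetterOrDigit(c)) {
            text.append((char) c);
            c = in.read();
        }
        return WORD_TYPE;
    }

    /** Text of the most recently matched token. */
    String yytext() {
        return text.toString();
    }

    /** Hypothetical helper: collects every token until YYEOF is returned. */
    static List<String> tokenize(String input) {
        TokenLoopSketch scanner = new TokenLoopSketch(new StringReader(input));
        List<String> tokens = new ArrayList<>();
        try {
            while (scanner.getNextToken() != YYEOF) {
                tokens.add(scanner.yytext());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, token stream 42"));
    }
}
```

With the real class, the loop body would usually call getText(CharTermAttribute) rather than yytext() and branch on the returned type constant.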
-
-