Package org.apache.lucene.analysis.ar
Class ArabicNormalizer
- java.lang.Object
  - org.apache.lucene.analysis.ar.ArabicNormalizer
public class ArabicNormalizer extends java.lang.Object
Normalizer for Arabic.
Normalization is done in-place for efficiency, operating on a term buffer.
Normalization is defined as:
- Normalization of hamza with alef seat to a bare alef.
- Normalization of teh marbuta to heh.
- Normalization of dotless yeh (alef maksura) to yeh.
- Removal of Arabic diacritics (the harakat).
- Removal of tatweel (stretching character).
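The rules above can be sketched as a single pass over the buffer. This is a minimal illustration, not Lucene's source: the class name `ArabicNormalizationSketch` is hypothetical, the constants use the standard Unicode code points for the characters named in the field list, and folding alef madda to a bare alef alongside the two hamza forms is an assumption based on the `ALEF_MADDA` field being present.

```java
final class ArabicNormalizationSketch {
    // Standard Unicode code points for the characters named in the field list.
    static final char ALEF = '\u0627';
    static final char ALEF_MADDA = '\u0622';
    static final char ALEF_HAMZA_ABOVE = '\u0623';
    static final char ALEF_HAMZA_BELOW = '\u0625';
    static final char YEH = '\u064A';
    static final char DOTLESS_YEH = '\u0649';
    static final char TEH_MARBUTA = '\u0629';
    static final char HEH = '\u0647';
    static final char TATWEEL = '\u0640';
    static final char FATHATAN = '\u064B'; // first of the eight harakat
    static final char SUKUN = '\u0652';    // last of the eight harakat

    /** Rewrites s[0..len) in place and returns the new logical length. */
    static int normalize(char[] s, int len) {
        int out = 0; // next write position; never ahead of the read position
        for (int in = 0; in < len; in++) {
            char c = s[in];
            if (c == TATWEEL || (c >= FATHATAN && c <= SUKUN)) {
                continue; // drop tatweel and the harakat (U+064B..U+0652)
            }
            switch (c) {
                case ALEF_MADDA:
                case ALEF_HAMZA_ABOVE:
                case ALEF_HAMZA_BELOW:
                    c = ALEF; // alef variants fold to a bare alef
                    break;
                case DOTLESS_YEH:
                    c = YEH;
                    break;
                case TEH_MARBUTA:
                    c = HEH;
                    break;
                default:
                    break; // every other character is kept unchanged
            }
            s[out++] = c;
        }
        return out;
    }
}
```

Because characters are only replaced one-for-one or dropped, the write position can never overtake the read position, so no second buffer is needed.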
-
Field Summary
Fields (Modifier and Type, Field)
- static char ALEF
- static char ALEF_HAMZA_ABOVE
- static char ALEF_HAMZA_BELOW
- static char ALEF_MADDA
- static char DAMMA
- static char DAMMATAN
- static char DOTLESS_YEH
- static char FATHA
- static char FATHATAN
- static char HEH
- static char KASRA
- static char KASRATAN
- static char SHADDA
- static char SUKUN
- static char TATWEEL
- static char TEH_MARBUTA
- static char YEH
-
Constructor Summary
Constructors
- ArabicNormalizer()
-
Method Summary
Methods (Modifier and Type, Method, Description)
- int normalize(char[] s, int len)
  Normalize an input buffer of Arabic text.
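Since the buffer is modified in place and the method returns an int, a caller has to treat only the first returned-length characters as valid. The sketch below shows that calling pattern without depending on Lucene. `NormalizeCallSketch`, `BufferNormalizer`, `apply`, and `stripTatweel` are hypothetical names introduced for illustration, and the stand-in normalizer removes tatweel only.

```java
final class NormalizeCallSketch {
    /** Hypothetical interface with the same shape as normalize(char[] s, int len). */
    interface BufferNormalizer {
        int normalize(char[] s, int len);
    }

    /** Copies the input into a scratch buffer, normalizes it in place, keeps the valid prefix. */
    static String apply(BufferNormalizer normalizer, String input) {
        char[] buf = input.toCharArray();
        int newLen = normalizer.normalize(buf, buf.length);
        // Anything at index newLen or beyond is stale and must be ignored.
        return new String(buf, 0, newLen);
    }

    /** Stand-in normalizer (an assumption for this sketch): removes tatweel, U+0640, only. */
    static int stripTatweel(char[] s, int len) {
        int out = 0;
        for (int in = 0; in < len; in++) {
            if (s[in] != '\u0640') {
                s[out++] = s[in];
            }
        }
        return out;
    }
}
```

With Lucene on the classpath, passing `new ArabicNormalizer()::normalize` in place of `NormalizeCallSketch::stripTatweel` should fit the same interface, assuming the instance method keeps the signature listed above.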
-
Field Detail
-
ALEF
public static final char ALEF
- See Also:
- Constant Field Values
-
ALEF_MADDA
public static final char ALEF_MADDA
- See Also:
- Constant Field Values
-
ALEF_HAMZA_ABOVE
public static final char ALEF_HAMZA_ABOVE
- See Also:
- Constant Field Values
-
ALEF_HAMZA_BELOW
public static final char ALEF_HAMZA_BELOW
- See Also:
- Constant Field Values
-
YEH
public static final char YEH
- See Also:
- Constant Field Values
-
DOTLESS_YEH
public static final char DOTLESS_YEH
- See Also:
- Constant Field Values
-
TEH_MARBUTA
public static final char TEH_MARBUTA
- See Also:
- Constant Field Values
-
HEH
public static final char HEH
- See Also:
- Constant Field Values
-
TATWEEL
public static final char TATWEEL
- See Also:
- Constant Field Values
-
FATHATAN
public static final char FATHATAN
- See Also:
- Constant Field Values
-
DAMMATAN
public static final char DAMMATAN
- See Also:
- Constant Field Values
-
KASRATAN
public static final char KASRATAN
- See Also:
- Constant Field Values
-
FATHA
public static final char FATHA
- See Also:
- Constant Field Values
-
DAMMA
public static final char DAMMA
- See Also:
- Constant Field Values
-
KASRA
public static final char KASRA
- See Also:
- Constant Field Values
-
SHADDA
public static final char SHADDA
- See Also:
- Constant Field Values
-
SUKUN
public static final char SUKUN
- See Also:
- Constant Field Values
-