Class ArabicStemmer


  • public class ArabicStemmer
    extends java.lang.Object
    Stemmer for Arabic.

    Stemming is done in place for efficiency, operating on a term buffer.

    Stemming is defined as:

    • Removal of attached definite article, conjunction, and prepositions.
    • Stemming of common suffixes.
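The two steps above can be illustrated with a minimal, self-contained sketch. It is not the ArabicStemmer source: the class name, the tiny affix tables, and the minimum-remaining-length guard are assumptions made for illustration, and the real contents of `prefixes` and `suffixes` are not listed on this page. The sketch only shows how a prefix can be removed by shifting characters left inside the same buffer, and how a suffix can be removed by shrinking the logical length.

```java
class InPlaceStemSketch {
    // Letter constants mirroring some field names listed on this page.
    static final char ALEF = '\u0627';
    static final char TEH_MARBUTA = '\u0629';
    static final char TEH = '\u062A';
    static final char LAM = '\u0644';
    static final char WAW = '\u0648';

    // Hypothetical affix tables, chosen for illustration only.
    static final char[][] PREFIXES = { { WAW, ALEF, LAM }, { ALEF, LAM } };
    static final char[][] SUFFIXES = { { ALEF, TEH }, { TEH_MARBUTA } };

    // Assumed guard: at least this many characters must remain after a removal.
    static final int MIN_REMAINING = 2;

    /** Removes at most one matching prefix and returns the new logical length. */
    static int stripPrefix(char[] s, int len) {
        for (char[] p : PREFIXES) {
            if (len >= p.length + MIN_REMAINING && matchesAt(s, 0, p)) {
                // Shift the rest of the word left over the prefix.
                System.arraycopy(s, p.length, s, 0, len - p.length);
                return len - p.length;
            }
        }
        return len;
    }

    /** Removes each matching suffix in table order and returns the new logical length. */
    static int stripSuffixes(char[] s, int len) {
        for (char[] x : SUFFIXES) {
            if (len >= x.length + MIN_REMAINING && matchesAt(s, len - x.length, x)) {
                len -= x.length; // a suffix is dropped by shortening the length only
            }
        }
        return len;
    }

    /** Prefix removal followed by suffix removal, all on the same buffer. */
    static int stem(char[] s, int len) {
        return stripSuffixes(s, stripPrefix(s, len));
    }

    /** True if the characters of affix appear in s starting at offset. */
    static boolean matchesAt(char[] s, int offset, char[] affix) {
        for (int i = 0; i < affix.length; i++) {
            if (s[offset + i] != affix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // ALEF LAM KAF TEH ALEF BEH: a six-letter word starting with the definite article.
        char[] buf = "\u0627\u0644\u0643\u062A\u0627\u0628".toCharArray();
        System.out.println(stem(buf, buf.length)); // prints 4
    }
}
```

Characters beyond the returned length are left in the buffer as stale data, so callers of this sketch should read only the first `len` characters after stemming.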
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static char ALEF  
      static char BEH  
      static char FEH  
      static char HEH  
      static char KAF  
      static char LAM  
      static char NOON  
      static char[][] prefixes  
      static char[][] suffixes  
      static char TEH  
      static char TEH_MARBUTA  
      static char WAW  
      static char YEH  
    • Constructor Summary

      Constructors 
      Constructor Description
      ArabicStemmer()  
    • Method Summary

      Modifier and Type Method Description
      int stem(char[] s, int len)
      Stem an input buffer of Arabic text.
      int stemPrefix(char[] s, int len)
      Stem a prefix off an Arabic word.
      int stemSuffix(char[] s, int len)
      Stem suffix(es) off an Arabic word.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ArabicStemmer

        public ArabicStemmer()
    • Method Detail

      • stem

        public int stem(char[] s,
                        int len)
        Stem an input buffer of Arabic text.
        Parameters:
        s - input buffer
        len - length of input buffer
        Returns:
        new length of input buffer after stemming
      • stemPrefix

        public int stemPrefix(char[] s,
                              int len)
        Stem a prefix off an Arabic word.
        Parameters:
        s - input buffer
        len - length of input buffer
        Returns:
        new length of input buffer after stemming
      • stemSuffix

        public int stemSuffix(char[] s,
                              int len)
        Stem suffix(es) off an Arabic word.
        Parameters:
        s - input buffer
        len - length of input buffer
        Returns:
        new length of input buffer after stemming
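
All three methods share one calling convention: the caller passes a buffer and its current length, the buffer is modified in place, and the return value is the new length. The sketch below shows how a caller might consume that contract. `BufferStemmer` and the toy stemmer are hypothetical stand-ins so the example runs without Lucene on the classpath; since the signature on this page is `int stem(char[] s, int len)`, a method reference to an ArabicStemmer instance's `stem` should fit the same shape.

```java
class StemCallSketch {
    /** Hypothetical stand-in with the same shape as stem(char[] s, int len). */
    interface BufferStemmer {
        int stem(char[] s, int len);
    }

    /** Stems a term and returns only the first newLen characters of the buffer. */
    static String stemToString(BufferStemmer stemmer, String term) {
        char[] buf = term.toCharArray();
        int newLen = stemmer.stem(buf, buf.length);
        // Anything at index newLen or beyond is stale and must be ignored.
        return new String(buf, 0, newLen);
    }

    public static void main(String[] args) {
        // Toy stemmer for demonstration only: drops one trailing 's' if present.
        BufferStemmer toy = (s, len) -> (len > 1 && s[len - 1] == 's') ? len - 1 : len;
        System.out.println(stemToString(toy, "terms")); // prints "term"
    }
}
```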