public class StringUtil
extends java.lang.Object
Constructor and Description |
---|
StringUtil() |
Modifier and Type | Method and Description |
---|---|
static void | computeShortestEditScript(java.lang.String wordForm, java.lang.String lemma, int[][] distance, java.lang.StringBuffer permutations) Computes the Shortest Edit Script (SES) to convert a word into its lemma. |
static java.lang.String | decodeShortestEditScript(java.lang.String wordForm, java.lang.String permutations) Reads the SES predicted by the lemmatizer model and applies the permutations to obtain the lemma from the wordForm. |
static java.lang.String | getShortestEditScript(java.lang.String wordForm, java.lang.String lemma) Gets the SES required to go from a word to a lemma. |
static boolean | isEmpty(java.lang.CharSequence theString) Returns true if CharSequence.length() is 0 or null. |
static boolean | isWhitespace(char charCode) Determines if the specified character is a whitespace. |
static boolean | isWhitespace(int charCode) Determines if the specified character is a whitespace. |
static int[][] | levenshteinDistance(java.lang.String wordForm, java.lang.String lemma) Computes the Levenshtein distance of two strings in a matrix. |
static java.lang.String | toLowerCase(java.lang.CharSequence string) Converts to lower case independent of the current locale via Character.toLowerCase(int) which uses mapping information from the UnicodeData file. |
static java.lang.String | toUpperCase(java.lang.CharSequence string) Converts to upper case independent of the current locale via Character.toUpperCase(char) which uses mapping information from the UnicodeData file. |
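public static boolean isWhitespace(char charCode)
Determines if the specified character is a whitespace. A character is considered a whitespace when it is a Character.isWhitespace(int) whitespace or belongs to the Unicode space separator category (Character.SPACE_SEPARATOR).
Character.isWhitespace(int) does not include no-break spaces. In OpenNLP no-break spaces are also considered as white spaces.
Parameters:
charCode - the character to check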
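public static boolean isWhitespace(int charCode)
Determines if the specified character is a whitespace. A character is considered a whitespace when it is a Character.isWhitespace(int) whitespace or belongs to the Unicode space separator category (Character.SPACE_SEPARATOR).
Character.isWhitespace(int) does not include no-break spaces. In OpenNLP no-break spaces are also considered as white spaces.
Parameters:
charCode - the character code to check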
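The whitespace rule described for isWhitespace can be illustrated with standard JDK calls only. This is a minimal sketch under stated assumptions, not the library source: the class name WhitespaceSketch and the helper isWhitespaceLike are hypothetical, and the helper simply combines the two documented conditions.

```java
class WhitespaceSketch {

    // Hypothetical helper (not part of StringUtil): true when the code point is a
    // Character.isWhitespace(int) whitespace or is in the SPACE_SEPARATOR category.
    static boolean isWhitespaceLike(int codePoint) {
        return Character.isWhitespace(codePoint)
                || Character.getType(codePoint) == Character.SPACE_SEPARATOR;
    }

    public static void main(String[] args) {
        int noBreakSpace = 0x00A0; // NO-BREAK SPACE

        // The JDK check alone excludes the no-break space.
        System.out.println(Character.isWhitespace(noBreakSpace)); // false
        // The combined rule accepts it.
        System.out.println(isWhitespaceLike(noBreakSpace));       // true
        System.out.println(isWhitespaceLike('\t'));               // true
        System.out.println(isWhitespaceLike('a'));                // false
    }
}
```

The second condition is what brings no-break spaces in: Character.isWhitespace(int) explicitly excludes them, while their general category is still SPACE_SEPARATOR.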
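public static java.lang.String toLowerCase(java.lang.CharSequence string)
Converts to lower case independent of the current locale via Character.toLowerCase(int) which uses mapping information from the UnicodeData file.
Parameters:
string - the character sequence to convert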
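public static java.lang.String toUpperCase(java.lang.CharSequence string)
Converts to upper case independent of the current locale via Character.toUpperCase(char) which uses mapping information from the UnicodeData file.
Parameters:
string - the character sequence to convert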
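To show why locale-independent case conversion matters, the sketch below contrasts per-character conversion with String.toLowerCase(Locale) under a Turkish locale. It is an illustration only: the class CaseSketch and both helper methods are hypothetical, and the per-char loop is an assumption about how such a conversion can be written (it leaves supplementary characters unchanged).

```java
import java.util.Locale;

class CaseSketch {

    // Hypothetical helper: lower-cases each char via Character.toLowerCase,
    // which does not consult the default locale.
    static String toLowerCaseSketch(CharSequence text) {
        char[] out = new char[text.length()];
        for (int i = 0; i < text.length(); i++) {
            out[i] = Character.toLowerCase(text.charAt(i));
        }
        return new String(out);
    }

    // Hypothetical helper: upper-cases each char via Character.toUpperCase.
    static String toUpperCaseSketch(CharSequence text) {
        char[] out = new char[text.length()];
        for (int i = 0; i < text.length(); i++) {
            out[i] = Character.toUpperCase(text.charAt(i));
        }
        return new String(out);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Locale-sensitive: 'I' becomes the dotless 'ı' in Turkish.
        System.out.println("TITLE".toLowerCase(turkish)); // tıtle
        // Locale-independent: always the plain Unicode mapping.
        System.out.println(toLowerCaseSketch("TITLE"));   // title
        System.out.println(toUpperCaseSketch("title"));   // TITLE
    }
}
```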
public static boolean isEmpty(java.lang.CharSequence theString)
Returns true if CharSequence.length() is 0 or null.
Returns:
true if CharSequence.length() is 0, otherwise false
public static int[][] levenshteinDistance(java.lang.String wordForm, java.lang.String lemma)
Parameters:
wordForm - the form
lemma - the lemma
public static void computeShortestEditScript(java.lang.String wordForm, java.lang.String lemma, int[][] distance, java.lang.StringBuffer permutations)
Parameters:
wordForm - the token
lemma - the target lemma
distance - the Levenshtein distance
permutations - the number of permutations
public static java.lang.String decodeShortestEditScript(java.lang.String wordForm, java.lang.String permutations)
Parameters:
wordForm - the wordForm
permutations - the permutations predicted by the lemmatizer model
public static java.lang.String getShortestEditScript(java.lang.String wordForm, java.lang.String lemma)
Parameters:
wordForm - the word
lemma - the lemma
Copyright © 2010 - 2023 Adobe. All Rights Reserved
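The kind of matrix that levenshteinDistance is documented to compute can be illustrated with the classic dynamic-programming recurrence. The following is a minimal sketch, not the library source: the class LevenshteinSketch, the method distanceMatrix, and the (source.length() + 1) by (target.length() + 1) layout are assumptions made for illustration.

```java
class LevenshteinSketch {

    // Hypothetical helper: d[i][j] holds the edit distance between the first i
    // characters of source and the first j characters of target.
    static int[][] distanceMatrix(String source, String target) {
        int rows = source.length() + 1;
        int cols = target.length() + 1;
        int[][] d = new int[rows][cols];

        // Turning a prefix into the empty string takes one deletion per character.
        for (int i = 0; i < rows; i++) {
            d[i][0] = i;
        }
        // Building a prefix from the empty string takes one insertion per character.
        for (int j = 0; j < cols; j++) {
            d[0][j] = j;
        }

        for (int i = 1; i < rows; i++) {
            for (int j = 1; j < cols; j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                int substitution = d[i - 1][j - 1] + cost;
                d[i][j] = Math.min(Math.min(deletion, insertion), substitution);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] d = distanceMatrix("walked", "walk");
        // The bottom-right cell is the distance between the full strings.
        System.out.println(d["walked".length()]["walk".length()]); // 2
    }
}
```

Keeping the whole matrix, rather than only the final distance, is what lets an edit script be recovered by walking back from the bottom-right cell, which is presumably why computeShortestEditScript takes the matrix as its distance argument.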