Package opennlp.tools.tokenize
Class TokenizerFactory
- java.lang.Object
  - opennlp.tools.util.BaseToolFactory
    - opennlp.tools.tokenize.TokenizerFactory
-
public class TokenizerFactory extends BaseToolFactory
The factory that provides Tokenizer default implementations and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
-
-
Constructor Summary
Constructors
- TokenizerFactory(): Creates a TokenizerFactory that provides the default implementation of the resources.
- TokenizerFactory(java.lang.String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, java.util.regex.Pattern alphaNumericPattern): Creates a TokenizerFactory.
-
Method Summary
Modifier and Type, Method, Description
- static TokenizerFactory create(java.lang.String subclassName, java.lang.String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, java.util.regex.Pattern alphaNumericPattern): Factory method the framework uses to create a new TokenizerFactory.
- java.util.Map<java.lang.String,java.lang.Object> createArtifactMap(): Creates a Map with pairs of keys and objects.
- java.util.Map<java.lang.String,java.lang.String> createManifestEntries(): Creates the manifest entries that will be added to the model manifest.
- Dictionary getAbbreviationDictionary(): Gets the abbreviation dictionary.
- java.util.regex.Pattern getAlphaNumericPattern(): Gets the alpha numeric pattern.
- TokenContextGenerator getContextGenerator(): Gets the context generator.
- java.lang.String getLanguageCode(): Retrieves the language code.
- boolean isUseAlphaNumericOptmization(): Gets whether to use alphanumeric optimization.
- void validateArtifactMap(): Validates the parsed artifacts.
-
Methods inherited from class opennlp.tools.util.BaseToolFactory
create, create, createArtifactSerializersMap
-
Constructor Detail
-
TokenizerFactory
public TokenizerFactory()
Creates a TokenizerFactory that provides the default implementation of the resources.
-
TokenizerFactory
public TokenizerFactory(java.lang.String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, java.util.regex.Pattern alphaNumericPattern)
Creates a TokenizerFactory. Use this constructor to programmatically create a factory.
- Parameters:
- languageCode - the language of the natural text
- abbreviationDictionary - an abbreviations dictionary
- useAlphaNumericOptimization - if true, alphanumerics are skipped
- alphaNumericPattern - null or a custom alphanumeric pattern (default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC)
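The default alphanumeric pattern quoted above can be tried out with plain java.util.regex. The following is a minimal sketch, not OpenNLP code: the class AlphaNumericSketch and its helper methods are hypothetical names, and the idea that matching chunks are simply set aside is an assumption based on the parameter description ("alphanumerics are skipped").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class AlphaNumericSketch {

    // Same expression the parameter description quotes as the default.
    public static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    /** Returns true if the whole chunk matches the given pattern, or the default when pattern is null. */
    public static boolean isAlphaNumeric(String chunk, Pattern pattern) {
        Pattern effective = (pattern != null) ? pattern : DEFAULT_ALPHANUMERIC;
        return effective.matcher(chunk).matches();
    }

    /** Hypothetical helper: collects the whitespace-separated chunks that are not purely alphanumeric. */
    public static List<String> chunksNeedingAnalysis(String text, Pattern pattern) {
        List<String> result = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (!isAlphaNumeric(chunk, pattern)) {
                result.add(chunk);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Passing null selects the default pattern, mirroring the "null or a custom pattern" contract.
        System.out.println(chunksNeedingAnalysis("Mr. Smith paid 42 dollars, e.g. today", null));
    }
}
```

Chunks such as "Smith" or "42" match the default pattern, while "Mr." and "e.g." do not, which is where an abbreviation dictionary becomes relevant.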
-
-
Method Detail
-
validateArtifactMap
public void validateArtifactMap() throws InvalidFormatException
Description copied from class: BaseToolFactory
Validates the parsed artifacts. If something is not valid, subclasses should throw an InvalidFormatException. Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
- Specified by:
- validateArtifactMap in class BaseToolFactory
- Throws:
- InvalidFormatException
-
createArtifactMap
public java.util.Map<java.lang.String,java.lang.Object> createArtifactMap()
Description copied from class: BaseToolFactory
Creates a Map with pairs of keys and objects. The model's implementation should call this method from the constructor that creates a model programmatically. The base implementation will return a HashMap that should be populated by subclasses.
- Overrides:
- createArtifactMap in class BaseToolFactory
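The two contracts described for validateArtifactMap and createArtifactMap (call the superclass first when validating; let the base class supply the map and have subclasses populate it) can be illustrated with stand-in classes. This is only a sketch under stated assumptions: BaseFactory and DictionaryFactory are hypothetical and are not the OpenNLP classes, the key "abbreviations" is invented, and an unchecked IllegalStateException stands in for InvalidFormatException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ArtifactMapSketch {

    /** Stand-in for a base factory; this is not the OpenNLP class. */
    public static class BaseFactory {
        // Artifacts read back from a stored model; null when building a new one.
        protected final Map<String, Object> loadedArtifacts;

        protected BaseFactory(Map<String, Object> loadedArtifacts) {
            this.loadedArtifacts = loadedArtifacts;
        }

        /** Returns an empty, mutable map for subclasses to fill. */
        public Map<String, Object> createArtifactMap() {
            return new HashMap<>();
        }

        /** Base-level check; subclasses add their own checks after calling this. */
        public void validateArtifactMap() {
            if (loadedArtifacts == null) {
                throw new IllegalStateException("no artifacts were loaded");
            }
        }
    }

    /** Hypothetical subclass that stores a set of abbreviations as its only artifact. */
    public static class DictionaryFactory extends BaseFactory {
        public static final String ABBREVIATIONS_KEY = "abbreviations"; // invented key
        private final Set<String> abbreviations;

        public DictionaryFactory(Set<String> abbreviations, Map<String, Object> loadedArtifacts) {
            super(loadedArtifacts);
            this.abbreviations = abbreviations;
        }

        @Override
        public Map<String, Object> createArtifactMap() {
            // Start from the base map, then add this factory's own entries.
            Map<String, Object> artifacts = super.createArtifactMap();
            if (abbreviations != null) {
                artifacts.put(ABBREVIATIONS_KEY, abbreviations);
            }
            return artifacts;
        }

        @Override
        public void validateArtifactMap() {
            super.validateArtifactMap(); // base checks run first
            Object entry = loadedArtifacts.get(ABBREVIATIONS_KEY);
            if (entry != null && !(entry instanceof Set)) {
                throw new IllegalStateException("abbreviations entry has the wrong type");
            }
        }
    }

    public static void main(String[] args) {
        // Build the artifact map programmatically, then validate it as if it had been loaded.
        Map<String, Object> stored = new DictionaryFactory(Set.of("e.g.", "Mr."), null).createArtifactMap();
        new DictionaryFactory(null, stored).validateArtifactMap();
        System.out.println("stored keys: " + stored.keySet());
    }
}
```

Calling the superclass method first means a subclass never has to repeat checks the base class already performs, and an optional artifact (here, a missing abbreviations entry) passes validation.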
-
createManifestEntries
public java.util.Map<java.lang.String,java.lang.String> createManifestEntries()
Description copied from class: BaseToolFactory
Creates the manifest entries that will be added to the model manifest.
- Overrides:
- createManifestEntries in class BaseToolFactory
- Returns:
- the manifest entries to be added to the model manifest
-
create
public static TokenizerFactory create(java.lang.String subclassName, java.lang.String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, java.util.regex.Pattern alphaNumericPattern) throws InvalidFormatException
Factory method the framework uses to create a new TokenizerFactory.
- Parameters:
- subclassName - the name of the class implementing the TokenizerFactory
- languageCode - the language code the tokenizer should use
- abbreviationDictionary - an optional dictionary containing abbreviations, or null if not present
- useAlphaNumericOptimization - indicates whether the alphanumeric optimization should be enabled or disabled
- alphaNumericPattern - the pattern the alphanumeric optimization should use
- Returns:
- the instance of the TokenizerFactory
- Throws:
- InvalidFormatException - if one of the input parameters doesn't comply with the expected format
-
getAlphaNumericPattern
public java.util.regex.Pattern getAlphaNumericPattern()
Gets the alpha numeric pattern.
- Returns:
- the user specified alpha numeric pattern or a default.
-
isUseAlphaNumericOptmization
public boolean isUseAlphaNumericOptmization()
Gets whether to use alphanumeric optimization.
- Returns:
- true if the alpha numeric optimization is enabled, otherwise false
-
getAbbreviationDictionary
public Dictionary getAbbreviationDictionary()
Gets the abbreviation dictionary.
- Returns:
- null or the abbreviation dictionary
-
getLanguageCode
public java.lang.String getLanguageCode()
Retrieves the language code.
- Returns:
- the language code
-
getContextGenerator
public TokenContextGenerator getContextGenerator()
Gets the context generator.
- Returns:
- a new instance of the context generator
-
-