Package org.apache.tika.parser.ner.regex
Class RegexNERecogniser
java.lang.Object
    org.apache.tika.parser.ner.regex.RegexNERecogniser

All Implemented Interfaces:
NERecogniser

public class RegexNERecogniser extends java.lang.Object implements NERecogniser
This class offers an implementation of NERecogniser based on regular expressions.
The default configuration file "ner-regex.txt" is used when the no-argument constructor is used to instantiate this class. The regex file is loaded via Class.getResourceAsStream(String), so the file should be placed in the same package path as this class.
The format of the regex configuration is as follows:
ENTITY_TYPE1=REGEX1
ENTITY_TYPE2=REGEX2
For example, to extract week days from text:
WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?
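
The configuration format described above can be illustrated with a minimal standalone sketch that uses only java.util.regex. This is not Tika's actual loader: the class name RegexConfigSketch and the method parse are hypothetical, and skipping blank lines or lines without '=' is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: reads ENTITY_TYPE=REGEX lines.
class RegexConfigSketch {

    // Parses configuration text into a map of entity type -> compiled pattern.
    static Map<String, Pattern> parse(String config) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        for (String rawLine : config.split("\n")) {
            String line = rawLine.trim();
            int eq = line.indexOf('=');
            // Assumption: blank lines and lines without a key are ignored.
            if (line.isEmpty() || eq <= 0) {
                continue;
            }
            // Split on the first '=' only, so the regex itself may contain '='.
            String entityType = line.substring(0, eq).trim();
            String regex = line.substring(eq + 1).trim();
            patterns.put(entityType, Pattern.compile(regex));
        }
        return patterns;
    }

    public static void main(String[] args) {
        String config =
            "WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?\n";
        Map<String, Pattern> patterns = parse(config);
        // Checks whether the WEEK_DAY pattern finds a week day in the sample text.
        System.out.println(patterns.get("WEEK_DAY").matcher("See you on Friday").find());
    }
}
```

Splitting on the first '=' keeps the entity type on the left and leaves the whole remainder, including any further '=' characters, as the regular expression.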
Since:
Nov. 7, 2015

Field Summary
Fields
java.util.Set<java.lang.String> entityTypes
static java.lang.String NER_REGEX_FILE
java.util.Map<java.lang.String,java.util.regex.Pattern> patterns

Fields inherited from interface org.apache.tika.parser.ner.NERecogniser
DATE, LOCATION, MISCELLANEOUS, MONEY, ORGANIZATION, PERCENT, PERSON, TIME

Constructor Summary
Constructors
RegexNERecogniser()
RegexNERecogniser(java.io.InputStream stream)

Method Summary
Methods
java.util.Set<java.lang.String> findMatches(java.lang.String text, java.util.regex.Pattern pattern)
    finds matching sub groups in text
java.util.Set<java.lang.String> getEntityTypes()
    gets a set of entity types whose names are recognisable by this recogniser
static RegexNERecogniser getInstance()
boolean isAvailable()
    checks if this Named Entity recogniser is available for service
java.util.Map<java.lang.String,java.util.Set<java.lang.String>> recognise(java.lang.String text)
    call for name recognition action from text

Field Detail
NER_REGEX_FILE
public static final java.lang.String NER_REGEX_FILE
See Also:
Constant Field Values

entityTypes
public java.util.Set<java.lang.String> entityTypes

patterns
public java.util.Map<java.lang.String,java.util.regex.Pattern> patterns

Method Detail
getInstance
public static RegexNERecogniser getInstance()

isAvailable
public boolean isAvailable()
Description copied from interface: NERecogniser
checks if this Named Entity recogniser is available for service
Specified by:
isAvailable in interface NERecogniser
Returns:
true if this recogniser is ready to recognise, false otherwise

getEntityTypes
public java.util.Set<java.lang.String> getEntityTypes()
Description copied from interface: NERecogniser
gets a set of entity types whose names are recognisable by this recogniser
Specified by:
getEntityTypes in interface NERecogniser
Returns:
set of entity types/classes

findMatches
public java.util.Set<java.lang.String> findMatches(java.lang.String text, java.util.regex.Pattern pattern)
finds matching sub groups in text
Parameters:
text - text containing interesting substrings
pattern - pattern to find substrings
Returns:
set of substrings if any found, or null if none found
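
The contract above (a set of substrings, or null when nothing matches) can be sketched with plain java.util.regex as follows. This is an illustration, not the class's actual implementation: FindMatchesSketch and findAll are hypothetical names, and collecting each full match via Matcher.group() is an assumption, since the description does not say whether full matches or individual capture groups are returned.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika.
class FindMatchesSketch {

    // Collects every full match of the pattern found in the text.
    static Set<String> findAll(String text, Pattern pattern) {
        Set<String> matches = new HashSet<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        // Mirrors the documented contract: null, not an empty set, when nothing matches.
        return matches.isEmpty() ? null : matches;
    }

    public static void main(String[] args) {
        Pattern weekDay = Pattern.compile(
            "(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?");
        Set<String> found = findAll("Meet on Monday or friday", weekDay);
        // Callers need a null check before iterating over the result.
        if (found != null) {
            for (String match : found) {
                System.out.println(match);
            }
        }
    }
}
```

Because the documented return value is null when nothing is found, callers of findMatches should check for null before using the result.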

recognise
public java.util.Map<java.lang.String,java.util.Set<java.lang.String>> recognise(java.lang.String text)
Description copied from interface: NERecogniser
call for name recognition action from text
Specified by:
recognise in interface NERecogniser
Parameters:
text - text that possibly contains names
Returns:
map of entityType -> set of names
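
The shape of the returned map (entity type -> set of names) can be illustrated with a small standalone sketch. It only mimics the documented result shape and is not the class's implementation: RecogniseSketch and the YEAR pattern are hypothetical, and leaving out entity types that have no match is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika.
class RecogniseSketch {

    // Applies every pattern to the text and groups the matches by entity type.
    static Map<String, Set<String>> recognise(String text, Map<String, Pattern> patterns) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                // Assumption: an entity type appears in the map only if it matched.
                result.computeIfAbsent(entry.getKey(), k -> new HashSet<>())
                      .add(matcher.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("WEEK_DAY", Pattern.compile(
            "(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?"));
        // YEAR is a made-up entity type used only for this illustration.
        patterns.put("YEAR", Pattern.compile("\\b(19|20)\\d{2}\\b"));

        Map<String, Set<String>> names = recognise("Released on Saturday in 2015", patterns);
        for (Map.Entry<String, Set<String>> entry : names.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```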