Package org.apache.tika.parser.ner.regex
Class RegexNERecogniser
- java.lang.Object
- org.apache.tika.parser.ner.regex.RegexNERecogniser
- All Implemented Interfaces:
NERecogniser
public class RegexNERecogniser extends java.lang.Object implements NERecogniser
This class offers an implementation of NERecogniser based on regular expressions.
The default configuration file "ner-regex.txt" is used when the no-argument constructor is used to instantiate this class. The regex file is loaded via Class.getResourceAsStream(String), so the file should be placed in the same package path as this class.
The format of the regex configuration is as follows:
ENTITY_TYPE1=REGEX1
ENTITY_TYPE2=REGEX2
For example, to extract week days from text:
WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?
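The configuration format described above can be illustrated with a minimal sketch that uses only the JDK. It is not the parsing code of this class: the class name RegexConfigSketch and its parse method are hypothetical, and skipping blank lines and lines starting with '#' is an assumption rather than documented behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: reads ENTITY_TYPE=REGEX lines.
final class RegexConfigSketch {

    // Parses configuration text into a map of entity type -> compiled pattern.
    // Assumption: blank lines and lines starting with '#' are ignored.
    static Map<String, Pattern> parse(String config) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        for (String raw : config.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split on the first '=' only, because a regex may itself contain '='.
            int idx = line.indexOf('=');
            if (idx <= 0) {
                continue; // not an ENTITY_TYPE=REGEX line
            }
            String type = line.substring(0, idx).trim();
            String regex = line.substring(idx + 1).trim();
            patterns.put(type, Pattern.compile(regex));
        }
        return patterns;
    }

    public static void main(String[] args) {
        String config = "WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?\n";
        Map<String, Pattern> patterns = parse(config);
        // Reports whether the week-day pattern finds anything in the sample text.
        System.out.println(patterns.get("WEEK_DAY").matcher("See you on Friday").find());
    }
}
```

Note that the example pattern, exactly as written, has no alternative for Wednesday, so text mentioning only that day would not produce a WEEK_DAY match.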
- Since:
- Nov. 7, 2015
-
-
Field Summary
Fields
Modifier and Type | Field | Description
java.util.Set<java.lang.String> | entityTypes |
static java.lang.String | NER_REGEX_FILE |
java.util.Map<java.lang.String,java.util.regex.Pattern> | patterns |
Fields inherited from interface org.apache.tika.parser.ner.NERecogniser
DATE, LOCATION, MISCELLANEOUS, MONEY, ORGANIZATION, PERCENT, PERSON, TIME
-
-
Constructor Summary
Constructors
Constructor | Description
RegexNERecogniser() |
RegexNERecogniser(java.io.InputStream stream) |
-
Method Summary
Modifier and Type | Method | Description
java.util.Set<java.lang.String> | findMatches(java.lang.String text, java.util.regex.Pattern pattern) | finds matching sub groups in text
java.util.Set<java.lang.String> | getEntityTypes() | gets a set of entity types whose names are recognisable by this
static RegexNERecogniser | getInstance() |
boolean | isAvailable() | checks if this Named Entity recogniser is available for service
java.util.Map<java.lang.String,java.util.Set<java.lang.String>> | recognise(java.lang.String text) | call for name recognition action from text
-
-
-
Field Detail
-
NER_REGEX_FILE
public static final java.lang.String NER_REGEX_FILE
- See Also:
- Constant Field Values
-
entityTypes
public java.util.Set<java.lang.String> entityTypes
-
patterns
public java.util.Map<java.lang.String,java.util.regex.Pattern> patterns
-
-
Method Detail
-
getInstance
public static RegexNERecogniser getInstance()
-
isAvailable
public boolean isAvailable()
Description copied from interface: NERecogniser
checks if this Named Entity recogniser is available for service
- Specified by:
isAvailable in interface NERecogniser
- Returns:
- true if this recogniser is ready to recognise, false otherwise
-
getEntityTypes
public java.util.Set<java.lang.String> getEntityTypes()
Description copied from interface: NERecogniser
gets a set of entity types whose names are recognisable by this
- Specified by:
getEntityTypes in interface NERecogniser
- Returns:
- set of entity types/classes
-
findMatches
public java.util.Set<java.lang.String> findMatches(java.lang.String text, java.util.regex.Pattern pattern)
finds matching sub groups in text
- Parameters:
text - text containing interesting sub strings
pattern - pattern to find sub strings
- Returns:
- set of sub strings if any found, or null if none found
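The documented contract of findMatches can be sketched with the standard java.util.regex API. This is an illustration under stated assumptions, not the method's actual source: the class FindMatchesSketch is hypothetical, and collecting the whole match (rather than individual capture groups) is an assumption, since the description above does not specify which is returned.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not the Tika implementation.
final class FindMatchesSketch {

    // Collects every distinct substring of text matched by pattern.
    // Mirrors the documented return contract: null when nothing is found.
    static Set<String> findMatches(String text, Pattern pattern) {
        Set<String> results = new HashSet<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            // Assumption: the whole match is kept, not each capture group.
            results.add(matcher.group());
        }
        return results.isEmpty() ? null : results;
    }

    public static void main(String[] args) {
        Pattern days = Pattern.compile("(?i)(mon|fri)(day)?");
        System.out.println(findMatches("Open Monday and friday", days));
    }
}
```

Because the documented return value may be null, callers would need a null check before iterating over the result.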
-
recognise
public java.util.Map<java.lang.String,java.util.Set<java.lang.String>> recognise(java.lang.String text)
Description copied from interface: NERecogniser
call for name recognition action from text
- Specified by:
recognise in interface NERecogniser
- Parameters:
text - text that possibly contains names
- Returns:
- map of entityType -> set of names
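The shape of the returned map (entityType -> set of names) can be sketched as follows, again with the JDK only. The class RecogniseSketch, its two-argument recognise method, and the YEAR entity type are hypothetical; omitting entity types with no matches from the result is an assumption, not documented behaviour.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not the Tika implementation.
final class RecogniseSketch {

    // Applies each entity type's pattern to text and groups the matches by type.
    // Assumption: entity types without any match are left out of the map.
    static Map<String, Set<String>> recognise(Map<String, Pattern> patterns, String text) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Set<String> names = new HashSet<>();
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                names.add(matcher.group());
            }
            if (!names.isEmpty()) {
                result.put(entry.getKey(), names);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("WEEK_DAY",
                Pattern.compile("(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?"));
        patterns.put("YEAR", Pattern.compile("\\b(19|20)\\d{2}\\b")); // hypothetical entity type
        System.out.println(recognise(patterns, "Released on Saturday in 2015"));
    }
}
```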