Class PostingsHighlighter
java.lang.Object
    org.apache.lucene.search.postingshighlight.PostingsHighlighter
public class PostingsHighlighter extends java.lang.Object
Simple highlighter that does not analyze fields nor use term vectors. Instead it requires FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.
PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it breaks using getSentenceInstance(Locale.ROOT). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing those hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.
You can customize the behavior by subclassing this highlighter. Some important hooks:
- getBreakIterator(String): Customize how the text is divided into passages.
- getScorer(String): Customize how passages are ranked.
- getFormatter(String): Customize how snippets are formatted.
- getIndexAnalyzer(String): Enable highlighting of MultiTermQuery subclasses such as WildcardQuery.
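The passage pipeline described above (break the text into sentences, coalesce hits per passage, rank, format) can be illustrated with a minimal JDK-only sketch. Everything in it is hypothetical and not part of Lucene: the PassageSketch class and its nested Passage type are made up for illustration, findHits uses String.indexOf as a stand-in for offsets read from postings, best ranks by raw hit count as a stand-in for a PassageScorer, and the <b> tags in format are an assumed markup choice.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration only; not part of Lucene. */
class PassageSketch {

    /** One sentence-sized passage and the hits that fall inside it. */
    static final class Passage {
        final int start;
        final int end;
        final List<int[]> hits = new ArrayList<>(); // each hit is {startOffset, endOffset}

        Passage(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Stand-in for postings offsets: finds every occurrence of term, sorted by offset. */
    static List<int[]> findHits(String text, String term) {
        List<int[]> hits = new ArrayList<>();
        if (term.isEmpty()) return hits;
        for (int i = text.indexOf(term); i >= 0; i = text.indexOf(term, i + term.length())) {
            hits.add(new int[] {i, i + term.length()});
        }
        return hits;
    }

    /** Groups hits (sorted by start offset) into sentence passages; sentences without hits are skipped. */
    static List<Passage> coalesce(String text, List<int[]> sortedHits) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        List<Passage> passages = new ArrayList<>();
        Passage current = null;
        for (int[] hit : sortedHits) {
            if (current == null || hit[0] >= current.end) {
                // The hit lies beyond the current passage: locate its enclosing sentence.
                int start = bi.preceding(hit[0] + 1);
                int end = bi.next();
                if (end == BreakIterator.DONE) end = text.length();
                current = new Passage(start, end);
                passages.add(current);
            }
            current.hits.add(hit);
        }
        return passages;
    }

    /** Stand-in for a PassageScorer: picks the passage with the most hits (an assumption). */
    static Passage best(List<Passage> passages) {
        Passage best = null;
        for (Passage p : passages) {
            if (best == null || p.hits.size() > best.hits.size()) best = p;
        }
        return best;
    }

    /** Wraps every hit of the passage in <b> tags (tag choice is an assumption). */
    static String format(String text, Passage p) {
        StringBuilder sb = new StringBuilder();
        int pos = p.start;
        for (int[] hit : p.hits) {
            sb.append(text, pos, hit[0]).append("<b>").append(text, hit[0], hit[1]).append("</b>");
            pos = hit[1];
        }
        return sb.append(text, pos, p.end).toString();
    }
}
```

In this sketch a passage keeps whatever trailing whitespace the sentence BreakIterator assigns to it, so callers may want to trim the formatted result.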
WARNING: The code is very new and probably still has some exciting bugs!
Example usage:
    // Configure the field with offsets at index time.
    FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
    offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
    Field body = new Field("body", "foobar", offsetsType);

    // Retrieve highlights at query time.
    PostingsHighlighter highlighter = new PostingsHighlighter();
    Query query = new TermQuery(new Term("body", "highlighting"));
    TopDocs topDocs = searcher.search(query, n);
    String[] highlights = highlighter.highlight("body", query, searcher, topDocs);
This is thread-safe, and can be used across different readers.
Field Summary
Modifier and Type | Field | Description
static int | DEFAULT_MAX_LENGTH | Default maximum content size to process.
Constructor Summary
Constructor | Description
PostingsHighlighter() | Creates a new highlighter with DEFAULT_MAX_LENGTH.
PostingsHighlighter(int maxLength) | Creates a new highlighter, specifying maximum content length.
Method Summary
Modifier and Type | Method | Description
java.lang.String[] | highlight(java.lang.String field, Query query, IndexSearcher searcher, TopDocs topDocs) | Highlights the top passages from a single field.
java.lang.String[] | highlight(java.lang.String field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages) | Highlights the top-N passages from a single field.
java.util.Map<java.lang.String,java.lang.String[]> | highlightFields(java.lang.String[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn) | Highlights the top-N passages from multiple fields, for the provided int[] docids.
java.util.Map<java.lang.String,java.lang.String[]> | highlightFields(java.lang.String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs) | Highlights the top passages from multiple fields.
java.util.Map<java.lang.String,java.lang.String[]> | highlightFields(java.lang.String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages) | Highlights the top-N passages from multiple fields.
Field Detail
DEFAULT_MAX_LENGTH
public static final int DEFAULT_MAX_LENGTH
Default maximum content size to process. Typically, snippets closer to the beginning of the document better summarize its content.
See Also:
- Constant Field Values
Constructor Detail
PostingsHighlighter
public PostingsHighlighter()
Creates a new highlighter with DEFAULT_MAX_LENGTH.
PostingsHighlighter
public PostingsHighlighter(int maxLength)
Creates a new highlighter, specifying maximum content length.
Parameters:
maxLength - maximum content size to process.
Throws:
java.lang.IllegalArgumentException - if maxLength is negative or Integer.MAX_VALUE
Method Detail
highlight
public java.lang.String[] highlight(java.lang.String field, Query query, IndexSearcher searcher, TopDocs topDocs) throws java.io.IOException
Highlights the top passages from a single field.
Parameters:
field - field name to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
Returns:
Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence for the field will be returned.
Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
highlight
public java.lang.String[] highlight(java.lang.String field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages) throws java.io.IOException
Highlights the top-N passages from a single field.
Parameters:
field - field name to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - The maximum number of top-N ranked passages used to form the highlighted snippets.
Returns:
Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
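The documented fallback (returning the first maxPassages sentences when a document has no highlights) can be approximated with the JDK's sentence BreakIterator. This is a minimal sketch under assumptions: FallbackSketch and firstSentences are hypothetical names, not Lucene API, and the real highlighter may select or format the fallback text differently.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Hypothetical helper; not part of Lucene. */
class FallbackSketch {

    /** Returns the first maxPassages sentences of content, or all of it if it has fewer. */
    static String firstSentences(String content, int maxPassages) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(content);
        int end = bi.first();
        for (int i = 0; i < maxPassages; i++) {
            int next = bi.next();
            if (next == BreakIterator.DONE) {
                break; // ran out of sentences before reaching maxPassages
            }
            end = next;
        }
        return content.substring(0, end);
    }
}
```

With maxPassages set to 1 this corresponds to the single-sentence fallback described for the overloads that take no maxPassages argument.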
highlightFields
public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs) throws java.io.IOException
Highlights the top passages from multiple fields.
Conceptually, this behaves as a more efficient form of:
    Map<String, String[]> m = new HashMap<String, String[]>();
    for (String field : fields) {
        m.put(field, highlight(field, query, searcher, topDocs));
    }
    return m;
Parameters:
fields - field names to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence from the field will be returned.
Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if a field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
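Because the returned map is keyed on field name and each array is aligned with the documents in topDocs, callers often want to regroup the snippets per document. The following is a minimal sketch of that regrouping in plain Java; SnippetPivotSketch and byDocument are hypothetical names, and docCount is assumed to equal the number of documents that were highlighted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of Lucene. */
class SnippetPivotSketch {

    /** Turns field -> snippets-per-document into one field -> snippet map per document. */
    static List<Map<String, String>> byDocument(Map<String, String[]> byField, int docCount) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < docCount; i++) {
            Map<String, String> doc = new LinkedHashMap<>();
            for (Map.Entry<String, String[]> e : byField.entrySet()) {
                // Index i in every array refers to the same document.
                doc.put(e.getKey(), e.getValue()[i]);
            }
            docs.add(doc);
        }
        return docs;
    }
}
```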
highlightFields
public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages) throws java.io.IOException
Highlights the top-N passages from multiple fields.
Conceptually, this behaves as a more efficient form of:
    Map<String, String[]> m = new HashMap<String, String[]>();
    for (int i = 0; i < fields.length; i++) {
        // maxPassages[i] is the passage limit for fields[i]
        m.put(fields[i], highlight(fields[i], query, searcher, topDocs, maxPassages[i]));
    }
    return m;
Parameters:
fields - field names to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if a field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
highlightFields
public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn) throws java.io.IOException
Highlights the top-N passages from multiple fields, for the provided int[] docids.
Parameters:
fieldsIn - field names to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
docidsIn - array containing the document IDs to highlight.
maxPassagesIn - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first maxPassagesIn sentences from the field will be returned.
Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if a field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS