The logical representation of a
Document for indexing and searching.
Document and IndexableField
A Document is a collection of IndexableFields.
An IndexableField is a logical representation of a user's content that needs to be indexed or stored.
IndexableFields have a number of properties that tell Lucene how to treat the content (such as indexed, tokenized,
stored, etc.). See the
Field implementation of IndexableField
for specifics on these properties.
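As a rough illustration of the paragraph above, the following sketch builds a Document from three of the Field subclasses listed in the class summary. It assumes lucene-core 4.x is on the classpath; the class name DocumentSketch and the field names "id", "body" and "path" are hypothetical and chosen only for this example.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

/** Hypothetical helper, not part of Lucene: assembles one Document from application data. */
final class DocumentSketch {

    /** Builds a Document with an exact-match id, tokenized body text and a stored-only path. */
    static Document build(String id, String path, String body) {
        Document doc = new Document();
        // Indexed as a single token and stored, so it can be matched exactly and read back.
        doc.add(new StringField("id", id, Field.Store.YES));
        // Indexed and tokenized for full-text search; the original text is not stored.
        doc.add(new TextField("body", body, Field.Store.NO));
        // Stored only: returned with search hits but not searchable.
        doc.add(new StoredField("path", path));
        return doc;
    }
}
```

Which properties each field gets (indexed, tokenized, stored) is decided here by the choice of Field subclass; FieldType can be used when finer control is needed.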
Working with Documents
First and foremost, a
Document is something created by the user application. It is your job
to create Documents based on the content of the files you are working with in your application (Word, txt, PDF, Excel or any other format).
How this is done is completely up to you. That being said, many tools available in other projects can simplify
the process of taking a file and converting it into a Lucene Document.
DateTools is a utility class to make dates and times searchable
(remember, Lucene only searches text).
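The reason a date has to become text can be sketched with the JDK alone: if a timestamp is rendered as a fixed-width string with the most significant unit first, plain string comparison agrees with chronological order. The sketch below does not call DateTools; the class name, the yyyyMMddHHmmss pattern and the one-second granularity are assumptions made for illustration, not a description of the strings DateTools produces.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Stdlib-only sketch: fixed-width UTC timestamps whose text order matches time order. */
final class SortableDateSketch {

    // Most significant unit first and every unit zero-padded, so the width is constant.
    private static final DateTimeFormatter SECOND_RESOLUTION =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    /** Formats an epoch-millisecond timestamp as a 14-character UTC string. */
    static String toSortableString(long epochMillis) {
        return SECOND_RESOLUTION.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

Using a fixed time zone keeps the strings comparable regardless of where the indexing code runs, and a coarser pattern (for example yyyyMMdd) yields fewer distinct terms.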
IntField, LongField, FloatField and DoubleField are special helper classes
that simplify indexing of numeric values (and also dates) for fast range queries with NumericRangeQuery
(using a special sortable string representation of numeric values).
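The idea of a sortable representation can be sketched with the JDK alone. The example maps a double to a long whose signed order matches numeric order, then renders it as fixed-width hexadecimal so that string comparison matches too. The class and method names are hypothetical, and the hexadecimal rendering is an assumption chosen for readability; Lucene's own term encoding differs.

```java
/** Stdlib-only sketch: doubles mapped to text whose lexicographic order matches numeric order. */
final class SortableNumberSketch {

    /** Flips the non-sign bits of negative values so signed long comparison matches double order. */
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        return bits < 0 ? bits ^ 0x7fffffffffffffffL : bits;
    }

    /** Flips the sign bit and pads to 16 hex digits so plain string comparison matches numeric order. */
    static String toSortableText(double value) {
        long unsignedOrder = doubleToSortableLong(value) ^ 0x8000000000000000L;
        return String.format("%016x", unsignedOrder);
    }
}
```

Because every value has the same width, a range query over such terms reduces to a comparison between a lower and an upper string bound.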
Class Summary

BinaryDocValuesField: Field that stores a per-document BytesRef value.
ByteDocValuesField: Deprecated.
CompressionTools: Simple utility class providing static methods to compress and decompress binary data for stored fields.
DateTools: Provides support for converting dates to strings and vice-versa.
DerefBytesDocValuesField: Deprecated. Use BinaryDocValuesField instead.
Document: Documents are the unit of indexing and search.
DocumentStoredFieldVisitor
DoubleDocValuesField: Syntactic sugar for encoding doubles as NumericDocValues via Double.doubleToRawLongBits(double).
DoubleField: Field that indexes double values for efficient range filtering and sorting.
Field: Expert: directly create a field for a document.
FieldType: Describes the properties of a field.
FloatDocValuesField: Syntactic sugar for encoding floats as NumericDocValues via Float.floatToRawIntBits(float).
FloatField: Field that indexes float values for efficient range filtering and sorting.
IntDocValuesField: Deprecated.
IntField: Field that indexes int values for efficient range filtering and sorting.
LazyDocument: Defers actually loading a field's value until you ask for it.
LongDocValuesField: Deprecated.
LongField: Field that indexes long values for efficient range filtering and sorting.
NumericDocValuesField: Field that stores a per-document long value for scoring, sorting or value retrieval.
PackedLongDocValuesField: Deprecated.
ShortDocValuesField: Deprecated.
SortedBytesDocValuesField: Deprecated. Use SortedDocValuesField instead.
SortedDocValuesField: Field that stores a per-document BytesRef value, indexed for sorting.
SortedSetDocValuesField: Field that stores a set of per-document BytesRef values, indexed for faceting, grouping, joining.
StoredField: A field whose value is stored so that IndexReader.document(int, org.apache.lucene.index.StoredFieldVisitor) will return the field and its value.
StringField: A field that is indexed but not tokenized: the entire String value is indexed as a single token.
TextField: A field that is indexed and tokenized, without term vectors.
Enum Summary

DateTools.Resolution: Specifies the time granularity.
Field.Index: Deprecated. This is here only to ease transition from the pre-4.0 APIs.
Field.Store: Specifies whether and how a field should be stored.
Field.TermVector: Deprecated. This is here only to ease transition from the pre-4.0 APIs.
FieldType.NumericType: Data type of the numeric value.