Class Lucene40DocValuesFormat
- java.lang.Object
  - org.apache.lucene.codecs.DocValuesFormat
    - org.apache.lucene.codecs.lucene40.Lucene40DocValuesFormat
All Implemented Interfaces:
NamedSPILoader.NamedSPI
@Deprecated public class Lucene40DocValuesFormat extends DocValuesFormat
Deprecated. Only for reading old 4.0 and 4.1 segments.
Lucene 4.0 DocValues format.
Files:
- .dv.cfs: compound container
- .dv.cfe: compound entries
- <segment>_<fieldNumber>.dat: data values
- <segment>_<fieldNumber>.idx: index into the .dat for DEREF types
There are several types of DocValues with different encodings. From the perspective of filenames, all types store their values in .dat entries within the compound file. In the case of dereferenced/sorted types, the .dat actually contains only the unique values, and an additional .idx file contains pointers to these unique values.
Formats (X^n denotes X repeated n times):
- VAR_INTS .dat --> Header, PackedType, MinValue, DefaultValue, PackedStream
- FIXED_INTS_8 .dat --> Header, ValueSize, Byte^maxdoc
- FIXED_INTS_16 .dat --> Header, ValueSize, Short^maxdoc
- FIXED_INTS_32 .dat --> Header, ValueSize, Int32^maxdoc
- FIXED_INTS_64 .dat --> Header, ValueSize, Int64^maxdoc
- FLOAT_32 .dat --> Header, ValueSize, Float32^maxdoc
- FLOAT_64 .dat --> Header, ValueSize, Float64^maxdoc
- BYTES_FIXED_STRAIGHT .dat --> Header, ValueSize, (Byte * ValueSize)^maxdoc
- BYTES_VAR_STRAIGHT .idx --> Header, TotalBytes, Addresses
- BYTES_VAR_STRAIGHT .dat --> Header, (Byte * variable ValueSize)^maxdoc
- BYTES_FIXED_DEREF .idx --> Header, NumValues, Addresses
- BYTES_FIXED_DEREF .dat --> Header, ValueSize, (Byte * ValueSize)^NumValues
- BYTES_VAR_DEREF .idx --> Header, TotalVarBytes, Addresses
- BYTES_VAR_DEREF .dat --> Header, (LengthPrefix + Byte * variable ValueSize)^NumValues
- BYTES_FIXED_SORTED .idx --> Header, NumValues, Ordinals
- BYTES_FIXED_SORTED .dat --> Header, ValueSize, (Byte * ValueSize)^NumValues
- BYTES_VAR_SORTED .idx --> Header, TotalVarBytes, Addresses, Ordinals
- BYTES_VAR_SORTED .dat --> Header, (Byte * variable ValueSize)^NumValues
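The split between unique values and per-document pointers used by the dereferenced types can be pictured with a small in-memory model. This is only a sketch of the idea, not the on-disk layout or any Lucene API: the class name `DerefSketch`, its fields, and the use of strings and list slots instead of byte addresses are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model of deduplicated storage; hypothetical, not part of Lucene. */
final class DerefSketch {
    /** Unique values in first-seen order; plays the role of the .dat contents. */
    final List<String> uniqueValues = new ArrayList<>();
    /** One pointer per document into uniqueValues; plays the role of the .idx contents. */
    final int[] pointers;

    DerefSketch(String[] perDocValues) {
        Map<String, Integer> slotByValue = new HashMap<>();
        pointers = new int[perDocValues.length];
        for (int docId = 0; docId < perDocValues.length; docId++) {
            String value = perDocValues[docId];
            Integer slot = slotByValue.get(value);
            if (slot == null) {
                // First time this value is seen: store it once and remember its slot.
                slot = uniqueValues.size();
                slotByValue.put(value, slot);
                uniqueValues.add(value);
            }
            pointers[docId] = slot;
        }
    }

    /** Follows the per-document pointer to the shared value. */
    String valueFor(int docId) {
        return uniqueValues.get(pointers[docId]);
    }
}
```

With the input `{"red", "blue", "red", "red"}`, only two values are stored while four pointers are kept, which is the trade-off the dereferenced formats make.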
Data Types:
- Header --> CodecHeader
- PackedType --> Byte
- MaxAddress, MinValue, DefaultValue --> Int64
- PackedStream, Addresses, Ordinals --> PackedInts
- ValueSize, NumValues --> Int32
- Float32 --> 32-bit float encoded with Float.floatToRawIntBits(float), then written as Int32
- Float64 --> 64-bit float encoded with Double.doubleToRawLongBits(double), then written as Int64
- TotalBytes --> VLong
- TotalVarBytes --> Int64
- LengthPrefix --> Length of the data value as VInt (maximum of 2 bytes)
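The Float32 and Float64 entries describe a two-step encoding: convert the floating-point value to its raw bit pattern, then write that pattern as a fixed-width integer. A minimal sketch using only the JDK follows. The class `RawFloatBitsSketch` is hypothetical, and `java.nio.ByteBuffer` (with its default big-endian order) stands in for the codec's own output stream, so the byte order shown here is an assumption of the sketch rather than a statement about the file format.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper showing raw-bits float encoding with plain JDK classes. */
final class RawFloatBitsSketch {
    /** Float32: raw IEEE 754 bits written as a 4-byte integer. */
    static byte[] encodeFloat32(float value) {
        return ByteBuffer.allocate(Integer.BYTES)
                .putInt(Float.floatToRawIntBits(value))
                .array();
    }

    /** Inverse of encodeFloat32. */
    static float decodeFloat32(byte[] encoded) {
        return Float.intBitsToFloat(ByteBuffer.wrap(encoded).getInt());
    }

    /** Float64: raw IEEE 754 bits written as an 8-byte integer. */
    static byte[] encodeFloat64(double value) {
        return ByteBuffer.allocate(Long.BYTES)
                .putLong(Double.doubleToRawLongBits(value))
                .array();
    }

    /** Inverse of encodeFloat64. */
    static double decodeFloat64(byte[] encoded) {
        return Double.longBitsToDouble(ByteBuffer.wrap(encoded).getLong());
    }
}
```

Because the value is stored as an integer of the same width, every Float32 occupies exactly 4 bytes and every Float64 exactly 8, which is what allows the fixed-size layouts to be addressed by docid.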
Notes:
- PackedType is 0 when compressed, 1 when the stream is written as 64-bit integers.
- Addresses stores pointers to the actual byte location (indexed by docid). In the VAR_STRAIGHT case, each entry can have a different length, so to determine the length, the address for docid+1 is retrieved as well. A sentinel address is written at the end for the VAR_STRAIGHT case, so the Addresses stream contains maxdoc+1 indices. For the deduplicated VAR_DEREF case, each length is encoded as a prefix to the data itself as a VInt (maximum of 2 bytes).
- Ordinals stores the term ID in sorted order (indexed by docid). In the FIXED_SORTED case, the address into the .dat can be computed from the ordinal as Header+ValueSize+(ordinal*ValueSize) because the byte length is fixed. In the VAR_SORTED case, there is double indirection (docid -> ordinal -> address), but an additional sentinel ordinal+address is always written (so there are NumValues+1 ordinals). To determine the length, ord+1's address is looked up as well.
- BYTES_VAR_STRAIGHT, in contrast to other straight variants, uses a .idx file to improve lookup performance. In contrast to BYTES_VAR_DEREF, it doesn't apply deduplication of the document values.
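The address arithmetic in these notes can be sketched as plain Java. Everything here is an illustration under stated assumptions: the class `AddressSketch` and its methods are hypothetical, addresses and ordinals are modelled as already-decoded arrays rather than PackedInts streams, and the first `ValueSize` term in `Header+ValueSize+(ordinal*ValueSize)` is read as the 4 bytes occupied by the ValueSize field itself (an Int32), which is an interpretation and not something the notes state outright.

```java
/** Hypothetical helpers for the lookups described in the notes; not Lucene code. */
final class AddressSketch {
    /**
     * FIXED_SORTED: skip the header, skip the Int32 ValueSize field (assumed 4 bytes),
     * then step over `ordinal` fixed-size values.
     */
    static long fixedSortedAddress(long headerLength, int valueSize, long ordinal) {
        return headerLength + Integer.BYTES + ordinal * (long) valueSize;
    }

    /**
     * VAR_STRAIGHT: addresses holds maxdoc+1 entries, the last being the sentinel.
     * Returns {start, length} for the document's value.
     */
    static long[] varStraightSlice(long[] addresses, int docId) {
        long start = addresses[docId];
        long length = addresses[docId + 1] - start;
        return new long[] {start, length};
    }

    /**
     * VAR_SORTED: double indirection, docid -> ordinal -> address.
     * The address after the last real value acts as the sentinel used for the length.
     */
    static long[] varSortedSlice(int[] ordinals, long[] addresses, int docId) {
        int ord = ordinals[docId];
        long start = addresses[ord];
        long length = addresses[ord + 1] - start;
        return new long[] {start, length};
    }
}
```

For example, with a hypothetical 20-byte header and 8-byte values, ordinal 3 resolves to offset 20 + 4 + 24. The sentinel entry is what lets the last document (or last ordinal) compute its length without a special case.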
Limitations:
- Binary doc values can be at most MAX_BINARY_FIELD_LENGTH in length.
Field Summary
Fields
| Modifier and Type | Field | Description |
| --- | --- | --- |
| static int | MAX_BINARY_FIELD_LENGTH | Deprecated. Maximum length for each binary doc values field. |
Constructor Summary
Constructors
| Constructor | Description |
| --- | --- |
| Lucene40DocValuesFormat() | Deprecated. Sole constructor. |
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| DocValuesConsumer | fieldsConsumer(SegmentWriteState state) | Deprecated. Returns a DocValuesConsumer to write docvalues to the index. |
| DocValuesProducer | fieldsProducer(SegmentReadState state) | Deprecated. Returns a DocValuesProducer to read docvalues from the index. |
Methods inherited from class org.apache.lucene.codecs.DocValuesFormat
availableDocValuesFormats, forName, getName, reloadDocValuesFormats, toString
Field Detail
MAX_BINARY_FIELD_LENGTH
public static final int MAX_BINARY_FIELD_LENGTH
Deprecated. Maximum length for each binary doc values field.
See Also:
- Constant Field Values
Method Detail
fieldsConsumer
public DocValuesConsumer fieldsConsumer(SegmentWriteState state) throws java.io.IOException
Deprecated. Description copied from class: DocValuesFormat
Returns a DocValuesConsumer to write docvalues to the index.
Specified by:
- fieldsConsumer in class DocValuesFormat
Throws:
- java.io.IOException
fieldsProducer
public DocValuesProducer fieldsProducer(SegmentReadState state) throws java.io.IOException
Deprecated. Description copied from class: DocValuesFormat
Returns a DocValuesProducer to read docvalues from the index.
NOTE: by the time this call returns, it must hold open any files it will need to use; otherwise, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances an IOException should be thrown by the implementation. IOExceptions are expected and will automatically cause a retry of the segment opening logic with the newly revised segments.
Specified by:
- fieldsProducer in class DocValuesFormat
Throws:
- java.io.IOException