Class Lucene40LiveDocsFormat


  • public class Lucene40LiveDocsFormat
    extends LiveDocsFormat
    Lucene 4.0 Live Documents Format.

    The .del file is optional, and only exists when a segment contains deletions.

    Although the file is per-segment, it is stored outside compound segment files.

    Deletions (.del) --> Format, Header, ByteCount, BitCount, Bits | DGaps (depending on Format)

    • Format, ByteCount, BitCount --> Uint32
    • Bits --> <Byte>^ByteCount
    • DGaps --> <DGap, NonOnesByte>^NonOnesBytesCount
    • DGap --> VInt
    • NonOnesByte --> Byte
    • Header --> CodecHeader

    Format is 1, which indicates that DGaps are encoded over cleared bits.

    ByteCount indicates the number of bytes in Bits. It is typically (SegSize/8)+1.

    BitCount indicates the number of bits that are currently set in Bits.
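
    As a rough illustration of the two counts above, the following sketch computes a typical ByteCount from a segment size and a BitCount from an in-memory byte array. The class and method names are hypothetical and not part of Lucene, and the sketch assumes that any padding bits in the last byte are zero.

```java
// Hypothetical helper, not part of Lucene: illustrates ByteCount and BitCount.
class LiveDocsCounts {
    // Typical ByteCount for a segment holding segSize documents: (SegSize/8)+1.
    static int typicalByteCount(int segSize) {
        return (segSize / 8) + 1;
    }

    // BitCount: the number of bits currently set in Bits.
    // Assumes unused padding bits in the last byte are zero.
    static int bitCount(byte[] bits) {
        int count = 0;
        for (byte b : bits) {
            // Mask to 0..255 so that sign extension does not add extra ones.
            count += Integer.bitCount(b & 0xFF);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] bits = {(byte) 0x00, (byte) 0x02};
        System.out.println("ByteCount for 8000 docs: " + typicalByteCount(8000));
        System.out.println("BitCount: " + bitCount(bits));
    }
}
```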

    Bits contains one bit for each document indexed. When the bit corresponding to a document number is cleared, that document is marked as deleted. Bit ordering is from least to most significant. Thus, if Bits contains two bytes, 0x00 and 0x02, then document 9 is marked as alive (not deleted).
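
    The bit ordering described above can be sketched as a small lookup. This is a minimal illustration that assumes Bits has already been read into a byte array; the class and method names are hypothetical and not part of Lucene.

```java
// Hypothetical helper, not part of Lucene: tests whether a document is live.
class LiveDocsBits {
    // Returns true if the bit for docId is set, meaning the document is live.
    static boolean isLive(byte[] bits, int docId) {
        int byteIndex = docId >> 3; // docId / 8 selects the byte
        int bitIndex = docId & 7;   // docId % 8, counted from the least significant bit
        return (bits[byteIndex] & (1 << bitIndex)) != 0;
    }

    public static void main(String[] args) {
        // The two-byte example from the text: 0x00 followed by 0x02.
        byte[] bits = {(byte) 0x00, (byte) 0x02};
        System.out.println("doc 9 live: " + isLive(bits, 9));
        System.out.println("doc 8 live: " + isLive(bits, 8));
    }
}
```

    With the two bytes from the text, only document 9 is reported as live; documents 0 through 8 and 10 through 15 are deleted.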

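    DGaps represents sparse bit-vectors more efficiently than Bits. A nonOnes byte is a byte of Bits other than 0xFF, that is, a byte with at least one cleared bit. DGaps consists of pairs: a DGap, which is the difference between the index of a nonOnes byte and the index of the previous nonOnes byte, followed by the nonOnes byte itself. The number of nonOnes bytes in Bits (NonOnesBytesCount) is not stored.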

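    For example, if there are 8000 bits and only bits 10, 12, and 32 are cleared, DGaps would be used. The only nonOnes bytes are byte 1 (bits 8-15, with bits 10 and 12 cleared) and byte 4 (bits 32-39, with bit 32 cleared), so the stored pairs are:

    (VInt) 1 , (byte) 0xEB , (VInt) 3 , (byte) 0xFE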
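
    The DGaps idea can be sketched as follows. This is an illustration only: it keeps the (DGap, NonOnesByte) pairs in memory as int pairs instead of writing VInts and bytes to a file, it omits Format, Header, ByteCount, and BitCount, and all class and method names are hypothetical and not part of Lucene. Because the stored bytes are the nonOnes bytes of Bits itself, cleared bits appear as zeros in them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of Lucene: in-memory DGaps encoding.
class DGapsSketch {
    // Emits one {gap, byte} pair for every byte of bits that is not 0xFF.
    static List<int[]> encode(byte[] bits) {
        List<int[]> pairs = new ArrayList<>();
        int last = 0; // index of the previous nonOnes byte (0 before the first)
        for (int i = 0; i < bits.length; i++) {
            int b = bits[i] & 0xFF;
            if (b != 0xFF) {
                pairs.add(new int[] {i - last, b});
                last = i;
            }
        }
        return pairs;
    }

    // Rebuilds Bits: start from all ones, then overwrite the recorded bytes.
    static byte[] decode(List<int[]> pairs, int byteCount) {
        byte[] bits = new byte[byteCount];
        Arrays.fill(bits, (byte) 0xFF);
        int index = 0;
        for (int[] pair : pairs) {
            index += pair[0];
            bits[index] = (byte) pair[1];
        }
        return bits;
    }

    // Clears the bit for docId, marking that document as deleted.
    static void clear(byte[] bits, int docId) {
        bits[docId >> 3] &= (byte) ~(1 << (docId & 7));
    }

    public static void main(String[] args) {
        byte[] bits = new byte[1000]; // exactly 8000 bits, chosen for illustration
        Arrays.fill(bits, (byte) 0xFF);
        clear(bits, 10);
        clear(bits, 12);
        clear(bits, 32);
        for (int[] pair : encode(bits)) {
            System.out.printf("(VInt) %d , (byte) 0x%02X%n", pair[0], pair[1]);
        }
    }
}
```

    For the 8000-bit example, the sketch emits two pairs with gaps 1 and 3, and decoding those pairs with a byte count of 1000 reproduces the original array.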