Package org.apache.lucene.util
Class BytesRefHash
- java.lang.Object
-
- org.apache.lucene.util.BytesRefHash
-
public final class BytesRefHash extends java.lang.Object

BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. It maintains mappings from byte arrays to ids (Map<BytesRef,int>), storing the hashed bytes efficiently in contiguous storage. The mapping to the id is encapsulated inside BytesRefHash, and the id is guaranteed to increase for each added BytesRef.

Note: A BytesRef instance passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2. The internal storage is limited to 2GB total byte storage.
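The contract described above (byte arrays mapped to increasing ids, with the bytes kept back to back in one buffer) can be illustrated with a small toy class. This is only a sketch built on the JDK: ToyBytesHash and all of its members are hypothetical names, and it does not reflect how Lucene actually uses ByteBlockPool or BytesStartArray internally.

```java
import java.util.Arrays;

// Toy illustration only: NOT Lucene's implementation. All names are hypothetical.
final class ToyBytesHash {
    private byte[] pool = new byte[64];  // every added value, stored back to back
    private int poolUsed = 0;
    private int[] starts = new int[8];   // starts[id]  = offset of value `id` in pool
    private int[] lengths = new int[8];  // lengths[id] = length of value `id`
    private int[] slots = new int[16];   // open-addressing table of ids, -1 = empty
    private int count = 0;

    ToyBytesHash() {
        Arrays.fill(slots, -1);
    }

    int size() {
        return count;
    }

    // Returns the new id, or -(id)-1 if the bytes were already present.
    int add(byte[] bytes) {
        int slot = findSlot(bytes);
        if (slots[slot] != -1) {
            return -(slots[slot]) - 1;
        }
        if (poolUsed + bytes.length > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, poolUsed + bytes.length));
        }
        if (count == starts.length) {
            starts = Arrays.copyOf(starts, count * 2);
            lengths = Arrays.copyOf(lengths, count * 2);
        }
        System.arraycopy(bytes, 0, pool, poolUsed, bytes.length);
        starts[count] = poolUsed;
        lengths[count] = bytes.length;
        poolUsed += bytes.length;
        slots[slot] = count;
        int id = count++;
        if (count * 2 > slots.length) {
            rehash(); // keep the table at most half full so probing always terminates
        }
        return id;
    }

    // Returns the id of the bytes, or -1 if there is no mapping.
    int find(byte[] bytes) {
        return slots[findSlot(bytes)];
    }

    // Returns a copy of the bytes stored for the given id.
    byte[] get(int id) {
        return Arrays.copyOfRange(pool, starts[id], starts[id] + lengths[id]);
    }

    // Linear probing: stops at the slot holding these bytes or at the first empty slot.
    private int findSlot(byte[] bytes) {
        int mask = slots.length - 1;
        int slot = Arrays.hashCode(bytes) & mask;
        while (slots[slot] != -1 && !matches(slots[slot], bytes)) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    private boolean matches(int id, byte[] bytes) {
        if (lengths[id] != bytes.length) {
            return false;
        }
        for (int i = 0; i < bytes.length; i++) {
            if (pool[starts[id] + i] != bytes[i]) {
                return false;
            }
        }
        return true;
    }

    private void rehash() {
        slots = new int[slots.length * 2];
        Arrays.fill(slots, -1);
        for (int id = 0; id < count; id++) {
            slots[findSlot(get(id))] = id;
        }
    }
}
```

The first value added receives id 0, the next new value id 1, and so on; adding a value a second time leaves the size unchanged and returns the encoded existing id.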
-
-
Nested Class Summary
Nested Classes (modifier and type, class, description)
static class BytesRefHash.BytesStartArray
    Manages allocation of the per-term addresses.
static class BytesRefHash.DirectBytesStartArray
    A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance.
static class BytesRefHash.MaxBytesLengthExceededException
-
Field Summary
Fields (modifier and type, field, description)
static int DEFAULT_CAPACITY
-
Constructor Summary
Constructors (constructor, description)
BytesRefHash()
BytesRefHash(ByteBlockPool pool)
    Creates a new BytesRefHash.
BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
    Creates a new BytesRefHash.
-
Method Summary
All Methods, Instance Methods, Concrete Methods (modifier and type, method, description)
int add(BytesRef bytes)
    Adds a new BytesRef.
int add(BytesRef bytes, int code)
    Adds a new BytesRef with a pre-calculated hash code.
int addByPoolOffset(int offset)
    Adds an "arbitrary" int offset instead of a BytesRef term.
int byteStart(int bytesID)
    Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
void clear()
void clear(boolean resetPool)
void close()
    Closes the BytesRefHash and releases all internally used memory.
int find(BytesRef bytes)
    Returns the id of the given BytesRef.
int find(BytesRef bytes, int code)
    Returns the id of the given BytesRef with a pre-calculated hash code.
BytesRef get(int bytesID, BytesRef ref)
    Populates and returns a BytesRef with the bytes for the given bytesID.
void reinit()
    Reinitializes the BytesRefHash after a previous clear() call.
int size()
    Returns the number of BytesRef values in this BytesRefHash.
int[] sort(java.util.Comparator<BytesRef> comp)
    Returns the values array sorted by the referenced byte values.
-
-
-
Field Detail
-
DEFAULT_CAPACITY
public static final int DEFAULT_CAPACITY
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
BytesRefHash
public BytesRefHash()
-
BytesRefHash
public BytesRefHash(ByteBlockPool pool)
Creates a new BytesRefHash.
-
BytesRefHash
public BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
Creates a new BytesRefHash.
-
-
Method Detail
-
size
public int size()
Returns the number of BytesRef values in this BytesRefHash.
- Returns:
- the number of BytesRef values in this BytesRefHash.
-
get
public BytesRef get(int bytesID, BytesRef ref)
Populates and returns a BytesRef with the bytes for the given bytesID.
Note: the given bytesID must be a non-negative integer less than the current size (size()).
- Parameters:
- bytesID - the id
- ref - the BytesRef to populate
- Returns:
- the given BytesRef instance populated with the bytes for the given bytesID
-
sort
public int[] sort(java.util.Comparator<BytesRef> comp)
Returns the values array sorted by the referenced byte values.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
- Parameters:
- comp - the Comparator used for sorting
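The idea of returning ids ordered by the bytes they reference can be sketched with plain JDK types. This is an illustration under assumptions: SortIdsSketch, sortedIds and UNSIGNED_BYTES are hypothetical names, the values are modeled as a byte[][] indexed by id rather than a BytesRefHash, and unsigned lexicographic order is just one plausible comparator. Unlike the method documented above, this sketch does not modify its input.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: orders ids by the byte values they refer to.
final class SortIdsSketch {
    // Unsigned lexicographic byte order; shorter arrays sort first on a shared prefix.
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // values[id] holds the bytes for that id; the result lists ids in sorted order.
    static int[] sortedIds(byte[][] values, Comparator<byte[]> comp) {
        Integer[] ids = new Integer[values.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, (x, y) -> comp.compare(values[x], values[y]));
        int[] result = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            result[i] = ids[i];
        }
        return result;
    }
}
```

Walking the returned array and looking up each id yields the values in comparator order.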
-
clear
public void clear(boolean resetPool)
-
clear
public void clear()
-
close
public void close()
Closes the BytesRefHash and releases all internally used memory.
-
add
public int add(BytesRef bytes)
Adds a new BytesRef.
- Parameters:
- bytes - the bytes to hash
- Returns:
- the id the given bytes are hashed to if there was no mapping for the given bytes, otherwise (-(id)-1). This guarantees that the return value will always be >= 0 if the given bytes haven't been hashed before.
- Throws:
- BytesRefHash.MaxBytesLengthExceededException - if the given bytes are longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2
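The return-value encoding for add can be decoded with two small helpers. This is a sketch only: AddResultSketch, wasAlreadyPresent and idOf are hypothetical names that are not part of BytesRefHash; they simply invert the (-(id)-1) encoding documented for the return value.

```java
// Hypothetical helpers for interpreting the int returned by add(...).
final class AddResultSketch {
    // A negative result means the bytes had already been hashed.
    static boolean wasAlreadyPresent(int addResult) {
        return addResult < 0;
    }

    // Recovers the id in both cases: new entries return the id itself,
    // existing entries return -(id)-1, which -(result)-1 maps back to id.
    static int idOf(int addResult) {
        return addResult >= 0 ? addResult : -addResult - 1;
    }
}
```

Because ids start at 0, a plain negation could not distinguish "new id 0" from "existing id 0"; the extra -1 keeps every existing-entry result strictly negative.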
-
add
public int add(BytesRef bytes, int code)
Adds a new BytesRef with a pre-calculated hash code.
- Parameters:
- bytes - the bytes to hash
- code - the bytes hash code

Hashcode is defined as:

    int hash = 0;
    for (int i = offset; i < offset + length; i++) {
      hash = 31 * hash + bytes[i];
    }

- Returns:
- the id the given bytes are hashed to if there was no mapping for the given bytes, otherwise (-(id)-1). This guarantees that the return value will always be >= 0 if the given bytes haven't been hashed before.
- Throws:
- BytesRefHash.MaxBytesLengthExceededException - if the given bytes are longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2
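The hash code definition given for the code parameter can be wrapped in a standalone method for experimentation. This is a minimal sketch: BytesHashCodeSketch and hash are hypothetical names, and the bytes, offset and length parameters are assumed to correspond to the fields of the BytesRef being added.

```java
// Hypothetical wrapper around the hash code definition documented for add(BytesRef, int).
final class BytesHashCodeSketch {
    static int hash(byte[] bytes, int offset, int length) {
        int hash = 0;
        for (int i = offset; i < offset + length; i++) {
            // bytes[i] is a signed Java byte, so values >= 0x80 contribute negative terms;
            // int overflow wraps around, as in any 31-based polynomial hash.
            hash = 31 * hash + bytes[i];
        }
        return hash;
    }
}
```

For ASCII-only input this produces the same value as String.hashCode() on the corresponding text, since both use the 31-multiplier polynomial.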
-
find
public int find(BytesRef bytes)
Returns the id of the given BytesRef.
- See Also:
- find(BytesRef, int)
-
find
public int find(BytesRef bytes, int code)
Returns the id of the given BytesRef with a pre-calculated hash code.
- Parameters:
- bytes - the bytes to look for
- code - the bytes hash code
- Returns:
- the id of the given bytes, or -1 if there is no mapping for the given bytes.
-
addByPoolOffset
public int addByPoolOffset(int offset)
Adds an "arbitrary" int offset instead of a BytesRef term. This is used in the indexer to hold the hash for term vectors, because they do not redundantly store the byte[] term directly and instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.
-
reinit
public void reinit()
Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.
-
byteStart
public int byteStart(int bytesID)
Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
- Parameters:
- bytesID - the id to look up
- Returns:
- the bytesStart offset into the internally used ByteBlockPool for the given id
-
-