Class NumericUtils
- java.lang.Object
-
- org.apache.lucene.util.NumericUtils
-
public final class NumericUtils extends java.lang.Object
This is a helper class to generate prefix-encoded representations for numerical values and supplies converters to represent float/double values as sortable integers/longs.
To quickly execute range queries in Apache Lucene, a range is divided recursively into multiple intervals for searching: the center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
This class generates terms to achieve this: first, the numerical integer values need to be converted to bytes. For that, integer values (32 bit or 64 bit) are made unsigned and the bits are converted to ASCII chars, 7 bits per char. The resulting byte[] is sortable like the original integer value (even using UTF-8 sort order). Each value is also prefixed (in the first char) by the shift value (number of bits removed) used during encoding.
To also index floating point numbers, this class supplies two methods to convert them to integer values by changing their bit layout: doubleToSortableLong(double) and floatToSortableInt(float). You will have no precision loss by converting floating point numbers to integers and back (although the integer form itself is not usable as a number). Other data types like dates can easily be converted to longs or ints (e.g. date to long: Date.getTime()).
For easy usage, the trie algorithm is implemented for indexing inside NumericTokenStream, which can index int, long, float, and double. For querying, NumericRangeQuery and NumericRangeFilter implement the query part for the same data types.
This class can also be used to generate lexicographically sortable (according to BytesRef.getUTF8SortedAsUTF16Comparator()) representations of numeric data types for other usages (e.g. sorting).
- Since:
- 2.9, API changed in a non-backwards-compatible way in 4.0
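The encoding described above can be illustrated with a small standalone sketch. It does not use Lucene: the class name, the method names, and the SHIFT_START value are assumptions made for illustration only. It shows a long being made unsigned, split into 7-bit groups, and prefixed with its shift, and it checks that unsigned byte order then matches numeric order.

```java
final class PrefixCodingSketch {
    // Hypothetical marker added to the shift in the first byte (an assumption of this sketch).
    static final int SHIFT_START = 0x20;

    /** Encodes val with the lowest shift bits removed: shift in the first byte, then 7 bits per byte. */
    static byte[] encode(long val, int shift) {
        if (shift < 0 || shift > 63) {
            throw new IllegalArgumentException("shift must be 0..63");
        }
        int groups = (63 - shift) / 7 + 1;   // 7-bit groups needed for the remaining bits
        byte[] out = new byte[groups + 1];   // one extra byte for the shift prefix
        out[0] = (byte) (SHIFT_START + shift);
        // Flipping the sign bit makes signed values order like unsigned ones.
        long bits = (val ^ 0x8000000000000000L) >>> shift;
        for (int i = groups; i > 0; i--) {   // fill from the least significant group
            out[i] = (byte) (bits & 0x7f);
            bits >>>= 7;
        }
        return out;
    }

    /** Compares two byte arrays lexicographically, treating bytes as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long[] ascending = {Long.MIN_VALUE, -5L, 0L, 42L, Long.MAX_VALUE};
        for (int i = 1; i < ascending.length; i++) {
            int cmp = compareBytes(encode(ascending[i - 1], 0), encode(ascending[i], 0));
            System.out.println(ascending[i - 1] + " sorts before " + ascending[i] + ": " + (cmp < 0));
        }
    }
}
```

Because every byte stays below 0x80, the result is plain ASCII, which is why the ordering also holds under UTF-8 comparison.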
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
static class NumericUtils.IntRangeBuilder
static class NumericUtils.LongRangeBuilder
-
Field Summary
Fields (Modifier and Type, Field, Description)
static int BUF_SIZE_INT: The maximum term length (used for byte[] buffer size) for encoding int values.
static int BUF_SIZE_LONG: The maximum term length (used for byte[] buffer size) for encoding long values.
static int PRECISION_STEP_DEFAULT: The default precision step used by IntField, FloatField, LongField, DoubleField, NumericTokenStream, NumericRangeQuery, and NumericRangeFilter.
static byte SHIFT_START_INT: Integers are stored at lower precision by shifting off lower bits.
static byte SHIFT_START_LONG: Longs are stored at lower precision by shifting off lower bits.
-
Method Summary
Methods (Modifier and Type, Method, Description)
static long doubleToSortableLong(double val): Converts a double value to a sortable signed long.
static TermsEnum filterPrefixCodedInts(TermsEnum termsEnum): Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.
static TermsEnum filterPrefixCodedLongs(TermsEnum termsEnum): Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.
static int floatToSortableInt(float val): Converts a float value to a sortable signed int.
static int getPrefixCodedIntShift(BytesRef val): Returns the shift value from a prefix encoded int.
static int getPrefixCodedLongShift(BytesRef val): Returns the shift value from a prefix encoded long.
static int intToPrefixCoded(int val, int shift, BytesRef bytes): Returns prefix coded bits after reducing the precision by shift bits.
static void intToPrefixCodedBytes(int val, int shift, BytesRef bytes): Returns prefix coded bits after reducing the precision by shift bits.
static int longToPrefixCoded(long val, int shift, BytesRef bytes): Returns prefix coded bits after reducing the precision by shift bits.
static void longToPrefixCodedBytes(long val, int shift, BytesRef bytes): Returns prefix coded bits after reducing the precision by shift bits.
static int prefixCodedToInt(BytesRef val): Returns an int from prefixCoded bytes.
static long prefixCodedToLong(BytesRef val): Returns a long from prefixCoded bytes.
static float sortableIntToFloat(int val): Converts a sortable int back to a float.
static double sortableLongToDouble(long val): Converts a sortable long back to a double.
static void splitIntRange(NumericUtils.IntRangeBuilder builder, int precisionStep, int minBound, int maxBound): Splits an int range recursively.
static void splitLongRange(NumericUtils.LongRangeBuilder builder, int precisionStep, long minBound, long maxBound): Splits a long range recursively.
-
-
-
Field Detail
-
PRECISION_STEP_DEFAULT
public static final int PRECISION_STEP_DEFAULT
The default precision step used by IntField, FloatField, LongField, DoubleField, NumericTokenStream, NumericRangeQuery, and NumericRangeFilter.
- See Also:
- Constant Field Values
-
SHIFT_START_LONG
public static final byte SHIFT_START_LONG
Longs are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_LONG+shift in the first byte.
- See Also:
- Constant Field Values
-
BUF_SIZE_LONG
public static final int BUF_SIZE_LONG
The maximum term length (used for byte[] buffer size) for encoding long values.
-
SHIFT_START_INT
public static final byte SHIFT_START_INT
Integers are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_INT+shift in the first byte.
- See Also:
- Constant Field Values
-
BUF_SIZE_INT
public static final int BUF_SIZE_INT
The maximum term length (used for byte[] buffer size) for encoding int values.
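As a rough check on these buffer sizes, the following sketch derives a maximum term length from the layout given in the class description: 7 payload bits per byte plus one leading byte for the shift. The class and method names are hypothetical, and the result is a derivation under that assumption only. The authoritative values are the ones listed under Constant Field Values.

```java
final class BufferSizeSketch {
    /** Longest term for a value of the given bit width: ceil(bits / 7) payload bytes plus one shift byte. */
    static int maxTermLength(int valueBits) {
        int payloadBytes = (valueBits + 6) / 7; // integer ceiling of valueBits / 7
        return payloadBytes + 1;                // plus the prefix byte holding the shift
    }

    public static void main(String[] args) {
        System.out.println("int terms need at most " + maxTermLength(32) + " bytes");
        System.out.println("long terms need at most " + maxTermLength(64) + " bytes");
    }
}
```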
-
-
Method Detail
-
longToPrefixCoded
public static int longToPrefixCoded(long val, int shift, BytesRef bytes)
Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
- Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
- Returns:
- the hash code for indexing (TermsHash)
-
intToPrefixCoded
public static int intToPrefixCoded(int val, int shift, BytesRef bytes)
Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
- Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
- Returns:
- the hash code for indexing (TermsHash)
-
longToPrefixCodedBytes
public static void longToPrefixCodedBytes(long val, int shift, BytesRef bytes)
Writes prefix coded bits into bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
- Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
-
intToPrefixCodedBytes
public static void intToPrefixCodedBytes(int val, int shift, BytesRef bytes)
Writes prefix coded bits into bytes after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
- Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
-
getPrefixCodedLongShift
public static int getPrefixCodedLongShift(BytesRef val)
Returns the shift value from a prefix encoded long.
- Throws:
java.lang.NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
-
getPrefixCodedIntShift
public static int getPrefixCodedIntShift(BytesRef val)
Returns the shift value from a prefix encoded int.
- Throws:
java.lang.NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
-
prefixCodedToLong
public static long prefixCodedToLong(BytesRef val)
Returns a long from prefix coded bytes. Rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.
- Throws:
java.lang.NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
- See Also:
longToPrefixCodedBytes(long, int, org.apache.lucene.util.BytesRef)
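To illustrate why the rightmost bits come back as zero for lower precision codes, here is a standalone round-trip sketch. It does not call Lucene: the byte layout, the SHIFT_START value, and all names are assumptions made for illustration. The encode helper is included only so the example is self-contained.

```java
final class PrefixDecodingSketch {
    // Hypothetical marker added to the shift in the first byte (an assumption of this sketch).
    static final int SHIFT_START = 0x20;

    /** Minimal encoder: shift prefix byte followed by 7-bit groups, most significant first. */
    static byte[] encode(long val, int shift) {
        int groups = (63 - shift) / 7 + 1;
        byte[] out = new byte[groups + 1];
        out[0] = (byte) (SHIFT_START + shift);
        long bits = (val ^ 0x8000000000000000L) >>> shift;
        for (int i = groups; i > 0; i--) {
            out[i] = (byte) (bits & 0x7f);
            bits >>>= 7;
        }
        return out;
    }

    /** Reads the shift from the first byte and rejects values outside 0..63. */
    static int shiftOf(byte[] term) {
        int shift = term[0] - SHIFT_START;
        if (shift < 0 || shift > 63) {
            throw new NumberFormatException("invalid shift in prefix coded bytes: " + shift);
        }
        return shift;
    }

    /** Rebuilds the value; the bits removed by the shift are restored as zeros. */
    static long decode(byte[] term) {
        int shift = shiftOf(term);
        long bits = 0L;
        for (int i = 1; i < term.length; i++) {
            if (term[i] < 0) {
                throw new NumberFormatException("byte " + i + " is not a 7-bit value");
            }
            bits = (bits << 7) | term[i];
        }
        return (bits << shift) ^ 0x8000000000000000L; // undo the shift, then the sign flip
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(decode(encode(0x12345L, 0))));
        System.out.println(Long.toHexString(decode(encode(0x12345L, 8))));
    }
}
```

At shift 0 the value round-trips exactly. At shift 8 the low eight bits are lost during encoding, so decoding yields the value rounded down to a multiple of 256.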
-
prefixCodedToInt
public static int prefixCodedToInt(BytesRef val)
Returns an int from prefix coded bytes. Rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value.
- Throws:
java.lang.NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
- See Also:
intToPrefixCodedBytes(int, int, org.apache.lucene.util.BytesRef)
-
doubleToSortableLong
public static long doubleToSortableLong(double val)
Converts a double value to a sortable signed long. The value is converted by taking its IEEE 754 floating-point "double format" bit layout and then swapping some bits so that the result can be compared as a long. This does not reduce the precision, and the value can easily be used as a long. The sort order (including Double.NaN) is defined by Double.compareTo(java.lang.Double); NaN is greater than positive infinity.
- See Also:
sortableLongToDouble(long)
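One way to obtain such an ordering with only the JDK is sketched below. This is an illustration, not necessarily how NumericUtils does it: the class and method names are hypothetical, and the rule of inverting the lower 63 bits of negative values is an assumption of the sketch.

```java
final class SortableDoubleSketch {
    /** Maps a double to a long whose signed order follows Double.compare. */
    static long toSortableLong(double val) {
        long bits = Double.doubleToLongBits(val); // canonicalizes every NaN to a single bit pattern
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;          // reverse the order of negative values
        }
        return bits;
    }

    /** Inverse of toSortableLong. */
    static double fromSortableLong(long val) {
        if (val < 0) {
            val ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(val);
    }

    public static void main(String[] args) {
        double[] ascending = {Double.NEGATIVE_INFINITY, -1.5, -0.0, 0.0, 1.5, Double.POSITIVE_INFINITY, Double.NaN};
        for (int i = 1; i < ascending.length; i++) {
            boolean ordered = toSortableLong(ascending[i - 1]) < toSortableLong(ascending[i]);
            System.out.println(ascending[i - 1] + " < " + ascending[i] + " as sortable longs: " + ordered);
        }
    }
}
```

Positive values already compare correctly as raw bit patterns. Negative values compare in reverse, which is why only they are inverted.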
-
sortableLongToDouble
public static double sortableLongToDouble(long val)
Converts a sortable long back to a double.
- See Also:
doubleToSortableLong(double)
-
floatToSortableInt
public static int floatToSortableInt(float val)
Converts a float value to a sortable signed int. The value is converted by taking its IEEE 754 floating-point "float format" bit layout and then swapping some bits so that the result can be compared as an int. This does not reduce the precision, and the value can easily be used as an int. The sort order (including Float.NaN) is defined by Float.compareTo(java.lang.Float); NaN is greater than positive infinity.
- See Also:
sortableIntToFloat(int)
-
sortableIntToFloat
public static float sortableIntToFloat(int val)
Converts a sortable int back to a float.
- See Also:
floatToSortableInt(float)
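The claim that no precision is lost in the float round trip can be checked with a JDK-only sketch. The names are hypothetical, and the bit manipulation shown is an assumed technique rather than a statement about the library's code.

```java
final class SortableFloatSketch {
    /** Maps a float to an int whose signed order follows Float.compare. */
    static int toSortableInt(float val) {
        int bits = Float.floatToIntBits(val);
        if (bits < 0) {
            bits ^= 0x7fffffff; // reverse the order of negative values
        }
        return bits;
    }

    /** Inverse of toSortableInt. */
    static float fromSortableInt(int val) {
        if (val < 0) {
            val ^= 0x7fffffff;
        }
        return Float.intBitsToFloat(val);
    }

    public static void main(String[] args) {
        float[] samples = {-3.25f, -0.0f, 0.0f, 1e-10f, Float.MAX_VALUE};
        for (float f : samples) {
            float back = fromSortableInt(toSortableInt(f));
            // Compare bit patterns so that -0.0f and 0.0f are told apart.
            System.out.println(f + " round-trips exactly: " + (Float.floatToIntBits(f) == Float.floatToIntBits(back)));
        }
    }
}
```

The conversion only rearranges bits, so every non-NaN value maps back to the identical bit pattern. NaN values with different payloads collapse to one canonical NaN because Float.floatToIntBits canonicalizes them.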
-
splitLongRange
public static void splitLongRange(NumericUtils.LongRangeBuilder builder, int precisionStep, long minBound, long maxBound)
Splits a long range recursively. You may implement a builder that adds clauses to a BooleanQuery for each call to its NumericUtils.LongRangeBuilder.addRange(BytesRef,BytesRef) method.
This method is used by NumericRangeQuery.
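The splitting idea can be sketched without Lucene as follows. This is a simplified illustration, not the library's implementation: RangeSplitSketch, SubRange, and the list-based result are hypothetical stand-ins for the builder callback, and the bounds are plain long values rather than encoded BytesRef terms. At each precision level the ragged edges of the range are emitted at that level, and the aligned middle moves on to the next, coarser level.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSplitSketch {
    /** A sub-range [min, max] that can be matched at the given shift (the lowest shift bits are ignored). */
    static final class SubRange {
        final long min;
        final long max;
        final int shift;
        SubRange(long min, long max, int shift) { this.min = min; this.max = max; this.shift = shift; }
    }

    static List<SubRange> split(int precisionStep, long minBound, long maxBound) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<SubRange> out = new ArrayList<>();
        if (minBound > maxBound) {
            return out;
        }
        for (int shift = 0; ; shift += precisionStep) {
            long lowBits = (1L << shift) - 1L;                  // bits ignored at this precision
            long diff = 1L << (shift + precisionStep);          // block size at the next precision
            long mask = ((1L << precisionStep) - 1L) << shift;  // bits decided at this precision
            boolean hasLower = (minBound & mask) != 0L;         // lower edge not aligned to a block
            boolean hasUpper = (maxBound & mask) != mask;       // upper edge not aligned to a block
            long nextMin = (hasLower ? minBound + diff : minBound) & ~mask;
            long nextMax = (hasUpper ? maxBound - diff : maxBound) & ~mask;
            boolean wrapped = nextMin < minBound || nextMax > maxBound;
            if (shift + precisionStep >= 64 || nextMin > nextMax || wrapped) {
                // No aligned middle is left: emit what remains at this precision and stop.
                out.add(new SubRange(minBound, maxBound | lowBits, shift));
                break;
            }
            if (hasLower) {
                out.add(new SubRange(minBound, minBound | mask | lowBits, shift));
            }
            if (hasUpper) {
                out.add(new SubRange(maxBound & ~mask, maxBound | lowBits, shift));
            }
            minBound = nextMin;
            maxBound = nextMax;
        }
        return out;
    }

    /** Checks that the parts tile [min, max] without gaps or overlaps. */
    static boolean coversExactly(List<SubRange> parts, long min, long max) {
        List<SubRange> sorted = new ArrayList<>(parts);
        sorted.sort((a, b) -> Long.compare(a.min, b.min));
        long next = min;
        for (SubRange r : sorted) {
            if (r.min != next || r.max < r.min) {
                return false;
            }
            next = r.max + 1;
        }
        return !sorted.isEmpty() && next == max + 1;
    }

    public static void main(String[] args) {
        for (SubRange r : split(4, 3L, 300L)) {
            System.out.println("[" + r.min + ", " + r.max + "] at shift " + r.shift);
        }
    }
}
```

In this sketch a builder that adds one clause per sub-range would receive only a few ranges even for wide intervals, because the middle of the interval is covered at a coarse precision.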
-
splitIntRange
public static void splitIntRange(NumericUtils.IntRangeBuilder builder, int precisionStep, int minBound, int maxBound)
Splits an int range recursively. You may implement a builder that adds clauses to a BooleanQuery for each call to its NumericUtils.IntRangeBuilder.addRange(BytesRef,BytesRef) method.
This method is used by NumericRangeQuery.
-
filterPrefixCodedLongs
public static TermsEnum filterPrefixCodedLongs(TermsEnum termsEnum)
Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.
- Parameters:
termsEnum - the terms enum to filter
- Returns:
- a filtered TermsEnum that only returns prefix coded 64 bit terms with a shift value of 0.
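Conceptually, accepting only terms with a shift value of 0 means keeping the full-precision terms and dropping the lower-precision ones. The sketch below shows that idea on raw byte arrays instead of a TermsEnum. The names and the SHIFT_START value are hypothetical, and the first byte is assumed to hold SHIFT_START plus the shift, as described for the prefix coded format.

```java
import java.util.ArrayList;
import java.util.List;

final class ShiftZeroFilterSketch {
    // Hypothetical marker added to the shift in the first byte (an assumption of this sketch).
    static final int SHIFT_START = 0x20;

    /** Keeps only the terms whose prefix byte encodes a shift of 0. */
    static List<byte[]> fullPrecisionOnly(List<byte[]> terms) {
        List<byte[]> accepted = new ArrayList<>();
        for (byte[] term : terms) {
            if (term.length > 0 && term[0] - SHIFT_START == 0) {
                accepted.add(term);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<byte[]> terms = new ArrayList<>();
        terms.add(new byte[] {SHIFT_START, 0x01});      // shift 0
        terms.add(new byte[] {SHIFT_START, 0x02});      // shift 0
        terms.add(new byte[] {SHIFT_START + 4, 0x00});  // shift 4, lower precision
        System.out.println(fullPrecisionOnly(terms).size() + " full-precision terms kept");
    }
}
```

Since the shift sits in the first byte, full-precision terms sort before all lower-precision terms of the same type, so a filter over sorted terms could also stop at the first term with a non-zero shift.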
-
filterPrefixCodedInts
public static TermsEnum filterPrefixCodedInts(TermsEnum termsEnum)
Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.
- Parameters:
termsEnum - the terms enum to filter
- Returns:
- a filtered TermsEnum that only returns prefix coded 32 bit terms with a shift value of 0.
-
-