Converting a 64-bit hex string to INT64 with BigQuery
I recently had the need to convert a 64-bit hex string to an INT64 value in a BigQuery
user defined function (UDF). I’ll post more about why later but for now lets just say
you have a sixteen character hex string such as 7FFFFFFFFFFFFFFF
and need it as an INT64
value. Easy you say, just prepend “0x” and do CAST("0x7FFFFFFFFFFFFFFF" AS INT64)
.
But what if you have something just a bit bigger, like 8000000000000000
? Let’s try that:
SELECT CAST("0x8000000000000000" AS INT64);
and we get the error:
Could not cast literal "0x8000000000000000" to type INT64
The problem is that INT64 is a signed value in BigQuery (and other Google cloud databases like
Spanner) with values ranging from -9,223,372,036,854,775,808
to 9,223,372,036,854,775,807
.
So we have to convert from the hex using two’s compliment. I recently wrote some code in Python to do this:
def uint64_to_signedInt64(v):
MAX_INT64 = 2**63-1
if v > MAX_INT64:
return (v & MAX_INT64) - (MAX_INT64 + 1)
else:
return v
But since we can’t create an INT64 from the hex string we break it into two 32-bit values. The most
significant is the first 8 characters of the hex string (remember each character in hex is 4-bits or a “nibble”).
The least significant is the last 8 characters. v & MAX_INT64
is emulated by taking the most significant 32-bits
and AND
-ing with 0x7fffffff
and shifting to the left 32 times then OR
-ing with the least significant 32-bits.
This gives us a pure-SQL UDF like so:
CREATE TEMP FUNCTION
hex64ToInt64(hex STRING)
RETURNS INT64 AS (
IF(hex < "8000000000000000",
cast(concat("0x", hex) AS INT64),
(SELECT (((ms32 & 0x7fffffff) << 32) | ls32) - 0x7fffffffffffffff - 1
FROM (SELECT cast(concat("0x", substr(hex, 1, 8)) AS INT64) AS ms32,
cast(concat("0x", substr(hex, 9, 8)) AS INT64) AS ls32))));
Now we can calculate select hex64ToInt64("8000000000000000")
which is the most negative signed INT64 number -9223372036854775808
and select hex64ToInt64("ffffffffffffffff")
is the least negative number -1
.
I’ll soon show you how I’ve used this function as part of a BigQuery emulation of Postgres’ HLL extension from Citus