An important piece of the process of transforming a Bitcoin private key into a public address, as outlined in the fantastic Mastering Bitcoin book, is the Base58Check encoding algorithm.
The Bitcoin wiki has a great article on Base58Check encoding, and even gives an example implementation of the underlying Base58 encoding algorithm in C.
This algorithm seems especially well-suited to Elixir, so I thought it’d be a fun and useful exercise to build out Base58
and Base58Check
modules to use in future Bitcoin and Elixir experiments.
Like Base64, but Less Confusing
Base58 is a binary-to-text encoding algorithm that’s designed to encode a blob of arbitrary binary data into human readable text, much like the more well known Base64 algorithm.
Unlike Base64 encoding, Bitcoin’s Base58 encoding algorithm omits characters that can be potentially confusing or ambiguous to a human reader. For example, the characters O
and 0
, or I
and l
can look similar or identical to some readers or users of certain fonts.
To avoid that ambiguity, Base58 simply removes those characters from its alphabet.
Shrinking the length of the alphabet we map our binary data onto from sixty four characters down to fifty eight characters means that we can’t simply group our binary into six-bit chunks and map each chunk onto its corresponding letter in our alphabet.
Instead, our Base58 encoding algorithm works by treating our binary as a single large number. We repeatedly divide that number by the size of our alphabet (fifty eight), and use the remainder of that division to map onto a character in our alphabet.
Implementing Base58 in Elixir
This kind of algorithm can neatly be expressed in Elixir. We’ll start by creating a Base58
module and adding our alphabet as a module attribute:
defmodule Base58 do
@alphabet '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
end
Inside our Base58
module, we’ll define an encode/2
function. If we pass encode
a binary, we want to convert it into a number using Erlang’s :binary.decode_unsigned
:
def encode(data, hash \\ "")
def encode(data, hash) when is_binary(data) do
encode(:binary.decode_unsigned(data), hash)
end
Once converted, we pass our binary-come-number into a recursive call to encode/2
along with the beginning of our hash, an empty string.
For each recursive call to encode/2
, we use div
and rem
to divide our number by 58
and find the reminder. We use that remainder to map into our @alphabet
, and prepend the resulting character onto our hash
:
def encode(data, hash) do
character = <<Enum.at(@alphabet, rem(data, 58))>>
encode(div(data, 58), hash <> character)
end
We’ll continue recursing until we’ve divided our data
down to 0
. In that case, we’ll return the hash
string we’ve built up:
def encode(0, hash), do: hash
This implementation of our Base58 encoded mostly works. We can encode any text string and receive correct results:
iex(1)> Base58.encode("hello")
"Cn8eVZg"
Encoding Leading Zeros
However when we try to encode binaries with leading zero bytes, those bytes vanish from our resulting hash:
iex(1)> Base58.encode(<<0x00>> <> "hello")
"Cn8eVZg"
That zero should become a leading "1"
in our resulting hash, but our process of converting the initial binary into a number is truncating those leading bytes. We’ll need to count those leading zeros, encode them manually, and prepend them to our final hash.
Let’s start by writing a function that counts the number of leading zeros in our initial binary:
defp leading_zeros(data) do
:binary.bin_to_list(data)
|> Enum.find_index(&(&1 != 0))
end
We use Erlang’s :binary.bin_to_list
to convert our binary into a list of bytes, and Enum.find_index
to find the first byte in our list that isn’t zero. This index value is equivalent to the number of leading zero bytes in our binary.
Next, we’ll write a function to manually encode those leading zeros:
defp encode_zeros(data) do
<<Enum.at(@alphabet, 0)>>
|> String.duplicate(leading_zeros(data))
end
We simply grab the character in our alphabet that maps to a zero byte ("1"
), and duplicate it as many times as we need.
Finally, we’ll update our initial encode/2
function to prepend these leading zeros onto our resulting hash:
def encode(data, hash) when is_binary(data) do
encode_zeros(data) <> encode(:binary.decode_unsigned(data), hash)
end
Now we should be able to encode binaries with leading zero bytes and see their resulting "1"
values in our final hash:
iex(1)> Base58.encode(<<0x00>> <> "hello")
"1Cn8eVZg"
Great!
Base58 + Checksum = Base58Check
Now that we have a working implementation of the Base58 encoding algorithm, we can implement our Base58Check algorithm!
Base58Check encoding is really just Base58 with an added checksum. This checksum is important to in the Bitcoin world to ensure that public addresses aren’t mistyped or corrupted before funds are exchanged.
At a high level, the process of Base58Check encoding a blob of binary data involves hashing that data, taking the first four bytes of the resulting hash and appending them to the end of the binary, and Base58 encoding the result.
We can implement Base58Check fairly easily using our newly written Base58
module. We’ll start by creating a new Base58Check
module:
defmodule Base58Check do
end
In our module, we’ll define a new encode/2
function that takes a version byte and the binary we want to encode:
def encode(version, data)
Bitcoin uses the version
byte to specify the type of address being encoded. A version byte of 0x00
means that we’re encoding a regular Bitcoin address to be used on the live Bitcoin network.
The first thing we’ll need to do is generate our checksum from our version
and our data
. We’ll do that in a new function:
defp checksum(version, data) do
version <> data
|> sha256
|> sha256
|> split
end
We concatenate our version
and data
binaries together, hash them twice using a sha256/1
helper function, and then returning the first four bytes of the resulting hash with a call to split/1
.
split/1
is a helper function that pulls the first four bytes out of the resulting hash using binary pattern matching:
defp split(<< hash :: bytes-size(4), _ :: bits >>), do: hash
Our sha256/1
helper function uses Erlang’s :crypto.hash
function to SHA-256 hash its argument:
defp sha256(data), do: :crypto.hash(:sha256, data)
We’ve wrapped this in a helper function to facilitate Elixir-style piping.
Now that we have our four-byte checksum, we can flesh out our original encode/2
function:
def encode(version, data) do
version <> data <> checksum(version, data)
|> Base58.encode
end
We concatenate our version
, data
, and the result of our checksum
function together, and Base58 encode the result. That’s it!
Base58Check encoding our "hello"
string with a version
of <<0x00>>
should give us a result of "12L5B5yqsf7vwb"
. We can go further and verify our implementation with an example pulled from the Bitcoin wiki:
iex(1)> Base58Check.encode(<<0x00>>,
<<0x01, 0x09, 0x66, 0x77, 0x60,
0x06, 0x95, 0x3D, 0x55, 0x67,
0x43, 0x9E, 0x5E, 0x39, 0xF8,
0x6A, 0x0D, 0x27, 0x3B, 0xEE>>)
"16UwLL9Risc3QfPqBUvKofHmBQ7wMtjvM"
Perfect!
Wrapping Up
If you’d like to see both modules in their full glory, I’ve included them in my hello_bitcoin
repository on Github. Here’s a direct link to the Base58
module, and the Base58Check
module, along with a simple unit test. If that repository looks familiar, it’s because it was used in a previous article on controlling a Bitcoin node with Elixir.
I highly suggest you read through Andreas Antonopoulos’ Mastering Bitcoin book if you’re at all interested in how the Bitcoin blockchain works, or Bitcoin development in general. His book has been my primary source of inspiration and information for every Bitcoin article I’ve written to date.