I’ve never had much of a reason to worry about the “endianness” of my binary data when working on Elixir projects. For the most part, everything within an application will be internally consistent, and everything pulled in from external sources will be converted to the machine’s native ordering several layers of abstraction below where I tend to work.
That blissful ignorance came to an end when I found myself using Elixir to construct packets conforming to the Bitcoin peer-to-peer network protocol.
The Bitcoin Protocol
The Bitcoin protocol is a TCP-based protocol used by Bitcoin nodes to communicate over a peer-to-peer ad hoc network.
The real-world specifications of the protocol are defined to be “whatever the reference client does,” but this can be difficult to tease out from the code. Thankfully, the Bitcoin wiki maintains a fantastic technical description of the protocol.
The structures used throughout the protocol are a mishmash of endianness. As the wiki explains, “almost all integers are encoded in little endian,” but many other fields like checksums, strings, network addresses, and ports are expected to be big endian.
The net_addr structure is an excellent example of this endianness confusion. Both time and services are expected to be little endian encoded, but the IPv6/4 and port fields are expected to be big endian encoded.
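To make the two byte orders concrete, here is a small sketch that encodes the same integer both ways. The value 8333 (Bitcoin's default mainnet port) is only an illustrative choice, and the variable names are made up for this example.

```elixir
# 8333 is 0x208D in hexadecimal (illustrative value only).
port = 8333

# Big endian: most significant byte first.
big = :binary.encode_unsigned(port, :big)

# Little endian: least significant byte first.
little = :binary.encode_unsigned(port, :little)

IO.inspect(big, binaries: :as_binaries, base: :hex)
IO.inspect(little, binaries: :as_binaries, base: :hex)
```

The big endian binary holds the bytes 0x20 then 0x8D, while the little endian binary holds the same two bytes in the opposite order.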
How will we build this with Elixir?
First Attempt
My first attempt at constructing this net_addr binary structure was to create a net_addr function that accepts time, services, ip, and port arguments and returns a binary of the final structure in correct mixed-endian order.
def net_addr(time, services, ip, port) do
end
When manually constructing binaries, Elixir defaults to a big endian byte order. This means that I’d need to convert time and services into little endian byte order before adding them to the final binary.
My first attempt at endian conversion was to create a reverse/1 helper function that would take a binary, transform it into a list of bytes using :binary.bin_to_list, reverse that list of bytes, transform it back into a binary using :binary.list_to_bin, and return the result:
def reverse(binary) do
  binary
  |> :binary.bin_to_list()
  |> Enum.reverse()
  |> :binary.list_to_bin()
end
Before I could pass time and services into reverse/1, I needed to transform them into binaries first. Thankfully, this is easy with Elixir’s binary special form.
For example, we can convert time into a four byte (32 bit) big endian binary and then reverse it to create its corresponding little endian representation:
reverse(<<time::32>>)
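As a worked example of that call, the sketch below wraps the list-based helper in a hypothetical ReverseSketch module and runs it on a sample timestamp. The module name and the timestamp value are assumptions made purely for illustration.

```elixir
defmodule ReverseSketch do
  # The list-based helper described above, placed in a hypothetical module.
  def reverse(binary) do
    binary
    |> :binary.bin_to_list()
    |> Enum.reverse()
    |> :binary.list_to_bin()
  end
end

# Sample Unix timestamp (0x59682F00), chosen only for illustration.
time = 1_500_000_000

# Elixir's default: a 32-bit big endian integer.
big = <<time::32>>

# Reversing the bytes yields the little endian layout.
little = ReverseSketch.reverse(big)

IO.inspect(big, binaries: :as_binaries, base: :hex)
IO.inspect(little, binaries: :as_binaries, base: :hex)
```

Here big holds the bytes 0x59, 0x68, 0x2F, 0x00, and little holds the same four bytes in reverse order.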
Using our helper, we can create our final net_addr binary:
<<
  <<time::32>> |> reverse()::binary,
  <<services::64>> |> reverse()::binary,
  :binary.decode_unsigned(ip)::128,
  port::16
>>
This works, but there’s some room for improvement.
A Faster Second Attempt
After doing some research, I discovered this set of benchmarks for several different techniques of reversing a binary in Elixir (thanks Evadne Wu!).
I realized that I could significantly improve the performance of my packet construction process by replacing my slow list-based solution with a solution that leverages the optional Endianness argument of :binary.decode_unsigned/2 and :binary.encode_unsigned/2:
def reverse(binary) do
  binary
  |> :binary.decode_unsigned(:little)
  |> :binary.encode_unsigned(:big)
end
While this was an improvement, I still wasn’t happy with my solution. Using my reverse/1 function meant that I had to transform my numbers into a binary before reversing them and ultimately concatenating them into the final binary. This nested binary structure was awkward and confusing.
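One caveat is worth checking before relying on the decode/encode version of reverse/1. :binary.encode_unsigned/2 emits the smallest binary that can hold the integer, so an input that ends in zero bytes comes back shorter than it went in. The sketch below demonstrates this with a sample timestamp and shows one possible width-preserving variant. The module name, the reverse_fixed/1 helper, and the timestamp are all hypothetical.

```elixir
defmodule FastReverseSketch do
  # The decode/encode helper from above, placed in a hypothetical module.
  def reverse(binary) do
    binary
    |> :binary.decode_unsigned(:little)
    |> :binary.encode_unsigned(:big)
  end

  # Hypothetical variant: write the integer back out at the input's
  # original bit size so that zero bytes are not dropped.
  def reverse_fixed(binary) do
    bits = bit_size(binary)
    <<:binary.decode_unsigned(binary, :little)::size(bits)>>
  end
end

# 1_500_000_000 is 0x59682F00, so its big endian form ends in a zero byte.
input = <<1_500_000_000::32>>

short = FastReverseSketch.reverse(input)
fixed = FastReverseSketch.reverse_fixed(input)

IO.inspect(byte_size(short))
IO.inspect(byte_size(fixed))
```

With this input, reverse/1 returns only three bytes because the leading zero byte of the reversed value is discarded, while reverse_fixed/1 keeps all four. A fixed-width field built from the shorter result would shift every field that follows it.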
After I asked for guidance on Twitter, the ElixirLang account reached out with some sage advice that pointed me toward the big and little modifiers.
Using Big and Little Modifiers
The big and little modifiers are binary special form modifiers, much like the bitstring and binary types. They can be used to specify the resulting endianness when coercing an integer, float, utf16, or utf32 value into a binary.
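A quick sketch of the modifiers applied to an integer, a float, and a UTF-16 code point follows. The sample values and variable names are arbitrary and exist only for this illustration.

```elixir
# Integer: the value 1 as a 32-bit field in each byte order.
int_big = <<1::32-big>>
int_little = <<1::32-little>>

# Float: 1.0 as a 64-bit IEEE 754 double is 0x3FF0000000000000.
float_big = <<1.0::float-64-big>>
float_little = <<1.0::float-64-little>>

# UTF-16: the code point for "A" (0x41) as a single two-byte code unit.
utf_big = <<?A::utf16-big>>
utf_little = <<?A::utf16-little>>

IO.inspect({int_big, int_little}, binaries: :as_binaries)
IO.inspect({float_big, float_little}, binaries: :as_binaries)
IO.inspect({utf_big, utf_little}, binaries: :as_binaries)
```

In each pair the little endian binary holds the same bytes as the big endian one, in reverse order.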
For example, we can replace our calls reversing the time and services binaries in our final binary concatenation by simply appending little to the final size of each:
<<
  time::32-little,
  services::64-little,
  :binary.decode_unsigned(ip)::128,
  port::16
>>
Awesome! That’s much easier to understand.
While Elixir defaults to a big endian format for manually constructed binaries, it doesn’t hurt to be explicit. We know that our ip and port should be big endian encoded, so let’s mark them that way:
<<
  time::32-little,
  services::64-little,
  :binary.decode_unsigned(ip)::128-big,
  port::16-big
>>
Beautiful.
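For completeness, here is a minimal sketch of that binary wrapped in a function, along with a pattern-matching inverse to show that the same modifiers work when reading a packet. The NetAddrSketch module, the parse/1 function, and the sample values are hypothetical, and ip is assumed to be a 16 byte binary such as an IPv4-mapped IPv6 address.

```elixir
defmodule NetAddrSketch do
  # Hypothetical wrapper around the final binary shown above.
  # Assumes `ip` is a 16-byte binary.
  def net_addr(time, services, ip, port) do
    <<
      time::32-little,
      services::64-little,
      :binary.decode_unsigned(ip)::128-big,
      port::16-big
    >>
  end

  # Hypothetical inverse: the same modifiers used in a pattern match.
  def parse(<<time::32-little, services::64-little, ip::binary-size(16), port::16-big>>) do
    {time, services, ip, port}
  end
end

# Sample values, chosen only for illustration:
# 127.0.0.1 as an IPv4-mapped IPv6 address, and port 8333.
ip = <<0::80, 0xFFFF::16, 127, 0, 0, 1>>
packet = NetAddrSketch.net_addr(1_500_000_000, 1, ip, 8333)

IO.inspect(byte_size(packet))
IO.inspect(NetAddrSketch.parse(packet))
```

The resulting packet is 30 bytes long (4 + 8 + 16 + 2), and parsing it returns the original four values.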
Final Thoughts
I’m continually amazed by the quantity, diversity, and quality of the tooling that ships out of the box with Elixir and Erlang. Even when it comes to something as niche as low-level binary manipulation, Elixir’s tools are top notch.
If you want to see complete examples of the endian conversion code shown in this article, check out the BitcoinNetwork.Protocol.NetAddr module in my new bitcoin_network project on GitHub.