We left off in our Bitcoin adventure by building a bare-bones Bitcoin node that connects to another peer node on the network. While our Elixir-based node was able to connect to a peer, that connection was fragile at best. Any problems with the initial connection or version messaging would leave our application dead in the water.
Thankfully, there are ways of beefing up the resilience of our Elixir node. Today we’ll be refactoring our Bitcoin node to use James Fish’s Connection behavior, rather than the basic GenServer behavior that ships with Elixir. Implementing this behavior in our node will give us a more robust connection process, along with the option to reconnect to a peer node in the case of failure.
Let’s get to it!
Our Starting Point
Before we dive into refactoring our Bitcoin node to use the new Connection behavior, we should go over some changes I made to simplify the BitcoinNetwork.Node module.
Previously, every message parsed out of incoming TCP packets was assembled into a BitcoinNetwork.Protocol.Message struct and cast back to the current node process as a process message. In hindsight, this solution was overly complicated and weighed down with boilerplate and message passing overhead. Instead, I opted to take my own advice and “just use a function” to handle my incoming messages.
def handle_info({:tcp, _port, data}, state) do
  {messages, rest} = chunk(state.rest <> data)

  case handle_messages(messages, state) do
    {:error, reason, state} -> {:stop, reason, state}
    {:ok, state} -> {:noreply, %{state | rest: rest}}
  end
end
Now the assembled Message structs are passed off to a handle_messages/2 helper function, which returns either an :error tuple, or an :ok tuple with the current node’s updated state after processing each of the received messages.
The handle_messages/2 function filters out invalid messages and runs each of the remaining messages through a handle_payload/2 helper function. We pass this function the message’s new parsed_payload field, which holds the parsed, struct-based representation of the inbound Bitcoin message:
defp handle_messages(messages, state) do
  messages
  |> Enum.filter(&Message.verify_checksum/1)
  |> Enum.reduce_while({:ok, state}, fn message, {:ok, state} ->
    case handle_payload(message.parsed_payload, state) do
      {:error, reason, state} -> {:halt, {:error, reason, state}}
      {:ok, state} -> {:cont, {:ok, state}}
    end
  end)
end
Notice that we’re using Enum.reduce_while/3 to give our handle_payload/2 calls the opportunity to modify the state of the node before the next message is processed. If we run into a problem handling a parsed payload, we immediately exit our reduction by returning a :halt tuple.
The main benefit of this refactor comes from the simplicity of our handle_payload/2 functions. Here’s what our “ping” handler looks like after the refactor:
defp handle_payload(%Ping{}, state) do
  with :ok <- Message.serialize("pong") |> send_message(state.socket) do
    {:ok, state}
  else
    {:error, reason} -> {:error, reason, state}
  end
end
We use pattern matching to listen for BitcoinNetwork.Protocol.Ping messages. When we receive a Ping, we serialize and send a “pong” back to our peer node. If anything goes wrong with sending the response, we return an :error tuple.
Beautiful.
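Other message types each get their own handle_payload/2 clause. A catch-all clause like the one below, which is my own addition rather than code from the project, is a handy way to quietly ignore payloads we haven’t written handlers for:

defp handle_payload(_payload, state) do
  # Hypothetical fallback clause: ignore any payload we don't explicitly handle.
  {:ok, state}
end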
Connection without Connecting
The Connection behavior is a specialization of the GenServer behavior, intended to represent connections to external resources. It mirrors the entire API of a standard GenServer and adds two additional callbacks for us to implement: connect/2 and disconnect/2. As you’ve probably guessed, these two callbacks are used to connect to and disconnect from our external resource.
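Before we wire these callbacks into our node, here’s a minimal, do-nothing sketch of what a Connection-based module looks like. The module name and return values here are placeholders of my own, not code from our node; the important detail is that returning a :connect tuple from init/1 is how we ask the behavior to invoke connect/2 once the process starts:

defmodule ExampleConnection do
  use Connection

  def start_link(state), do: Connection.start_link(__MODULE__, state)

  # Returning a :connect tuple from init/1 triggers the connect/2 callback.
  def init(state), do: {:connect, :init, state}

  # Called to establish the connection to the external resource.
  def connect(_info, state), do: {:ok, state}

  # Called when another callback returns a :disconnect tuple.
  def disconnect(_reason, state), do: {:stop, :normal, state}
end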
Before we start using the Connection behavior in our application, we’ll need to add it as a dependency in our mix.exs file:
defp deps do
  [
    {:connection, "~> 1.0"}
  ]
end
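After adding the dependency, a quick mix deps.get will pull the connection package down before we move on.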
Next, we’ll start our GenServer to Connection conversion by replacing our use of the GenServer behavior with the new Connection behavior, and wholesale replacing GenServer with Connection throughout our BitcoinNetwork.Node module:
defmodule BitcoinNetwork.Node do
  use Connection

  def start_link({ip, port}) do
    Connection.start_link(__MODULE__, %{ip: ip, port: port, rest: ""})
  end

  ...
Because the Connection behavior is a superset of the GenServer behavior, our node should still run like it used to given these changes. Let’s try it out.
** (Mix) Could not start application bitcoin_network: exited in: BitcoinNetwork.Application.start(:normal, [])
** (EXIT) an exception was raised:
** (ArgumentError) The module BitcoinNetwork.Node was given as
a child to a supervisor but it does not implement child_spec/1.
Uh oh.
The Connection behavior’s use macro doesn’t define a child_spec/1 function for us like our old use GenServer did, and our application no longer likes the child specification shorthand we’re using in our BitcoinNetwork.Application supervisor:
{BitcoinNetwork.Node,
 {Application.get_env(:bitcoin_network, :ip),
  Application.get_env(:bitcoin_network, :port)}}
We’ll fix this by fleshing out our child specification into a full specification map in our BitcoinNetwork.Application module:
%{
  id: BitcoinNetwork.Node,
  start:
    {BitcoinNetwork.Node, :start_link,
     [
       {
         Application.get_env(:bitcoin_network, :ip),
         Application.get_env(:bitcoin_network, :port)
       }
     ]},
  restart: :transient
}
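Note the restart: :transient option: the supervisor will only restart our node process if it exits abnormally, so a clean :normal exit (which we’ll rely on later when we give up on a peer) won’t kick off a restart loop.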
With those changes, our Bitcoin node runs just like it used to.
Connecting with Connect
So far our refactor isn’t very exciting. While our Bitcoin node still works, we haven’t added any new functionality. Let’s change that by fleshing out the connect/2 callback provided by the Connection behavior.
We’ll start by sketching out the connect/2 callback within our module:
def connect(_info, state) do
end
Within our connect/2 callback, we should handle all of the behavior associated with connecting to our external resource. You may remember that this was previously being handled in our init/1 callback. Let’s start migrating that code into our connect/2 function.
The first step in connecting to our peer node is to establish a TCP connection:
:gen_tcp.connect(IP.to_tuple(state.ip), state.port, options)
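The options argument holds whatever socket options we’ve been passing to :gen_tcp.connect/3 all along. They aren’t shown here, but given that we’re handling binary packets delivered as messages to our process, something along these lines is a reasonable assumption:

options = [:binary, active: true]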
The next step is sending our initial “version” message and establishing communication with the peer:
send_message(message, socket)
If both of these things go well, we can say that we’ve successfully connected to our peer Bitcoin node. In that case, the Connection behavior dictates that we should return an :ok tuple with the new state of the process.
with {:ok, socket} <- :gen_tcp.connect(IP.to_tuple(state.ip), state.port, options),
     :ok <- send_message(message, socket) do
  {:ok, Map.put_new(state, :socket, socket)}
end
However, if something goes wrong, we have a couple of options. We can return a :stop tuple to kill the current process, which is similar to the previous functionality of our node. Alternatively, we can return a :backoff tuple, which instructs the Connection behavior to retry the connection after the specified timeout.
Let’s try reconnecting to our peer node if something goes wrong. To do this, all we need to do is add an else block to our with expression that returns our :backoff tuple:
else
  _ -> {:backoff, 1000, state}
end
Now, after a failed connection attempt, our Bitcoin node will wait one thousand milliseconds and then retry the connection.
Limiting Retries
Our new connection retry logic works beautifully. It almost works too well, in fact. If we try to connect to a non-existent Bitcoin peer node, we can see that our node will attempt to reconnect until the end of time. Let’s limit the number of retry attempts our node can make before it gives up.
We’ll do this by adding a retries field to our node’s state, starting at 0:
def start_link({ip, port}) do
  Connection.start_link(__MODULE__, %{
    ...
    retries: 0
  })
end
We’ll also add a @max_retries module attribute to indicate how many retries we want our node to attempt:
@max_retries 3
Next, we’ll modify the :backoff tuple returned by our connect/2 callback to increment retries in the returned state map:
{:backoff, 1000, Map.put(state, :retries, state.retries + 1)}
Lastly, we’ll add a new connect/2 function head that detects when we’ve reached the maximum number of allowed retries. When we reach that limit, we want to return a :stop tuple to kill the current process:
def connect(_info, state = %{retries: @max_retries}) do
  {:stop, :normal, state}
end
Beautiful. Now our Bitcoin node will stop attempting to connect to its peer node after three failed attempts, waiting one second between each.
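Putting the retry pieces together, our two connect/2 clauses end up looking something like the sketch below. The socket options and the version message construction aren’t shown in this post, so treat those two lines as stand-ins rather than the project’s actual code:

def connect(_info, state = %{retries: @max_retries}) do
  {:stop, :normal, state}
end

def connect(_info, state) do
  # Placeholder socket options and version message; the real values come from
  # earlier posts in this series.
  options = [:binary, active: true]
  message = Message.serialize("version")

  with {:ok, socket} <- :gen_tcp.connect(IP.to_tuple(state.ip), state.port, options),
       :ok <- send_message(message, socket) do
    {:ok, Map.put_new(state, :socket, socket)}
  else
    _ -> {:backoff, 1000, Map.put(state, :retries, state.retries + 1)}
  end
end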
Disconnecting with Disconnect
Now that we’ve revamped how we connect to our peer node, we need to consider what should happen in the event that we disconnect from that node.
If our handle_call/3, handle_cast/2, or handle_info/2 callbacks return a :disconnect tuple, the Connection behavior will call our disconnect/2 callback, which will decide the next course of action.
We have several options for handling the disconnection in our disconnect/2 callback. We can return a :connect tuple to attempt a reconnection immediately. Similarly, we can return a :backoff tuple to delay the reconnection by the specified timeout. Alternatively, we can return a :noconnect tuple to keep the current process alive without attempting to reconnect to our peer node. Lastly, our disconnect/2 callback can return a :stop tuple to immediately terminate our Bitcoin node process.
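To make those options concrete, if we wanted to hang on to the process and retry after losing a peer, a disconnect/2 along these lines (not the route we’ll take below) would close the dead socket and schedule a fresh connection attempt:

def disconnect(_reason, state) do
  # Hypothetical alternative: drop the dead socket and retry the connection in a second.
  :ok = :gen_tcp.close(state.socket)
  {:backoff, 1000, Map.delete(state, :socket)}
end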
When we start connecting to more nodes in the future, the loss of a single node won’t be a big deal. Losing peers is just a part of life, unfortunately. With that in mind, if we detect a disconnect, we’ll simply close our TCP connection and return a :stop tuple from our disconnect/2 callback:
def disconnect(_, state) do
  :ok = :gen_tcp.close(state.socket)
  {:stop, :normal, state}
end
Next, when handling the result of our call to handle_messages/2, we’ll deal with errors slightly differently. Instead of returning a :stop tuple when we receive an :error while handling one of our messages, we’ll return a :disconnect tuple:
case handle_messages(messages, state) do
  {:error, reason, state} -> {:disconnect, reason, %{state | rest: rest}}
  {:ok, state} -> {:noreply, %{state | rest: rest}}
end
This will drop us into our disconnect/2 callback with the given reason for the disconnect.
That’s all there is to it!
Final Thoughts
This refactor involved quite a few moving pieces, but in the end the final product is a cleaner, simpler, and more robust piece of software. With these changes we’ve positioned ourselves nicely to move forward and expand on the Bitcoin node project we’ve been building.
Be sure to check out the complete code on Github to get a cohesive view of what we’ve done.
Next time we’ll start expanding our network of nodes by recursively connecting with the neighboring nodes we receive from our peer node. Stay tuned!