
31. Achieve constant memory in hydra-node

· 2 min read
Sasha Bogicevic
Senior Software Engineer

Status

Accepted

Context

When testing hydra-node operation under heavy or increased load, we noticed that memory consumption is far from ideal. So far we have not paid much attention to performance, but the time has come to try to reduce the memory footprint of a running hydra-node.

There are some quick wins here, since the projections used to serve in-memory data rely on a plain Haskell list as their data structure. As a first optimisation, we should stream the data to keep memory bounded.

It is also not necessary to output the whole history of messages by default; we should only do so when clients explicitly request it. Internally, our ServerOutput type could be remapped to StateChanged, since the two are almost identical. Any new information must still be streamed to clients automatically.

Decision

  • Re-map ServerOutput to StateChanged by adding any missing constructors to StateChanged (e.g. PeerConnected).
  • Output new client messages on newState changes instead of using ClientEffect.
  • Use StateChanged in all projections we serve from the API (re-use eventId as sequence number).
  • Make hydra-node output history of messages only on demand (breaking change is to be communicated in the changelog).
  • Use conduit library to achieve constant memory by streaming the data in our projections.
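The remapping and sequence-number decisions could look roughly like this. This is a minimal, hypothetical sketch using base-only types, not the actual hydra-node definitions; the real projections would stream events via conduit rather than fold a list, but the key point is the same: a projection consumes events one at a time, so memory is bounded by the accumulator rather than the full history.

```haskell
-- Illustrative only: simplified stand-ins for hydra-node's StateChanged,
-- with eventId reused as the sequence number served to API clients.
import Data.List (foldl')

data StateChanged
  = PeerConnected { eventId :: Word }
  | SnapshotConfirmed { eventId :: Word }
  deriving (Show, Eq)

-- A projection is a strict fold over the event stream: the accumulator
-- (here just the latest sequence number) is all that stays in memory.
latestEventId :: [StateChanged] -> Word
latestEventId = foldl' (\_ e -> eventId e) 0

main :: IO ()
main = print (latestEventId [PeerConnected 1, SnapshotConfirmed 2])
```

With conduit, the same fold would run over a `ConduitT` source fed from the event log, so the full history never has to be materialised as a list.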

Consequences

This should lead to much better memory performance of a running hydra-node process. It should also be confirmed by running the relevant benchmarks and by a test (even a manual one or a script) that asserts memory consumption is actually reduced.

32. Network layer properties, implementation using etcd

· 4 min read
Sebastian Nagel
Software Engineering Lead

Status

Accepted

Context

  • The broadcast communication primitive is introduced in ADR 6. Both the original protocol design in the paper and that ADR implicitly assume a reliable broadcast.

  • ADR 27 further specifies that the hydra-node should be tolerant to the fail-recovery failure model, and takes the decision to implement a reliable broadcast by persisting outgoing messages and using a vector clock and heartbeat mechanism, over a dumb transport layer.

    • The current transport layer in use is a simple FireForget protocol over TCP connections implemented using ouroboros-framework.
    • ADR 17 proposed to use UDP instead.
    • Either this design or its implementation turned out to be flawed, as the system did not survive fault injection tests with moderate packet drops.
  • This research paper explored various consensus protocols used in the blockchain space and reminds us of the correspondence between consensus and broadcast:

    the form of consensus relevant for blockchain is technically known as atomic broadcast

    It also states that (back then):

    The most important and most prominent way to implement atomic broadcast (i.e., consensus) in distributed systems prone to t < n/2 node crashes is the family of protocols known today as Paxos and Viewstamped Replication (VSR).

Decision

  • We realize that the way the off-chain protocol is specified in the paper, the broadcast abstraction required from the Network interface is a so-called uniform reliable broadcast. Hence, any implementation of Network needs to satisfy the following properties:

    1. Validity: If a correct process p broadcasts a message m, then p eventually delivers m.
    2. No duplication: No message is delivered more than once.
    3. No creation: If a process delivers a message m with sender s, then m was previously broadcast by process s.
    4. Agreement: If a message m is delivered by some correct process, then m is eventually delivered by every correct process.

    See also Module 3.3 in Introduction to Reliable and Secure Distributed Programming by Cachin et al., or Self-stabilizing Uniform Reliable Broadcast by Oskar Lundström.

  • Use etcd as a proxy to achieve reliable broadcast via its Raft consensus.

    • Raft is an evolution of Paxos and similar to VSR
    • Over-satisfies requirements as it provides "Uniform total order" (satisfies atomic broadcast properties)
    • Each hydra-node runs an etcd instance to realize its Network interface
    • See the following architecture diagram which also contains some notes on Network interface properties:

  • We supersede ADR 17 and ADR 27 decisions on how to implement Network with the current ADR.
    • Drop existing implementation of Ouroboros and Reliability components
    • This could be revisited, as in theory the design would satisfy the properties if implemented correctly.
    • Uniform reliable broadcast means only delivering a message once it has been seen by everyone, which is not what we had implemented.
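To give a flavour of the properties listed above, the "no duplication" requirement can be sketched as a receiver that deduplicates by message id before delivering to the application layer. This is illustrative only; the message-id type and function names are assumptions, not hydra-node code.

```haskell
-- Illustrative sketch of the "no duplication" broadcast property:
-- a receiver tracks already-delivered message ids and drops repeats.
import qualified Data.Set as Set

type MsgId = Int

-- Returns the updated set of seen ids and whether the message should
-- be delivered to the application layer.
deliverOnce :: Set.Set MsgId -> MsgId -> (Set.Set MsgId, Bool)
deliverOnce seen msgId
  | msgId `Set.member` seen = (seen, False)          -- duplicate: drop
  | otherwise = (Set.insert msgId seen, True)        -- first time: deliver

main :: IO ()
main = print (snd (deliverOnce Set.empty 1))
```

In practice etcd's Raft log provides this (and more, via total ordering) out of the box, which is exactly why it over-satisfies the requirements.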

Consequences

  • Crash tolerance of fewer than n/2 failing nodes (a majority of nodes must remain available)

  • Using etcd as-is adds a run-time dependency on that binary.

    • Docker image users should not see any different UX
    • We can ship the binary through hydra-node.
  • Network introspectability: since the etcd cluster is queryable, this could improve the debugging experience

  • The persisted state for networking changes: there will be no acks anymore, but instead the etcd Write-Ahead Log (WAL) and a last-seen revision.

  • We can keep the same user experience for configuration

    • Full, static topology with listing everyone as --peer
    • Simpler configuration via peer discovery is possible
  • The semantics of PeerConnected need to change to an overall HydraNetworkConnected

    • We can only submit / receive messages when connected to the majority cluster
  • etcd has a few features out-of-the-box we could lean into, e.g.

    • use TLS to secure peer connections
    • separate advertised and binding addresses
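As an illustration of the last two points, one etcd member per hydra-node could be started roughly as follows. All names, addresses, and file paths are placeholders, and the flag set is only a sketch; consult the etcd documentation for the authoritative configuration.

```shell
# Illustrative only: one etcd member with TLS-secured peer connections
# and separate binding vs. advertised addresses.
etcd \
  --name alice \
  --listen-peer-urls https://0.0.0.0:2380 \
  --initial-advertise-peer-urls https://alice.example.com:2380 \
  --initial-cluster 'alice=https://alice.example.com:2380,bob=https://bob.example.com:2380' \
  --peer-cert-file alice.crt \
  --peer-key-file alice.key \
  --peer-trusted-ca-file ca.crt \
  --peer-client-cert-auth
```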

33. Directly open head: removal of initialization phase

· 3 min read
Sebastian Nagel
Software Engineering Lead

Status

Accepted

Context

  • The Hydra Head protocol, as described in the original paper, specifies an initialization phase before a head can be opened. This phase consists of multiple on-chain transactions:

    1. Init - Creates the head on-chain in an Initial state, minting head tokens (one state thread token + one participation token per party).
    2. Commit - Each participant locks UTxO they want to bring into the head (one transaction per party).
    3. CollectCom - Once all parties have committed, collects all committed UTxO and transitions the head to Open.
    4. Abort - An alternative path allowing any party to cancel initialization and reimburse all committed UTxO.
  • This initialization phase introduced significant complexity:

    • Two additional on-chain validators (vInitial, vCommit) beyond the head validator itself.
    • An Initial off-chain state tracking pending commits and committed UTxO per party.
    • Multiple API events (HeadIsInitializing, Committed, HeadIsAborted) and a client command (Abort).
  • The "unabortable heads" problem: If a participant committed a large UTxO set, the resulting Abort transaction could exceed Cardano transaction size limits, making it impossible to abort the head. This effectively locked funds with no recourse.

  • The Commit mechanism was limited in how much UTxO could be committed per party due to on-chain transaction size constraints, while the later-added deposit/increment mechanism does not have these per-party limitations.

  • The overall lifecycle cost of opening a head was high: Init + N Commit transactions + CollectCom, each requiring on-chain fees.

Decision

Remove the initialization phase entirely. The Init transaction directly opens the head:

  • Init creates an Open head: The Init transaction mints the head tokens and creates the head output in Open state with an empty UTxO set (utxoHash = hash(∅)).
  • Funds are added post-opening: Participants use the existing deposit/increment mechanism to add funds to the head after it is opened.
  • No Commit, CollectCom, or Abort transactions: These transaction types and their associated on-chain validators are removed.

The head lifecycle simplifies from:

Idle → Init → Initial → Commit* → CollectCom → Open → ... → Final

(with Initial → Abort → Idle as the alternative abort path)

to:

Idle → Init → Open → ... → Final
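The simplified lifecycle can be sketched as a small state machine. These are simplified, hypothetical types, not the actual hydra-node definitions; the point is that the Initial state and its Commit/CollectCom/Abort transitions are simply gone.

```haskell
-- Illustrative sketch of the head lifecycle after removing the
-- initialization phase: Init now transitions Idle directly to Open.
data HeadState
  = Idle
  | Open    -- reached directly by the Init transaction, with an empty UTxO set
  | Closed
  | Final
  deriving (Show, Eq, Enum, Bounded)

-- The only forward transitions left in the lifecycle.
next :: HeadState -> Maybe HeadState
next Final = Nothing
next s = Just (succ s)

main :: IO ()
main = print (next Idle)
```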

Consequences

  • The vInitial and vCommit on-chain validators are removed, reducing the on-chain script surface and audit scope.

  • The head validator state machine simplifies from Initial → Open → Closed → Final to Open → Closed → Final.

  • Off-chain complexity is reduced: the InitialState and its associated event handlers (commit tracking, abort logic) are removed from HeadLogic.

  • The API surface shrinks: HeadIsInitializing, Committed, HeadIsAborted server outputs and the Abort client input are removed. HeadIsOpen no longer carries a utxo field since heads always open empty.

  • Opening a head requires fewer on-chain transactions (single Init vs. Init + N Commits + CollectCom), reducing costs for most use cases.

  • The unabortable heads problem is eliminated since there is no abort transaction.

  • Adding funds to a head is unified under a single mechanism (deposit/increment) regardless of whether it happens at the start or later during the head's lifetime.

  • The overloaded "commit" term is dropped in favor of "deposit". Previously, the /commit HTTP endpoint served double duty (committing during initialization and depositing into an open head). With initialization removed, this endpoint can be replaced by a more REST-like /deposits resource, and server outputs renamed from CommitXXX to DepositXXX, resulting in a cleaner and more consistent API.

  • A future enhancement could allow the initiator to include initial funds directly in the Init transaction, avoiding the need for a separate deposit.