Architectural Decision Records | Hydra: Head Protocol

11. Use cardano-api

November 18, 2021 · 2 min read

Status

Accepted

Context

To implement Hydra Head's ledger we have been working with the ledger-specs packages which provide a low-level interface to work with transactions and ledgers
- We also use a lightly wrapped ledger-specs API as our interface for Off-chain transaction submission. This introduced some boilerplate in order to align with cardano-api and provide JSON serialisation.
In our initial experiments connecting directly to a cardano node we have also been using the ledger API for building transactions for want of some scripts-related features in the cardano-api
cardano-api is expected to be the supported entrypoint for clients to interact with Cardano chain while ledger-specs is reserved for internal use and direct interactions with ledgers
cardano-api now provides all the features we need to run our on-chain validators

Decision

Therefore

Use cardano-api types and functions instead of ledger-specs in Hydra.Chain.Direct component
Use cardano-api types instead of custom ones in Hydra.Ledger.Cardano component

Consequences

Removes the boilerplate in Hydra.Ledger.Cardano required to map cardano-api types sent by clients to builtin and ledger-specs types
Simplifies the Hydra.Chain.Direct component:
- Replaces custom transaction building in Tx
- Replaces custom transaction fees calculation and balancing in Wallet
- Replaces low-level connection establishment with cardano-api functions for connecting to the node (keeping the chain sync subscription)

12. Top-down Test-driven Design

November 25, 2021 · 2 min read

Status

Accepted

Context

Test-Driven Development or Test-Driven Design is a technique that helps teams promote simple and loosely coupled design, reduces the amount of code written, increases confidence in delivered software by providing a high level of code coverage by regression tests, and improves development speed through shorter feedback loops
While initially focused on unit tests, TDD has evolved over time to include higher-level tests like Behaviour Driven Development or Specification by Example, leading to comprehensive strategies like the Outside-In Diamond TDD
Being a foundational part of scalable applications based on Cardano blockchain, Hydra Head needs to be released early, often, and with high assurance in order to benefit from early adopters' feedback

Decision

Therefore

We start as early as possible with End-to-End tests, gradually making them more complex as we develop the various components but starting with something simple (like a system-level but dummy chain and hydra network).

We flesh out other integration tests as needed, when we refine the technological stack used for the various bits and pieces.

We do most of our work in the Executable Specifications layer while we are developing the core domain functions, e.g. the Head protocol. The rationale is that this is the level at which we can test the most complex behaviours in the fastest and safest possible way, as everything runs without external dependencies or can even run as pure code using io-sim.

We tactically drop to Unit tests level when dealing with the protocol's "fine prints".

Consequences

Development of each "feature", whether new or a change to an existing one, should start with a test defined at the highest level possible, but no higher
A detailed presentation of the various testing layers is available in the wiki

13. Plutus Contracts Testing Strategy

January 19, 2022 · 2 min read

Status

Accepted

Context

We are implementing our custom (Direct) interaction with the Cardano blockchain and using neither the PAB nor the Contract monad to define off-chain contract code
This implies we cannot use the official testing framework for Contracts which relies on Contract monad and emulator traces nor the QuickCheck based framework
We want to follow our Test-Driven Development approach for contracts as this is a critical part of Hydra
On-Chain Validators need not only to be correct and functional, but also secure and hardened against malicious parties

Decision

Therefore

We test-drive single contracts code using Mutation-Based Property Testing
Contracts are tested through the construction of actual transactions and running phase-2 ledger validation process
We start from a "healthy" transaction, that's expected to be correct and stay so
Contract code is initially a const True function that validates any transaction
We flesh out the contract's code piecemeal through the introduction of Mutations that turn a healthy transaction into an expectedly invalid one
We gradually build a set of combinators and generators that make it easier to arbitrarily mutate transactions, and combine those mutations
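
The approach described above can be illustrated with a toy model. This is only a minimal sketch: the `Tx`, `Mutation`, `validator` and `propMutations` names are hypothetical, the "transaction" is drastically simplified, and the real tests construct actual Cardano transactions and run phase-2 ledger validation instead.

```haskell
-- Toy model of mutation-based testing. All names are hypothetical and the
-- "transaction" is far simpler than a real Cardano transaction.
data Tx = Tx
  { signers :: [String]
  , inputValue :: Integer
  , outputValue :: Integer
  }
  deriving (Show, Eq)

-- A mutation turns a healthy transaction into one we expect to be rejected.
data Mutation
  = RemoveSigner String
  | ChangeOutputValue Integer
  deriving (Show, Eq)

applyMutation :: Mutation -> Tx -> Tx
applyMutation (RemoveSigner s) tx = tx{signers = filter (/= s) (signers tx)}
applyMutation (ChangeOutputValue v) tx = tx{outputValue = v}

-- The validator under test. It would start out as `const True` and gain one
-- check each time a newly introduced mutation is found to be accepted.
validator :: [String] -> Tx -> Bool
validator participants tx =
  all (`elem` signers tx) participants
    && outputValue tx == inputValue tx

healthyTx :: Tx
healthyTx = Tx{signers = ["alice", "bob"], inputValue = 10, outputValue = 10}

-- Property: the healthy transaction validates and every mutation of it fails.
propMutations :: [Mutation] -> Bool
propMutations mutations =
  validator ["alice", "bob"] healthyTx
    && not (any (validator ["alice", "bob"] . (`applyMutation` healthyTx)) mutations)
```

Note that a mutation which leaves the transaction unchanged, such as `ChangeOutputValue 10` here, makes the property fail, so generators for mutations need to produce values that actually differ from the healthy ones.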

Consequences

We make the contracts' Threat model explicit through the tests we write, which should help future auditors' work
We'll need an additional layer of tests to exercise the Hydra OCV State Machine through sequences of transactions. This could be implemented using the quickcheck-dynamic library, or other tools that are currently being developed by the Cardano community

14. Token usage in Hydra Scripts

February 14, 2022 · 3 min read

Status

Accepted

Context

The Hydra on-chain-verification scripts are used to validate Hydra protocol transactions and ensure they are lawful.
At least these three properties need to be enforced:
- Authentication: ensure that only Head participants can, for example, abort a Head
- Contract continuity: ensure that a Head was initialized before it can be opened by a collectCom tx.
- Completeness: ensure that all Head participants had a chance to commit funds to a Head.
The Hydra Head paper introduces participation tokens (PT) and a state thread token (ST) for that matter.
Such tokens (a.k.a. native assets) are identified by the CurrencySymbol, that is the hash of their MintingPolicyScript (a.k.a. PolicyID in the ledger), and a ByteString, the so-called TokenName (a.k.a. AssetName in the ledger, see shelley-ma ledger spec)
There can be multiple Hydra Heads on a network and a hydra-node needs to distinguish individual Head instances or even (later) keep track of multiple Heads. Concretely, this means that we need to infer a Head identifier (HeadId) from observing each of the Hydra protocol transactions.

Decision

We solve both challenges by defining that ST and PTs shall use the same MintingPolicyScript and thus have same CurrencySymbol
The MintingPolicyScript shall be parameterized by TxOutRef to yield a unique CurrencySymbol per Head (similar to the OneShotCurrency example)
ST and one PT per participant are minted in the initTx
The TokenName of the ST can be any well-known ByteString, e.g. "HydraHeadV1"
The TokenName of the PTs needs to be the PubKeyHash of the respective participant
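
As an illustration of this decision, the following sketch models the minted tokens and the lookup of a Head identifier from observed assets. The newtypes are simplified, hypothetical stand-ins for the Plutus types of the same name, and `initTokens` and `findHeadId` are made-up helper names. How a node recognises that a currency symbol belongs to a Hydra Head is deliberately left as a predicate parameter.

```haskell
-- Hypothetical, simplified stand-ins for the Plutus types named above.
newtype CurrencySymbol = CurrencySymbol String deriving (Show, Eq)
newtype TokenName = TokenName String deriving (Show, Eq)
newtype PubKeyHash = PubKeyHash String deriving (Show, Eq)

-- A native asset is identified by its currency symbol and token name.
type Asset = (CurrencySymbol, TokenName)

-- Well-known token name of the state thread token (example value from the text).
stTokenName :: TokenName
stTokenName = TokenName "HydraHeadV1"

-- Tokens minted in the init transaction: one ST plus one PT per participant,
-- all under the same currency symbol.
initTokens :: CurrencySymbol -> [PubKeyHash] -> [Asset]
initTokens cs participants =
  (cs, stTokenName) : [(cs, TokenName pkh) | PubKeyHash pkh <- participants]

-- Infer the head identifier from the assets seen in a transaction, given a
-- predicate deciding whether a currency symbol is a Hydra head policy.
-- The predicate is an assumption; its implementation is left open here.
findHeadId :: (CurrencySymbol -> Bool) -> [Asset] -> Maybe CurrencySymbol
findHeadId isHeadPolicy assets =
  case filter isHeadPolicy (map fst assets) of
    (cs : _) -> Just cs
    [] -> Nothing
```

Because ST and PTs share one currency symbol, the same lookup works whether a transaction carries the ST or only a PT.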

Consequences

Heads can be identified by looking for the ST in init, collectCom, close, contest or fanout transactions, or the PT in commit transactions. In both cases, the CurrencySymbol == HeadId
Our scripts become simpler as we only need to check that ST/PT are paid forward, instead of needing to check datums
The datum produced by commit txs (and consumed by collectCom) is Just SerializedTxOut, which is simpler than also keeping the participant which committed in the datum (compare to full life-cycle of 0.3.0).
The v_head script validator does not need to be parameterized, which makes discovering new Heads (and also tracking them for metrics) easier as the address to watch for is common to all Heads (of the same v_head version).
The v_head script (path) for the abort life-cycle can already be implemented much more safely by checking that all PTs are burned on the abort transaction (compare to counting inputs in the abort life-cycle of 0.3.0).
Updated diagrams for the full and abort on-chain life-cycles of a Hydra Head.

Follow-up questions

What value does the ST actually add? We could always look for the PT to identify a Head and contract continuity would already be achieved by the PTs!
In discussions it turned out to be not clear where the Head's CurrencySymbol is coming from, and consequently how to identify that an ST is indeed an ST?

15. Configuration Through an Admin API

March 17, 2022 · 4 min read

Status

Draft

Context

Hydra-node currently requires a whole slew of command-line arguments to properly configure its networking layer: --peer to connect to each peer, --cardano-verification-key and --hydra-verification-key to identify the peer on the L1 and L2 respectively.
This poses significant challenges for operating a cluster of Hydra nodes as one needs to know beforehand everything about the cluster, then pass a large number of arguments to some program or docker-compose file, before any node can be started
- This is a pain that's been felt first-hand for benchmarking and testing purpose
Having static network configuration is probably not sustainable in the long run, even if we don't add any fancy multihead capabilities to the node, as it would make it significantly harder to have automated creation of Heads.
There's been an attempt at providing a file-based network configuration but this was deemed unconvincing
Hydra paper (sec. 4, p. 13) explicitly assumes the existence of a setup phase
- This setup is currently left aside, e.g. exchange of keys for setting up multisig and identifying peers. The hydra-node executable is statically configured and those things are assumed to be known beforehand

Decision

Hydra-node exposes an Administrative API to enable configuration of the Hydra network using "standard" tools
- API is exposed as a set of HTTP endpoints on some port, consuming and producing JSON data,
- It is documented as part of the User's Guide for Hydra Head
This API provides commands and queries to:
- Add/remove peers providing their address and keys,
- List currently known peers and their connectivity status,
- Start/stop/reset the Hydra network
This API is implemented by a new component accessible through a network port separate from current Client API, that configures the Network component
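
To make the proposed commands more concrete, here is a minimal sketch of the state such an Admin API could manage, written as a pure state transition. All types and names (`Peer`, `AdminCommand`, `applyCommand`, ...) are hypothetical and not part of hydra-node; in particular, treating a reset as "forget all peers and stop" is an assumption, and the HTTP and JSON layers are omitted.

```haskell
-- Hypothetical model of the admin commands; not the hydra-node API.
data Peer = Peer
  { peerAddress :: String -- e.g. "10.0.0.2:5001"
  , cardanoVerificationKey :: String
  , hydraVerificationKey :: String
  }
  deriving (Show, Eq)

data NetworkStatus = Stopped | Started deriving (Show, Eq)

data AdminState = AdminState
  { knownPeers :: [Peer]
  , networkStatus :: NetworkStatus
  }
  deriving (Show, Eq)

data AdminCommand
  = AddPeer Peer
  | RemovePeer String -- identified by address
  | StartNetwork
  | StopNetwork
  | ResetNetwork
  deriving (Show, Eq)

initialAdminState :: AdminState
initialAdminState = AdminState{knownPeers = [], networkStatus = Stopped}

-- Apply one command. Adding an already known peer is a no-op.
-- Assumption: resetting forgets all peers and stops the network.
applyCommand :: AdminCommand -> AdminState -> AdminState
applyCommand cmd st = case cmd of
  AddPeer p
    | p `elem` knownPeers st -> st
    | otherwise -> st{knownPeers = knownPeers st ++ [p]}
  RemovePeer addr -> st{knownPeers = filter ((/= addr) . peerAddress) (knownPeers st)}
  StartNetwork -> st{networkStatus = Started}
  StopNetwork -> st{networkStatus = Stopped}
  ResetNetwork -> initialAdminState
```

Listing peers then amounts to reading `knownPeers`, and each HTTP endpoint would map to one `AdminCommand` or query.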

The following picture sketches the proposed architectural change:

Architecture change

Q&A

Why a REST interface?
- This API is an interface over a specific resource controlled by the Hydra node, namely its knowledge of other peers with which new Heads can be opened. As such a proper REST interface (not RPC-in-disguise) seems to make sense here, rather than stream/event-based duplex communication channels
- We can easily extend such an API with WebSockets to provide notifications (e.g. peers connectivity, setup events...)
Why a separate component?
- We could imagine extending the existing APIServer interface with new messages related to this network configuration, however this seems to conflate different responsibilities in a single place: Configuring and managing the Hydra node itself, and configuring, managing, and interacting with the Head itself
- "Physical" separation of endpoints makes it easier to secure a very sensitive part of the node, namely its administration, e.g by ensuring this can only be accessed through a specific network interface, without relying on application level authentication mechanisms

Consequences

It's easy to deploy Hydra nodes with some standard configuration, then dynamically configure them, thus reducing the hassle of defining and configuring the Hydra network
It makes it possible to reconfigure a Hydra node with different peers
The Client API should reflect the state of the network and disable Initing a head if the network layer is not started
- In the long run, it should also have its scope reduced to represent only the possible interactions with a Head, moving things related to network connectivity and setup to the Admin API
- In a Managed Head scenario it would even make sense to have another layer of separation between the API to manage the life-cycle of the Head and the API to make transactions within the Head
Operational tools could be built easily on top of the API, for command-line or Web-based configuration

16. Keep Rejected ADRs

March 23, 2022 · One min read

Status

Accepted

Context

We have started using Architecture Decision Records as our primary way to document the most important design decisions we take while developing Hydra Node, and this has proved effective in fostering fruitful discussions about major architecture changes.

During the course of this project, we have sometimes had debates on various topics leading to rejection of some ADRs. It could be the case that a previously rejected proposal turns out to be interesting, either because the context and situation have changed enough to reevaluate a proposal, or as background for some new proposal.

Decision

Therefore

We will keep rejected Architecture Decision Records alongside accepted and draft ones, in the same location and format
Rejected ADRs must have tag [Rejected] set

Consequences

Once attributed a serial number an ADR keeps it "forever", whether it's rejected or accepted

17. Use UDP protocol for Hydra networking

March 28, 2022 · 2 min read

Status

Draft

Context

Current Hydra networking layer is based on Ouroboros network framework networking stack which, among other features, provides:

An abstraction of stream-based duplex communication channels called a Snocket,
A Multiplexing connection manager that manages a set of equivalent peers, maintains connectivity, and ensures diffusion of messages to/from all peers,
Typed protocols for expressing the logic of message exchanges as a form of state machine.

While it's been working mostly fine so far, the abstractions and facilities provided by this network layer are not well suited for Hydra Head networking. Some of the questions and shortcomings are discussed in a document on Networking Requirements, and as the Hydra Head matures the time seems ripe for overhauling the current network implementation to better suit current and future Hydra Head networking needs.

Decision

Hydra Head nodes communicate by sending messages to other nodes using UDP protocol

Details

How do nodes know each other?: This is unspecified by this ADR and left for future work, it is assumed that a Hydra node operator knows the IP:Port address of its peers before opening a Head with them
Are messages encrypted?: This should probably be the case in order to ensure Heads' privacy but is also left for future work
How are nodes identified?: At the moment they are identified by their IP:Port pair. As we implement more of the setup process from section 4 of the Hydra Head paper, we should identify nodes by some public key(hash) and resolve the actual IP:Port pair using some other mechanism

Consequences

Node's HeadLogic handles lost, duplicated, and out-of-order messages using retry and timeout mechanisms
Messages should carry a unique identifier, e.g. source node and index
Protocol, e.g. message format, is documented
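
The following sketch shows one way a receiver could use such a unique identifier to drop duplicates and restore ordering. It is an illustration under simplifying assumptions, not the Hydra implementation: the `Message` and `Receiver` types are hypothetical, messages from a single source are assumed to be numbered from 0, and retries and timeouts for lost messages are not modelled.

```haskell
import Data.List (insertBy)
import Data.Ord (comparing)

-- Hypothetical message format: a source node, a per-source index, a payload.
data Message = Message
  { source :: String
  , index :: Int
  , payload :: String
  }
  deriving (Show, Eq)

-- Receiver state for one source (assumption: indices start at 0).
data Receiver = Receiver
  { nextIndex :: Int -- next index we expect to deliver
  , pending :: [Message] -- received too early, kept sorted by index
  }
  deriving (Show, Eq)

newReceiver :: Receiver
newReceiver = Receiver{nextIndex = 0, pending = []}

-- Handle one incoming message, returning the messages that can now be
-- delivered in order. Duplicates are dropped silently.
receive :: Message -> Receiver -> (Receiver, [Message])
receive msg r
  | index msg < nextIndex r = (r, []) -- already delivered
  | any ((== index msg) . index) (pending r) = (r, []) -- already buffered
  | otherwise = flush r{pending = insertBy (comparing index) msg (pending r)}

-- Deliver buffered messages for as long as they are contiguous.
flush :: Receiver -> (Receiver, [Message])
flush r = case pending r of
  (m : rest)
    | index m == nextIndex r ->
        let (r', delivered) = flush r{nextIndex = nextIndex r + 1, pending = rest}
         in (r', m : delivered)
  _ -> (r, [])

-- Feed a sequence of messages, collecting everything delivered.
receiveAll :: [Message] -> (Receiver, [Message])
receiveAll = foldl step (newReceiver, [])
 where
  step (r, out) m = let (r', d) = receive m r in (r', out ++ d)
```

A lost message shows up as a gap: `nextIndex` stays put while `pending` grows, which is the condition a retry or timeout mechanism would react to.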

18. Single state in Hydra.Node.

April 13, 2022 · 4 min read

Status

Superseded by ADR 23 and ADR 26

Context

Currently the hydra-node maintains two pieces of state during the life-cycle of a Hydra Head:
1. A HeadState tx provided by the HydraHead tx m handle interface and part of the Hydra.Node module. It provides the basis for the main hydra-node business logic in Hydra.Node.processNextEvent and Hydra.HeadLogic.update (Creation, Usage)
2. SomeOnChainHeadState is kept in the Hydra.Chain.Direct to keep track of the latest known head state, including notable transaction outputs and information on how to spend them (e.g. scripts and datums) (Code, Usage 1, Usage 2, Usage 3)
(There are other unrelated things kept in memory like the event history in the API server or a peer map in the network heartbeat component.)
The interface between the Hydra.Node and a Hydra.Chain component consists of
- constructing certain Head protocol transactions given a description of it (PostChainTx tx):
```
postTx :: MonadThrow m => PostChainTx tx -> m ()
```
- a callback function when the Hydra.Chain component observed a new Head protocol transaction described by OnChainTx tx:
```
type ChainCallback tx m = OnChainTx tx -> m ()
```
As the usage sites above show, the Hydra.Chain.Direct module requires additional info to do both: construct protocol transactions with postTx and observe potential OnChainTx (here). Hence we see that operation of the Hydra.Chain.Direct component (and likely of any component implementing the interface fully) is inherently stateful.
We are looking at upcoming features to handle rollbacks and dealing with persisting the head state.
- Both could benefit from the idea, that the HeadState is just a result of pure Event processing (a.k.a event sourcing).
- Right now the HeadState kept in Hydra.Node alone is not enough to fully describe the state of the hydra-node. Hence it would not be enough to just persist all the Events and replay them to achieve persistence, nor to reset to some previous HeadState in the presence of a rollback.

Decision

We define and keep a "blackbox" ChainStateType tx in the HeadState tx

It shall not be introspectable to the business logic in HeadLogic
It shall contain chain-specific information about the current Hydra Head, which will naturally need to evolve once we have multiple Heads in our feature scope
For example:

```
data HeadState tx
  = IdleState
  | InitialState
      { chainState :: ChainStateType tx
      -- ...
      }
  | OpenState
      { chainState :: ChainStateType tx
      -- ...
      }
  | ClosedState
      { chainState :: ChainStateType tx
      -- ...
      }
```

We provide the latest ChainStateType tx to postTx:

```
postTx :: ChainStateType tx -> PostChainTx tx -> m ()
```

We change the ChainEvent tx data type and callback interface of Chain to:

```
data ChainEvent tx
  = Observation
      { observedTx :: OnChainTx tx
      , newChainState :: ChainStateType tx
      }
  | Rollback ChainSlot
  | Tick UTCTime

type ChainCallback tx m = (ChainStateType tx -> Maybe (ChainEvent tx)) -> m ()
```

with the meaning that invoking the callback indicates receipt of a transaction which may (hence the Maybe) yield a relevant ChainEvent tx, where an Observation may include a newChainState.

We also decide to extend OnChainEffect with a ChainState tx to explicitly thread the used chainState in the Hydra.HeadLogic.

Consequences

We need to change the construction of Chain handles and the call sites of postTx
We need to extract the state handling (similar to the event queue) out of the HydraNode handle and shuffle the main of hydra-node a bit to be able to provide the latest ChainState to the chain callback as a continuation.
We need to make the ChainState serializable (ToJSON, FromJSON) as it will be part of the HeadState.
We can drop the TVar keeping the OnChainHeadState in the Hydra.Chain.Direct module.
We need to update DirectChainSpec and BehaviorSpec test suites to mock/implement the callback & state handling.
We might be able to simplify the ChainState tx to be just a UTxOType tx later.
As OnChainEffect and Observation values will contain a ChainStateType tx, traces will automatically include the full ChainState, which might be helpful but also possibly big.

Alternative

We could extend PostChainTx (like Observation) with ChainState and keep the signatures:

```
postTx :: MonadThrow m => PostChainTx tx -> m ()
type ChainCallback tx m = (ChainState tx -> Maybe (ChainEvent tx)) -> m ()
```

Not implemented as it is less clear on the need for a ChainState in the signatures.

19. Use of reference scripts

July 22, 2022 · 4 min read

Status

Proposed

Context

In the desire to make Hydra transactions smaller and cheaper (at the time of writing any abort tx was too big), we want to use the reference script and reference input features of the upcoming Babbage ledger era. See the babbage ledger spec, CIP-31 and CIP-33 for details.
With these features we do not need to (re-)include scripts in each transaction.
The CIPs do not specify how reference scripts are to be managed and we can see at least two options:
1. Add them as outputs to the init transaction, or prior to that, as part of each Hydra Head instance
2. Post them out-of-band, separate to individual Head instances
Ownership of the outputs holding the scripts is to be considered. If these "reference outputs" are spent, they cannot be referred to anymore. This would mean all heads referring to them can be denied of service (DoS).
Each head will need to refer to the correct version of the hydra scripts. That is, consistent with the script hashes known to the hydra-node.
- This is also related to the problem of managing script versions & updates.
- Right now, the hydra-node is compiled against hydra-plutus to access compiled script content and hashes.
The general trade-off is: instead of paying ADA fees for scripts adding to the transaction size in each transaction, ADA deposits will need to be put down to have scripts be part of the UTxO set in the ledger once.

Decision

Publish outputs holding Hydra scripts out-of-band (option 2), because
- All scripts would not fit into the init transaction directly, we would need to post multiple.
- Costs (deposits) would need to be paid for each head instance.
The scripts are stored at outputs addressed to some unspendable v_publish validator.
- This is to avoid DoS risk and unnecessary centralization
- We have considered "garbage collection" by allowing spending these outputs into re-publishing new versions of the script.
  - This would make things even more complicated and we decided to not bother about "littering the chain" right now.
We will publish scripts on release of the hydra-node, or more specifically of the hydra-plutus package.

Consequences

We need a process and/or tool to publish hydra-plutus scripts and need to pay the deposits.
- Any other party could do the same, this does not lead to centralization.
The hydra-node would need to know the TxIns of the "right" published scripts.
- In the simplest case we would just make this configurable and provide configurations for the various networks after publishing scripts.
If we combine the v_publish validator with a "tag", this allows nodes to "discover" scripts of a known version
- For example, we could define HydraHeadV1, HydraInitialV1 and HydraCommitV1 as such tags
- We could parameterize the validator by the tag, yielding unique addresses per tag.
- Alternatively, the "tag" could be stored in a canonical form as datum on the script outputs.
- In any case, this allows for some consistency checking or easier configuration (not needing to enumerate which TxIn is which script)
By also knowing the script hashes the hydra-node can verify the integrity of "found" reference scripts
- This would be possible right now, as they are compiled into the node
- Might be undesirable later for easier system configuration
By making v_publish unspendable, we "litter" the chain. However, any garbage collection scheme would mean potential to DoS again.
Extended diagram for the abort on-chain life-cycles of a Hydra Head to include reference scripts.

20. Handling time

August 2, 2022 · 4 min read

Status

Accepted

Context

The Hydra Head protocol is expected to be isomorphic to the ledger it runs on. That means, it should support the same transaction formats and (if desired) use the same ledger rules as the layer 1.
Cardano is our layer 1 and its consensus layer separates time into discrete steps, where each step is called a Slot. The network is expected to evolve strictly monotonically on this time scale and so slot numbers (SlotNo) are always increasing.
The Cardano mainnet has a block scheduled every 20 seconds, although it may take longer.
- This is because slotLength = 1.0 and every 20th slot is "active" with f = 0.05.
- The consensus protocol requires k blocks to be produced within 3k/f slots, where k = 2160 on mainnet.
Transactions on Cardano may have a validity range with a lower and upper bound given as SlotNo.
Wall-clock time can be converted to slots (and back) using an EraHistory or EpochInterpreter provided by the consensus layer of the cardano node. This is required as the slot lengths could change over time.
- All past points in time since the SystemStart can be converted.
- Future points in time can only be converted in the "safe zone", practically being at least 3k/f slots (TODO: cross check). Refer to chapter 17 Time on the consensus spec for more details.
The Hydra Head protocol allows close and contest transactions only up to a deadline T_final, and fanout transactions only after the deadline.
- In the current implementation the deadline is the upper validity of closed plus the contestation period.
- We also consider protocol variants which push out the deadline by the contestation period on each contest.
- Contestation periods may very well be longer than the stability window of the protocol. For example: 7 days, while the mainnet stability window is more like 36 hours.
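
The numbers quoted above can be checked with a short calculation. The sketch below only uses the parameters given in the text (k = 2160, f = 0.05, slotLength = 1.0) and the 3k/f formula; the Haskell names are chosen for illustration.

```haskell
-- Mainnet parameters as quoted in the text above.
securityParam :: Rational
securityParam = 2160 -- k

activeSlotCoeff :: Rational
activeSlotCoeff = 0.05 -- f

slotLengthSeconds :: Rational
slotLengthSeconds = 1

-- With f = 0.05, on average every 20th slot has a block.
expectedBlockIntervalSeconds :: Rational
expectedBlockIntervalSeconds = slotLengthSeconds / activeSlotCoeff

-- The window of 3k/f slots.
stabilityWindowSlots :: Rational
stabilityWindowSlots = 3 * securityParam / activeSlotCoeff

stabilityWindowHours :: Rational
stabilityWindowHours = stabilityWindowSlots * slotLengthSeconds / 3600

-- A 7 day contestation period, expressed in hours.
contestationPeriodHours :: Rational
contestationPeriodHours = 7 * 24

contestationExceedsWindow :: Bool
contestationExceedsWindow = contestationPeriodHours > stabilityWindowHours
```

Here `stabilityWindowSlots` is 129600 and `stabilityWindowHours` is 36, which matches the "36 hours" mentioned above, so a 7 day (168 hour) contestation deadline lies well beyond the range in which slots can be converted ahead of time.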
We have encountered two problems with handling time in the past
- Trying to convert wall-clock time to slots of the Head protocol deadline led to PastHorizonException (when using very low security parameter k)
- Trying to fanout after the deadline, but before another block has been seen by the L1 ledger led to OutsideValidityIntervalUTxO.
The second problem scenario and solution ideas are roughly visible on this whiteboard:

Decision

The head logic uses wall-clock time to track time and only converts to/from slots when constructing/observing transactions in the chain layer.
- This ensures that transactions we post or see on the chain can be converted to/from slots.
- The head logic would use UTCTime for points in time and NominalDiffTime for durations.
- The chain layer converts these using the SystemStart and EraHistory into SlotNo.
The chain layer informs the logic layer whenever time passed (on the chain) using a new Tick event.
- For the direct chain implementation, this is whenever we see a block in the chain sync protocol.
- Per above decision, the Tick shall contain a UTCTime corresponding to the new "now" as seen through the block chain.

Consequences

Conversion from UTCTime -> SlotNo and vice versa stays local to the chain layer.
The HeadLogic can track chain time in its state and condition ReadyToFanout upon seeing it pass the deadline.
- Ensures clients only see ReadyToFanout when a following Fanout would be really possible.
- Makes the Delay effect redundant and we can remove it (only delay via reenqueue on the Wait outcome)
By introducing Tick events, IOSim will not be able to detect non-progress (deadlocks).
- This means we cannot rely on early exit of simulations anymore and need to determine meaningful simulation endings instead of waitUntilTheEndOfTime.
We get a first, rough notion of time for free in our L2 and can support "timed transactions" with same resolution as the L1.
- Tracking time in the state makes it trivial to provide it to the ledger when we applyTransaction.
- Of course we could extend the fidelity of this feature using the system clock for "dead reckoning" between blocks. The conversion of wall clock to slot could even be configurable using an L2 slotLength analogous to L1 (although we might not want/need this).
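
The interplay of Tick events and the deadline can be sketched as a small pure model. Everything here is a simplification under stated assumptions: time is a hypothetical `Time` newtype counting whole seconds since the system start (the decision above uses UTCTime), the slot length is assumed constant (the real conversion needs SystemStart and EraHistory), and the `ClosedState`, `Output` and `update` definitions are illustrative rather than those of Hydra.HeadLogic.

```haskell
-- Assumption: time as whole seconds since system start, instead of UTCTime.
newtype Time = Time Integer deriving (Show, Eq, Ord)

newtype SlotNo = SlotNo Integer deriving (Show, Eq, Ord)

-- Chain layer: conversions under the assumption of a constant slot length
-- given in seconds. The real conversion must consult the EraHistory.
slotToTime :: Integer -> SlotNo -> Time
slotToTime slotLength (SlotNo n) = Time (n * slotLength)

timeToSlot :: Integer -> Time -> SlotNo
timeToSlot slotLength (Time t) = SlotNo (t `div` slotLength)

-- Event sent by the chain layer whenever time has passed on the chain.
newtype ChainEvent = Tick Time deriving (Show, Eq)

-- Hypothetical, minimal closed head state.
data ClosedState = ClosedState
  { contestationDeadline :: Time
  , readyToFanoutSent :: Bool
  }
  deriving (Show, Eq)

data Output = ReadyToFanout deriving (Show, Eq)

-- Head logic: signal ReadyToFanout once, as soon as chain time has passed
-- the deadline, so that a following fanout would really be possible.
update :: ClosedState -> ChainEvent -> (ClosedState, [Output])
update st (Tick now)
  | now > contestationDeadline st && not (readyToFanoutSent st) =
      (st{readyToFanoutSent = True}, [ReadyToFanout])
  | otherwise = (st, [])
```

Because the head logic only reacts to the time reported by the chain layer, no slot conversion is needed in `update` itself.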
