21 posts tagged with "Accepted"

· 2 min read

Status

Accepted

Context

  • To implement Hydra Head's ledger we have been working with the ledger-specs packages, which provide a low-level interface to work with transactions and ledgers
    • We also use a lightly wrapped ledger-specs API as our interface for off-chain transaction submission. This introduced some boilerplate in order to align with cardano-api and provide JSON serialisation.
  • In our initial experiments connecting directly to a cardano-node, we have also been using the ledger API for building transactions, for want of some script-related features in cardano-api
  • cardano-api is expected to be the supported entry point for clients to interact with the Cardano chain, while ledger-specs is reserved for internal use and direct interactions with ledgers
  • cardano-api now provides all the features we need to run our on-chain validators

Decision

Therefore

  • Use cardano-api types and functions instead of ledger-specs in Hydra.Chain.Direct component
  • Use cardano-api types instead of custom ones in Hydra.Ledger.Cardano component

Consequences

  • Removes the boilerplate in Hydra.Ledger.Cardano required to map cardano-api types sent by clients to builtin and ledger-specs types
  • Simplifies the Hydra.Chain.Direct component:
    • Replaces custom transaction building in Tx
    • Replaces custom transaction fees calculation and balancing in Wallet
    • Replaces low-level connection establishment with cardano-api functions for connecting to the node (keeping the chain sync subscription)
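
A minimal sketch of what relying on cardano-api buys us, e.g. for serialisation: the names below come from cardano-api, but the chosen era and the exact signatures are assumptions that vary across versions of the library.

    -- Sketch only: with cardano-api's Tx type, CBOR serialisation comes for
    -- free via the SerialiseAsCBOR class, removing the wrapping previously
    -- needed around ledger-specs types.
    import Cardano.Api (AlonzoEra, Tx, serialiseToCBOR)
    import Data.ByteString (ByteString)

    serialiseTx :: Tx AlonzoEra -> ByteString
    serialiseTx = serialiseToCBOR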

· 2 min read

Status

Accepted

Context

  • Test-Driven Development or Test-Driven Design is a technique that helps teams promote simple and loosely coupled design, reduces the amount of code written, increases confidence in delivered software by providing a high level of regression-test coverage, and improves development speed through shorter feedback loops
  • While initially focused on unit tests, TDD has evolved over time to include higher-level tests like Behaviour Driven Development or Specification by Example, leading to comprehensive strategies like the Outside-In Diamond TDD
  • Being a foundational part of scalable applications based on the Cardano blockchain, Hydra Head needs to be released early, often, and with high assurance in order to benefit from early adopters' feedback

Decision

Therefore

We start as early as possible with End-to-End tests, gradually making them more complex as we develop the various components but starting with something simple (like a system-level but dummy chain and hydra network).

We flesh out other integration tests as needed, when we refine the technological stack used for the various bits and pieces.

We do most of our work in the Executable Specifications layer while we are developing the core domain functions, e.g. the Head protocol. The rationale is that this is the level at which we can test the most complex behaviours in the fastest and safest possible way, as everything runs without external dependencies or can even run as pure code using io-sim.

We tactically drop to the Unit tests level when dealing with the protocol's "fine print".

Consequences

  • Development of each "feature", whether new or a change to an existing one, should start with a test defined at the highest level possible, but no higher
  • A detailed presentation of the various testing layers is available in the wiki

· 2 min read

Status

Accepted

Context

  • We are implementing our custom (Direct) interaction with the Cardano blockchain and not using the PAB nor the Contract monad to define off-chain contract code
  • This implies we cannot use the official testing framework for Contracts, which relies on the Contract monad and emulator traces, nor the QuickCheck-based framework
  • We want to follow our Test-Driven Development approach for contracts as this is a critical part of Hydra
  • On-Chain Validators need not only to be correct and functional, but also secure and hardened against malicious parties

Decision

Therefore

  • We test-drive single contracts code using Mutation-Based Property Testing
  • Contracts are tested through the construction of actual transactions and running phase-2 ledger validation process
  • We start from a "healthy" transaction, which is expected to be valid and to stay so
  • Contract code is initially a const True function that validates any transaction
  • We flesh out the contract's code piecemeal through the introduction of Mutations that turn a healthy transaction into an expectedly invalid one
  • We gradually build a set of combinators and generators that make it easier to mutate transactions arbitrarily and to combine those mutations, as sketched below
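
To make this concrete, the following is a minimal, self-contained sketch of the approach; Tx, Mutation, validates and applyMutation are hypothetical stand-ins for the real Hydra types and helpers.

    import Test.QuickCheck

    -- Hypothetical stand-ins for the real Hydra types and helpers.
    data Tx = HealthyTx | MutatedTx deriving (Eq, Show)

    data Mutation = ChangeHeadDatum | DropParticipationToken
      deriving (Eq, Show, Enum, Bounded)

    -- Would run the phase-2 ledger validation against the transaction.
    validates :: Tx -> Bool
    validates tx = tx == HealthyTx

    -- Turns a healthy transaction into an expectedly invalid one.
    applyMutation :: Mutation -> Tx -> Tx
    applyMutation _ _ = MutatedTx

    -- The healthy transaction is expected to be valid and to stay so.
    prop_healthyTransactionValidates :: Property
    prop_healthyTransactionValidates = property (validates HealthyTx)

    -- Every mutation must turn the healthy transaction into an invalid one.
    prop_mutationInvalidatesTransaction :: Property
    prop_mutationInvalidatesTransaction =
      forAll (elements [minBound .. maxBound]) $ \mutation ->
        not (validates (applyMutation mutation HealthyTx))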

Consequences

  • We make the contracts' threat model explicit through the tests we write, which should help future auditors' work
  • We'll need an additional layer of tests to exercise the Hydra OCV State Machine through sequences of transactions. This could be implemented using the quickcheck-dynamic library, or other tools that are currently being developed by the Cardano community

· 3 min read

Status

Accepted

Context

  • The Hydra on-chain-verification scripts are used to validate Hydra protocol transactions and ensure they are lawful.
  • At least these three properties need to be enforced:
    • Authentication: ensure that only Head participants can, for example, abort a Head
    • Contract continuity: ensure that a Head was initialized before it can be opened by a collectCom tx.
    • Completeness: ensure that all Head participants had a chance to commit funds to a Head.
  • The Hydra Head paper introduces participation tokens (PTs) and a state thread token (ST) for this purpose.
  • Such tokens (a.k.a. native assets) are identified by the CurrencySymbol, that is, the hash of their MintingPolicyScript (a.k.a. PolicyID in the ledger), and a ByteString, the so-called TokenName (a.k.a. AssetName in the ledger; see the shelley-ma ledger spec)
  • There can be multiple Hydra Heads on a network, and a hydra-node needs to distinguish individual Head instances or even (later) keep track of multiple Heads. Concretely, this means that we need to infer a Head identifier (HeadId) from observing each of the Hydra protocol transactions.

Decision

  • We solve both challenges by defining that the ST and PTs shall use the same MintingPolicyScript and thus have the same CurrencySymbol
  • The MintingPolicyScript shall be parameterized by a TxOutRef to yield a unique CurrencySymbol per Head (similar to the OneShotCurrency example; see the sketch after this list)
  • ST and one PT per participant are minted in the initTx
  • The TokenName of the ST can be any well-known ByteString, e.g. "HydraHeadV1"
  • The TokenName of the PTs needs to be the PubKeyHash of the respective participant
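
A sketch of such a minting policy, in the OneShotCurrency style, could look as follows; mkHeadTokenPolicy is a hypothetical name, and the imports follow the plutus-ledger-api V1 module layout, which may differ between versions.

    {-# LANGUAGE NoImplicitPrelude #-}
    import Plutus.V1.Ledger.Api (ScriptContext (..), TxOutRef (..))
    import Plutus.V1.Ledger.Contexts (spendsOutput)
    import PlutusTx.Prelude

    -- The policy passes only if the parameterizing TxOutRef is spent by the
    -- minting transaction; since an output can be spent at most once, the
    -- resulting CurrencySymbol is unique per Head.
    mkHeadTokenPolicy :: TxOutRef -> () -> ScriptContext -> Bool
    mkHeadTokenPolicy seed _redeemer context =
      spendsOutput (scriptContextTxInfo context) (txOutRefId seed) (txOutRefIdx seed)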

Consequences

  • Heads can be identified by looking for the ST in init, collectCom, close, contest or fanout transactions, or the PT in commit transactions. In both cases, the CurrencySymbol == HeadId

  • Our scripts become simpler as we only need to check that ST/PT are paid forward, instead of needing to check datums

  • The datum produced by commit txs (and consumed by collectCom) is Just SerializedTxOut, which is simpler than also keeping the committing participant in the datum (compare to the full life-cycle of 0.3.0).

  • The v_head script validator does not need to be parameterized, which makes discovering new Heads (and also tracking them for metrics) easier as the address to watch for is common to all Heads (of the same v_head version).

  • The v_head script (path) for the abort life-cycle can already be implemented much more safely by checking that all PTs are burned in the abort transaction (instead of counting inputs, as in the abort life-cycle of 0.3.0).

  • The diagrams for the full and abort on-chain life-cycles of a Hydra Head have been updated.

Follow-up questions

  • What value does the ST actually add? We could always look for the PT to identify a Head, and contract continuity would already be achieved by the PTs!
  • In discussions it turned out to be unclear where the Head's CurrencySymbol comes from, and consequently how to identify that an ST is indeed an ST.

· One min read

Status

Accepted

Context

We have started using Architecture Decision Records as our primary way to document the most important design decisions we take while developing Hydra Node, and this has proved effective in fostering fruitful discussions about major architecture changes.

During the course of this project, we have sometimes had debates on various topics leading to rejection of some ADRs. It could be the case that a previously rejected proposal turns out to be interesting, either because the context and situation have changed enough to reevaluate a proposal, or as background for some new proposal.

Decision

Therefore

  • We will keep rejected Architecture Decision Records alongside accepted and draft ones, in the same location and format
  • Rejected ADRs must have the tag [Rejected] set

Consequences

Once attributed a serial number, an ADR keeps it "forever", whether it is rejected or accepted

· 4 min read

Status

Accepted

Context

  • The Hydra Head protocol is expected to be isomorphic to the ledger it runs on. That means it should support the same transaction formats and (if desired) use the same ledger rules as the layer 1.

  • Cardano is our layer 1 and its consensus layer separates time into discrete steps, where each step is called a Slot. The network is expected to evolve strictly monotonically on this time scale and so slot numbers (SlotNo) are always increasing.

  • The Cardano mainnet has a block scheduled every 20 seconds, although it may take longer.

    • This is because slotLength = 1.0 and every 20th slot is "active" with f = 0.05.
    • The consensus protocol requires k blocks to be produced within 3k/f slots, where k = 2160 on mainnet.
  • Transactions on Cardano may have a validity range with a lower and upper bound given as SlotNo.

  • Wall-clock time can be converted to slots (and back) using an EraHistory or EpochInterpreter provided by the consensus layer of the cardano node. This is required as the slot lengths could change over time.

    • All past points in time since the SystemStart can be converted.
    • Future points in time can only be converted in the "safe zone", practically being at least 3k/f slots (TODO: cross check). Refer to chapter 17, Time, of the consensus spec for more details.
  • The Hydra Head protocol allows close and contest transactions only up to a deadline T_final, and fanout transactions only after the deadline.

    • In the current implementation the deadline is the upper validity bound of the close transaction plus the contestation period.
    • We also consider protocol variants which push out the deadline by the contestation period on each contest.
    • Contestation periods may very well be longer than the stability window of the protocol. For example: 7 days, while the mainnet stability window is more like 36 hours.
  • We have encountered two problems with handling time in the past:

    • Trying to convert wall-clock time to slots for the Head protocol deadline led to a PastHorizonException (when using a very low security parameter k)
    • Trying to fanout after the deadline, but before another block had been seen by the L1 ledger, led to OutsideValidityIntervalUTxO.
  • The second problem scenario and solution ideas are roughly visible on this whiteboard:

Decision

  • The head logic uses wall-clock time to track time and only converts to/from slots when constructing/observing transactions in the chain layer.

    • This ensures that transactions we post or see on the chain can be converted to/from slots.
    • The head logic would use UTCTime for points in time and NominalDiffTime for durations.
    • The chain layer converts these using the SystemStart and EraHistory into SlotNo (a simplified sketch follows this list).
  • The chain layer informs the logic layer whenever time passed (on the chain) using a new Tick event.

    • For the direct chain implementation, this is whenever we see a block in the chain sync protocol.
    • Per the above decision, the Tick shall contain a UTCTime corresponding to the new "now" as seen through the blockchain.
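
A simplified sketch of such a conversion, assuming a fixed slot length and ignoring era transitions (which the real SystemStart/EraHistory-based conversion handles):

    import Data.Time (NominalDiffTime, UTCTime, addUTCTime, diffUTCTime)
    import Numeric.Natural (Natural)

    newtype SlotNo = SlotNo Natural deriving (Eq, Ord, Show)

    -- Wall-clock time to slot: count whole slot lengths since system start
    -- (assumes the given time is not before systemStart).
    slotFromUTCTime :: UTCTime -> NominalDiffTime -> UTCTime -> SlotNo
    slotFromUTCTime systemStart slotLength time =
      SlotNo (floor (diffUTCTime time systemStart / slotLength))

    -- Slot back to wall-clock time: the start of the given slot.
    slotToUTCTime :: UTCTime -> NominalDiffTime -> SlotNo -> UTCTime
    slotToUTCTime systemStart slotLength (SlotNo s) =
      addUTCTime (fromIntegral s * slotLength) systemStart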

Consequences

  • Conversion from UTCTime -> SlotNo and vice versa stays local to the chain layer.

  • The HeadLogic can track chain time in its state and condition ReadyToFanout upon seeing it pass the deadline.

    • Ensures clients only see ReadyToFanout when a following Fanout would really be possible.
    • Makes the Delay effect redundant, so we can remove it (only delaying via reenqueueing on the Wait outcome)
  • By introducing Tick events, IOSim will not be able to detect non-progress (deadlocks).

    • This means we cannot rely on early exit of simulations anymore and need to determine meaningful simulation endings instead of waitUntilTheEndOfTime.
  • We get a first, rough notion of time for free in our L2 and can support "timed transactions" with the same resolution as the L1.

    • Tracking time in the state makes it trivial to provide it to the ledger when we applyTransaction.
    • Of course we could extend the fidelity of this feature using the system clock for "dead reckoning" between blocks. The conversion of wall clock to slot could even be configurable using an L2 slotLength analogous to L1 (although we might not want/need this).

· 3 min read
Sebastian Nagel
Pascal Grange
Franco Testagrossa
Arnaud Bailly
Sasha Bogicevic

Status

Proposed

Context

  • The HydraHeadV1 formal specification contains a bounded confirmation window:

    // Deadline

    T_max <= T_min + L // Bounded confirmation window
    DL' = T_max + L    // The latest possible deadline is T_min + 2*L

    with T_min and T_max being the tx validity bounds and L being the contestation period.

    • This is to avoid attacks where the specified upper validity bound is too far in the future (e.g. 10 years), denying service to the head.

Current state of things:

  • The contestation period and the upper tx validity bound are used for computing the contestation deadline.

  • There is a closeGraceTime currently hard-coded (to 100 slots) to set some upper bound on the closeTx. So far, this was also required to compute the contestation deadline.

  • Different networks (chains) have different slot lengths, e.g. the preview network has a slot every 1s, while our local devnets use 0.1s. This means hardcoded values like closeGraceTime need to be in sync with the underlying network.

  • The contestationPeriod can be configured by users via the Init client input. For example, the hydra-cluster test suite uses a hardcoded cperiod on the client side.

  • The default value for T_min is negative infinity.

  • A lower tx validity bound in the future does not pose a problem, since another participant is able to close the head.

What we want to achieve:

  • We want to enforce the topmost formula above in our on-chain code.

  • Introduce maxGraceTime, expressed in seconds, in place of closeGraceTime, and adjust it to an appropriate value.

  • The contestation period is to be used to create a bounded close transaction (together with maxGraceTime). Before, it was only used for computing the contestation deadline.

  • If the contestation period is higher than maxGraceTime, we will pick the latter. We still need maxGraceTime since, if contestationPeriod is low for the current network, our txs reach the upper bound fast and become invalid. That is why we set the upper tx bound to the minimum of contestationPeriod and maxGraceTime, so that txs have a high enough upper bound.

  • Make sure all head participants use the same value for contestationPeriod.

  • The attack vector has a corresponding mutation test.

Decision

  • Use the specification formula on-chain.

  • Configure the contestation period (number of seconds) on the hydra-node, e.g. via a --contestation-period command line option.

  • The lower tx bound should be the last known slot, as reported by the cardano-node.

  • The upper tx bound is the current time plus the minimum of contestationPeriod and maxGraceTime (see the sketch after this list).

  • When submitting the InitTx, make sure to use the --contestation-period value from our node's flag.

  • If other nodes observe an OnInitTx whose contestationPeriod value does not match their own --contestation-period setting, they ignore the InitTx.

  • Rename closeGraceTime to maxGraceTime, since we are also using it for the upper bound of contest txs.
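
A sketch of the resulting bounds computation, using hypothetical names and wall-clock types (the chain layer would convert these to slots):

    import Data.Time (NominalDiffTime, UTCTime, addUTCTime)

    -- Upper validity bound of the close tx: the current time plus the
    -- smaller of the contestation period and maxGraceTime.
    closeUpperBound :: NominalDiffTime -> NominalDiffTime -> UTCTime -> UTCTime
    closeUpperBound contestationPeriod maxGraceTime now =
      addUTCTime (min contestationPeriod maxGraceTime) now

    -- Contestation deadline per the specification formula: DL' = T_max + L.
    contestationDeadline :: NominalDiffTime -> UTCTime -> UTCTime
    contestationDeadline contestationPeriod tMax =
      addUTCTime contestationPeriod tMax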

Consequences

  • Not every positive number of seconds is a valid contestation period anymore!

  • The upper tx validity of the close transaction is the minimum of maxGraceTime and contestationPeriod, and this needs to be a good enough value with respect to the running network. This is a consequence required by the ledger when constructing transactions, since we cannot convert arbitrary points in time to slots.

  • All parties need to agree on the contestation period before trying to run a Head protocol; otherwise, the InitTx will be ignored.

· 2 min read

Status

Accepted

Context

  • We have been experimenting with quickcheck-dynamic for a while, leading to the implementation of basic Model-Based tests for the Hydra Head Protocol
  • These tests fill a gap in our testing strategy, between BehaviorSpec tests which test a "network" of nodes but only at the level of the off-chain Head logic, and EndToEndSpec tests which test a full blown network of nodes interconnected through real network connections and to a real cardano-node:
    • The former are fast but do not test the complete lifecycle of a Head. Furthermore, they are only unit tests, so they do not provide coverage of the various corner cases that could arise in practice
    • The latter exercise the full lifecycle but are very slow and brittle
  • Because they run in io-sim, those Model-based tests are fast and robust, as they don't depend on system interactions. Moreover, decoupling the System-under-Test from IO makes it easy to simulate an environment that deviates from the "happy path", such as delays from the network, filesystem errors, or even adversarial behaviour from the node or the chain.

Decision

  • We will maintain and evolve the Model over time to cover more features
  • Key properties of the whole system should be written down as proper DynamicLogic properties and thoroughly tested using quickcheck-dynamic. This includes but is not limited to:
    • Liveness of the Head
    • Consistency of the Head
    • Soundness of Chain
    • Completeness of Chain

Consequences

  • We need to ensure the Model covers the full lifecycle of a Hydra Head network, which at the time of writing this ADR is not the case
  • There cannot be One Model to Rule Them All, so we should not refrain from defining different StateModels or different RunModels depending on what needs to be tested
  • In particular, testing against adversarial conditions will certainly require defining different instances of the Network or Chain components, for example:
    • An Active Adversary that fully controls the protocol and the parties,
    • A Network Adversary that can delay and/or drop messages,
    • A Faulty Filesystem that can cause exceptions when reading or writing files,
    • ...

· 2 min read

Status

Accepted

Context

  • ADR 18 merged both headState and chainState into one single state in the Hydra node, giving the chain layer a way to fetch and update the chainState when observing a chain event.
  • Having the headState containing the chainState made persistency easier to deal with: we ensure that we always save cohesive states.
  • When opening our first head on mainnet we suffered from a commit/rollback issue that was the result of a race condition in the management of the chainState as implemented in the context of ADR 18.
  • Reproducing the issue by introducing rollbacks in the model-based tests, we discovered that, as a client of a hydra-node, we had no idea how to deal with the rollback event as it is defined now.
  • #185 plans to improve rollback management.

The following picture details the race condition through an example:

  1. The DirectChain component fetches some chainState 0 from the headState

  2. The DirectChain component observes a transaction and it:

    • publishes an event about this observation
    • updates the headState with some chainState 1

  3. The Node processes the event and emits a new headState with a previousRecoverableState in case a rollback later happens

The problem is that HeadState 2 in the figure should point to a previous recoverable head state containing chainState 0, and not chainState 1.

(Figure: race condition)

Updating the chain state only in the HeadLogic leads to problems when several transactions are in the same block. This can be mitigated by keeping a volatile chain state locally while analysing the block. But that then leads to race conditions if, for some reason, blocks are produced faster than they are processed by the HeadLogic: a low probability in production, but a higher one when testing.

Decision

  • We supersede ADR 18 with the current ADR.
  • A local chain state is re-introduced in the chain component, not shared with the head logic (see the sketch after this list).
  • A copy of the chainState is kept in the headState to keep the benefits of ADR 18 regarding persistency.
  • The RolledBack output is removed from the API until it is actionable by users or #185 is implemented.
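
A sketch of the re-introduced local chain state, with hypothetical types, illustrating how doing the fetch-and-update in a single STM transaction avoids the race described above:

    {-# LANGUAGE NamedFieldPuns #-}
    import Control.Concurrent.STM (TVar, atomically, readTVar, writeTVar)

    -- Hypothetical stand-ins for the real types.
    newtype ChainState = ChainState Int deriving (Show)
    data ChainEvent = Observation ChainState | Rollback ChainState deriving (Show)

    data ChainComponent = ChainComponent
      { localChainState :: TVar ChainState -- not shared with the head logic
      , publishEvent :: ChainEvent -> IO ()
      }

    onBlockObserved :: ChainComponent -> (ChainState -> ChainState) -> IO ()
    onBlockObserved ChainComponent{localChainState, publishEvent} observeTx = do
      -- Fetching and updating happen atomically, so no other observation can
      -- interleave between reading chainState 0 and writing chainState 1.
      newState <- atomically $ do
        st <- readTVar localChainState
        let st' = observeTx st
        writeTVar localChainState st'
        pure st'
      publishEvent (Observation newState)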

Consequences

  • The rollback logic is removed from the HeadLogic and only maintained in the chain component.
  • The Rollback event carries the ChainState.
  • At node startup, we initialize the chain layer with the persisted chainState

· 3 min read
Arnaud Bailly

Status

Accepted

Context

  • The state of a Hydra Head is currently persisted as a whole upon each NewState outcome from the update function: the new state is serialised and the state file is overwritten with the corresponding bytes. While this is a straightforward strategy to implement, it has a huge impact on the performance of a Hydra Head, as serialising a large data structure like the HeadState and completely overwriting a file is costly
    • We revisited our benchmarks and found that persistence was the major bottleneck when measuring roundtrip confirmation time, e.g. the time it takes from a client's perspective to submit a transaction and observe it in a ConfirmedSnapshot
  • Furthermore, changes to the HeadState in the hydra-node, while conceptually being an Effect, are currently handled differently from other Effects: the state is updated transactionally through a dedicated modifyHeadState function in the core loop of processing events, and then effects are processed.

Decision

Implement state persistence using Event Sourcing. Practically, this means:

  1. Replace the NewState outcome with a StateChanged event, which can be part of the Outcome of HeadLogic's update function, representing the change to be applied to the current state.
  2. Add an aggregate function to manage applying StateChanged events on top of the current HeadState to keep it updated in-memory.
  3. Persist StateChanged events in an append-only log using a dedicated handle.
  4. Upon node startup, re-read the StateChanged events log and re-apply those events to reset the HeadState (a sketch follows).
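
A minimal sketch of this event-sourced handling, with hypothetical constructors standing in for the real HeadState and StateChanged types:

    -- Hypothetical, simplified state and events.
    data HeadState = Idle | Open {seenTxs :: [String]} deriving (Show)

    data StateChanged
      = HeadOpened
      | TransactionSeen String
      deriving (Show, Read)

    -- Applies a single StateChanged event on top of the current state.
    aggregate :: HeadState -> StateChanged -> HeadState
    aggregate _ HeadOpened = Open []
    aggregate (Open txs) (TransactionSeen tx) = Open (tx : txs)
    aggregate st _ = st

    -- Replaying the append-only event log recovers the HeadState at startup.
    recoverState :: [StateChanged] -> HeadState
    recoverState = foldl aggregate Idle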

The following sequence diagram illustrates the new event handling in the HeadLogic:

Consequences

  • 🐎 The main expected consequence of this change is an increase in the overall performance of a Hydra Head network.

  • We need to pattern match twice on the HeadState, once in update and once in aggregate.

  • Terms from the specification are distributed over the update and aggregate functions. For example, the statements about updating all seen transactions would now be in aggregate and no longer in update.

  • New possibilities this change introduces with respect to ServerOutput handling and clients' access to a head's state:

    • Instead of having the HeadLogic emit a ClientEffect directly, the latter could be the result of a client-centric interpretation of a StateChanged.
    • Pushing this a little further, we could maintain a Query Model for clients with a dedicated Query API to ease implementation of stateless clients.
  • Calling StateChanged an event while treating it in the code alongside effects might introduce some confusion, as we already use the word Event to designate the inputs (a.k.a. commands) to the Head logic state machine. We might want to unify the terminology at some later point.