
· 3 min read
Sebastian Nagel
Pascal Grange
Franco Testagrossa
Arnaud Bailly
Sasha Bogicevic

Status

Accepted

Context

  • The HydraHeadV1 formal specification contains a bounded confirmation window:

    // Deadline

    T_max <= T_min + L // Bounded confirmation window
    DL' = T_max + L // The latest possible deadline is 2*L

    with T_min and T_max being the tx validity bounds and L being the contestation period.

    • This is to avoid attacks where the specified upper validity bound is far in the future (e.g. 10 years), which could be used to deny service to the head.

Current state of things:

  • The contestation period and the upper tx validity bound are used for computing the contestation deadline.

  • There is a closeGraceTime currently hard-coded (to 100 slots) to set an upper bound on the closeTx. So far, this was also required to compute the contestation deadline.

  • Different networks (chains) have different slot lengths, e.g. the preview network has a slot every 1s, while our local devnets use 0.1s. This means hardcoded values like closeGraceTime need to be in sync with the underlying network.

  • The contestationPeriod can be configured by users via the Init client input. For example, the hydra-cluster test suite uses a hardcoded cperiod on the client side.

  • The default value for T_min is negative infinity.

  • A lower tx validity bound in the future does not pose a problem, since any other participant is able to close the head.

What we want to achieve:

  • We want to enforce the topmost formula above in our on-chain code.

  • Introduce maxGraceTime expressed in seconds in place of closeGraceTime and adjust it to an appropriate value.

  • The contestation period is to be used to create a bounded close transaction (together with maxGraceTime). Before, it was only used for computing the contestation deadline.

  • If the contestation period is higher than maxGraceTime we will pick the latter. We still need maxGraceTime because, if contestationPeriod is low for the current network, our txs reach the upper bound fast and become invalid. That is why we set the upper tx bound to the minimum of contestationPeriod and maxGraceTime, so that txs have a high enough upper bound.

  • Make sure all head participants use the same value for contestationPeriod.

  • The attack vector has a corresponding mutation test.

Decision

  • Use the specification formula on-chain.

  • Configure the contestation period (number of seconds) on the hydra-node, e.g. via a --contestation-period command line option.

  • Lower tx bound should be the last known slot as reported by the cardano-node.

  • Upper tx bound is the current time plus the minimum of contestationPeriod and maxGraceTime (see the sketch after this list).

  • When submitting the InitTx, make sure to use the --contestation-period value from our node's flag.

  • If other nodes observe OnInitTx and the contestationPeriod value does not match their --contestation-period setting, they ignore the InitTx.

  • Rename closeGraceTime to maxGraceTime since it is also used for the upper bound of a contest tx.
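
A minimal sketch of the bound computations decided above, using plain time types; the function names and the maxGraceTime value are illustrative assumptions, not the actual hydra-node definitions:

    import Data.Time (NominalDiffTime, UTCTime, addUTCTime)

    -- Hypothetical value (in seconds); it must suit the target network's slot length.
    maxGraceTime :: NominalDiffTime
    maxGraceTime = 200

    -- Upper validity bound of a close/contest tx:
    -- current time + min(contestationPeriod, maxGraceTime).
    upperValidityBound :: NominalDiffTime -> UTCTime -> UTCTime
    upperValidityBound contestationPeriod now =
      addUTCTime (min contestationPeriod maxGraceTime) now

    -- Contestation deadline per the specification formula DL' = T_max + L,
    -- with L the contestation period and T_max the upper tx validity bound.
    contestationDeadline :: NominalDiffTime -> UTCTime -> UTCTime
    contestationDeadline contestationPeriod tMax =
      addUTCTime contestationPeriod tMax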

Consequences

  • Not every positive number of seconds is a valid contestation period any more!

  • The upper tx validity of the close transaction is the minimum of maxGraceTime and contestationPeriod, and this needs to be a good enough value with respect to the running network. This is required by the ledger when constructing transactions, since we cannot convert arbitrary points in time to slots.

  • All parties need to agree on the contestation period before trying to run a Head protocol; otherwise the InitTx will be ignored.

· 2 min read

Status

Accepted

Context

  • We have been experimenting with quickcheck-dynamic for a while, leading to the implementation of basic Model-Based tests for the Hydra Head Protocol
  • These tests fill a gap in our testing strategy, between BehaviorSpec tests which test a "network" of nodes but only at the level of the off-chain Head logic, and EndToEndSpec tests which test a full-blown network of nodes interconnected through real network connections and to a real cardano-node:
    • The former are fast but do not test the complete lifecycle of a Head. Furthermore, they are only unit tests, so they do not provide coverage of the various corner cases that could arise in practice
    • The latter exercise the full lifecycle but are very slow and brittle
  • Because they run in io-sim, those Model-based tests are fast and robust as they don't depend on system interactions. Moreover, decoupling the System-under-Test from IO makes it easy to simulate an environment that deviates from the "happy path" such as delays from the network, filesystem errors, or even adversarial behaviour from the node, or the chain.

Decision

  • We will maintain and evolve the Model over time to cover more features
  • Key properties of the whole system should be written down as proper DynamicLogic properties and thoroughly tested using quickcheck-dynamic. This includes but is not limited to:
    • Liveness of the Head
    • Consistency of the Head
    • Soundness of Chain
    • Completeness of Chain

Consequences

  • We need to ensure the Model covers the full lifecycle of a Hydra Head network, which at the time of writing this ADR is not the case
  • There cannot be One Model to Rule Them All, so we should not refrain from defining different StateModels or RunModels depending on what needs to be tested
  • In particular, testing against adversarial conditions will certainly require defining different instances of the Network or Chain components, for example:
    • An Accepted Adversary that fully controls the protocol and the parties,
    • A Network Adversary that can delay and/or drop messages,
    • A Faulty Filesystem that can cause exceptions when reading or writing files,
    • ...

· 2 min read

Status

Accepted

Context

  • ADR 18 merged both headState and chainState into one single state in the Hydra node, giving the chain layer a way to fetch and update the chainState when observing a chain event.
  • Having the headState containing the chainState made persistency easier to deal with: we ensure that we always save cohesive states.
  • When opening our first head on mainnet we suffered from a commit/rollback issue that was the result of a race condition in the management of the chainState as implemented in the context of ADR 18.
  • Reproducing the issue by introducing rollbacks in the model based tests, we discovered that, as a client of a hydra-node, we had no idea how to deal with the rollback event as it is defined now.
  • #185 plans to improve rollback management.

The following picture details the race condition through an example:

  1. The DirectChain component fetches some chainState 0 from the headState

  2. The DirectChain component observes a transaction and it

  • publishes an event about this observation
  • updates the headState with some chainState 1

  3. The Node processes the event and emits a new headState with a previousRecoverableState in case a rollback later happens

The problem is that HeadState 2 in the figure should point to a previous recoverable head state containing chainState 0 and not chainState 1.

[Figure: race condition]

Updating the chain state only in the HeadLogic leads to problems when several transactions are in the same block. This can be mitigated by keeping a volatile chain state locally while analysing the block. But that leads to race condition issues if, for some reason, blocks are produced faster than they are processed by the HeadLogic. This has a low probability in production but a higher one when testing.
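
As a rough illustration of such a chain-layer-local state (re-introduced by the decision below), a handle keeping enough history to roll back could look like this sketch; ChainPoint and ChainStateAt are simplified placeholders, not the actual hydra-node types:

    import Control.Concurrent.STM (atomically, modifyTVar', newTVarIO, readTVar, writeTVar)

    -- Simplified placeholders for the real chain types.
    newtype ChainPoint = ChainPoint Integer deriving (Eq, Ord, Show)

    newtype ChainStateAt = ChainStateAt { recordedAt :: Maybe ChainPoint }
      deriving (Show)

    -- A chain state local to the chain component, not shared with the head logic.
    data LocalChainState = LocalChainState
      { getLatest :: IO ChainStateAt
      , pushNew :: ChainStateAt -> IO ()
      , rollback :: ChainPoint -> IO ChainStateAt
      }

    newLocalChainState :: ChainStateAt -> IO LocalChainState
    newLocalChainState initial = do
      history <- newTVarIO [initial]
      pure
        LocalChainState
          { getLatest = atomically $ head <$> readTVar history
          , pushNew = \cs -> atomically $ modifyTVar' history (cs :)
          , rollback = \point -> atomically $ do
              states <- readTVar history
              -- Drop every state recorded after the rollback point.
              let remaining = dropWhile ((> Just point) . recordedAt) states
                  reverted = if null remaining then [initial] else remaining
              writeTVar history reverted
              pure (head reverted)
          }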

Decision

  • We supersede ADR 18 with the current ADR.
  • A local chain state is re-introduced in the chain component, not shared with the head logic.
  • A copy of the chainState is kept in the headState to keep the benefits of ADR 18 regarding persistency.
  • The RolledBack output is removed from the API unless it is actionable by users or #185 is implemented.

Consequences

  • The rollback logic is removed from the HeadLogic and only maintained in the chain component.
  • The Rollback event carries the ChainState.
  • At node startup, we initialize the chain layer with the persisted chainState

· 3 min read
Arnaud Bailly

Status

Accepted

Context

  • The state of a Hydra Head is currently persisted as a whole upon each NewState outcome from the update function: The new state is serialised and the state file is overwritten with the corresponding bytes. While this is a straightforward strategy to implement, it has a huge impact on the performance of a Hydra Head as serialising a large data structure like the HeadState and completely overwriting a file is costly
    • We revisited our benchmarks and found that persistence was the major bottleneck when measuring roundtrip confirmation time, e.g. the time it takes from a client's perspective to submit a transaction and observe it in a ConfirmedSnapshot
  • Furthermore, the way we currently handle changes to the HeadState in the hydra-node, while conceptually being an Effect, is handled differently from other Effects: The state is updated transactionally through a dedicated modifyHeadState function in the core loop of processing events, and then effects are processed.

Decision

Implement state persistence using Event Sourcing. Practically, this means:

  1. Replace the NewState outcome with a StateChanged event which can be part of the Outcome of HeadLogic's update function, representing the change to be applied to the current state.
  2. Add an aggregate function to manage applying StateChanged events on top of the current HeadState to keep it updated in-memory.
  3. Persist StateChanged events in an append-only log using a dedicated handle.
  4. Upon node startup, re-read the StateChanged event log and re-apply those events to restore the HeadState.
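
A minimal sketch of how these pieces fit together; HeadState and StateChanged are drastically simplified stand-ins here, and appendEvent/hydrate are illustrative names rather than the actual hydra-node handles:

    import Data.List (foldl')

    -- Simplified stand-ins for the real HeadState and StateChanged types.
    data HeadState = HeadState { seenTxs :: Int, confirmedSnapshots :: Int }
      deriving (Show)

    data StateChanged = TransactionReceived | SnapshotConfirmed
      deriving (Show, Read)

    initialState :: HeadState
    initialState = HeadState 0 0

    -- (2) Apply one StateChanged event to the in-memory state.
    aggregate :: HeadState -> StateChanged -> HeadState
    aggregate st TransactionReceived = st{seenTxs = seenTxs st + 1}
    aggregate st SnapshotConfirmed = st{confirmedSnapshots = confirmedSnapshots st + 1}

    -- (3) Persist events by appending them to a log, one event per line.
    appendEvent :: FilePath -> StateChanged -> IO ()
    appendEvent fp ev = appendFile fp (show ev <> "\n")

    -- (4) On startup, re-read the log and fold it back into a HeadState.
    hydrate :: FilePath -> IO HeadState
    hydrate fp = do
      contents <- readFile fp
      pure $ foldl' aggregate initialState (map read (lines contents))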

The following sequence diagram illustrates new event handling in the HeadLogic:

Consequences

  • 🐎 The main expected consequence of this change is an increase of the overall performance of the Hydra Head network.

  • Need to pattern match twice on the HeadState, once in update and once in aggregate.

  • Terms from the specification are distributed over update and aggregate function. For example, the statements about updating all seen transactions would now be in aggregate and not anymore in update.

  • New possibilities this change introduces with respect to ServerOutput handling and client's access to a head's state:

    • Instead of having the HeadLogic emit a ClientEffect directly, the latter could be the result of a client-centric interpretation of a StateChanged.
    • Pushing this a little further, we could maintain a Query Model for clients with a dedicated Query API to ease implementation of stateless clients.
  • Calling StateChanged an event while treating it in the code alongside effects might introduce some confusion as we already use the word Event to designate the inputs (a.k.a. commands) to the Head logic state machine. We might want at some later point to unify the terminology.

· 5 min read

Status

Proposed

Context

  • ADR-3 concluded that full-duplex communication channels are desirable to interact with a reactive system.

  • The Client API communicates several types of messages to clients. Currently this ranges from node-level PeerConnected, through head-specific HeadIsOpen, to messages about transactions like TxValid. These messages are all of type StateChanged.

  • Current capabilities of the API:

    • Clients can retrieve the whole history of StateChanged messages or opt out using a query parameter - all or nothing.

    • There is a welcome message called Greetings which is always sent, that contains the last headStatus.

    • There exists a GetUTxO query-like ClientInput, which will respond with a GetUTxOResponse containing the confirmed UTxO set in an open head, or (!) the currently committed UTxO set when the head is initializing.

    • While overall JSON encoded, clients can choose between JSON or binary (CBOR) output of transaction fields in several of these using a query parameter.

  • Many of these features have been added in a "quick and dirty" way, by monkey patching the encoded JSON.

  • The current capabilities do not even satisfy all user needs:

    • Need to wade through lots of events to know the latest state (except the very basic headStatus from the Greetings).

    • Need to poll GetUTxO or aggregate confirmed transactions on client side to know the latest UTxO set for constructing transactions.

    • Inclusion of the whole UTxO set in the head is not always desirable and filtering by address would be beneficial. (not addressed in this ADR though, relevant discussion #797)

    • As ADR-15 also proposes, some clients may not need (or should not have) access to administrative information.

  • It is often a good idea to separate the responsibilities of Commands and Queries (CQRS), as well as the model they use.

Decision

  • Drop GetUTxO and GetUTxOResponse messages as they advocate a request/response way of querying.

  • Realize that ClientInput data is actually a ClientCommand (renaming them) and that ServerOutput messages are just projections of the internal event stream (see ADR-24) into read models on the API layer.

  • Compose a versioned (/v1) API out of resource models, which compartmentalize the domain into topics on the API layer.

    • A resource has a model type and the latest value is the result of a pure projection folded over the StateChanged event stream, i.e. project :: model -> StateChanged -> model (see the sketch after this list).

    • Each resource is available at some HTTP path, also called "endpoint":

      • GET requests must respond with the latest state in a single response.

      • GET requests with Upgrade: websocket headers must start a websocket connection, push the latest state as the first message and any resource state updates after.

      • Other HTTP verbs may be accepted by a resource handler, i.e. to issue resource-specific commands. Any commands accepted must also be available via the corresponding websocket connection.

    • Accept request headers can be used to configure the Content-Type of the response

      • All resources must provide application/json responses

      • Some resources might support more content types (e.g. CBOR-encoded binary)

    • Query parameters may be used to further configure responses of some resources. For example, ?address=<bech32> could be used to filter UTxO by some address.

  • Keep the semantics of /, which accepts websocket upgrade connections and sends direct/raw output of ServerOutput events on /, while accepting all ClientCommand messages.

    • Define ServerOutput also in terms of the StateChanged event stream
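
To make the projection idea above concrete, here is a minimal sketch of a read model folded from the event stream; the StateChanged constructors and HeadStatus values are simplified illustrations, not the actual hydra-node types:

    import Data.List (foldl')

    -- Simplified stand-ins for the real event and read-model types.
    data StateChanged
      = HeadInitialized
      | HeadOpened
      | HeadClosed
      | HeadFannedOut
      -- ... other events elided

    data HeadStatus = Idle | Initializing | Open | Closed | Finalized
      deriving (Show)

    -- project :: model -> StateChanged -> model
    projectHeadStatus :: HeadStatus -> StateChanged -> HeadStatus
    projectHeadStatus _ HeadInitialized = Initializing
    projectHeadStatus _ HeadOpened = Open
    projectHeadStatus _ HeadClosed = Closed
    projectHeadStatus _ HeadFannedOut = Finalized

    -- The value served on GET /v1/head/status is the latest result of folding
    -- the projection over the whole StateChanged event stream.
    latestHeadStatus :: [StateChanged] -> HeadStatus
    latestHeadStatus = foldl' projectHeadStatus Idle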

Example resources

Example resource paths + HTTP verbs mapped to existing things to demonstrate the effects of the decision points above. The mappings may change and are to be documented by an API specification instead.

| Path | GET | POST | PATCH | DELETE |
|------|-----|------|-------|--------|
| /v1/head/status | HeadStatus(..) | - | - | - |
| /v1/head/snapshot/utxo | last confirmed snapshot utxo | - | - | - |
| /v1/head/snapshot/transactions | confirmed snapshot txs | NewTx + responses | - | - |
| /v1/head/ledger/utxo | localUTxO | - | - | - |
| /v1/head/ledger/transactions | localTxs | NewTx + responses | - | - |
| /v1/head/commit | - | Chain{draftCommitTx} | - | - |
| /v1/head | all /v1/head/* data | Init | Close | Fanout / Abort |
| /v1/protocol-parameters | current protocol parameters | | | |
| /v1/cardano-transaction | - | Chain{submitTx} | - | - |
| /v1/peers | a list of peers | - | - | - |
| /v1/node-version | node version as in Greetings | - | - | - |
| /v1/ | all /v1/* data | - | - | - |

Multiple heads are out of scope for now and hence the paths do not include a <headId> variable section.

Consequences

  • Clear separation between the types clients use to query and subscribe, and the dedicated types used to send data to clients

  • Changes on the querying side of the API are separated from the business logic.

  • Clients do not need to aggregate data that is already available on the server side without coupling the API to internal state representation.

  • Separation of Head operation and Head usage, e.g. some HTTP endpoints can be operated with authentication.

  • Clients have a fine-grained control over what to subscribe to and what to query.

  • Versioned API allows clients to detect incompatibility easily.

  • Need to rewrite how the hydra-tui is implemented.

· 4 min read
Sebastian Nagel

Status

Proposed

Context

  • ADR 18 merged both headState and chainState into one single state in the Hydra node, giving the chain layer a way to fetch and update the chainState when observing a chain event.

  • ADR 23 outlined the need for a local chain state in the chain layer again to correctly handle the observation of multiple relevant transactions and the resulting chainState updates.

  • The ChainStateType tx for our "actual" Cardano chain layer is currently:

    data ChainStateAt = ChainStateAt
      { chainState :: ChainState
      , recordedAt :: Maybe ChainPoint
      }

    data ChainState
      = Idle
      | Initial InitialState
      | Open OpenState
      | Closed ClosedState

    where InitialState, OpenState and ClosedState hold elaborate information about the currently tracked Hydra head.

  • We face difficulties providing sufficient user feedback when an initTx is observed but (for example) keys do not match our expectations.

    • The core problem is that observeInit is required to decide whether it wants to "adopt" the Head (by returning an InitialState) or not.
    • This makes it impossible to provide user feedback through the HeadLogic and API layers.
  • We want to build a Hydra head explorer, which should be able to keep track of and discover Hydra heads and their state changes even when the heads were initialized before starting the explorer.

Decision

  • We supersede ADR 18 with the current ADR.

Changes internal to Direct chain layer

  • Introduce a ResolvedTx type that has its inputs resolved: where a normal Tx only contains TxIn information for its inputs, a ResolvedTx also includes the TxOut for each input (see the sketch after this list).

  • Change ChainSyncHandler signature to onRollForward :: BlockHeader -> [ResolvedTx] -> m ()

  • Change observing function signature to observeSomeTx :: ChainContext -> ResolvedTx -> Maybe (OnChainTx Tx). Notably there is no ChainState involved.

  • Do not guard observation by HeadId in the chain layer and instead do it in the HeadLogic layer.

  • Define a SpendableUTxO type that is a UTxO with potentially needed datums included.

    • TBD: instead we could decide to use inline datums and rely on UTxO containing them
  • Change transaction creation functions initialize, commit, abort, collect, close, contest and fanout in Hydra.Direct.Chain.State to take SpendableUTxO and HeadId/HeadParameters as needed.

  • Extend the IsChainState type class to enforce that it can be updated by concurrent transactions: update :: ChainStateType tx -> [tx] -> ChainStateType tx.

    • While this is not strictly needed "outside" of the chain layer, it will keep us from falling into the same pit again.
  • Change ChainStateAt to only hold a spendableUTxO and the recordedAt.

  • Update the LocalChainState in onRollForward by using update and pushing a new ChainStateAt generically.
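
A rough sketch of the reshaped chain-layer types described above; TxIn, TxOut, UTxO and ChainPoint are simplified placeholders here, not the cardano-api types the actual implementation would use:

    import Data.Map (Map)

    -- Simplified placeholders for the ledger types.
    newtype TxIn = TxIn String deriving (Eq, Ord, Show)
    newtype TxOut = TxOut String deriving (Show)
    newtype Tx = Tx { txInputs :: [TxIn] }
    newtype ChainPoint = ChainPoint Integer deriving (Show)

    type UTxO = Map TxIn TxOut

    -- A transaction whose inputs are resolved to the outputs they spend.
    data ResolvedTx = ResolvedTx
      { resolvedTx :: Tx
      , resolvedInputs :: Map TxIn TxOut
      }

    -- UTxO with potentially needed datums included (datums elided in this sketch).
    newtype SpendableUTxO = SpendableUTxO UTxO

    -- ChainStateAt reduced to a spendable UTxO and the point it was recorded at,
    -- updated via the class method: update :: ChainStateType tx -> [tx] -> ChainStateType tx
    data ChainStateAt = ChainStateAt
      { spendableUTxO :: SpendableUTxO
      , recordedAt :: Maybe ChainPoint
      }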

TBD:

  • Impact on generators

Chain interface changes

  • Add HeadId and HeadParameters to PostChainTx.

  • Add HeadId to all OnChainTx constructors.

  • Extend OnInitTx with observed chain participants.

    • TBD: How are cardano verification keys generically represented in HeadLogic?
  • Extend OnContestTx with new deadline and a list of contesters.

  • Move off-chain checks for what makes a "proper head" to HeadLogic

TBD:

  • Merge HeadSeed and HeadId? How to abstract?

Consequences

  • All logic is kept in the logic layer and no protocol decisions (i.e. whether to adopt or ignore a head initialization) are taken in the chain layer.

    • The HeadLogic gets informed of any proper initTx and can log that it is ignored and for what reason.
  • The transaction observation and construction functions can be moved into a dedicated package that is cardano-specific but does not require special state knowledge of the "direct chain following", and can be re-used as a library.

  • All transaction observation functions used by observeSomeTx will need to be able to identify a Hydra Head transaction from only the ResolvedTx and the ChainContext

  • Any Chain Tx implementation wanting to re-use existing transaction observation functions must be able to resolve transaction inputs (against some ledger state) and produce ResolvedTx.

    • A chain-following implementation (as Hydra.Chain.Direct) can keep previous transactions around.
    • A chain indexer on "interesting" protocol addresses can be used to efficiently query most inputs.
  • We can get rid of the Hydra.Chain.Direct.State glue code altogether.

  • While this does not directly supersede ADR23, it paves the way to remove LocalChainState again as the ChainStateAt is now combinable from multiple transactions (see update above) and we can keep the state (again) only in the HeadState aggregate. Note that this would shift the rollback handling back into the logic layer.

· 5 min read
Arnaud Bailly
Pascal Grange

Status

Accepted

Context

The current Head cluster is very fragile as has been observed on several occasions: A single hiccup in the connectivity between nodes while a head is open and nodes are exchanging messages can very easily lead to the Head being stuck and require an emergency closing, possibly even manually.

We want Hydra to be Consistent in the presence of Network Partitions, under the fail-recovery model assumption, i.e. processes may fail by stopping and later recovering. Our system lies in the CP space of the landscape mapped by the CAP theorem.

We have identified 3 main sources of failures in the fail-recovery model that can lead to a head being stuck:

  1. The network layer can drop messages from the moment a node broadcasts them, leading to some messages not being received at the other end
  2. The sending node can crash in between the moment the state is changed (and persisted) and the moment a message is actually sent through the network (or even when it calls broadcast)
  3. The receiving node can crash in between the moment the message has been received in the network layer and the moment it is processed (goes through the queue)

We agree that we'll want to address all those issues in order to provide a good user experience, as not addressing 2. and 3. can lead to hard-to-troubleshoot issues with heads. We have not experienced those issues yet as they would probably only crop up under heavy loads, or in the wild. But we also agree we want to tackle 1. first because it's where most of the risk lies. By providing a Reliable Broadcast layer, we will significantly reduce the risks and can then later on address the other points.

Therefore, the scope of this ADR is to address only point 1. above: Ensure broadcast messages are eventually received by all peers, given the sender does not stop before.

Discussion

  • We are currently using the ouroboros-framework and typed-protocols network stack as a mere transport layer.

    • Being built on top of TCP, ouroboros multiplexer (Mux) provides the same reliability guarantees, plus the multiplexing capabilities of course
    • It also takes care of reconnecting to peers when a failure is detected, which relieves us from doing so, but any reconnection implies a reset of each peer's state machine, which means we need to make sure any change to the state of pending/received messages is handled by the application layer
    • Our FireForget protocol ignores connections/disconnections
    • Ouroboros/typed-protocols provides enough machinery to implement a reliable broadcast protocol, for example by reusing existing [KeepAlive](https://github.com/input-output-hk/ouroboros-network/tree/master/ouroboros-network-protocols/src/Ouroboros/Network/Protocol/KeepAlive) protocol and building a more robust point-to-point protocol than what we have now
    • There is a minor limitation, namely that the subscription mechanism does not handle connections individually, but as a set of equivalent point-to-point full duplex connections whose size (valency) needs to be maintained at a certain threshold, which means that unless baked into the protocol itself, the protocol state-machine and applications are not aware of the identity of the remote peer
  • We have built our Network infrastructure over the concept of relatively independent layers, each implementing a similar interface with different kinds of messages, to broadcast messages to all peers and be notified of incoming messages through a callback.

    • This pipes-like abstraction allows us to compose our network stack like:

      withAuthentication (contramap Authentication tracer) signingKey otherParties $
        withHeartbeat nodeId connectionMessages $
          withOuroborosNetwork (contramap Network tracer) localhost peers
    • This has the nice property that we can basically swap the lower layers should we need to, for example to use UDP, or add other layers, for example to address specific head instances in the presence of multiple heads

Decision

  • We implement our own message tracking and resending logic as a standalone Network layer
  • That layer consumes and produces Authenticated msg messages as it relies on identifying the source of messages
  • It uses a vector of monotonically increasing sequence numbers associated with each party (including itself) to track the last message seen from each party and to ensure FIFO delivery of messages (see the sketch after this list)
    • This vector is used to identify peers which are lagging behind and resend them the missing messages, or to drop messages which have already been received
    • The Heartbeat mechanism is relied upon to ensure dissemination of state even when the node is quiescent
  • We do not implement a pull-based message communication mechanism as initially envisioned
  • We do not persist messages either on the receiving or sending side at this time
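
A minimal sketch of the per-party sequence tracking described above, using simplified placeholder types rather than the actual hydra-node implementation:

    import Data.Map (Map)
    import qualified Data.Map as Map
    import Data.Word (Word64)

    -- Simplified placeholder for party identities.
    newtype Party = Party String deriving (Eq, Ord, Show)

    -- Last sequence number seen from each party (including ourselves).
    type SeenMessages = Map Party Word64

    -- What to do with an incoming message, given its sender and sequence number.
    data Delivery
      = Deliver             -- next expected message: hand it to the upper layer
      | AlreadySeen         -- duplicate: drop it
      | MissingFrom Word64  -- gap detected: the peer lags and needs resending from this index
      deriving (Show)

    classify :: SeenMessages -> Party -> Word64 -> Delivery
    classify seen sender n
      | n == expected = Deliver
      | n < expected = AlreadySeen
      | otherwise = MissingFrom expected
      where
        expected = Map.findWithDefault 0 sender seen + 1

On delivery the entry for the sender would be bumped to the received index, and the Heartbeat mechanism keeps these indices flowing between peers even when nodes are quiescent.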

Consequences

  • We keep our existing Network interface, hence all messages will be resent to all peers
    • This could be later optimized either by providing a smarter interface with a send :: Peer -> msg -> m () unicast function, or by adding a layer with filtering capabilities, or both
  • We want to specify this protocol clearly in order to ease implementation in other languages, detailing the structure of messages and the semantics of retries and timeouts.
  • We may consider relying on the vector clock in the future to ensure perfect ordering of messages on each peer and make it impossible for legitimate transactions to be temporarily seen as invalid. This can happen in the current version and is handled through wait and TTL

· 3 min read
Elaine Cardenas

Status

Accepted

Context

Currently, the Hydra node requires a Layer 1 Cardano node running in order to operate; the L1 node is needed to submit and watch for L1 transactions. Generally speaking, the transactions watched are for learning the state of the Hydra node, as reflected by the L1 chain. The transactions submitted are to transition between states (e.g. after submitting a Commit tx to the L1, a node watches to see when all other nodes have also Committed.)

There are applications for the Hydra node where interaction with an L1 chain is unnecessary. Offline mode will be a key component of the Gummiworm protocol, a Layer 2 protocol being built by Sundae Labs, which enables actors other than Hydra head participants to validate transactions that occur in the head.

The Hydra node offline mode would remove the dependency on the L1 Cardano node, for applications like Gummiworm where it is unneeded. It would also remove the dependency on the L1 Cardano node for peer-to-peer Hydra node communication. This would be useful for other Layer 2s that build on top of Hydra instead of duplicating its efforts, and for anyone who wants to easily validate a set of Cardano transactions.

Decision

Hydra node will be executable in offline mode, as an alternative to the default online mode. When online, the Hydra node depends on querying a Cardano node for Era History information and Genesis parameters. When offline this is not necessary, because the Hydra node will not connect to any Layer 1.

The initial state of the head will be specified in a flag, which makes any Commit redundant. The flag will specify a file for the starting Layer 2 UTXO. The Hydra node can be configured to write the current UTXO into a file, including the starting UTXO file.

A node running in offline mode will not be able to switch between offline and online modes once started, as that is an unlikely use case and would likely add more complexity.

In offline mode, the Commit endpoint will return 400 instead of building a transaction.

Support for peer Hydra nodes in offline mode is considered out of scope, as it doesn't seem immediately useful. A node running in offline mode will not be configurable with any peer nodes, nor will it make a network connection to any peer nodes.

Consequences

The Hydra node would be usable offline, for transaction validation and other custom L2 applications. The lifecycle & state machine associated with a Hydra head would remain unchanged in both online and offline mode.

The Hydra node can be deployed and run without an accompanying Cardano node, simplifying deployment and testing.

· 4 min read
Elaine Cardenas
Pi Lanningham
Sebastian Nagel

Status

Accepted

Context

  • The Hydra node represents a significant engineering asset, providing layer 1 monitoring, peer-to-peer consensus, durable persistence, and an isomorphic Cardano ledger. Because of this, it is being eyed as a key building block not just in Hydra-based applications, but in other protocols as well.

  • Currently the hydra-node uses a very basic persistence mechanism for its internal HeadState, that is, saving StateChanged events to a file on disk and reading them back to load and re-aggregate the HeadState upon startup.

    • Some production setups would benefit from storing these events to a service like Amazon Kinesis data stream instead of local files.
  • The hydra-node websocket-based API is the only available event stream right now and might not fit all purposes.

    • See also ADR 3 and 25
    • Internally, this is realized as a single Server handle which can sendOutput :: ServerOutput tx -> m ()
    • These ServerOutputs closely relate to StateChanged events, and ClientEffects are often yielded by the logic layer together with the StateChanged. For example:
    onInitialChainAbortTx newChainState committed headId =
      StateChanged HeadAborted{chainState = newChainState}
        <> Effects [ClientEffect $ ServerOutput.HeadIsAborted{headId, utxo = fold committed}]
  • Users of hydra-node are interested in adding alternative implementations for storing, loading and consuming events of the Hydra protocol.

Decision

  • We create two new interfaces in the hydra-node architecture:

    • data EventSource e m = EventSource { getEvents :: m [e] }
    • data EventSink e m = EventSink { putEvent :: e -> m () }
  • We realize our current PersistenceIncremental used for persisting StateChanged events is both an EventSource and an EventSink

  • We drop the persistence from the main handle HydraNode tx m, add one EventSource and allow many EventSinks

data HydraNode tx m = HydraNode
  { -- ...
  , eventSource :: EventSource (StateEvent tx) m
  , eventSinks :: [EventSink (StateEvent tx) m]
  }
  • The hydra-node will load events and hydrate its HeadState using getEvents of the single eventSource.

  • The stepHydraNode main loop calls putEvent on all eventSinks in sequence. Any failure will make the hydra-node process terminate and require a restart.

  • When loading events from eventSource on hydra-node startup, it will also re-submit events via putEvent to all eventSinks.

  • The default hydra-node main loop uses the file-based EventSource and a single file-based EventSink (using the same file).

  • We realize that the EventSource and EventSink handles, as well as their aggregation in HydraNode are used as an API by forks of the hydra-node and try to minimize changes to it.
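
As an illustration of this extension point, here is a hedged sketch of a minimal file-based EventSink using the handle shape above; fileEventSink is a hypothetical name, not the actual hydra-node implementation:

    import Data.Aeson (ToJSON, encode)
    import qualified Data.ByteString.Lazy.Char8 as LBS

    newtype EventSink e m = EventSink { putEvent :: e -> m () }

    -- Append each event as one JSON line to the given file.
    fileEventSink :: ToJSON e => FilePath -> EventSink e IO
    fileEventSink fp =
      EventSink { putEvent = \e -> LBS.appendFile fp (encode e <> LBS.pack "\n") }

A Kinesis or S3 sink as mentioned below would implement the same putEvent interface against the respective service's API.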

Consequences

  • The default operation of the hydra-node remains unchanged.

  • There are other things called Event and EventQueue(putEvent) right now in the hydra-node. This is getting confusing and when we implement this, we should also rename several things first (tidying).

  • Interface first: Implementations of EventSink should specify their format in a non-ambiguous and versioned way, especially when a corresponding EventSource exists.

  • The API Server can be modelled and refactored as an EventSink.

  • Projects forking the hydra node have dedicated extension points for producing and consuming events.

  • Sundae Labs can build a "Save transaction batches to S3" proof of concept EventSink.

  • Sundae Labs can build a "Scrolls source" EventSink.

  • Sundae Labs can build a "Amazon Kinesis" EventSource and EventSink.

Out of scope / future work

  • Available implementations for EventSource and EventSink could be

    • configured upon hydra-node startup using for example URIs: --event-source file://state or --event-sink s3://some-bucket
    • dynamically loaded as plugins without having to fork hydra-node.
  • The Network and Chain parts qualify as EventSinks as well, or shall those still be triggered by Effects?

· 3 min read
Arnaud Bailly

Status

Accepted

Context

  • The Hydra.Ledger.Cardano module provides ToJSON/FromJSON instances for Tx and AlonzoTx
    • We have specified this format as part of Hydra API
  • These instances appear in a few places as part of Hydra API:
    • In the ServerOutput sent by the node to clients
    • In the HydraNodeLog as part of Hydra's logging output
    • In the StateChanged events which are persisted and allow hydra-node to restart gracefully after stopping
  • In other places the hydra-node produces, expects, or accepts a CBOR-encoded transaction:
  • Note that in the latter 2 cases, the hydra-node accepts a hex-CBOR-encoded JSON string to represent a transaction and this particular case is handled directly in the FromJSON instance for transactions where 3 different representations are even accepted:
    • JSON object detailing the transaction
    • A JSON string representing CBOR-encoding of a transaction
    • Or a TextEnvelope which wraps the CBOR transaction in a simple JSON object
  • Using JSON-based representation of Cardano transactions is problematic because:
    • The representation we are providing is not canonical nor widely used, and therefore requires maintenance when the underlying cardano-ledger API changes
    • More importantly the JSON representation contains a txId field which is computed from the CBOR encoding of the transaction. When this encoding changes, the transaction id changes even though no other part of the transaction has changed. This implies that we could send and receive transactions with incorrect or inconsistent identifiers.
  • This is true for any content-addressable piece of data, i.e. any piece of data whose unique identifier is derived from the data itself, but not of, say, UTxO which is just data.

Decision

  • Drop support of "structured" JSON encoding of transactions in log messages, external APIs, and local storage of a node state
  • Require a JSON encoding for transactions that consists of:
    • A cborHex string field containing the base16 CBOR-encoded transaction
    • An optional txId string field containing the Cardano transaction id, i.e. the base16 encoded Blake2b256 hash of the transaction body bytes
    • When present, the txId MUST be consistent with the cborHex. This will be guaranteed for data produced by Hydra, but input data (e.g. through a NewTx message) that does not respect this constraint will be rejected.
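
A hedged sketch of what the accepting side could look like with aeson; TransactionWire is a hypothetical wire type, and the real instance would additionally decode the CBOR and verify the hash behind txId:

    {-# LANGUAGE OverloadedStrings #-}

    import Data.Aeson (FromJSON (..), withObject, (.:), (.:?))
    import Data.Text (Text)

    -- Hypothetical wire representation mirroring the decided encoding.
    data TransactionWire = TransactionWire
      { cborHex :: Text     -- base16 CBOR-encoded transaction (mandatory)
      , txId :: Maybe Text  -- optional, must match the hash of the decoded tx body
      }
      deriving (Show)

    instance FromJSON TransactionWire where
      parseJSON = withObject "Transaction" $ \o ->
        TransactionWire
          <$> o .: "cborHex"
          <*> o .:? "txId"

Rejecting an inconsistent txId would then happen after CBOR-decoding cborHex and comparing the computed transaction id, as stated in the decision above.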

Consequences

  • This is a breaking change and client applications must decode the full transaction CBOR before accessing any part of it
    • Hydra clients like hydraw, hydra-auction, hydra-pay, hydra-poll and hydra-chess need to be updated
  • By providing a txId field alongside the CBOR encoding, we still allow clients to observe the lifecycle of a transaction inside a Head as it gets validated and confirmed, without requiring them to decode the CBOR body and compute the txId themselves
    • This is particularly important for monitoring which usually does not care about the details of transactions
  • We should point users to existing tools for decoding transactions' content in a human-readable format as this can be useful for troubleshooting:
    • cardano-cli transaction view --tx-file <path to tx envelope file> is one example
  • We need to version the data that's persisted and exchanged, e.g. the Head state and network messages, in order to ensure nodes can either gracefully migrate stored data or explicitly detect version inconsistencies
  • We should use the cardanonical schemas should the need arise to represent transactions in JSON again