Skip to main content

3 posts tagged with "Proposed"

View All Tags

· 4 min read

Status

Proposed

Context

  • In the desire to make Hydra transactions smaller and cheaper (at the time of writing any abort tx was too big), we want to use the reference script and reference input features of the upcoming Babbage ledger era. See the babbage ledger spec, CIP-31 and CIP-33 for details.

  • With these features we do not need to (re-)include scripts in each transaction.

  • The CIPs do not specify how reference scripts are to be managed and we can see at least two options:

    1. Add them as outputs to the init transaction or prior that as part of each Hydra Head instance
    2. Post them out-of-band, separate to individual Head instances
  • Ownership of the outputs holding the scripts is to be considered. If these "reference outputs" are spent, they cannot be referred to anymore. This would mean all heads referring to them can be denied of service (DoS).

  • Each head will need to refer to the correct version of the hydra scripts. That is, consistent with the script hashes known to the hydra-node.

    • This is also related to the problem of managing script versions & updates.
    • Right now, the hydra-node is compiled against hydra-plutus to access compiled script content and hashes.
  • The general trade-off is: instead of paying ADA fees for scripts adding to the transaction size in each transaction, ADA deposits will need to be put down to have scripts be part of the UTxO set in the ledger once.

Decision

  • Publish outputs holding Hydra scripts out-of-band (option 2), because

    • All scripts would not fit into the init transaction directly, we would need to post multiple.
    • Costs (deposits) would need to be payed for each head instance.
  • The scripts are stored at outputs addressed to some unspendable v_publish validator.

    • This is to avoid DoS risk and unnecessariy centralization
    • We have considered "garbage collection" by allowing spending these outputs into re-publishing new versions of the script.
      • This would make things even more complicated and we decided to not bother about "littering the chain" right now.
  • We will publish scripts on release of the hydra-node, or more specifically of the hydra-plutus package.

Consequences

  • We need a process and/or tool to publish hydra-plutus scripts and need to pay the deposits.

    • Any other party could do the same, this does not lead to centralization.
  • The hydra-node would be need to know the TxIns of the "right" published scripts.

    • In the simplest case we would just make this configurable and provide configurations for the various networks after publishing scripts.
  • If we combine the v_publish validator with a "tag", this allows nodes to "discover" scripts of a known version

    • For example, we could define HydraHeadV1, HydraInitialV1 and HydraCommitV1 as such tags
    • We could parameterize the validator by the tag, yielding unique addresses per tag.
    • Alternatively, the "tag" could be stored in a canonical form as datum on the script outputs.
    • In any case, this allows for some checking consistency or easier configuration (not needing to enumerate which TxIn is which script)
  • By also knowing the script hashes the hydra-node can verify the integrity of "found" reference scripts

    • This would be possible right now, as they are compiled into the node
    • Might be undesirable later for easier system configuration
  • By making v_publish unspendable, we "litter" the chain. However, any garbage collection scheme would mean potential to DoS again.

  • Extended diagram for the abort on-chain life-cycles of a Hydra Head to include reference scripts.

· 5 min read
Arnaud Bailly
Pascal Grange

Status

Draft

Context

The current Head cluster is very fragile as has been observed on several occasions: A single hiccup in the connectivity between nodes while a head is open and nodes are exchanging messages can very easily lead to the Head being stuck and require an emergency closing, possibly even manually.

We want Hydra to be Consistent in the presence of Network Partitions, under the fail-recovery model assumption, eg. processes may fail by stopping and later recovering. Our system lies in the CP space of the landscape mapped by the CAP theorem.

We have identified 3 main sources of failures in the fail-recovery model that can lead to a head being stuck:

  1. The network layer can drop messages from the moment a node broadcasts it, leading to some messages not being received at the other end
  2. The sending node can crash in between the moment the state is changed (and persisted) and the moment a message is actually sent through the network (or even when it calls broadcast)
  3. The receiving node can crash in between the moment the message has been received in the network layer, and it's processed (goes through the queue)

We agree that we'll want to address all those issues in order to provide a good user experience, as not addressing 2. and 3. can lead to hard to troubleshoot issues with heads. We have not experienced those issues yet as they would probably only crop up under heavy loads, or in the wild. But we also agree we want to tackle 1. first because it's where most of the risk lies. By providing a Reliable Broadcast layer, we will significantly reduce the risks and can then later on address the other points.

Therefore, the scope of this ADR is to address only point 1. above: Ensure broadcast messages are eventually received by all peers, given the sender does not stop before.

Discussion

  • We are currently using the ouroboros-framework and typed-protocols network stack as a mere transport layer.

    • Being built on top of TCP, ouroboros multiplexer (Mux) provides the same reliability guarantees, plus the multiplexing capabilities of course
    • It also takes care of reconnecting to peers when a failure is detected which relieves us from doing so, but any reconnection implies a reset of each peer's state machine which means we need to make sure any change to the state of pending/received messages is handled by the applicative layer
    • Our FireForget protocol ignores connections/disconnections
    • Ouroboros/typed-protocols provides enough machinery to implement a reliable broadcast protocol, for example by reusing existing [KeepAlive](https://github.com/input-output-hk/ouroboros-network/tree/master/ouroboros-network-protocols/src/Ouroboros/Network/Protocol/KeepAlive) protocol and building a more robust point-to-point protocol than what we have now
    • There is a minor limitation, namely that the subscription mechanism does not handle connections invidually, but as a set of equivalent point-to-point full duplex connections whose size (valency) needs to be maintained at a certain threshold, which means that unless backed in the protocol itself, protocol state-machine and applications are not aware of the identity of the remote peer
  • We have built our Network infrastructure over the concept of relatively independent layers, each implementing a similar interface with different kind of messages, to broadcast messages to all peers and be notified of incoming messages through a callback.

    • This pipes-like abstraction allows us to compose our network stack like:

       withAuthentication (contramap Authentication tracer) signingKey otherParties $
      withHeartbeat nodeId connectionMessages $
      withOuroborosNetwork (contramap Network tracer) localhost peers
    • This has the nice property that we can basically swap the lower layers should we need to, for example to use UDP, or add other layers for example to address specific head instances in presence of multiple heads

Decision

  • We implement our own message tracking and resending logic as a standalone Network layer
  • That layer consumes and produces Authenticated msg messages as it relies on identifying the source of messages
  • It uses a vector of monotonically increasing sequence numbers associated with each party (including itself) to track what are the last messages from each party and to ensure FIFO delivery of messages
    • This vector is used to identify peers which are lagging behind, resend the missing messages, or to drop messages which have already been received
    • The Heartbeat mechanism is relied upon to ensure dissemination of state even when the node is quiescent
  • We do not implement a pull-based message communication mechanism as initially envisioned
  • We do not persist messages either on the receiving or sending side at this time

Consequences

  • We keep our existing Network interface hence all messages will be resent to all peers
    • This could be later optimized either by providing a smarter interface with a send :: Peer -> msg -> m () unicast function, or by adding a layer with filtering capabilities, or both
  • We want to specify this protocol clearly in order to ease implementation in other languages, detailing the structure of messages and the semantics of retries and timeouts.
  • We may consider relying on the vector clock in the future to ensure perfect ordering of messages on each peer and make impossible for legit transactions to be temporarily seen as invalid. This can happen in the current version and is handled through wait and TTL

· 3 min read
Arnaud Bailly

Status

Proposed

Context

  • The Hydra.Ledger.Cardano module provides ToJSON/FromJSON instances for Tx and AlonzoTx
    • We have specified this format as part of Hydra API
  • These instances appear in a few places as part of Hydra API:
    • In the ServerOutput sent by the node to clients
    • In the HydraNodeLog as part of Hydra's logging output
    • In the StateChanged events which are persisted and allow hydra-node to restart gracefully after stopping
  • In other places the hydra-node produces, expects, or accepts a CBOR-encoded transaction:
  • Note that in the latter 2 cases, the hydra-node accepts a hex-CBOR-encoded JSON string to represent a transaction and this particular case is handled directly in the FromJSON instance for transactions where 3 different representations are even accepted:
    • JSON object detailing the transaction
    • A JSON string representing CBOR-encoding of a transaction
    • Or a TextEnvelope which wraps the CBOR transaction in a simple JSON object
  • Using JSON-based representation of Cardano transactions is problematic because:
    • The representation we are providing is not canonical nor widely used, and therefore require maintenance when the underlying cardano-ledger API changes
    • More importantly the JSON representation contains a txId field which is computed from the CBOR encoding of the transaction. When this encoding changes, the transaction id changes even though no other part of the transaction has changed. This implies that we could send and receive transactions with incorrect or inconsistent identifiers.
  • This is true for any content-addressable piece of data, eg. any piece of data whose unique identifier is derived from the data itself, but not of say UTxO which is just data.

Decision

  • Drop support of "structured" JSON encoding of transactions in log messages, external APIs, and local storage of a node state
  • Require JSON encoding for transactions that consists in:
    • A cborHex string field containing the base16 CBOR-encoded transaction
    • An optional txId string field containing the Cardano transaction id, i.e. the base16 encoded Blake2b256 hash of the transaction body bytes
    • When present, the txId MUST be consistent with the cborHex. This will be guaranteed for data produced by Hydra, but input data (eg. through a NewTx message) that does not respect this constraint will be rejected

Consequences

  • This is a breaking change and client applications must decode the full transaction CBOR before accessing any part of it
    • Hydra clients like hydraw, hydra-auction, hydra-pay, hydra-poll and hydra-chess` need to be updated
  • By providing a txId field alongside the CBOR encoding, we still allow clients to observe the lifecycle of a transaction inside a Head as it gets validated and confirmed without requiring from them to be able to decode the CBOR body and compute the txId themselves
    • This is particularly important for monitoring which usually does not care about the details of transactions
  • We should point users to existing tools for decoding transactions' content in a human-readable format as this can be useful for troubleshooting:
    • cardano-cli transaction view --tx-file <path to tx envelope file> is one example
  • We need to version the data that's persisted and exchanged, e.g the Head state and network messages, in order to ensure nodes can either gracefully migrate stored data or detect explicitly versions inconsistency
  • We should use the cardanonical schemas should the need arise to represent transaction in JSON again