On Trust Minimization and Horizontal Scaling
January 9th, 2024
This article is also published on Mirror.
Ethereum is a permissionless world computer that possesses (arguably) the highest amount of economic security at the time of writing, acting as the settlement ledger for a vast number of assets, applications, and services. Ethereum does have its limitations – blockspace is a scarce and expensive resource on Ethereum layer one (L1). Layer two (L2) scaling has been seen as the solution to this problem, with numerous projects coming to market in recent years, mostly in the form of rollups. However, rollups in the strict sense of the term (meaning rollup data is on Ethereum L1) do not allow Ethereum to scale indefinitely; they only allow up to a few thousand transactions per second.
Trust-minimized – (a feature of) an L2 system is trust-minimized if it functions without requiring trust external to the base L1.
Horizontal scaling – a system is horizontally scalable if instances can be added without imposing global bottlenecks.
In this article, we argue that trust-minimized and horizontally scalable systems are the most promising way of scaling blockchain applications, yet they are currently under-explored. We present the argument by exploring three questions:
- Why should applications be trust-minimized?
- Why build systems that are horizontally scalable?
- How can we maximize both trust minimization and horizontal scalability?
(Disclaimer: although we will focus on Ethereum as the base L1 in this article, most of what we discuss here applies to decentralized settlement layers beyond Ethereum.)
Why should applications be trust-minimized?
Applications can be connected to Ethereum in a trusted manner – they can write to and read from the Ethereum blockchain but trust is placed on the operators to execute business logic correctly. Centralized exchanges like Binance and Coinbase are great examples of trusted applications. Being connected to Ethereum means that applications can tap into a global settlement network with a diverse set of assets.
There are significant risks associated with trusted off-chain services. The collapse of major exchanges and services in 2022, such as FTX and Celsius, is a great cautionary tale of what happens when trusted services misbehave and fail.
On the other hand, trust-minimized applications can write to and read from Ethereum verifiably. Examples include smart contract applications such as Uniswap, rollups such as Arbitrum or zkSync, and coprocessors such as Lagrange and Axiom. Broadly speaking, trust is removed as applications become secured by the Ethereum network, as more functionalities (see below) get outsourced to L1. As a result, trust-minimized financial services can be offered without counterparty or custodian risks.
There are three key properties that applications and services can have, which can be outsourced to L1:
- Liveness (and ordering): user-submitted transactions should be included (executed and settled) in a timely manner.
- Validity: transactions are processed according to prespecified rules.
- Data (and state) availability: historical data, as well as current application state, is made accessible to the user.
For each of the above properties, we can ask which trust assumption is required: does Eth L1 provide the property, or is external trust required? The table below categorizes this for different architecture paradigms.
[Table: trust assumption (L1 / External) for liveness, validity, and data availability (DA) under each architecture paradigm]
Why build systems that are horizontally scalable?
Horizontal scaling refers to scaling via the addition of independent or parallel instances of a system, e.g. an application or a rollup. This requires that no global bottleneck be present. Horizontal scaling enables and facilitates exponential growth.
Vertical scaling refers to scaling via the increase of throughput of a monolithic system, such as Eth L1 or a data availability layer. When horizontal scaling runs into bottlenecks on such a shared resource, vertical scaling is often required.
Claim 1: (Transaction-data) rollups cannot horizontally scale because they can be bottlenecked by data availability (DA). Vertically scaling DA solutions requires making compromises on decentralization.
Data availability (DA) remains the bottleneck for rollups. Currently, each L1 block has a maximum size target of ~1 MB (85 KB/s). With EIP-4844, there will be an additional ~2 MB (171 KB/s) made available (in the long term). With Danksharding, Eth L1 may eventually support up to 1.3 MB/s of DA bandwidth. Eth L1 DA is a shared resource that many applications and services compete for. Therefore, although using L1 for DA provides the best security, it bottlenecks the potential scalability of such systems. Systems that utilize L1 for DA will (typically) not be able to horizontally scale and will have diseconomies of scale. Alternative DA layers, such as Celestia or EigenDA, also have bandwidth limits (although larger, at 6.67 MB/s and 15 MB/s, respectively). But this extra bandwidth comes at the expense of shifting the trust assumption from Ethereum to another (often less decentralized) network, compromising on (economic) security.
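The bandwidth figures above follow from simple arithmetic. The sketch below reproduces them, assuming a 12-second slot time and binary units (1 MB = 1024 KB), and converts a DA bandwidth into a rough throughput ceiling. The function names and the 100-bytes-per-transaction figure are illustrative assumptions, not measurements.

```python
SLOT_SECONDS = 12   # assumed Ethereum slot time
KB = 1024           # binary units, which reproduce the figures in the text
MB = 1024 * KB

def da_bandwidth(bytes_per_block: float, slot_seconds: float = SLOT_SECONDS) -> float:
    """DA bandwidth in bytes per second for a given per-block data budget."""
    return bytes_per_block / slot_seconds

def tps_ceiling(bandwidth_bytes_per_s: float, bytes_per_tx: float) -> float:
    """Upper bound on transactions per second if every DA byte carried rollup data."""
    return bandwidth_bytes_per_s / bytes_per_tx

block_kb_s = da_bandwidth(1 * MB) / KB   # ~1 MB block size target
blob_kb_s = da_bandwidth(2 * MB) / KB    # additional ~2 MB with EIP-4844 (long term)
print(f"{block_kb_s:.0f} KB/s, {blob_kb_s:.0f} KB/s")

# Hypothetical transaction-data rollup posting ~100 bytes per transaction.
ceiling = tps_ceiling(da_bandwidth(2 * MB), bytes_per_tx=100)
print(f"about {ceiling:.0f} transactions per second across all rollups combined")
```

Because every system that uses L1 for DA draws on the same bandwidth, adding more rollups divides this ceiling among them rather than raising it.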
Claim 2: The only way to horizontally scale trust-minimized services is to obtain (close to) zero marginal L1 data per transaction. The two known approaches are state-diff rollups (SDR) and validiums.
State-diff rollups (SDRs) are rollups that post state differences across an aggregated batch of transactions to Ethereum L1. For the EVM, as transaction batches grow larger, the per-transaction data posted to L1 diminishes to a constant that is much smaller than that of transaction-data rollups.
For example, during the stress test caused by a flood of inscriptions, zkSync saw its calldata drop to as low as 10 bytes per transaction. In contrast, transaction-data rollups like Arbitrum, Optimism, and Polygon zkEVM typically see around 100 bytes per transaction under normal traffic.
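A toy model can illustrate why per-transaction data shrinks as batches grow. It assumes each transaction touches one slot from a fixed pool of shared "hot" storage slots plus a small amount of transaction-specific state. All parameter values and function names are hypothetical and chosen only to show the shape of the curve; they are not taken from any real rollup.

```python
def tx_data_bytes_per_tx(n_txs: int, bytes_per_tx: int = 100) -> float:
    """Transaction-data rollup: every transaction is posted, so the cost is flat."""
    return float(bytes_per_tx)

def state_diff_bytes_per_tx(
    n_txs: int,
    shared_slots: int = 1000,        # hypothetical pool of hot storage slots
    bytes_per_slot_diff: int = 40,   # hypothetical size of one posted slot diff
    unique_bytes_per_tx: int = 8,    # hypothetical per-transaction state (e.g. a balance)
) -> float:
    """State-diff rollup: a slot written many times in a batch is posted once."""
    touched_shared = min(n_txs, shared_slots)
    total = touched_shared * bytes_per_slot_diff + n_txs * unique_bytes_per_tx
    return total / n_txs

for n in (100, 10_000, 1_000_000):
    print(n, tx_data_bytes_per_tx(n), state_diff_bytes_per_tx(n))
```

In this model the shared-slot cost is amortized over the batch, so the state-diff figure falls toward the per-transaction constant while the transaction-data figure stays flat.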
A validium is a system that posts validity proofs of state transitions to Ethereum, without associated transaction data or state. Validiums are highly horizontally scalable, even under low traffic conditions. This is especially true as the settlement of different validiums can be aggregated.
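The effect of aggregated settlement can be sketched with a small calculation. It assumes each settlement posts a fixed-size validity proof plus one state root per validium, and that aggregation lets several validiums share a single proof. The byte sizes and the function name are hypothetical placeholders.

```python
PROOF_BYTES = 1000       # hypothetical size of one validity proof on L1
STATE_ROOT_BYTES = 32    # one 32-byte state root per validium

def l1_bytes_per_tx(txs_per_validium: list[int], aggregated: bool) -> float:
    """L1 bytes per transaction for one settlement round across several validiums."""
    k = len(txs_per_validium)
    if aggregated:
        total = PROOF_BYTES + k * STATE_ROOT_BYTES    # one shared proof
    else:
        total = k * (PROOF_BYTES + STATE_ROOT_BYTES)  # one proof per validium
    return total / sum(txs_per_validium)

low_traffic = [10] * 10  # ten validiums with ten transactions each
print(l1_bytes_per_tx(low_traffic, aggregated=False))
print(l1_bytes_per_tx(low_traffic, aggregated=True))
```

Under these assumptions the L1 footprint does not depend on transaction size at all, and sharing a proof keeps the per-transaction figure small even when each validium processes only a handful of transactions.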
Besides horizontal scalability, a validium can also provide onchain privacy (from public observers). A validium with private DA has centralized and gated data and state availability, meaning that users have to authenticate themselves before accessing data and that the operator can enforce good privacy measures. This enables a level of user experience similar to traditional web or financial services – user activities are hidden from public scrutiny but there is a trusted custodian of user data, in this case the validium operator.
What about centralized vs decentralized sequencers? To keep systems horizontally scalable, it is crucial to instantiate independent sequencers, either centralized or decentralized. Notably, although systems using shared sequencers enjoy atomic composability, they cannot horizontally scale, as the sequencer can become a bottleneck as more systems are added.
What about interoperability? Horizontally scalable systems can interoperate without additional trust if they all settle to the same L1, as messages can be sent from one system to another via the shared settlement layer. There is a tradeoff between operating cost and messaging delay (which can potentially be solved at the application layer).
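The cost and delay tradeoff can be made concrete with a deliberately simple model. It assumes a message from one system becomes readable by another only once the sender settles to L1, that each settlement has a fixed cost, and it ignores proof generation time and L1 finality. The function name and the cost unit are hypothetical.

```python
def settlement_tradeoff(interval_s: float, cost_per_settlement: float) -> tuple[float, float]:
    """Return (operating cost per hour, worst-case wait in seconds before a message reaches L1)."""
    settlements_per_hour = 3600 / interval_s
    return settlements_per_hour * cost_per_settlement, interval_s

for interval in (60, 600, 3600):
    cost_per_hour, max_delay = settlement_tradeoff(interval, cost_per_settlement=1.0)
    print(f"settle every {interval}s: cost/hour={cost_per_hour}, worst-case delay={max_delay}s")
```

Settling more often shortens the wait for cross-system messages but multiplies the operator's L1 cost, which is the tension described above.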
Trust-minimization for horizontally scalable systems
Can we further minimize trust requirements for liveness, ordering, and data availability in horizontally scalable systems?
It is of note that, at the cost of horizontal scalability, we know how to salvage trustless liveness and data availability. For example, L2 transactions can be initiated from the L1 for guaranteed inclusion. Volition can offer opt-in L1 state availability for users.
Another solution is to simply decentralize (but not rely on the L1). Instead of a single sequencer, systems could become more decentralized by utilizing decentralized sequencers (such as Espresso Systems or Astria), thereby minimizing the trust required for liveness, ordering, and data availability. Doing so imposes limitations compared to single-operator solutions: (1) performance may be bounded by the performance of the distributed system, and (2) for validiums with private DA, the default privacy guarantee is lost if the decentralized sequencer network is permissionless.
How much trust can we additionally minimize for single-operator validiums or SDRs? There are a couple of open directions here.
Open direction 1: Trust-minimized data availability in validiums. Plasma solves the state availability problem to a certain extent: it either solves the problem for withdrawals only, under certain state models (including the UTXO state model), or requires users to be online periodically (Plasma Free).
Open direction 2: Accountable pre-confirmations in SDRs and validiums. The goal here is to provide users with fast pre-confirmation of transaction inclusion from a sequencer, and the confirmation should allow the user to challenge and slash the economic stake of the sequencer if the inclusion promise is not fulfilled. The challenge here is that proving non-inclusion (necessary for slashing) likely requires additional data for the user, which a sequencer can simply withhold. Therefore, it is reasonable to assume that we at least require the SDR or validium to employ a (potentially permissioned) data availability committee for its full calldata or transaction history, which enables the same committee to provide proof of non-inclusion (of pre-confirmed transactions) upon a user request.
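The slashing condition described above can be sketched as follows. This is a minimal illustration, not a protocol design: HMAC with a shared key stands in for a real public-key signature, the committee's data is modeled as a plain dictionary from batch number to transaction hashes, and every name here (`PreConfirmation`, `sign_preconfirmation`, `is_slashable`) is hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass(frozen=True)
class PreConfirmation:
    tx_hash: bytes       # hash of the user's transaction
    batch_number: int    # batch in which the sequencer promises inclusion
    signature: bytes     # sequencer's commitment to the promise

def _message(tx_hash: bytes, batch_number: int) -> bytes:
    return tx_hash + batch_number.to_bytes(8, "big")

def sign_preconfirmation(key: bytes, tx_hash: bytes, batch_number: int) -> PreConfirmation:
    # Assumption: HMAC is a stand-in; a real system would use a public-key signature.
    sig = hmac.new(key, _message(tx_hash, batch_number), hashlib.sha256).digest()
    return PreConfirmation(tx_hash, batch_number, sig)

def is_slashable(key: bytes, promise: PreConfirmation,
                 committee_batches: dict[int, list[bytes]]) -> bool:
    """True only if the promise is authentic and the committee's data shows non-inclusion."""
    expected = hmac.new(key, _message(promise.tx_hash, promise.batch_number),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, promise.signature):
        return False  # not a genuine sequencer promise
    batch = committee_batches.get(promise.batch_number)
    if batch is None:
        return False  # no batch data available, so non-inclusion cannot be proven
    return promise.tx_hash not in batch
```

The `batch is None` branch is where the withholding problem shows up: without the committee's copy of the batch data, the user holds a valid promise but has nothing to prove it was broken.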
Open direction 3: Fast recovery from liveness failures. Single-operator systems can suffer from liveness failures (e.g. Arbitrum went offline during the inscription event). Can we design systems that provide minimal service disruption in this scenario? In some sense, L2s that allow self-sequencing and state proposals do provide guarantees against prolonged liveness failures. Designing single-operator systems that are more resilient against shorter liveness failures is currently under-explored. One potential solution here is to make liveness failures accountable, by slashing the operator for liveness failures. Another potential solution is to simply shorten the delay period (which is currently set to be around a week) before a takeover can happen.
Scaling a global settlement ledger while maintaining trust minimization is a hard problem. To date, there has not been a clear distinction between vertical scaling and horizontal scaling in the rollup and data availability world. To truly scale trust-minimized systems to everyone on earth, we need to build trust-minimized and horizontally scalable systems.
Many thanks to Vitalik Buterin and Terry Chung for feedback and discussion, as well as Diana Biggs for her editorial comments.