Data availability refers to block producers publishing all of the transaction data in a block to the network so that validators can download it. Its role in L2 development has become controversial: Ethereum Foundation researcher Dankrad Feist tweeted that a chain that does not use Ethereum for data availability is not an L2. By that definition, many chains, such as Arbitrum Nova, Polygon, and Mantle, would be excluded from the L2 category.
So, what exactly is data availability? What data availability issues does L2 face? And why is there so much controversy surrounding the data availability layer in L2? This article will focus on these questions and attempt to unveil the mystery of data availability.
Simply put, data availability refers to the action of block producers publishing all transaction data in a block to the network, allowing validators to download it.
If a block producer releases complete data and allows validators to download it, we say that the data is available. If it hides some data, making it impossible for validators to download the complete data, we say that the data is unavailable.
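The distinction can be illustrated with a minimal sketch. It assumes a toy commitment (a hash over the hashes of all chunks) in place of the Merkle roots or polynomial commitments that real networks use, and the function names `commitment` and `is_available` are hypothetical, chosen only for this example.

```python
import hashlib

def commitment(chunks):
    """Toy commitment: hash each chunk, then hash the concatenated hashes."""
    leaf_hashes = [hashlib.sha256(chunk).digest() for chunk in chunks]
    return hashlib.sha256(b"".join(leaf_hashes)).hexdigest()

def is_available(published_commitment, downloaded_chunks):
    """Data counts as available only if every chunk could be downloaded
    and the downloaded chunks match the commitment in the block header."""
    if any(chunk is None for chunk in downloaded_chunks):
        return False  # the producer withheld at least one chunk
    return commitment(downloaded_chunks) == published_commitment

# Hypothetical block data, split into three chunks.
block_chunks = [b"tx1: alice->bob 5", b"tx2: bob->carol 2", b"tx3: carol->dave 1"]
header_commitment = commitment(block_chunks)

# Honest producer: every chunk can be downloaded.
honest = is_available(header_commitment, block_chunks)
# Withholding producer: the second chunk is missing (None).
withheld = is_available(header_commitment, [block_chunks[0], None, block_chunks[2]])
print(honest, withheld)
```

In this sketch the validator downloads everything. Production designs aim to reach the same conclusion without every node downloading the full block, but the question being answered is the same: was all of the data actually published?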
Difference between data availability and data retrievability
We often confuse data availability with data retrievability, but they are quite different.
Data availability pertains to the stage when a block is produced but has not yet been added to the blockchain through consensus. Therefore, data availability is not related to historical data, but rather to whether newly released data can pass through consensus.
Data retrievability pertains to the stage when data has already passed through consensus and is permanently stored on the blockchain, allowing retrieval of historical data. In Ethereum, nodes that store all historical data are known as archive nodes.
For this reason, the co-founder of L2BEAT once stated in a lengthy tweet that full nodes are not obligated to provide us with historical data, and that we can obtain it only because full nodes are kind enough to serve it.
He also suggested replacing the term “data availability” with “data publishing” as the former could lead to misunderstandings. This viewpoint was supported by the founder of Celestia.
Although the concept of data availability originated from Ethereum, our current focus is on data availability in L2.
In L2, sequencers act as block producers, and they need to release sufficient transaction data for validators to verify the validity of transactions. (To learn more about sequencers, please read the previous article “Research Report: Principles, Status, and Future of Sequencers” in the HoloLens Weekly.)
However, in this process, two problems arise: ensuring secure verification mechanisms and reducing the cost of releasing data. These will be explained in detail below.
Issues concerning secure verification mechanisms
We know that optimistic rollups (OP Rollups) use fraud proofs to verify the validity of transactions, while ZK Rollups use validity proofs.
For OP Rollup: If sequencers do not release the complete data needed to reconstruct a block, challengers will be unable to launch effective fraud-proof challenges.
For ZK Rollup: Although validity proofs themselves do not require data availability, a ZK Rollup as a whole still does. Without the data needed to reconstruct a block, users cannot know their balances and may lose access to their assets.
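Why both designs depend on published data can be shown with a simplified sketch. It assumes a toy account model with plain transfers and a state root computed as a hash of the sorted balances. None of this matches any real rollup's state format, and the names `state_root`, `apply_transactions`, and `can_challenge` are hypothetical.

```python
import hashlib
import json

def state_root(balances):
    """Toy state root: SHA-256 over the balances serialized with sorted keys."""
    encoded = json.dumps(balances, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def apply_transactions(balances, txs):
    """Re-execute simple transfers (sender, receiver, amount) on a copy of the state."""
    new_balances = dict(balances)
    for sender, receiver, amount in txs:
        if new_balances.get(sender, 0) < amount:
            continue  # skip transfers the sender cannot afford
        new_balances[sender] -= amount
        new_balances[receiver] = new_balances.get(receiver, 0) + amount
    return new_balances

def can_challenge(prev_balances, published_txs, claimed_root):
    """Return True if re-execution disproves the claimed root, False if it
    confirms it, and None if the data was withheld and nothing can be checked."""
    if published_txs is None:
        return None
    recomputed = state_root(apply_transactions(prev_balances, published_txs))
    return recomputed != claimed_root

prev = {"alice": 10, "bob": 0}
txs = [("alice", "bob", 3)]
honest_root = state_root(apply_transactions(prev, txs))
fraudulent_root = state_root({"alice": 0, "bob": 10})

print(can_challenge(prev, txs, honest_root))       # re-execution matches the claim
print(can_challenge(prev, txs, fraudulent_root))   # re-execution exposes the fraud
print(can_challenge(prev, None, fraudulent_root))  # data withheld: no verdict possible
```

The last case is the OP Rollup problem: without the transaction data, a challenger has nothing to re-execute. The same `apply_transactions` step is what a ZK Rollup user would need in order to recompute their own balance, which is why a valid proof alone is not enough.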
To ensure secure verification, current L2 sequencers generally publish L2 state data and transaction data on Ethereum, which has stronger security. They rely on Ethereum for settlement and data availability.
Therefore, the data availability layer is actually where L2 releases transaction data. Currently, mainstream L2 solutions use Ethereum as the data availability layer.
Reducing the cost of releasing data
Today, L2s keep things simple by relying on Ethereum for both settlement and data availability. This provides sufficient security, but it is also expensive. This is the second problem L2 faces: how to reduce the cost of releasing data.
The total fee a user pays on L2 consists mainly of two parts: the gas consumed by executing the transaction on L2, and the gas the L2 consumes to submit data to L1. The former is very small, while the latter makes up most of the user's cost. Within the L1 portion, the transaction data published to ensure data availability accounts for most of the gas, while the proof data used to verify transaction validity accounts for only a small share.
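A rough calculation makes the proportions concrete. The sketch below uses Ethereum's calldata pricing of 16 gas per non-zero byte and 4 gas per zero byte. Everything else is an illustrative assumption rather than measured data: the transaction size, the L2 execution gas, and both gas prices. It also ignores compression, batch overhead, and proof costs.

```python
def calldata_gas(data: bytes) -> int:
    """L1 gas for posting `data` as calldata: 4 per zero byte, 16 per non-zero byte."""
    zero_bytes = data.count(0)
    return 4 * zero_bytes + 16 * (len(data) - zero_bytes)

def fee_breakdown(tx_data: bytes, l2_exec_gas: int,
                  l2_gas_price_wei: int, l1_gas_price_wei: int) -> dict:
    """Split a user's fee into the L2 execution part and the L1 data part."""
    l2_fee = l2_exec_gas * l2_gas_price_wei
    l1_fee = calldata_gas(tx_data) * l1_gas_price_wei
    return {
        "l2_execution_wei": l2_fee,
        "l1_data_wei": l1_fee,
        "l1_share": l1_fee / (l2_fee + l1_fee),
    }

# Assumed inputs: a 100-byte transaction (20 zero bytes, 80 non-zero bytes),
# 21,000 gas of L2 execution at 0.1 gwei, and an L1 gas price of 30 gwei.
tx_data = bytes([0] * 20 + [1] * 80)
result = fee_breakdown(tx_data, l2_exec_gas=21_000,
                       l2_gas_price_wei=100_000_000,
                       l1_gas_price_wei=30_000_000_000)
print(calldata_gas(tx_data))
print(result)
```

Under these assumed numbers the L1 data portion is well over 90% of the total, which is why the two cost-reduction approaches that follow both target data publication rather than execution.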
To make L2 more cost-effective, the cost of releasing data must be reduced. There are two main methods:
1. Reduce the cost of releasing data on L1, for example through the upcoming EIP-4844 upgrade on Ethereum. Readers interested in EIP-4844 can refer to the previous article “Web3 Introduction: Understanding the Great Benefits of Layer2: EIP-4844.”
2. Move data availability off L1, just as Rollups already move transaction execution off L1. This means not using Ethereum as the data availability layer.
The controversy surrounding L2 and the data availability layer stems from the concept of modular blockchains. Modular blockchains decouple the core functions of the entire blockchain into relatively independent parts and expand the performance of a single blockchain through various combinations of specialized networks.
Although there is still some controversy regarding the layering of modular blockchains, the widely accepted approach is to divide modular blockchains into four layers: execution, settlement, consensus, and data availability. The functions of each module are shown in the following diagram.
Modular blockchains are similar to Lego blocks. By customizing and using the best blocks, a well-functioning model can be built, alleviating the “impossible triangle” problem of blockchains.
Current L2 solutions, however, have only separated the execution layer from Ethereum and still rely on Ethereum for the other three layers. Due to cost considerations, many L2 solutions are now preparing to separate the data availability layer from Ethereum as well and use Ethereum only as the settlement and consensus layer.
Interestingly, Ethereum seems reluctant to let L2 obtain data availability from other sources. Dankrad Feist, a researcher at the Ethereum Foundation, once stated in a tweet that not using Ethereum as the data availability layer means it is not a Rollup and therefore not L2.
At the same time, in the latest definition of L2 by L2BEAT, it is stated that any scalability solution that does not release data on L1 is not L2, as using off-chain data availability solutions cannot guarantee that operators will provide the released data.
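The four-layer model and the strict definition described above can be expressed as a small sketch. The `ModularStack` type, the function name, and both example chains are hypothetical. The check is a deliberate simplification of the positions attributed to Dankrad Feist and L2BEAT, not their actual classification criteria.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModularStack:
    """Which network provides each of the four modular layers for a chain."""
    name: str
    execution: str
    settlement: str
    consensus: str
    data_availability: str

def is_l2_under_strict_definition(stack: ModularStack, l1: str = "Ethereum") -> bool:
    """Simplified rule: a chain counts as an L2 only if it publishes its data on L1."""
    return stack.data_availability == l1

# Hypothetical chain that executes off-chain but keeps the other three layers on Ethereum.
rollup = ModularStack("ExampleRollup", execution="ExampleRollup",
                      settlement="Ethereum", consensus="Ethereum",
                      data_availability="Ethereum")

# Hypothetical chain that also moves data availability to an external layer.
offchain_da = ModularStack("ExampleOffchainDA", execution="ExampleOffchainDA",
                           settlement="Ethereum", consensus="Ethereum",
                           data_availability="ExternalDALayer")

for stack in (rollup, offchain_da):
    print(stack.name, is_l2_under_strict_definition(stack))
```

Under this rule the second chain is excluded even though it still settles on Ethereum, which is exactly the point of contention.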
Of course, there is currently no definitive conclusion on what L2 is. Both Ethereum Foundation members and L2BEAT insist that the data availability layer should remain on Ethereum, seemingly for security reasons. But is there also a concern that moving it elsewhere would undermine Ethereum’s position?
Ethereum’s vision is to become a supercomputer platform. To improve network performance, it later had to embrace Rollups and let many ecosystems develop on more cost-effective L2 solutions. Because those L2s still draw their security from Ethereum, this shift has not significantly affected Ethereum’s position. But if L2s move the data availability layer, where data is published, away from Ethereum, they weaken their reliance on Ethereum’s security and gradually drift away from Ethereum, which poses a threat to its position.
Nevertheless, this does not prevent the flourishing development of projects related to the data availability layer. In the next article on data availability, the author will provide a detailed introduction to the main data availability solutions and related projects currently available on the market. Stay tuned.