Ethereum's storage demand keeps growing and poses significant challenges for node operators. This article explores the reasons behind the problem, presents two proposed solutions, and looks at future prospects. It is based on an article by EthStorage, compiled and translated by Shenzhen.
Table of Contents:
Background
Challenges in storage
Ethereum storage roadmap and its consequences
Solution 1: Ethereum Portal Network
Solution 2: EthStorage Network
Future outlook
On October 22, 2023, Péter Szilágyi, the lead developer of Go-Ethereum (Geth), expressed deep concern on Twitter. He pointed out that while Geth retains all historical data, other Ethereum clients such as Nethermind and Besu offer the option to delete certain historical Ethereum data (such as historical blocks and headers). In his view, this inconsistency in behavior among clients is unfair to Geth. His remarks sparked intense debate about Ethereum's storage problem as outlined in the Ethereum roadmap.
Why did Nethermind and Besu choose to stop storing historical data? What are the issues behind this decision? From our perspective, there are two main reasons:
1. The storage requirements for Ethereum clients are increasing.
2. There is no protocol-level incentive or punishment for storing Ethereum historical data.
The first reason stems from the growing storage demands of Ethereum clients. To understand the specific requirements, the pie chart below shows the storage distribution of a new Geth node as of block 18,779,761 on December 13, 2023.
[Image]
As shown in the chart:
Total storage size: 925.39 GB
Historical data (blocks / transaction receipts): approximately 628.69 GB
State data in Merkle Patricia Trie (MPT): approximately 269.74 GB
The second reason is the lack of protocol-level incentives or punishments for storing historical blocks. While the protocol requires nodes to store all historical data, it provides no mechanism to reward storage or penalize non-compliance. Node operators store and share historical data purely out of altruism, and client implementers are free to delete or modify historical data without any consequences. In contrast, validator nodes must maintain and update the complete state locally to avoid being slashed for proposing or voting for invalid blocks.
Therefore, it is not surprising that some node operators choose to delete historical data when storage costs become a significant burden. Without historical data, node clients can significantly reduce storage costs, from approximately 1TB to around 300GB.
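As a quick check of these figures, the short sketch below recomputes the shares and the post-deletion footprint from the chart numbers quoted above. The variable names are illustrative, and the "other" bucket is simply whatever the two named categories do not account for.

```python
# Figures quoted from the chart (Geth node at block 18,779,761), in GB.
TOTAL_GB = 925.39
HISTORY_GB = 628.69   # historical blocks and transaction receipts
STATE_GB = 269.74     # state data in the Merkle Patricia Trie


def share(part_gb: float, total_gb: float = TOTAL_GB) -> float:
    """Return part_gb as a percentage of total_gb."""
    return 100.0 * part_gb / total_gb


# Whatever the two named categories do not cover.
other_gb = TOTAL_GB - HISTORY_GB - STATE_GB
# Footprint of a node that drops all historical data.
after_deletion_gb = TOTAL_GB - HISTORY_GB

print(f"history share:  {share(HISTORY_GB):.1f}%")
print(f"state share:    {share(STATE_GB):.1f}%")
print(f"other:          {other_gb:.2f} GB")
print(f"after deletion: {after_deletion_gb:.2f} GB")
```

Historical data accounts for roughly two thirds of the total, which is why dropping it brings the footprint down to about 300GB.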
With the upcoming Ethereum Data Availability (DA) upgrade, the storage challenges will intensify.
The road to full Ethereum DA scaling begins with EIP-4844 in the Dencun upgrade, which introduces a fixed-size Binary Large Object (BLOB) and an independent fee model called blobGasPrice. Each BLOB is 128 KB, and EIP-4844 allows a maximum of 6 BLOBs per block. To scale data throughput further, Ethereum plans to adopt 1D Reed-Solomon codes, initially allowing 32 BLOBs per block and eventually reaching 256 BLOBs per block at full scale.
If Ethereum DA is implemented at full capacity (256 BLOBs per block), the Ethereum DA network is expected to receive approximately 80TB of DA data per year, far exceeding the storage capacity of most nodes.
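The 80TB figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes every block carries the maximum number of BLOBs, a 12-second block time, and binary units (1 KB = 1024 bytes); the function name is illustrative.

```python
BLOB_SIZE_BYTES = 128 * 1024          # 128 KB per BLOB
BLOCK_TIME_SECONDS = 12               # assumed constant block time
SECONDS_PER_YEAR = 365 * 24 * 3600


def da_bytes_per_year(blobs_per_block: int) -> int:
    """Upper bound on DA bytes per year if every block is full of BLOBs."""
    blocks_per_year = SECONDS_PER_YEAR // BLOCK_TIME_SECONDS
    return blobs_per_block * BLOB_SIZE_BYTES * blocks_per_year


# EIP-4844 maximum, initial scaling target, and full scaling target.
for blobs in (6, 32, 256):
    tib = da_bytes_per_year(blobs) / 2**40
    print(f"{blobs:>3} BLOBs/block: {tib:.1f} TiB/year")
```

At 256 BLOBs per block this works out to roughly 80 TiB per year, and since it assumes full blocks it is an upper bound rather than a forecast.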
[Image]
[Image]
Vitalik’s tweet about the Ethereum roadmap mentions Purge, which mainly involves storage aspects.
The rising storage costs have attracted the attention of Ethereum ecosystem researchers. To address this issue and ensure consistency among all clients, researchers are developing proposals to explicitly delete historical storage. Two main proposals are:
EIP-4444: Bound Historical Data in Execution Clients: This proposal allows clients to delete historical blocks older than one year. Assuming an average block size of 100 KB and a block time of 12 seconds, the maximum historical block data would be approximately 250GB (100 KB * (3600 * 24 * 365) / 12).
EIP-4844: Shard Blob Transactions: EIP-4844 discards BLOBs older than 18 days. This is more aggressive than EIP-4444 and limits the historical BLOB size to around 100GB ((18 * 3600 * 24) / 12 * 128 KB * 6, again assuming a block time of 12 seconds).
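Both bounds follow the same pattern: bytes per block multiplied by the number of blocks in the retention window. A minimal sketch, assuming the 100 KB average block size and 12-second block time stated above and binary units; the helper name is hypothetical.

```python
def retained_bytes(bytes_per_block: int, retention_days: int,
                   block_time_seconds: int = 12) -> int:
    """Maximum bytes a node keeps if it retains `retention_days` of blocks."""
    blocks_in_window = retention_days * 24 * 3600 // block_time_seconds
    return bytes_per_block * blocks_in_window


# EIP-4444: one year of blocks at an assumed average of 100 KB each.
eip4444_bytes = retained_bytes(100 * 1024, 365)
# EIP-4844: 18 days of blocks, each carrying at most 6 BLOBs of 128 KB.
eip4844_bytes = retained_bytes(6 * 128 * 1024, 18)

print(f"EIP-4444 bound: {eip4444_bytes / 2**30:.1f} GiB")
print(f"EIP-4844 bound: {eip4844_bytes / 2**30:.1f} GiB")
```

The results land close to the 250GB and 100GB figures quoted in the two proposals.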
What are the consequences of deleting historical data for all clients? One major issue is that new nodes can no longer reach the latest state through "full sync" mode, which executes every transaction from the genesis block to the latest block. Consequently, new nodes must rely on "snap sync" or "state sync" to obtain the latest state directly from other Ethereum nodes. This approach is already implemented in Geth and is its default synchronization method.
The same consequence applies to all Layer 2 (L2) solutions: new L2 nodes cannot fully sync to the latest L2 state by replaying from L2 genesis to the latest L2 block. Additionally, since L1 nodes do not maintain L2 state, a "snap sync" for L2 cannot derive the latest L2 state from L1, which violates an important L2 assumption of inheriting Ethereum's security guarantees. The fallback is to rely on third-party services such as Infura, Etherscan, or the L2 projects themselves to store historical L2 data or state copies, which is a centralized solution sustained only by off-chain, indirect incentives.
The core questions we need to explore are:
Can we find better decentralized solutions for storage and access?
Is it possible to have a solution that is directly incentivized and consistent with Ethereum (e.g., on top of L1 contracts)?
Based on all of this, can we provide a fully decentralized, protocol-level incentivized solution for Ethereum storage?
The Ethereum Portal Network is a lightweight, decentralized access network for the Ethereum protocol. Its clients expose standard Ethereum JSON-RPC interfaces such as eth_call and eth_getBlockByNumber, and translate each JSON-RPC request into P2P requests against a distributed hash table (DHT), similar to the IPFS network. Unlike IPFS, which accepts any type of data and is therefore susceptible to junk data, the Portal P2P network hosts only Ethereum data, such as historical block headers and block transaction data. This restriction is enforced by the light client verification built into the Portal network.
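For illustration, this is what a client-side eth_getBlockByNumber request looks like before a Portal client translates it into DHT lookups. The sketch only builds the JSON-RPC 2.0 payload and sends nothing; the helper name is hypothetical, and the block number is the one used in the storage chart earlier in this article.

```python
import json


def jsonrpc_request(method: str, params: list, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request body."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    })


# Block numbers are hex-encoded quantities. The second parameter selects
# transaction hashes only (False) instead of full transaction objects.
payload = jsonrpc_request("eth_getBlockByNumber", [hex(18_779_761), False])
print(payload)
```

From the caller's point of view the request is the same whether it is answered by a full node or by a Portal client; only the way the data is located behind the interface differs.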
An important feature of the Portal network is its lightweight design and compatibility with resource-constrained devices. It can run on nodes with only a few megabytes of storage space and low memory, facilitating decentralization. Even mobile phones or Raspberry Pi devices can join the network and contribute to the availability of Ethereum data.
The development of the Portal network aligns with the concept of Ethereum client diversity, with clients written in Rust, JavaScript, and Nim. The Beacon network and Historical network are already available, and the State network is actively being developed. It is worth noting that the Portal network does not directly incentivize data storage; all nodes in the network operate in an altruistic manner.
[Image]
Image: Rust client (Trin) of the Portal network with a 100MB storage limit running
The EthStorage Network is a decentralized, incentivized storage network designed specifically for storing EIP-4844 BLOBs. It is funded by a grant from the Ethereum Foundation's Ecosystem Support Program (ESP).
Minimum trust: Unlike existing solutions that require centralized data bridges, EthStorage relies on Ethereum’s consensus and a 1/m trust model of permissionless EthStorage storage nodes. The process of storing BLOBs works as follows: Users sign a transaction carrying the BLOB and call the put(key, blob_idx) method of the storage contract. The storage contract then records the BLOB hash on the chain. Subsequently, storage providers download and store the BLOB directly from the Ethereum DA network, bypassing the data bridge problem.
Storage cost aligned with incentives: When calling the put() method, the transaction must send a storage fee (via msg.value) and deposit it into the contract. Once storage nodes successfully submit and verify storage proofs on the chain, this storage fee will be gradually allocated to the storage nodes over time. Unlike the existing Ethereum storage fee model that pays a one-time storage fee to the proposer, the paid storage fee follows a discounted cash flow model over time, assuming storage costs will decrease relative to the ETH price. This significant innovation introduced by EthStorage ensures that the cost aligns with the storage contributions of nodes.
Storage proofs: Storage proofs are inspired by data availability sampling, and in EthStorage, sampling is done for a period of storage for BLOBs. To efficiently verify on-chain samples, EthStorage leverages smart contracts and the latest developments in SNARK technology.
Permissionless operation: Any storage node in EthStorage can contribute by storing data and regularly submitting storage proofs on the chain, and they will be rewarded.
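The storage-fee item above says that the upfront fee is released to storage nodes over time under a discounted cash flow model. The sketch below shows one way such a schedule can work. The exponential decay and the 5% annual discount are assumptions made for illustration, not EthStorage's actual parameters or contract formula, and the function name is hypothetical.

```python
import math


def payout(upfront_fee: float, annual_discount: float,
           t0_years: float, t1_years: float) -> float:
    """Portion of the upfront fee released for proving storage over [t0, t1].

    Assumes the payment rate decays exponentially, so that the payouts
    over an unbounded horizon sum to exactly the upfront fee.
    """
    decay = -math.log(1.0 - annual_discount)  # continuous decay rate per year
    return upfront_fee * (math.exp(-decay * t0_years)
                          - math.exp(-decay * t1_years))


fee = 1.0  # upfront fee sent via msg.value, in arbitrary units
for year in range(3):
    released = payout(fee, 0.05, year, year + 1)
    print(f"year {year + 1}: {released:.4f} released")
```

Under these assumptions each year releases slightly less than the previous one, which matches the premise that storage costs fall relative to the ETH price, while the total never exceeds what the user paid when calling put().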
From the perspective of modular blockchains, EthStorage acts as an Ethereum storage L2 but charges storage fees instead of transaction fees. By indexing BLOB hashes on the chain, EthStorage becomes a modular storage layer for Ethereum, enhancing storage scalability and reducing costs (targeted at approximately 1000x).
In terms of development, EthStorage has already integrated with EIP-4844 on the Ethereum Sepolia testnet. We have stress-tested EthStorage and the Ethereum Sepolia testnet, including writing hundreds of GBs of BLOBs to EthStorage. Over 100 community participants joined the network and successfully proved their local storage.
The main advantage of the EthStorage network is that it provides decentralized direct incentives on top of Ethereum, which, to our current knowledge, is a groundbreaking feature. However, the limitation of this network is that it is specifically designed for fixed-size BLOBs.
[Image]
Dashboard of EthStorage on the Ethereum Sepolia testnet
Although Ethereum storage has not yet received much attention, it is critically important to the Ethereum ecosystem. As the Ethereum network grows rapidly, storing Ethereum data and keeping it accessible become critical challenges. The Portal network and the EthStorage network are still in their early stages, and several long-term development directions deserve focus:
Decentralized, low-latency access to Ethereum state data: Accessing Ethereum state in a decentralized and verifiable manner is a critical but challenging task. Querying account information with a traditional DHT network model often requires multiple sequential queries for internal trie nodes stored on different P2P nodes, resulting in considerable latency. Finding ways to leverage the structure of the state trie to accelerate access is crucial. The upcoming State network of the Ethereum Portal network is designed to address this issue.
Integration of Portal network and EthStorage network: The Portal network can seamlessly expand its capabilities to support BLOB data. The EthStorage team has partially implemented this feature. The next step is to unify these networks and provide a decentralized JSON-RPC network that can programmatically access BLOBs through contracts. By combining the application logic in contracts with scalable BLOB storage provided by EthStorage, new dApps can be enabled on Ethereum, such as dynamic decentralized websites (e.g., decentralized Twitter/YouTube/Wikipedia).
Decentralized access through browsers: Similar to accessing data in the IPFS network using the ipfs:// protocol, the web3 industry needs a native Ethereum access protocol to support direct browser access, unlocking the enormous potential of Ethereum’s rich data. These data cover a wide range of areas, from token ownership and account balances to NFT images and dynamic decentralized websites, all benefiting from the capabilities of smart contracts and future Ethereum storage. In this field, the web3:// protocol defined by ERC-4804/6860 is currently being actively developed and promoted to achieve this goal.
Advanced storage proofs for dynamic-sized data: Besides fixed BLOBs, exploring advanced storage proofs is essential for addressing dynamic-sized data, such as historical blocks or even state objects. Developing sophisticated algorithms can enhance the adaptability of storage solutions.
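To make the latency concern in the state-access direction above concrete, the sketch below estimates how many sequential DHT lookups a single account query needs when each trie node lives on a different peer. It assumes a roughly balanced hexary trie (account keys are hashes) and ignores extension nodes; the account count and per-lookup latency are hypothetical inputs, not measurements.

```python
import math


def expected_trie_depth(num_accounts: int, branching: int = 16) -> int:
    """Approximate depth of a balanced trie holding num_accounts leaves."""
    return math.ceil(math.log(num_accounts, branching))


def sequential_lookup_latency_ms(num_accounts: int,
                                 latency_per_lookup_ms: float) -> float:
    """Total latency when every trie level costs one DHT lookup in sequence."""
    return expected_trie_depth(num_accounts) * latency_per_lookup_ms


accounts = 250_000_000   # hypothetical account count
per_lookup_ms = 200.0    # hypothetical latency of one DHT lookup

print(f"levels to traverse: {expected_trie_depth(accounts)}")
print(f"estimated latency:  "
      f"{sequential_lookup_latency_ms(accounts, per_lookup_ms):.0f} ms")
```

Because each lookup depends on the node returned by the previous one, the lookups cannot be issued in parallel, which is why exploiting the trie structure to shorten this chain matters.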
Through these efforts, we hope to contribute to the Ethereum roadmap and lay the foundation for decentralized storage solutions in the future Ethereum ecosystem.