EIP-4444 addresses the historical growth problem of Ethereum and creates space for increasing the gas limit.
Historical growth has become the biggest bottleneck for Ethereum scaling, even surpassing the growth of the state. Within a few years, the historical data will exceed the storage capacity of many Ethereum nodes.
The good news is that historical growth is an easier problem to solve than state growth, and a solution is actively being developed. Resolving historical growth will alleviate the problem of state growth.
In this article, we continue to explore Ethereum's scalability issues, shifting our focus from state growth to historical growth. Using detailed data sets, our goals are to 1) build a technical understanding of Ethereum's scalability bottlenecks and 2) facilitate discussion about the optimal Ethereum gas limit.
So, what is history? It is the collection of all blocks and transactions executed by Ethereum throughout its existence, from the genesis block to the current block. Historical growth is the accumulation of new blocks and transactions over time.
Figure 1 shows the relationship between historical growth and various protocol metrics and hardware constraints of Ethereum nodes. Compared to state growth, historical growth is subject to a different set of hardware constraints. Historical growth puts pressure on network IO as new blocks and transactions need to be propagated across the network. It also puts pressure on node storage space as each Ethereum node stores a complete copy of the historical records. If historical growth exceeds these hardware constraints, nodes will no longer be able to achieve stable consensus with their peer nodes. For an overview of state growth and other scalability bottlenecks, please refer to Part 1 of this series.
Figure 1: Ethereum scalability bottlenecks.
Until recently, most of each node's network throughput was used to transmit historical records such as new blocks and transactions. The introduction of blobs in the Dencun hard fork changed this dynamic: blobs now account for a significant portion of node network activity. However, blobs are not considered part of the historical records because 1) nodes store them for only about 2 weeks before discarding them, and 2) they are not needed to replay the chain from Ethereum's genesis. Due to (1), blobs do not significantly increase the storage burden of each Ethereum node. We will discuss blobs in more detail later in this article.
In this article, we will focus on historical growth and discuss the relationship between history and state. Since state growth and historical growth have overlapping hardware constraints, they are related issues, and solving one problem can help solve the other.
How fast is historical growth? Figure 2 shows the historical growth rate of Ethereum since its genesis. Each vertical line represents a month of growth, and the y-axis represents the gigabytes of historical growth for that month. Transactions are categorized by their “target address” and measured by the byte size of their RLP (Recursive Length Prefix) encoding. Contracts that cannot be easily identified are classified as “unknown.” The “other” category includes a range of smaller categories such as infrastructure and gaming.
Figure 2: Historical growth rate of Ethereum over time.
Some key points from the above chart are:
Historical growth is 6 to 8 times faster than state growth: The historical growth rate recently peaked at 36.0 GiB/month and is currently at 19.3 GiB/month. The state growth rate peaked at approximately 6.0 GiB/month and is currently at 2.5 GiB/month. The comparison between historical and state growth in terms of growth rate and cumulative size will be discussed later in this section.
Historical growth accelerated until Dencun: While state growth has been roughly linear over the years (see Part 1), historical growth has been superlinear. Since a linearly increasing growth rate leads to quadratic growth in cumulative size, a superlinear growth rate leads to a cumulative size that grows faster than quadratically. This acceleration stopped abruptly after Dencun, which marks the first significant drop in the historical growth rate that Ethereum has experienced.
Recent historical growth was mostly driven by rollups: Each L2 publishes a copy of its transactions back to the mainnet, which generated a significant amount of historical records and made rollups the largest contributor to historical growth over the past year. However, Dencun allows L2s to publish their transaction data using blobs instead of historical records, so rollups no longer generate the majority of Ethereum's historical records. We will discuss rollups in more detail later in this article.
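The first two points can be checked with a short Python sketch. The four rate figures are the ones quoted above; the linear rate function and its slope are hypothetical, chosen only to illustrate why a linearly increasing growth rate gives a roughly quadratic cumulative size.

```python
# Growth rates quoted in this article, in GiB per month.
HISTORY_PEAK, HISTORY_NOW = 36.0, 19.3
STATE_PEAK, STATE_NOW = 6.0, 2.5


def growth_ratio(history_rate, state_rate):
    """How many times faster history grows than state."""
    return history_rate / state_rate


def cumulative_size(rate_fn, months):
    """Total size after `months`, summing a per-month growth rate."""
    return sum(rate_fn(month) for month in range(1, months + 1))


def linear_rate(month, slope=0.1):
    """Hypothetical rate that rises by `slope` GiB/month every month."""
    return slope * month


peak_ratio = growth_ratio(HISTORY_PEAK, STATE_PEAK)
current_ratio = growth_ratio(HISTORY_NOW, STATE_NOW)

# Under a linear rate, doubling the elapsed time roughly quadruples the total.
size_100 = cumulative_size(linear_rate, 100)
size_200 = cumulative_size(linear_rate, 200)
quadratic_factor = size_200 / size_100
```

The two ratios land at the ends of the "6 to 8 times" range, and the doubling factor comes out close to 4, which is the signature of quadratic growth.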
Who are the biggest contributors to Ethereum historical growth? The amount of historical records generated by different contract categories reveals how Ethereum’s usage patterns have evolved over time. Figure 3 shows the relative contributions of various contract categories to historical growth. It is the same data as Figure 2, normalized.
Figure 3: Contributions of different contract categories to historical growth.
These data reveal four different periods of Ethereum usage patterns:
Early days (purple): There was almost no on-chain activity in the initial years of Ethereum. Most of these early contracts are now difficult to identify and are labeled as “unknown” in the chart.
ERC-20 era (green): The ERC-20 standard was proposed at the end of 2015, but ERC-20 tokens did not gain significant traction until 2017 and 2018. ERC-20 contracts became the largest contributors to historical growth in 2019.
DEX/DeFi era (brown): DEX and DeFi contracts appeared on the chain as early as 2016 and gained attention in 2017. However, it wasn’t until the DeFi summer of 2020 that they became the largest category of historical growth. DeFi and DEX contracts accounted for over 50% of historical growth in 2021 and 2022.
Rollup era (gray): Starting from early 2023, L2 Rollups began executing more transactions than the mainnet. They generated around 2/3 of Ethereum’s historical records in the months leading up to Dencun.
Each era represents more complex Ethereum usage patterns than the previous one. This growing complexity can be seen as a form of Ethereum scaling over time, one that cannot be measured by simple metrics like transactions per second.
In the most recent month of data (April 2024), rollups no longer generate the majority of historical records. It is currently unclear whether future historical records will come mainly from DEX and DeFi activity or from new usage patterns.
What about blobs? Dencun introduced blobs, which significantly changed the dynamics of historical growth by allowing Rollups to use cheap blobs instead of historical records to publish data. Figure 4 zooms in on the historical growth rate before and after the Dencun upgrade. The chart is similar to Figure 2, but each vertical line represents a day instead of a month.
Unlike state data, historical data is append-only and is accessed far less frequently. Therefore, it is theoretically possible to store historical data separately on cheaper storage media. Some clients, such as Geth, already support this.
In addition to storage capacity, network IO is another major constraint on historical growth. Unlike storage capacity, network IO limitations will not cause problems for nodes in the short term, but they will become important for future increases to the gas limit.
To understand how much historical growth the network capacity of typical Ethereum nodes can support, it is necessary to know the relationship between historical growth and various network health indicators, such as reorg rate, missed slots, missed finality, missed attestations, missed sync committee duties, and block submission delay. The analysis of these indicators is beyond the scope of this article, but more information can be found in previous investigations into the health of the consensus layer. Additionally, the Ethereum Foundation’s Xatu project has been building public datasets to accelerate such analyses.
How to solve the problem of historical growth?
Historical growth is an easier problem to solve than state growth. It can be almost completely addressed by the proposed EIP-4444, which changes nodes from storing the entire Ethereum history to storing only the most recent year of historical data. After EIP-4444 is implemented, history storage will no longer be a bottleneck for Ethereum scalability and, in the long run, will no longer constrain increases to the gas limit. EIP-4444 is necessary for the long-term sustainability of the network; without it, historical growth will quickly outpace the hardware upgrades of network nodes.
Figure 6 shows the impact of EIP-4444 on the storage burden per node over the next 3 years. This is the same as Figure 4 but with additional lighter lines representing the storage burden after the implementation of EIP-4444.
Figure 6: The impact of EIP-4444 on Ethereum node storage burden
From this chart, several key conclusions can be drawn:
EIP-4444 will roughly halve the current storage burden. The storage burden will decrease from 1.2 TiB to 633 GiB.
EIP-4444 will stabilize the historical storage burden. Assuming a constant historical growth rate, historical data will be discarded at the rate it is generated.
After EIP-4444, it will take many years for the storage burden of nodes to reach today’s levels. This is because state growth will be the only factor increasing the storage burden, and the growth rate of state is slower than historical growth.
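The three conclusions above can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions, not a reproduction of Figure 6: the growth rates and storage figures are the ones quoted in this article, both rates are assumed to stay constant, and the function name `simulate_history` is our own.

```python
from collections import deque

GIB_PER_TIB = 1024


def simulate_history(months, monthly_rate, retention_months=None):
    """Return the stored history (GiB) at the end of each month.

    retention_months=None keeps everything (today's behaviour);
    a number keeps only the most recent months (EIP-4444 style expiry).
    """
    window = deque(maxlen=retention_months)
    stored = []
    for _ in range(months):
        window.append(monthly_rate)  # the oldest month drops out once full
        stored.append(sum(window))
    return stored


# Figures quoted in this article.
HISTORY_RATE = 19.3                 # GiB/month
STATE_RATE = 2.5                    # GiB/month
BURDEN_TODAY = 1.2 * GIB_PER_TIB    # GiB
BURDEN_AFTER = 633.0                # GiB

without_expiry = simulate_history(36, HISTORY_RATE)
with_expiry = simulate_history(36, HISTORY_RATE, retention_months=12)

# Share of today's storage burden removed by EIP-4444.
reduction = 1 - BURDEN_AFTER / BURDEN_TODAY

# Back-of-envelope assumption: after expiry only state grows, at a constant rate.
years_to_regrow = (BURDEN_TODAY - BURDEN_AFTER) / STATE_RATE / 12
```

Under these assumptions, retained history plateaus after twelve months while unexpired history keeps climbing, the reduction is a little under one half, and it would take roughly twenty years of state growth at the current rate to return to today's storage burden.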
After implementing EIP-4444, historical growth will still impose some storage burden as nodes will store one year of historical records. However, even if Ethereum reaches a global scale, this burden can be easily addressed. Once the method of storing historical records is proven to be reliable, the one-year expiration time of EIP-4444 may be shortened to a few months, weeks, or even less.
How to store Ethereum’s historical records?
EIP-4444 raises the question of how historical records should be stored if not by Ethereum nodes themselves. Historical records play a crucial role in the validation, computation, and analysis of Ethereum, so preserving them is essential. Fortunately, storing historical records is a straightforward problem that requires only one honest data provider out of n (a 1-of-n trust assumption). This is in stark contrast to the state consensus problem, which requires 1/3 to 2/3 of participants to be honest. Node operators can verify the authenticity of a historical dataset by 1) replaying all transactions since the genesis block and 2) checking that these transactions reproduce the same state root as the current chain.
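The replay check described above can be sketched with a toy model in Python. Everything here is a simplification for illustration: the "state" is a dictionary of balances, transactions are simple transfers, and the "state root" is a SHA-256 hash of the serialized state rather than Ethereum's actual Merkle Patricia trie root. The function names are hypothetical.

```python
import hashlib
import json


def state_root(state):
    """Toy commitment to the state. NOT Ethereum's real state root."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def replay(genesis, transactions):
    """Apply (sender, recipient, amount) transfers on top of the genesis state."""
    state = dict(genesis)
    for sender, recipient, amount in transactions:
        if state.get(sender, 0) < amount:
            raise ValueError("invalid transaction in history")
        state[sender] -= amount
        state[recipient] = state.get(recipient, 0) + amount
    return state


def history_is_authentic(genesis, transactions, trusted_root):
    """True if replaying the history reproduces the trusted state root."""
    try:
        return state_root(replay(genesis, transactions)) == trusted_root
    except ValueError:
        return False


def first_authentic_provider(genesis, providers, trusted_root):
    """Return the name of the first provider whose history verifies, else None."""
    for name, transactions in providers.items():
        if history_is_authentic(genesis, transactions, trusted_root):
            return name
    return None
```

Because a node can check any candidate history against a root it already trusts, dishonest providers can at worst waste its time. A single honest provider is enough, which is the 1-of-n property. A real client would additionally verify block headers and hashes, since this toy commits only to the final state.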
There are many methods for storing historical records.
Torrents/P2P: Torrents are the simplest and most reliable method. Ethereum nodes can periodically package parts of the historical records and share them as public torrent files. For example, a node may create a new historical torrent file every 100,000 blocks. Node clients like Erigon have already implemented this process to some extent in a non-standardized way. To standardize this process, all node clients must use the same data format, parameters, and P2P network. Nodes will be able to participate in this network according to their storage and bandwidth capabilities. The advantage of torrents is that they rely on a long-established ("highly Lindy") open standard that is supported by a large number of data tools.
Portal Network: Portal Network is a new network designed specifically for hosting Ethereum data. It is a method similar to Torrents but with additional features that make data verification easier. The advantage of Portal Network is that these additional verification layers provide utilities for lightweight clients to efficiently verify and query shared datasets.
Cloud Hosting: Cloud storage services like AWS’s S3 or Cloudflare’s R2 provide a cheap and high-performance option for storing historical records. However, this method carries more legal and operational risks as there is no guarantee that these cloud services will always be willing and able to host cryptocurrency data.
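Whichever method is used, history has to be split into fixed chunks that every client agrees on. The following Python sketch shows the 100,000-block packaging mentioned in the torrent example. The archive naming scheme is hypothetical, and no client is known to use exactly this layout.

```python
def archive_ranges(head_block, chunk_size=100_000):
    """Return inclusive (first, last) block ranges for every complete chunk.

    Blocks after the last complete chunk are not yet archived.
    """
    ranges = []
    start = 0
    while start + chunk_size - 1 <= head_block:
        ranges.append((start, start + chunk_size - 1))
        start += chunk_size
    return ranges


def archive_name(first, last):
    """Hypothetical file name for one archive; the format is an assumption."""
    return f"history-{first:09d}-{last:09d}"


# Example: with the chain head at block 250,000, two archives are complete.
names = [archive_name(first, last) for first, last in archive_ranges(250_000)]
```

Using deterministic boundaries means that any two nodes produce archives covering identical block ranges, which is a precondition for them to seed the same files.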
The remaining implementation challenges are more social challenges than technical challenges. The Ethereum community needs to coordinate specific implementation details to integrate them directly into each node client. In particular, performing a full sync from the genesis block (rather than snapshot sync) would require retrieving historical records from historical data providers rather than Ethereum nodes. These changes do not require a hard fork technically, so they can be implemented earlier than Ethereum’s next hard fork, Pectra.
All of these historical storage methods can also be used by L2s to store the blob data they publish to the mainnet. Compared to historical storage, blob storage is 1) more challenging because the total amount of data is much larger and 2) less critical because blobs are not necessary for replaying the mainnet history. However, blob storage is still necessary for each L2 to replay its own history. Therefore, some form of blob storage is important for the entire Ethereum ecosystem. Additionally, if L2s develop robust blob storage infrastructure, they may also be able to easily store L1 historical data.
Directly comparing the storage datasets of different node configurations before and after EIP-4444 would be helpful. Figure 7 shows the storage burden of different types of Ethereum nodes. State data is accounts and contracts, historical data is blocks and transactions, and archive data is a set of optional data indexes. The byte numbers in this table are based on the most recent reth snapshot, but the numbers for other node clients should be roughly similar.
Figure 7: Storage burden of different types of Ethereum nodes
In other words,
Archive nodes store both state data and historical data, as well as archive data. Archive nodes can be used when someone wants to easily query the historical chain state.
Full nodes only store historical data and state data. Most nodes today are full nodes. The storage burden of full nodes is approximately half of archive nodes.
After EIP-4444, full nodes will only store state data and the most recent year of historical data. This will reduce the storage burden of nodes from 1.2 TiB to 633 GiB and stabilize the storage space for historical data.
Stateless nodes, also known as “light nodes,” do not store any of these datasets and can begin verifying at the tip of the chain immediately. This type of node will become possible once Verkle tries or another state commitment scheme is added to Ethereum.
Finally, there are some additional EIPs that can limit the historical growth rate, not just adapt to the current growth rate. These will help keep within network IO constraints in the short term and within storage constraints in the long term. Although EIP-4444 is still necessary for the long-term sustainability of the network, these other EIPs will help Ethereum scale more efficiently in the future:
EIP-7623: Repricing calldata to make certain calldata-heavy transactions more expensive. Making these usage patterns more expensive will push some of them to move from calldata to blobs. This will reduce the historical growth rate.
EIP-4488: Imposing limits on the total amount of calldata that can be included in each block. This will impose stricter limits on the growth rate of historical records.
These EIPs are easier to implement than EIP-4444, so they could serve as short-term solutions before EIP-4444 is put into production.
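To see why calldata pricing and calldata caps bound the historical growth rate, consider the following worst-case estimate in Python. It assumes a 30,000,000 gas limit, 12-second slots with none missed, and 16 gas per nonzero calldata byte as the baseline price. The repriced cost of 40 gas per byte and the 1,000,000-byte cap are illustrative values only and should not be read as the exact parameters of either EIP.

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_MONTH = 30 * 24 * 3600 // SECONDS_PER_SLOT  # assumes no missed slots
GIB = 1024 ** 3


def max_calldata_bytes_per_block(gas_limit, gas_per_byte, byte_cap=None):
    """Upper bound on calldata bytes that fit in one block.

    Ignores the base gas per transaction, so it slightly overestimates.
    `byte_cap` models an explicit per-block calldata limit.
    """
    by_gas = gas_limit // gas_per_byte
    return by_gas if byte_cap is None else min(by_gas, byte_cap)


def worst_case_gib_per_month(bytes_per_block):
    """History growth if every block carried the maximum calldata."""
    return bytes_per_block * SLOTS_PER_MONTH / GIB


GAS_LIMIT = 30_000_000  # assumed gas limit

baseline = max_calldata_bytes_per_block(GAS_LIMIT, gas_per_byte=16)
repriced = max_calldata_bytes_per_block(GAS_LIMIT, gas_per_byte=40)  # illustrative price
capped = max_calldata_bytes_per_block(GAS_LIMIT, 16, byte_cap=1_000_000)  # illustrative cap

baseline_growth = worst_case_gib_per_month(baseline)
repriced_growth = worst_case_gib_per_month(repriced)
capped_growth = worst_case_gib_per_month(capped)
```

Both levers lower the ceiling on how fast history can grow: repricing does so indirectly through gas, while a cap does so directly in bytes. Actual growth sits far below these worst-case bounds, since most blocks are not filled with calldata.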
Conclusion
The purpose of this article is to use data to shed new light on the problem of historical growth and on the methods for solving it. Much of the data in this article is difficult to obtain through traditional means.
Historical growth has not received enough attention as a bottleneck for Ethereum scalability. Even without increasing the gas limit, the current practice of storing historical records in Ethereum will require many nodes to upgrade their hardware within a few years. Fortunately, this is not an insurmountable problem. There is already a clear solution in EIP-4444. We believe that the implementation of this EIP should be accelerated to leave room for future increases in the gas limit.
Source link:
https://foresightnews.pro/article/detail/59749
Note: This translation is solely the translator’s understanding and interpretation of the original article.