Why is "Available Data" Important for Blockchain Scaling?

区块律动BlockBeats · 2022-01-17 09:20

As you may have heard, Ethereum's sharding roadmap has essentially abandoned execution sharding and now focuses on data sharding, with the goal of maximizing the throughput of Ethereum's data space.

You may also have followed the recent discussions around modular blockchains, dug into how rollups work and the concepts around them, and heard people talk about "data availability solutions".

But along the way, you may have asked yourself: "What exactly is data availability?"

Before we get into that, let's take a look at how most blockchains work today.

Mempools, nodes, and the famous blockchain trilemma

Imagine you stumble upon a new OHM fork with a jaw-dropping APY and hit the "stake" button without hesitation, confirming the transaction in MetaMask. So what actually happens next?

Simply put, your transaction first lands in the mempool. Assuming the fee you pay to miners or validators is high enough, your transaction gets included in the next block, which is appended to the blockchain for everyone to see. The block containing your transaction is then broadcast to the blockchain's network of full nodes. Every full node downloads this new block and executes every transaction in it (including yours, of course), verifying along the way that all of those transactions are valid. For your transaction, for example, full nodes check that you didn't steal someone else's money and that you have enough ETH to pay for gas, and so on. The core job of full nodes, therefore, is to enforce the blockchain's rules on miners and validators.
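
To make that concrete, here is a tiny, hypothetical sketch (in Python) of the kind of checks every full node re-runs for each transaction in a new block. The account model, field names, and flat gas fee below are invented purely for illustration - they are not Ethereum's actual data structures or rules.

```python
from dataclasses import dataclass

# Toy account-based ledger: address -> ETH balance (illustrative only).
balances = {"alice": 5.0, "bob": 1.0}

@dataclass
class Tx:
    sender: str
    recipient: str
    amount: float
    gas_fee: float
    signature_valid: bool  # stand-in for real signature verification

def validate_and_apply(tx: Tx) -> bool:
    """Re-execute one transaction the way every full node does."""
    # 1. The transaction must be properly signed by the sender.
    if not tx.signature_valid:
        return False
    # 2. The sender must actually have the funds (no stealing, no overdrafts).
    if balances.get(tx.sender, 0.0) < tx.amount + tx.gas_fee:
        return False
    # 3. Apply the state change.
    balances[tx.sender] -= tx.amount + tx.gas_fee
    balances[tx.recipient] = balances.get(tx.recipient, 0.0) + tx.amount
    return True

# Every full node replays every transaction in the block; if any check fails,
# the block is rejected, no matter who produced it.
block = [Tx("alice", "bob", 2.0, 0.01, True),
         Tx("bob", "alice", 100.0, 0.01, True)]  # invalid: bob can't afford this
print([validate_and_apply(tx) for tx in block])  # [True, False]
```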

It is precisely because of this mechanism that existing blockchains struggle to scale: a blockchain cannot process more transactions per second than its full nodes can verify, unless hardware requirements go up (better hardware lets nodes verify more transactions per second, which in turn allows bigger blocks). But if the hardware needed to run a full node keeps increasing, fewer people will run full nodes, decentralization suffers, and it becomes harder to make sure miners and validators follow the rules.
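
A back-of-the-envelope way to see this trade-off (the numbers here are made up for illustration, not any chain's real parameters):

```python
# Hypothetical numbers, purely to illustrate the trade-off.
tx_size_bytes = 250  # average bytes a node must download and verify per transaction

def max_tps(node_verify_bytes_per_s: float) -> float:
    """Max transactions per second if every full node must download
    and verify every transaction itself."""
    return node_verify_bytes_per_s / tx_size_bytes

# A chain that wants hobbyist-grade full nodes (~1 MB/s of spare capacity)
# is capped far below one that assumes data-center nodes (~100 MB/s) --
# but the latter prices most people out of verifying the chain at all.
for capacity in (1_000_000, 100_000_000):
    print(f"{capacity / 1e6:>5.0f} MB/s node -> ~{max_tps(capacity):,.0f} TPS cap")
```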

为什么「数据可用性」对区块链扩容来说至关重要?

Data availability is one of the main reasons why you can't have scalability, security, and decentralization all at the same time.

This process also highlights the importance of data availability in existing monolithic blockchains: block producers (miners/validators) must publish and make available the transaction data for the blocks they produce, so that full nodes can check their work. If block producers do not make this data available, full nodes cannot verify their blocks, and there is no way to ensure they are following the blockchain's rules.

Now that we understand why data availability is so important in traditional monolithic blockchains, let's look at how data availability matters for our favorite scaling solution: rollups.

Why is data availability important for rollups?

First, let's recap how rollups solve the scalability problem. Instead of raising the hardware requirements for running a full node, why not reduce the number of transactions that full nodes have to verify? We can move transaction execution off of full nodes and onto a small number of more powerful computers (known as sequencers).

But doesn't that mean we have to trust the sequencer? If we want to keep the hardware requirements for full nodes low, they will necessarily be slower than the sequencer at verifying transactions. So how do we make sure a new block produced by the sequencer is valid (i.e. make sure the sequencer isn't stealing everyone's money)? This question comes up so often that you probably already know the answer, but bear with me for a quick review of the two approaches below:

With optimistic rollups, you rely on fraud proofs to keep the sequencer honest (unless someone submits a fraud proof showing that a batch published by the sequencer is invalid, the sequencer's batches are optimistically assumed to be valid). But if you want anyone to be able to compute a fraud proof, they need access to the transaction data the sequencer executed in order to construct that proof. In other words, the sequencer must make the transaction data available - otherwise nobody can prove fraud, and the optimistic rollup cannot be trusted.

With ZK rollups, keeping the sequencer honest is more elegant. Whenever the sequencer executes a batch of transactions, it must also produce a validity proof (a ZK-SNARK or ZK-STARK), which guarantees that the batch contains no invalid or fraudulent transactions, and anyone (even a smart contract) can easily verify that proof. But data availability still matters a great deal for a ZK rollup's sequencer: as a rollup user, if you want to ape into the next hot shitcoin, you need to know your balance on the rollup - and without the transaction data, you cannot know your balance or use the rollup at all.

From the above, you can see why people are so excited about rollups: since full nodes no longer have to keep up with the sequencer, the sequencer can be a very powerful machine. That lets it process far more transactions per second, which drives down gas fees, and everyone is happy. However, the sequencer still has to make its transaction data available. This means that even if the sequencer were a literal supercomputer, the number of transactions per second it can process is still capped by the data throughput of the underlying data availability layer or data availability solution it publishes to.
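
To see why the data availability layer, rather than the sequencer's raw horsepower, becomes the bottleneck, here is a tiny hypothetical calculation (all numbers invented for illustration):

```python
# Invented numbers for illustration only.
da_throughput_bytes_per_s = 1_000_000  # data the DA layer can publish per second
bytes_per_rollup_tx = 12               # a compressed rollup transaction is tiny
sequencer_exec_tps = 1_000_000         # raw execution speed of a "supercomputer" sequencer

# No matter how fast the sequencer executes, it can only post this much data:
da_limited_tps = da_throughput_bytes_per_s / bytes_per_rollup_tx

print(f"Sequencer could execute {sequencer_exec_tps:,} TPS,")
print(f"but the DA layer only lets it publish ~{da_limited_tps:,.0f} TPS.")
print(f"Effective rollup throughput: {min(sequencer_exec_tps, da_limited_tps):,.0f} TPS")
```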

In short, if the data availability layer or solution a rollup uses cannot keep up with the amount of data the rollup's sequencer wants to dump onto it, then no matter how many more transactions the sequencer (and therefore the rollup) wants to process, it simply can't - and Ethereum gas fees may well keep climbing in the meantime.

This is why data availability matters so much: guaranteeing data availability is what keeps rollup sequencers honest, and once rollups want to max out their transaction throughput, maximizing the data throughput of the data availability layer or solution becomes just as important.

But, as you may have noticed, I still haven't fully answered the question of how we keep the sequencer honest. Since the main chain's full nodes no longer need to keep up with the sequencer's execution speed, the sequencer could simply withhold a large chunk of the transaction data. The question, then, is how the main chain's full nodes force the sequencer to dump its data onto the data availability layer. If they can't, we haven't really achieved scalability at all, because we would either have to trust the sequencer or buy supercomputers ourselves.

This problem is known as the "data availability problem".

Solutions to "data problems"

The simplest solution to the data availability problem is to force full nodes to download all of the data the sequencer dumps onto the data availability layer or solution. But we already know this doesn't help us: it would require full nodes to keep up with the sequencer, which raises hardware requirements and hurts decentralization.

So we need a better solution to this problem - and luckily, we have one.

Data availability proofs

Every time the sequencer dumps a new block of transaction data, full nodes can "sample" that data via data availability proofs, checking that the sequencer really has made the data available.

How data availability proofs work involves a fair amount of math and technical detail, but I'll try to explain it briefly (hat tip to John Adler).

First, the block of transaction data the sequencer dumps must be erasure coded. This means the data is doubled in size, and the extra, new data encodes the original data redundantly (this is what an erasure code does). Once the data is erasure coded, any 50% of the erasure-coded data is enough to recover the entire original block.
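
If you want to see that "any 50% recovers everything" property in action, here is a minimal Python sketch of a Reed-Solomon-style erasure code - the same family of codes used in practice, though real systems use heavily optimized libraries and 2D extensions; the tiny prime field and chunk sizes below are purely illustrative.

```python
# Toy Reed-Solomon-style erasure code over a prime field.
P = 257  # small prime field; each chunk is a single value < 257

def lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial passing through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def erasure_encode(data):
    """Extend k data chunks to 2k coded chunks; any k of them recover the data."""
    k = len(data)
    base = list(enumerate(data))                 # data = polynomial values at x = 0..k-1
    return [lagrange_eval(base, x) for x in range(2 * k)]

def erasure_decode(available, k):
    """Recover the original k chunks from any k surviving (index, value) pairs."""
    return [lagrange_eval(available[:k], x) for x in range(k)]

data = [10, 20, 30, 40]                          # original block (k = 4 chunks)
coded = erasure_encode(data)                     # 8 coded chunks (2x the size)
survivors = [(1, coded[1]), (3, coded[3]), (6, coded[6]), (7, coded[7])]
print(erasure_decode(survivors, len(data)))      # -> [10, 20, 30, 40]
```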

(Erasure coding is, incidentally, the same kind of technique that lets you keep trash-talking your annoying cousin and his friends in Fortnite even when part of the data gets lost along the way.)

Crucially, once a block of transaction data is erasure coded, the sequencer must withhold more than 50% of the block's data in order to hide any part of it. Without erasure coding, the sequencer could misbehave while withholding as little as 1% of the data. Erasure coding therefore makes it far easier for full nodes to verify that the sequencer has made the data available.

Knowing this, what we want is the strongest possible assurance that the sequencer has published all of the data. The brute-force way to be sure is to download the entire block of transaction data, but in practice that defeats the purpose. Instead, a full node can randomly download a few chunks of the block. If the sequencer is withholding data, it must be hiding at least 50% of the erasure-coded block, so each random chunk a full node requests has no better than a 50% chance of actually being served.

That also means each additional sample makes it harder for the sequencer to cheat: by pulling a second random chunk, a full node cuts the chance of being fooled to below 25%. In fact, after 7 random samples, a full node has less than a 1% chance of failing to catch a sequencer that is withholding data.
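
The arithmetic behind those numbers is simple: a sequencer hiding data can serve at most half of the erasure-coded chunks, so each extra random sample halves its chance of slipping by unnoticed. A quick sketch:

```python
# Probability that a data-withholding sequencer survives n random samples.
# It must hide >= 50% of the erasure-coded chunks, so each sample succeeds
# with probability at most 0.5.
for n in range(1, 11):
    p_fooled = 0.5 ** n
    print(f"{n:>2} samples: chance of being fooled <= {p_fooled:.4%}")
# 2 samples -> <= 25%, 7 samples -> <= 0.78% (under 1%), 10 samples -> <= 0.098%
```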

This process is called data availability sampling (or simply data sampling). It is remarkably powerful: by downloading only a tiny fraction of the data the sequencer dumps, full nodes get nearly the same guarantee as if they had downloaded and checked the entire block (they verify the sampled chunks against the Merkle root committed on the main chain). To give you a feel for how powerful that is, imagine being able to burn as many calories as a 10-mile run with just 10 minutes of exercise.
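
As for how a sampling node knows the chunk it just downloaded really belongs to the block being checked: each chunk comes with a Merkle proof against the data root committed on the main chain. Below is a simplified, hypothetical sketch of that check (a plain SHA-256 binary Merkle tree; production designs differ in the details).

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Build a Merkle root over the block's data chunks."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                       # duplicate last node if odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect the sibling hashes proving one chunk's inclusion."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], index % 2 == 0))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_sample(chunk, proof, root):
    """What a sampling node does for each randomly chosen chunk."""
    node = h(chunk)
    for sibling, chunk_is_left in proof:
        node = h(node + sibling) if chunk_is_left else h(sibling + node)
    return node == root

chunks = [bytes([i]) * 32 for i in range(8)]     # 8 erasure-coded chunks
root = merkle_root(chunks)                       # this root is posted on-chain
print(verify_sample(chunks[5], merkle_proof(chunks, 5), root))  # True
```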

If the main chain's full nodes can perform data availability sampling on the data the rollup sequencer dumps, the sequencer is prevented from misbehaving, we can all rest assured that rollups will scale our favorite blockchains, and everyone should be happy. But before you close this page, remember that we still need a way to scale the data availability layer itself. If you want everyone in the world to be onboarded onto the blockchain, throughput has to keep growing; and if you want to scale the blockchain with rollups, you have to keep the sequencer honest, which requires the sequencer to publish its transaction data - so the throughput of the data availability layer has to scale too.

Data availability proofs are also crucial for scaling the data availability layer

One layer 1 that has recently committed to a data-availability-focused scaling approach is Ethereum. Ethereum plans to scale its data availability through data sharding, which means that not all validators will keep downloading the same transaction data as they do today (validators run full nodes). Instead, Ethereum will essentially split its validator set into several partitions, known as "shards". Say you have 1,000 validators all storing the same data: splitting them into 4 groups of 250 validators each roughly quadruples the available data space. Sounds simple enough, right?
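
The back-of-the-envelope math behind that "4x" claim, using the same hypothetical numbers as above:

```python
# Same hypothetical setup as in the text above.
validators = 1000
data_per_validator_gb = 100  # invented: data capacity each validator devotes

# Today: every validator stores the SAME data, so the network holds one copy.
unsharded_data_space = data_per_validator_gb

# Data sharding: 4 shards of 250 validators, each shard storing DIFFERENT data,
# so the network as a whole holds four times as much distinct data.
shards = 4
sharded_data_space = shards * data_per_validator_gb

print(f"{unsharded_data_space} GB -> {sharded_data_space} GB of data space ({shards}x)")
```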

Ethereum plans to implement 64 data shards in the "near-term" phase of its data sharding roadmap.

The problem, however, is that the validators of any one shard only download and store the transaction data dumped onto their own shard. This means a shard's validators cannot guarantee that all the data published by the sequencer is available - they can only vouch for the data on their own shard, not for data published to other shards.

This raises a question: if the validators of one shard have no idea what is happening on other shards, how can they tell whether sequencers publishing to those shards are behaving honestly? The answer, again, is data availability proofs: the validators of each shard perform data availability sampling on every other shard. In this way, data availability is guaranteed across all shards, and Ethereum's sharded data stays secure.

Other chains, such as Celestia and Polygon Avail, also want to scale data availability. Unlike Ethereum, however, Celestia and Polygon Avail are blockchains that only handle block ordering and data availability. In other words, to keep a Celestia or Polygon Avail sequencer honest, all that matters is that the data is properly made available and correctly ordered. These chains don't perform any other tasks (such as execution or settlement), so their nodes don't need to execute transactions to verify blocks. Instead, light nodes that perform data availability sampling get essentially the same data availability guarantees as full nodes. And because enough sampling light nodes can verify that a block's data is available, these chains can safely increase block sizes - and with them data throughput - without raising the hardware requirements for verifying the chain.

Let's wrap up. The data availability problem sits at the heart of the blockchain trilemma and affects every one of our scaling efforts. Fortunately, we can solve it with a critical piece of technology: data availability proofs. They let us scale the data availability layer, so that rollups can cheaply dump large amounts of transaction data and handle transaction throughput for the whole world, while their sequencers stay honest without anyone having to trust them. We hope this article has helped you understand why data availability is so important for blockchain scaling.
