A SECURE SHARDING PROTOCOL FOR OPEN BLOCKCHAINS

1 A SECURE SHARDING PROTOCOL FOR OPEN BLOCKCHAINS
Loi Luu, Viswesh Narayanan, Kunal Baweja, Chaodong Zheng, Seth Gilbert, Prateek Saxena National University of Singapore

2 Blockchains Satoshi Nakamoto, 2009 Blockchain agreement protocol
Public database: shared and maintained by all network participants. Blockchain agreement protocol: participants periodically agree on a new block of data. So what is a blockchain? The blockchain was first introduced in 2009 by an author under the pseudonym Satoshi Nakamoto. You can view a blockchain as a public database with a special data structure: one block of data after another. This public database is shared and maintained by all network participants, or nodes. Since its introduction, the blockchain has been used in many popular cryptocurrencies, for example Bitcoin and Ethereum. Nodes in these cryptocurrencies run a blockchain agreement protocol to periodically agree on a new block of data, which is then appended to the blockchain.

3 The Blockchain Agreement Problem
Set of N computational nodes. No inherent identity, no PKI. Agree on a block that includes a set of new transactions (or data). Two properties: Agreement; Validity: all transactions satisfy some validity conditions. Basically, the blockchain agreement protocol solves the blockchain agreement problem: we have a set of N computational nodes in an open network. "Open" means that nodes have no inherent identities and there is no PKI. These nodes want to agree on a block in every epoch, which includes a set of new transactions, or data. The agreed blocks must satisfy two important properties. First, agreement: all honest nodes agree on the same block. Second, validity: all transactions must satisfy some validity conditions; for example, in cryptocurrencies a valid block must not contain double-spending transactions.

4 Demand from Practice: 1,200 - 50,000 TXs/s
Scalability issue: 1 MB block per 10 mins, 3-7 TXs per second; support for only limited computations; several DoS attacks recently. Demand from practice: 1,200 - 50,000 TXs/s. Currently these cryptocurrencies have a scalability issue. For example Bitcoin, a decentralized peer-to-peer payment system, can process only up to 7 transactions per second. Similarly Ethereum, which allows users to run decentralized applications on the blockchain, supports only a limited amount of computation; moreover, the Ethereum network has suffered several DoS attacks recently. If we look at the demand from practice, centralized payment processors like PayPal, Visa and MasterCard can process up to 50,000 transactions per second, four to five orders of magnitude more than Bitcoin's transaction rate. Thus, there is a huge gap between what these cryptocurrencies can support and what we need.

5 Existing protocols are not scalable
BFT protocols: known static identity set, O(N^2) messages, high TX rate only in small networks. Nakamoto consensus: constant TX rate; re-parameterization is not a long-term solution. The main reason for the issue is that we do not yet have any scalable blockchain protocol. Consider the graph of transaction rate versus the number of nodes in the network. On one hand, we have the Nakamoto consensus protocol, used in many popular cryptocurrencies. Nakamoto consensus works efficiently in large open networks, but it only supports a constant transaction rate, regardless of how many nodes are in the network. Recently many people have proposed solutions like increasing the block size or reducing the block time, but these are not long-term solutions and offer at most one order of magnitude of scalability. On the other hand, we have a line of Byzantine agreement, or BFT, protocols, which have been studied for decades. BFT protocols allow a set of known, static nodes to reach agreement on some value, and many previous works have proposed applying BFT protocols to build blockchain-based applications like cryptocurrencies. However, BFT protocols require a quadratic number of messages, hence their performance worsens as the network grows. So BFT protocols only work in a setting where everyone knows everyone and the network is small. Thus, in this work we ask whether we can design a blockchain protocol that scales up the transaction rate simply by adding more nodes to the network.

6 Contributions
Elastico protocol: near-linear computational scalability, tolerates up to 25% adversary, secure sharding in open networks. Evaluation: 1600 nodes on Amazon EC2 confirm near-linear scalability. Our contributions are twofold. First, we introduce Elastico, which provides near-linear computational scalability: if we double the CPUs in the network, we double the transaction rate. Elastico tolerates up to 25% of the computation power being adversarial, and is the first candidate for a secure sharding protocol for blockchains in open networks. Second, we implemented a prototype of Elastico and evaluated it on Amazon EC2, running up to 1600 nodes to confirm the scalability property.

7 Problem Statement & Assumptions
Agree on O(N) blocks per epoch while costs per node stay constant. Assumptions: synchronous network (bounded delay from a node to all other honest nodes); at most 1/4 of the computation power is controlled by the adversary; an estimate of the number N of CPU nodes; nodes have equal computation power. Non-assumptions: PKI; shared random coin. Formally, the problem we solve is to allow the network participants to agree on O(N) blocks per protocol run. Each block includes the same amount of transactions, so the transaction rate increases with the size of the network. Most importantly, this is done under the condition that the costs per node stay almost constant: we do not blow up the bandwidth or computation cost per node when scaling up. We solve this problem under some realistic assumptions. First, the network is synchronous: there is a known bounded delay when a node sends a message to the other honest nodes. Second, at most 1/4 of the computation power in the network is controlled by the adversary. These are practical assumptions: Bitcoin relies on them, and many previous works assume the same. Additionally, we assume we can roughly estimate the number of CPU nodes in the network and that all nodes have equal computation power. Note that we do not assume any PKI, nor any shared random coin, since these are impractical assumptions in open networks.

8 Concept of Sharding: More Nodes, More Transaction Blocks
At a high level, Elastico achieves scalability by partitioning the large network into smaller committees; each committee runs a classical (less scalable) agreement protocol to propose a block containing a different set of transactions. So, if we have more nodes, more committees are created and more transaction blocks are generated. This technique is known as sharding and has been used in distributed databases with centralized operators. What we show here is how to apply sharding in decentralized, open environments.

9 Sharding: A Straw-man Solution
Assumptions (for illustration only): a known list of nodes joining simultaneously; a common random coin. Committee assignment: H(Coin || PK). To introduce how sharding works, let us consider a straw-man solution to our problem. For illustration purposes, it makes two impractical assumptions that we do not make in our setting: a set of known nodes join the network simultaneously before the protocol starts, and a common random coin exists. With these assumptions, sharding is straightforward. To do the partitioning, each node computes the hash of its public key and the random coin to learn which committee it belongs to, then broadcasts the value to the network so that members of the same committee know each other. After that, each committee runs a classical BFT protocol to generate its separate transaction block, and all blocks are broadcast.
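A minimal Python sketch of the straw-man committee assignment, assuming SHA-256 as the hash and a simple modulo mapping of the digest to a committee index (both choices are illustrative, not fixed by the slides):

import hashlib

def strawman_committee(pubkey: bytes, common_coin: bytes, num_committees: int) -> int:
    # Straw-man sharding: hash the common random coin together with the
    # node's public key and map the digest to one of the committees.
    digest = hashlib.sha256(common_coin + pubkey).digest()
    return int.from_bytes(digest, "big") % num_committees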

10 Analysis of The Straw-man Solution
Costs: O(N^2) messages to broadcast the hash values (not scalable!); O(C^2) messages for the BFT protocols; downloading and verifying O(N) blocks (not scalable!). Let us analyze the costs of the straw-man protocol. First, the cost to broadcast all hash values is O(N^2) messages, which is not scalable, since N can be tens of thousands or even a million. The cost to run a BFT protocol in each committee is O(C^2), where C is the committee size, which is acceptable. However, after generating all blocks, if we still broadcast all blocks to the network and every node still verifies every block, we are no better than simply increasing the block size. This shows that even with these impractical assumptions, the straw-man protocol is not efficient; hence designing a secure and efficient sharding protocol is challenging.

11 Our Solution: Elastico
Use PoW to establish identities. Reduce the committee-assignment cost to O(NC) messages. Generate a set of different random numbers. Reconcile results without broadcasting and verifying all data blocks. In our solution, Elastico, we address all these performance and security challenges. First, we use PoW to allow nodes to establish their identities. Second, we reduce the cost of committee assignment from O(N^2) to only O(NC) messages, where C is the committee size. We also generate a set of different random numbers to avoid the common-random-coin assumption, and we have a mechanism to reconcile the results without broadcasting and verifying all data blocks.

12 I will next talk in detail about how we do it in each step
ELASTICO in DETAIL

13 Step 1: Identity establishment
Solve PoW: ID = H(EpochRandomness, IP, Pubkey, Nonce) < D. EpochRandomness is the random seed for the PoW; IP and Pubkey are for communication later; D is the difficulty. In step 1, nodes establish their identities by solving PoW puzzles. PoW limits the number of identities a node can create to the amount of computation it has. The node searches for a nonce value such that the output of the hash function satisfies the condition above. The hash function takes several inputs. One is the EpochRandomness, which initializes the hash function; I will discuss how we generate this randomness later, but for now assume it exists. The node also puts its IP and public key into the hash, which are used for communication later. D is a global parameter that determines how much work a node has to do to establish its identity. The output of this step is a list of (ID, IP, Pubkey) entries, e.g. (000011…01, IP, 0x01aaff..), (000010…10, IP, 0x02adff..), and so on.
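A minimal sketch of the identity-establishment loop, using SHA-256 and a leading-zero-bytes check as an illustrative stand-in for the "< D" difficulty comparison (the parameter below is not the paper's):

import hashlib

DIFFICULTY_ZERO_BYTES = 2  # illustrative difficulty target, not the paper's D

def establish_identity(epoch_randomness: bytes, ip: bytes, pubkey: bytes):
    # Step 1 sketch: search for a nonce such that
    # H(epoch_randomness || ip || pubkey || nonce) falls below the difficulty
    # target. The digest then serves as the node's identity for this epoch.
    nonce = 0
    while True:
        digest = hashlib.sha256(
            epoch_randomness + ip + pubkey + nonce.to_bytes(8, "big")
        ).digest()
        if digest[:DIFFICULTY_ZERO_BYTES] == b"\x00" * DIFFICULTY_ZERO_BYTES:
            return digest, nonce  # identity ID and the proof-of-work nonce
        nonce += 1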

14 Step 2: Assigning committees
Goals: fairly distribute identities to committees; guarantee at most 1/3 malicious members per committee (needed for PBFT). Use the last k bits of the ID: 2^k committees. Nodes finish solving PoW once each committee has at least C members. In step 2, we distribute the identities from step 1 to committees. We want to assign committees uniformly at random, so that with high probability at most 1/3 of each committee's members are malicious; this is the condition for running any practical Byzantine agreement protocol. Our solution is to use the last k bits of a node's ID to determine its committee number, assuming there are 2^k committees in the network. For example, all nodes whose IDs end in 00 belong to committee 1, and similarly for nodes whose IDs end in 01. The network finishes step 1, i.e. stops generating new identities, when each committee has at least C members. However, knowing when this condition holds is tricky.
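A one-line sketch of the last-k-bits committee assignment (function name is illustrative):

def committee_id(identity: bytes, k: int) -> int:
    # Step 2 sketch: the last k bits of the PoW identity select one of the
    # 2**k committees.
    return int.from_bytes(identity, "big") & ((1 << k) - 1)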

15 When does each committee have at least C members?
Naïve solution: broadcast all identities. O(N^2) messages; doesn't scale. A naïve solution is to broadcast all identities to the network. This is costly because it requires O(N^2) messages, which is not scalable.

16 Solution: Use directory committee
The first C identities become directory servers; nodes can have different sets of C directories. Later nodes send their IDs to the directories; once every committee has >= C members, the directories send the committee list to its members. Number of messages: O(NC). Our solution is to use a directory committee. The first C identities in the network become the directory servers, which help the network keep track of who belongs to which committee. Note that nodes do not have to agree on which C nodes are the directory servers. Later nodes, once they find a valid PoW solution, send their IDs to these directory servers. The directories check whether every committee has at least C members; if so, they send the list of members of each committee to the nodes in that committee. This process takes only O(NC) messages.
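A sketch of the directory-server bookkeeping, reusing the last-k-bits assignment from step 2 (class and method names are illustrative, not the paper's implementation):

from collections import defaultdict

class DirectoryServer:
    # One of the first C identities. It collects the identities announced by
    # later nodes and, once every committee has at least C members, can send
    # each node its committee membership list.

    def __init__(self, committee_size: int, k: int):
        self.c = committee_size
        self.k = k
        self.members = defaultdict(list)  # committee id -> list of identities

    def register(self, identity: bytes) -> None:
        # Committee id = last k bits of the identity, as in step 2.
        cid = int.from_bytes(identity, "big") & ((1 << self.k) - 1)
        self.members[cid].append(identity)

    def all_committees_full(self) -> bool:
        return (len(self.members) == 2 ** self.k and
                all(len(m) >= self.c for m in self.members.values()))

    def committee_list(self, identity: bytes) -> list:
        cid = int.from_bytes(identity, "big") & ((1 << self.k) - 1)
        return self.members[cid]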

17 Security guarantees Honest nodes in the same committee know each other
Honest nodes in the same committee know each other, although their views may include different malicious nodes. Less than 1/3 of the members are malicious in the union of all committee members' views. The whole process is a bit more involved; you can find the details in the paper. We provide two security guarantees. First, all honest nodes in a committee know each other, although they may have different malicious nodes in their committee lists. Second, in each committee, with high probability, at most 1/3 of the committee members are malicious, which is the condition needed to run any Byzantine agreement protocol.

18 Step 3: Propose a block within a committee
Run a classical Byzantine agreement protocol: members agree on and sign one valid data block. Number of messages ≈ O(C^2). Valid data blocks have 2C/3 + 1 signatures. With these security guarantees, step 3 is straightforward. Each committee runs an instance of a Byzantine agreement protocol; committee members agree on and sign a single data block of transactions. The block header includes a commitment to the data block and at least 2C/3 + 1 signatures from the committee. The remaining question is how to reconcile all data blocks efficiently.
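A sketch of how any node can check that a data-block header is backed by its committee, assuming a header object with a digest and a list of (pubkey, signature) pairs and a generic verify_sig callback (all of these names are assumptions for illustration):

def header_is_final(header, committee_pubkeys, verify_sig) -> bool:
    # Step 3 sketch: a data-block header is accepted if it carries valid
    # signatures from at least 2C/3 + 1 members of the proposing committee.
    c = len(committee_pubkeys)
    valid = sum(1 for pk, sig in header.signatures
                if pk in committee_pubkeys and verify_sig(pk, header.digest, sig))
    return valid >= (2 * c) // 3 + 1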

19 Step 4: Final committee unions all blocks
Run a BFT protocol to produce the Final Block; broadcast the Final Block to everyone; data committees only send block headers. In Elastico, we introduce a final committee that unions all data blocks into a single final block. The final committee is selected in the same way as the other committees; specifically, all nodes whose IDs end in 00 belong to the final committee. The data committees send only their block headers to the final committee. Sending the header is enough because the final committee can check the signatures included in the header and be guaranteed that the transactions in the data block are valid, without downloading the data blocks from the other committees. The final committee then runs a Byzantine agreement protocol to produce a final block, which is broadcast to everyone.
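A sketch of the final committee's merging step, with the 2C/3 + 1 signature check from the previous sketch inlined; the real protocol then runs PBFT to agree on the result, which is omitted here (all names are illustrative):

import hashlib

def build_final_block(headers, committees, verify_sig) -> bytes:
    # Step 4 sketch: collect one signed header per data committee, check that
    # each header carries at least 2C/3 + 1 valid member signatures, and merge
    # the headers into a single digest as the candidate final block.
    for cid, header in headers.items():
        members = committees[cid]
        valid = sum(1 for pk, sig in header.signatures
                    if pk in members and verify_sig(pk, header.digest, sig))
        if valid < (2 * len(members)) // 3 + 1:
            raise ValueError("header from committee %d lacks 2C/3+1 signatures" % cid)
    ordered = b"".join(headers[cid].digest for cid in sorted(headers))
    return hashlib.sha256(ordered).digest()  # digest the final committee agrees on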

20 Don’t need to verify TXs again
How about the data blocks? It depends on the application. Broadcast when verifying a TX requires the entire history of the blockchain (example: cryptocurrencies). Don't broadcast when verifying a TX is independent of the history of the blockchain (examples: proof-of-publication chains, an Elastico-based cryptocurrency). You may wonder whether we need to broadcast the data blocks to the network. The answer depends on the application. For applications where verifying a transaction requires the entire history of the blockchain, we have to broadcast all data blocks; examples are cryptocurrencies like Bitcoin and Ethereum. Importantly, in Elastico, once you receive a data block you do not have to verify all transactions included in it; you only need to check that the data block is included in the final block, which reduces the verification cost significantly compared to other protocols. On the other hand, we do not have to broadcast data blocks at all in applications where verifying a transaction does not depend on previous data on the blockchain. Examples are proof-of-publication chains, where one wants to prove to others that some data was published or submitted to the blockchain at a specific time in the past. Our team is also working on a new cryptocurrency based on Elastico that does not require broadcasting data blocks; we plan to release our results soon. Don't need to verify TXs again.

21 Step 5: Generate epoch randomness
H(EpochRandomness, IP, Pubkey, Nonce) < D. Goals: generate fresh randomness for the next epoch; the adversary cannot control, predict or pre-compute H(.). Common approach: use the final block hash. Problem: the adversary can predict the final block early! Recall that in step 1 we use the epoch randomness to initialize the hash function. In step 5 the network generates this epoch randomness. We want to generate the randomness afresh to prevent the adversary from controlling, predicting or pre-computing the hash function and thereby gaining an advantage over others. The common approach in many previous works is to use the final block hash. However, this is problematic: the final committee takes some time to generate the final block, so the adversary can predict the final block early.

22 Solution: Generate Multiple Random Numbers
Agreement on a single random number is hard! Our approach: allow different random numbers to be picked by nodes, as long as they are unbiased and verifiably random. Pick: each final committee member picks a random Ri. Commit: include H(Ri) in the final block. Release: broadcast Ri with the final block. Generate: nodes XOR any set of 2C/3 Ri values to get their randomness. Our observation is that agreeing on a single random number is hard, so we gave up on that. Instead, our approach is to allow nodes to have different random numbers, as long as they can show that the numbers are unbiased and verifiably random. Specifically, each final committee member picks a random number Ri before starting the Byzantine agreement, and includes the commitment of Ri, i.e. the hash value H(Ri), in the final block. When the final block is broadcast, the Ri values are broadcast as well. Other nodes, when they start the new epoch, select a set of 2C/3 Ri values to generate their own random number.
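A sketch of the pick/commit/release/generate steps, using SHA-256 commitments, 32-byte random values and a plain XOR of the chosen reveals (the sizes and helper names are illustrative; the paper additionally sizes Ri to bound the bias):

import hashlib
import secrets
from functools import reduce

R_BYTES = 32  # illustrative size of each Ri

def pick_and_commit():
    # Each final-committee member picks a random Ri and commits H(Ri)
    # into the final block before Ri itself is released.
    r = secrets.token_bytes(R_BYTES)
    return r, hashlib.sha256(r).digest()

def next_epoch_randomness(revealed, commitments, c: int) -> bytes:
    # A node keeps the revealed Ri whose hashes match commitments in the
    # final block and XORs any 2C/3 of them to get its epoch randomness.
    needed = (2 * c) // 3
    valid = [r for r in revealed if hashlib.sha256(r).digest() in commitments]
    if len(valid) < needed:
        raise ValueError("not enough valid reveals to form the randomness")
    chosen = valid[:needed]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chosen)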

23 Security Guarantee The randomness has bounded bias
The adversary can control at most 1/3 of the Ri values, so choosing the size of Ri properly bounds the bias. Verifiable randomness: when a node sends its PoW solution, it says which Ri values were used, so others can verify that the randomness was correctly constructed. We provide two security guarantees here. First, every random number has bounded bias, because the adversary controls at most 1/3 of the Ri values; by choosing the size of Ri properly, we bound the bias. Second, all the random numbers are verifiable: when a node sends its PoW solution, it indicates which Ri values were used to generate its randomness, so other nodes can verify that the random number was generated correctly.

24 Evaluation Goals Implementation Setup Scalability property of Elastico
Goals: evaluate the scalability property of Elastico. Implementation: 4000 C++ LoC on top of Bitcoin, reusing Bitcoin's network implementation. Setup: Amazon EC2, 2 nodes per c4.large instance, up to 1600 nodes (800 instances) in the Oregon and California regions. For the evaluation, we want to assess the scalability of Elastico in an open network. We implemented a prototype of Elastico on top of the Bitcoin code base and ran it on Amazon EC2, using up to 800 EC2 instances to run up to 1600 nodes.

25 Elastico: Scalability
Committee size: C = 100. Network size: N = 100 to 1600. The epoch time stays almost constant, giving near-linear scalability. For the setup, we chose the committee size as 100; we explain why in the paper, but it is basically due to PBFT's performance in an open network: a set of experiments showed that in our testing environment C should be no more than 100. We varied the network size from 100 to 1600, producing from 1 to 16 data blocks per epoch. We first report the epoch time to generate all these blocks for each network size. The grey bar is the time taken for nodes to establish their identities and form all committees; the black bar is the time taken to run the consensus protocol within committees. There is a slight increase in the time for committee formation, which we explain in the paper; it is essentially a balls-and-bins effect. Because of this increase in the epoch time, Elastico achieves only near-linear scalability. The red line in the graph is the block-rate improvement achieved by Elastico.

26 Elastico: Bandwidth & Messages
Results: the per-node costs reduce slightly as more nodes join the network, because one committee runs as both a data committee and the final committee. In terms of the number of messages and bandwidth per node: the left axis of the graph shows the number of messages, the right axis the bandwidth consumed per node. Both costs reduce as we add more nodes to the network; ideally the costs should stay almost constant. This reduction is purely due to our experimental setup, where one committee runs as both a data committee and the final committee, so with more committees in the network the cost of running the final committee is spread over more committees.

27 Conclusion
Elastico: a computationally scalable blockchain protocol, achieved by sharding securely in open networks. More computation power means a higher transaction rate. Evaluated in a real network with up to 1600 nodes. In conclusion, we introduced Elastico, which scales the transaction rate almost linearly with the computation power in the network. Our solution is based on secure sharding in open networks. We evaluated our implementation in a real network by running up to 1600 nodes.

28 Q&A
Loi Luu, loiluu@comp.nus.edu.sg

29 Related work: Bitcoin-NG & GHOST, Lightning Network, Sidechains
Bitcoin-NG & GHOST: allow more blocks, but do not separate the consensus plane from the data plane. Lightning Network: allows more micro-transactions, but does not solve the scalability problem. Sidechains: good for experimenting with new blockchains, but do not make Bitcoin scalable.

30 Step 3: TX sets in data blocks are disjoint
Shard by the TX's ID; in a cryptocurrency, by the TX's inputs. Each data block (Data Block 1 through Data Block 4 in the slide) holds a disjoint set of transactions; a sketch of this assignment follows below.
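A sketch of sharding transactions by ID so that the TX sets of different data blocks are disjoint; here the last k bits of the transaction ID select the data committee, mirroring the node-to-committee assignment (this specific mapping is an illustrative assumption, not stated on the slide):

def data_committee_for_tx(txid: bytes, k: int) -> int:
    # Transactions are partitioned by their ID (in a cryptocurrency, by their
    # inputs), so each data block holds a disjoint set of transactions.
    return int.from_bytes(txid, "big") & ((1 << k) - 1)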

31 PBFT Performance: Bandwidth
Each node connects to 4 peers. Bandwidth and number of messages increase linearly.

32 PBFT Performance: Latency
Latency increases linearly and quickly hits the constraint. Practical committee size: C <= 100.

