Download presentation
Presentation is loading. Please wait.
Published byBrett Briggs Modified over 9 years ago
1
BFTCloud: A Byzantine Fault Tolerance Framework for Voluntary-Resource Cloud Computing Yilei Zhang, Zibin Zheng, and Michael R. Lyu {ylzhang,zbzheng,lyu}@cse.cuhk.edu.hk Department of Computer Science & Engineering The Chinese University of Hong Kong Hong Kong, China School of Computer Science National University of Defence Technology Changsha, China CLOUD 2011, Washington DC, USA, July 4 - 9, 2011
2
Outline Introduction System Architecture System Design Experiments Conclusion 2
3
Cloud Computing Cloud computing provides a model for enabling convenient, on-demand network access to a shared pool of computing resources : Networks Servers Databases Services Voluntary-resource infrastructure which consists of numerous user- contributed computing resources 3
4
Cloud Applications Building on a number of distributed cloud components Large-scale Complicated Time sensitive High-quality Case 1: New York Times Used EC2 and S3 to convert 15 million scanned news articles to PDF (4TB data) 100 Linux computers 24 hours Case 2: Nasdaq Uses S3 to deliver historic stock and fund information Millions of files showing price changes of entities over 10 minute segments 4
5
Reliability of Cloud Applications The reliability of cloud applications is greatly influenced by the reliability of cloud modules Traditional testing has limited improvement on the reliability of a cloud module under voluntary-resource cloud infrastructure: – Computing resources, denoted as nodes in the cloud, are frangible – Communication links between modules are not reliable Our Goal: It is extremely urgent to design a fault tolerance mechanism for handling different faults under voluntary-resource cloud infrastructure 5
6
BFTCloud BFTCloud uses replication techniques for overcoming failures since a broad pool of nodes are available in the cloud BFTCloud guarantees robustness of systems when up to of totally 3 + 1 resource providers are faulty BFTCloud can tolerant different types of failures: – Crash – Network faults, like disconnection – Byzantine faults,like malicious behaviors – Etc… 6
7
System Architecture 7
8
8
9
Work Procedures of BFTCloud 9
10
Primary Selection 10
11
Replica Selection QoS Score: Failure Probability of a BFT group: Replica selection problem: 11
12
Request Execution 1.The cloud module first forms a request sequence and sends the sequence of requests to the primary. 2.The primary will order the requests and forward the ordered requests to all the BFT group members. 3.Each member of the BFT group will execute the sequence of requests and send the corresponding responses back to the cloud module. 4.The cloud module collects all the received responses from the BFT group members and make a judgment on the consistence of responses. 12
13
Consistence Judgment Case 1: The cloud module receives 3 + 1 consist responses from the BFT group. No fault happens in the current BFT group. Case 2: The cloud module receives between 2 +1 to 3 consist responses. Less than + 1 faults happened. Case 3: The cloud module receives less than 2 + 1 response messages. Either the primary is faulty or more than +1 replicas are faulty. Case 4: The cloud module receives more than 2 + 1 responses, but fewer than + 1 responses are consistency. This indicates inconsistent ordering of requests by the primary 13
14
Primary Updating A replica which suspects the primary to be faulty sends an primary election proposal to all the other replicas. If a replica receives + 1 primary election proposals, it indicates that the primary is really faulty. It will send a primary selection request to the cloud module. The cloud module then will start the primary selection phase and return a new primary which is one of the current replicas. 14
15
Replica Updating The failure probability of a BFT group under the condition that a set of replicas are already faulty is: The new BFT group ′, which can tolerate up to ′nodes failure, should satisfy ′ > 0. Therefore, the replica updating problem is reduced to a replication degree decision problem: 15
16
Experimental Setup We have implemented our BFTCloud approach by Java language and deployed it as a middleware in a voluntary-resource cloud environment The cloud infrastructure is constructed by 257 distributed computers located in 26 countries from Planet-lab In our experiments, each node in the cloud is configured with a random malicious failure probability, which denotes the probability of arbitrary behavior happens in the node Each node keeps the QoS information of all the other nodes and updates the information periodically. 16
17
Performance Comparison NoFT: No fault tolerance strategy is employed for task execution in the voluntary-resource cloud. Zyzzyva: A state-of-the-art Byzantine Fault tolerance approach. The cloud module sends requests to a fixed primary and a group of replicas. There is no mechanism designed for adopting the dynamic voluntary-resource cloud environment. BFTCloud: The Byzantine Fault tolerance framework proposed in this paper. The cloud module mask faults and adopt the highly dynamic voluntary-resource environment. BFTRandom: The framework is the same with BFTCloud. However, this approach just randomly selects nodes in primary selection, replica selection, primary updating, and replica updating phases. 17
18
Experimental Results 18
19
Experimental Results 19
20
Conclusion We identify the Byzantine fault tolerance problem in voluntary-resource cloud and propose a Byzantine fault tolerance framework, named BFTCloud, for guaranteeing the robustness of cloud application We have implemented the BFTCloud system and test it on a voluntary-resource cloud We conduct large-scale real-world experiments to study the performance of BFTCloud on reliability improvement compared with other approaches 20
21
Thank you! Email: ylzhang@cse.cuhk.edu.hk 21
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.