Prophecy: Using History for High-Throughput Fault Tolerance Siddhartha Sen Joint work with Wyatt Lloyd and Mike Freedman Princeton University

Non-crash failures happen Model as Byzantine (malicious)

Mask Byzantine faults Clients Service

Mask Byzantine faults Replicated service Clients Throughput

Mask Byzantine faults Replicated service Clients Throughput Linearizability (strong consistency)

Byzantine fault tolerance (BFT) Low throughput Modifies clients Long-lived sessions

Prophecy High throughput + good consistency No free lunch: – Read-mostly workloads – Slightly weakened consistency

Byzantine fault tolerance (BFT) Low throughput Modifies clients Long-lived sessions D-Prophecy Prophecy

Traditional BFT reads Clients Replica Group application Agree?

A cache solution Clients Replica Group application cache Agree?

A cache solution Clients Replica Group application cache Agree? Problems: Huge cache Invalidation

A compact cache Clients Replica Group application cache Requests → Responses: req1 → resp1, req2 → resp2, req3 → resp3

A compact cache Clients Replica Group application cache Requests → Responses: sketch(req1) → sketch(resp1), sketch(req2) → sketch(resp2), sketch(req3) → sketch(resp3)
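To make the compact cache concrete, here is a minimal C++ sketch of the idea. It is illustrative only: the names (Sketch, SketchCache) are hypothetical, and std::hash stands in for the cryptographic digest a real sketcher would use (e.g., SHA-1).

#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>

// Stand-in digest: a real sketcher would use a cryptographic hash
// (e.g., SHA-1); std::hash is used here purely for illustration.
using Sketch = std::size_t;

Sketch sketch(const std::string& bytes) {
    return std::hash<std::string>{}(bytes);
}

// Compact cache: stores only (sketch(request), sketch(response)) pairs,
// so its size is independent of how large requests and responses are.
class SketchCache {
public:
    void update(const std::string& req, const std::string& resp) {
        table_[sketch(req)] = sketch(resp);
    }
    // Cached response sketch for this request, if present.
    std::optional<Sketch> lookup(const std::string& req) const {
        auto it = table_.find(sketch(req));
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::unordered_map<Sketch, Sketch> table_;
};

Storing digests rather than full requests and responses is what keeps the cache compact: its size depends only on the number of entries, not on how large the web pages are, which addresses both problems from the earlier cache solution.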

A sketcher Clients Replica Group application sketcher

A sketcher Clients Replica Group (table: sketch | webpage)

Executing a read Clients Replica Group (table: sketch | webpage) Agree? Fast, load-balanced reads

Executing a read Clients Replica Group (table: sketch | webpage) Agree?

Executing a read Clients Replica Group (table: sketch | webpage) key-value store replicated state machine

Executing a read Clients Replica Group (table: sketch | webpage) Agree? Maintain a fresh cache
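The read flow on these slides can be summarized in code, continuing the sketch above. fetch_from_one_replica() and full_bft_read() are hypothetical stand-ins for the single-replica RPC and the normal agreement path; this is one plausible shape of the logic, not the paper's actual code.

// Hypothetical stand-ins for the single-replica RPC and the agreement path.
std::string fetch_from_one_replica(const std::string& req);
std::string full_bft_read(const std::string& req);

// Fast-read path: a single, load-balanced replica executes the read; the
// response is accepted only if its sketch matches the cached one, and
// otherwise the read falls back to full agreement, whose result also
// refreshes the cache.
std::string read_with_sketcher(SketchCache& cache, const std::string& req) {
    std::string resp = fetch_from_one_replica(req);
    auto cached = cache.lookup(req);
    if (cached && *cached == sketch(resp)) {
        return resp;                          // fast read: no agreement needed
    }
    std::string agreed = full_bft_read(req);  // slow path: replicas agree
    cache.update(req, agreed);                // maintain a fresh cache
    return agreed;
}

On a sketch match the read completes with one replica's work; on a miss or mismatch the system pays for full agreement and uses the agreed-upon response to keep the cache fresh.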

Did we achieve linearizability? NO!

Executing a read Clients Replica Group (table: sketch | webpage)

Executing a read Clients Replica Group (table: sketch | webpage) Agree?

Executing a read Clients Replica Group (table: sketch | webpage) Agree? Fast reads may be stale

Load balancing Clients Replica Group (table: sketch | webpage) Agree? Pr(k stale) = g^k
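For intuition (a worked number, not from the slides): if each fast read is load-balanced to an independently chosen replica and a fraction g of replicas would return a stale answer, then k consecutive fast reads are all stale with probability g^k. For example, with g = 0.1, three fast reads in a row are all stale with probability 0.1^3 = 0.001.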

D-Prophecy vs. BFT Clients Replica Group Traditional BFT: Each replica executes read → linearizability D-Prophecy: One replica executes read → “delay-once” linearizability

Byzantine fault tolerance (BFT) Low throughput Modifies clients Long-lived sessions D-Prophecy Prophecy

Key-exchange overhead 11% 3%

Internet services Clients Replica Group

A proxy solution Sketcher Clients Replica Group Proxy Consolidate sketchers

A proxy solution Sketcher Clients Replica Group Trusted Sketcher must be fail-stop

A proxy solution Sketcher Clients Replica Group Trusted Trust middlebox already Small and simple

Executing a read Clients Replica Group Trusted Sketcher q (table: Req s(q) | Resp s(q) ✓) Fast, load-balanced reads Prophecy

Sketcher Clients Replica Group Trusted (table: Req s(q) | Resp s(q) ✓) Fast reads may be stale

Delay-once linearizability

⟨W, R, W, R, W, R⟩ Delay-once linearizability Read-after-write property

⟨W, R, W, W, R, R, W, R⟩ Delay-once linearizability Read-after-write property

Example application Upload embarrassing photos 1. Remove colleagues from ACL 2. Upload photos 3. (Refresh) Weak may reorder Delay-once preserves order
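In other words, under delay-once linearizability the refresh in step 3 may return a slightly stale page, but the two writes keep their order: the ACL change always takes effect before the photos appear. A weakly consistent store that reorders the writes could briefly expose the photos to the removed colleagues.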

Byzantine fault tolerance (BFT) Low throughput Modifies clients Long-lived sessions D-Prophecy Prophecy

Implementation Modified PBFT – PBFT is stable, complete – Competitive with Zyzzyva et al. C++, Tamer async I/O – Sketcher: ≈2000 LOC – PBFT library: ≈1140 LOC – PBFT client: ≈1000 LOC

Evaluation Prophecy vs. proxied-PBFT – Proxied systems D-Prophecy vs. PBFT – Non-proxied systems

Evaluation Prophecy vs. proxied-PBFT – Proxied systems We will study: – Performance on “null” workloads – Performance with real replicated service – Where system bottlenecks, how to scale

Basic setup Sketcher Clients (100) Replica Group (PBFT) (concurrent)

Fraction of failed fast reads Alexa top sites: < 15%

Small benefit on null reads

Apache webserver setup Sketcher Clients Replica Group

Large benefit on real workload 3.7x 2.0x

Benefit grows with work 94 μs (Apache) Null workloads are misleading!

Benefit grows with work

Single sketcher bottlenecks

Scaling out

Scales linearly with replicas

Summary Prophecy good for Internet services – Fast, load-balanced reads D-Prophecy good for traditional services Prophecy scales linearly while PBFT stays flat Limitations: – Read-mostly workloads (measurement study corroborates) – Delay-once linearizability (useful for many apps)

Thank You

Additional slides

Transitions Prophecy good for read-mostly workloads Are transitions rare in practice?

Measurement study Alexa top sites Access main page every 20 sec for 24 hrs

Mostly static content

15%

Dynamic content Rabin fingerprinting on transitions 43% differ by single contiguous change Sampled 4000 of them, over half due to: – Load balancing directives – Random IDs in links, function parameters
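For reference, here is a minimal rolling fingerprint in the spirit of Rabin fingerprinting, which is one way to localize a single contiguous change between two page versions. The base, modulus, and window size are illustrative choices, not the study's parameters.

#include <cstdint>
#include <string>
#include <vector>

// Rolling Rabin-style fingerprints over a sliding byte window. Comparing
// the fingerprint sequences of two page versions shows which regions
// changed; a single contiguous change leaves all but one run of windows
// identical.
std::vector<uint64_t> window_fingerprints(const std::string& data, size_t w) {
    const uint64_t base = 257;
    const uint64_t mod  = 1000000007ULL;
    std::vector<uint64_t> fps;
    if (w == 0 || data.size() < w) return fps;

    // base^(w-1) mod mod, used to drop the outgoing byte when sliding.
    uint64_t pow_w = 1;
    for (size_t i = 0; i + 1 < w; ++i) pow_w = pow_w * base % mod;

    uint64_t fp = 0;
    for (size_t i = 0; i < data.size(); ++i) {
        fp = (fp * base + static_cast<unsigned char>(data[i])) % mod;
        if (i + 1 >= w) {
            fps.push_back(fp);  // fingerprint of data[i+1-w .. i]
            uint64_t out =
                static_cast<unsigned char>(data[i + 1 - w]) * pow_w % mod;
            fp = (fp + mod - out) % mod;  // remove the outgoing byte
        }
    }
    return fps;
}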