Commensal Cuckoo: Secure Group Partitioning for Large-Scale Services Siddhartha Sen and Mike Freedman Princeton University.

Setting: a scalable peer-to-peer service built from untrusted participants. Clients access the service, which shards data/functionality across the nodes.

How do we make the service reliable with untrusted participants? Mask failures with replication: partition the nodes into Byzantine Fault Tolerant (BFT) groups, each of which tolerates a faulty fraction f < 1/3 of its members. The adversary controls a global fraction F of the nodes (e.g., F < 1/4). Observe: the global fraction F is not the same as the per-group bound f, and we want small groups.

Prior work using many small groups
Systems: – [Rampart95], [SecureRing98], [OceanStore00], [Farsite02], [CastroDGRW02], [Rosebud03], [Myrmic06], [Fireflies06], [Salsa06], [SinghNDW06], [Halo08], [Flightpath08], [ShadowWalker09], [Census09]
Theory: – [HildrumK03], [NaorW07]
Problem: all assume faults are randomly or perfectly distributed (i.e., a static adversary)

Rosebud [RL03]: nodes are placed on a consistent hashing ring [0,1), and each contiguous segment of the ring forms a BFT group.

Rosebud [RL03] assumes F = f < 1/3. Unrealistic: we don't know which nodes are faulty. Even the best case of uniformly random placement yields ω(1) faults per group, and a real adversary is dynamic!
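Why even uniformly random faults hurt small groups can be sketched with a binomial tail (an illustrative calculation, not from the talk; `group_failure_prob` and its parameters are hypothetical names):

```python
import math

def group_failure_prob(g, F, f=1/3):
    """P[a group of g nodes has >= f*g faulty members] when each node is
    independently faulty with probability F (the uniformly random best case)."""
    threshold = math.ceil(f * g)
    return sum(math.comb(g, i) * F**i * (1 - F)**(g - i)
               for i in range(threshold, g + 1))

# With a global fault fraction F = 25%, small groups exceed f = 1/3 often.
p16 = group_failure_prob(16, 0.25)
p64 = group_failure_prob(64, 0.25)
# Larger groups concentrate around F, so they fail far less often.
assert p16 > p64 > 0
```

This is the tension the slide points at: random placement still leaves some groups over the f threshold unless groups are made impractically large.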

Join-leave attack: adversarial nodes repeatedly leave and rejoin the ring to concentrate in a target group until its faulty fraction exceeds f > 1/3. (The Vanish system was compromised by a join-leave attack in 2010.)

Prior work tolerating join-leave attacks: [FiatSY05], [AwerbuchS04], [Scheideler05]. The state of the art is the cuckoo rule [AwerbuchS06, AwerbuchS07]. Problems: impractical (large constant factors); groups must be impractically large or F trivially low.

Goal: a provably secure and practical group partitioning scheme
Contributions: – Demonstrate failures of prior work – Analyze and understand those failures – Devise an algorithm that overcomes them
Assumptions: – Correct nodes are randomly distributed and stable – The adversary controls a global fraction F of the nodes in the system and rejoins them maliciously – The system fails when any one group fails, i.e., when its faulty fraction reaches f ≥ 1/3

Cuckoo rule (CR) [AS06] F < f < 1/

Cuckoo rule (CR) [AS06]: a joining node picks a random location in [0,1) (a primary join) and evicts the nodes in the surrounding k-region, which rejoin at random locations in [0,1) (secondary joins). Guarantee: for poly(n) rounds, all regions of size O(log n)/n have O(log n) nodes with faulty fraction f < 1/3. Adversary strategy: rejoin from the least faulty group.

Cuckoo rule (CR) [AS06], in summary:
1. On a primary join to the selected random ID, cuckoo (evict) the nodes in the immediate k-region
2. Select new random IDs for the cuckooed nodes and join them as secondary joins (i.e., no subsequent cuckoos)
For now, ignore implementation issues: – Routing messages securely – Verifying messages from other groups – Bootstrapping the system, handling heavy churn
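The two steps above can be sketched as a toy simulation on the [0,1) ring (a sketch under simplifying assumptions; the function and parameter names are illustrative, not the paper's implementation):

```python
import random

def primary_join(positions, k, n):
    """One primary join under the cuckoo rule: the joiner takes a random ID,
    and every node in the enclosing k-region (width k/n) is evicted and
    rejoined at a fresh random ID (a secondary join, no further cuckoos)."""
    width = k / n
    x = random.random()                            # joiner's random ID in [0,1)
    region = int(x / width)                        # index of the enclosing k-region
    lo, hi = region * width, (region + 1) * width
    survivors = [p for p in positions if not (lo <= p < hi)]
    evicted = len(positions) - len(survivors)
    survivors.extend(random.random() for _ in range(evicted))  # secondary joins
    survivors.append(x)                            # primary joiner settles at x
    return survivors

nodes = [random.random() for _ in range(1000)]
nodes = primary_join(nodes, k=4, n=1000)
assert len(nodes) == 1001
```

Note how a primary join into a sparse k-region evicts few or no nodes; this is exactly the "holes cuckoo less" failure mode the next slides demonstrate.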

CR tolerates very few faults in practice. [Table: maximum tolerated global fault % (F) for varying system sizes N; group size = 64, rounds = 100,000.]

What if we allow larger groups? [Table: required group size for varying N at F = 10% and F = 20%; group sizes increased in powers of 2.]

CR: evolution of a faulty group. [Plot: faulty fraction of one group over time against the expected faulty fraction per group; N = 4096, F ≈ 5%, group size = 64, k = 4.]

Why does this happen? Primary joins create holes (empty k-regions), and holes cuckoo fewer nodes; closely spaced primary joins are bad news and eventually produce a faulty group.

CR: cuckoo size is erratic. [Plot: number of nodes cuckooed per primary join against the expected cuckoo size, showing holes and clumps; N = 4096, F ≈ 5%, group size = 64, k = 4.]

CR: primary join spacing is erratic. [Plot: secondary joins between consecutive primary joins against the expected number; N = 4096, F ≈ 5%, group size = 64, k = 4.]

Cuckoo rule is “parasitic”

New algorithm (fixing CR):
1) Holes and clumpiness: cuckoo k nodes chosen randomly from the group, and scale k relative to the average group size (larger groups cuckoo more, smaller groups cuckoo less)
2) Inconsistently spaced primary joins: the group vets each join attempt and denies it if there have been insufficient secondary joins since the last primary join

“Commensal” cuckoo rule. Com·men·sal·ism: a symbiotic relationship in which one organism derives benefit while causing little or no harm to the other.

Commensal cuckoo rule (CCR): on an accepted primary join, cuckoo k random nodes from the group (recall that CR could cuckoo as few as one node), so holes don't matter. A primary join is denied if the group has received too few secondary joins since the last accepted primary join.

Commensal cuckoo rule (CCR), in summary:
1. On a primary join to the selected random ID, if the group has seen fewer than a fixed fraction of k secondary joins since the last primary join, start over with a new random ID
2. Otherwise, cuckoo k nodes weighted by group size and join them as secondary joins (i.e., no subsequent cuckoos)
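A minimal sketch of the vetting-and-retry loop (illustrative only: the group representation, the uniform rather than size-weighted choice of cuckoo targets, and the `threshold` parameter are simplifications of the rule above):

```python
import random

def ccr_primary_join(groups, k, threshold):
    """Retry primary joins until some group accepts, per the commensal cuckoo
    rule: a group denies the join unless it has seen at least `threshold`
    secondary joins since its last accepted primary join.

    groups: list of dicts {'members': int, 'secondary': int}.
    Returns the index of the accepting group."""
    while True:
        g = random.randrange(len(groups))      # new random ID -> a random group
        if groups[g]['secondary'] < threshold:
            continue                           # vetted and denied: start over
        groups[g]['secondary'] = 0
        groups[g]['members'] += 1              # primary join accepted
        for _ in range(k):                     # cuckoo k members out...
            groups[g]['members'] -= 1
            t = random.randrange(len(groups))  # ...as secondary joins elsewhere
            groups[t]['members'] += 1
            groups[t]['secondary'] += 1
        return g

groups = [{'members': 64, 'secondary': 4} for _ in range(8)]
accepted = ccr_primary_join(groups, k=4, threshold=3)
assert 0 <= accepted < 8
```

Each accepted primary join adds one net member and scatters k secondary joins, which in turn replenish other groups' vetting counters; this is the synergy the next slide describes.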

The techniques are synergistic: – Join vetting forces the adversary to join distinct groups, so all groups are joined (roughly) equally – Weighted cuckoos ensure sufficient secondary joins, so only O(1) join attempts are needed

Cuckoo size is consistent. [Plots comparing cuckoo size per primary join under CR vs. CCR.]

CCR: Primary join spacing is consistent

CCR tolerates significantly more faults (f < 1/3). [Table: tolerated global fault % under CR vs. CCR and the resulting gain, for varying N.]

CCR tolerates significantly more faults with f < 1/2. [Table: tolerated global fault % under CR vs. CCR and the resulting gain, for varying N.] How can BFT be used with f < 1/2? Idea: separate correctness from availability. Such a group is still correct but may be unresponsive; use other groups to revive it!

Join vetting has deeper benefits. Security vulnerability in CR: the adversary can retry a primary join (without causing cuckoos) until it gets a location it likes. CCR avoids this problem: a group won't accept a primary join if it has seen insufficient secondary joins – regardless of how many previous attempts were made, or where.

Summary: CR suffers from random bad events, which CCR avoids by derandomizing – cuckoos are weighted by group size, and primary join attempts are vetted by groups. CCR tolerates F ≈ 7% for f < 1/3 and F ≈ 18% for f < 1/2.

Extensions (a complete solution):
– Route messages securely: O(1)-hop routing
– Verify messages from other groups: distributed key generation and threshold signatures, so each group needs only a constant number of public/private keys
– Bootstrap the system and handle heavy churn: choose a target group size at the onset (e.g., 64); split/merge groups locally
– Handle DoS and data-layer attacks: reactive approach, e.g., reactive replication

Conclusion: a secure group membership partitioning scheme for open P2P systems. Most previous systems assumed an (impossible) perfect distribution of faults or ignored join-leave attacks. CCR handles much higher fractions of faulty nodes than prior algorithms.