Navigating in the Dark: New Options for Building Self-Configuring Embedded Systems. Ken Birman, Cornell University.

A sea change… We're looking at a massive change in the way we use computers. Today, we're still very client-server oriented (Web Services will continue this). Tomorrow, many important applications will use vast numbers of small sensors. And even standard wired systems should sometimes be treated as sensor networks.

Characteristics? Large numbers of components and a substantial rate of churn: failure is common in any large deployment, small nodes are fragile (a flawed assumption?), and connectivity may be poor. Such systems need to self-configure, for obvious, practical reasons.

Can we map this problem to the previous one? It is not clear how we could do so. Sensors can often capture lots of data (think: Foveon 6Mb optical chip, 20 fps)… and may even be able to process that data on chip. But they rarely have the capacity to ship the data to a server (power and signal limits).

This spells trouble! The way we normally build distributed systems is mismatched to the need. The clients of a Web Services or similar system are second-tier citizens, and they operate in the dark: about one another, and about network/system "state". Building sensor networks this way won't work.

Is there an alternative? We see hope in what are called peer-to-peer and "epidemic" communication protocols! They were inspired by work on (illegal) file sharing, but we'll aim at other kinds of sharing. Goals: scalability, stability despite churn, and low load and power consumption. We must overcome the tendency of many P2P technologies to be disabled by churn.

Astrolabe: intended as help for applications adrift in a sea of information. Structure emerges from a randomized peer-to-peer protocol. This approach is robust and scalable even under extreme stress that cripples more traditional approaches. Developed at Cornell, by Robbert van Renesse with many others helping… Just an example of the kind of solutions we need.

Astrolabe builds a hierarchy using a P2P protocol that "assembles the puzzle" without any servers. [Figure: a leaf table for San Francisco (rows swift, falcon, cardinal) and one for New Jersey (rows gazelle, zebra, gnu), each with columns Name, Load, Weblogic?, SMTP?, Word Version; above them, a summary table with rows SF, NJ, Paris and columns Name, Avg Load, WL contact, SMTP contact. An SQL query "summarizes" the data, and the dynamically changing query output is visible system-wide.]

Astrolabe in a single domain: each node owns a single tuple, like the management information base (MIB). Nodes discover one another through a simple broadcast scheme ("anyone out there?") and gossip about membership. Nodes also keep replicas of one another's rows. Periodically, each node picks a peer uniformly at random and merges its state with that peer's…

State Merge: the core of the Astrolabe epidemic. [Figure, three animation steps: swift.cs.cornell.edu and cardinal.cs.cornell.edu each hold a replica of the region's table, with columns Name, Time, Load, Weblogic?, SMTP?, Word Version and rows swift, falcon, cardinal; swift sends its rows to cardinal, and for each row the copy with the newer timestamp is kept.]
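A minimal sketch of that row-merge rule, assuming each row carries a timestamp assigned by its owner and the fresher copy wins (the names `Row`, `merge_states`, and `gossip_round` are illustrative, not Astrolabe's actual API):

```python
import random
from dataclasses import dataclass

@dataclass
class Row:
    name: str     # node that owns this tuple, e.g. "swift"
    time: float   # timestamp set by the owner when it last updated its row
    attrs: dict   # e.g. {"Load": 2.1, "Weblogic?": 1, "SMTP?": 1, "Word Version": 6.2}

def merge_states(mine: dict, theirs: dict) -> dict:
    """Row-by-row merge: for each node name, keep the copy with the newer timestamp."""
    merged = dict(mine)
    for name, row in theirs.items():
        if name not in merged or row.time > merged[name].time:
            merged[name] = row
    return merged

def gossip_round(replicas: dict) -> None:
    """Each node picks a random peer; the pair exchange tables and both keep the merge."""
    for node in replicas:
        peer = random.choice([n for n in replicas if n != node])
        merged = merge_states(replicas[node], replicas[peer])
        replicas[node] = dict(merged)
        replicas[peer] = dict(merged)
```

In the real system the exchange happens between two machines over the network; this in-memory version only illustrates the merge rule.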

Observations: the merge protocol has constant cost, with one message sent and one received (on average) per unit time. The data changes slowly, so there is no need to run it quickly; we usually run it every five seconds or so. Information spreads in O(log N) time, but this assumes bounded region size. In Astrolabe, we cap the number of rows in a region.
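The O(log N) spreading claim is easy to check with a toy push-gossip simulation (a sketch, not Astrolabe itself): starting from one informed node, the number of rounds until everyone holds the update grows roughly like log2 N.

```python
import math
import random

def rounds_to_spread(n: int) -> int:
    """Push gossip: each round, every informed node tells one peer chosen at random."""
    informed = {0}
    rounds = 0
    while len(informed) < n:
        for node in list(informed):
            informed.add(random.randrange(n))
        rounds += 1
    return rounds

for n in (100, 1_000, 10_000):
    print(n, rounds_to_spread(n), math.ceil(math.log2(n)))
```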

A big system will have many regions. Astrolabe is usually configured by a manager who places each node in some region, but we are also playing with ways to discover structure automatically. A big system could have many regions, so it looks like a pile of spreadsheets. A node only replicates data from its neighbors within its own region.

Scaling up… and up… With a stack of domains, we don't want every system to "see" every domain; the cost would be huge. So instead, we'll see a summary. [Figure: cardinal.cs.cornell.edu surrounded by many copies of the per-region table (columns Name, Time, Load, Weblogic?, SMTP?, Word Version; rows swift, falcon, cardinal), one per region.]

Astrolabe builds a hierarchy using a P2P protocol that "assembles the puzzle" without any servers. [Figure repeated from earlier: leaf region tables for San Francisco and New Jersey summarized by an SQL query into a higher-level table (SF, NJ, Paris) whose dynamically changing output is visible system-wide.]

Large scale: "fake" regions. These are computed by queries that summarize a whole region as a single row, and they are gossiped in a read-only manner within a leaf region. But who runs the gossip? Each region elects "k" members to run gossip at the next level up; we can play with the selection criteria and with "k".
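A sketch of how such a summary row might be computed from a leaf region's table. Astrolabe actually expresses this as an SQL aggregation query, so the Python below (with invented column handling) only illustrates the idea, using the column names from the earlier figure:

```python
def summarize_region(region_name: str, rows: list) -> dict:
    """Collapse a whole leaf region into the single row exported to the next level."""
    loads = [r["Load"] for r in rows]
    # report the average load, and pick some node running Weblogic as the contact
    wl_contacts = [r["Name"] for r in rows if r.get("Weblogic?")]
    return {
        "Name": region_name,
        "Avg Load": sum(loads) / len(loads),
        "WL contact": wl_contacts[0] if wl_contacts else None,
    }

sf_rows = [
    {"Name": "swift",    "Load": 2.0, "Weblogic?": 1},
    {"Name": "falcon",   "Load": 1.5, "Weblogic?": 0},
    {"Name": "cardinal", "Load": 4.5, "Weblogic?": 1},
]
print(summarize_region("SF", sf_rows))  # {'Name': 'SF', 'Avg Load': 2.66..., 'WL contact': 'swift'}
```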

The hierarchy is virtual… the data is replicated. [Figure, shown twice from the viewpoint of different nodes: each node holds its own leaf region's table plus the summary table (SF, NJ, Paris); no server holds the tree, which exists only as replicated data.]

Worst case load? A small number of nodes end up participating in O(log_fanout(N)) epidemics; here the fanout is something like 50. In each epidemic, a message is sent and received roughly every 5 seconds. We limit message size, so even during periods of turbulence no message can become huge; instead, data would just propagate slowly. We haven't really looked hard at this case.
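To put rough numbers on this (my arithmetic, not from the slides): with a fanout of about 50 the depth of the hierarchy stays tiny even for very large systems, so a node elected as representative at every level still handles only a few messages per 5-second gossip period.

```python
import math

def hierarchy_levels(n_nodes: int, fanout: int = 50) -> int:
    """Depth of an Astrolabe-style hierarchy for n_nodes with the given region fanout."""
    return math.ceil(math.log(n_nodes, fanout))

for n in (10_000, 1_000_000, 100_000_000):
    levels = hierarchy_levels(n)
    # a worst-case node gossips in one epidemic per level, one message pair per ~5 s each
    print(f"{n:>11} nodes -> {levels} levels")
```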

Astrolabe is a good fit: there is no central server, and the hierarchical abstraction "emerges." Moreover, this abstraction is very robust: it scales well, disruptions won't disrupt the system, and it is consistent in the eyes of varied beholders. Each individual participant runs a trivial p2p protocol. Astrolabe supports distributed data aggregation and data mining, and it is adaptive and self-repairing…

Data Mining: we can data mine using Astrolabe. The "configuration" and "aggregation" queries can be dynamically adjusted, so we can use the former to query sensors in customizable ways (in parallel)… and then use the latter to extract an efficient summary that won't cost an arm and a leg to transmit.

Costs are basically constant! Unlike many systems that experience load surges and other kinds of load variability, Astrolabe is stable under all conditions. Stress doesn't provoke surges of message traffic, and Astrolabe remains active even while those disruptions are happening.

Other such abstractions: scalable, probabilistically reliable multicast based on P2P (peer-to-peer) epidemics (Bimodal Multicast); some of the work on P2P indexing structures and file storage (Kelips); overlay networks for end-to-end IP-style multicast (Willow).

Challenges: we need to do more work on real-time issues (right now Astrolabe is highly predictable but somewhat slow), on security (including protection against malfunctioning components), and on scheduled downtime (sensors do this quite often today; maybe it will be less of an issue in the future).

Communication locality is important in sensor networks, where messages to distant machines need costly relaying. Astrolabe does most of its communication with neighbors, and its pattern of occasional remote gossip is close to Kleinberg's small-worlds structure.

Conclusions? We're near a breakthrough: sensor networks that behave like sentient infrastructure. They sense their own state and adapt; they self-configure and self-repair. There are incredibly exciting opportunities ahead. Cornell has focused on probabilistically scalable technologies and built real systems while also exploring theoretical analyses.

Extra slides: included just to respond to questions.

Bimodal Multicast: a technology we developed several years ago. Our goal was to get better scalability without abandoning reliability.

Multicast historical timeline, 1980s: IP multicast, anycast, and other best-effort models. Cheriton's V system; IP multicast becomes a standard Internet protocol; anycast never really made it.

Multicast historical timeline, continued: virtually synchronous multicast takes off (Isis, Horus, Ensemble, but also many other systems, like Transis, Totem, etc.); it is used in many settings today, but no single system "won." The Isis Toolkit was used by the New York Stock Exchange, the Swiss Exchange, the French air traffic control system, and AEGIS radar control and communications.

Multicast historical timeline, continued: scalability issues prompt a new generation of scalable solutions (SRM, RMTP, etc.). Cornell's contribution was Bimodal Multicast, aka "pbcast."

Virtual Synchrony Model. [Figure: processes p, q, r, s, t and a sequence of group views. G0 = {p, q}; r and s request to join and are added with a state transfer, giving G1 = {p, q, r, s}; p fails (crash), giving G2 = {q, r, s}; t requests to join and is added with a state transfer, giving G3 = {q, r, s, t}.] To date, this is the only widely adopted model for consistency and fault-tolerance in highly available networked applications.

Virtual Synchrony scaling issue. [Figure: for virtually synchronous Ensemble multicast protocols, average throughput on non-perturbed members plotted against the perturb rate, for group sizes 32, 64, and 96.]

Bimodal Multicast uses some sort of best-effort dissemination protocol to get the message "seeded": e.g. IP multicast, or our own tree-based scheme running on TCP but willing to drop packets if congestion occurs. But some nodes log messages (we use a DHT scheme). Detect a missing message? Recover it from a log server that should have it…

Start by using unreliable multicast to rapidly distribute the message. But some messages may not get through, and some processes may be faulty. So the initial state involves partial distribution of the multicast(s).

Periodically (e.g. every 100ms) each process sends a digest describing its state to some randomly selected group member. The digest identifies messages. It doesn’t include them.

The recipient checks the gossip digest against its own history and solicits a copy of any missing message from the process that sent the gossip.

Processes respond to solicitations received during a round of gossip by retransmitting the requested message. The round lasts much longer than a typical RPC time.
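A minimal sketch of that digest/solicit/retransmit loop, with invented names (`Node`, `gossip_round`) and in-memory calls standing in for the real network traffic:

```python
import random

class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.messages = {}   # seqno -> payload for multicast messages received so far

    def digest(self) -> set:
        """Summarize state: just the sequence numbers we hold, not the payloads."""
        return set(self.messages)

    def receive_digest(self, sender: "Node") -> None:
        """Compare the sender's digest with our history; solicit anything we're missing."""
        missing = sender.digest() - self.digest()
        for seqno in missing:
            # the sender retransmits the requested message in response to the solicitation
            self.messages[seqno] = sender.messages[seqno]

def gossip_round(nodes: list) -> None:
    """Every ~100 ms, each node sends its digest to one randomly chosen group member."""
    for node in nodes:
        peer = random.choice([n for n in nodes if n is not node])
        peer.receive_digest(node)

# Unreliable multicast seeds message 1 at most (but not all) nodes; gossip repairs the rest.
nodes = [Node(i) for i in range(20)]
nodes[0].messages[1] = "payload"
for n in nodes[1:]:
    if random.random() < 0.8:   # simulate ~20% loss in the initial best-effort multicast
        n.messages[1] = "payload"
rounds = 0
while any(1 not in n.messages for n in nodes):
    gossip_round(nodes)
    rounds += 1
print(f"all {len(nodes)} nodes have the message after {rounds} gossip round(s)")
```

In the real protocol the digest and the retransmission travel as separate messages and old messages are eventually garbage-collected; the sketch keeps everything in memory for clarity.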

Figure 5: Graphs of analytical results

Our original degradation scenario

Distributed Indexing. The goal is to find a copy of Norah Jones' "I don't know". The index contains, for example, (machine-name, object) pairs, with operations to search and update.

History of the problem: it is very old; the Internet DNS does this, but DNS lookup is by machine name, and we want the inverted map. Napster was the first really big hit, with 5 million users at one time. The index itself was centralized; peer-to-peer file copying was used once a copy was found (and many issues arose…).

This is a hot academic topic today. Systems based on a virtual ring: MIT's Chord system (Karger, Kaashoek). Many systems use Plaxton radix search: Rice's Pastry (Druschel, Rowstron) and Berkeley's Tapestry. Cornell's entry is a scheme that uses replication: Kelips.

Kelips idea? Treat the system as sqrt(N) "affinity groups" of size sqrt(N) each. Any given index entry is mapped to a group and replicated within it; an update takes O(log N) time to spread, which could be accelerated with an unreliable multicast. To do a lookup, find a group member (or a few of them) and ask for the item: an O(1) lookup.
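A sketch of the Kelips mapping and lookup, under the simplifying assumptions that every node already knows a contact in each affinity group and that the index is fully replicated within a group (the class and field names are illustrative, not from the Kelips implementation):

```python
import hashlib
import math
from typing import Optional

def affinity_group(key: str, num_groups: int) -> int:
    """Hash a key (a node id or an index entry) into one of the affinity groups."""
    digest = hashlib.sha1(key.encode()).hexdigest()
    return int(digest, 16) % num_groups

class KelipsNode:
    def __init__(self, node_id: str, num_groups: int):
        self.node_id = node_id
        self.num_groups = num_groups
        self.group = affinity_group(node_id, num_groups)
        self.index = {}      # object name -> machine name, replicated group-wide by gossip
        self.contacts = {}   # group number -> some KelipsNode known to be in that group

    def insert(self, obj: str, machine: str) -> None:
        """Send the (object, machine) entry to the group responsible for obj."""
        target = self.contacts[affinity_group(obj, self.num_groups)]
        target.index[obj] = machine   # in Kelips this then spreads within the group by gossip

    def lookup(self, obj: str) -> Optional[str]:
        """O(1): a single hop to any member of the responsible group."""
        target = self.contacts[affinity_group(obj, self.num_groups)]
        return target.index.get(obj)

# With N nodes, use about sqrt(N) groups so each group also has about sqrt(N) members.
N = 10_000
num_groups = math.isqrt(N)
```

Lookups cost one hop because the contact already holds its whole group's slice of the index; the price is sqrt(N)-fold replication within each group.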

Why Kelips? Other schemes have an O(log N) lookup delay, which is quite a high cost in practical settings. Others also have fragile data structures: background reorganization costs soar under stress, churn, and flash loads. Kelips has a completely constant load!

Solutions that share properties: they are scalable, robust against localized disruption, and have emergent behavior we can reason about and exploit in the application layer. Think of the way a hive of insects organizes itself or reacts to stimuli; there are many similarities.

Revisit our goals: are these potential components for sentient systems? Middleware that perceives the state of the network and represents this knowledge in a form smart applications can exploit. Although built from large numbers of rather dumb components, the emergent behavior is intelligent. These applications are more robust, more secure, and more responsive than any individual component. When something unexpected occurs, they can diagnose the problem and trigger a coordinated distributed response, and they repair themselves after damage. We seem to have the basis from which to work!

This brings us full circle. Our goal should be a new form of very stable "sentient middleware." Have we accomplished this goal? Our probabilistically reliable, scalable primitives solve many problems and are now gaining much attention from industry and the academic research community. The fundamental issue is skepticism about peer-to-peer as a computing model.

Conclusions? We're on the verge of a breakthrough: networks that behave like sentient infrastructure on behalf of smart applications. There are incredibly exciting opportunities if we can build these. Cornell's angle has focused on probabilistically scalable technologies and has tried to mix real systems and experimental work with stochastic analyses.