1
Navigating in the Dark: New Options for Building Self-Configuring Embedded Systems
Ken Birman, Cornell University
2
A sea change…
We’re looking at a massive change in the way we use computers. Today, we’re still very client-server oriented (Web Services will continue this). Tomorrow, many important applications will use vast numbers of small sensors. And even standard wired systems should sometimes be treated as sensor networks.
3
Characteristics?
Large numbers of components and a substantial rate of churn… failure is common in any large deployment, small nodes are fragile (flawed assumption?), and connectivity may be poor. Such systems need to self-configure, for obvious, practical reasons.
4
Can we map this problem to the previous one?
It’s not clear how we could do so. Sensors can often capture lots of data (think: Foveon 6Mb optical chip, 20 fps)… and may even be able to process that data on chip, but they rarely have the capacity to ship the data to a server (power, signal limits).
5
This spells trouble!
The way we normally build distributed systems is mismatched to the need. The clients of a Web Services or similar system are second-tier citizens, and they operate in the dark – both about one another and about network/system “state”. Building sensor networks this way won’t work.
6
Is there an alternative?
We see hope in what are called peer-to-peer and “epidemic” communication protocols! These were inspired by work on (illegal) file sharing, but we’ll aim at other kinds of sharing. Goals: scalability, stability despite churn, and low loads/power consumption. We must overcome the tendency of many P2P technologies to be disabled by churn.
7
Astrolabe
Intended as help for applications adrift in a sea of information. Structure emerges from a randomized peer-to-peer protocol. This approach is robust and scalable even under extreme stress that cripples more traditional approaches. Developed at Cornell by Robbert van Renesse, with many others helping… just an example of the kind of solution we need.
8
Astrolabe builds a hierarchy using a P2P protocol that “assembles the puzzle” without any servers.

Leaf region (San Francisco):
  Name      Load  Weblogic?  SMTP?  Word Version …
  swift     2.0   0          1      6.2
  falcon    1.5   1          0      4.1
  cardinal  4.5   1          0      6.0

Leaf region (New Jersey):
  Name      Load  Weblogic?  SMTP?  Word Version …
  gazelle   1.7   0          0      4.5
  zebra     3.2   0          1      6.2
  gnu       .5    1          0      6.2

Summary level:
  Name   Avg Load  WL contact    SMTP contact
  SF     2.6       123.45.61.3   123.45.61.17
  NJ     1.8       127.16.77.6   127.16.77.11
  Paris  3.1       14.66.71.8    14.66.71.12

An SQL query “summarizes” the data; the dynamically changing query output is visible system-wide.
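To make the “summarizes” step concrete, here is a small sketch of the kind of SQL aggregation that could produce a summary row like the SF entry above. It is illustrative only: the table layout, column names, and the rule for picking a contact are assumptions, not Astrolabe’s actual query interface (which, for instance, reports contacts as addresses rather than names).

```python
# A minimal sketch (not Astrolabe's real API): compute a region's summary row
# with an SQL aggregate over its replicated table, using an in-memory SQLite DB.
import sqlite3

rows = [  # leaf-region tuples as shown above (San Francisco)
    ("swift",    2.0, 0, 1, 6.2),
    ("falcon",   1.5, 1, 0, 4.1),
    ("cardinal", 4.5, 1, 0, 6.0),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE region (name TEXT, load REAL, weblogic INT, smtp INT, word REAL)")
db.executemany("INSERT INTO region VALUES (?, ?, ?, ?, ?)", rows)

# The aggregation query: average load, plus the name of some node running Weblogic.
summary = db.execute(
    "SELECT AVG(load), "
    "       (SELECT name FROM region WHERE weblogic = 1 LIMIT 1) AS wl_contact "
    "FROM region"
).fetchone()
print(summary)   # e.g. (2.666..., 'falcon')
```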
9
Astrolabe in a single domain
Each node owns a single tuple, like a management information base (MIB). Nodes discover one another through a simple broadcast scheme (“anyone out there?”) and gossip about membership. Nodes also keep replicas of one another’s rows. Periodically, each node merges its state with another node chosen uniformly at random…
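A minimal sketch of the per-node state and gossip step just described, under stated assumptions: the class and field names are invented, the transport is direct method calls instead of a network, and membership discovery is omitted.

```python
# Illustrative sketch of a single-domain Astrolabe node (hypothetical names, not the real API).
import random
import time

GOSSIP_PERIOD = 5.0  # seconds; the deck suggests roughly one gossip round every 5 s

class AstrolabeNode:
    def __init__(self, name):
        self.name = name
        # One row per node in the region: name -> (timestamp, attributes).
        # A node owns, and periodically refreshes, only its own row.
        self.table = {name: (time.time(), {"load": 0.0})}
        self.peers = []  # peers learned via the "anyone out there?" broadcast (not shown)

    def refresh_own_row(self, load):
        self.table[self.name] = (time.time(), {"load": load})

    def merge(self, remote_table):
        """Adopt any row whose remote copy carries a newer timestamp."""
        for name, (ts, attrs) in remote_table.items():
            if name not in self.table or ts > self.table[name][0]:
                self.table[name] = (ts, attrs)

    def gossip_once(self):
        """Every GOSSIP_PERIOD seconds: pick a peer uniformly at random and exchange state."""
        if self.peers:
            peer = random.choice(self.peers)
            peer.merge(self.table)
            self.merge(peer.table)
```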
10
State Merge: Core of Astrolabe epidemic
swift.cs.cornell.edu and cardinal.cs.cornell.edu each hold a replica of the region’s table:

Replica 1:
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2003  .67   0          1      6.2
  falcon    1976  2.7   1          0      4.1
  cardinal  2201  3.5   1          1      6.0

Replica 2:
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2011  2.0   0          1      6.2
  falcon    1971  1.5   1          0      4.1
  cardinal  2004  4.5   1          0      6.0
11
State Merge: Core of Astrolabe epidemic
The two replicas (unchanged from the previous slide) now gossip with each other: the exchange carries the rows swift (2011, 2.0) and cardinal (2201, 3.5) between swift.cs.cornell.edu and cardinal.cs.cornell.edu.
12
State Merge: Core of Astrolabe epidemic
After the merge, each replica has adopted the fresher rows:

Replica 1:
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2011  2.0   0          1      6.2
  falcon    1976  2.7   1          0      4.1
  cardinal  2201  3.5   1          1      6.0

Replica 2:
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2011  2.0   0          1      6.2
  falcon    1971  1.5   1          0      4.1
  cardinal  2201  3.5   1          0      6.0
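As a worked replay of these three slides, the snippet below applies the newest-timestamp-wins rule to the two replicas, using the Time and Load values from the tables above (other columns omitted). Note that the slides picture only the swift and cardinal rows being exchanged, so their falcon entries stay put; a full-table merge, as sketched here, would reconcile falcon as well.

```python
# Replay of the state merge shown above, keeping only (Time, Load) per row.
def merge(mine, theirs):
    """For each node, keep whichever copy of its row has the newer timestamp."""
    merged = dict(mine)
    for name, (ts, load) in theirs.items():
        if name not in merged or ts > merged[name][0]:
            merged[name] = (ts, load)
    return merged

# Replica with a fresh cardinal row but a stale swift row.
replica_1 = {"swift": (2003, 0.67), "falcon": (1976, 2.7), "cardinal": (2201, 3.5)}
# Replica with a fresh swift row but a stale cardinal row.
replica_2 = {"swift": (2011, 2.0), "falcon": (1971, 1.5), "cardinal": (2004, 4.5)}

print(merge(replica_1, replica_2))
print(merge(replica_2, replica_1))
# Both print {'swift': (2011, 2.0), 'falcon': (1976, 2.7), 'cardinal': (2201, 3.5)}:
# the replicas converge on the freshest copy of every row.
```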
13
Observations
The merge protocol has constant cost: one message sent and one received (on average) per unit time. The data changes slowly, so there is no need to run it quickly – we usually run it every five seconds or so. Information spreads in O(log N) time, but this assumes bounded region size; in Astrolabe, we limit regions to 50-100 rows.
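A hedged back-of-envelope check on those numbers (the log-2 rounds figure is a textbook epidemic approximation, not a measurement of Astrolabe itself):

```python
# Rough estimate of gossip dissemination time in one region (simplified model:
# roughly log2(N) rounds for an epidemic to reach everyone, one round per period).
import math

region_size = 100          # upper end of the 50-100 row limit mentioned above
gossip_period_s = 5.0      # one merge every ~5 seconds

rounds = math.ceil(math.log2(region_size))
print(rounds, "rounds ->", rounds * gossip_period_s, "seconds")   # 7 rounds -> 35.0 seconds
```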
14
A big system will have many regions
Astrolabe is usually configured by a manager who places each node in some region, but we are also playing with ways to discover structure automatically. A big system could have many regions; it looks like a pile of spreadsheets. A node only replicates data from its neighbors within its own region.
15
Scaling up… and up…
With a stack of domains, we don’t want every system to “see” every domain – the cost would be huge. So instead, we’ll see a summary.
(Figure: the slide shows many copies of the region table held at cardinal.cs.cornell.edu, one per domain:)
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2011  2.0   0          1      6.2
  falcon    1976  2.7   1          0      4.1
  cardinal  2201  3.5   1          1      6.0
16
Astrolabe builds a hierarchy using a P2P protocol that “assembles the puzzle” without any servers.
(This slide repeats the picture from slide 8: the San Francisco and New Jersey leaf-region tables, the summary table with the SF, NJ, and Paris rows, and the note that an SQL query “summarizes” the data and that the dynamically changing query output is visible system-wide.)
17
Large scale: “fake” regions
These are computed by queries that summarize a whole region as a single row, and gossiped in a read-only manner within a leaf region. But who runs the gossip? Each region elects “k” members to run gossip at the next level up. We can play with the selection criteria and with “k”.
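For illustration, here is one plausible way a region could elect its k representatives deterministically from the replicated table, so every member reaches the same answer without extra coordination; the least-loaded rule and the names are assumptions, not Astrolabe’s actual selection criteria.

```python
# Hypothetical selection of k representatives to gossip at the next level up.
# Every node can compute this deterministically from its replicated region table,
# so no coordination is needed.

def elect_representatives(region_table, k=2):
    """region_table: dict of name -> load. Returns the k least-loaded node names."""
    return sorted(region_table, key=region_table.get)[:k]

region = {"swift": 2.0, "falcon": 1.5, "cardinal": 4.5}
print(elect_representatives(region, k=2))   # ['falcon', 'swift']
```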
18
Hierarchy is virtual… data is replicated

Leaf region (San Francisco):
  Name      Load  Weblogic?  SMTP?  Word Version …
  swift     2.0   0          1      6.2
  falcon    1.5   1          0      4.1
  cardinal  4.5   1          0      6.0

Leaf region (New Jersey):
  Name      Load  Weblogic?  SMTP?  Word Version …
  gazelle   1.7   0          0      4.5
  zebra     3.2   0          1      6.2
  gnu       .5    1          0      6.2

Summary level:
  Name   Avg Load  WL contact    SMTP contact
  SF     2.6       123.45.61.3   123.45.61.17
  NJ     1.8       127.16.77.6   127.16.77.11
  Paris  3.1       14.66.71.8    14.66.71.12
19
Hierarchy is virtual… data is replicated
(Figure: the same three tables as on the previous slide.)
20
Worst case load?
A small number of nodes end up participating in O(log_fanout N) epidemics; here the fanout is something like 50. In each epidemic, a message is sent and received roughly every 5 seconds. We limit message size, so even during periods of turbulence no message can become huge; instead, data would just propagate slowly. We haven’t really looked hard at this case.
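A quick sanity check on that bound, with made-up system sizes: with a fanout of roughly 50, the hierarchy stays very shallow, so even the worst-loaded nodes join only a handful of concurrent epidemics.

```python
# Depth of the Astrolabe hierarchy for a few hypothetical system sizes (fanout ~50).
import math

fanout = 50
for n in (500, 10_000, 1_000_000):
    levels = math.ceil(math.log(n, fanout))
    print(f"{n:>9} nodes -> about {levels} level(s) of epidemics")
# 500 -> 2, 10000 -> 3, 1000000 -> 4
```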
21
Astrolabe is a good fit
There is no central server; the hierarchical abstraction “emerges”. Moreover, this abstraction is very robust: it scales well, localized disruptions won’t destabilize the system, and it is consistent in the eyes of varied beholders. Each individual participant runs a trivial P2P protocol. Astrolabe supports distributed data aggregation and data mining, and is adaptive and self-repairing…
22
Data Mining
We can data mine using Astrolabe. The “configuration” and “aggregation” queries can be dynamically adjusted, so we can use the former to query sensors in customizable ways (in parallel)… and then use the latter to extract an efficient summary that won’t cost an arm and a leg to transmit.
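To make that two-stage pattern concrete, here is a small sketch with invented sensor fields and queries (not Astrolabe’s real interface): a “configuration” predicate selects which sensor rows participate, evaluated at each sensor in parallel, and an “aggregation” function reduces the selected rows to one cheap-to-transmit summary.

```python
# Hypothetical sketch of the two-stage data-mining pattern described above.
# Stage 1 ("configuration"): each sensor evaluates a predicate locally, in parallel.
# Stage 2 ("aggregation"): a region-level query reduces the matching rows to a tiny summary.

sensor_rows = [  # made-up per-sensor tuples: (name, temperature_C, battery_pct)
    ("s1", 21.5, 80), ("s2", 47.2, 55), ("s3", 45.1, 12), ("s4", 19.9, 95),
]

def configuration(row):
    """Which sensors should report? Here: those reading above 40 C."""
    return row[1] > 40.0

def aggregation(rows):
    """Summarize the selected rows as a single cheap-to-transmit tuple."""
    if not rows:
        return ("hot_sensors", 0, None)
    return ("hot_sensors", len(rows), max(r[1] for r in rows))

selected = [r for r in sensor_rows if configuration(r)]
print(aggregation(selected))   # ('hot_sensors', 2, 47.2)
```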
23
Costs are basically constant!
Unlike many systems that experience load surges and other kinds of load variability, Astrolabe is stable under all conditions. Stress doesn’t provoke surges of message traffic, and Astrolabe remains active even while disruptions are happening.
24
Other such abstractions
Scalable, probabilistically reliable multicast based on P2P (peer-to-peer) epidemics: Bimodal Multicast. Some of the work on P2P indexing structures and file storage: Kelips. Overlay networks for end-to-end IP-style multicast: Willow.
25
Challenges
We need to do more work on real-time issues (right now Astrolabe is highly predictable but somewhat slow), security (including protection against malfunctioning components), and scheduled downtime (sensors do this quite often today; maybe less of an issue in the future).
26
Communication locality
This is important in sensor networks, where messages to distant machines need costly relaying. Astrolabe does most of its communication with neighbors, close to Kleinberg’s small-worlds structure for remote gossip.
27
Conclusions?
We’re near a breakthrough: sensor networks that behave like sentient infrastructure. They sense their own state and adapt, self-configure, and self-repair. Incredibly exciting opportunities lie ahead. Cornell has focused on probabilistically scalable technologies and built real systems while also exploring theoretical analyses.
28
Extra slides Just to respond to questions
29
Bimodal Multicast
A technology we developed several years ago. Our goal was to get better scalability without abandoning reliability.
30
Multicast historical timeline
1980’s: IP multicast, anycast, and other best-effort models. Cheriton’s V system; IP multicast becomes a standard Internet protocol; anycast never really made it.
31
Multicast historical timeline
1980’s: IP multicast, anycast, and other best-effort models.
1987-1993: Virtually synchronous multicast takes off (Isis, Horus, Ensemble, but also many other systems, like Transis, Totem, etc.). Used in many settings today, but no single system “won”. The Isis Toolkit was used by the New York Stock Exchange, the Swiss Exchange, the French Air Traffic Control System, and AEGIS radar control and communications.
32
Multicast historical timeline
1980’s: IP multicast, anycast, and other best-effort models.
1987-1993: Virtually synchronous multicast takes off (Isis, Horus, Ensemble, but also many other systems, like Transis, Totem, etc.). Used in many settings today, but no single system “won”.
1995-2000: Scalability issues prompt a new generation of scalable solutions (SRM, RMTP, etc.). Cornell’s contribution was Bimodal Multicast, aka “pbcast”.
33
Virtual Synchrony Model
(Figure: processes p, q, r, s, t move through a sequence of group views. G0 = {p,q}; r and s request to join and are added with a state transfer, giving G1 = {p,q,r,s}; p fails (crash), giving G2 = {q,r,s}; t requests to join and is added with a state transfer, giving G3 = {q,r,s,t}.)
…to date, the only widely adopted model for consistency and fault-tolerance in highly available networked applications.
34
Virtual Synchrony scaling issue
(Figure: average throughput on non-perturbed members, 0–250, plotted against perturb rate, 0–0.9, for virtually synchronous Ensemble multicast protocols, with one curve each for group sizes 32, 64, and 96.)
35
Bimodal Multicast
It uses some sort of best-effort dissemination protocol to get the message “seeded”, e.g. IP multicast, or our own tree-based scheme running on TCP but willing to drop packets if congestion occurs. But some nodes log messages; we use a DHT scheme. Detect a missing message? Recover from a log server that should have it…
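One hedged way to realize that “DHT scheme” for picking log servers (the hashing rule and replication factor here are assumptions, not the published design): hash the message ID onto a few members, so every process can recompute, with no coordination, which nodes should hold the logged copy and therefore whom to ask for a retransmission.

```python
# Hypothetical DHT-style placement of message logs: any node can compute, from the
# message ID alone, which group members are responsible for logging that message.
import hashlib

def log_servers(msg_id, members, replicas=2):
    """Deterministically map a message ID onto `replicas` members."""
    ranked = sorted(
        members,
        key=lambda m: hashlib.sha1(f"{msg_id}:{m}".encode()).hexdigest(),
    )
    return ranked[:replicas]

members = ["swift", "falcon", "cardinal", "gazelle", "zebra", "gnu"]
print(log_servers("msg-001", members))   # the two members every node agrees should log msg-001
```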
36
Start by using unreliable multicast to rapidly distribute the message. But some messages may not get through, and some processes may be faulty. So the initial state involves partial distribution of the multicast(s).
37
Periodically (e.g. every 100ms) each process sends a digest describing its state to some randomly selected group member. The digest identifies messages. It doesn’t include them.
38
Recipient checks the gossip digest against its own history and solicits a copy of any missing message from the process that sent the gossip
39
Processes respond to solicitations received during a round of gossip by retransmitting the requested message. The round lasts much longer than a typical RPC time.
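Putting the last few slides together, here is a minimal sketch of the digest/solicit/retransmit round; the class layout and message identifiers are invented, and the real pbcast protocol adds machinery (round timing, retransmission limits, garbage collection) that is omitted here.

```python
# Simplified anti-entropy round of Bimodal Multicast (hypothetical structure).
# Each process keeps the messages it has delivered, keyed by (sender, seqno).
import random

class Process:
    def __init__(self, name):
        self.name = name
        self.delivered = {}          # (sender, seqno) -> message body

    def digest(self):
        """A digest identifies messages but does not include them."""
        return set(self.delivered)

    def gossip_round(self, group):
        """Send a digest to one random peer; the peer solicits what it is missing."""
        peer = random.choice([p for p in group if p is not self])
        missing = self.digest() - peer.digest()               # peer checks digest vs. its history
        for msg_id in missing:                                # peer's solicitation ...
            peer.delivered[msg_id] = self.delivered[msg_id]   # ... answered by retransmission

# Tiny demo: p received a multicast that q missed; one gossip round repairs q.
p, q = Process("p"), Process("q")
p.delivered[("p", 1)] = "hello"
p.gossip_round([p, q])
print(q.delivered)   # {('p', 1): 'hello'}
```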
40
Figure 5: Graphs of analytical results
41
Our original degradation scenario
42
Distributed Indexing
The goal is to find a copy of Norah Jones’ “I don’t know”. The index contains, for example, (machine-name, object) pairs, with operations to search and update.
43
History of the problem
This is a very old problem: the Internet DNS does this, but DNS lookup is by machine name, and we want the inverted map. Napster was the first really big hit, with 5 million users at one time. Its index was centralized; it used peer-to-peer file copying once a copy was found (many issues arose…).
44
Hot academic topic today
Systems based on a virtual ring – MIT’s Chord system (Karger, Kaashoek). Many systems use Plaxton radix search – Rice’s Pastry (Druschel, Rowstron) and Berkeley’s Tapestry. Cornell: Kelips, a scheme that uses replication.
45
Kelips idea?
Treat the system as sqrt(N) “affinity” groups of size sqrt(N) each. Any given index entry is mapped to a group and replicated within it – an O(log N) time delay to do an update, which could be accelerated with an unreliable multicast. To do a lookup, find a group member (or a few of them) and ask for the item – an O(1) lookup.
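A minimal sketch of the group-mapping and lookup path under stated assumptions (the hash choice, data layout, and names are invented for illustration, not the published Kelips design):

```python
# Hypothetical sketch of Kelips-style indexing: hash an item onto one of ~sqrt(N)
# affinity groups, replicate the (item -> machine) entry within that group, and
# answer lookups by asking any member of the responsible group.
import hashlib
import math

N = 100                                   # system size (made up)
num_groups = int(math.isqrt(N))           # ~sqrt(N) affinity groups of ~sqrt(N) nodes

def group_of(item):
    """Map an index entry to its affinity group."""
    return int(hashlib.sha1(item.encode()).hexdigest(), 16) % num_groups

# Every member of a group holds a replica of that group's slice of the index.
group_index = {g: {} for g in range(num_groups)}

def update(item, machine):
    group_index[group_of(item)][item] = machine     # replicated within the group (gossip omitted)

def lookup(item):
    return group_index[group_of(item)].get(item)    # one hop: ask any member of that group

update("norah_jones_track.mp3", "cardinal.cs.cornell.edu")
print(group_of("norah_jones_track.mp3"), lookup("norah_jones_track.mp3"))
```

The point of the layout is that a lookup touches only the one responsible group, so its cost stays O(1) no matter how large N grows.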
46
Why Kelips?
Other schemes have O(log N) lookup delay, which is quite a high cost in practical settings. Others also have fragile data structures: background reorganization costs soar under stress, churn, and flash loads. Kelips has a completely constant load!
47
Solutions that share properties
These solutions are scalable, robust against localized disruption, and have emergent behavior we can reason about and exploit in the application layer. Think of the way a hive of insects organizes itself or reacts to stimuli – there are many similarities.
48
Revisit our goals
Are these potential components for sentient systems? Middleware that perceives the state of the network and represents this knowledge in a form smart applications can exploit. Although built from large numbers of rather dumb components, the emergent behavior is intelligent. These applications are more robust, more secure, and more responsive than any individual component. When something unexpected occurs, they can diagnose the problem and trigger a coordinated distributed response, and they repair themselves after damage. We seem to have the basis from which to work!
49
Brings us full circle
Our goal should be a new form of very stable “sentient middleware”. Have we accomplished this goal? We have probabilistically reliable, scalable primitives; they solve many problems and are gaining much attention now from industry and the academic research community. The fundamental issue is skepticism about peer-to-peer as a computing model.
50
Conclusions?
We’re on the verge of a breakthrough – networks that behave like sentient infrastructure on behalf of smart applications. Incredibly exciting opportunities await if we can build these. Cornell’s angle has focused on probabilistically scalable technologies and has tried to mix real systems and experimental work with stochastic analyses.