Tapestry: A Highly-available Wide-area Location and Routing Mechanism
Ben Y. Zhao, John Kubiatowicz, Anthony D. Joseph
An investigation into building a truly scalable, highly available wide-area system.

Project Overview
Context: the OceanStore global-scale persistent storage system.
Why is it hard?
–Large-scale system → frequent component faults
–Large amounts of data → performance and load bottlenecks
–Dynamic environment → changes in topology and network conditions
–More principals → attacks on the system (e.g. DoS) more likely
Previous efforts:
–The Globe Wide-area Distributed System
–The SLP wide-area extension
–The Berkeley Service Discovery Service
Project Goals:
–True scalability without centralization
–Exploit locality: local-area performance for local objects
–Availability in the face of multiple node failures and network partitions
–Self-maintenance: repair corrupted data, optimize routes

Plaxton Trees
Map objects to one of many embedded trees in the network:
–Objects are mapped to "root" nodes identified by string IDs
–For every possible suffix length in an ID, nodes keep pointers to the "nearest" neighbors sharing a suffix of that length
Routing algorithm (a code sketch follows this section):
–Start at the closest neighbor with the desired ending digit
–At each hop, match the next digit against the nearest-neighbor listing
Operations:
–Insertion: place pointers to the object on each intervening hop to the root, and at the root itself
–Query: route toward the root; stop as soon as a hop holds the desired pointer
[Figure: inserting an object and searching for Obj #62942; the search client routes toward the root node, stopping at the first hop that holds an object-location pointer]
Properties:
–Routes have at most log_B(N) hops, where B is the base of the ID and N is the number of nodes
–No centralization; reroute around failed nodes or links
–Highly scalable; exploits locality → local queries never reach the root node
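To make the per-hop rule concrete, here is a minimal Python sketch of the suffix-matching step, assuming fixed-length base-4 IDs and a routing table with one slot per (suffix level, digit). The table contents, IDs, and function names are invented for illustration, not taken from the poster.

```python
BASE, DIGITS = 4, 4   # illustrative; real Tapestry IDs are much longer (e.g. 40 hex digits)

def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing digits the two IDs have in common."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(node_id: str, dest_id: str, table: dict) -> str | None:
    """One Plaxton/Tapestry hop: extend the matched suffix by one digit.

    `table` maps (level, digit) -> the network-nearest known neighbor
    whose ID extends a level-digit suffix match by that digit.
    Returns None when this node is already the destination's root.
    """
    p = shared_suffix_len(node_id, dest_id)
    if p == DIGITS:
        return None
    return table.get((p, dest_id[-1 - p]))

# Node 3131 routes toward 0321: one trailing digit already matches ("1"),
# so the level-1 slot for digit '2' yields a neighbor ending in "21".
table = {(1, "2"): "2021", (2, "3"): "1321"}
print(next_hop("3131", "0321", table))   # -> 2021 (two trailing digits now match)
```

Because each hop resolves one more digit, a route takes at most DIGITS hops, which is the log_B(N) bound from the properties list: with B = 16 and roughly a million nodes, that is about log_16(10^6) ≈ 5 hops.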
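Building on the same per-digit rule, the following self-contained sketch simulates the two operations (insertion and query) over a random node set. For brevity every node sees the full membership (a stand-in for real per-node neighbor tables), and ties are broken by smallest ID as an illustrative proxy for choosing the network-nearest candidate; none of these names come from the poster.

```python
import random

BASE, DIGITS = 4, 4

def suffix_match(a: str, b: str) -> int:
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(nodes: set, here: str, obj_id: str) -> str | None:
    """Hop to a node sharing a strictly longer suffix with obj_id.
    The root is the best match network-wide; None means we are there."""
    root = min(nodes, key=lambda n: (-suffix_match(n, obj_id), n))
    if here == root:
        return None
    p = suffix_match(here, obj_id)
    cands = [n for n in nodes if suffix_match(n, obj_id) > p]
    return min(cands) if cands else root   # deterministic stand-in for "nearest"

def insert(nodes, pointers, publisher, obj_id):
    """Insertion: leave a location pointer at every hop on the way to
    the root, and at the root itself."""
    here = publisher
    while here is not None:
        pointers.setdefault(here, {})[obj_id] = publisher
        here = next_hop(nodes, here, obj_id)

def query(nodes, pointers, client, obj_id):
    """Query: route toward the root, stopping at the first hop that
    already holds the desired pointer."""
    here = client
    while here is not None:
        if obj_id in pointers.get(here, {}):
            return pointers[here][obj_id]
        here = next_hop(nodes, here, obj_id)
    return None

rng = random.Random(7)
nodes = {"".join(str(rng.randrange(BASE)) for _ in range(DIGITS)) for _ in range(64)}
pointers = {}
publisher, client = min(nodes), max(nodes)
obj_id = "".join(str(rng.randrange(BASE)) for _ in range(DIGITS))  # a hashed object name
insert(nodes, pointers, publisher, obj_id)
print(query(nodes, pointers, client, obj_id) == publisher)        # -> True
```

Note how locality falls out of the scheme: a query walks toward the root but returns as soon as it crosses any node on the publish path, so nearby clients never reach the root at all.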
Tapestry Enhancements
Sibling Mesh:
–A logical mesh formed by nodes with a common suffix; each mesh represents a single hop on the route to a given root
–Sibling nodes maintain pointers to each other; each node keeps pointers to a small number (n) of its nearest siblings
–Each pointer to a next hop also keeps 2 alternate nodes
–Each referrer keeps pointers to the desired node's siblings
–Result: reduced entry/exit latency and added redundancy
[Figure: three sibling meshes for one root, ground level up through level 9, showing a single path to the root, sibling pointers, and single hops to the root]
Referrer List (backpointers):
–TBD: explain tradeoff between storage and functionality here…
Availability via Replication:
–Plaxton is resilient to intermediate node failures and small partitions
–Plaxton is vulnerable to root node failures, overload of router nodes, and large/correlated partitions
–Tapestry solution: node replication
–Replication algorithm: a new replica with ID "X" searches for the existing node for "X"; negotiates with "X" to get its referrer list; uses network distance measurements to find the optimal partition; keeps regular beacons as part of the replica group; detects replica faults and takes over responsibility as necessary
–Replicas can be co-located with hotspots for better load distribution
Availability via Hashing (see the salted-hash sketch below):
–Incoming IDs are hashed with multiple salt values
–Queries are parallelized for additional redundancy
–Hinders DoS attacks by obscuring the object → node mapping
Self-repair vs. corruption:
–When a next hop fails, use an alternate node to access the sibling mesh
–Use the mesh to find the new optimal next hop
Self-optimization:
–Running queries store previous-hop IDs and distances
–Non-optimal paths are detected during traversal and fixed
Fast fault detection and recovery (see the second-chance sketch below):
–Occasional soft-state beacons between a node and its referrers
–Use active queries as heartbeats in high-traffic regions
–On fault: mark the downed node as inactive with a long lease
–Probabilistically send regular query requests to the inactive node: if recovery is detected, switch its status back to active; if the lease expires, mark it as failed and actively remove it
Security:
–Use referrer maps and the sibling mesh to isolate attackers

Extensibility
Provide a framework for active "shuttle" messages:
–"Shuttles" are protocol-specific messages tunneled inside Tapestry
–Protocol-specific modules interpret shuttles and generate events
–Allows overlay networks to leverage Tapestry's availability, fault-tolerance, and hierarchy management
–Need to verify / trust module code

Simulation Results
Availability results, node failures (plotted against Plaxton):
1. Single node failure
2. Multiple node failures
Availability results, network partitions (plotted against Plaxton):
1. Small/single partition
2. Multiple/correlated partitions
Optimality measure: minimum stretch factor as a function of network size:
1. Immediately after insertion
2. After time X, once self-optimization reaches steady state
Fault-tolerance:
1. Integration of a recovered node (second chance vs. Plaxton)
2. Recovery time (downed node or link)
3. Query latency degradation under fault conditions

Ongoing work
Further theoretical analysis of the algorithms.
How do we deal with a highly dynamic system?
–Can we tolerate a high rate of node entries and exits?
–Mobile clients: can we extend Tapestry out to the edge?
–Allow faster insertion for "guests" such as mobile nodes
More security issues:
–How to prevent flooding of route packets
–Message authentication
–DoS using frequent entries to Tapestry
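A minimal sketch of the salted-hashing idea from "Availability via Hashing" above, assuming SHA-1 as the hash and three salts; the salt strings, salt count, and function names are illustrative, not from the poster.

```python
import hashlib

SALTS = ("salt-1", "salt-2", "salt-3")   # illustrative; the salt count is a tunable

def object_ids(name: str) -> list[str]:
    """Hash one object name with several salts, yielding independent IDs.

    Each ID routes to a different root node, so the object survives a
    single root failure, queries can be issued in parallel over all IDs,
    and an attacker can no longer derive a single object -> node mapping
    from the name alone."""
    return [hashlib.sha1((salt + name).encode()).hexdigest()
            for salt in SALTS]

# A client would launch these lookups in parallel and keep the first
# successful answer.
for oid in object_ids("object-62942"):
    print(oid)
```

This is the tradeoff the poster notes call out: no extra pointer-list overhead, only the cost of storing the object a few times over in the same network.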
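And a sketch of the second-chance bookkeeping from "Fast fault detection and recovery" above, assuming illustrative constants (a one-day lease, a 10% probe rate); the class and method names are invented for this sketch.

```python
import random
import time

LEASE_SECONDS = 24 * 3600   # the "long lease": recover within this window
PROBE_RATE = 0.10           # fraction of routine queries diverted as probes

class NeighborStatus:
    """Second-chance state for one routing-table neighbor."""

    def __init__(self):
        self.state = "active"
        self.lease_expiry = None

    def on_fault(self, now: float):
        # Don't evict on first failure: mark inactive and start the lease.
        self.state = "inactive"
        self.lease_expiry = now + LEASE_SECONDS

    def wants_probe(self, rng: random.Random) -> bool:
        # While inactive, occasionally send the node a real query as a probe.
        return self.state == "inactive" and rng.random() < PROBE_RATE

    def on_probe(self, answered: bool, now: float):
        if self.state != "inactive":
            return
        if answered:                       # recovered within the lease:
            self.state = "active"          # reintegrate without re-entry cost
            self.lease_expiry = None
        elif now >= self.lease_expiry:     # lease ran out:
            self.state = "failed"          # actively remove from the tables

# Demo: a neighbor times out, then answers a probe a minute later.
n = NeighborStatus()
t0 = time.time()
n.on_fault(t0)
print(n.state)            # -> inactive
n.on_probe(answered=True, now=t0 + 60)
print(n.state)            # -> active (second chance taken)
```

The design choice, per the poster notes, is to spare servers that recover quickly the full exit/re-entry cost while still guaranteeing that persistently dead nodes are eventually purged.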

Poster Notes
1. Briefly review OceanStore:
   1. OceanStore is a globally distributed, fully secure, persistent storage system. Its focus is on reliability and guaranteed persistence. It uses multi-tiered servers to store the data and propagate updates down to clients. All data is fully encrypted everywhere, and the infrastructure is not trusted. Fragments of documents are encoded using erasure codes and let loose into the system; adaptive mechanisms place them at optimal locations. Wide-area location of, and communication between, nodes on a global scale is crucial.
   2. Any global-scale system will run into the same problems: frequent faults in the system (a large number of components), and any point of centralization will cause bottlenecks.
   3. Redundancy and the associated cost in storage is an acceptable tradeoff for reliability/availability.
   4. What is the problem with existing wide-area systems? There is ALWAYS a point of centralization somewhere → a performance and scaling bottleneck.
   5. What's new here: no centralization, and redundancy on top of redundancy.
2. Review the Plaxton work:
   1. It was primarily theory work, missing features desirable in a real system.
   2. It was never implemented or simulated before (AFAIK), and there are no empirical results.
   3. Stress its "limited" resilience against node failures (since every intermediate node is also a root node), and its vulnerability to massive network partitions or bisections.
   4. See the poster for algorithm details and the graph; focus on the key properties of locality, fault-tolerance, and true scalability via randomized distribution.
3. Explain assumptions:
   1. Tapestry is designed to work best over a relatively stable system; it needs time to optimize itself to steady state.
   2. It needs a strong measurement infrastructure in order to learn about network distances between nodes.
4. Discuss the Tapestry enhancements one by one:
   1. Sibling mesh: making what was already there more explicit.
   2. Explain the 4-tiered diagram: think of siblings with the same-length suffix as points on the same 2-D mesh; as you route closer and closer to the root node, you traverse up a 3-D canopy of meshes.
   3. Explain how replication solves node-failure problems, routing-bottleneck problems, and locality problems.
   4. Availability via hashing: very good redundancy without incurring the overhead of the pointer list again, only the overhead of storing X times more data (1 object becomes 3 separate objects, but all 3 are stored in the same network).
   5. Self-repair: crucial in wide-area systems, since faults are many, distributed, and often hard to get to and fix in time.
   6. Self-optimization: this allows the algorithms to not have to do a "perfect job", just a good-enough one, and then lets self-optimization take over.
   7. The second-chance algorithm is tuned toward allowing servers that recover within some period of time (say 1 day) to pick up where they left off and not incur the high exit/entry cost (a probabilistic algorithm proactively probes the node for activity on a regular basis).
   8. Security: use Stefan Savage's and Dawn Song et al.'s algorithms to do traceback, then isolate and quarantine malicious nodes from the network.

Discussion Notes
–Explore the use of backpointers more. Is the cost worth the benefit? They are very expensive at the lower nodes of the sibling mesh; maybe use a probabilistic argument to justify an incomplete but smaller subset of referrers.
–Don't overwhelm with details; pick a few items from the enhancement list to discuss.
–Take a close look at the simulation results (which will be ready by the actual conference date). Availability should be quite good; the key is the optimality measure. The minimum stretch factor (the ratio of the overlay route's length to the direct network distance) should be ~3 or ~4; both are acceptable. If it is much larger, then routes are too inefficient. (I suspect it will be <3.)
–Spend as much time on future work as possible:
   Get ideas on security: how to identify malicious nodes using fast entry/exit.
   Get ideas on the storage/availability tradeoff.
   What about replication consistency?