SkipNet: A Scalable Overlay Network with Practical Locality Properties Nick Harvey, Mike Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman Microsoft Research.

Presentation transcript:

SkipNet: A Scalable Overlay Network with Practical Locality Properties
Nick Harvey, Mike Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman
Microsoft Research, University of Washington

Overlay Networks
- Overlays have achieved several goals:
  - Scalable and decentralized infrastructure
  - Uniform and random load and data distribution
- But, at the price of data controllability:
  - Data may be stored far from its users
  - Data may be stored outside its domain
  - Local accesses leave local organization
- Basic trade-off: data controllability vs. data uniformity
- SkipNet:
  - Traditional overlay functionality
  - Provides an abstraction to control this trade-off: Constrained load balancing (CLB)

Talk Outline
- Practical data locality requirements
- Basic SkipNet design
- SkipNet locality properties
- Performance evaluation
- Conclusions

Key Locality Properties and Abstraction
- In practice, two properties are important:
  - Content Locality: ability to explicitly place data
    - Placement on a single node or on a set of nodes
  - Path Locality: ability to guarantee that local traffic remains local
- One abstraction is important, CLB:
  - SkipNet abstraction to control the trade-off
  - Multiple DHT scopes within one single overlay

Practical Requirements
- Data Controllability:
  - Organizations want control over their own data
  - Even if local data is globally available
- Manageability:
  - Data control allows for data administration, provisioning and manageability
  - Data center/cluster = constrained set of nodes
  - CLB ensures load balance across data center/cluster

Practical Requirements (contd)
- Security:
  - Content and path locality are key building blocks for dealing with certain external attacks
- Data availability:
  - Local data survives network partitions
- Performance:
  - Data can be stored near clients that use it

Talk Outline
- Practical data locality requirements
- Basic SkipNet design
- SkipNet locality properties
- Performance evaluation
- Conclusions

SkipNet
- Key property: two address spaces
  1. Name ID space: nodes are sorted by their names (e.g. DNS names)
  2. Numeric ID space: nodes are randomly distributed
- Combining both spaces achieves:
  - Content + Path locality
  - Other uses could emerge: range queries [AS 03]
- Scalable peer-to-peer overlay network
  - O(log N) routing performance in both spaces
  - O(log N) routing state per node

SkipNet Ring
- Pointers at level h skip over 2^h nodes
- Nodes are ordered by names
[Figure: ring containing nodes A, D, M, O, T, V, X, Z]

SkipNet Ring
- Pointers at level h skip over 2^h nodes
- Nodes are ordered by names
[Figure: ring with node labels A, E, F, M, H, S, Z, G]
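
As a rough illustration of the pointer rule on these slides, the sketch below lists the clockwise pointers of one node in an idealized ring, where the level-h pointer lands exactly 2^h positions further along the name-sorted ring. The function name `ideal_pointers` and the eight single-letter node names are assumptions made for the example; a real SkipNet reaches this spacing only in expectation, through random numeric IDs.

```python
def ideal_pointers(ring, node):
    """Clockwise pointers of `node` in an idealized, name-sorted ring.

    The level-h pointer lands 2**h positions clockwise of `node`.
    This is a hypothetical helper for illustration, not SkipNet's API.
    """
    n = len(ring)
    start = ring.index(node)
    pointers = {}
    level = 0
    while 2 ** level < n:
        pointers[level] = ring[(start + 2 ** level) % n]
        level += 1
    return pointers


# Eight nodes ordered by name, as on the ring slide.
ring = sorted(["A", "D", "M", "O", "T", "V", "X", "Z"])
print(ideal_pointers(ring, "A"))
```

With eight nodes each node keeps three clockwise pointers, which is the O(log N) routing state per node mentioned earlier in the talk.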

SkipNet Global View
[Figure: ring hierarchy by level, as far as it can be recovered from the slide]
- L = 0: Root Ring: A, D, M, O, T, V, X, Z
- L = 1: Ring 0: A, M, T, X; Ring 1: D, O, V, Z
- L = 2: Ring 00: A, T; Ring 01: M, X; Ring 10: D, V; Ring 11: O, Z
- L = 3: Ring 000: A; Ring 001: T; Ring 010: M; Ring 011: X; Ring 100: D; Ring 101: V; Ring 110: Z; Ring 111: O
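
The ring hierarchy in this figure can be sketched as a grouping of nodes by numeric-ID prefix: a node belongs to the level-h ring named by the first h bits of its numeric ID, and each ring is kept sorted by name. The numeric IDs below are read off the slide's figure and should be treated as an assumption, as should the helper name `build_rings`.

```python
from collections import defaultdict

# Numeric IDs as they appear to be drawn on the slide (an assumption).
NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "D": "100", "V": "101", "Z": "110", "O": "111"}


def build_rings(numeric_id, levels):
    """Return {level: {prefix: [names sorted by name]}} for levels 0..levels."""
    rings = {}
    for h in range(levels + 1):
        groups = defaultdict(list)
        for name, bits in numeric_id.items():
            # Nodes sharing the first h bits share a level-h ring.
            groups[bits[:h]].append(name)
        rings[h] = {prefix: sorted(members) for prefix, members in groups.items()}
    return rings


rings = build_rings(NUMERIC_ID, 3)
for level, by_prefix in rings.items():
    print(level, dict(sorted(by_prefix.items())))
```

The empty prefix at level 0 stands for the root ring, which contains every node.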

Two Address Spaces
- SkipNet can route efficiently in both address spaces:
  - Name ID space (e.g. DNS names)
  - Numeric ID space

Routing by Name ID
- Example: route from A to V
- Simple Rule: Forward the message to node that is closest to dest, without going too far.
[Figure: ring hierarchy for levels L = 0 to L = 3 (Root Ring; Ring 0 and Ring 1; Ring 00 to Ring 11; Ring 000 to Ring 111) over nodes A, D, M, O, T, V, X, Z. Successive animation frames highlight Node A's Routing Table and then Node T's Routing Table.]
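
The "closest to dest, without going too far" rule can be sketched as follows: at each node, try the pointers from the highest level down and take the first one that moves clockwise toward the destination without passing it. This is a simplified, clockwise-only sketch; the numeric IDs are read off the figure and the helper names (`ring_members`, `clockwise_neighbor`, `route_by_name`) are assumptions, not names from the SkipNet implementation.

```python
# Numeric IDs as they appear to be drawn on the slide (an assumption).
NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "D": "100", "V": "101", "Z": "110", "O": "111"}
ROOT = sorted(NUMERIC_ID)  # root ring: all nodes ordered by name


def ring_members(node, level):
    """Name-sorted members of the level-`level` ring that contains `node`."""
    prefix = NUMERIC_ID[node][:level]
    return sorted(n for n, bits in NUMERIC_ID.items() if bits.startswith(prefix))


def clockwise_neighbor(node, level):
    ring = ring_members(node, level)
    return ring[(ring.index(node) + 1) % len(ring)]


def clockwise_distance(src, dst):
    """Number of root-ring positions from src to dst, going clockwise."""
    return (ROOT.index(dst) - ROOT.index(src)) % len(ROOT)


def route_by_name(src, dst, max_level=3):
    path = [src]
    current = src
    while current != dst:
        remaining = clockwise_distance(current, dst)
        # Highest level first: the longest jump that does not overshoot.
        for level in range(max_level, -1, -1):
            nxt = clockwise_neighbor(current, level)
            if 0 < clockwise_distance(current, nxt) <= remaining:
                current = nxt
                break
        path.append(current)
    return path


print(route_by_name("A", "V"))
```

Under these assumptions the route from A to V passes through T, which agrees with the slide frames that show node A's routing table followed by node T's.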

Routing by Numeric ID
- Provides the basic DHT primitive
- To store file Foo.c:
  - Hash(Foo.c) → a random numeric ID
  - Find highest ring matching that numeric ID
  - Store file on node in that ring
- Log N routing efficiency

DHT Example
- Store file Foo.c from node A
- Hash(Foo.c) = 101…
- Route from A to V in numeric space
[Figure: the ring hierarchy for levels L = 0 to L = 3, with Foo.c stored at node V]
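
The numeric-ID lookup in this example can be sketched as a climb through the ring hierarchy: walk clockwise around the current ring until a node matches one more bit of the target, then continue in that node's next-level ring. The numeric IDs are read off the figure and the function names are assumptions. The target is passed in directly as a bit string, since the slide only gives the first bits of Hash(Foo.c).

```python
# Numeric IDs as they appear to be drawn on the slide (an assumption).
NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "D": "100", "V": "101", "Z": "110", "O": "111"}


def ring_members(prefix):
    """Name-sorted nodes whose numeric ID starts with `prefix`."""
    return sorted(n for n, bits in NUMERIC_ID.items() if bits.startswith(prefix))


def route_by_numeric_id(src, target_bits):
    """Return the nodes visited while routing from `src` toward `target_bits`."""
    path = [src]
    current = src
    level = 0
    while level < len(target_bits):
        # `current` already matches the first `level` bits of the target.
        ring = ring_members(target_bits[:level])
        start = ring.index(current)
        walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]
        wanted = target_bits[:level + 1]
        hops = next((i for i, n in enumerate(walk)
                     if NUMERIC_ID[n].startswith(wanted)), None)
        if hops is None:
            break  # no higher ring matches; stop in the highest matching ring
        path.extend(walk[1:hops + 1])
        current = walk[hops]
        level += 1
    return path


print(route_by_numeric_id("A", "101"))
```

Starting at A with target bits 101, the sketch ends at V, the node that the slide names as the destination.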

Talk Outline
- Practical data locality requirements
- Basic SkipNet design
- SkipNet locality properties
- Performance evaluation
- Conclusions

Constrained Load Balancing (CLB)
- Multiple DHTs with differing scopes using a single SkipNet structure
- A result of the ability to route in both address spaces
- Divide data object names into 2 parts using the ! special character:
    CLB Domain (name routing)    CLB Suffix (numeric routing)
    microsoft.com              ! skipnet.html

CLB Example
- To read file com.microsoft ! skipnet.html:
  - Route by name ID to com.microsoft
  - Route by numeric ID to Hash(skipnet.html) within the com.microsoft constraint
[Figure: ring with segments com.sun, edu.ucb, gov.irs and com.microsoft; skipnet.html is stored inside the com.microsoft segment]
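
The two steps of the CLB example can be sketched as: split the object name at `!`, restrict the candidate nodes to the named domain, then pick a node inside that domain from the hash of the suffix. Everything concrete here is an assumption for illustration: the node names, the use of SHA-1 for numeric IDs, and XOR distance as a stand-in for "the highest ring matching the numeric ID". A real SkipNet would reach the node by routing, not by scanning a node list.

```python
import hashlib


def numeric_id(text):
    """Hypothetical 32-bit numeric ID derived from SHA-1 (an assumption)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def split_clb_name(name):
    """Split 'domain ! suffix' into (domain, suffix)."""
    domain, sep, suffix = name.partition("!")
    if not sep:
        raise ValueError("CLB names need a '!' separator")
    return domain.strip(), suffix.strip()


def clb_lookup(name, nodes):
    """Pick the node inside the CLB domain whose numeric ID best matches the suffix hash."""
    domain, suffix = split_clb_name(name)
    # Step 1 (name routing): only nodes under the domain are candidates.
    candidates = sorted(n for n in nodes
                        if n == domain or n.startswith(domain + "."))
    if not candidates:
        raise LookupError("no node in domain " + domain)
    # Step 2 (numeric routing): the smallest XOR distance shares the longest ID prefix.
    target = numeric_id(suffix)
    return min(candidates, key=lambda n: numeric_id(n) ^ target)


NODES = ["com.microsoft.research", "com.microsoft.www", "com.sun.java",
         "edu.ucb.cs", "gov.irs.www"]
print(clb_lookup("com.microsoft ! skipnet.html", NODES))
```

Whichever node is chosen, it always lies inside the `com.microsoft` domain, which is the constraint that CLB adds on top of an ordinary DHT lookup.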

SkipNet Path Locality
- Organizations correspond to contiguous SkipNet segments
- Internal routing by NameID remains internal
- Nodes have left / right pointers
[Figure: ring with segments com.sun, edu.ucb, gov.irs and com.microsoft, with com.microsoft.research inside com.microsoft]
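
The contiguity claim can be checked with a small sketch: in a name-sorted list, all nodes sharing a name prefix sit next to each other, so every node lying between two nodes of the same organization also belongs to that organization. The node names and helper functions below are hypothetical.

```python
def segment_indices(sorted_names, prefix):
    """Positions of all names that start with `prefix`."""
    return [i for i, name in enumerate(sorted_names) if name.startswith(prefix)]


def is_contiguous(indices):
    return not indices or indices == list(range(indices[0], indices[-1] + 1))


def names_between(sorted_names, src, dst):
    """Names passed when walking the sorted order from src to dst without wrapping."""
    i, j = sorted_names.index(src), sorted_names.index(dst)
    lo, hi = min(i, j), max(i, j)
    return sorted_names[lo:hi + 1]


nodes = sorted(["gov.irs.www", "com.microsoft.www", "edu.ucb.cs",
                "com.microsoft.research.a", "com.sun.java",
                "com.microsoft.research.b"])

print(is_contiguous(segment_indices(nodes, "com.microsoft.")))
print(names_between(nodes, "com.microsoft.www", "com.microsoft.research.a"))
```

Because name-ID routing never overshoots its destination, a message between two nodes of one organization only visits names in that range, which is why the left and right pointers matter: the message can travel in whichever direction stays inside the segment.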

Fault Tolerance
- Many failures occur along organizational boundaries:
  - Gateway/firewall failure, BGP misconfig, physical network cut, …
- SkipNet handles organizational disconnect gracefully
  - Results in two well-connected, partitioned SkipNets
  - Efficient remerging algorithms
- Node independent failures
  - Same resiliency as systems such as Chord and Pastry
  - Similar approach to repair (Leaf Set)

Primary Security Benefit & Weakness
+ SkipNet + name access control mechanism:
  - Content locality ensures that content stays within organization
  - Path locality prevents:
    - malicious forwarders
    - analysis of internal traffic
    - external tampering
- Easier to target organizations:
  - Someone creates one million nodes with name prefixes microsofa.com and microsort.com
  - Most traffic to/from Microsoft will go through a microsofa / microsort intermediate node
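
The weakness on this slide follows from the name ordering: names beginning with microsofa sort immediately before those beginning with microsoft, and names beginning with microsort sort immediately after. The sketch below shows this with three hypothetical nodes per prefix instead of one million; the host and node names are invented for the example.

```python
def ring_neighbors(ring, segment_prefix):
    """Return the names just outside a prefix segment of a name-sorted ring."""
    inside = [i for i, name in enumerate(ring) if name.startswith(segment_prefix)]
    first, last = inside[0], inside[-1]
    left = ring[(first - 1) % len(ring)]
    right = ring[(last + 1) % len(ring)]
    return left, right


# Hypothetical node names; the slide's attack uses far more attacker nodes.
victims = [f"microsoft.com/host{i}" for i in range(3)]
attackers = ([f"microsofa.com/node{i}" for i in range(3)]
             + [f"microsort.com/node{i}" for i in range(3)])
ring = sorted(victims + attackers)

print(ring_neighbors(ring, "microsoft.com"))
```

Both neighbors of the victim segment are attacker nodes, so traffic entering or leaving the segment by name is likely to pass through one of them.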

Talk Outline
- Practical data locality requirements
- Basic SkipNet design
- SkipNet locality properties
- Performance evaluation
- Conclusions

Methodology
- Packet-level, event-driven simulator:
  - SkipNet implementation
    - Basic SkipNet
    - Full SkipNet = Basic SkipNet + network proximity
  - Pastry and Chord implementation
  - Uses Mercator and GT-ITM network topologies
- Experimentally evaluated:
  - Name ID routing performance
  - Tolerance to organizational disconnect
  - Numeric ID routing performance
  - Effectiveness of network proximity optimizations
  - Effectiveness of CLB routing optimizations

Routing by Name ID Performance
- Benefits come at no extra cost

Surviving Organizational Disconnect
- Disconnected Org Size = 15% of all nodes

Conclusions
- SkipNet:
  - Traditional overlay functionality
  - Explicit control of data placement
  - Constrained load balancing
- Content + Path Locality are basic ingredients to:
  - Data controllability
  - Manageability
  - Security
  - Data availability
  - Performance

Questions?