CS 268: Lecture 22 (Peer-to-Peer Networks)


CS 268: Lecture 22 (Peer-to-Peer Networks)
Ion Stoica
April 10, 2001

How Did It Start?
- A killer application: Napster
  - Free music over the Internet
- Key idea: share the storage and bandwidth of individual (home) users

Model
- Each user stores a subset of files
- Each user can access (download) files from all users in the system

Main Challenge
- Find where a particular file is stored
- Note: the problem is similar to finding a particular page in web caching (see last lecture – what are the differences?)
[Figure: machines A–F, each holding some files; one machine issues a query “E?” for file E]

Other Challenges
- Scale: up to hundreds of thousands or millions of machines
- Dynamicity: machines can come and go at any time

Napster
- Assume a centralized index system that maps files (songs) to machines that are alive
- How to find a file (song):
  - Query the index system → it returns a machine that stores the required file
    - Ideally this is the closest/least-loaded machine
  - ftp the file
- Advantages: simplicity; easy to implement sophisticated search engines on top of the index system
- Disadvantages: robustness, scalability (?)
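To make the centralized-index idea concrete, here is a minimal sketch (not Napster’s actual protocol; the class and method names are invented for illustration):

    # Minimal sketch of a Napster-style centralized index (hypothetical names).
    class CentralIndex:
        def __init__(self):
            self.files = {}  # file name -> set of machine names that store it

        def register(self, machine, file_names):
            """A machine announces the files it is willing to share."""
            for name in file_names:
                self.files.setdefault(name, set()).add(machine)

        def lookup(self, name):
            """Return some machine that stores the file, or None if unknown."""
            machines = self.files.get(name)
            # A real index would ideally pick the closest / least-loaded machine.
            return next(iter(machines)) if machines else None

    index = CentralIndex()
    index.register("m5", ["E"])
    index.register("m6", ["F"])
    print(index.lookup("E"))  # -> 'm5'; the client then ftps E directly from m5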

Napster: Example
[Figure: the centralized index maps each file (A–F) to the machine that stores it (m1–m6); a peer asks the index “E?”, learns that m5 stores E, and downloads E directly from m5]

Gnutella
- Distribute the file location
- Idea: multicast the request
- How to find a file:
  - Send the request to all neighbors
  - Neighbors recursively multicast the request
  - Eventually a machine that has the file receives the request and sends back the answer
- Advantages: totally decentralized, highly robust
- Disadvantages: not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
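The flooded search can be written as a toy simulation (an illustration of the idea only, not the real Gnutella protocol; the neighbor graph and TTL handling are simplified assumptions):

    # Toy sketch of Gnutella-style flooding with a TTL.
    def flood_query(node, file_id, ttl, graph, stored, seen=None):
        """Return the set of nodes that have file_id and were reached by the flood."""
        if seen is None:
            seen = set()
        if node in seen or ttl <= 0:
            return set()
        seen.add(node)
        hits = {node} if file_id in stored.get(node, set()) else set()
        for neighbor in graph.get(node, []):
            # Each neighbor recursively re-floods the request with a smaller TTL.
            hits |= flood_query(neighbor, file_id, ttl - 1, graph, stored, seen)
        return hits

    # m1's neighbors are m2 and m3; m3's neighbors are m4 and m5 (as in the example slide).
    graph = {"m1": ["m2", "m3"], "m2": [], "m3": ["m4", "m5"], "m4": [], "m5": []}
    stored = {"m5": {"E"}}
    print(flood_query("m1", "E", ttl=4, graph=graph, stored=stored))  # -> {'m5'}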

Gnutella: Example
- Assume: m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5; …
[Figure: m1’s query “E?” floods from m1 to m2 and m3, then from m3 to m4 and m5; m5, which stores E, sends the answer back]

Freenet
- Additional goals to file location:
  - Provide publisher anonymity, security
  - Resistance to attacks – a third party shouldn’t be able to deny access to a particular file (data item, object), even if it compromises a large fraction of machines
- Architecture:
  - Each file is identified by a unique identifier
  - Each machine stores a set of files and maintains a “routing table” to route the individual requests

Data Structure
- Each node maintains a routing table (a stack of entries); each entry contains:
  - id – a file identifier
  - next_hop – another node that stores the file id
  - file – the file identified by id, if it is stored on the local node
- Forwarding (see the sketch below):
  - Each message contains the file id it refers to
  - If the file id is stored locally, stop
  - If not, search for the “closest” id in the table and forward the message to the corresponding next_hop
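A rough sketch of the per-node table and the “closest id” forwarding choice (the field layout and the distance metric are assumptions for illustration, not Freenet’s exact format):

    # Sketch of a Freenet-style routing table entry and next-hop selection.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Entry:
        id: int                       # file identifier
        next_hop: str                 # another node believed to store id
        file: Optional[bytes] = None  # the file itself, if stored locally

    def next_hop_for(table, query_id):
        """Local hit wins; otherwise forward to the entry whose id is closest."""
        for e in table:
            if e.id == query_id and e.file is not None:
                return None           # stored locally: stop here
        closest = min(table, key=lambda e: abs(e.id - query_id))
        return closest.next_hop

    table = [Entry(4, "n1", b"f4"), Entry(12, "n2", b"f12"), Entry(5, "n3")]
    print(next_hop_for(table, 9))     # -> 'n2' (id 12 is closest to 9 here)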

Query
- API: file = query(id);
- Upon receiving a query for document id:
  - Check whether the queried file is stored locally
  - If yes, return it
  - If not, forward the query message
- Notes:
  - Each query carries a TTL that is decremented each time the query message is forwarded; to obscure the distance to the originator:
    - the TTL can be initialized to a random value within some bounds
    - when TTL = 1, the query is forwarded only with a finite probability
  - Each node maintains state for all outstanding queries that have traversed it → helps to avoid cycles
  - When the file is returned, it is cached along the reverse path
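Putting these notes together, per-node query handling might look roughly like this (a sketch under the same assumptions as above; the forwarding probability at TTL = 1 is an arbitrary placeholder):

    # Sketch of query handling in a Freenet-like node (illustrative only).
    import random

    def handle_query(node, query_id, ttl, tables, outstanding):
        """tables: node -> list of {'id', 'next_hop', 'file'} entries;
        outstanding: node -> set of query ids already seen (to avoid cycles)."""
        table = tables[node]
        for e in table:
            if e["id"] == query_id and e["file"] is not None:
                return e["file"]                 # found locally
        if query_id in outstanding[node]:
            return None                          # already seen: break the cycle
        outstanding[node].add(query_id)          # remember the outstanding query
        if ttl <= 1 and random.random() > 0.25:  # at TTL = 1, keep forwarding
            return None                          # only with a finite probability
        closest = min(table, key=lambda e: abs(e["id"] - query_id))
        result = handle_query(closest["next_hop"], query_id, ttl - 1,
                              tables, outstanding)
        if result is not None:
            # cache the returned file along the reverse path
            table.insert(0, {"id": query_id, "next_hop": closest["next_hop"],
                             "file": result})
        return result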

Query Example
[Figure: routing tables of nodes n1–n5; the query is forwarded hop by hop (steps 1–5) toward the entry with the closest id, with step 4’ showing backtracking after a dead end]
Note: doesn’t show file caching on the reverse path

Insert
- API: insert(id, file);
- Two steps:
  - Search for the file to be inserted
    - If found, report a collision
    - If the number of nodes (TTL) is exhausted, report failure
  - If not found, insert the file

Insert
- Searching: like a query, but nodes maintain state after a collision is detected and the reply is sent back to the originator
- Insertion:
  - Follow the forward path; insert the file at all nodes along the path
  - A node probabilistically replaces the originator with itself, to obscure the true originator
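A correspondingly rough sketch of the insertion phase, showing the probabilistic originator rewrite (the helper name, table layout, and replacement probability are invented for illustration):

    # Sketch of the Freenet insert path along the nodes found by the search phase.
    import random

    def insert_along_path(path, file_id, data, tables, originator, p_replace=0.5):
        """Insert (file_id, data) at every node along the forward path."""
        claimed_origin = originator
        for node in path:
            # Store the file and record who (allegedly) inserted it.
            tables[node].insert(0, {"id": file_id, "next_hop": claimed_origin,
                                    "file": data})
            if random.random() < p_replace:
                # A node probabilistically replaces the originator with itself,
                # obscuring the true originator from nodes further down the path.
                claimed_origin = node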

Insert Example
- Assume the query returned failure along the “gray” path; insert f10
[Figure: routing tables of nodes n1–n5 before insert(10, f10) is issued]

Insert Example
[Figure: insert(10, f10) is issued at n1; n1 adds the entry (10, n1, f10) to its routing table and forwards the insert toward n2 with orig = n1]

Insert Example
- n2 replaces the originator (n1) with itself
[Figure: the insert reaches n2, which stores the entry (10, n1, f10) and forwards the insert onward with orig = n2]

Insert Example
- n2 replaces the originator (n1) with itself
[Figure: f10 ends up stored at every node along the path, but different nodes record different originators (n1, n2, n4), so the true originator is obscured]

Freenet Properties
- Newly queried/inserted files are stored on nodes with similar ids
- New nodes can announce themselves by inserting files
- Attempts to supplant or discover existing files will just spread the files

Freenet Summary
- Advantages:
  - Provides publisher anonymity
  - Totally decentralized architecture → robust and scalable
  - Resistant against malicious file deletion
- Disadvantages:
  - Does not always guarantee that a file is found, even if the file is in the network

Other Solutions to the Location Problem
- Goal: make sure that an identified item (file) is always found
- Abstraction: a distributed hash-table data structure (interface sketched below)
  - insert(id, item);
  - item = query(id);
  - Note: item can be anything – a data object, document, file, pointer to a file, …
- Proposals:
  - CAN (ACIRI/Berkeley)
  - Chord (MIT/Berkeley)
  - Pastry (Rice)
  - Tapestry (Berkeley)
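The shared abstraction is just this two-call interface; the sketch below writes it down without committing to any particular system’s implementation:

    # The DHT abstraction shared by CAN, Chord, Pastry, and Tapestry:
    # a hash table whose buckets are spread across many nodes.
    class DistributedHashTable:
        def insert(self, id, item):
            """Store item under identifier id somewhere in the system."""
            raise NotImplementedError

        def query(self, id):
            """Return the item stored under id, wherever it lives."""
            raise NotImplementedError

    # item can be anything: a data object, a document, a file,
    # or just a pointer to where the file actually lives.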

Content Addressable Network (CAN)
- Associate with each node and item a unique id in a d-dimensional space
- Properties:
  - Routing table size O(d)
  - Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes

CAN Example: Two Dimensional Space
- The space is divided between the nodes
- Together, all nodes cover the entire space
- Each node covers either a square or a rectangular area with side ratio 1:2 or 2:1
- Example:
  - Assume a space of size 8 x 8
  - Node n1:(1, 2) is the first node that joins → it covers the entire space
[Figure: an 8 x 8 grid; n1 at (1, 2) owns the whole space]

CAN Example: Two Dimensional Space
- Node n2:(4, 2) joins → the space is divided between n1 and n2
[Figure: the grid is split vertically; n1 owns the left half, n2 owns the right half]

CAN Example: Two Dimensional Space
- Node n3:(3, 5) joins → n1’s zone is divided between n1 and n3
[Figure: the left half is split horizontally; n1 keeps the bottom-left quadrant, n3 takes the top-left quadrant]

CAN Example: Two Dimensional Space
- Nodes n4:(5, 5) and n5:(6, 6) join
[Figure: the upper-right region is further subdivided between n4 and n5]

CAN Example: Two Dimensional Space
- Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
- Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
[Figure: the items placed at their coordinates in the partitioned 8 x 8 space]

CAN Example: Two Dimensional Space
- Each item is stored by the node that owns the zone containing its point in the space (e.g., f1 and f3 fall in n1’s zone)
[Figure: same space as before, with each item shown inside its owner’s zone]

CAN: Query Example
- Each node knows its neighbors in the d-dimensional space
- Forward the query to the neighbor that is closest to the query id (a greedy-forwarding sketch follows below)
- Example: assume n1 queries f4
[Figure: the query travels hop by hop from n1 toward the zone that contains f4:(7, 5)]
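A minimal sketch of the greedy forwarding step (only the “pick the neighbor closest to the query point” choice is shown; zone management and neighbor discovery are assumed to exist already):

    # Sketch of CAN-style greedy forwarding in a d-dimensional space.
    import math

    def closest_neighbor(neighbors, query_point):
        """neighbors: dict of neighbor name -> coordinates.
        Return the neighbor whose coordinates are closest to the query point."""
        def dist(p, q):
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        return min(neighbors, key=lambda n: dist(neighbors[n], query_point))

    # n1 at (1, 2) queries f4, which maps to the point (7, 5); it forwards the
    # query toward its closest neighbor, and every hop repeats the same rule.
    neighbors_of_n1 = {"n2": (4, 2), "n3": (3, 5)}
    print(closest_neighbor(neighbors_of_n1, (7, 5)))  # -> 'n3' (4.0 vs ~4.24)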

Chord
- Associate with each node and item a unique id in a one-dimensional space
- Properties:
  - Routing table size O(log(N)), where N is the total number of nodes
  - Guarantees that a file is found in O(log(N)) steps

Data Structure
- Assume the identifier space is 0..2^m
- Each node maintains:
  - A finger table: entry i in the finger table of node n is the first node that succeeds or equals n + 2^i (construction sketched below)
  - Its predecessor node
- An item identified by id is stored on the successor node of id
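A small sketch of how a finger table follows from this rule (the global view of all node ids is an assumption made for clarity; a real Chord node learns its fingers incrementally):

    # Sketch: build node n's finger table from the full set of node ids.
    def successor(ident, node_ids, m):
        """First node id that succeeds or equals ident on a ring of size 2^m."""
        space = 2 ** m
        ident %= space
        return min(node_ids, key=lambda node: (node - ident) % space)

    def finger_table(n, node_ids, m):
        """Entry i points to the successor of n + 2^i."""
        return [successor(n + 2 ** i, node_ids, m) for i in range(m)]

    nodes = [0, 1, 3, 6]              # node ids on a ring with m = 3 (ids 0..7)
    print(finger_table(1, nodes, 3))  # -> [3, 3, 6]: successors of 2, 3 and 5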

Chord Example
- Assume an identifier space 0..8
- Node n1:(1) joins → all entries in its finger table are initialized to itself

  Succ. Table of n1:
  i | id+2^i | succ
  0 |   2    |  1
  1 |   3    |  1
  2 |   5    |  1

[Figure: identifier ring 0–7 with n1 at position 1]

Chord Example
- Node n2:(3) joins
[Figure: identifier ring 0–7 with n1 and n2; the finger tables (Succ. Tables) of both nodes are updated to point at each other where appropriate]

Chord Example
- Nodes n3:(0) and n4:(6) join
[Figure: identifier ring 0–7 with all four nodes; each node’s finger table entry i now points to the first node that succeeds or equals id + 2^i]

Chord Examples
- Nodes: n1:(1), n2:(3), n3:(0), n4:(6)
- Items: f1:(7), f2:(2)
[Figure: each item is stored on the successor node of its id; the figure shows the four finger tables and the item placement on the ring]

Query
- Upon receiving a query for item id, a node:
  - Checks whether it stores the item locally
  - If not, forwards the query to the largest node in its successor (finger) table that does not exceed id (see the lookup sketch below)
[Figure: query(7) is issued at one node and forwarded along the ring until it reaches the successor of 7, which stores f1]
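A sketch of that lookup rule (again assuming a global view for brevity; the finger and successor values in the example are computed directly from the rule on the Data Structure slide):

    # Sketch of Chord-style lookup: at each node, jump to the largest finger
    # that does not overshoot the item id (illustrative, not the full protocol).
    def in_interval(x, start, end, space):
        """True if x lies in the half-open ring interval (start, end]."""
        x, start, end = x % space, start % space, end % space
        return x != start and (x - start) % space <= (end - start) % space

    def lookup(start_node, item_id, fingers, successors, m):
        """fingers: node -> list of finger node ids; successors: node -> next node."""
        space = 2 ** m
        node = start_node
        while not in_interval(item_id, node, successors[node], space):
            # Largest finger that does not exceed item_id (closest preceding node).
            candidates = [f for f in fingers[node]
                          if f != item_id and in_interval(f, node, item_id, space)]
            if not candidates:
                break
            node = max(candidates, key=lambda f: (f - node) % space)
        return successors[node]       # the successor of item_id stores the item

    # Ring 0..7 (m = 3) with nodes {0, 1, 3, 6}, as in the example slides:
    fingers = {0: [1, 3, 6], 1: [3, 3, 6], 3: [6, 6, 0], 6: [0, 0, 3]}
    successors = {0: 1, 1: 3, 3: 6, 6: 0}
    print(lookup(1, 7, fingers, successors, 3))  # -> 0: node 0 stores item f1:(7)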

Discussion
- A query can be implemented:
  - Iteratively
  - Recursively
- Performance: routing in the overlay network can be more expensive than in the underlying network
  - Because there is usually no correlation between node ids and their locality; a query can repeatedly jump from Europe to North America, even though both the initiator and the node that stores the item are in Europe!
  - Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest in terms of network distance

Discussion
- Robustness:
  - Maintain multiple copies associated with each entry in the routing tables
  - Replicate an item on nodes with close ids in the identifier space
- Security:
  - Can be built on top of CAN, Chord, Tapestry, and Pastry

Conclusions
- The key challenge of building wide-area P2P systems is a scalable and robust location service
- Solutions covered in this lecture:
  - Napster: centralized location service
  - Gnutella: broadcast-based decentralized location service
  - Freenet: intelligent-routing decentralized solution (but correctness is not guaranteed; queries for existing items may fail)
  - CAN, Chord, Tapestry, Pastry: intelligent-routing decentralized solutions
    - Guarantee correctness
    - Tapestry (Pastry?) provides efficient routing, but is more complex