CS522: Algorithmic and Economic Aspects of the Internet
Instructors: Nicole Immorlica, Mohammad Mahdian


Peer-To-Peer Networks
- A network in which nodes employ distributed resources to accomplish a critical task
- Nodes are typically equals, i.e. (approximately) indistinguishable in functionality
- The system is highly dynamic: nodes frequently come and go

P2P Definition
"Distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources such as content, CPU cycles, storage and bandwidth, capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without requiring the intermediation or support of a global centralized server or authority."
– A Survey of Peer-To-Peer Content Distribution Technologies, Androutsellis-Theotokis and Spinellis

Peer-To-Peer Applications
- Direct real-time communication: instant messaging
- Combining the processing power of multiple distributed machines to perform complex computations: analysis of SETI data, prime computation
- Storing and distributing digital content: mp3 file sharing

Peer-To-Peer Benefits
- Self-organized and adaptive
- Easily scalable
- Fault-tolerant and load balanced
- Resistant to censorship

P2P Construction
[Figure: a P2P overlay network built on top of clients and servers; labels: MIT, SPRINT, UUNET, AOL.]

P2P Classification
Rows give the degree of centralization, columns the data organization.

Centralization   Unstructured   Loosely Structured   Highly Structured
Hybrid           Napster, IM    -                    -
Partial          Kazaa, Gia     -                    -
None             Gnutella       Freenet              Chord, CAN

Napster, IM
- Centralized servers maintain a list of files and the peers at which each file is stored
- Peers join, leave, and query the network via direct communication with the servers
- File transfers occur directly between peers
[Figure labels: Server; Query: U2; Reply: 6; File Transfer.]
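The centralized lookup described above can be sketched as a small in-memory directory. This is only an illustration, not Napster's actual protocol: the class name `CentralIndex` and its methods are hypothetical, and the file transfer itself (which happens directly between peers) is not modeled.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical Napster-style directory: file name -> peers that store it."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def join(self, peer_id, files):
        # A joining peer reports the files it is willing to share.
        for name in files:
            self._peers_by_file[name].add(peer_id)

    def leave(self, peer_id):
        # Remove the departing peer from every file entry.
        for name in list(self._peers_by_file):
            self._peers_by_file[name].discard(peer_id)
            if not self._peers_by_file[name]:
                del self._peers_by_file[name]

    def query(self, name):
        # One round trip to the server; the transfer then goes peer to peer.
        return sorted(self._peers_by_file.get(name, ()))


index = CentralIndex()
index.join(6, ["U2"])
index.join(3, ["U2", "REM"])
print(index.query("U2"))
```

Every query costs a single message to the server, which is the efficiency the slides credit to this design; the server is also the single point of failure they warn about.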

Napster, IM
Advantages:
- Highly efficient data lookup
- Rapidly adapts to changes in network
Disadvantages:
- Questionable scalability
- Vulnerable to censorship, failure, attack

Gnutella
- All peers, called servents, are identical and function as both servers and clients
- A peer joins the network by contacting existing servents (chosen from online databases) using PING messages
- A servent receiving a PING message replies with a PONG message and forwards the PING to other servents
- The peer connects to the servents that send a PONG

Gnutella
- A servent queries the network by sending a QUERY message
- A servent receiving a QUERY message replies with a QUERYHIT message if it can answer the query; if not, it forwards the QUERY message to other servents

Routing in Gnutella
- How PING/QUERY messages are forwarded affects network topology, search efficiency/accuracy, and scalability
- Proposals:
  - Breadth-First-Search: flooding, iterative deepening, modified random BFS
  - Depth-First-Search: random walk, k-walker random walks, two-level random walk, dominating set based search
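To make the contrast between the two families concrete, here is a minimal sketch of TTL-limited flooding and of k-walker random walks on an adjacency-list graph. The function names, the `has_item` predicate, and the message counting are assumptions made for illustration; real Gnutella messages also carry identifiers for duplicate suppression, which the `seen` set only approximates.

```python
import random
from collections import deque


def flood_query(graph, start, has_item, ttl):
    """BFS flooding: every node forwards the query to all neighbors until TTL is 0."""
    seen = {start}
    frontier = deque([(start, ttl)])
    hits, messages = [], 0
    while frontier:
        node, remaining = frontier.popleft()
        if has_item(node):
            hits.append(node)
        if remaining == 0:
            continue
        for nbr in graph[node]:
            messages += 1  # a message is sent even if nbr already saw the query
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, remaining - 1))
    return hits, messages


def k_walker_query(graph, start, has_item, k, max_steps, rng):
    """k independent random walks; each walker stops at its first hit."""
    if has_item(start):
        return [start], 0
    hits, messages = set(), 0
    for _ in range(k):
        node = start
        for _ in range(max_steps):
            node = rng.choice(graph[node])  # assumes no isolated nodes
            messages += 1
            if has_item(node):
                hits.add(node)
                break
    return sorted(hits), messages


# Six servents on a ring; only servent 3 holds the requested file.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(flood_query(ring, 0, lambda n: n == 3, ttl=3))
print(k_walker_query(ring, 0, lambda n: n == 3, k=2, max_steps=20, rng=random.Random(0)))
```

Flooding finds every copy within the TTL radius at the cost of many messages; the walkers send at most `k * max_steps` messages but may miss the item.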

Hybrid Search
- Random walk with lookahead: short random walks combined with shallow local flooding
- Takes advantage of "supernodes", i.e. nodes of high degree
  - The stationary distribution of a random walk is naturally biased towards supernodes
  - Lookahead allows the search to quickly discover content stored at all neighbors of these high-degree nodes
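A rough sketch of the idea, assuming a lookahead depth of one hop: at every step the walker checks the current node and all of its neighbors before moving on. The function name and the `holders` set are hypothetical. On an undirected graph a simple random walk visits a node with stationary probability proportional to its degree, which is why the walk tends to reach high-degree nodes where one lookahead covers many peers.

```python
import random


def walk_with_lookahead(graph, start, holders, max_steps, rng):
    """Random walk that inspects the current node and its direct neighbors.

    Returns (node_holding_the_item, steps_taken), or (None, max_steps) on failure.
    """
    node = start
    for step in range(max_steps + 1):
        if node in holders:
            return node, step
        # Shallow local flood: ask every neighbor of the current node.
        for nbr in graph[node]:
            if nbr in holders:
                return nbr, step
        node = rng.choice(graph[node])  # assumes no isolated nodes
    return None, max_steps


# Node 0 is a high-degree hub; node 6 reaches it only through node 5.
graph = {6: [5], 5: [6, 0], 0: [5, 1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(walk_with_lookahead(graph, 6, {3}, max_steps=50, rng=random.Random(1)))
```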

Supernodes
- Improve scalability and performance of Gnutella-like systems via supernodes
- Supernodes are special peers with high degree, elected dynamically according to bandwidth and other considerations
- Supernodes maintain a list of content stored at peers
Advantages:
- Searches propagate on supernodes, 3 to 5 times faster
- Takes advantage of heterogeneity in network

Gnutella
Advantages:
- Entirely decentralized, pure P2P network
- Highly resistant to failure
Disadvantages:
- Search is time-consuming
- Network typically scales poorly

Chord
- Distributed hash table (DHT) implementation
- Each node and each piece of content has an ID
- Content IDs are deterministically mapped to node IDs, so a searcher knows exactly where data is located; this makes the system a content-addressable network
- Efficient: O(log n) messages per lookup
- Scalable: O(log n) state per node

Keys in Chord
- m-bit identifier space for both nodes and content keys
- Content ID = hash(content)
- Node ID = hash(IP address)
- Both are uniformly distributed
- How to map content IDs to node IDs?

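Mapping Content to Nodes
- Content is stored at its successor node: the first node whose ID is equal to or follows the content ID on the ring
[Figure, adapted from Stoica et al.: circular 7-bit ID space starting at 0, with nodes N32, N90, N123 and keys K5, K20, K60, K101; the content "U2" corresponds to K60.]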
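The successor rule can be sketched in a few lines. The node and key IDs below are the ones from the figure; the helper names `chord_id` and `successor` are hypothetical, and reducing a SHA-1 digest modulo 2^m is an assumption made to fit the 7-bit example, so hashing the string "U2" this way will not necessarily give K60.

```python
import bisect
import hashlib

M = 7  # bits in the identifier space, as in the figure


def chord_id(value, m=M):
    """Map a string (content or IP address) into the m-bit ID space (assumed scheme)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(node_ids, key):
    """First node whose ID is >= key, wrapping around the ring."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key)
    return ring[i % len(ring)]


nodes = [32, 90, 123]
for key in (5, 20, 60, 101, 125):
    print(f"K{key} -> N{successor(nodes, key)}")
```

With the figure's IDs, K5 and K20 land on N32, K60 on N90, and K101 on N123; a key above 123 wraps around to N32.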

Routing
- Every node knows of every other node
- Routing tables O(n), lookup O(1)
[Figure, adapted from Stoica et al.: ring with nodes N10, N32, N55, N90, N123; Hash("U2") = K60; the query "Where is 'U2'?" is answered with "N90 has K60".]

Routing
- Every node knows its successor in the ring
- Routing tables O(1), lookup O(n)
[Figure, adapted from Stoica et al.: ring with nodes N10, N32, N55, N90, N123; Hash("U2") = K60; the query "Where is 'U2'?" is answered with "N90 has K60".]

Routing
- Every node knows m others
- Distances increase exponentially: node i points to the node whose ID is the successor of $(i + 2^{j-1}) \bmod 2^m$, for j from 1 to m. These pointers are called fingers.
- The finger (routing) table and search time are both O(log n)
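A minimal sketch of finger-table routing, assuming a 7-bit ring and the Chord paper's indexing (finger j of node n is the successor of n + 2^(j-1)). Every node's table is computed here from a global sorted list, which a real deployment would not have; the function names are hypothetical. The node IDs are those that appear in the "Routing with Finger Tables" figure.

```python
import bisect

M = 7
RING_SIZE = 2 ** M


def successor(ring, ident):
    """First node ID >= ident (mod 2^M) in the sorted list `ring`."""
    i = bisect.bisect_left(ring, ident % RING_SIZE)
    return ring[i % len(ring)]


def finger_table(ring, n):
    return [successor(ring, n + 2 ** (j - 1)) for j in range(1, M + 1)]


def in_open(x, a, b):
    """x lies strictly between a and b going clockwise."""
    return a < x < b if a < b else (x > a or x < b)


def in_half_open(x, a, b):
    """x lies in (a, b] going clockwise."""
    return a < x <= b if a < b else (x > a or x <= b)


def lookup(ring, start, key):
    """Return (node responsible for key, list of nodes that handled the query)."""
    n, hops = start, [start]
    while True:
        succ = successor(ring, n + 1)
        if in_half_open(key, n, succ):
            return succ, hops
        # Forward to the closest finger that precedes the key.
        nxt = succ
        for f in reversed(finger_table(ring, n)):
            if in_open(f, n, key):
                nxt = f
                break
        n = nxt
        hops.append(n)


ring = [5, 10, 20, 32, 60, 80, 99, 110]
print(finger_table(ring, 32))
print(lookup(ring, 32, 19))
```

Starting at N32, the lookup for K19 is forwarded through N99, N5 and N10 and resolves to N20, with each hop roughly halving the remaining distance on the ring.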

Finger Tables
[Figure, adapted from Stoica et al.: the finger table of a node on the ring; surviving labels: N96, N112.]

Routing with Finger Tables
[Figure, adapted from Stoica et al.: ring with nodes N5, N10, N20, N32, N60, N80, N99, N110; Lookup(K19) is routed along fingers to the node responsible for K19.]

Chord Dynamics
When a node joins:
- Initialize all fingers of the new node
- Update fingers of existing nodes
- Transfer content from successor to new node
When a node leaves:
- Transfer content to successor
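The content-transfer steps can be sketched as follows, under the assumption that a global sorted list of node IDs and a dictionary of per-node key sets stand in for the distributed state. The names `join`, `leave`, and `store` are hypothetical, and finger initialization and updates are deliberately left out.

```python
import bisect


def successor(ring, ident):
    """First node ID >= ident in the sorted list `ring`, wrapping around."""
    i = bisect.bisect_left(ring, ident)
    return ring[i % len(ring)]


def join(ring, store, new_id):
    """Insert new_id and pull over the keys it is now responsible for."""
    succ = successor(ring, new_id) if ring else None
    bisect.insort(ring, new_id)
    store[new_id] = set()
    if succ is None:
        return
    moved = {k for k in store[succ] if successor(ring, k) == new_id}
    store[succ] -= moved
    store[new_id] |= moved


def leave(ring, store, node_id):
    """Graceful departure: hand all keys to the successor."""
    ring.remove(node_id)
    keys = store.pop(node_id)
    if ring:
        store[successor(ring, node_id)] |= keys


ring = [32, 90, 123]
store = {32: {5, 20}, 90: {40, 60}, 123: {101}}
join(ring, store, 55)   # N55 takes K40 from N90
leave(ring, store, 90)  # N90 hands K60 to N123
print(ring, store)
```

Only the keys between the new node's predecessor and the new node itself move on a join, so membership changes touch a small part of the stored content.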

Chord Failures
- Churn rate is very high (on average, nodes are in the system for only 60 minutes) and events happen concurrently
- Churn (esp. ungraceful departures or simultaneous joins/departures) can cause failure states, e.g. inconsistencies in successor relationships or, worse, loopy states
- Requires a lot of maintenance messages to preserve the ideal state

Maintenance in P2P
- A maintenance protocol ensures global connectivity and efficient lookup by continuously repairing the overlay network and routing tables
- Maintenance is essential, e.g. when a node
  - Joins and announces its presence
  - Updates its routing table to ensure efficient search
  - Monitors neighbors for failures/departures
- The cost of a maintenance protocol can be measured in terms of the rate of maintenance messages

Half Life
Defn. Suppose there are $N_t$ nodes at time $t$. Let the doubling time $d_t$ be such that at time $t + d_t$, $N_t$ new nodes have arrived. Similarly, let the halving time $h_t$ be such that at time $t + h_t$, $N_t/2$ nodes have departed. Then the half life of the system is $\min_t \min(d_t, h_t)$.
- The half life is the average amount of time until half the system has been replaced.
- Measures the rate of change of the system.

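Example
- Nodes arrive according to a Poisson process with rate $\lambda$: the probability of $k$ arrivals in time $t$ is $e^{-\lambda t}(\lambda t)^k / k!$, i.e. proportional to $e^{-\lambda t}$.
- Nodes remain for an exponentially distributed duration with rate $\mu$: the probability that a node stays for at least $\tau$ time is $e^{-\mu \tau}$.
- If the system is in steady state, the arrival rate $\lambda$ must equal the departure rate $\mu N$, so $N = \lambda / \mu$.
- Doubling time $d = N/\lambda = 1/\mu$, halving time $h = (\ln 2)/\mu$, and so half life $= (\ln 2)/\mu$.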
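As a quick numeric check of this example, the sketch below plugs in the 60-minute average session time quoted on the Chord Failures slide. The symbols are assumptions, since the Greek letters did not survive in the transcript: `arrival_rate` plays the role of the Poisson rate and `departure_rate` the exponential rate; the arrival rate of 10 nodes per minute is made up for illustration.

```python
import math


def half_life(arrival_rate, departure_rate):
    """Half life of a steady-state system with Poisson arrivals and exponential stays."""
    n = arrival_rate / departure_rate          # steady-state population
    doubling = n / arrival_rate                # time for n new arrivals
    halving = math.log(2) / departure_rate     # time until half the nodes have left
    return min(doubling, halving)


# Mean stay of 60 minutes means a departure rate of 1/60 per minute.
print(round(half_life(arrival_rate=10.0, departure_rate=1 / 60), 1))
```

Because ln 2 < 1, the halving time is always the smaller of the two in this model, so the half life is about 41.6 minutes regardless of the arrival rate.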

Bounding Maintenance Costs
Thm. There exists a sequence of joins and leaves such that any node that, at any time, has received an average of fewer than $k$ notifications per half life will be disconnected from the network with probability at least $(1 - 1/(e-1))^k$.
Cor. Any $N$-node P2P network that remains connected with probability at least $1 - 1/N$ must generate an average of $\Omega(\log N)$ notifications per node per half life.
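The step from the theorem to the corollary can be written out as follows. The derivation only uses the fact that the theorem's bound has the form $c^k$ for some constant $0 < c < 1$; the symbol $c$ is introduced here for illustration. If the network is to stay connected with probability at least $1 - 1/N$, no node may be disconnected with probability more than $1/N$, so $k$ must be large enough that the lower bound itself does not exceed $1/N$:

```latex
\[
  \Pr[\text{node disconnected}] \;\ge\; c^{k}, \qquad 0 < c < 1 ,
\]
\[
  c^{k} \le \frac{1}{N}
  \;\Longleftrightarrow\;
  k \,\ln\frac{1}{c} \ge \ln N
  \;\Longleftrightarrow\;
  k \ge \frac{\ln N}{\ln(1/c)} = \Omega(\log N).
\]
```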

Proof of Bound
- Consider a Poisson arrival rate and an exponential waiting time with rate $\mu = 1$ in the system.
- Suppose node $n$ averages fewer than $k$ notifications per half life, so there is a minimum time $t$ such that at time $t$, $n$ has received fewer than $kt$ notifications.
- Observe that $n$ is isolated at time $t$ with probability at least $(1 - 1/(e-1))^k$.

Maintenance in Chord
Liben-Nowell, Balakrishnan, Karger: (Modified) Chord requires only $O(\log^2 n)$ maintenance messages per half life to maintain efficiency and correctness of search.

Chord
Advantages:
- Highly efficient search
- Good load balancing
Disadvantages:
- Locality of data is destroyed
- Only handles exact match queries, but keyword queries are more prevalent
- Most requests are for highly replicated files (needles vs haystack)

Conclusion
- Saw several representative P2P systems, each with advantages and disadvantages
- Many important issues:
  - Efficiency of search
  - Ability to adapt to dynamics of system
  - Security: malicious peers, spread of worms
  - Free riding: reputation mechanisms, micropayment mechanisms
  - Legality