15-744: Computer Networking L-23: P2P. L -23; 12-03-02© Srinivasan Seshan, 20022 P2P Peer-to-peer networks Assigned reading [Cla00] Freenet: A Distributed.

Slides:



Advertisements
Similar presentations
CAN 1.Distributed Hash Tables a)DHT recap b)Uses c)Example – CAN.
Advertisements

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Peer to Peer and Distributed Hash Tables
CHORD: A Peer-to-Peer Lookup Service CHORD: A Peer-to-Peer Lookup Service Ion StoicaRobert Morris David R. Karger M. Frans Kaashoek Hari Balakrishnan Presented.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Speaker: Cathrin Weiß 11/23/2004 Proseminar Peer-to-Peer Information Systems.
Robert Morris, M. Frans Kaashoek, David Karger, Hari Balakrishnan, Ion Stoica, David Liben-Nowell, Frank Dabek Chord: A scalable peer-to-peer look-up protocol.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Robert Morris Ion Stoica, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT.
Peer-to-Peer Networks João Guerreiro Truong Cong Thanh Department of Information Technology Uppsala University.
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
1 Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Robert Morris Ion Stoica, David Karger, M. Frans Kaashoek, Hari Balakrishnan.
Topics in Reliable Distributed Systems Lecture 2, Fall Dr. Idit Keidar.
Distributed Lookup Systems
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari alakrishnan.
Gnutella, Freenet and Peer to Peer Networks By Norman Eng Steven Hnatko George Papadopoulos.
Content Addressable Networks. CAN Associate with each node and item a unique id in a d-dimensional space Goals –Scales to hundreds of thousands of nodes.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Freenet A Distributed Anonymous Information Storage and Retrieval System I Clarke O Sandberg I Clarke O Sandberg B WileyT W Hong.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 CS 194: Distributed Systems Distributed Hash Tables Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
CS 268: Overlay Networks: Distributed Hash Tables Kevin Lai May 1, 2001.
Peer To Peer Distributed Systems Pete Keleher. Why Distributed Systems? l Aggregate resources! –memory –disk –CPU cycles l Proximity to physical stuff.
Wide-area cooperative storage with CFS
Peer-to-Peer Networks Slides largely adopted from Ion Stoica’s lecture at UCB.
Lecture 10 Naming services for flat namespaces. EECE 411: Design of Distributed Software Applications Logistics / reminders Project Send Samer and me.
1 Freenet  Addition goals to file location: -Provide publisher anonymity, security -Resistant to attacks – a third party shouldn’t be able to deny the.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
Freenet. Anonymity  Napster, Gnutella, Kazaa do not provide anonymity  Users know who they are downloading from  Others know who sent a query  Freenet.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
DHTs and Peer-to-Peer Systems Supplemental Slides Aditya Akella 03/21/2007.
Content Overlays (Nick Feamster). 2 Content Overlays Distributed content storage and retrieval Two primary approaches: –Structured overlay –Unstructured.
Introduction of P2P systems
Jonathan Walpole CSE515 - Distributed Computing Systems 1 Teaching Assistant for CSE515 Rahul Dubey.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
15-829A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Lecture 9 – Lookup Algorithms (DNS & DHTs)
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
1 Slides from Richard Yang with minor modification Peer-to-Peer Systems: DHT and Swarming.
SIGCOMM 2001 Lecture slides by Dr. Yingwu Zhu Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
15-744: Computer Networking L-22: P2P. Lecture 22: Peer-to-Peer Networks Typically each member stores/provides access to content Has quickly.
P2PComputing/Scalab 1 Gnutella and Freenet Ramaswamy N.Vadivelu Scalab.
Structured P2P Overlays. Consistent Hashing – the Basis of Structured P2P Intuition: –We want to build a distributed hash table where the number of buckets.
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
Computer Networking P2P. Why P2P? Scaling: system scales with number of clients, by definition Eliminate centralization: Eliminate single point.
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 2: Distributed Hash.
1 9/30/2009 Peer-to-Peer Systems: Unstructured. Admin. r Programming assignment 1 linked on the schedule page 2.
Peer to Peer Network Design Discovery and Routing algorithms
Prof. Younghee Lee 1 1 Computer networks u Lecture 11: Peer to Peer Prof. Younghee Lee.
15-744: Computer Networking L-22: P2P. L -22; © Srinivasan Seshan, P2P Peer-to-peer networks Assigned reading [Cla00] Freenet: A Distributed.
Peer-to-peer systems (part I) Slides by Indranil Gupta (modified by N. Vaidya)
CS 268: Computer Networking
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
Two Peer-to-Peer Networking Approaches Ken Calvert Net Seminar, 23 October 2001 Note: Many slides “borrowed” from S. Ratnasamy’s Qualifying Exam talk.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
15-829A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Review.
Aditya Akella The Web Aditya Akella 18 Apr, 2002.
CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Peer-to-peer Systems All slides © IG.
Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications * CS587x Lecture Department of Computer Science Iowa State University *I. Stoica,
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
BitTorrent Vs Gnutella.
CS 268: Lecture 22 (Peer-to-Peer Networks)
EE 122: Peer-to-Peer (P2P) Networks
CS 268: Peer-to-Peer Networks and Distributed Hash Tables
CS 162: P2P Networks Computer Science Division
#02 Peer to Peer Networking
Presentation transcript:

15-744: Computer Networking L-23: P2P

L -23; © Srinivasan Seshan, P2P Peer-to-peer networks Assigned reading [Cla00] Freenet: A Distributed Anonymous Information Storage and Retrieval System [S+01] Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

L -23; © Srinivasan Seshan, Overview P2P Lookup Overview Centralized/Flooded Lookups Routed Lookups Freenet Chord CAN

L -23; © Srinivasan Seshan, Peer-to-Peer Networks Typically each member stores/provides access to content Has quickly grown in popularity Bulk of traffic from/to CMU is Kazaa! Basically a replication system for files Always a tradeoff between possible location of files and searching difficulty Peer-to-peer allow files to be anywhere  searching is the challenge Dynamic member list makes it more difficult What other systems have similar goals? Routing, DNS

L -23; © Srinivasan Seshan, The Lookup Problem Internet N1N1 N2N2 N3N3 N6N6 N5N5 N4N4 Publisher Key=“title” Value=MP3 data… Client Lookup(“title”) ?

L -23; © Srinivasan Seshan, Centralized Lookup (Napster) Client Lookup(“title”) N6N6 N9N9 N7N7 DB N8N8 N3N3 N2N2 N1N1 SetLoc(“title”, N4) Simple, but O( N ) state and a single point of failure Key=“title” Value=MP3 data… N4N4

L -23; © Srinivasan Seshan, Flooded Queries (Gnutella) N4N4 Client N6N6 N9N9 N7N7 N8N8 N3N3 N2N2 N1N1 Robust, but worst case O( N ) messages per lookup Key=“title” Value=MP3 data… Lookup(“title”)

L -23; © Srinivasan Seshan, Routed Queries (Freenet, Chord, etc.) N4N4 Publisher Client N6N6 N9N9 N7N7 N8N8 N3N3 N2N2 N1N1 Lookup(“title”) Key=“title” Value=MP3 data…

L -23; © Srinivasan Seshan, Overview P2P Lookup Overview Centralized/Flooded Lookups Routed Lookups Freenet Chord CAN

L -23; © Srinivasan Seshan, Centralized: Napster Simple centralized scheme  motivated by ability to sell/control How to find a file: On startup, client contacts central server and reports list of files Query the index system  return a machine that stores the required file Ideally this is the closest/least-loaded machine Fetch the file directly from peer

L -23; © Srinivasan Seshan, Centralized: Napster Advantages: Simple Easy to implement sophisticated search engines on top of the index system Disadvantages: Robustness, scalability Easy to sue!

L -23; © Srinivasan Seshan, Flooding: Gnutella On startup, client contacts any servent (server + client) in network Servent interconnection used to forward control (queries, hits, etc) Idea: broadcast the request How to find a file: Send request to all neighbors Neighbors recursively forward the request Eventually a machine that has the file receives the request, and it sends back the answer Transfers are done with HTTP between peers

L -23; © Srinivasan Seshan, Flooding: Gnutella Advantages: Totally decentralized, highly robust Disadvantages: Not scalable; the entire network can be swamped with request (to alleviate this problem, each request has a TTL) Especially hard on slow clients At some point broadcast traffic on Gnutella exceeded 56kbps – what happened? Modem users were effectively cut off!

L -23; © Srinivasan Seshan, Flooding: Gnutella Details Basic message header Unique ID, TTL, Hops Message types Ping – probes network for other servents Pong – response to ping, contains IP addr, # of files, # of Kbytes shared Query – search criteria + speed requirement of servent QueryHit – successful response to Query, contains addr + port to transfer from, speed of servent, number of hits, hit results, servent ID Push – request to servent ID to initiate connection, used to traverse firewalls Ping, Queries are flooded QueryHit, Pong, Push reverse path of previous message

L -23; © Srinivasan Seshan, Flooding: Gnutella Example Assume: m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5;… A B C D E F m1 m2 m3 m4 m5 m6 E? E E E

L -23; © Srinivasan Seshan, Flooding: FastTrack (aka Kazaa) Modifies the Gnutella protocol into two-level hierarchy Hybrid of Gnutella and Napster Supernodes Nodes that have better connection to Internet Act as temporary indexing servers for other nodes Help improve the stability of the network Standard nodes Connect to supernodes and report list of files Allows slower nodes to participate Search Broadcast (Gnutella-style) search across supernodes Disadvantages Kept a centralized registration  allowed for law suits 

L -23; © Srinivasan Seshan, Overview P2P Lookup Overview Centralized/Flooded Lookups Routed Lookups Freenet Chord CAN

L -23; © Srinivasan Seshan, Routing: Freenet Addition goals to file location: Provide publisher anonymity, security Resistant to attacks – a third party shouldn’t be able to deny the access to a particular file (data item, object), even if it compromises a large fraction of machines Files are stored according to associated key Core idea: try to cluster information about similar keys Messages Random 64bit ID used for loop detection TTL TTL 1 are forwarded with finite probablity Helps anonymity Depth counter Opposite of TTL – incremented with each hop Depth counter initialized to small random value

L -23; © Srinivasan Seshan, Routing: Freenet Routing Tables id – file identifier next_hop – another node that stores the file id file – file identified by id being stored on the local node Forwarding of query for file id If file id stored locally, then stop Forward data back to upstream requestor Requestor adds file to cache, adds entry in routing table If not, search for the “closest” id in the stack, and forward the message to the corresponding next_hop If data is not found, failure is reported back Requestor then tries next closest match in routing table id next_hop file … …

L -23; © Srinivasan Seshan, Routing: Freenet Example Note: doesn’t show file caching on the reverse path 4 n1 f4 12 n2 f12 5 n3 9 n3 f9 3 n1 f3 14 n4 f14 5 n3 14 n5 f14 13 n2 f13 3 n6 n1 n2 n3 n4 4 n1 f4 10 n5 f10 8 n6 n5 query(10) ’ 5

L -23; © Srinivasan Seshan, Freenet Request 1 AB C D E F Data Request Data Reply Request Failed

L -23; © Srinivasan Seshan, Routing: Freenet Requests Any node forwarding reply may change the source of the reply (to itself or any other node) Helps anonymity Each query is associated a TTL that is decremented each time the query message is forwarded; to obscure distance to originator: TTL can be initiated to a random value within some bounds When TTL=1, the query is forwarded with a finite probability Each node maintains the state for all outstanding queries that have traversed it  help to avoid cycles If data is not found, failure is reported back Requestor then tries next closest match in routing table

L -23; © Srinivasan Seshan, Freenet Search Features Nodes tend to specialize in searching for similar keys over time Gets queries from other nodes for similar keys Nodes store similar keys over time Caching of files as a result of successful queries Similarity of keys does not reflect similarity of files

L -23; © Srinivasan Seshan, Freenet File Creation Key for file generated and searched  helps identify collision Not found (“All clear”) result indicates success Source of insert message can be change by any forwarding node Creation mechanism adds files/info to locations with similar keys New nodes are discovered through file creation Erroneous/malicious inserts propagate original file further

L -23; © Srinivasan Seshan, Cache Management LRU Cache of files Files are not guaranteed to live forever Files “fade away” as fewer requests are made for them File contents can be encrypted with original text names as key Cache owners do not know either original name or contents  cannot be held responsible

L -23; © Srinivasan Seshan, Freenet Naming Freenet deals with keys But humans need names Keys are flat  would like structure as well Could have files that store keys for other files File /text/philiosophy could store keys for files in that directory  how to update this file though? Search engine  undesirable centralized solution

L -23; © Srinivasan Seshan, Freenet Naming - Indirect files Normal files stored using content-hash key Prevents tampering, enables versioning, etc. Indirect files stored using name-based key Indirect files store keys for normal files Inserted at same time as normal file Has same update problems as directory files Updates handled by signing indirect file with public/private key Collisions for insert of new indirect file handled specially  check to ensure same key used for signing Allows for files to be split into multiple smaller parts

L -23; © Srinivasan Seshan, Overview P2P Lookup Overview Centralized/Flooded Lookups Routed Lookups Freenet Chord CAN

L -23; © Srinivasan Seshan, Routing: Structured Approaches Goal: make sure that an item (file) identified is always found in a reasonable # of steps Abstraction: a distributed hash-table (DHT) data structure insert(id, item); item = query(id); Note: item can be anything: a data object, document, file, pointer to a file… Proposals CAN (ICIR/Berkeley) Chord (MIT/Berkeley) Pastry (Rice) Tapestry (Berkeley)

L -23; © Srinivasan Seshan, Routing: Chord Associate to each node and item a unique id in an uni-dimensional space Properties Routing table size O(log(N)), where N is the total number of nodes Guarantees that a file is found in O(log(N)) steps

L -23; © Srinivasan Seshan, Aside: Consistent Hashing [Karger 97] N32 N90 N105 K80 K20 K5 Circular 7-bit ID space Key 5 Node 105 A key is stored at its successor: node with next higher ID

L -23; © Srinivasan Seshan, Routing: Chord Basic Lookup N32 N90 N105 N60 N10 N120 K80 “Where is key 80?” “N90 has K80”

L -23; © Srinivasan Seshan, Routing: Finger table - Faster Lookups N80 ½ ¼ 1/8 1/16 1/32 1/64 1/128

L -23; © Srinivasan Seshan, Routing: Chord Summary Assume identifier space is 0…2 m Each node maintains Finger table Entry i in the finger table of n is the first node that succeeds or equals n + 2 i Predecessor node An item identified by id is stored on the successor node of id

L -23; © Srinivasan Seshan, Routing: Chord Example Assume an identifier space 0..8 Node n1:(1) joins  all entries in its finger table are initialized to itself i id+2 i succ Succ. Table

L -23; © Srinivasan Seshan, Routing: Chord Example Node n2:(3) joins i id+2 i succ Succ. Table i id+2 i succ Succ. Table

L -23; © Srinivasan Seshan, Routing: Chord Example Nodes n3:(0), n4:(6) join i id+2 i succ Succ. Table i id+2 i succ Succ. Table i id+2 i succ Succ. Table i id+2 i succ Succ. Table

L -23; © Srinivasan Seshan, Routing: Chord Examples Nodes: n1:(1), n2(3), n3(0), n4(6) Items: f1:(7), f2:(2) i id+2 i succ Succ. Table i id+2 i succ Succ. Table i id+2 i succ Succ. Table 7 Items 1 i id+2 i succ Succ. Table

L -23; © Srinivasan Seshan, Routing: Query Upon receiving a query for item id, a node Check whether stores the item locally If not, forwards the query to the largest node in its successor table that does not exceed id i id+2 i succ Succ. Table i id+2 i succ Succ. Table i id+2 i succ Succ. Table 7 Items 1 i id+2 i succ Succ. Table query(7)

L -23; © Srinivasan Seshan, Overview P2P Lookup Overview Centralized/Flooded Lookups Routed Lookups Freenet Chord CAN

L -23; © Srinivasan Seshan, CAN Associate to each node and item a unique id in an d- dimensional space Virtual Cartesian coordinate space Entire space is partitioned amongst all the nodes Every node “owns” a zone in the overall space Abstraction Can store data at “points” in the space Can route from one “point” to another Point = node that owns the enclosing zone Properties Routing table size O(d) Guarantees that a file is found in at most d*n 1/d steps, where n is the total number of nodes

L -23; © Srinivasan Seshan, CAN E.g.: Two Dimensional Space Space divided between nodes All nodes cover the entire space Each node covers either a square or a rectangular area of ratios 1:2 or 2:1 Example: Assume space size (8 x 8) Node n1:(1, 2) first node that joins  cover the entire space n1

L -23; © Srinivasan Seshan, CAN E.g.: Two Dimensional Space Node n2:(4, 2) joins  space is divided between n1 and n n1 n2

L -23; © Srinivasan Seshan, CAN E.g.: Two Dimensional Space Node n2:(4, 2) joins  space is divided between n1 and n n1 n2 n3

L -23; © Srinivasan Seshan, CAN E.g.: Two Dimensional Space Nodes n4:(5, 5) and n5:(6,6) join n1 n2 n3 n4 n5

L -23; © Srinivasan Seshan, CAN E.g.: Two Dimensional Space Nodes: n1:(1, 2); n2:(4,2); n3:(3, 5); n4:(5,5);n5:(6,6) Items: f1:(2,3); f2:(5,1); f3:(2,1); f4:(7,5); n1 n2 n3 n4 n5 f1 f2 f3 f4

L -23; © Srinivasan Seshan, CAN E.g.: Two Dimensional Space Each item is stored by the node who owns its mapping in the space n1 n2 n3 n4 n5 f1 f2 f3 f4

L -23; © Srinivasan Seshan, CAN: Query Example Each node knows its neighbors in the d-space Forward query to the neighbor that is closest to the query id Example: assume n1 queries f n1 n2 n3 n4 n5 f1 f2 f3 f4

L -23; © Srinivasan Seshan, Routing: Concerns Each hop in a routing-based P2P network can be expensive No correlation between neighbors and their location A query can repeatedly jump from Europe to North America, though both the initiator and the node that store the item are in Europe! Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest in terms of network distance What type of lookups? Only exact match!  no beatles.*

L -23; © Srinivasan Seshan, Summary The key challenge of building wide area P2P systems is a scalable and robust location service Solutions covered in this lecture Naptser: centralized location service Gnutella: broadcast-based decentralized location service Freenet: intelligent-routing decentralized solution DHTs: structured intelligent-routing decentralized solution

L -23; © Srinivasan Seshan, Upcoming Deadlines Thursday: exam Similar to midterm Mostly (80+%) on second half Friday: poster session 8220 Wean Hall 10:30AM – 1PM Brief 10 minute presentation for grading Monday: Project handin date Maximum: 8 pages double column/13 pages single column, reasonable font size, etc.