EECS 122: Introduction to Computer Networks Overlay Networks, CDNs, and P2P Networks Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA 94720-1776

Overlay Networks: Motivations Changes in the network happen very slowly. Why? The Internet is a shared infrastructure; changes require consensus (IETF). Many proposals require changing a large number of routers (e.g., IP Multicast, QoS); otherwise end-users won't benefit. Proposed changes that haven't happened yet on a large scale: congestion control (RED '93); more addresses (IPv6 '91); security (IPsec '93); multicast (IP multicast '90).

Motivations (cont'd) One size does not fit all. Applications need different levels of reliability, performance (latency), security, access control (e.g., who is allowed to join a multicast group), …

Goals Make it easy to deploy new functionality in the network → accelerate the pace of innovation. Allow users to customize their service.

Solution Deploy processing in the network: have packets processed as they traverse the network. [Figure: an overlay network built over IP, spanning multiple ASes.]

Examples Overlay multicast Content Distribution Networks (CDNs) Peer-to-peer systems

Motivation Example: Internet Radio www.digitallyimported.com (techno station) sends out 128 Kb/s MP3 music streams. Peak usage: ~9000 simultaneous streams, but only 5 unique streams (trance, hard trance, hard house, eurodance, classical). Consumes ~1.1 Gb/s; bandwidth costs are a large fraction of their expenditures (maybe 50%?). If 1000 people are getting their groove on in Berkeley, 1000 unicast streams are sent from NYC to Berkeley.
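A quick back-of-the-envelope check of the aggregate-bandwidth figure above; a minimal sketch using the stream count and per-stream bitrate quoted on the slide (variable names are illustrative):

```python
# Sanity-check the Internet-radio numbers from the slide.
streams = 9000          # peak simultaneous listeners
rate_kbps = 128         # per-stream MP3 bitrate in Kb/s
total_gbps = streams * rate_kbps / 1e6
print(f"aggregate unicast load ~ {total_gbps:.2f} Gb/s")   # ~1.15 Gb/s
```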

This approach does not scale… [Figure: the broadcast center sends a separate unicast stream across the backbone ISP to every listener.]

Multicast Service Model Receivers join a multicast group identified by a multicast address (e.g., G); senders send data to address G; the network routes the data to each of the receivers. [Figure: with unicast, the source S sends [R0, data], [R1, data], …, [Rn-1, data] separately; with multicast, R0…Rn-1 join G and S sends a single [G, data].]

Instead build trees Copy data at routers: at most one copy of a data packet per link. Routers keep track of groups in real time; "path" computation becomes tree computation. LANs implement layer-2 multicast by broadcasting. [Figure: the broadcast center reaches the ISPs over a distribution tree through the backbone.]

Multicast Primer Types of trees: source-specific trees and shared trees. Examples: Distance Vector Multicast Routing Protocol (DVMRP) – source-specific trees; Core Based Tree (CBT) – shared trees; Protocol Independent Multicast (PIM) – sparse mode → shared trees, dense mode → source-specific trees.

Source Specific Trees Each source is the root of its own tree: one tree per source. Can pick "good" trees, but lots of state at the routers! [Figure: example topology (nodes 1–13) with a separate tree rooted at each source.]

Shared Tree One tree used by all sources. Can't pick "good" trees, but minimal state at the routers. [Figure: the same topology with a single shared tree.]

IP Multicast Problems Fifteen years of research, but still not widely deployed. Poor scalability: routers need to maintain per-group, or even per-group and per-sender, state! Aggregation of multicast addresses is complicated. Supporting higher-level functionality is difficult: IP Multicast is a best-effort multi-point delivery service, so reliability and congestion control for IP Multicast are complicated, and the need to deal with heterogeneous receivers makes negotiation hard. No support for access control, and no restriction on who can send → very easy to mount Denial of Service (DoS) attacks!

Overlay Approach Provide IP multicast functionality above the IP layer → application-level multicast. Challenge: do this efficiently. Projects: Narada, Overcast, Scattercast, Yoid, …

Narada [Yang-hua et al., 2000] Source-specific trees. Involves only end hosts. Small group sizes (<= hundreds of nodes). Typical application: chat.

Narada: End System Multicast [Figure: the physical topology connecting Gatech, Stanford (Stan1, Stan2), CMU, and Berkeley (Berk1, Berk2), and the overlay tree built on top of it.]

Performance Concerns Stretch: ratio of latency in the overlay to latency in the underlying network. Stress: number of duplicate packets sent over the same physical link.
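A minimal sketch of these two metrics, assuming we already have measured overlay and native latencies and know which physical links each overlay hop crosses (all names and numbers below are illustrative):

```python
from collections import Counter

def stretch(overlay_latency_ms: float, native_latency_ms: float) -> float:
    """Ratio of latency along the overlay path to latency in the underlying network."""
    return overlay_latency_ms / native_latency_ms

def stress(overlay_hops: dict) -> Counter:
    """Number of overlay hops (duplicate packets) crossing each physical link."""
    counts = Counter()
    for physical_links in overlay_hops.values():
        counts.update(physical_links)
    return counts

# Two overlay hops that both traverse the same backbone link toward Berkeley.
hops = {("CMU", "Berk1"): ["cmu-bb", "bb-berk"],
        ("CMU", "Berk2"): ["cmu-bb", "bb-berk"]}
print(stretch(45.0, 30.0))          # 1.5: the overlay path is 50% longer
print(stress(hops)["bb-berk"])      # 2: that link carries two copies of each packet
```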

Performance Concerns [Figure: in the overlay tree, the delay from CMU to Berk1 increases (stretch), and duplicate packets sent over the same physical links waste bandwidth (stress).]

Properties Easier to deploy than IP Multicast: don't have to modify every router on the path. Easier to implement reliability than IP Multicast: use hop-by-hop retransmissions. Can consume more bandwidth than IP Multicast. Can have higher latency than IP Multicast. Not clear how well it scales: application-level multicast has not been demonstrated for a group with 1M receivers or for 1M groups. Can use IP Multicast where available to optimize performance.

Examples Overlay Multicast Content Distribution Networks (CDNs) Peer-to-peer systems

Content Distribution Networks Problem: you are a web content provider. How do you handle millions of web clients? How do you ensure that all clients experience good performance? How do you maintain availability in the presence of server and network failures? Solutions: add more servers at different locations → if you are CNN this might work!; caching; Content Distribution Networks.

"Base-line" Many clients transfer the same information, generating unnecessary server and network load; clients experience unnecessary latency. [Figure: clients in ISP-1 and ISP-2 all fetch the same content from the server across the backbone ISP.]

Reverse Caches Cache documents close to the server → decrease server load. Typically done by content providers. [Figure: reverse caches placed in front of the server, between it and the backbone ISP.]

Forward Proxies Cache documents close to clients → reduce network traffic and decrease latency. Typically done by ISPs or corporate LANs. [Figure: forward caches placed inside ISP-1 and ISP-2, close to the clients.]

Content Distribution Networks (CDNs) Integrate forward and reverse caching functionality into one overlay network, (usually) administered by one entity. Example: Akamai. Documents are cached both as a result of clients' requests (pull) and by being pushed in the expectation of a high access rate. Besides caching, CDNs do processing, e.g., handling dynamic web pages and transcoding.

CDNs (cont'd) [Figure: the CDN overlay spans the backbone ISP and the access ISPs, placing caches between the server and the clients.]

Example: Akamai Akamai creates new domain names for each client content provider, e.g., a128.g.akamai.net. The CDN's DNS servers are authoritative for the new domains. The client content provider modifies its content so that embedded URLs reference the new domains ("Akamaize" the content), e.g.: http://www.cnn.com/image-of-the-day.gif becomes http://a128.g.akamai.net/image-of-the-day.gif.
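A minimal sketch of the URL-rewriting ("Akamaize") step described above; the CDN hostname and example URL come from the slide, while the function and the regex-based approach are illustrative assumptions, not Akamai's actual tooling:

```python
import re

CDN_HOST = "a128.g.akamai.net"

def akamaize(html: str, origin_host: str = "www.cnn.com") -> str:
    """Rewrite embedded object URLs so clients fetch them from the CDN domain."""
    return re.sub(rf"http://{re.escape(origin_host)}/",
                  f"http://{CDN_HOST}/", html)

page = '<img src="http://www.cnn.com/image-of-the-day.gif">'
print(akamaize(page))
# <img src="http://a128.g.akamai.net/image-of-the-day.gif">
```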

Example: Akamai www.nhc.noaa.gov "Akamaizes" its content; Akamai servers store/cache the secondary content for "Akamaized" services. [Figure: the client fetches http://www.nhc.noaa.gov; the "Akamaized" response object has inline URLs for secondary content at a128.g.akamai.net and other Akamai-managed DNS names; the client's local DNS server then looks up a128.g.akamai.net at the akamai.net DNS servers.]

Examples Overlay Multicast Content Distribution Networks (CDNs) Peer-to-peer systems: Napster, Gnutella, KaZaa, DHTs; Skype, BitTorrent (next lecture)

How Did it Start? A killer application: Napster. Free music over the Internet. Key idea: share the storage and bandwidth of individual (home) users.

Model Each user stores a subset of files. Each user has access to (can download) files from all users in the system.

Main Challenge Find where a particular file is stored. Note: the problem is similar to finding a particular page in web caching (see last lecture; what are the differences?). [Figure: one peer asks "E?": which of the peers holding files A–F stores E?]

Other Challenges Scale: up to hundreds of thousands or millions of machines. Dynamicity: machines can come and go at any time.

Napster Assume a centralized index system that maps files (songs) to the machines that are alive. How to find a file (song): query the index system → it returns a machine that stores the required file (ideally the closest/least-loaded machine); then ftp the file. Advantages: simplicity; easy to implement sophisticated search engines on top of the index system. Disadvantages: robustness, scalability (?).
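A minimal sketch of the centralized index just described: one server maps each file name to the live machines that store it and answers lookups. The class, the load-based tie-break, and the machine names are illustrative assumptions:

```python
from collections import defaultdict

class CentralIndex:
    def __init__(self):
        self.files = defaultdict(set)   # file name -> machines storing it
        self.load = defaultdict(int)    # machine -> current load (for tie-breaking)

    def register(self, machine, file_names):
        for name in file_names:
            self.files[name].add(machine)

    def lookup(self, file_name):
        """Return one machine that stores the file (ideally the least loaded)."""
        candidates = self.files.get(file_name)
        if not candidates:
            return None
        return min(candidates, key=lambda m: self.load[m])

index = CentralIndex()
index.register("m5", ["E"])
index.register("m6", ["E", "F"])
print(index.lookup("E"))   # e.g. m5; the client then fetches E directly from that machine
```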

Napster: Example [Figure: the central index records which machine (m1–m6) stores each file (A–F); a client asks the index "E?", is pointed to the machine storing E (m5), and downloads the file directly from it.]

Gnutella Distribute the file location. Idea: broadcast the request. How to find a file: send the request to all neighbors; neighbors recursively multicast the request; eventually a machine that has the file receives the request, and it sends back the answer. Advantages: totally decentralized, highly robust. Disadvantages: not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL).
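A minimal sketch of the flooding search with a TTL described above; the adjacency map, the file placement, and the duplicate suppression via a `seen` set are illustrative assumptions:

```python
def flood_query(neighbors, files, node, wanted, ttl=3, seen=None):
    """Return a node that stores `wanted`, or None if the flood dies out."""
    seen = seen if seen is not None else set()
    if node in seen or ttl < 0:
        return None
    seen.add(node)
    if wanted in files.get(node, set()):
        return node                 # in Gnutella the answer travels back along the query path
    for nbr in neighbors.get(node, []):
        hit = flood_query(neighbors, files, nbr, wanted, ttl - 1, seen)
        if hit is not None:
            return hit
    return None

neighbors = {"m1": ["m2", "m3"], "m3": ["m4", "m5"]}
files = {"m5": {"E"}}
print(flood_query(neighbors, files, node="m1", wanted="E"))   # m5
```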

Gnutella: Example Assume m1's neighbors are m2 and m3; m3's neighbors are m4 and m5; … [Figure: m1's query "E?" floods through m3 to m5, which holds file E and returns the answer.]

Two-Level Hierarchy Used by current Gnutella implementations and KaZaa. Leaf nodes are connected to a small number of ultrapeers (supernodes). Query: a leaf sends the query to its ultrapeers; if the ultrapeers don't know the answer, they flood the query to other ultrapeers. More scalable: flooding happens only among ultrapeers. [Figure: an Oct 2003 crawl of Gnutella showing ultrapeer nodes and leaf nodes.]

Skype Peer-to-peer Internet telephony. Two-level hierarchy like KaZaa: ultrapeers are used mainly to route traffic between NATed end-hosts (see next slide)… plus a login server to authenticate users and ensure that names are unique across the network. [Figure: hosts A and B exchange messages with the login server and send data traffic through the overlay.] (Note: probable protocol; the Skype protocol is not published.)

Detour: NAT (1/3) Network Address Translation. Motivation: the address scarcity problem in IPv4. Allows addresses to be allocated independently to hosts behind a NAT, so two hosts behind two different NATs can have the same address. [Figure: hosts behind two different NAT boxes both use the private address 192.168.0.1 while communicating with public hosts such as 64.36.12.64 and 128.2.12.30.]

Detour: NAT (2/3) Main idea: use port numbers to multiplex/demultiplex the connections of NATed end-hosts. Map the (IP address, port) of a NATed host to (NAT IP address, NAT port). [Figure: a packet from 192.168.0.1:1005 to 64.36.12.64:80 is rewritten by the NAT box (169.32.41.10) to source 169.32.41.10:78 using the NAT table entry (192.168.0.1:1005) ↔ 78; the reply from 64.36.12.64:80 to port 78 is translated back to 192.168.0.1:1005.]
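A minimal sketch of the port mapping illustrated above; the addresses and the starting port 78 are the ones from the slide, while the class and its method names are illustrative:

```python
class NatBox:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.table = {}        # (private ip, private port) -> public port
        self.reverse = {}      # public port -> (private ip, private port)
        self.next_port = 78

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite an outgoing packet's source to the NAT's address and a mapped port."""
        key = (src_ip, src_port)
        if key not in self.table:
            self.table[key] = self.next_port
            self.reverse[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.table[key], dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_port):
        """Translate a reply back to the private address behind the NAT."""
        priv_ip, priv_port = self.reverse[dst_port]
        return (src_ip, src_port, priv_ip, priv_port)

nat = NatBox("169.32.41.10")
print(nat.outbound("192.168.0.1", 1005, "64.36.12.64", 80))
# ('169.32.41.10', 78, '64.36.12.64', 80)
print(nat.inbound("64.36.12.64", 80, 78))
# ('64.36.12.64', 80, '192.168.0.1', 1005)
```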

Detour: NAT (3/3) Limitations: (1) the number of machines behind a NAT is <= ~64000. Why? (2) A host outside the NAT cannot initiate a connection to a host behind the NAT. Skype and other P2P systems use login servers and ultrapeers to work around limitation (2). How? (Hint: ultrapeers have globally unique, Internet-routable IP addresses.)

BitTorrent (1/2) Allows fast downloads even when sources have low connectivity. How does it work? Split each file into pieces (~256 KB each), and each piece into sub-pieces (~16 KB each). The downloader downloads one piece at a time; within one piece, it can download up to five sub-pieces in parallel.

BitTorrent (2/2) A download consists of three phases. Start: get a piece as soon as possible; select a random piece. Middle: spread all pieces as soon as possible; select the rarest piece next. End: avoid getting stuck with a slow source when downloading the last sub-pieces; request the same sub-piece in parallel and cancel the slowest downloads once a sub-piece has been received. (For details see: http://bittorrent.com/bittorrentecon.pdf)
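A minimal sketch of the "rarest piece first" rule used in the middle phase above; the `peer_bitfields` map (which pieces each peer advertises) and the peer names are illustrative assumptions:

```python
from collections import Counter

def pick_rarest(peer_bitfields, have):
    """Pick a still-needed piece that the fewest peers have."""
    counts = Counter()
    for pieces in peer_bitfields.values():
        counts.update(pieces)
    needed = [(count, piece) for piece, count in counts.items() if piece not in have]
    return min(needed)[1] if needed else None

peers = {"p1": {0, 1, 2}, "p2": {0, 1}, "p3": {0}}
print(pick_rarest(peers, have={0}))   # 2: only one peer has piece 2
```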

Distributed Hash Tables Problem: given an ID, map it to a host. Challenges: Scalability: hundreds of thousands or millions of machines. Instability: changes in routes, congestion, availability of machines. Heterogeneity: latency from 1 ms to 1000 ms; bandwidth from 32 Kb/s to 100 Mb/s; nodes stay in the system from 10 s to a year. Trust: selfish users, malicious users.
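A minimal sketch of the core operation stated above (map an ID to a host) using consistent hashing: keys and node names are hashed into one identifier space, and each key is assigned to the first node clockwise from it. The 16-bit space and the node names are illustrative assumptions:

```python
import hashlib
from bisect import bisect_right

SPACE = 2 ** 16

def ident(name):
    """Hash a key or node name into the identifier space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % SPACE

node_ids = sorted(ident(n) for n in ["nodeA", "nodeB", "nodeC", "nodeD"])

def lookup(key):
    """Map a key to the first node ID clockwise from the key's ID."""
    i = bisect_right(node_ids, ident(key))
    return node_ids[i % len(node_ids)]      # wrap around the ring

print(ident("song.mp3"), "->", lookup("song.mp3"))
```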

Content Addressable Network (CAN) Associate with each node and item a unique ID in a d-dimensional space. Properties: routing table size O(d); guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes.

CAN Example: Two Dimensional Space The space is divided between the nodes; together, the nodes cover the entire space. Each node covers either a square or a rectangular area with ratio 1:2 or 2:1. Example: assume the space is 8 x 8. Node n1:(1, 2) is the first node that joins → it covers the entire space. [Figure: the 8x8 coordinate space, entirely owned by n1.]

CAN Example: Two Dimensional Space Node n2:(4, 2) joins → the space is divided between n1 and n2. [Figure: the space now split between n1 and n2.]

CAN Example: Two Dimensional Space Node n3:(3, 5) joins → the space is divided among n1, n2, and n3. [Figure: n1, n2, and n3 each own a zone of the space.]

CAN Example: Two Dimensional Space Nodes n4:(5, 5) and n5:(6, 6) join. [Figure: the space now divided among n1–n5.]

CAN Example: Two Dimensional Space Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6). Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5). [Figure: the nodes and items placed at their coordinates in the space.]

CAN Example: Two Dimensional Space Each item is stored by the node that owns its mapping in the space. [Figure: each item stored at the node whose zone contains its coordinates.]

CAN: Query Example Each node knows its neighbors in the d-dimensional space. Forward the query to the neighbor that is closest to the query ID. Example: assume n1 queries f4 (see the sketch below). [Figure: the query is forwarded greedily from n1 toward f4's coordinates.]
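A minimal sketch of the greedy forwarding rule above, using the node and item coordinates from this example; the neighbor relationships and the stopping rule (stop when no neighbor is closer to the item's point) are illustrative simplifications of real CAN zone routing:

```python
import math

nodes = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}
neighbors = {"n1": ["n2", "n3"], "n2": ["n1", "n4"], "n3": ["n1", "n4"],
             "n4": ["n2", "n3", "n5"], "n5": ["n4"]}

def route(start, item_xy):
    """Greedily forward the query toward the item's coordinates; return the path."""
    path, current = [start], start
    while True:
        nxt = min(neighbors[current], key=lambda n: math.dist(nodes[n], item_xy))
        if math.dist(nodes[nxt], item_xy) >= math.dist(nodes[current], item_xy):
            return path              # no neighbor is closer: deliver here
        path.append(nxt)
        current = nxt

print(route("n1", (7, 5)))   # ['n1', 'n3', 'n4', 'n5']: the query moves toward f4 at (7, 5)
```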

Chord Associate with each node and item a unique ID in a one-dimensional space. Properties: routing table size O(log N), where N is the total number of nodes; guarantees that a file is found in O(log N) steps.

Data Structure Assume the identifier space is 0..2^m. Each node maintains a finger table and a predecessor pointer: entry i in the finger table of node n is the first node that succeeds or equals n + 2^i. An item identified by id is stored on the successor node of id.
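A minimal sketch of the finger-table rule just stated, using m = 3 and the node IDs {0, 1, 3, 6} that appear in the example slides below (the helper names are illustrative):

```python
M = 3                               # identifier space of size 2^3 = 8
NODES = sorted([0, 1, 3, 6])        # node IDs used in the example below

def successor(ident):
    """First node on the ring that succeeds or equals `ident`."""
    return next((n for n in NODES if n >= ident), NODES[0])   # wrap around

def finger_table(n):
    """Entry i points to the successor of (n + 2^i) mod 2^m."""
    return [((n + 2 ** i) % 2 ** M, successor((n + 2 ** i) % 2 ** M))
            for i in range(M)]

print(finger_table(1))   # [(2, 3), (3, 3), (5, 6)]
```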

Chord Example Assume an identifier space of size 8 (IDs 0..7, m = 3). Node n1:(1) joins → all entries in its finger table are initialized to itself: (i=0: 2 → 1), (i=1: 3 → 1), (i=2: 5 → 1). [Figure: the ring 0..7 with only node 1 present.]

Chord Example Node n2:(3) joins. [Figure: the ring 0..7 with the finger tables of nodes 1 and 3 after the join.]

Chord Example Nodes n3:(0) and n4:(6) join. [Figure: the ring 0..7 with the finger tables of nodes 0, 1, 3, and 6.]

Chord Example Nodes: n1:(1), n2:(3), n3:(0), n4:(6). Items: f1:(7), f2:(2). Each item is stored at the successor of its ID: f1:(7) at node 0 and f2:(2) at node 3. [Figure: the ring with each node's finger table and the items placed at their successor nodes.]

Query Upon receiving a query for item id, a node checks whether it stores the item locally; if not, it forwards the query to the largest node in its successor (finger) table that does not exceed id (see the sketch below). [Figure: query(7) issued at node 1 is forwarded via node 6 to node 0, which stores item 7.]
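A minimal sketch of this lookup rule on the example ring (nodes 0, 1, 3, 6; item 7 stored at node 0). It forwards to the finger that most closely precedes the item on the ring, the circular version of "the largest entry that does not exceed id"; the helpers repeat the previous sketch so the block runs on its own:

```python
M = 3
NODES = sorted([0, 1, 3, 6])        # the example ring above

def successor(ident):
    return next((n for n in NODES if n >= ident), NODES[0])

def fingers(n):
    return [successor((n + 2 ** i) % 2 ** M) for i in range(M)]

def lookup(start, item):
    """Return the sequence of nodes visited until the node storing `item` is reached."""
    path, current = [start], start
    while successor(item) != current:                  # current does not store the item
        # candidate fingers strictly between current and the item, measured clockwise
        cand = [f for f in fingers(current)
                if 0 < (f - current) % 2 ** M < (item - current) % 2 ** M]
        if cand:
            nxt = max(cand, key=lambda f: (f - current) % 2 ** M)
        else:
            nxt = successor((current + 1) % 2 ** M)    # fall back to the immediate successor
        path.append(nxt)
        current = nxt
    return path

print(lookup(start=1, item=7))   # [1, 6, 0]: node 0 is the successor of 7, so it stores the item
```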

Discussion Queries can be implemented iteratively or recursively. Performance: routing in the overlay network can be more expensive than in the underlying network, because usually there is no correlation between node IDs and their locality; a query can repeatedly jump from Europe to North America even though both the initiator and the node that stores the item are in Europe! Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest in terms of network distance.

Discussion Robustness: maintain multiple copies associated with each entry in the routing tables; replicate an item on nodes with close IDs in the identifier space. Security: can be built on top of CAN, Chord, Tapestry, and Pastry.

Discussion The key challenge in building wide-area P2P systems is a scalable and robust location service. Napster: a centralized solution; guarantees correctness and supports approximate matching… …but is neither scalable nor robust. Gnutella, KaZaa: support approximate queries, scalable, and robust… …but don't guarantee correctness (i.e., they may fail to locate an existing file). Distributed Hash Tables: guarantee correctness, highly scalable and robust… …but difficult to implement approximate matching.