Published by Belinda Gibson. Modified over 9 years ago.
1
Content Distribution
March 2, 2011
2: Application Layer
2
Contents
- P2P architecture and benefits
- P2P content distribution
- Content distribution network (CDN)
3
Pure P2P architecture
- No always-on server
- Arbitrary end systems communicate directly
- Peers are intermittently connected and change IP addresses
- Three topics:
  - File distribution
  - Searching for information
  - Case study: Skype
[Figure label: peer-peer]
4
File Distribution: Server-Client vs P2P
Question: How much time does it take to distribute a file from one server to N peers?
- $F$: file size
- $u_s$: server upload bandwidth
- $u_i$: peer i upload bandwidth
- $d_i$: peer i download bandwidth
[Figure: a server with upload rate $u_s$ and peers 1..N with upload rates $u_1, u_2, \dots, u_N$ and download rates $d_1, d_2, \dots, d_N$, connected by a network with abundant bandwidth]
5
File distribution time: server-client
- Server sequentially sends N copies: $NF/u_s$ time
- Client i takes $F/d_i$ time to download
Time to distribute F to N clients using the client/server approach:
$$d_{cs} = \max\left\{ \frac{NF}{u_s},\ \frac{F}{\min_i d_i} \right\}$$
This increases linearly in N (for large N).
[Figure: server and N peers connected by a network with abundant bandwidth, as on the previous slide]
6
File distribution time: P2P
- Server must send one copy: $F/u_s$ time
- Client i takes $F/d_i$ time to download
- NF bits must be downloaded (aggregate)
  - fastest possible upload rate: $u_s + \sum_i u_i$
$$d_{P2P} = \max\left\{ \frac{F}{u_s},\ \frac{F}{\min_i d_i},\ \frac{NF}{u_s + \sum_i u_i} \right\}$$
[Figure: server and N peers connected by a network with abundant bandwidth, as on the previous slides]
7
Server-client vs. P2P: example
Client upload rate = $u$, $F/u$ = 1 hour, $u_s = 10u$, $d_{min} \ge u_s$
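The two distribution-time bounds can be evaluated directly for this example. The sketch below is illustrative only: the function names are made up here, rates are expressed in units of u (so F = 1 means F/u = 1 hour), and it assumes every peer has the same upload rate u and that d_min equals u_s, which is the weakest value the slide's condition allows.

```python
def client_server_time(F, u_s, downloads):
    """d_cs = max(N*F/u_s, F/min(d_i)); N is the number of peers."""
    n = len(downloads)
    return max(n * F / u_s, F / min(downloads))


def p2p_time(F, u_s, downloads, uploads):
    """d_P2P = max(F/u_s, F/min(d_i), N*F/(u_s + sum(u_i)))."""
    n = len(downloads)
    return max(F / u_s, F / min(downloads), n * F / (u_s + sum(uploads)))


if __name__ == "__main__":
    F, u, u_s = 1.0, 1.0, 10.0          # F/u = 1 hour, u_s = 10u
    for n in (5, 10, 20, 30):
        d = [u_s] * n                   # assumption: d_min = u_s
        up = [u] * n                    # assumption: identical peers
        print(n, client_server_time(F, u_s, d), p2p_time(F, u_s, d, up))
```

Under these assumptions the client-server time grows linearly with N, while the P2P time stays below one hour because every new peer also adds upload capacity.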
8
Contents
- P2P architecture and benefits
- P2P content distribution
- Content distribution network (CDN)
9
P2P content distribution issues
- Issues:
  - Peer discovery and group management
  - Data placement and searching
  - Reliable and efficient file exchange
  - Security/privacy/anonymity/trust
- Approaches for group management and data search (i.e., who has what?):
  - Centralized (e.g., BitTorrent tracker)
  - Unstructured (e.g., Gnutella)
  - Structured (Distributed Hash Tables [DHT])
10
Centralized index (Napster)
Original “Napster” design:
1) When a peer connects, it informs the central server of its IP address and content
2) Alice queries for “Hey Jude”
3) Alice requests the file from Bob
[Figure: centralized directory server and peers, including Alice and Bob; arrows labeled 1, 2, 3 correspond to the steps above]
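The three steps can be mimicked with a tiny in-memory directory. This is a hypothetical sketch, not Napster's actual protocol: the class name, method names, and IP address are invented for illustration, and the file transfer itself (step 3) happens directly between peers, outside the directory.

```python
from collections import defaultdict


class DirectoryServer:
    """Central index mapping a content title to the peers that hold it."""

    def __init__(self):
        self._index = defaultdict(set)   # title -> set of peer IP addresses

    def register(self, ip, titles):
        """Step 1: a connecting peer reports its IP address and content."""
        for title in titles:
            self._index[title].add(ip)

    def unregister(self, ip):
        """Remove a departing peer from every entry."""
        for holders in self._index.values():
            holders.discard(ip)

    def query(self, title):
        """Step 2: return the peers that can serve the title."""
        return sorted(self._index.get(title, ()))


if __name__ == "__main__":
    server = DirectoryServer()
    server.register("10.0.0.2", ["Hey Jude"])   # Bob (made-up address)
    print(server.query("Hey Jude"))             # Alice then contacts Bob directly
```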
11
Centralized model
File transfer is decentralized, but locating content is highly centralized.
[Figure: peers Bob, Alice, Jane, Judy]
12
Centralized
- Benefits:
  - Low per-node state
  - Limited bandwidth usage
  - Short location time
  - High success rate
  - Fault tolerant
- Drawbacks:
  - Single point of failure
  - Limited scale
  - Possibly unbalanced load
- Copyright infringement
[Figure: peers Bob, Alice, Jane, Judy]
13
File distribution: BitTorrent
- P2P file distribution
- Tracker: tracks peers participating in torrent
- Torrent: group of peers exchanging chunks of a file
[Figure: a peer obtains a list of peers from the tracker and then trades chunks with them]
14
BitTorrent (1)
- File divided into 256KB chunks
- Peer joining torrent:
  - has no chunks, but will accumulate them over time
  - registers with tracker to get list of peers, connects to subset of peers (“neighbors”)
- While downloading, peer uploads chunks to other peers
- Peers may come and go
- Once a peer has the entire file, it may (selfishly) leave or (altruistically) remain
15
BitTorrent (2)
Pulling chunks
- At any given time, different peers have different subsets of file chunks
- Periodically, a peer (Alice) asks each neighbor for the list of chunks that they have
- Alice sends requests for her missing chunks
  - rarest first
Sending chunks: tit-for-tat
- Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
  - re-evaluate top 4 every 10 secs
- Every 30 secs: randomly select another peer and start sending it chunks (“optimistically unchoke”)
  - newly chosen peer may join top 4
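Rarest-first selection can be sketched as a sort over how many neighbors hold each missing chunk. The function name and the data layout (a dict from neighbor name to a set of chunk ids) are assumptions made for this illustration; ties are broken by chunk id only to keep the result deterministic.

```python
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks Alice is missing by how few neighbors hold them.

    my_chunks: set of chunk ids Alice already has.
    neighbor_chunks: dict mapping neighbor name -> set of chunk ids it has.
    """
    counts = Counter(c for chunks in neighbor_chunks.values() for c in chunks)
    missing = [c for c in counts if c not in my_chunks]
    # Fewest holders first; ties broken by chunk id for determinism.
    return sorted(missing, key=lambda c: (counts[c], c))


if __name__ == "__main__":
    neighbors = {"bob": {0, 1, 2}, "carol": {1, 3}, "dave": {1, 2}}
    print(rarest_first({0}, neighbors))
```

In the usage above chunk 3 is held by one neighbor, chunk 2 by two, and chunk 1 by three, so chunk 3 is requested first.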
16
BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With a higher upload rate, a peer can find better trading partners and get the file faster!
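One round of the unchoke decision can be sketched as follows. This is a simplified illustration, not a BitTorrent client: the function name and the rate table are hypothetical, and the 10-second and 30-second timers are left to the caller.

```python
import random


def choose_unchoked(rates, top_n=4, rng=random):
    """Pick the neighbors Alice uploads to in one round.

    rates: dict mapping neighbor -> rate at which it is currently sending
           chunks to Alice (any consistent unit).
    Returns (top, optimistic): the top_n fastest senders, plus one randomly
    chosen other neighbor that is optimistically unchoked (None if no other
    neighbor exists).
    """
    ranked = sorted(rates, key=lambda peer: (-rates[peer], peer))
    top, others = ranked[:top_n], ranked[top_n:]
    optimistic = rng.choice(others) if others else None
    return top, optimistic


if __name__ == "__main__":
    observed = {"bob": 50, "carol": 80, "dave": 20,
                "erin": 65, "frank": 10, "grace": 40}
    print(choose_unchoked(observed))
```

If the optimistically unchoked neighbor starts reciprocating at a high rate, it shows up in `rates` with a larger value and can displace one of the current top four in a later round.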
17
P2P Case study: Skype
- Inherently P2P: pairs of users communicate
- Proprietary application-layer protocol (inferred via reverse engineering)
- Hierarchical overlay with supernodes (SNs)
- Index maps usernames to IP addresses; distributed over SNs
[Figure: Skype clients (SC), supernodes (SN), Skype login server]
18
Peers as relays
- Problem when both Alice and Bob are behind “NATs”
  - NAT prevents an outside peer from initiating a call to an inside peer
- Solution:
  - Using Alice’s and Bob’s SNs, a relay is chosen
  - Each peer initiates a session with the relay
  - Peers can now communicate through NATs via the relay
19
Distributed Hash Table (DHT)
- DHT = distributed P2P database
- Database has (key, value) pairs, for example:
  - key: social security number; value: human name
  - key: content type; value: IP address
- Peers query the DB with a key
  - DB returns the values that match the key
- Peers can also insert (key, value) pairs
20
DHT Identifiers
- Assign an integer identifier to each peer in the range $[0, 2^n - 1]$
  - Each identifier can be represented by n bits
- Require each key to be an integer in the same range
- To get integer keys, hash the original key
  - e.g., key = h(“Led Zeppelin IV”)
  - This is why they call it a distributed “hash” table
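A minimal sketch of the hashing step, assuming SHA-1 as the hash function h (the slide does not name one) and reducing the digest modulo 2^n to land in the identifier range. The function name is made up for this example.

```python
import hashlib


def key_to_id(key: str, n_bits: int) -> int:
    """Map an arbitrary string key to an integer in [0, 2**n_bits - 1].

    SHA-1 is an assumption here; any well-mixed hash function would serve.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** n_bits)


if __name__ == "__main__":
    print(key_to_id("Led Zeppelin IV", n_bits=4))
```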
21
How to assign keys to peers?
- Central issue: assigning (key, value) pairs to peers
- Rule: assign the key to the peer that has the closest ID
- Convention in lecture: closest is the immediate successor of the key
- Example: n = 4; peers: 1, 3, 4, 5, 8, 10, 12, 14
  - key = 13, then successor peer = 14
  - key = 15, then successor peer = 1
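The successor rule in the example can be checked with a binary search over the sorted peer ids. The function name is hypothetical, and the sketch assumes that a key equal to a peer's id is assigned to that peer.

```python
from bisect import bisect_left


def successor_peer(key, peers):
    """Return the first peer id >= key, wrapping around the identifier ring."""
    ring = sorted(peers)
    i = bisect_left(ring, key)
    return ring[i % len(ring)]      # i == len(ring) wraps to the smallest id


if __name__ == "__main__":
    peers = [1, 3, 4, 5, 8, 10, 12, 14]
    print(successor_peer(13, peers), successor_peer(15, peers))
```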
22
Chord (a circular DHT) (1)
- Each peer is only aware of its immediate successor and predecessor
- “Overlay network”
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]
23
Chord (a circular DHT) (2)
- Define closest as closest successor
- O(N) messages on average to resolve a query when there are N peers
[Figure: ring of peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111; the query “Who’s responsible for key 1110?” is passed around the ring until a peer answers “I am”]
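Successor-only forwarding can be simulated by walking the ring one hop at a time and counting messages. The helper names are invented for this sketch, and the starting peer (0011, i.e. 3) is an assumption, since the transcript does not say which peer issues the query.

```python
def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b          # interval wraps past zero


def ring_lookup(start, key, peers):
    """Forward a query successor by successor until the responsible peer is hit.

    A peer is responsible for every key in (predecessor, itself].
    Returns (responsible_peer, number_of_forwarding_messages).
    """
    ring = sorted(peers)
    i = ring.index(start)
    messages = 0
    while not in_half_open(key, ring[i - 1], ring[i]):
        i = (i + 1) % len(ring)     # hand the query to the immediate successor
        messages += 1
    return ring[i], messages


if __name__ == "__main__":
    peers = [1, 3, 4, 5, 8, 10, 12, 15]
    print(ring_lookup(3, 0b1110, peers))
```

With the assumed start at peer 3, the query for key 1110 (14) travels 4, 5, 8, 10, 12, 15, so six forwarding messages are needed before peer 15 answers.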
24
Chord (a circular DHT) with Shortcuts
- Each peer keeps track of the IP addresses of its predecessor, successor, and shortcuts
- In the example, the query is reduced from 6 to 2 messages
- Possible to design shortcuts so that there are O(log N) neighbors and O(log N) messages per query
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15 with shortcut links; query “Who’s responsible for key 1110?”]
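One common way to get O(log N) shortcuts is a Chord-style finger table, where shortcut i of peer p points to the successor of p + 2^i. The sketch below assumes that design, which may differ from the shortcuts drawn in the slide's figure; all names are hypothetical and the starting peer 3 is again an assumption.

```python
from bisect import bisect_left

N_BITS = 4
RING_SIZE = 2 ** N_BITS


def successor(key, ring):
    """First peer id >= key (mod ring size), wrapping around."""
    i = bisect_left(ring, key % RING_SIZE)
    return ring[i % len(ring)]


def in_interval(x, a, b, closed_right):
    """True if x lies in the ring interval (a, b), or (a, b] if closed_right."""
    if x == b:
        return closed_right
    if a < b:
        return a < x < b
    return x > a or x < b           # interval wraps past zero


def fingers(p, ring):
    """Shortcut i points at the successor of p + 2**i; entry 0 is p's successor."""
    return [successor(p + 2 ** i, ring) for i in range(N_BITS)]


def finger_lookup(start, key, peers):
    """Resolve key from start; returns (responsible_peer, messages)."""
    ring = sorted(peers)
    node, messages = start, 0
    while True:
        pred = ring[ring.index(node) - 1]
        if in_interval(key, pred, node, True):
            return node, messages           # this peer owns the key
        table = fingers(node, ring)
        nxt = table[0]
        if not in_interval(key, node, nxt, True):
            # Jump to the farthest shortcut that does not overshoot the key.
            for f in reversed(table):
                if in_interval(f, node, key, False):
                    nxt = f
                    break
        node, messages = nxt, messages + 1


if __name__ == "__main__":
    print(finger_lookup(3, 0b1110, [1, 3, 4, 5, 8, 10, 12, 15]))
```

With these assumed shortcuts, peer 3 jumps straight to peer 12, which hands the query to its successor 15, so two messages suffice.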
25
Peer Churn
- To handle peer churn, require each peer to know the IP address of its two successors
- Each peer periodically pings its two successors to see if they are still alive
- Example: peer 5 abruptly leaves
  - Peer 4 detects this; makes 8 its immediate successor; asks 8 who its immediate successor is; makes 8’s immediate successor its second successor
- What if peer 13 wants to join?
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]
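The repair step for a failed immediate successor can be sketched with a table of two-entry successor lists. The function names and the dict layout are assumptions for illustration; failure detection (the periodic pings) and the case of a failed second successor are not modeled.

```python
def successor_lists(peers):
    """Build {peer: [first successor, second successor]} for a ring of ids."""
    ring = sorted(peers)
    n = len(ring)
    return {p: [ring[(i + 1) % n], ring[(i + 2) % n]]
            for i, p in enumerate(ring)}


def handle_failed_successor(peer, table):
    """The first successor of `peer` stopped answering pings.

    Promote the second successor to first, then ask it for its own
    immediate successor and store that as the new second successor.
    """
    _, second = table[peer]
    table[peer] = [second, table[second][0]]


if __name__ == "__main__":
    table = successor_lists([1, 3, 4, 5, 8, 10, 12, 15])
    del table[5]                      # peer 5 abruptly leaves
    handle_failed_successor(4, table)
    print(table[4])
```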
26
Contents
- P2P architecture and benefits
- P2P content distribution
- Content distribution network (CDN)
27
Why Content Networks?
- More hops between client and Web server means more congestion!
- Same data flowing repeatedly over links between clients and Web server
[Figure: server S and clients C1, C2, C3, C4 connected through IP routers]
Slides from http://www.cis.udel.edu/~iyengar/courses/Overlays.ppt
28
Why Content Networks?
- Origin server is the bottleneck as the number of users grows
- Flash crowds (for instance, Sept. 11)
- The Content Distribution Problem: arrange a rendezvous between a content source at the origin server (www.cnn.com) and a content sink (us, as users)
29
Example: Web Server Farm
- Simple solution to the content distribution problem: deploy a large group of servers
- Arbitrate client requests to servers using an “intelligent” L4-L7 switch
- Pretty widely used today
[Figure: requests from grad.umd.edu and ren.cis.udel.edu arrive at an L4-L7 switch, which forwards them to www.cnn.com (Copy 1), www.cnn.com (Copy 2), or www.cnn.com (Copy 3)]
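The slide does not say what makes the switch “intelligent”. As one possible policy, the sketch below sends each request to the replica with the fewest active connections; the class name, method names, and replica labels are all hypothetical.

```python
class LeastConnectionsSwitch:
    """Toy request arbiter: pick the replica with the fewest active requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def dispatch(self):
        """Choose a replica for a new request and count it as active."""
        server = min(self.active, key=lambda s: (self.active[s], s))
        self.active[server] += 1
        return server

    def finish(self, server):
        """Mark one request on `server` as completed."""
        self.active[server] -= 1


if __name__ == "__main__":
    switch = LeastConnectionsSwitch(["copy1", "copy2", "copy3"])
    print([switch.dispatch() for _ in range(3)])
```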
30
Example: Caching Proxy
- Mainly motivated by ISP business interests: reducing the bandwidth the ISP consumes from the Internet
- Reduced network traffic
- Reduced user-perceived latency
[Figure: clients ren.cis.udel.edu and merlot.cis.udel.edu inside the ISP; interceptors direct TCP port 80 traffic to the proxy and let other traffic pass to the Internet, where www.cnn.com resides]
31
But on Sept. 11, 2001
[Figure: the user mslab.kaist.ac.kr and two groups of 1,000,000 other hosts send requests toward the Web server www.cnn.com, which holds new content (“WTC News!”), while the caching proxies in the ISPs hold old content. Legend: caching proxy, ISP, congestion / bottleneck.]
32
Problems with discussed approaches: Server farms and Caching proxies
- Server farms do nothing about problems due to network congestion, and do not improve latency caused by the network
- Caching proxies serve only their clients, not all users on the Internet
- Content providers (say, Web servers) cannot rely on the existence and correct implementation of caching proxies
- Accounting issues with caching proxies. For instance, www.cnn.com needs to know the number of hits to the webpage for advertisements displayed on the webpage
33
Again on Sept. 11, 2001 with CDN
[Figure: the Web server www.cnn.com distributes new content (“WTC News!”) through the distribution infrastructure to surrogates labeled FL, IL, DE, NY, MA, MI, CA, WA; the user mslab.kaist.ac.kr and two groups of 1,000,000 other users request the new content. Legend: surrogate, distribution infrastructure.]
34
Web replication - CDNs
- Overlay network to distribute content from origin servers to users
- Avoids large amounts of the same data repeatedly traversing potentially congested links on the Internet
- Reduces Web server load
- Reduces user-perceived latency
- Tries to route around congested networks
35
CDN vs. Caching Proxies
- Caches are used by ISPs to reduce bandwidth consumption; CDNs are used by content providers to improve quality of service to end users
- Caches are reactive; CDNs are proactive
- Caching proxies cater to their users (web clients) and not to content providers (web servers); CDNs cater to both the content providers (web servers) and the clients
- CDNs give content providers control over the content; caching proxies do not
36
CDN Architecture
[Figure: a CDN comprising surrogates, a Request Routing Infrastructure, and a Distribution & Accounting Infrastructure, shown together with the origin server and the client]
37
CDN Components
- Content Delivery Infrastructure: delivering content to clients from surrogates
- Request Routing Infrastructure: steering or directing a content request from a client to a suitable surrogate
- Distribution Infrastructure: moving or replicating content from the content source (origin server, content provider) to surrogates
- Accounting Infrastructure: logging and reporting of distribution and delivery activities
38
Server Interaction with CDN
1. Distribution Infrastructure: origin server pushes new content to CDN, OR CDN pulls content from origin server
2. Accounting Infrastructure: origin server requests logs and other accounting info from CDN, OR CDN provides logs and other accounting info to origin server
[Figure: origin server www.cnn.com and the CDN; arrows labeled 1 and 2 correspond to the interactions above]
39
Client Interaction with CDN
1. Client to Request Routing Infrastructure: Hi! I need www.cnn.com/sept11
2. Request Routing Infrastructure to client: Go to surrogate newyork.cnn.akamai.com
3. Client to surrogate: Hi! I need content /sept11
Q: How did the CDN choose the New York surrogate over the California surrogate?
[Figure: client and CDN; the CDN contains the Request Routing Infrastructure, Surrogate (NY) newyork.cnn.akamai.com, and Surrogate (CA) california.cnn.akamai.com]
40
Request Routing Techniques
- Request routing techniques use a set of metrics to direct users to the “best” surrogate
- Proprietary, but underlying techniques known:
  - DNS based request routing
  - Content modification (URL rewriting)
  - Anycast based (how common is anycast?)
  - URL based request routing
  - Transport layer request routing
  - Combination of multiple mechanisms
41
DNS based Request-Routing
- Common due to the ubiquity of DNS as a directory service
- Specialized DNS server inserted in the DNS resolution process
- DNS server is capable of returning a different set of A, NS or CNAME records based on policies/metrics
42
DNS based Request-Routing
[Figure: client test.nyu.edu (128.4.30.15) sends a DNS query for www.cnn.com to its local DNS server dns.nyu.edu (128.4.4.12), which queries the Akamai DNS. The DNS response, A 145.155.10.15, is returned, and the client opens a session with the surrogate at 145.155.10.15. The Akamai CDN contains surrogates 145.155.10.15 and 58.15.100.152, labeled with the names newyork.cnn.akamai.com and california.cnn.akamai.com.]
Q: How does the Akamai DNS know which surrogate is closest?
43
DNS based Request-Routing
[Figure: client test.nyu.edu (128.4.30.15) and its local DNS server dns.nyu.edu (128.4.4.12) exchange DNS queries and responses with the Akamai DNS, followed by a session with a surrogate. Surrogates in the Akamai CDN measure to the client DNS, and the measurement results are reported to the Akamai DNS.]
44
DNS based Request Routing: Caching
[Figure: client 76.43.35.53 uses client DNS 76.43.32.4. The Akamai CDN contains surrogates 145.155.10.15 and 58.15.100.152, which report measurements for requesting DNS 76.43.32.4 to the Akamai DNS:
- Available bandwidth = 10 kbps, RTT = 10 ms
- Available bandwidth = 5 kbps, RTT = 100 ms
The Akamai DNS maps requesting DNS 76.43.32.4 to surrogate 145.155.10.15 and answers: www.cnn.com A 145.155.10.15, TTL = 10s]
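The selection step in the figure can be sketched as picking the surrogate with the best measurement toward the requesting DNS server and returning an A record with a short TTL. Everything here is illustrative: the names are invented, the policy (lowest RTT, then highest bandwidth) is an assumption, and so is the pairing of the 10 ms measurement with 145.155.10.15, which the transcript does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One surrogate's measurement toward the requesting (client) DNS server."""
    surrogate_ip: str
    rtt_ms: float
    bandwidth_kbps: float


def resolve(name, measurements, ttl_s=10):
    """Return an A record for the surrogate with the lowest RTT.

    Ties on RTT are broken in favor of higher available bandwidth.
    """
    best = min(measurements, key=lambda m: (m.rtt_ms, -m.bandwidth_kbps))
    return {"name": name, "type": "A",
            "address": best.surrogate_ip, "ttl": ttl_s}


if __name__ == "__main__":
    results = [
        Measurement("145.155.10.15", rtt_ms=10, bandwidth_kbps=10),   # assumed pairing
        Measurement("58.15.100.152", rtt_ms=100, bandwidth_kbps=5),   # assumed pairing
    ]
    print(resolve("www.cnn.com", results))
```

A short TTL such as the 10 seconds shown on the slide limits how long the client DNS caches the answer, so the choice can be revisited as measurements change.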
45
DNS based Request Routing: Discussion
- Originator Problem: the client may be far removed from the client DNS
- Client DNS Masking Problem: virtually all DNS servers, except for root DNS servers, honor requests for recursion
  - Q: Which DNS server resolves a request for test.nyu.edu?
  - Q: Which DNS server performs the last recursion of the DNS request?
- Hidden Load Factor: a DNS resolution may result in drastically different load on the selected surrogate; this is an issue in load balancing requests and in predicting load on surrogates
46
Server Selection Metrics
- Network Proximity (Surrogate to Client):
  - Network hops (traceroute)
  - Internet mapping services (NetGeo, IDMaps)
  - …
- Surrogate Load:
  - Number of active TCP connections
  - HTTP request arrival rate
  - Other OS metrics
  - …
- Bandwidth Availability
47
P4P: Provider Portal for (P2P) Applications
Laboratory of Networked Systems
Yale University
48
P2P: Benefits and Challenges
- P2P is a key to content delivery
  - Low costs to content owners/distributors
  - Scalability
- Challenge
  - Network-obliviousness usually leads to network inefficiency
    - Intradomain: for the Verizon network, P2P traffic traverses 1000 miles and 5.5 metro-hops on average
    - Interdomain: 50%-90% of existing local pieces in active users are downloaded externally*
* Karagiannis et al. Should Internet service providers fear peer-assisted content distribution? In Proceedings of IMC 2005
49
ISP Attempts to Address P2P Issues
- Upgrade infrastructure
- Customer pricing
- Rate limiting, or termination of services
- P2P caching
ISPs cannot effectively address network efficiency alone
50
Locality-aware P2P: P2P’s Attempt to Improve Network Efficiency
- P2P has flexibility in shaping communication patterns
- Locality-aware P2P tries to use this flexibility to improve network efficiency
  - E.g., Karagiannis et al. 2005, Bindal et al. 2006, Choffnes et al. 2008 (Ono)
51
Problems of Locality-aware P2P
- Locality-aware P2P needs to reverse engineer network topology, traffic load and network policy
- Locality-aware P2P may not achieve network efficiency
  - Choose congested links
  - Traverse costly interdomain links
[Figure: ISP 0, ISP 1, ISP 2, ISP K]
52
A Fundamental Problem
- Feedback from networks is limited
  - E.g., end-to-end flow measurements or limited ICMP feedback
53
Our Goal
Design a framework to enable better cooperation between networks and P2P
P4P: Provider Portal for (P2P) Applications
54
P4P Architecture
- Providers publish information via iTracker
- Applications query providers’ information and adjust traffic patterns accordingly
[Figure: ISP A and ISP B, each with an iTracker, and a P2P application]
55
Example: Tracker-based P2P
- Information flow
  - 1. peer queries appTracker
  - 2/3. appTracker queries iTracker
  - 4. appTracker selects a set of active peers
[Figure: ISP A with its iTracker, a peer, and the appTracker; arrows labeled 1 to 4 correspond to the steps above]
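Step 4 can be sketched as a ranking of candidate peers by a cost that the appTracker obtained from the iTracker. This is a hypothetical illustration, not the P4P interface: the function name, the cost table keyed by (requester ISP, candidate ISP), and the numeric costs are all made up.

```python
def select_peers(requester_isp, candidates, cost, k):
    """Pick the k candidate peers with the lowest provider-reported cost.

    candidates: list of (peer_id, isp) tuples for active peers.
    cost: dict mapping (requester_isp, candidate_isp) -> cost, as the
          appTracker might have received it from an iTracker (assumed shape).
    """
    ranked = sorted(candidates,
                    key=lambda c: (cost[(requester_isp, c[1])], c[0]))
    return [peer_id for peer_id, _ in ranked[:k]]


if __name__ == "__main__":
    itracker_cost = {("ISP A", "ISP A"): 0,
                     ("ISP A", "ISP B"): 5,
                     ("ISP A", "ISP C"): 9}
    active = [("p1", "ISP A"), ("p2", "ISP B"),
              ("p3", "ISP A"), ("p4", "ISP C")]
    print(select_peers("ISP A", active, itracker_cost, k=3))
```

The design point is that the network, rather than the application, supplies the cost values, so the application does not have to reverse engineer topology or policy.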