Published by Belinda Gibson. Modified over 9 years ago.
Content Distribution
March 2, 2011
2: Application Layer
Contents
- P2P architecture and benefits
- P2P content distribution
- Content distribution network (CDN)
Pure P2P architecture
- No always-on server
- Arbitrary end systems communicate directly
- Peers are intermittently connected and change IP addresses
- Three topics:
  - File distribution
  - Searching for information
  - Case study: Skype
[Figure: peer-peer communication]
File Distribution: Server-Client vs P2P
Question: How much time does it take to distribute a file from one server to N peers?
- File, size $F$
- $u_s$: server upload bandwidth
- $u_i$: peer $i$ upload bandwidth
- $d_i$: peer $i$ download bandwidth
[Figure: a server with upload rate $u_s$ and peers 1..N with upload rates $u_i$ and download rates $d_i$, connected by a network with abundant bandwidth]
File distribution time: server-client
- Server sequentially sends N copies: $NF/u_s$ time
- Client $i$ takes $F/d_i$ time to download
Time to distribute $F$ to N clients using the client/server approach:
$$d_{cs} = \max\left\{ \frac{NF}{u_s},\ \frac{F}{\min_i d_i} \right\}$$
This increases linearly in N (for large N).
File distribution time: P2P
- Server must send one copy: $F/u_s$ time
- Client $i$ takes $F/d_i$ time to download
- $NF$ bits must be downloaded (aggregate)
- Fastest possible upload rate: $u_s + \sum_i u_i$
$$d_{P2P} = \max\left\{ \frac{F}{u_s},\ \frac{F}{\min_i d_i},\ \frac{NF}{u_s + \sum_i u_i} \right\}$$
Server-client vs. P2P: example
Client upload rate $= u$, $F/u = 1$ hour, $u_s = 10u$, $d_{min} \ge u_s$
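The two distribution-time bounds can be evaluated directly for the example parameters. The following is a minimal sketch, assuming every peer has the same upload rate $u$ and taking $d_{min} = u_s$ (the smallest value the example allows); the function names are illustrative, not from the slides.

```python
def client_server_time(F, u_s, d_min, N):
    """Lower bound on client-server distribution time: max{NF/u_s, F/d_min}."""
    return max(N * F / u_s, F / d_min)


def p2p_time(F, u_s, d_min, peer_uploads):
    """Lower bound on P2P distribution time:
    max{F/u_s, F/d_min, NF/(u_s + sum of peer upload rates)}."""
    N = len(peer_uploads)
    return max(F / u_s, F / d_min, N * F / (u_s + sum(peer_uploads)))


if __name__ == "__main__":
    u = 1.0          # client upload rate (file units per hour)
    F = 1.0          # file size, chosen so that F/u = 1 hour
    u_s = 10 * u     # server upload rate
    d_min = u_s      # assumption: slowest download rate equals u_s
    for N in (10, 100, 1000):
        cs = client_server_time(F, u_s, d_min, N)
        p2p = p2p_time(F, u_s, d_min, [u] * N)
        print(f"N={N:5d}  client-server={cs:7.2f} h  P2P={p2p:5.2f} h")
```

Under these assumptions the client-server time grows linearly with N, while the P2P time stays below one hour because each new peer also adds upload capacity.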
Contents
- P2P architecture and benefits
- P2P content distribution
- Content distribution network (CDN)
P2P content distribution issues
- Issues
  - Peer discovery and group management
  - Data placement and searching
  - Reliable and efficient file exchange
  - Security/privacy/anonymity/trust
- Approaches for group management and data search (i.e., who has what?)
  - Centralized (e.g., BitTorrent tracker)
  - Unstructured (e.g., Gnutella)
  - Structured (Distributed Hash Tables [DHT])
Centralized index (Napster)
Original “Napster” design:
1) When a peer connects, it informs the central server of its IP address and content
2) Alice queries for “Hey Jude”
3) Alice requests the file from Bob
[Figure: centralized directory server and peers, including Alice and Bob; arrows numbered 1, 2, 3 match the steps above]
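The three steps can be sketched as a small in-memory directory. This is an illustration only, not Napster's actual protocol: the class name, method names, and the documentation-range IP address are hypothetical.

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical central index: maps a content title to the peers holding it."""

    def __init__(self):
        self._index = defaultdict(set)  # title -> set of peer IP addresses

    def register(self, peer_ip, titles):
        """Step 1: a connecting peer reports its IP address and content."""
        for title in titles:
            self._index[title].add(peer_ip)

    def unregister(self, peer_ip):
        """Remove a peer that has disconnected."""
        for peers in self._index.values():
            peers.discard(peer_ip)

    def query(self, title):
        """Step 2: return the peers that hold the requested title."""
        return sorted(self._index.get(title, set()))


if __name__ == "__main__":
    server = DirectoryServer()
    server.register("192.0.2.10", ["Hey Jude"])  # Bob (address is made up)
    print(server.query("Hey Jude"))              # Alice's lookup
    # Step 3, the file transfer itself, happens directly between Alice and Bob.
```

Only the lookup goes through the server; the transfer in step 3 is peer to peer.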
Centralized model
File transfer is decentralized, but locating content is highly centralized.
[Figure: peers Bob, Alice, Jane, and Judy]
Centralized
- Benefits:
  - Low per-node state
  - Limited bandwidth usage
  - Short location time
  - High success rate
  - Fault tolerant
- Drawbacks:
  - Single point of failure
  - Limited scale
  - Possibly unbalanced load
- Copyright infringement
[Figure: peers Bob, Alice, Jane, and Judy]
File distribution: BitTorrent
- P2P file distribution
- Tracker: tracks peers participating in the torrent
- Torrent: group of peers exchanging chunks of a file
[Figure: a peer obtains the list of peers from the tracker, then trades chunks with them]
BitTorrent (1)
- File is divided into 256 KB chunks
- Peer joining the torrent:
  - has no chunks, but will accumulate them over time
  - registers with the tracker to get a list of peers, connects to a subset of peers (“neighbors”)
- While downloading, a peer uploads chunks to other peers
- Peers may come and go
- Once a peer has the entire file, it may (selfishly) leave or (altruistically) remain
BitTorrent (2)
Pulling chunks
- At any given time, different peers have different subsets of file chunks
- Periodically, a peer (Alice) asks each neighbor for the list of chunks that it has
- Alice sends requests for her missing chunks, rarest first
Sending chunks: tit-for-tat
- Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
  - re-evaluate the top 4 every 10 secs
- Every 30 secs: randomly select another peer and start sending it chunks (“optimistically unchoke”)
  - the newly chosen peer may join the top 4
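The rarest-first rule for pulling chunks can be sketched as follows. This is a simplified illustration, assuming chunks are identified by integers and that each neighbor's chunk list is already known; the function name and tie-breaking by chunk id are assumptions, not part of the BitTorrent specification.

```python
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks we are missing so the rarest among our neighbors come first.

    my_chunks: set of chunk ids we already hold.
    neighbor_chunks: dict mapping neighbor name -> set of chunk ids it holds.
    """
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks)  # each neighbor contributes one copy per chunk
    missing = [c for c in counts if c not in my_chunks]
    # Fewest copies first; ties broken by chunk id for a deterministic order.
    return sorted(missing, key=lambda c: (counts[c], c))


if __name__ == "__main__":
    neighbors = {"bob": {0, 1, 2}, "carol": {1, 3}, "dave": {1, 2}}
    print(rarest_first({0}, neighbors))
```

Requesting rare chunks early spreads them to more peers before their few holders leave the torrent.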
BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With a higher upload rate, a peer can find better trading partners and get the file faster!
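One round of the tit-for-tat choice can be sketched as below. This is a minimal illustration, assuming the rate at which each neighbor is currently sending to us has already been measured; the function name and the `rng` parameter are hypothetical, and the timers (10 s for the top four, 30 s for the optimistic unchoke) are left to the caller.

```python
import random


def choose_unchoked(download_rates, top_n=4, rng=random):
    """Pick the neighbors to send chunks to in one round.

    download_rates: dict mapping neighbor -> rate at which it sends chunks to us.
    Returns (top, optimistic): the top_n fastest providers, plus one randomly
    chosen other neighbor to optimistically unchoke (None if there is none).
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    top = ranked[:top_n]
    others = ranked[top_n:]
    optimistic = rng.choice(others) if others else None
    return top, optimistic


if __name__ == "__main__":
    rates = {"a": 5, "b": 50, "c": 20, "d": 10, "e": 40, "f": 30}
    print(choose_unchoked(rates))
```

If the optimistically unchoked neighbor reciprocates at a high enough rate, it shows up in `top` the next time the ranking is recomputed.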
P2P Case study: Skype
- Inherently P2P: pairs of users communicate
- Proprietary application-layer protocol (inferred via reverse engineering)
- Hierarchical overlay with supernodes (SNs)
- Index maps usernames to IP addresses; distributed over SNs
[Figure: Skype clients (SC), supernodes (SN), and the Skype login server]
Peers as relays
- Problem when both Alice and Bob are behind “NATs”
  - NAT prevents an outside peer from initiating a call to an inside peer
- Solution:
  - Using Alice’s and Bob’s SNs, a relay is chosen
  - Each peer initiates a session with the relay
  - Peers can now communicate through NATs via the relay
Distributed Hash Table (DHT)
- DHT = distributed P2P database
- Database has (key, value) pairs:
  - key: social security number; value: human name
  - key: content type; value: IP address
- Peers query the DB with a key
  - DB returns the values that match the key
- Peers can also insert (key, value) pairs
DHT Identifiers
- Assign an integer identifier to each peer in the range $[0, 2^n - 1]$
  - Each identifier can be represented by $n$ bits
- Require each key to be an integer in the same range
- To get integer keys, hash the original key
  - e.g., key = h(“Led Zeppelin IV”)
  - This is why they call it a distributed “hash” table
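Mapping an original key into the $n$-bit identifier range can be sketched as follows. The slides do not say which hash function h is; SHA-1 reduced modulo $2^n$ is used here purely as an assumption, and `dht_id` is an illustrative name.

```python
import hashlib


def dht_id(key: str, n_bits: int) -> int:
    """Hash an arbitrary string key to an integer in [0, 2**n_bits - 1].

    Assumption: SHA-1 stands in for the unspecified hash function h.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** n_bits)


if __name__ == "__main__":
    print(dht_id("Led Zeppelin IV", 4))  # some identifier between 0 and 15
```

The same function can be applied to peer names or addresses so that peers and keys share one identifier space.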
How to assign keys to peers?
- Central issue: assigning (key, value) pairs to peers
- Rule: assign a key to the peer that has the closest ID
- Convention in lecture: closest is the immediate successor of the key
- Example: n = 4; peers: 1, 3, 4, 5, 8, 10, 12, 14
  - key = 13, then successor peer = 14
  - key = 15, then successor peer = 1
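The successor rule can be checked against the example with a few lines of code. This is a small sketch; `successor_peer` is an illustrative name, and a key equal to a peer ID is assumed to belong to that peer.

```python
import bisect


def successor_peer(key, peers):
    """Return the peer responsible for key: the first peer ID >= key,
    wrapping around to the smallest peer ID past the end of the ring."""
    ring = sorted(peers)
    i = bisect.bisect_left(ring, key)
    return ring[i % len(ring)]


if __name__ == "__main__":
    peers = [1, 3, 4, 5, 8, 10, 12, 14]
    print(successor_peer(13, peers))  # 14
    print(successor_peer(15, peers))  # 1 (wraps around)
```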
Chord (a circular DHT) (1)
- Each peer is only aware of its immediate successor and predecessor
- “Overlay network”
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]
Chord (a circular DHT) (2)
- Define closest as closest successor
- O(N) messages on average to resolve a query when there are N peers
[Figure: ring of peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111. The query “Who’s responsible for key 1110?” is passed around the ring until the responsible peer answers “I am”.]
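The linear cost of a successor-only lookup can be seen in a short simulation. This is a sketch under stated assumptions: the ring is held in one sorted list rather than spread across real peers, each forward to the next peer counts as one message, and the function names are illustrative.

```python
def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b], allowing wrap-around."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def linear_lookup(start, key, peers):
    """Forward a query peer by peer along successor links.

    Returns (responsible_peer, messages), where messages counts the forwards.
    """
    ring = sorted(peers)
    idx = ring.index(start)
    messages = 0
    while True:
        peer = ring[idx]
        pred = ring[idx - 1]  # index -1 wraps to the last peer on the ring
        if in_half_open(key, pred, peer):
            return peer, messages
        idx = (idx + 1) % len(ring)
        messages += 1


if __name__ == "__main__":
    peers = [1, 3, 4, 5, 8, 10, 12, 15]
    print(linear_lookup(1, 0b1110, peers))
```

Because every forward moves only one position around the ring, the number of messages grows in proportion to the number of peers between the asker and the responsible peer.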
Chord (a circular DHT) with Shortcuts
- Each peer keeps track of the IP addresses of its predecessor, successor, and shortcuts
- Reduced from 6 to 2 messages
- Possible to design shortcuts so that each peer has O(log N) neighbors and a query takes O(log N) messages
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15 with shortcut links; query “Who’s responsible for key 1110?”]
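One common way to design such shortcuts is a Chord-style finger table, where peer p keeps a shortcut to the successor of p + 2^i for each i. The sketch below assumes that design (the slides do not specify which shortcuts are used, so message counts need not match the figure), simulates the ring from a single sorted list, and counts every forward, including the final hop to the responsible peer.

```python
import bisect


def successor_peer(k, ring, n_bits):
    """First peer ID >= k on the ring of size 2**n_bits, with wrap-around."""
    k %= 2 ** n_bits
    i = bisect.bisect_left(ring, k)
    return ring[i % len(ring)]


def in_interval(x, a, b, closed_right):
    """True if x lies in the ring interval (a, b), or (a, b] if closed_right."""
    if closed_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b


def shortcut_lookup(start, key, peers, n_bits):
    """Resolve key starting at peer start; returns (responsible_peer, messages)."""
    ring = sorted(peers)
    pred = ring[ring.index(start) - 1]
    if in_interval(key, pred, start, closed_right=True):
        return start, 0  # the asking peer is itself responsible
    node, messages = start, 0
    while True:
        succ = successor_peer(node + 1, ring, n_bits)
        if in_interval(key, node, succ, closed_right=True):
            return succ, messages + 1  # final hop to the responsible peer
        # Shortcuts of this peer: successors of node + 1, 2, 4, ..., 2**(n_bits-1).
        fingers = [successor_peer(node + 2 ** i, ring, n_bits) for i in range(n_bits)]
        nxt = succ
        for f in reversed(fingers):  # farthest shortcut that does not overshoot
            if in_interval(f, node, key, closed_right=False):
                nxt = f
                break
        node = nxt
        messages += 1


if __name__ == "__main__":
    peers = [1, 3, 4, 5, 8, 10, 12, 15]
    print(shortcut_lookup(1, 0b1110, peers, n_bits=4))
```

Each forward roughly halves the remaining distance to the key, which is where the O(log N) message bound comes from.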
Peer Churn
- To handle peer churn, require each peer to know the IP addresses of its two successors
- Each peer periodically pings its two successors to see if they are still alive
- Peer 5 abruptly leaves
- Peer 4 detects this; makes 8 its immediate successor; asks 8 who its immediate successor is; makes 8’s immediate successor its second successor
- What if peer 13 wants to join?
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]
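The repair after peer 5 leaves can be sketched with a table of successor pairs. This is a simplified illustration: it assumes a single departure that has already been detected, and it also updates the peer whose second successor was the departed peer (peer 3 here), a step the slide does not spell out. The function name is hypothetical.

```python
def repair_after_departure(successors, departed):
    """Return a repaired copy of the successor table after one peer leaves.

    successors: dict mapping peer -> [first successor, second successor].
    """
    table = {p: list(s) for p, s in successors.items() if p != departed}
    # Peers whose first successor left: promote the second successor,
    # then ask it for its own first successor.
    for peer in table:
        first, second = table[peer]
        if first == departed:
            table[peer] = [second, table[second][0]]
    # Assumption beyond the slide: peers whose second successor left
    # ask their first successor for its (already repaired) first successor.
    for peer in table:
        first, second = table[peer]
        if second == departed:
            table[peer][1] = table[first][0]
    return table


if __name__ == "__main__":
    ring = {1: [3, 4], 3: [4, 5], 4: [5, 8], 5: [8, 10],
            8: [10, 12], 10: [12, 15], 12: [15, 1], 15: [1, 3]}
    repaired = repair_after_departure(ring, departed=5)
    print(repaired[4])  # peer 4 now points at 8, then 10
```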
Contents
- P2P architecture and benefits
- P2P content distribution
- Content distribution network (CDN)
Why Content Networks?
- More hops between client and Web server: more congestion!
- Same data flowing repeatedly over links between clients and Web server
[Figure: server S and clients C1, C2, C3, C4 connected through IP routers]
Slides from
Why Content Networks?
- Origin server is the bottleneck as the number of users grows
- Flash crowds (for instance, Sept. 11)
- The Content Distribution Problem: arrange a rendezvous between a content source at the origin server and a content sink (us, as users)
Example: Web Server Farm
- Simple solution to the content distribution problem: deploy a large group of servers
- Arbitrate client requests to servers using an “intelligent” L4-L7 switch
- Pretty widely used today
[Figure: client requests arrive at an L4-L7 switch, which forwards them to server copies 1, 2, and 3]
Example: Caching Proxy
- Motivated mainly by ISP business interests: reduces the bandwidth the ISP consumes from the Internet
- Reduced network traffic
- Reduced user-perceived latency
[Figure: within the ISP, interceptors divert clients’ TCP port 80 traffic to the proxy, while other traffic goes straight to the Internet; labels: Client, Client, merlot.cis.ud, Intercepters, Proxy, Internet, ISP]
But on Sept. 11, 2001
[Figure: a user and two groups of 1,000,000 other hosts send requests toward the Web server for new content (“WTC News!”), while old content is what is available closer by; legend: caching proxy, ISP, congestion / bottleneck]
Problems with discussed approaches: Server farms and Caching proxies
- Server farms do nothing about problems caused by network congestion, and do not improve latency caused by the network
- Caching proxies serve only their clients, not all users on the Internet
- Content providers (say, Web servers) cannot rely on the existence and correct implementation of caching proxies
- Accounting issues with caching proxies. For instance, a content provider needs to know the number of hits to a webpage for the advertisements displayed on it
Again on Sept. 11, 2001 with CDN
[Figure: a user and two groups of 1,000,000 other users request new content (“WTC News!”) and receive it from surrogates located in FL, IL, DE, NY, MA, MI, CA, and WA, which the Web server feeds through the distribution infrastructure; legend: surrogate, distribution infrastructure]
Web replication - CDNs
- Overlay network to distribute content from origin servers to users
- Avoids large amounts of the same data repeatedly traversing potentially congested links on the Internet
- Reduces Web server load
- Reduces user-perceived latency
- Tries to route around congested networks
CDN vs. Caching Proxies
- Caches are used by ISPs to reduce bandwidth consumption; CDNs are used by content providers to improve quality of service to end users
- Caches are reactive; CDNs are proactive
- Caching proxies cater to their users (web clients) and not to content providers (web servers); CDNs cater to both the content providers (web servers) and the clients
- CDNs give content providers control over the content; caching proxies do not
CDN Architecture
[Figure: the CDN sits between the origin server and the client; components: surrogates, request routing infrastructure, distribution and accounting infrastructure]
CDN Components
- Content Delivery Infrastructure: delivering content to clients from surrogates
- Request Routing Infrastructure: steering or directing a content request from a client to a suitable surrogate
- Distribution Infrastructure: moving or replicating content from the content source (origin server, content provider) to surrogates
- Accounting Infrastructure: logging and reporting of distribution and delivery activities
Server Interaction with CDN
1. Distribution infrastructure: origin server pushes new content to the CDN, OR the CDN pulls content from the origin server
2. Accounting infrastructure: origin server requests logs and other accounting info from the CDN, OR the CDN provides logs and other accounting info to the origin server
[Figure: origin server connected to the CDN’s distribution infrastructure (1) and accounting infrastructure (2)]
Client Interaction with CDN
1. Client to request routing infrastructure: Hi! I need
2. Request routing infrastructure to client: Go to surrogate
3. Client to surrogate: Hi! I need content /sept11
Q: How did the CDN choose the New York surrogate over the California surrogate?
[Figure: client, request routing infrastructure, surrogate (NY), and surrogate (CA) inside the CDN]
Request Routing Techniques
- Request routing techniques use a set of metrics to direct users to the “best” surrogate
- Proprietary, but the underlying techniques are known:
  - DNS based request routing
  - Content modification (URL rewriting)
  - Anycast based (how common is anycast?)
  - URL based request routing
  - Transport layer request routing
  - Combination of multiple mechanisms
DNS based Request-Routing
- Common due to the ubiquity of DNS as a directory service
- Specialized DNS server inserted in the DNS resolution process
- DNS server is capable of returning a different set of A, NS, or CNAME records based on policies/metrics
DNS based Request-Routing
[Figure: the client’s local DNS server sends a DNS query to the Akamai DNS and receives a DNS response containing an A record; the client then opens a session with one of the surrogates in the Akamai CDN. The queried names and returned addresses are not preserved.]
Q: How does the Akamai DNS know which surrogate is closest?
DNS based Request-Routing
[Figure: surrogates in the Akamai CDN measure to the client DNS and report the measurement results to the Akamai DNS, which uses them when answering the local DNS server’s query; the client then opens a session with the chosen surrogate]
DNS based Request Routing: Caching
[Figure: the Akamai DNS returns an A record with TTL = 10 s to the client DNS. Measurements from the two surrogates to the requesting DNS: one has available bandwidth = 10 kbps and RTT = 10 ms; the other has available bandwidth = 5 kbps and RTT = 100 ms.]
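The interplay between measurement-based selection and the short TTL can be sketched as follows. Real CDN selection policies are proprietary, so the rule used here (lowest RTT, ties broken by higher bandwidth) is only an assumption, and the class, function, host name, and surrogate names are hypothetical.

```python
def pick_surrogate(measurements):
    """Choose a surrogate from measurements taken toward the client DNS.

    measurements: dict surrogate -> {"rtt_ms": float, "bandwidth_kbps": float}.
    Assumed policy: lowest RTT first, then highest available bandwidth.
    """
    return min(
        measurements,
        key=lambda s: (measurements[s]["rtt_ms"], -measurements[s]["bandwidth_kbps"]),
    )


class CachingClientDNS:
    """Hypothetical client-side DNS cache that honors the record's TTL."""

    def __init__(self, authoritative, ttl_seconds=10):
        self._authoritative = authoritative  # callable: name -> answer
        self._ttl = ttl_seconds
        self._cache = {}                     # name -> (answer, expiry time)

    def resolve(self, name, now):
        entry = self._cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]                  # still fresh: CDN is not consulted
        answer = self._authoritative(name)
        self._cache[name] = (answer, now + self._ttl)
        return answer


if __name__ == "__main__":
    stats = {
        "surrogate-1": {"rtt_ms": 10, "bandwidth_kbps": 10},
        "surrogate-2": {"rtt_ms": 100, "bandwidth_kbps": 5},
    }
    dns = CachingClientDNS(lambda name: pick_surrogate(stats), ttl_seconds=10)
    print(dns.resolve("cdn.example", now=0))
```

A short TTL such as 10 s limits how long a cached answer can hide a change in the measurements, at the cost of more DNS queries reaching the CDN.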
DNS based Request Routing: Discussion
- Originator Problem: the client may be far removed from the client DNS
- Client DNS Masking Problem: virtually all DNS servers, except for root DNS servers, honor requests for recursion
  - Q: Which DNS server resolves a request for
  - Q: Which DNS server performs the last recursion of the DNS request?
- Hidden Load Factor: a DNS resolution may result in drastically different load on the selected surrogate, which complicates load balancing of requests and predicting load on surrogates
Server Selection Metrics
- Network proximity (surrogate to client):
  - Network hops (traceroute)
  - Internet mapping services (NetGeo, IDMaps)
  - …
- Surrogate load:
  - Number of active TCP connections
  - HTTP request arrival rate
  - Other OS metrics
  - …
- Bandwidth availability
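One way to combine these metrics is a weighted score per surrogate. The sketch below is only an illustration: the linear form, the default weights, and all names are assumptions rather than anything a real CDN is known to use.

```python
from dataclasses import dataclass


@dataclass
class SurrogateStats:
    name: str
    hops: int                 # network proximity (e.g. from traceroute)
    active_connections: int   # surrogate load
    request_rate: float       # HTTP requests per second
    bandwidth_kbps: float     # available bandwidth


def score(s, w_hops=1.0, w_conn=0.01, w_rate=0.1, w_bw=0.001):
    """Lower is better: distance and load add cost, bandwidth subtracts it.
    The weights are arbitrary placeholders."""
    return (w_hops * s.hops
            + w_conn * s.active_connections
            + w_rate * s.request_rate
            - w_bw * s.bandwidth_kbps)


def best_surrogate(candidates, **weights):
    """Return the candidate with the lowest score."""
    return min(candidates, key=lambda s: score(s, **weights))


if __name__ == "__main__":
    ny = SurrogateStats("NY", hops=4, active_connections=200,
                        request_rate=50, bandwidth_kbps=10000)
    ca = SurrogateStats("CA", hops=12, active_connections=50,
                        request_rate=10, bandwidth_kbps=10000)
    print(best_surrogate([ny, ca]).name)
```

Changing the weights changes the outcome: with `w_hops=0` the less loaded surrogate wins even though it is farther away.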
P4P: Provider Portal for (P2P) Applications
Laboratory of Networked Systems, Yale University
P2P: Benefits and Challenges
- P2P is a key to content delivery
  - Low costs to content owners/distributors
  - Scalability
- Challenge
  - Network-obliviousness usually leads to network inefficiency
  - Intradomain: for the Verizon network, P2P traffic traverses 1000 miles and 5.5 metro-hops on average
  - Interdomain: 50%-90% of existing local pieces in active users are downloaded externally*
* Karagiannis et al. Should Internet service providers fear peer-assisted content distribution? In Proceedings of IMC 2005
ISP Attempts to Address P2P Issues
- Upgrade infrastructure
- Customer pricing
- Rate limiting, or termination of services
- P2P caching
ISPs cannot effectively address network efficiency alone
Locality-aware P2P: P2P’s Attempt to Improve Network Efficiency
- P2P has flexibility in shaping communication patterns
- Locality-aware P2P tries to use this flexibility to improve network efficiency
  - E.g., Karagiannis et al. 2005, Bindal et al. 2006, Choffnes et al. 2008 (Ono)
Problems of Locality-aware P2P
- Locality-aware P2P needs to reverse engineer network topology, traffic load, and network policy
- Locality-aware P2P may not achieve network efficiency
  - Choose congested links
  - Traverse costly interdomain links
[Figure: ISP 0, ISP 1, ISP 2, and ISP K]
A Fundamental Problem
- Feedback from networks is limited
  - E.g., end-to-end flow measurements or limited ICMP feedback
Our Goal
Design a framework to enable better cooperation between networks and P2P
P4P: Provider Portal for (P2P) Applications
P4P Architecture
- Providers publish information via iTracker
- Applications query providers’ information and adjust traffic patterns accordingly
[Figure: a P2P application spanning ISP A and ISP B, each with its own iTracker]
Example: Tracker-based P2P
- Information flow
  - 1. peer queries appTracker
  - 2/3. appTracker queries iTracker
  - 4. appTracker selects a set of active peers
[Figure: a peer in ISP A, the appTracker, and ISP A’s iTracker, with arrows numbered 1 to 4]
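The information flow can be sketched with two small classes. The slides do not say what information an iTracker publishes or how an appTracker uses it, so everything below is hypothetical: the per-region cost table, the class and method names, and the rule of preferring the cheapest peers.

```python
class ITracker:
    """Hypothetical provider portal: publishes a cost between network regions."""

    def __init__(self, costs, default_cost=100.0):
        self._costs = costs            # (region_a, region_b) -> cost
        self._default = default_cost   # used for pairs the provider did not list

    def cost(self, a, b):
        if a == b:
            return 0.0
        return self._costs.get((a, b), self._costs.get((b, a), self._default))


class AppTracker:
    """Hypothetical application tracker that consults the iTracker (steps 2/3)."""

    def __init__(self, itracker, peer_regions):
        self._itracker = itracker
        self._regions = peer_regions   # peer id -> region of the active peer

    def select_peers(self, requester, k):
        """Steps 1 and 4: answer a peer's query with the k cheapest active peers."""
        here = self._regions[requester]
        candidates = [p for p in self._regions if p != requester]
        candidates.sort(key=lambda p: (self._itracker.cost(here, self._regions[p]), p))
        return candidates[:k]


if __name__ == "__main__":
    itracker = ITracker({("metro-1", "metro-2"): 5.0, ("metro-1", "other-isp"): 50.0})
    tracker = AppTracker(itracker, {"alice": "metro-1", "bob": "metro-1",
                                    "carol": "metro-2", "dave": "other-isp"})
    print(tracker.select_peers("alice", k=2))
```

The application still chooses its own peers; the provider only supplies information that the application could otherwise obtain by reverse engineering the network.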