Content Distribution
March 8
2: Application Layer

Review: P2P architecture
- No always-on server
- Arbitrary end systems communicate directly
- Peers are intermittently connected and change IP addresses
[Figure: peer-peer communication]

File Distribution: Server-Client vs P2P
Question: How much time does it take to distribute a file from one server to N peers?
- F: file size
- u_s: server upload bandwidth
- u_i: peer i upload bandwidth
- d_i: peer i download bandwidth
[Figure: a server and N peers attached to a network with abundant bandwidth]

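The slide leaves the question open. A minimal sketch of the usual textbook lower bounds follows, assuming a fluid model in which peers can forward bits as soon as they receive them and the network core is never the bottleneck (the slide's "abundant bandwidth"). The function names and the sample numbers are illustrative assumptions, not part of the slides.

```python
def client_server_time(F, u_s, d):
    """Lower bound when the server alone uploads one copy to each of N peers.

    D_cs >= max(N*F/u_s, F/d_min)
    """
    N = len(d)
    return max(N * F / u_s, F / min(d))


def p2p_time(F, u_s, u, d):
    """Lower bound when peers also upload what they have received.

    D_p2p >= max(F/u_s, F/d_min, N*F/(u_s + sum(u_i)))
    """
    N = len(d)
    return max(F / u_s, F / min(d), N * F / (u_s + sum(u)))


# Hypothetical numbers: a 1000-unit file, 10 peers.
F = 1000
u_s = 100
u = [10] * 10   # peer upload bandwidths
d = [50] * 10   # peer download bandwidths
print(client_server_time(F, u_s, d))  # 100.0
print(p2p_time(F, u_s, u, d))         # 50.0
```

In the client-server bound the first term grows linearly with N, while in the P2P bound every added peer also adds upload capacity to the denominator.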
P2P content distribution issues
- Issues
  - Group management and data search
  - Reliable and efficient file exchange
  - Security/privacy/anonymity/trust
- Approaches for group management and data search (i.e., who has what?)
  - Centralized (e.g., BitTorrent tracker)
  - Unstructured (e.g., Gnutella)
  - Structured (Distributed Hash Tables [DHT])

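To make the structured approach concrete, here is a minimal sketch of how a hash ring can answer "who has what?". It assumes a successor rule of the kind used by Chord-style DHTs: a key belongs to the first node at or after the key's position on the ring. The class, the peer names, and the 32-bit ring size are hypothetical, and a real DHT would route the lookup across peers instead of consulting one global table.

```python
import bisect
import hashlib


def ring_position(name, bits=32):
    """Map a node name or a content key to a point on the hash ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs form the ring.
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def lookup(self, key):
        """Return the node responsible for key (its successor on the ring)."""
        positions = [pos for pos, _ in self._ring]
        # First node at or after the key's position, wrapping around at the end.
        idx = bisect.bisect_left(positions, ring_position(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["peer-a", "peer-b", "peer-c"])
print(ring.lookup("file.iso"))  # one of the three peers, always the same one
```

One property of this design is that when a peer other than the owner leaves, the key keeps its owner, so only the departed peer's keys need to move.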
Contents
- P2P architecture and benefits
- P2P content distribution
- Content distribution network (CDN)

Why Content Networks?
- More hops between client and Web server mean more congestion!
- The same data flows repeatedly over the links between clients and Web server
[Figure: Web server S and clients C1-C4 connected through IP routers]
Slides from

Why Content Networks?
- The origin server becomes a bottleneck as the number of users grows
- Flash crowds (for instance, Sept. 11)
- The Content Distribution Problem: arrange a rendezvous between a content source at the origin server and a content sink (us, as users)

Example: Web Server Farm
- Simple solution to the content distribution problem: deploy a large group of servers
- Arbitrate client requests to servers using an "intelligent" L4-L7 switch
- Pretty widely used today
[Figure: an L4-L7 switch spreads requests from grad.umd.edu and ren.cis.udel.edu across server copies 1, 2, and 3]

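The slide does not say how the switch arbitrates requests. As one possible policy, the sketch below uses plain round-robin; the class name and the copy labels are hypothetical, and real L4-L7 switches may instead weigh server load, session affinity, or the requested URL.

```python
import itertools


class RoundRobinSwitch:
    """Toy stand-in for an L4-L7 switch in front of identical server copies."""

    def __init__(self, servers):
        # cycle() repeats the server list forever, in order.
        self._next_server = itertools.cycle(list(servers))

    def dispatch(self, client):
        """Assign the client's request to the next server copy in turn."""
        return client, next(self._next_server)


switch = RoundRobinSwitch(["copy-1", "copy-2", "copy-3"])
for client in ["grad.umd.edu", "ren.cis.udel.edu", "ren.cis.udel.edu", "grad.umd.edu"]:
    print(switch.dispatch(client))
```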
Example: Caching Proxy
- Motivated mainly by ISP business interests: reducing the bandwidth the ISP consumes from the Internet
- Reduced network traffic
- Reduced user-perceived latency
[Figure: inside the ISP, interceptors divert TCP port 80 traffic from clients ren.cis.udel.edu and merlot.cis.udel.edu to the proxy; other traffic goes straight to the Internet]

But on Sept. 11, 2001
[Figure: user mslab.kaist.ac.kr and 1,000,000 other hosts request "WTC News!"; the Web server holds the new content while the ISP's caching proxy still holds old content; legend: caching proxy, congestion/bottleneck]

Problems with discussed approaches: Server farms and Caching proxies
- Server farms do nothing about problems caused by network congestion
- Caching proxies serve only their own clients, not all users on the Internet
- Content providers (say, Web servers) cannot rely on the existence and correct implementation of caching proxies
- Accounting issues with caching proxies: for instance, a content provider needs to know the number of hits to a webpage because of the advertisements displayed on it

Again on Sept. 11, 2001 with CDN
[Figure: user mslab.kaist.ac.kr and 1,000,000 other users request "WTC News!" and receive the new content; the Web server's new content reaches surrogates in FL, IL, DE, NY, MA, MI, CA, and WA over the distribution infrastructure; legend: surrogate, distribution infrastructure]

Web replication - CDNs
- Overlay network to distribute content from origin servers to users
- Avoids large amounts of the same data repeatedly traversing potentially congested links on the Internet
- Reduces Web server load
- Reduces user-perceived latency
- Tries to route around congested networks

CDN vs. Caching Proxies
- Caches are used by ISPs to reduce bandwidth consumption; CDNs are used by content providers to improve quality of service to end users
- Caches are reactive; CDNs are proactive
- Caching proxies cater to their users (web clients) and not to content providers (web servers); CDNs cater to both the content providers (web servers) and the clients
- CDNs give content providers control over the content; caching proxies do not

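The reactive/proactive distinction can be sketched in a few lines. This is a toy model under stated assumptions: the class names are hypothetical, the reactive cache has no expiry or revalidation at all, and real proxies follow HTTP cache-control rules that are not modeled here.

```python
class ReactiveCache:
    """Caching proxy: fetches an object only after a client first asks for it."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.origin_fetches = 0

    def get(self, url):
        if url not in self._store:          # miss: go to the origin
            self._store[url] = self._fetch(url)
            self.origin_fetches += 1
        return self._store[url]             # hit: may be out of date


class ProactiveSurrogate:
    """CDN surrogate: the content provider pushes content before requests arrive."""

    def __init__(self):
        self._store = {}

    def push(self, url, body):
        self._store[url] = body

    def get(self, url):
        return self._store[url]


origin = {"/news": "old content"}
cache = ReactiveCache(origin.__getitem__)
cache.get("/news")                      # first request fills the cache
origin["/news"] = "new content"         # the provider updates the page
print(cache.get("/news"))               # old content

surrogate = ProactiveSurrogate()
surrogate.push("/news", origin["/news"])
print(surrogate.get("/news"))           # new content
```

Because the provider controls when `push` happens, it also controls which version of the content users see, which the cache in this model never offers.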
CDN Architecture
[Figure: the CDN sits between the origin server and the client and consists of surrogates, a request routing infrastructure, and a distribution & accounting infrastructure]

CDN Organization
- Limelight/Google: place CDN servers near a small number of ISP core networks
- Akamai: place CDN servers deep into a large number of ISP networks' sites
- Nano Data Center (NaDa): home gateways (STBs/modems) act as CDN servers (peer-to-peer delivery among NaDa servers)
- P2P software (BitTorrent, PPLive, etc.)
[Figure: NaDa digital media delivery platform spanning the access network (modem, DSLAM, ONT, OLT), the metro/edge network (edge router), and the core network (core router)]

CDN Components
- Distribution Infrastructure: moving or replicating content from the content source (origin server, content provider) to surrogates
- Request Routing Infrastructure: steering or directing content requests from a client to a suitable surrogate
- Content Delivery Infrastructure: delivering content to clients from surrogates
- Accounting Infrastructure: logging and reporting of distribution and delivery activities

Server Interaction with CDN
1. Distribution infrastructure: the origin server pushes new content to the CDN, OR the CDN pulls content from the origin server
2. Accounting infrastructure: the origin server requests logs and other accounting info from the CDN, OR the CDN provides logs and other accounting info to the origin server
[Figure: origin server connected to the CDN's distribution and accounting infrastructures]

Client Interaction with CDN
1. Client to request routing infrastructure: "Hi! I need"
2. Request routing infrastructure to client: "Go to surrogate newyork.cnn.akamai.com"
3. Client to surrogate (NY): "Hi! I need content /sept11"
Q: How did the CDN choose the New York surrogate over the California surrogate?
[Figure: client, request routing infrastructure, and CDN surrogates newyork.cnn.akamai.com (NY) and california.cnn.akamai.com (CA)]

Request Routing Techniques
- Request routing techniques use a set of metrics to direct users to the "best" surrogate
- Proprietary, but the underlying techniques are known:
  - DNS based request routing
  - Content modification (URL rewriting)
  - Anycast based (how common is anycast?)
  - URL based request routing
  - Transport layer request routing
  - Combination of multiple mechanisms

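Of the techniques listed, content modification is the easiest to illustrate in code. The sketch below assumes the simplest variant, in which the origin server rewrites the URLs of embedded objects so that browsers fetch them from a CDN hostname. The function name and the hostnames www.example.com and cdn.example.net are hypothetical, and the regular expression only handles double-quoted src attributes.

```python
import re
from urllib.parse import urlsplit, urlunsplit

SRC_ATTR = re.compile(r'src="([^"]+)"')


def rewrite_embedded_urls(html, origin_host, cdn_host):
    """Point embedded objects hosted on origin_host at cdn_host instead."""

    def swap(match):
        parts = urlsplit(match.group(1))
        if parts.netloc != origin_host:
            return match.group(0)            # leave third-party objects alone
        rewritten = urlunsplit(parts._replace(netloc=cdn_host))
        return f'src="{rewritten}"'

    return SRC_ATTR.sub(swap, html)


page = '<img src="http://www.example.com/a.jpg"><a href="http://www.example.com/">home</a>'
print(rewrite_embedded_urls(page, "www.example.com", "cdn.example.net"))
# <img src="http://cdn.example.net/a.jpg"><a href="http://www.example.com/">home</a>
```

The page itself still comes from the origin server; only the embedded objects are steered to the CDN.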
DNS based Request-Routing
- Common due to the ubiquity of DNS as a directory service
- A specialized DNS server is inserted into the DNS resolution process
- This DNS server can return a different set of A, NS, or CNAME records based on policies/metrics

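A minimal sketch of the policy idea, assuming the simplest possible policy: the specialized DNS server picks A records from a static table keyed on the address of the resolver that sent the query. The table, the function name, and all addresses (taken from the reserved documentation ranges) are hypothetical; this does not speak the DNS wire protocol and is not how any particular CDN configures its servers.

```python
import ipaddress

# Hypothetical policy table: resolver prefix -> A records to hand out.
POLICY = [
    (ipaddress.ip_network("192.0.2.0/24"), ["203.0.113.10"]),     # e.g. east-coast resolvers
    (ipaddress.ip_network("198.51.100.0/24"), ["203.0.113.20"]),  # e.g. west-coast resolvers
]
DEFAULT_RECORDS = ["203.0.113.10", "203.0.113.20"]


def resolve_a(resolver_ip):
    """Return the A records this policy would give to the querying resolver."""
    addr = ipaddress.ip_address(resolver_ip)
    for network, records in POLICY:
        if addr in network:
            return records
    return DEFAULT_RECORDS


print(resolve_a("192.0.2.53"))    # ['203.0.113.10']
print(resolve_a("198.51.100.7"))  # ['203.0.113.20']
```

Note that the decision is keyed on the resolver's address, not the end client's, because the resolver is all this server sees.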
DNS based Request-Routing
[Figure: client test.nyu.edu sends a DNS query to its local DNS server (dns.nyu.edu), which queries the Akamai DNS; the DNS response carries an A record, and the client then opens its session with surrogate newyork.cnn.akamai.com rather than california.cnn.akamai.com in the Akamai CDN]
Q: How does the Akamai DNS know which surrogate is closest?

DNS based Request-Routing
[Figure: the DNS query from test.nyu.edu reaches the Akamai DNS through the local DNS server (dns.nyu.edu); the surrogates in the Akamai CDN measure to the client DNS and send the measurement results to the Akamai DNS]

DNS based Request-Routing
[Figure: two surrogates report on the requesting DNS to the Akamai DNS, one with available bandwidth = 10 kbps and RTT = 10 ms, the other with available bandwidth = 5 kbps and RTT = 100 ms; the Akamai DNS answers the client DNS with a surrogate A record, TTL = 10 s]

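The measurements in the figure can feed a selection step like the one sketched here. The scoring rule (lowest RTT first, higher available bandwidth as tie-breaker) is an assumption made for illustration, since the slides note that the real metrics are proprietary; the names are hypothetical, and the numbers are the ones shown in the figure.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """What one surrogate reports about the requesting (client) DNS server."""
    surrogate: str
    rtt_ms: float
    bandwidth_kbps: float


def pick_surrogate(measurements, ttl_s=10):
    """Choose a surrogate and the TTL to attach to its A record."""
    # Assumed policy: prefer low RTT, break ties with higher bandwidth.
    best = min(measurements, key=lambda m: (m.rtt_ms, -m.bandwidth_kbps))
    return best.surrogate, ttl_s


reports = [
    Measurement("surrogate-1", rtt_ms=10, bandwidth_kbps=10),
    Measurement("surrogate-2", rtt_ms=100, bandwidth_kbps=5),
]
print(pick_surrogate(reports))  # ('surrogate-1', 10)
```

A short TTL such as the 10 s in the figure means the client DNS must ask again soon, which gives the CDN a chance to re-evaluate the choice as measurements change.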
DNS based Request Routing: Discussion
- Originator Problem: the client may be far removed from the client DNS
- Client DNS Masking Problem: virtually all DNS servers, except for root DNS servers, honor requests for recursion
  Q: Which DNS server resolves a request for test.nyu.edu?
  Q: Which DNS server performs the last recursion of the DNS request?
- Hidden Load Factor: a single DNS resolution may result in drastically different load on the selected surrogate, which is an issue for load balancing requests and for predicting load on surrogates

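The hidden load factor can be made concrete with a small calculation. The sketch assumes that every client behind a resolver reuses that resolver's cached answer while the record's TTL lasts; the resolver names and client counts are invented for illustration.

```python
def surrogate_load(assignments, clients_behind):
    """Total clients sent to each surrogate, given one DNS answer per resolver."""
    load = {}
    for resolver, surrogate in assignments.items():
        load[surrogate] = load.get(surrogate, 0) + clients_behind[resolver]
    return load


# One DNS resolution per resolver: the CDN's DNS server sees two equal queries.
assignments = {"dns.small-campus.example": "ny", "dns.big-isp.example": "ca"}
# What the CDN cannot see: how many clients share each resolver's cached answer.
clients_behind = {"dns.small-campus.example": 20, "dns.big-isp.example": 50000}

print(surrogate_load(assignments, clients_behind))  # {'ny': 20, 'ca': 50000}
```

Both surrogates were handed out exactly once, yet their resulting loads differ by orders of magnitude.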
Summary
- P2P architecture and its benefits
- P2P content distribution
  - BitTorrent, Skype
- Content distribution network (CDN)
  - DNS-based request routing