Published by Myles Fields. Modified over 9 years ago.
1
An Analysis of Internet Content Delivery Systems
19th November, 2007
Youngsub Kwon @ CSE, SNU
2
Contents
  Introduction
  Overview of Content Delivery Systems
  Methodology
  High-Level Data Characteristics
  Detailed Content Delivery Characteristics
  The Potential Role of Caching in CDNs and P2P
  Conclusion
3
Introduction
  This paper examines content delivery from the point of view of four content delivery systems:
    HTTP web traffic
    The Akamai content delivery network
    Kazaa and Gnutella P2P file-sharing traffic
  Results
    Quantify the rapidly increasing importance of new content delivery systems, particularly peer-to-peer networks
    Characterize the behavior of these systems from the perspectives of clients, objects, and servers
    Derive implications for caching in these systems
4
Overview of Content Delivery Systems
  WWW
    Uses the HTTP protocol (consistency management)
    Simple client/server architecture
    Most web objects are small (5 to 10 KB)
    Objects are accessed with a Zipf popularity distribution
    The number of web objects is enormous and rapidly growing
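To make the Zipf popularity claim concrete, here is a minimal sketch of how request probability falls off with popularity rank. The exponent `alpha=1.0` and the population of 1,000 objects are illustrative assumptions, not values from the study.

```python
def zipf_weights(n, alpha=1.0):
    """Normalized request probabilities for objects ranked 1..n.

    Under a Zipf distribution, the object at rank r is requested with
    probability proportional to 1 / r**alpha. alpha=1.0 is an assumed value.
    """
    raw = [1.0 / (rank ** alpha) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def top_share(weights, k):
    """Fraction of all requests that go to the k most popular objects."""
    return sum(weights[:k])


# Hypothetical population of 1,000 objects.
weights = zipf_weights(1000)
print(f"top 10 of 1000 objects: {top_share(weights, 10):.1%} of requests")
```

Under these assumptions a handful of objects attracts a large share of requests, which is the property that makes caching popular objects worthwhile.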
5
Overview of Content Delivery Systems
  Content Delivery Networks (CDNs)
    Collections of servers located strategically across the wide-area Internet
    Content is replicated across the wide area, giving high availability
    CDNs have servers in ISP points of presence
    Clients can access topologically nearby replicas with low latency
    CDNs reduce average download response times, but DNS redirection causes overhead
  Peer-to-Peer Systems (P2P)
    Peers collaborate to form a distributed system for the purpose of exchanging content
    Most content-serving hosts are run by end users
    Low availability, low-capacity network connections
6
Methodology
  Use passive network monitoring to collect traces of traffic
  Network Composition
    UW (University of Washington) connects to its ISPs via two border routers, carrying inbound and outbound traffic
    The two routers are fully connected to four switches
    Each switch has a monitoring port that is used to copy packets to a monitoring host
  Tracing Infrastructure
    Software: 26,000 lines of code
    Hardware: dual-processor Dell Precision Workstation 530 with 2.0 GHz Pentium III Xeon CPUs
    FreeBSD 4.5
7
Methodology
  Distinguishing Traffic Types
    Two types of traffic: HTTP traffic and non-HTTP traffic
      HTTP traffic: WWW, Akamai, Kazaa, Gnutella
      Non-HTTP traffic: Kazaa and Gnutella search traffic
    Akamai: traffic on port 80, 8080, or 443 that is served by an Akamai server
    WWW: traffic on port 80, 8080, or 443 that is not served by an Akamai server
    Gnutella: ports 6346 or 6347; includes file transfers, but excludes search and control traffic
    Kazaa: port 1214; includes file transfers, but excludes search and control traffic
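The port-based rules on this slide can be sketched as a small classifier. This is an illustration only: the slides do not say how the tracing software decides that a server belongs to Akamai, so the `served_by_akamai` flag is a hypothetical input supplied by the caller, and the function and class names are made up.

```python
from enum import Enum


class TrafficClass(Enum):
    AKAMAI = "akamai"
    WWW = "www"
    GNUTELLA = "gnutella"
    KAZAA = "kazaa"
    OTHER = "other"


# Port numbers as listed on the slide.
WEB_PORTS = {80, 8080, 443}
GNUTELLA_PORTS = {6346, 6347}
KAZAA_PORT = 1214


def classify(server_port, served_by_akamai=False):
    """Assign an HTTP transfer to one of the four content delivery systems.

    served_by_akamai is a hypothetical flag: how Akamai servers are
    recognized is not described in the slides.
    """
    if server_port in WEB_PORTS:
        return TrafficClass.AKAMAI if served_by_akamai else TrafficClass.WWW
    if server_port in GNUTELLA_PORTS:
        return TrafficClass.GNUTELLA
    if server_port == KAZAA_PORT:
        return TrafficClass.KAZAA
    return TrafficClass.OTHER


print(classify(80, served_by_akamai=True).value)
print(classify(1214).value)
```

Search and control traffic would have to be filtered out before this step, since the slide excludes it from the Gnutella and Kazaa categories.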
8
High-Level Data Characteristics
  TCP Bandwidth
    All systems show a typical diurnal cycle
    Share of TCP traffic:
      Akamai: 0.2%
      Gnutella: 6.04%
      WWW: 14.3%
      Kazaa: 36.99%
9
High-Level Data Characteristics
  UW client and server TCP bandwidth
    Figure (a): inbound data bandwidth
      WWW peaks in the middle of the day
      Kazaa peaks late at night
    Figure (b): outbound data bandwidth
      Peak Kazaa bandwidth dominates WWW by a factor of 3
10
High-Level Data Characteristics
  Content types downloaded by UW clients
    GIF and JPEG images account for 42% of downloads, but only 16.3% of the bytes transferred
    Compared with measurements from a 1999 study:
      HTML traffic: -43%
      GIF and JPG traffic: -59%
      AVI and MPG traffic: +400%
      MP3 traffic: +300%
11
High-Level Data Characteristics
  Summary
    The balance of HTTP traffic has changed dramatically over the last several years
    P2P traffic is overtaking WWW traffic as the largest contributor to HTTP bytes transferred
    Although UW is a large publisher of web documents, P2P traffic makes the University an even larger exporter of data
    The mixture of object types downloaded by UW clients has changed
12
Detailed Content Delivery Characteristics
  Objects
    Object size: P2P objects are larger than WWW and Akamai objects
    Top bandwidth-consuming objects
      For Gnutella, a relatively large number of objects account for a large portion of the transferred bytes
13
Detailed Content Delivery Characteristics
  Objects: top 10 bandwidth-consuming objects
    WWW: the top 10 objects are a mix of extremely small objects
    Akamai: 8 out of the top 10 objects are larger and unpopular
    Kazaa: export objects are larger than import objects
14
Detailed Content Delivery Characteristics
  Objects: downloaded bytes by object type
15
Detailed Content Delivery Characteristics
  Clients: top UW bandwidth-consuming clients
    Figure (a): Top Bandwidth Consuming UW Clients
      WWW: the top 200 clients (0.5%) account for 13% of WWW traffic
      Kazaa: the top 200 clients (4%) account for 50% of Kazaa traffic
    Figure (b): Top Bandwidth Consuming UW Servers
      Kazaa: 200 clients account for 20% of the total HTTP bytes downloaded (worst offender)
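Statements such as "the top 200 clients account for 50% of the traffic" come from ranking hosts by bytes transferred and summing the head of the ranking. A minimal sketch, using made-up byte counts rather than trace data:

```python
def cumulative_share(bytes_by_host, top_n):
    """Fraction of all bytes attributable to the top_n heaviest hosts."""
    ranked = sorted(bytes_by_host.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:top_n]) / total


# Hypothetical per-client byte counts, for illustration only.
example = {"client-a": 500, "client-b": 300, "client-c": 100, "client-d": 100}
print(cumulative_share(example, 1))
print(cumulative_share(example, 2))
```

Plotting this value for every `top_n` from 1 to the number of hosts gives a cumulative curve of the kind shown in the figures.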
16
Detailed Content Delivery Characteristics
  Clients: request rates over time
17
Detailed Content Delivery Characteristics
  Servers: top UW-internal bandwidth-producing servers
    Figure (a): Top Bandwidth Consuming UW Servers
      Gnutella: the first 10 servers account for all of the bytes
      WWW: steep curve
      Kazaa: the top 334 servers account for 80% of the bytes
    Figure (b)
      WWW: 20 servers account for 20% of all HTTP bytes output
      Kazaa: 170 servers account for 50% of all HTTP bytes output
18
Detailed Content Delivery Characteristics
  Servers: the UW-external bandwidth-producing servers
    Figure (a)
      WWW: 938 external servers account for 50% of the bytes
      Kazaa: 600 external servers account for 26% of the bytes
    Figure (b)
      Kazaa: the top 500 external Kazaa peers account for 10% of the bytes
      WWW: the top 500 servers account for 22% of the bytes
19
Detailed Content Delivery Characteristics
  Servers: the response codes returned by external servers in each content delivery system
    Figure (a)
      Akamai and the WWW: 70% success
      P2P: less than 20% success
    Figure (b)
      Nearly all HTTP bytes are for useful content
      The overhead of rejected requests is small compared to the amount of useful data transferred
20
Detailed Content Delivery Characteristics
  Scalability of P2P Systems
    Can P2P systems like Kazaa scale in environments such as the university?
    Every peer in a P2P system consumes bandwidth in both directions
    Each new P2P client added becomes a server for the entire P2P structure
    Kazaa objects are huge, so a small number of peers can consume an enormous amount of total network bandwidth
    The bandwidth cost of each P2P peer is 90 times that of a web client!
    It seems questionable whether any organization can support a service with these characteristics
21
Detailed Content Delivery Characteristics
  Summary
    Peer-to-peer traffic now accounts for over three quarters of HTTP traffic
    A small number of P2P users consume a disproportionately high fraction of bandwidth
    While the P2P request rate is quite low, the transfers last a long time
    While the design of P2P overlay structures focuses on spreading the workload for scalability, our measurements show that a small number of servers are taking the majority of the burden
22
The Potential Role of Caching in CDNs
  Akamai requests achieve an 88% ideal hit rate and a 50% practical hit rate, noticeably higher than WWW requests (77% and 36%)
  Our analysis shows that Akamai requests are more skewed towards the most popular documents than WWW requests are
  We know that most bytes fetched from Akamai are from images and videos
  This implies that much of Akamai's content is in fact static and could be cached
  We would expect that widely deployed proxy caches would significantly reduce the need for a separate content delivery network
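The gap between an ideal and a practical hit rate can be illustrated with a toy simulation. The slides do not define the two terms, so this sketch assumes that "ideal" means an infinite cache in which every object may be stored, and "practical" means an infinite cache that stores only objects marked cacheable. The trace below is made up.

```python
def hit_rates(requests):
    """Return (ideal, practical) request hit rates for one trace.

    requests: list of (object_id, cacheable) pairs in arrival order.
    Assumed definitions (not taken from the slides):
      ideal     - infinite cache, every object may be stored
      practical - infinite cache, only cacheable objects are stored
    """
    ideal_cache, practical_cache = set(), set()
    ideal_hits = practical_hits = 0
    for object_id, cacheable in requests:
        if object_id in ideal_cache:
            ideal_hits += 1
        if object_id in practical_cache:
            practical_hits += 1
        ideal_cache.add(object_id)
        if cacheable:
            practical_cache.add(object_id)
    total = len(requests)
    if total == 0:
        return 0.0, 0.0
    return ideal_hits / total, practical_hits / total


# Hypothetical trace: "a" is cacheable, "b" is not.
trace = [("a", True), ("b", False), ("a", True), ("b", False), ("a", True)]
ideal, practical = hit_rates(trace)
print(ideal, practical)
```

In this toy trace the repeated requests for the uncacheable object count as hits only in the ideal case, which is why the practical rate is lower.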
23
The Potential Role of Caching in P2P
  The potential impact of caching in P2P systems may exceed the benefits seen in the web
  Inbound cache byte hit rate = 35%, outbound cache byte hit rate = 85%
  Hit rate increases with client population size for outbound traffic (1,000 clients: 40%; 500,000 clients: 85%)
  A reverse P2P cache saves the most bandwidth
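A byte hit rate weights each cache hit by the size of the object served, so a few large, repeatedly transferred objects can dominate it. A minimal sketch, assuming an infinite cache and whole-object transfers (both simplifications), with made-up sizes:

```python
def byte_hit_rate(transfers):
    """Fraction of transferred bytes that an infinite cache could serve.

    transfers: list of (object_id, size_bytes) pairs in arrival order.
    Assumes whole-object transfers; partial downloads are ignored.
    """
    cached = set()
    hit_bytes = 0
    total_bytes = 0
    for object_id, size in transfers:
        total_bytes += size
        if object_id in cached:
            hit_bytes += size
        else:
            cached.add(object_id)
    return hit_bytes / total_bytes if total_bytes else 0.0


# Hypothetical transfers: one large object and one small object.
sample = [("movie", 700), ("song", 5), ("movie", 700), ("song", 5), ("movie", 700)]
print(f"{byte_hit_rate(sample):.1%}")
```

Here three of five requests are hits, but the byte hit rate is higher than 60% because two of the hits are for the large object.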
24
Conclusion
  P2P traffic now accounts for the majority of HTTP bytes transferred
  P2P documents are three orders of magnitude larger than web objects
  A small number of extremely large objects account for an enormous fraction of observed P2P traffic
  A small number of clients and servers are responsible for the majority of the traffic we saw in the P2P systems
  Each P2P client creates a significant bandwidth load in both directions