An Analysis of Internet Content Delivery Systems. 19th November, 2007. Youngsub Kwon, CSE, SNU.


1 An Analysis of Internet Content Delivery Systems
19th November, 2007
Youngsub Kwon @ CSE, SNU

2 Contents
• Introduction
• Overview of Content Delivery Systems
• Methodology
• High-Level Data Characteristics
• Detailed Content Delivery Characteristics
• The Potential Role of Caching in CDNs and P2P
• Conclusion

3 Introduction
• This paper examines content delivery from the point of view of four content delivery systems
  • HTTP web traffic
  • Akamai content delivery network
  • Kazaa and Gnutella P2P file sharing traffic
• Results
  • Quantify the rapidly increasing importance of new content delivery systems, particularly peer-to-peer networks
  • Characterize the behavior of these systems from the perspectives of clients, objects, and servers
  • Derive implications for caching in these systems

4 Overview of Content Delivery Systems
• WWW
  • Uses the HTTP protocol (consistency management)
  • Simple client/server architecture
  • Most web objects are small (5~10 KB)
  • Objects are accessed with a Zipf popularity distribution
  • The number of web objects is enormous and rapidly growing
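The Zipf popularity distribution mentioned for web objects can be illustrated with a small sketch. It assumes the common form in which the object at popularity rank r is requested with probability proportional to 1 / r^alpha; the exponent `alpha`, the catalogue size, and all function names are illustrative assumptions, not values from the paper.

```python
import random

def zipf_weights(num_objects, alpha=1.0):
    """Unnormalized Zipf weights: the object at rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]

def top_share(weights, top_k):
    """Expected fraction of requests that go to the top_k most popular objects."""
    return sum(weights[:top_k]) / sum(weights)

def sample_requests(num_objects, num_requests, alpha=1.0, seed=0):
    """Draw a synthetic request trace of object ranks (1 = most popular)."""
    rng = random.Random(seed)
    weights = zipf_weights(num_objects, alpha)
    return rng.choices(range(1, num_objects + 1), weights=weights, k=num_requests)

# Hypothetical catalogue of 10,000 objects with alpha = 1 (assumed values).
weights = zipf_weights(10_000)
share = top_share(weights, 100)           # share of requests hitting the top 1% of objects
requests = sample_requests(10_000, 50_000)
```

Under these assumed parameters a small set of popular objects attracts a large share of requests, which is the property that makes caching attractive.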

5 Overview of Content Delivery Systems
• Content Delivery Networks (CDNs)
  • Collections of servers located strategically across the wide-area Internet
  • Content is replicated across the wide area for high availability
  • CDNs have servers in ISP points of presence
  • Clients can access topologically nearby replicas with low latency
  • CDNs reduce average download response times, but DNS redirection causes overhead
• Peer-to-Peer Systems (P2P)
  • Peers collaborate to form a distributed system for the purpose of exchanging content
  • Most content-serving hosts are run by end users
  • Low availability and low-capacity network connections

6 Methodology
• Use passive network monitoring to collect traces of traffic
• Network composition
  • UW (University of Washington) connects to its ISPs via two border routers (inbound traffic, outbound traffic)
  • The two routers are fully connected to four switches
  • Each switch has a monitoring port that is used to copy packets to the monitoring host
• Tracing infrastructure
  • Software: 26,000 lines of code
  • Hardware: dual-processor Dell Precision Workstation 530 with 2.0 GHz Pentium III Xeon CPUs, running FreeBSD 4.5

7 Methodology
• Distinguishing traffic types
  • Two types of traffic: HTTP traffic and non-HTTP traffic
  • HTTP traffic: WWW, Akamai, Kazaa, Gnutella
  • Non-HTTP traffic: Kazaa and Gnutella search traffic
  • Akamai: ports 80, 8080, 443, served by an Akamai server
  • WWW: ports 80, 8080, 443, not served by an Akamai server
  • Gnutella: ports 6346 or 6347; includes file transfers, but excludes search and control traffic
  • Kazaa: port 1214; includes file transfers, but excludes search and control traffic
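The port-based rules above can be sketched as a small classifier. This is only an illustration of the rules as listed on the slide, not the authors' tracing software: the function name, the `akamai_servers` set, and the example addresses (taken from documentation-only IP ranges) are assumptions, and the sketch does not attempt to separate file transfers from search and control traffic.

```python
WEB_PORTS = {80, 8080, 443}
GNUTELLA_PORTS = {6346, 6347}
KAZAA_PORTS = {1214}

def classify_connection(server_ip, server_port, akamai_servers=frozenset()):
    """Label a connection by server port, following the rules on the slide.

    akamai_servers is a hypothetical set of IP addresses known to belong to
    Akamai; how such a set would be obtained is outside this sketch.
    """
    if server_port in KAZAA_PORTS:
        return "Kazaa"
    if server_port in GNUTELLA_PORTS:
        return "Gnutella"
    if server_port in WEB_PORTS:
        # Web ports are split by whether the server is an Akamai server.
        return "Akamai" if server_ip in akamai_servers else "WWW"
    return "other"

# Example addresses come from reserved documentation ranges, not real servers.
known_akamai = {"192.0.2.10"}
labels = [
    classify_connection("192.0.2.10", 80, known_akamai),
    classify_connection("198.51.100.7", 443, known_akamai),
    classify_connection("203.0.113.5", 1214, known_akamai),
    classify_connection("203.0.113.6", 6347, known_akamai),
]
```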

8 High-Level Data Characteristics
• TCP bandwidth
  • All systems show a typical diurnal cycle
  • Akamai: 0.2% of TCP traffic
  • Gnutella: 6.04% of TCP traffic
  • WWW: 14.3% of TCP traffic
  • Kazaa: 36.99% of TCP traffic

9 High-Level Data Characteristics
• UW client and server TCP bandwidth
  • Figure (a): inbound data bandwidth
    • WWW peaks in the middle of the day
    • Kazaa peaks late at night
  • Figure (b): outbound data bandwidth
    • Peak Kazaa bandwidth dominates WWW by a factor of 3

10 High-Level Data Characteristics
• Content types downloaded by UW clients
  • GIF and JPEG images account for 42% of downloads, but only 16.3% of the bytes transferred
  • Compared with measurements from a 1999 study
    • HTML traffic: -43%; GIF and JPG traffic: -59%
    • AVI and MPG traffic: +400%; MP3 traffic: +300%

11 High-Level Data Characteristics
• Summary
  • The balance of HTTP traffic has changed dramatically over the last several years
  • P2P traffic is overtaking WWW traffic as the largest contributor to HTTP bytes transferred
  • Although UW is a large publisher of web documents, P2P traffic makes the University an even larger exporter of data
  • The mixture of object types downloaded by UW clients has changed

12 Detailed Content Delivery Characteristics
• Objects
  • Object size: P2P > WWW and Akamai
  • Top bandwidth-consuming objects
    • For Gnutella, a relatively large number of objects accounts for a large portion of the transferred bytes

13 Detailed Content Delivery Characteristics
• Objects: top 10 bandwidth-consuming objects
  • WWW: the top 10 objects are a mix of extremely small objects
  • Akamai: 8 of the top 10 objects are large and unpopular
  • Kazaa: exported objects are larger than imported objects

14 Detailed Content Delivery Characteristics
• Objects: downloaded bytes by object type

15 Detailed Content Delivery Characteristics
• Clients: top UW bandwidth-consuming clients
  • Figure (a): top bandwidth-consuming UW clients
    • WWW: the top 200 clients (0.5%) account for 13% of WWW traffic
    • Kazaa: the top 200 clients (4%) account for 50% of Kazaa traffic
  • Figure (b): top bandwidth-consuming UW servers
    • Kazaa: 200 clients account for 20% of the total HTTP bytes downloaded (worst offender)
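Concentration figures of this kind (for example, the top 200 Kazaa clients accounting for 50% of Kazaa traffic) are cumulative shares of bytes over hosts ranked by volume. A minimal sketch of that calculation follows; the function name and the per-client byte counts are made up for illustration and are not taken from the UW trace.

```python
def top_k_share(bytes_by_host, k):
    """Fraction of all bytes attributable to the k highest-volume hosts."""
    volumes = sorted(bytes_by_host.values(), reverse=True)
    total = sum(volumes)
    if total == 0:
        return 0.0
    return sum(volumes[:k]) / total

# Hypothetical per-client byte counts (illustrative only).
example_bytes = {"client-a": 500, "client-b": 300, "client-c": 100, "client-d": 100}
share_top1 = top_k_share(example_bytes, 1)
share_top2 = top_k_share(example_bytes, 2)
```

Evaluating `top_k_share` for increasing k gives the kind of cumulative curve that the figures on these slides plot.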

16 Detailed Content Delivery Characteristics
• Clients: request rates over time

17 Detailed Content Delivery Characteristics
• Servers: top UW-internal bandwidth-producing servers
  • Figure (a): top bandwidth-consuming UW servers
    • Gnutella: all of the bytes come from the first 10 servers
    • WWW: steep curve
    • Kazaa: 80% of the bytes come from the top 334 servers
  • Figure (b)
    • WWW: 20 servers produce 20% of all HTTP bytes output
    • Kazaa: 170 servers produce 50% of all HTTP bytes output

18 Detailed Content Delivery Characteristics
• Servers: the UW-external bandwidth-producing servers
  • Figure (a)
    • WWW: 938 external servers supply 50% of the bytes
    • Kazaa: 600 external servers supply 26% of the bytes
  • Figure (b)
    • Kazaa: the top 500 external Kazaa peers supply 10% of the bytes
    • WWW: the top 500 servers supply 22% of the bytes

19 Detailed Content Delivery Characteristics
• Servers
  • The response codes returned by external servers in each content delivery system
  • Figure (a): Akamai and the WWW: 70% success; P2P: less than 20% success
  • Figure (b) shows that nearly all HTTP bytes are for useful content
  • The overhead of rejected requests is small compared to the amount of useful data transferred

20 Detailed Content Delivery Characteristics
• Scalability of P2P systems
  • Can P2P systems like Kazaa scale in environments such as a university?
  • Every peer in a P2P system consumes bandwidth in both directions
  • Each new P2P client added becomes a server for the entire P2P structure
  • Kazaa objects are huge, so a small number of peers can consume an enormous amount of total network bandwidth
  • The bandwidth cost of each P2P peer is 90 times that of a web client
  • It seems questionable whether any organization can support a service with these characteristics

21 Detailed Content Delivery Characteristics
• Summary
  • Peer-to-peer traffic now accounts for over three quarters of HTTP traffic
  • A small number of P2P users consume a disproportionately high fraction of bandwidth
  • While the P2P request rate is quite low, the transfers last a long time
  • While the design of P2P overlay structures focuses on spreading the workload for scalability, our measurements show that a small number of servers take on the majority of the burden

22 The Potential Role of Caching in CDNs
• Akamai requests achieve an 88% ideal hit rate and a 50% practical hit rate, noticeably higher than WWW requests (77% and 36%)
• Our analysis shows that Akamai requests are more skewed towards the most popular documents than WWW requests are
• We know that most bytes fetched from Akamai are from images and videos
• This implies that much of Akamai's content is in fact static and could be cached
• We would expect that widely deployed proxy caches would significantly reduce the need for a separate content delivery network
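The slide does not define the two hit rates. The sketch below assumes a common reading: the ideal hit rate treats every object as cacheable in an infinite cache, while the practical hit rate only stores objects whose responses are marked cacheable. The trace format, the per-object `cacheable` flag, and the function name are assumptions made for illustration.

```python
def simulate_hit_rate(requests, honor_cacheability):
    """Request hit rate of an infinite cache over a trace.

    requests: list of (object_id, cacheable) pairs in arrival order, where
    cacheable is treated as a fixed property of the object (an assumption).
    honor_cacheability: False for the "ideal" rate (everything is stored),
    True for the "practical" rate (only cacheable objects are stored).
    """
    if not requests:
        return 0.0
    cached = set()
    hits = 0
    for object_id, cacheable in requests:
        if object_id in cached:
            hits += 1
        elif cacheable or not honor_cacheability:
            cached.add(object_id)  # first request is a miss that fills the cache
    return hits / len(requests)

# Hypothetical six-request trace; object "b" is marked uncacheable.
trace = [("a", True), ("b", False), ("a", True),
         ("b", False), ("c", True), ("a", True)]
ideal = simulate_hit_rate(trace, honor_cacheability=False)
practical = simulate_hit_rate(trace, honor_cacheability=True)
```

On this toy trace the practical rate is lower than the ideal rate because repeated requests for the uncacheable object never hit, which mirrors the direction of the gap reported on the slide.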

23 The Potential Role of Caching in P2P
• The potential impact of caching in P2P systems may exceed the benefits seen in the web
• Inbound cache byte hit rate = 35%; outbound cache byte hit rate = 85%
• Hit rate increases with client population size for outbound traffic (1,000 clients: 40%; 500,000 clients: 85%)
• A reverse P2P cache saves the most bandwidth
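A byte hit rate weights each cache hit by the size of the object served, so large P2P objects dominate the result. A minimal sketch under the assumption of an infinite cache is given below; the trace, object names, and sizes are invented for illustration.

```python
def byte_hit_rate(transfers):
    """Fraction of transferred bytes that an infinite cache could have served.

    transfers: list of (object_id, size_bytes) pairs in arrival order.
    The first transfer of each object is a miss; later ones count as hits.
    """
    total_bytes = sum(size for _, size in transfers)
    if total_bytes == 0:
        return 0.0
    cached = set()
    hit_bytes = 0
    for object_id, size in transfers:
        if object_id in cached:
            hit_bytes += size
        else:
            cached.add(object_id)
    return hit_bytes / total_bytes

# Hypothetical trace: one small object and one large object, each repeated.
example_transfers = [("song", 4), ("movie", 100), ("song", 4),
                     ("movie", 100), ("movie", 100)]
rate = byte_hit_rate(example_transfers)
```

In this toy trace 204 of the 308 transferred bytes are repeat transfers, so the byte hit rate is driven almost entirely by the large object.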

24 Conclusion
• P2P traffic now accounts for the majority of HTTP bytes transferred
• P2P documents are three orders of magnitude larger than web objects
• A small number of extremely large objects account for an enormous fraction of observed P2P traffic
• A small number of clients and servers are responsible for the majority of the traffic we saw in the P2P systems
• Each P2P client creates a significant bandwidth load in both directions

