An Analysis of Internet Content Delivery Systems
Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy
Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), December 2002
Outline
- Goals of Paper
- Overview of Content Delivery Systems
- Experimental Methodology
- Results
- Caching
- Conclusions
Goals
- Quantify the growing importance of new content delivery systems
- Characterize the behavior of these systems from the perspectives of clients, objects, and servers
- Derive implications for caching in these systems
Content Delivery Systems
- HTTP Web traffic
- Content delivery networks (CDNs): Akamai
- Peer-to-peer (P2P) file-sharing networks: Gnutella, Kazaa
HTTP Traffic
- Clients request objects from Web servers using HTTP
- Most Web objects are small, about 5-10 KB
- Web object requests follow a Zipf-like popularity distribution
- Caching:
  - Cache hit rate increases logarithmically with client population
  - Caching is impossible for dynamic content
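To make the Zipf and caching bullets concrete, here is a small simulation sketch. It assumes a pure Zipf popularity with exponent alpha = 1, a fixed catalog of 50,000 objects, 10 requests per client, and an infinite, fully cacheable shared cache; all of these are illustrative parameters, not values from the study, and the function names are hypothetical.

```python
import random


def zipf_weights(n_objects: int, alpha: float = 1.0) -> list[float]:
    # Unnormalized Zipf-like weights: the object at popularity rank r gets 1 / r**alpha.
    return [1.0 / rank ** alpha for rank in range(1, n_objects + 1)]


def shared_cache_hit_rate(n_clients: int, requests_per_client: int = 10,
                          n_objects: int = 50_000, alpha: float = 1.0,
                          seed: int = 0) -> float:
    """Object hit rate of an infinite shared cache fed by n_clients clients.

    Every object misses on its first request and hits afterwards, so the hit
    rate is 1 - (distinct objects requested / total requests).
    """
    rng = random.Random(seed)
    total = n_clients * requests_per_client
    requests = rng.choices(range(n_objects),
                           weights=zipf_weights(n_objects, alpha), k=total)
    return 1.0 - len(set(requests)) / total


if __name__ == "__main__":
    for clients in (100, 1_000, 10_000):
        print(f"{clients:>6} clients: hit rate {shared_cache_hit_rate(clients):.2f}")
```

Running it should show the hit rate rising as the client population grows, because popular objects requested by one client are already cached for the others; the exact values depend entirely on the assumed parameters.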
Zipf Distribution Compared with Sun Log Data
Content Delivery Networks (CDNs)
- Dedicated collections of geographically distributed servers
- Serve static content, e.g., images and streaming video
- Let users access a replica of the content that is "close" to them
- Replicas are located via DNS interposition or URL rewriting at origin servers
- Redirection adds some overhead, but CDNs reduce average download response time
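The DNS-interposition idea can be sketched as a resolver that, instead of returning a fixed address, picks a replica by proximity. This is a toy model, not Akamai's actual mapping system: the `Replica` type, the `resolve` function, the region names, and the use of measured round-trip time as the closeness metric are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str  # hypothetical replica label, e.g. "seattle"
    ip: str    # address returned in the DNS answer


def resolve(hostname: str, client_region: str,
            replicas_by_host: dict[str, list[Replica]],
            rtt_ms: dict[tuple[str, str], float]) -> str:
    """Toy DNS interposition: answer a lookup with the IP of the closest replica.

    "Closest" means the lowest measured round-trip time from the client's region;
    replicas with no measurement are treated as infinitely far away.
    """
    candidates = replicas_by_host[hostname]
    best = min(candidates,
               key=lambda r: rtt_ms.get((client_region, r.name), float("inf")))
    return best.ip


if __name__ == "__main__":
    replicas = {"images.example.com": [Replica("seattle", "203.0.113.10"),
                                       Replica("new-york", "203.0.113.20")]}
    rtts = {("us-west", "seattle"): 12.0, ("us-west", "new-york"): 70.0}
    print(resolve("images.example.com", "us-west", replicas, rtts))  # 203.0.113.10
```

Because the choice happens at name-resolution time, the client then talks to the replica directly; the extra lookup is the redirection overhead mentioned on the slide.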
Peer-to-Peer Systems
- Peers form a distributed system to exchange content
- Downloads are batch-style
- Most peers have low availability and limited network capacity
- Files are transferred over direct connections between peers
Experimental Methodology
- Used passive network monitoring to collect a trace of TCP traffic between the University of Washington (UW) and the rest of the Internet
- Collected 9 days of data, over 20 TB
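The slide does not say how the trace was split by system. As an illustration only, a simplified classifier could label each TCP connection by the server's port, using the commonly cited defaults (Kazaa 1214, Gnutella 6346/6347, Web 80), and recognize CDN traffic by a known set of server addresses. The port table, the `Flow` record, and the address-set rule are assumptions, not the study's documented method.

```python
from dataclasses import dataclass

# Assumed default server ports; real traffic may use other ports, so a robust
# classifier would also need to inspect payloads (e.g. HTTP headers).
PORT_CLASSES = {80: "WWW", 1214: "Kazaa", 6346: "Gnutella", 6347: "Gnutella"}


@dataclass(frozen=True)
class Flow:
    client_ip: str
    server_ip: str
    server_port: int
    n_bytes: int  # total bytes carried by the TCP connection, both directions


def classify(flow: Flow, akamai_ips: frozenset[str] = frozenset()) -> str:
    # Hypothetical rule: CDN servers also use port 80, so check addresses first.
    if flow.server_ip in akamai_ips:
        return "Akamai"
    return PORT_CLASSES.get(flow.server_port, "other")


def bytes_by_class(flows, akamai_ips: frozenset[str] = frozenset()) -> dict[str, int]:
    # Sum connection bytes per traffic class.
    totals: dict[str, int] = {}
    for flow in flows:
        label = classify(flow, akamai_ips)
        totals[label] = totals.get(label, 0) + flow.n_bytes
    return totals
```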
Some Interesting Observations
- UW is an HTTP content provider: exported … TB, imported 3.44 TB
- Share of bandwidth consumed (inbound + outbound):
  - Akamai: 0.2%
  - Gnutella: 6.04%
  - WWW: 14.3%
  - Kazaa: 36.9%
  - The rest is other TCP protocols: mail, streaming video/audio, etc.
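The share left for "other TCP protocols" is not stated as a number, but it follows from the listed percentages. A few lines of arithmetic (variable names are illustrative) make the remainder and the P2P-to-Web ratio explicit:

```python
# Shares of total TCP bandwidth (inbound + outbound), in percent, as listed above.
shares = {"Akamai": 0.2, "Gnutella": 6.04, "WWW": 14.3, "Kazaa": 36.9}

other = 100.0 - sum(shares.values())                    # "rest is other TCP protocols"
p2p = shares["Kazaa"] + shares["Gnutella"]              # P2P file sharing combined
p2p_vs_web = p2p / (shares["WWW"] + shares["Akamai"])   # P2P relative to Web + CDN

print(f"other TCP: {other:.2f}%")                       # other TCP: 42.56%
print(f"P2P: {p2p:.2f}%, about {p2p_vs_web:.1f}x Web+CDN")
```

By these figures, P2P file sharing consumes roughly three times the bandwidth of Web and CDN traffic combined.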
Some More Interesting Observations
- Compared to a 1999 study:
  - HTML traffic decreased 43%
  - GIF/JPG traffic decreased 59%
  - AVI/MPG traffic increased nearly 400%
  - MP3 traffic increased nearly 300%
Objects
- Median P2P object size is 4 MB; median Web object size is 2 KB
- 5% of Kazaa objects are over 100 MB
- The top 1% of Kazaa objects account for 50% of bytes transferred
- For the Web, the top 1% of objects account for 16% of bytes transferred
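A statistic like "the top 1% of objects account for 50% of bytes" can be computed from per-object byte counts. The sketch below assumes objects are ranked by bytes transferred for each object (size times number of transfers), which is one reading of the slide; `top_share` is a hypothetical helper.

```python
import math


def top_share(bytes_per_object: list[int], top_fraction: float) -> float:
    """Fraction of all transferred bytes due to the top `top_fraction` of objects.

    Objects are ranked by the bytes transferred for them (size x number of transfers).
    """
    ranked = sorted(bytes_per_object, reverse=True)
    k = max(1, math.ceil(len(ranked) * top_fraction))  # always count at least one object
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    # Made-up data: one heavy object among 100 accounts for about half the bytes.
    sample = [100] + [1] * 99
    print(f"{top_share(sample, 0.01):.2f}")
```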
Clients
- For both the Web and Kazaa, a small number of clients account for a large portion of the traffic
- Web: the top 200 clients (0.5% of the population) account for 13% of the traffic
- Kazaa: the top 200 clients (4% of the population) account for 50% of the traffic
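The client figures imply both the size of each population and how heavy the top clients are. The arithmetic below (helper names are illustrative) derives the population from "200 clients = x% of the population" and compares the top group's average traffic with the overall average:

```python
def implied_population(top_clients: int, top_pct_of_population: float) -> float:
    # If `top_clients` make up `top_pct_of_population` percent of all clients,
    # the whole population is top_clients / (pct / 100).
    return top_clients / (top_pct_of_population / 100)


def usage_multiple(pct_of_traffic: float, pct_of_population: float) -> float:
    # Average traffic of a client in the top group, relative to the average client overall.
    return pct_of_traffic / pct_of_population


web_clients = implied_population(200, 0.5)     # about 40,000
kazaa_clients = implied_population(200, 4.0)   # about 5,000
print(round(web_clients), round(kazaa_clients))          # 40000 5000
print(usage_multiple(13, 0.5), usage_multiple(50, 4))    # 26.0 12.5
```

The implied Web population of about 40,000 clients agrees with the figure used on the Scalability slide.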
Servers
- One would expect Kazaa server load to be much more distributed than Web server load
- Kazaa load is less distributed than one might expect:
  - The top 500 external Web servers provide 22% of the bytes
  - The top 500 external Kazaa servers provide 10% of the bytes
Scalability
- In terms of bandwidth cost, adding another 450 Kazaa clients would be equivalent to doubling the Web client population (from 40,000 to 80,000)
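The scalability claim implies a per-client ratio. Assuming load grows linearly with the number of average clients of each kind (an assumption, not a measured model), 450 Kazaa clients carrying the load of 40,000 Web clients means one Kazaa client costs about as much as 89 Web clients:

```python
WEB_CLIENTS = 40_000     # current Web client population (from the slide)
KAZAA_EQUIVALENT = 450   # Kazaa clients whose load matches WEB_CLIENTS additional Web clients

# Assumption: load grows linearly with the number of average clients of each kind.
kazaa_per_web = WEB_CLIENTS / KAZAA_EQUIVALENT


def load_in_web_units(web_clients: int, kazaa_clients: int) -> float:
    """Total bandwidth load expressed in 'average Web client' units."""
    return web_clients + kazaa_clients * kazaa_per_web


print(f"one Kazaa client ~ {kazaa_per_web:.0f} Web clients")   # one Kazaa client ~ 89 Web clients
print(load_in_web_units(40_000, 450))
```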
CDN Caching
- Do CDNs provide any performance benefit over a local proxy cache?
- If Akamai traffic were directed to a proxy cache instead:
  - 88% ideal object hit rate (all objects cacheable)
  - 50% practical hit rate
- Conclusion: widely deployed proxy caches reduce the need for separate CDNs
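Object hit rate and byte hit rate for a proxy cache can be estimated by replaying a request log through a simulated cache. The sketch below uses a byte-capacity LRU cache as a stand-in; the replacement policy, the `(object_id, size)` request format, and the assumption that an object's size never changes are illustrative choices, and it does not reproduce the slide's 88% and 50% figures.

```python
from collections import OrderedDict


def simulate_lru(requests, capacity_bytes):
    """Replay (object_id, size) requests through an LRU cache.

    Returns (object_hit_rate, byte_hit_rate). Objects larger than the cache are
    never stored; each object id is assumed to always have the same size.
    """
    cache = OrderedDict()  # object_id -> size, least recently used first
    used = 0
    hits = hit_bytes = total_bytes = 0
    for obj, size in requests:
        total_bytes += size
        if obj in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(obj)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # uncacheable by size
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict least recently used
            used -= evicted_size
        cache[obj] = size
        used += size
    n = len(requests)
    return (hits / n if n else 0.0,
            hit_bytes / total_bytes if total_bytes else 0.0)
```

An "ideal" hit rate in the slide's sense corresponds to a capacity large enough that nothing is ever evicted.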
P2P Caching
- Inbound cache byte hit rate: 35%
- Outbound cache byte hit rate: 85%
- Hit rate increases with client population:
  - 1,000 clients: 40% hit rate
  - 500,000 clients: 85% hit rate
- Conclusion: a reverse P2P cache (caching outbound transfers) saves the most bandwidth
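Comparing inbound and outbound byte hit rates amounts to measuring each direction of the trace separately. A minimal sketch, assuming an infinite-capacity cache in which every object is cacheable (only the first transfer of an object misses) and hypothetical `"inbound"`/`"outbound"` direction labels:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    object_id: str
    size: int
    direction: str  # hypothetical labels: "inbound" (to UW clients) or "outbound" (to external peers)


def ideal_byte_hit_rate(transfers, direction: str) -> float:
    """Byte hit rate of an infinite cache placed on one direction of the link."""
    seen = set()
    hit_bytes = total_bytes = 0
    for t in transfers:
        if t.direction != direction:
            continue
        total_bytes += t.size
        if t.object_id in seen:
            hit_bytes += t.size   # repeat transfer: served from the cache
        else:
            seen.add(t.object_id)  # first transfer: compulsory miss
    return hit_bytes / total_bytes if total_bytes else 0.0
```

The direction with more repeated transfers of the same objects gains the most, which is the reasoning behind placing the cache on outbound traffic.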
Conclusions
- P2P traffic accounts for the majority of HTTP bytes transferred
- P2P objects are significantly larger than Web objects
- A small number of large objects account for a large percentage of P2P traffic
- A small number of clients and servers are responsible for the majority of P2P traffic
- P2P traffic creates a significant bandwidth load