Scale and Performance in the CoBlitz Large-File Distribution Service
KyoungSoo Park, Vivek S. Pai
Princeton University
Large-file Distribution
- Increasing demand for large files: movie and software releases
  - On-line movie downloads
  - Linux distributions
- Files are 100MB to a couple of GB
- One-to-many downloads
- It would be nice to use a CDN, but…
Why Not Web CDNs?
- Whole-file caching
  - Optimized for ~10KB objects; 2GB = 200,000 x 10KB
- Memory pressure
  - Working sets do not fit in memory
  - Disk access is ~1000 times slower
- Waste of resources
  - More servers needed; provisioning is a must
Peer-to-Peer?
- BitTorrent takes up ~30% of Internet bandwidth
- Custom software: deployment is a must, configuration needed
- Companies may want a managed service
- Handles flash crowds
- Handles long-lived objects
What We'd Like: a Large-file Service with
- No custom client
- No custom server
- No prepositioning
- No rehosting
- No manual provisioning
CoBlitz: a Scalable Large-file CDN
- Reduce the problem to a small-file CDN
  - Split large files into chunks
  - Distribute the chunks across proxies
  - Aggregate memory/cache
  - Plain HTTP: no client-side deployment needed
- Benefits
  - Faster than BitTorrent by 55-86% (up to ~500%)
  - One copy from the origin serves 43-55 nodes
  - Incremental build on existing CDNs
How It Works
- A client's request to coblitz.codeen.org is handled by a local agent, which splits it into chunk requests
- Chunks are fetched from the origin server with HTTP range queries and cached by the CDN's reverse proxies (sketched below)
- Each CDN node = redirector + reverse proxy; only the reverse proxy (CDN) caches the chunks
[Figure: DNS redirection and chunk flow among clients, agents, CDN nodes, and the origin server]
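The slide describes chunks being pulled with HTTP range queries through the CDN name. Below is a minimal sketch of that idea, not the actual CoBlitz agent code; the chunk size and the `fetch_chunk` helper are illustrative assumptions (the service prefix is the one given in this talk).

```python
# Sketch only: fetch one chunk of a large file through the CDN using an
# HTTP Range request. CHUNK_SIZE and fetch_chunk() are illustrative assumptions.
import urllib.request

CDN_PREFIX = "http://coblitz.codeen.org:3125"   # public service prefix from this talk
CHUNK_SIZE = 1 << 20                            # assumed 1MB chunks, for illustration

def fetch_chunk(url, index):
    """Fetch chunk `index` of `url` via the CDN with an HTTP Range request."""
    start = index * CHUNK_SIZE
    end = start + CHUNK_SIZE - 1
    req = urllib.request.Request(
        f"{CDN_PREFIX}/{url}",
        headers={"Range": f"bytes={start}-{end}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Because each chunk is just a small HTTP object, the existing small-file CDN machinery (caching, redirection) applies unchanged.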
Smart Agent
- Preserves HTTP semantics
- Issues parallel chunk requests to CDN nodes
- Maintains a sliding window of chunks (done / waiting) and delivers them to the HTTP client in order (sketched below)
[Figure: sliding window of chunk requests spread across CDN nodes]
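A minimal sketch of the sliding-window idea on this slide, assuming the `fetch_chunk` helper from the previous sketch and a fixed window size; the real agent resizes the window dynamically (see the congestion slide later).

```python
# Sketch only: keep up to `window` chunk requests in flight and yield the
# bytes to the HTTP client strictly in order, preserving HTTP semantics.
from concurrent.futures import ThreadPoolExecutor

def download(url, num_chunks, window=8):
    """Yield chunks of `url` in order while `window` requests stay outstanding."""
    with ThreadPoolExecutor(max_workers=window) as pool:
        pending = {}
        next_to_issue = 0
        for next_to_deliver in range(num_chunks):
            # Slide the window forward: keep `window` chunk requests in flight.
            while next_to_issue < num_chunks and len(pending) < window:
                pending[next_to_issue] = pool.submit(fetch_chunk, url, next_to_issue)
                next_to_issue += 1
            # Deliver the earliest outstanding chunk as soon as it completes.
            yield pending.pop(next_to_deliver).result()
```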
Operation & Challenges
- Public service running for over 2 years: http://coblitz.codeen.org:3125/URL
- Challenges
  - Scalability & robustness
  - Peering set difference
  - Load on the origin server
Unilateral Peering
- Each node makes its peering decisions independently; no synchronized-maintenance problem
- Motivation: partial network connectivity
  - Internet2 / CANARIE nodes
  - Routing disruptions, isolated nodes
- Improves both scalability and robustness
Peering Set Difference
- No perfect clustering, by design
- Assumption: close nodes share common peers
[Figure: peers that both nodes can reach vs. peers that only one node can reach]
Peering Set Difference
- Highly variable application-level RTTs: ~10x the variance of ICMP pings
- High rate of change in the peer set: close nodes share less than 50% of peers
- Consequences: low cache hit rate, low memory utility, excessive load on the origin
Peering Set Difference: How to Fix It
- Use average RTT rather than minimum RTT
- Increase the number of samples
- Increase the number of peers
- Add hysteresis (see the sketch below)
- Result: close nodes share more than 90% of peers
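A sketch of the peer-selection smoothing described above: average many RTT samples per peer and only replace a peer when a candidate is clearly closer. The sample-window size and the 20% hysteresis threshold are illustrative assumptions, not values from the talk.

```python
# Sketch only: smoothed RTT estimates plus hysteresis, so nearby nodes
# converge on (and stick with) similar peer sets.
from collections import deque

class PeerEstimate:
    def __init__(self, samples=16):
        self.rtts = deque(maxlen=samples)    # keep many samples, not just the latest

    def add_sample(self, rtt):
        self.rtts.append(rtt)

    def avg_rtt(self):
        return sum(self.rtts) / len(self.rtts) if self.rtts else float("inf")

def maybe_replace(current, candidate, hysteresis=0.2):
    """Replace the current peer only if the candidate is clearly (20%) closer."""
    if candidate.avg_rtt() < (1.0 - hysteresis) * current.avg_rtt():
        return candidate
    return current
```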
Reducing Origin Load
- Some peering set difference remains, and it is critical for traffic to the origin
- Proximity-based multi-hop routing (cf. P2P key-based routing); rerun the hashing when the chosen node fails (a hashing sketch follows below)
  - Converges exponentially fast: only 3-15% of requests take one more hop
  - Forms an implicit overlay tree per chunk
- Result: origin load reduced by 5x
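The slide mentions rerunning the hashing when a chosen node is unavailable. Below is one way that could look, using rendezvous (highest-random-weight) hashing to map a chunk to a responsible node; the choice of HRW hashing and the helper names are illustrative assumptions, not necessarily the exact CoBlitz scheme.

```python
# Sketch only: pick the node responsible for a chunk by hashing, and "rerun
# hashing" by excluding a failed node and choosing again.
import hashlib

def responsible_node(chunk_url, peers, excluded=()):
    """Return the peer with the highest hash weight for this chunk."""
    def weight(peer):
        return hashlib.sha1(f"{chunk_url}|{peer}".encode()).hexdigest()
    candidates = [p for p in peers if p not in excluded]
    return max(candidates, key=weight)

# If the chosen node times out, rerun the hashing without it:
#   node = responsible_node(url, peers, excluded={failed_node})
```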
Scale Experiments
- Use all live PlanetLab nodes as clients: 380-400 live nodes at any time
- Simultaneous fetch of a 50MB file
- Test scenarios
  - Direct
  - BitTorrent: total / core
  - CoBlitz: uncached / cached / staggered
  - Out-of-order numbers are in the paper
Throughput Distribution
[Figure: CDF of per-node throughput (Kbps) for Direct, BT-total, BT-core, and CoBlitz in-order uncached/staggered/cached and out-of-order staggered; CoBlitz is 55-86% faster]
Downloading Times
[Figure: download-time distribution; at the 95th percentile, CoBlitz is 1000+ seconds faster]
Synchronized Workload
[Figure: synchronized chunk requests cause congestion at the origin server]
Addressing Congestion
- Proximity-based multi-hop routing: an overlay tree for each chunk
- Dynamic chunk-window resizing (sketched below)
  - Increase by 1/log(x), where x is the window size, if a chunk finishes faster than average
  - Decrease by 1 if a retry kills the first chunk
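A minimal sketch of the resizing rule stated on this slide; the guard against log(1) = 0 and the floor of one chunk are my assumptions for a runnable example.

```python
# Sketch only: grow the chunk window slowly (by 1/log(window)) when a chunk
# finishes faster than average, and back off by one when a retry kills the
# head-of-window chunk.
import math

def resize_window(window, chunk_time, avg_time, head_chunk_retried):
    if head_chunk_retried:
        return max(1.0, window - 1.0)                 # decrease by 1 on a head-of-window retry
    if chunk_time < avg_time:
        return window + 1.0 / math.log(max(window, 2.0))  # increase by 1/log(x) on a fast chunk
    return window
```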
Number of Failures
[Figure: number of failures, with the median shown as a percentage]
Performance after Flash Crowds
- CoBlitz: 70+% of nodes exceed 5Mbps
- BitTorrent: 20% of nodes exceed 5Mbps
Data Reuse
- 7 origin fetches serve 400 nodes: a 98% cache hit rate
Comparison with Other Systems
- Shark [NSDI '05]: median throughput 0.96 Mbps with 185 clients
  - CoBlitz: 3.15 Mbps with 380-400 clients
- Bullet, Bullet' [SOSP '03, USENIX '05]: average 7 Mbps with 41 nodes, using UDP
  - CoBlitz: slightly better (7.4 Mbps) using only TCP connections
Real-world Usage
- Fedora Core official mirror: http://coblitz.planet-lab.org/
  - US East/West, England, Germany, Korea, Japan
- CiteSeer repository (50,000+ links)
- PlanetLab researchers: Stork (U. of Arizona) + ~10 others
Usage in Feb 2006
[Figure: number of requests, log scale from 10 to 10^7]
Number of Bytes Served
[Figure: bytes served; CD ISO and DVD ISO downloads labeled]
Fedora Core 5 Release
- March 20th, 2006; release point at 10am
- Traffic peaks over 700Mbps
[Figure: aggregate bandwidth over time around the release]
Conclusion
- A scalable large-file transfer service that evolved under real traffic
  - Up and running 24/7 for over 2 years
  - Unilateral peering, multi-hop routing, window-size adjustment
- Better performance than P2P
  - Better throughput and download times
  - Far less origin traffic
Thank You!
- More information: http://codeen.cs.princeton.edu/coblitz/
- How to use: http://coblitz.codeen.org:3125/URL*
  - *Some content restrictions apply; see the Web site for details
  - Contact me if you want full access