1
Scale and Performance in the CoBlitz Large-File Distribution Service
KyoungSoo Park and Vivek S. Pai, Princeton University
2
Large-file Distribution
- Increasing demand for large files (movies or software releases): on-line movie downloads, Linux distributions
- Files are 100 MB to a couple of GB
- One-to-many downloads
- Nice to use a CDN, but…
3
Why Not Web CDNs?
- Whole-file caching, optimized for 10 KB objects: 2 GB = 200,000 x 10 KB
- Memory pressure: working sets do not fit in memory, and disk access is 1000 times slower
- Waste of resources: more servers needed, and provisioning is a must
4
Peer-to-Peer?
- BitTorrent takes up ~30% of Internet bandwidth
- Custom software: deployment is a must, configurations needed
- Companies may want a managed service
- Handles flash crowds
- Handles long-lived objects
5
What We'd Like Is a Large-file Service with
- No custom client
- No custom server
- No prepositioning
- No rehosting
- No manual provisioning
6
CoBlitz: Scalable large-file CDN
- Reducing the problem to a small-file CDN: split large files into chunks, distribute the chunks across proxies, aggregate memory/cache
- HTTP: no deployment needed
- Benefits: faster than BitTorrent by 55-86% (up to ~500%); one copy from the origin serves all nodes; incremental build on existing CDNs
7
How it works
- CDN = redirector + reverse proxy; only the reverse proxy (CDN node) caches the chunks
- The client agent requests coblitz.codeen.org; DNS directs it to a nearby CDN node
- CDN nodes fetch chunks from the origin server with HTTP range queries, and the chunks are spread across different nodes (see the sketch below)
[Diagram: client agents, DNS, CDN nodes, and the origin server exchanging chunks 1-5]
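Below is a minimal Python sketch of the reduction this slide illustrates: compute fixed-size byte ranges for a large file and fetch each range with an HTTP Range request, so an unmodified small-file CDN node can cache each chunk independently. The chunk size, example URL, and helper names are assumptions for illustration, not CoBlitz's actual values or code.

    # Sketch: split a large file into fixed-size chunks and fetch each chunk
    # with an HTTP Range request so a plain small-file CDN can cache the pieces.
    # CHUNK_SIZE and the example URL are illustrative, not CoBlitz's real values.
    import urllib.request

    CHUNK_SIZE = 1 << 20  # 1 MB per chunk (assumed for illustration)

    def chunk_ranges(file_size, chunk_size=CHUNK_SIZE):
        """Yield (start, end) byte ranges covering the whole file."""
        for start in range(0, file_size, chunk_size):
            yield start, min(start + chunk_size, file_size) - 1

    def fetch_chunk(url, start, end):
        """Fetch one chunk via an HTTP Range request (answered by a CDN reverse proxy)."""
        req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    # Hypothetical usage: assemble a file chunk-by-chunk through the CDN name.
    # url = "http://coblitz.codeen.org/example.org/big.iso"
    # data = b"".join(fetch_chunk(url, s, e) for s, e in chunk_ranges(file_size))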
8
Smart Agent
- Preserves HTTP semantics
- Issues parallel chunk requests to CDN nodes
- Maintains a sliding window of "chunks": the chunk at the head of the window is delivered to the HTTP client once it is done, while later chunks may still be waiting (see the sketch below)
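The following sketch shows one way an agent could keep a sliding window of parallel chunk requests while still handing bytes to the HTTP client strictly in order: "done" chunks at the head of the window are released, later ones stay "waiting". The window size is an assumed constant and fetch_chunk is the hypothetical helper from the previous sketch; this is not the real agent code.

    # Sketch: a sliding window of parallel chunk requests that still delivers
    # bytes to the HTTP client in order.
    from concurrent.futures import ThreadPoolExecutor

    WINDOW = 8  # assumed window size

    def download_in_order(url, ranges, deliver):
        """ranges: list of (start, end); deliver(data) is called in file order."""
        with ThreadPoolExecutor(max_workers=WINDOW) as pool:
            futures = {}          # chunk index -> in-flight ("waiting") request
            next_to_issue = 0
            next_to_deliver = 0   # head of the window
            while next_to_deliver < len(ranges):
                # Keep up to WINDOW chunk requests outstanding.
                while (next_to_issue < len(ranges)
                       and next_to_issue - next_to_deliver < WINDOW):
                    start, end = ranges[next_to_issue]
                    futures[next_to_issue] = pool.submit(fetch_chunk, url, start, end)
                    next_to_issue += 1
                # Block on the head chunk; later chunks may already be "done".
                deliver(futures.pop(next_to_deliver).result())
                next_to_deliver += 1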
9
Operation & Challenges
- Public service provided for over 2 years
- Challenges: scalability & robustness, peering set difference, load on the origin server
10
Unilateral Peering
- Independent peering decisions: no synchronized-maintenance problem
- Motivation: partial network connectivity (Internet2, CANARIE nodes), routing disruptions, isolated nodes
- Improves both scalability & robustness
11
Peering Set Difference
- No perfect clustering, by design
- Assumption: close nodes share common peers
[Diagram: peers that both nodes can reach vs. peers that only one node can reach]
12
Peering Set Difference
- Highly variable application-level RTTs: ~10x the variance of ICMP
- High rate of change in the peer set: close nodes share less than 50% of their peers
- Consequences: low cache hit rate, low memory utility, excessive load on the origin
13
Peering Set Difference
How to fix? (see the sketch below)
- Use the average RTT rather than the minimum RTT
- Increase the number of samples
- Increase the number of peers
- Add hysteresis
- Result: close nodes share more than 90% of their peers
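As a rough illustration of these fixes, the sketch below keeps an exponentially weighted average of application-level RTT samples (instead of the minimum) and only swaps a peer when a candidate beats it by a hysteresis margin; the smoothing factor and the 20% margin are assumptions, not values from the paper.

    # Sketch: smooth application-level RTT over many samples and add hysteresis
    # so small RTT jitter does not churn the peer set.
    class PeerStats:
        def __init__(self, alpha=0.1):
            self.alpha = alpha     # assumed smoothing factor
            self.avg_rtt = None    # exponentially weighted average RTT (ms)

        def add_sample(self, rtt_ms):
            """Track the average RTT over many samples instead of the minimum."""
            if self.avg_rtt is None:
                self.avg_rtt = rtt_ms
            else:
                self.avg_rtt = (1 - self.alpha) * self.avg_rtt + self.alpha * rtt_ms

    def should_replace(current, candidate, margin=0.8):
        """Hysteresis: only swap peers if the candidate is clearly (20%) better."""
        return candidate.avg_rtt is not None and candidate.avg_rtt < margin * current.avg_rtt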
14
Reducing Origin Load
- Peering set difference still exists, and it is critical for traffic to the origin
- Proximity-based routing with rerun hashing (cf. P2P key-based routing); see the sketch below
- Requests converge exponentially fast: only 3-15% take one more hop, forming an implicit overlay tree
- Result: origin load reduced by 5x
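One plausible way to realize per-chunk, proximity-biased routing with "rerun hashing" is rendezvous-style hashing: rank peers by a hash of (chunk, peer) weighted toward nearby peers, and if the top-ranked peer is unreachable, re-rank over the remaining peers so the request takes one more hop. The scoring formula below is an assumption for illustration; the slide only states proximity-based routing with rerun hashing.

    # Sketch: rendezvous-style selection of the CDN peer responsible for a chunk,
    # biased toward nearby (low-RTT) peers, with "rerun hashing" when the chosen
    # peer is unreachable.
    import hashlib

    def score(chunk_name, peer, rtt_ms):
        h = int(hashlib.sha1(("%s|%s" % (chunk_name, peer)).encode()).hexdigest(), 16)
        return h / (1.0 + rtt_ms)   # lower RTT (closer peer) raises the score

    def pick_peer(chunk_name, peers, rtt_ms, is_reachable):
        """peers: list of names; rtt_ms: dict peer -> RTT; is_reachable: callable."""
        candidates = list(peers)
        while candidates:
            best = max(candidates, key=lambda p: score(chunk_name, p, rtt_ms[p]))
            if is_reachable(best):
                return best
            candidates.remove(best)   # rerun hashing over the remaining peers (one more hop)
        return None                   # nobody reachable: fall back to the origin server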
15
Scale Experiments
- All live PlanetLab nodes used as clients: 380-400 live nodes at any time
- Simultaneous fetch of a 50 MB file
- Test scenarios: direct; BitTorrent (total/core); CoBlitz (uncached/cached/staggered)
- Out-of-order numbers are in the paper
16
Throughput Distribution
[Figure: CDF of per-node throughput (Kbps, 0-10000) for Direct, BT-total, BT-core, and CoBlitz in-order uncached/staggered/cached and out-of-order staggered; out-of-order staggered is 55-86% faster than BT-core]
17
Downloading Times
- 95th percentile: 1000+ seconds faster
18
Synchronized Workload Congestion
[Diagram: synchronized chunk requests converging on the origin server]
19
Addressing Congestion
- Proximity-based multi-hop routing: an overlay tree for each chunk
- Dynamic chunk-window resizing (see the sketch below): increase by 1/log(x), where x is the window size, if a chunk finishes faster than average; decrease by 1 if a retry kills the first chunk
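The window rule on this slide translates almost directly into code; the sketch below assumes a simple running average of chunk completion times and illustrative window bounds, which are not specified on the slide.

    # Sketch: the chunk-window adjustment rule from this slide. The window
    # bounds and the use of a simple running average are assumptions.
    import math

    def adjust_window(win, chunk_time, avg_chunk_time, head_chunk_retried,
                      min_win=1, max_win=64):
        if head_chunk_retried:
            return max(min_win, win - 1)                    # decrease by 1 if a retry kills the head chunk
        if chunk_time < avg_chunk_time and win > 1:
            return min(max_win, win + 1.0 / math.log(win))  # increase by 1/log(x), x = window size
        return win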
20
Number of Failures
Median number -> %
21
Performance after Flash Crowds
- CoBlitz: 70+% > 5 Mbps
- BitTorrent: 20% > 5 Mbps
22
Data Reuse
- 7 fetches for 400 nodes, 98% cache hit
23
Comparison with Other Systems
- Shark [NSDI '05]: median throughput 0.96 Mbps with 185 clients; CoBlitz: 3.15 Mbps with 380-400 clients
- Bullet, Bullet' [SOSP '03, USENIX '05]: UDP-based, average 7 Mbps with 41 nodes; CoBlitz: slightly better (7.4 Mbps) with only TCP connections
24
Real-world Usage
- Fedora Core official mirror: US East/West, England, Germany, Korea, Japan
- CiteSeer repository (50,000+ links)
- PlanetLab researchers: Stork (U of Arizona) + ~10 others
25
Usage in Feb 2006
[Figure: number of requests, log scale from 10 to 10^7]
26
Number of Bytes Served
[Figure: bytes served; CD ISO and DVD ISO labeled]
27
Fedora Core 5 Release
- March 20th, 2006; release point 10 AM
- Peaks over 700 Mbps
[Figure: bandwidth over time on release day]
28
Conclusion
- Scalable large-file transfer service
- Evolution under real traffic: up and running 24/7 for over 2 years
- Unilateral peering, multi-hop routing, window-size adjustment
- Better performance than P2P: better throughput and download times, far less origin traffic
29
Thank you! More information: http://codeen.cs.princeton.edu/coblitz/
How to use: *some content restrictions apply; see the Web site for details. Contact me if you want full access!