Scale and Performance in the CoBlitz Large-File Distribution Service

Presentation transcript:

Scale and Performance in the CoBlitz Large-File Distribution Service
KyoungSoo Park, Vivek S. Pai
Princeton University

Large-file Distribution
- Increasing demand for large files
  - Movie or software releases
  - Online movie downloads
  - Linux distributions
- Files range from 100 MB to a couple of GB
- One-to-many downloads
- Nice to use a CDN, but…
KyoungSoo Park, NSDI 2006

Why Not Web CDNs?
- Whole-file caching
  - Optimized for 10 KB objects
  - 2 GB = 200,000 x 10 KB
- Memory pressure
  - Working sets do not fit in memory
  - Disk access is 1000 times slower
- Waste of resources
  - More servers needed
  - Provisioning is a must
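
The object count on this slide can be checked directly. A minimal sketch, assuming decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), which is what makes the slide's 200,000 figure come out exactly; the function name is illustrative only.

```python
# Assumption: decimal units, so that 2 GB / 10 KB matches the slide's figure.
KB = 1_000
GB = 1_000_000_000


def objects_needed(file_bytes: int, object_bytes: int) -> int:
    """Number of objects of object_bytes needed to hold file_bytes (rounded up)."""
    if file_bytes < 0 or object_bytes <= 0:
        raise ValueError("file_bytes must be >= 0 and object_bytes must be > 0")
    return -(-file_bytes // object_bytes)  # ceiling division on integers


print(objects_needed(2 * GB, 10 * KB))  # 200000
```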

Peer-to-Peer?
- BitTorrent takes up ~30% of Internet bandwidth
- Custom software
  - Deployment is a must
  - Configuration needed
- Companies may want a managed service
  - Handles flash crowds
  - Handles long-lived objects

What We’d Like Is a Large-file Service With
- No custom client
- No custom server
- No prepositioning
- No rehosting
- No manual provisioning

CoBlitz: Scalable Large-file CDN
- Reducing the problem to a small-file CDN
  - Split large files into chunks
  - Distribute chunks at proxies
  - Aggregate memory/cache
  - HTTP needs no deployment
- Benefits
  - Faster than BitTorrent by 55-86% (~500%)
  - One copy from origin serves 43-55 nodes
  - Incremental build on existing CDNs
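
Splitting a large file into chunks amounts to computing byte ranges that a small-file CDN can cache as separate objects. A minimal sketch, assuming fixed-size chunks; the sizes below are hypothetical and the helper names are not from the slides. Only the `bytes=first-last` Range header syntax is standard HTTP.

```python
from typing import Iterator, Tuple


def chunk_ranges(file_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first_byte, last_byte) pairs covering file_size bytes."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    for start in range(0, file_size, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, file_size) - 1


def range_header(first_byte: int, last_byte: int) -> str:
    """Format an HTTP Range header value; byte positions are inclusive."""
    return f"bytes={first_byte}-{last_byte}"


# Hypothetical sizes, chosen only to keep the example short.
for first, last in chunk_ranges(file_size=250, chunk_size=100):
    print(range_header(first, last))
```

Because each range is requested as an ordinary HTTP object, the proxies need no knowledge of the original file's size beyond what the agent computes.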

How It Works
- CDN = redirector + reverse proxy
- Only the reverse proxy (CDN) caches the chunks!
[Diagram: clients, agents, DNS, CDN nodes, and the origin server. Chunks 1-5 of a file requested through coblitz.codeen.org are spread across CDN nodes, which fetch them from the origin server with HTTP range queries.]

Smart Agent
- Preserves HTTP semantics
- Parallel chunk requests
[Diagram: the agent sits between the HTTP client and the CDN nodes and keeps a sliding window of “chunks”, each marked done or waiting.]
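
The sliding window of parallel chunk requests can be sketched as follows. This is an illustration under stated assumptions, not the agent's actual code: `fetch_chunk` is a hypothetical stand-in for a ranged request to a CDN node, and a thread pool stands in for whatever concurrency mechanism the agent uses. Chunks are requested in parallel but handed to the client strictly in file order, which is one way to keep ordinary HTTP download behavior.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator


def fetch_in_order(num_chunks: int,
                   fetch_chunk: Callable[[int], bytes],
                   window: int) -> Iterator[bytes]:
    """Fetch chunks in parallel, at most `window` outstanding, yielding in order."""
    if window < 1:
        raise ValueError("window must be at least 1")
    with ThreadPoolExecutor(max_workers=window) as pool:
        pending = {}  # chunk index -> Future, for chunks inside the window
        next_to_request = 0
        for next_to_deliver in range(num_chunks):
            # Keep the window full: request up to `window` chunks ahead of delivery.
            while (next_to_request < num_chunks
                   and next_to_request < next_to_deliver + window):
                pending[next_to_request] = pool.submit(fetch_chunk, next_to_request)
                next_to_request += 1
            # Wait for the oldest chunk so the client sees bytes in file order.
            yield pending.pop(next_to_deliver).result()


# Stand-in for a remote file and a ranged request (both hypothetical).
DATA = bytes(range(256)) * 4
CHUNK = 100


def fake_fetch(index: int) -> bytes:
    return DATA[index * CHUNK:(index + 1) * CHUNK]


total_chunks = -(-len(DATA) // CHUNK)
assert b"".join(fetch_in_order(total_chunks, fake_fetch, window=4)) == DATA
```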

Operation & Challenges
- Has provided a public service for over 2 years
  - http://coblitz.codeen.org:3125/URL
- Challenges
  - Scalability & robustness
  - Peering set difference
  - Load on the origin server

Unilateral Peering
- Independent peering decision
  - No synchronized maintenance problem
- Motivation
  - Partial network connectivity
    - Internet2, CANARIE nodes
  - Routing disruption
    - Isolated nodes
- Improve both scalability & robustness

Peering Set Difference
- No perfect clustering by design
- Assumption
  - Close nodes share common peers
[Diagram legend: peers that both nodes can reach; peers that only one node can reach.]

Peering Set Difference
- Highly variable app-level RTTs
  - 10x the variance of ICMP
- High rate of change in the peer set
- Close nodes share less than 50% of their peers
  - Low cache hit rate
  - Low memory utility
  - Excessive load on the origin

Peering Set Difference
- How to fix?
  - Avg RTT → min RTT
  - Increase # of samples
  - Increase # of peers
  - Hysteresis
- Close nodes share more than 90%
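
Two of the fixes listed above, ranking by minimum RTT and adding hysteresis, can be sketched together. This is a minimal illustration, not the deployed peer-selection logic: the `margin` threshold, the swap rule, and all names are assumptions introduced here.

```python
from typing import Dict, List, Set


def min_rtt(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Summarize each node's RTT samples by the minimum rather than the average."""
    return {node: min(rtts) for node, rtts in samples.items() if rtts}


def update_peers(current: Set[str], rtt: Dict[str, float],
                 size: int, margin: float) -> Set[str]:
    """Choose up to `size` peers, swapping out a current peer only when a
    candidate is faster by more than `margin` (a simple form of hysteresis)."""
    # Keep the closest current peers that still have measurements.
    peers = set(sorted((p for p in current if p in rtt), key=rtt.get)[:size])
    candidates = sorted((n for n in rtt if n not in peers), key=rtt.get)
    # Fill empty slots with the closest candidates first.
    while len(peers) < size and candidates:
        peers.add(candidates.pop(0))
    # Replace the slowest peer only for a clearly better candidate.
    while candidates and peers:
        worst = max(peers, key=rtt.get)
        if rtt[candidates[0]] + margin < rtt[worst]:
            peers.remove(worst)
            peers.add(candidates.pop(0))
        else:
            break
    return peers


# Hypothetical measurements in milliseconds; node "d" has one large outlier.
samples = {"a": [10, 50], "b": [30, 31], "c": [28, 90], "d": [5, 200]}
print(sorted(update_peers({"a", "b"}, min_rtt(samples), size=2, margin=5)))
```

In this example node "c" is only 2 ms faster than "b", so it would not displace "b" on its own, while "d" is far enough ahead to do so. Damping small differences in this way is what keeps nearby nodes from churning their peer sets.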

Reducing Origin Load
- Still have peering set difference
  - Critical in traffic to origin
- Proximity-based routing
  - cf. P2P: key-based routing
  - Converges exponentially fast
  - 3-15% do one more hop
  - Implicit overlay tree
- Result
  - Origin load reduction by 5x
[Diagram: request path toward the origin server, with a hop labeled "Rerun hashing".]
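
One possible reading of "rerun hashing" is that a node receiving a chunk request repeats the node selection over its own peer set, and contacts the origin only if it picks itself. The sketch below illustrates that reading. It assumes rendezvous (highest-random-weight) hashing, which the slide does not name, and every identifier is hypothetical.

```python
import hashlib
from typing import Dict, List, Set


def pick_node(chunk_url: str, nodes: Set[str]) -> str:
    """Rendezvous hashing: the node with the highest hash(node, chunk) wins."""
    def weight(node: str) -> int:
        digest = hashlib.sha256(f"{node}|{chunk_url}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(sorted(nodes), key=weight)


def route(chunk_url: str, start: str, peers_of: Dict[str, Set[str]]) -> List[str]:
    """Forward the request until some node picks itself; return the path taken.

    The last node on the path is the one that would contact the origin.
    """
    path = [start]
    while True:
        here = path[-1]
        best = pick_node(chunk_url, peers_of[here] | {here})
        if best == here:
            return path
        # Each hop moves to a node with a higher weight, so the walk terminates.
        path.append(best)


# Hypothetical peer sets that differ between nodes.
peers_of = {
    "n1": {"n2", "n3"},
    "n2": {"n1", "n3", "n4"},
    "n3": {"n1", "n2", "n4"},
    "n4": {"n2", "n3"},
}
for start in sorted(peers_of):
    print(start, "->", route("http://example.org/file?chunk=7", start, peers_of))
```

When peer sets overlap heavily, most requests for a given chunk end at the same node, so few copies are pulled from the origin. The extra hops happen only where the views differ.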

Scale Experiments
- Use all live PlanetLab nodes as clients
  - 380~400 live nodes at any time
  - Simultaneous fetch of a 50 MB file
- Test scenarios
  - Direct
  - BitTorrent total/core
  - CoBlitz uncached/cached/staggered
  - Out-of-order numbers in paper

Throughput Distribution
[Figure: CDF of per-node throughput. x-axis: Throughput (Kbps), ticks from 2000 to 10000; y-axis: Fraction of Nodes <= X (CDF). Curves: Direct, BT-total, BT-core, in-order uncached, in-order staggered, in-order cached, out-of-order staggered. Annotation: 55-86%.]

Downloading Times
- 95th percentile: 1000+ secs faster

Synchronized Workload Congestion
[Diagram label: Origin Server]

Addressing Congestion
- Proximity-based multi-hop routing
  - Overlay tree for each chunk
- Dynamic chunk-window resizing
  - Increase by 1/log(x), where x is the window size, if a chunk finishes faster than average
  - Decrease by 1 if a retry kills the first chunk
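
The window-resizing rule can be written as a small update function. A minimal sketch with several assumptions that the slide leaves open: the logarithm is taken as natural, the window is a fractional value floored at a hypothetical `min_window` of 2 (which also keeps log(x) positive), and the two conditions are passed in by the caller.

```python
import math


def resize_window(window: float, chunk_time: float, avg_time: float,
                  retry_killed_first_chunk: bool,
                  min_window: float = 2.0) -> float:
    """Return the new chunk-window size after one chunk completes or is retried."""
    if min_window <= 1.0:
        raise ValueError("min_window must exceed 1 so that log(window) > 0")
    window = max(window, min_window)
    if retry_killed_first_chunk:
        # Back off by one chunk, but never below the floor.
        return max(min_window, window - 1.0)
    if chunk_time < avg_time:
        # Grow more slowly as the window gets larger (base of log is an assumption).
        return window + 1.0 / math.log(window)
    return window


w = 4.0
for chunk_time in (0.8, 0.7, 1.5, 0.6):  # hypothetical completion times, avg = 1.0
    w = resize_window(w, chunk_time, avg_time=1.0, retry_killed_first_chunk=False)
print(round(w, 3))
```

The increase shrinks as the window grows, while the decrease is a fixed step of one chunk, so the window backs off faster than it expands once it is large.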

Number of Failures
Median number -> %

Performance after Flash Crowds
- CoBlitz: 70+% > 5 Mbps
- BitTorrent: 20% > 5 Mbps

Data Reuse
- 7 fetches for 400 nodes, 98% cache hit
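
The two numbers on this slide are consistent under one reading, which is an interpretation rather than something the slide states: about 400 nodes each request the content once, and "7 fetches" counts the requests that reach the origin.

```python
def cache_hit_rate(requests: int, origin_fetches: int) -> float:
    """Fraction of requests served without contacting the origin."""
    if requests <= 0 or not 0 <= origin_fetches <= requests:
        raise ValueError("need requests > 0 and 0 <= origin_fetches <= requests")
    return (requests - origin_fetches) / requests


print(f"{cache_hit_rate(400, 7):.2%}")  # 98.25%
```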

Comparison with Other Systems
- Shark [NSDI05]
  - Median throughput 0.96 Mbps with 185 clients
  - CoBlitz: 3.15 Mbps with 380~400 clients
- Bullet, Bullet’ [SOSP03, USENIX05]
  - Using UDP, avg 7 Mbps with 41 nodes
  - CoBlitz: slightly better (7.4 Mbps) with only TCP connections

Real-world Usage
- Fedora Core official mirror
  - http://coblitz.planet-lab.org/
  - US-East/West, England, Germany, Korea, Japan
- CiteSeer repository (50,000+ links)
- PlanetLab researchers
- Stork (U of Arizona) + ~10 others

Usage in Feb 2006
[Figure: Number of Requests, log-scale axis from 10 to 10^7.]

Number of Bytes Served
[Figure: annotated with "CD ISO" and "DVD ISO".]

Fedora Core 5 Release
- March 20th, 2006
- Peaks over 700 Mbps
[Figure: annotated with "Release point 10am".]

Conclusion
- Scalable large-file transfer service
- Evolution under real traffic
  - Up and running 24/7 for over 2 years
  - Unilateral peering, multi-hop routing, window size adjustment
- Better performance than P2P
  - Better throughput, download time
  - Far less origin traffic

Thank you!
- More information: http://codeen.cs.princeton.edu/coblitz/
- How to use: http://coblitz.codeen.org:3125/URL*
  - *Some content restrictions apply. See Web site for details.
- Contact me if you want full access!