Efficient and Adaptive Replication using Content Clustering Yan Chen EECS Department UC Berkeley
Motivation Internet has evolved to become a commercial infrastructure for service delivery –Web delivery, VoIP, streaming media … Challenges for Internet-scale services –Scalability: 600M users, 35M Web sites, 28Tb/s –Efficiency: bandwidth, storage, management –Agility: dynamic clients/network/servers –Security, etc. Focus on content delivery - Content Distribution Network (CDN) –In total, 4 billion Web pages, daily growth of 7M pages –Annual growth of 200% expected for the next 4 years
CDN and its Challenges
–Inefficient replication –No coherence for dynamic content –Unscalable network monitoring - O(M*N)
CDN Applications (e.g. streaming media) SCAN: Scalable Content Access Network –Provision: Cooperative Clustering-based Replication; User Behavior/Workload Monitoring –Coherence: Update Multicast Tree Construction –Network Performance Monitoring: Network Distance/Congestion/Failure Estimation (red: my work, black: out of scope)
SCAN –Coherence for dynamic content –Cooperative clustering-based replication (diagram: replicas placed at servers s1, s4, s5)
SCAN –Coherence for dynamic content –Cooperative clustering-based replication –Scalable network monitoring - O(M+N) (diagram: replicas placed at servers s1, s4, s5)
Internet-scale Simulation Network Topology –Pure-random, Waxman & transit-stub synthetic topologies –An AS-level topology from 7 widely-dispersed BGP peers Web Workload
Web Site | Period | Duration | # Requests (avg–min–max) | # Clients (avg–min–max) | # Client groups (avg–min–max)
MSNBC | Aug–Oct/1999 | 10–11am | 1.5M–642K–1.7M | 129K–69K–150K | 15.6K–10K–17K
NASA | Jul–Aug/1995 | All day | 79K–61K–101K | |
World Cup | May–Jul/1998 | All day | 29M–1M–73M | 103K–13K–218K | N/A
–Aggregate MSNBC Web clients with BGP prefix »BGP tables from a BBNPlanet router –Aggregate NASA Web clients with domain names –Map the client groups onto the topology
Internet-scale Simulation – E2E Measurement NLANR Active Measurement Project data set –111 sites in America, Asia, Australia and Europe –Round-trip time (RTT) between every pair of hosts every minute –17M measurements daily –Raw data: Jun. – Dec. 2001 Keynote measurement data –Measure TCP performance from about 100 worldwide agents –Heterogeneous core network: various ISPs –Heterogeneous access network: »Dial up 56K, DSL and high-bandwidth business connections –Targets »40 most popular Web servers + 27 Internet Data Centers –Raw data: Nov. – Dec. 2001, Mar. – May 2002
Clustering Web Content for Efficient Replication
Overview CDN uses non-cooperative replication - inefficient Paradigm shift: cooperative push –Where to push – greedy algorithms can achieve close to optimal performance [JJKRS01, QPV01] (see the placement sketch below) –But what content should be pushed? –At what granularity? Clustering of objects for replication –Close-to-optimal performance with small overhead Incremental clustering –Push before accessed: improves availability during flash crowds
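The "where to push" step can be made concrete with a small sketch. This is a hedged illustration of greedy placement under a replica budget, not the exact algorithm of [JJKRS01, QPV01]; the distance matrix, per-group load, and origin distances are assumed inputs.

```python
# Hedged sketch of greedy replica placement: repeatedly add the candidate
# server that most reduces the total weighted retrieval cost.
# dist[i][j]      = distance from client group i to candidate server j
# load[i]         = request count of client group i
# origin_dist[i]  = distance from group i to the origin Web server
def greedy_placement(dist, load, origin_dist, budget):
    num_groups, num_servers = len(dist), len(dist[0])
    chosen = []
    best = list(origin_dist)          # current cost of each group's closest copy
    for _ in range(budget):
        best_server, best_gain = None, 0.0
        for j in range(num_servers):
            if j in chosen:
                continue
            # Gain = total weighted cost reduction if server j gets a replica.
            gain = sum(load[i] * max(0.0, best[i] - dist[i][j])
                       for i in range(num_groups))
            if gain > best_gain:
                best_server, best_gain = j, gain
        if best_server is None:       # no remaining server improves the cost
            break
        chosen.append(best_server)
        for i in range(num_groups):
            best[i] = min(best[i], dist[i][best_server])
    return chosen
```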
Outline Architecture Problem formulation Granularity of replication Incremental clustering and replication Conclusions Future Research
Conventional CDN: Non-cooperative Pull (clients, local DNS servers and local CDN servers in ISP 1 and ISP 2; CDN name server; Web content server) 1. GET request 2. Request for hostname resolution 3. Reply: local CDN server IP address 4. Local CDN server IP address 5. GET request 6. GET request if cache miss 7. Response 8. Response Inefficient replication
SCAN: Cooperative Push (clients, local DNS servers and local CDN servers in ISP 1 and ISP 2; CDN name server; Web content server) 0. Push replicas 1. GET request 2. Request for hostname resolution 3. Reply: nearby replica server or Web server IP address 4. Redirected server IP address 5. GET request (to the Web server if no replica yet) 6. Response Significantly reduce the # of replicas and update cost
Comparison between Conventional CDNs and SCAN
Metric | Conventional CDNs | SCAN
Average retrieval latency (ms) | |
Number of URL replicas deployed | 121,016 | 5,000
Number of update messages | 1,349,655 | 54,564
Problem Formulation Find a scalable, adaptive replication strategy to reduce –Clients’ average retrieval cost –Replica location computation cost –Amount of replica directory state to maintain Subject to certain total replication cost (e.g., # of URL replicas)
Outline Architecture Problem formulation Granularity of replication Incremental clustering and replication Conclusions Future Research
Per Web site Per URL
Replica Placement: Per Website vs. Per URL 60 – 70% average retrieval cost reduction for the per-URL scheme But per-URL is too expensive for management!
Overhead Comparison
Replication Scheme | State to Maintain | Computation Cost
Per Website | O(R) | O(R)
Per URL | O(R × M) | O(R × M)
where R: # of replicas per URL, M: # of URLs
To compute, on average, 10 replicas/URL for the top 1000 URLs takes several days on a normal server!
Overhead Comparison
Replication Scheme | State to Maintain | Computation Cost
Per Website | O(R) | O(R)
Per Cluster | O(R × K + M) | O(R × K)
Per URL | O(R × M) | O(R × M)
where R: # of replicas per URL, K: # of clusters, M: # of URLs (M >> K)
Clustering Web Content General clustering framework –Define the correlation distance between URLs –Cluster diameter: the max distance between any two members »Worst correlation in a cluster –Generic clustering: minimize the max diameter of all clusters Correlation distance definition based on –Spatial locality –Temporal locality –Popularity
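As an illustration of the generic framework, the following hedged sketch greedily grows clusters while keeping every cluster's diameter (its worst pairwise correlation distance) under a threshold; the `distance` callback and `max_diameter` knob are assumptions, not values from the talk.

```python
# Hedged sketch of diameter-bounded clustering: a URL joins a cluster only if
# it stays within max_diameter of every current member; otherwise it seeds a
# new cluster. `distance` can be any correlation distance (spatial, temporal,
# or popularity-based).
def diameter_bounded_clustering(urls, distance, max_diameter):
    clusters = []                      # each cluster is a list of URLs
    for u in urls:
        placed = False
        for cluster in clusters:
            if all(distance(u, v) <= max_diameter for v in cluster):
                cluster.append(u)      # diameter stays bounded
                placed = True
                break
        if not placed:
            clusters.append([u])
    return clusters
```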
Spatial Clustering Correlation distance between two URLs defined over their URL spatial access vectors, as –Euclidean distance –Vector similarity (diagram: the access vector of the highlighted URL)
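To make the spatial definitions concrete, here is a hedged sketch (the log record layout is an assumption) that builds each URL's spatial access vector, one entry per client group, and compares two vectors with Euclidean distance or a cosine-similarity-based distance.

```python
import math
from collections import defaultdict

def spatial_access_vectors(log, client_groups):
    """log: iterable of (url, client_group) pairs; returns url -> access vector."""
    index = {g: i for i, g in enumerate(client_groups)}
    vectors = defaultdict(lambda: [0.0] * len(client_groups))
    for url, group in log:
        vectors[url][index[group]] += 1.0   # count accesses per client group
    return vectors

def euclidean_distance(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def cosine_distance(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return 1.0 - (dot / norm if norm else 0.0)
```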
Clustering Web Content (cont’d) Popularity-based clustering –OR even simpler, sort the URLs by popularity and put the first N/K elements into the first cluster, etc. - binary correlation Temporal clustering –Divide traces into multiple individuals’ access sessions [ABQ01] –In each session, compute the correlation distance between each pair of URLs accessed –Average over multiple sessions in one day
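A minimal sketch of the simpler popularity-based variant described above: rank URLs by access count and cut the ranked list into K roughly equal clusters. How the trailing remainder is handled is an implementation choice, not from the talk.

```python
def popularity_clustering(access_counts, k):
    """access_counts: dict url -> request count; returns K popularity clusters."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    size = max(1, len(ranked) // k)
    clusters = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    if len(clusters) > k:                      # fold the remainder into the last cluster
        clusters[k - 1].extend(sum(clusters[k:], []))
        clusters = clusters[:k]
    return clusters
```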
Performance of Cluster-based Replication Spatial clustering with Euclidean distance and popularity-based clustering perform the best –A small # of clusters (only 1-2% of the # of URLs) can achieve close to per-URL performance, with much less overhead (MSNBC trace, 8/2/1999, 5 replicas/URL)
Outline Architecture Problem formulation Granularity of replication Incremental clustering and replication Conclusions Future Research
Static Clustering and Replication Two daily traces: a training trace and a new trace
Methods | Static 1 | Static 2 | Optimal
Traces used for clustering | Training | Training | New
Traces used for replication | Training | New | New
Traces used for evaluation | New | New | New
Static clustering performs poorly beyond a week: its retrieval cost is almost double the optimal!
Incremental Clustering Generic framework 1. If a new URL u matches an existing cluster c, add u to c and replicate u to the existing replicas of c 2. Else, create a new cluster and replicate it Two types of incremental clustering –Online: without any access logs »High availability –Offline: with access logs »Close-to-optimal performance
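A hedged sketch of this generic framework: `matches` stands in for either the online (semantics-based) or offline (log-based) matching rule, and `replicate` for the push operation; the cluster data layout is illustrative.

```python
def incremental_update(new_urls, clusters, matches, replicate):
    """clusters: list of dicts {'urls': [...], 'replicas': [...]}."""
    for u in new_urls:
        target = next((c for c in clusters if matches(u, c)), None)
        if target is not None:
            target["urls"].append(u)
            for server in target["replicas"]:
                replicate(u, server)            # push u to the cluster's replicas
        else:
            clusters.append({"urls": [u], "replicas": []})  # replicated separately
    return clusters
```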
Online Incremental Clustering Predict access patterns based on semantics Simplify to popularity prediction Groups of URLs with similar popularity? Use hyperlink structures! –Groups of siblings –Groups of the same hyperlink depth (smallest # of links from root)
Online Popularity Prediction Measure the divergence of URL popularity within a group with the access frequency span Experiments –Crawl on 5/3/2002 with hyperlink depth 4, then group the URLs –Use the corresponding access logs to analyze the correlation –Groups of siblings have the best correlation
Semantics-based Incremental Clustering Put each new URL into the existing cluster with the largest # of its siblings –In case of a tie, choose the cluster with more replicas Simulation on the 5/3/2002 MSNBC trace –8-10am trace: static popularity clustering + replication –At 10am: 16 new URLs appear - online incremental clustering + replication –Evaluation with 10-12am trace: the 16 URLs receive 33K requests
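A minimal sketch of the sibling-based assignment rule above, assuming each cluster tracks its member URLs and its replica servers (the data layout is an assumption):

```python
def assign_by_siblings(new_url, siblings, clusters):
    """siblings: set of URLs sharing new_url's parent page; clusters as above."""
    if not clusters:
        return None
    best = max(
        clusters,
        # Primary key: # of siblings already in the cluster; tie-break: # of replicas.
        key=lambda c: (len(siblings & set(c["urls"])), len(c["replicas"])),
    )
    if siblings & set(best["urls"]):
        best["urls"].append(new_url)
        return best
    return None   # no cluster contains a sibling; fall back to creating a new cluster
```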
Online Incremental Clustering and Replication Results Retrieval cost is 1/8 of that with no replication, and 1/5 of that with random replication
Online Incremental Clustering and Replication Results Retrieval cost is double the optimal, but at only 4% of the optimal’s replication cost
Conclusions Cooperative, clustering-based replication –Cooperative push: only 4 - 5% of the replication/update cost of existing CDNs –URL clustering reduces the management/computational overhead by two orders of magnitude »Spatial clustering and popularity-based clustering recommended –Incremental clustering to adapt to emerging URLs »Hyperlink-based online incremental clustering for high availability and performance improvement Self-organize replicas into an app-level multicast tree for update dissemination Scalable overlay network monitoring –O(M+N) instead of O(M*N), given M client groups and N servers
Outline Architecture Problem formulation Granularity of replication Incremental clustering and replication Conclusions Future Research
Future Research (I) Measurement-based Internet study and protocol/architecture design –Use inference techniques to develop Internet behavior models »Network operators reluctant to reveal internal network configurations –Root cause analysis: large, heterogeneous data mining »Leverage graphics/visualization for interactive mining –Apply deeper understanding of Internet behaviors for reassessment/design of protocol/architecture –E.g., Internet bottleneck – peering links? How and Why? Implications?
Future Research (II) Network traffic anomaly characterization, identification and detection –Many unknown flow-level anomalies revealed from real router traffic analysis (AT&T) –Profile traffic patterns of new applications (e.g. P2P) –> benign anomalies –Understand the cause, pattern and prevalence of other unknown anomalies –Identify malicious patterns for intrusion detection –E.g., fight against Sapphire/Slammer Worm
Backup Materials
SCAN –Coherence for dynamic content –Cooperative clustering-based replication –Scalable network monitoring: O(M+N) (diagram: replicas placed at servers s1, s4, s5)
Problem Formulation Subject to certain total replication cost (e.g., # of URL replicas) Find a scalable, adaptive replication strategy to reduce avg access cost
Simulation Methodology Network Topology –Pure-random, Waxman & transit-stub synthetic topologies –An AS-level topology from 7 widely-dispersed BGP peers Web Workload
Web Site | Period | Duration | # Requests (avg–min–max) | # Clients (avg–min–max) | # Client groups (avg–min–max)
MSNBC | Aug–Oct/1999 | 10–11am | 1.5M–642K–1.7M | 129K–69K–150K | 15.6K–10K–17K
NASA | Jul–Aug/1995 | All day | 79K–61K–101K | |
–Aggregate MSNBC Web clients with BGP prefix »BGP tables from a BBNPlanet router –Aggregate NASA Web clients with domain names –Map the client groups onto the topology
Online Incremental Clustering Predict access patterns based on semantics Simplify to popularity prediction Groups of URLs with similar popularity? Use hyperlink structures! –Groups of siblings –Groups of the same hyperlink depth: smallest # of links from root
Challenges for CDN Over-provisioning for replication –Provide good QoS to clients (e.g., latency bound, coherence) –Small # of replicas with small delay and bandwidth consumption for updates Replica Management –Scalability: billions of replicas if replicating per URL »O(10^4) URLs/server, O(10^5) CDN edge servers in O(10^3) networks –Adaptation to dynamics of content providers and customers Monitoring –User workload monitoring –End-to-end network distance/congestion/failure monitoring »Measurement scalability »Inference accuracy and stability
SCAN Architecture Leverage Decentralized Object Location and Routing (DOLR) - Tapestry for –Distributed, scalable location with guaranteed success –Search with locality Soft-state maintenance of a dissemination tree (for each object) (diagram: data plane over a Tapestry mesh on the network plane; the Web server is the data source; SCAN servers hold replicas (always updated) and caches (adaptive coherence); clients issue requests; request location, dynamic replication/update and content management)
Wide-area Network Measurement and Monitoring System (WNMMS) Select a subset of SCAN servers to be monitors E2E estimation for –Distance –Congestion –Failures (diagram: SCAN edge servers and clients grouped into clusters A, B and C on the network plane; distance measured from a host to its monitor and among monitors)
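One plausible way to pick monitors in such a system, sketched here as an assumption rather than the WNMMS design itself: group hosts whose RTT vectors to a small landmark set look similar, then use one host per group as that cluster's monitor.

```python
from collections import defaultdict

def cluster_by_landmarks(dist_to_landmarks, bucket_ms=20.0):
    """dist_to_landmarks: dict host -> list of RTTs (ms) to the landmarks."""
    groups = defaultdict(list)
    for host, rtts in dist_to_landmarks.items():
        # Hosts whose RTT vectors fall into the same buckets share a cluster
        # (a crude similarity rule, purely for illustration).
        signature = tuple(int(r // bucket_ms) for r in rtts)
        groups[signature].append(host)
    # Nominate the first host of each cluster as its monitor.
    return {hosts[0]: hosts for hosts in groups.values()}
```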
Dynamic Provisioning Dynamic replica placement –Meeting clients’ latency and servers’ capacity constraints –Close-to-minimal # of replicas Self-organize replicas into an app-level multicast tree –Small delay and bandwidth consumption for update multicast –Each node only maintains states for its parent & direct children Evaluated based on simulation of –Synthetic traces with various sensitivity analyses –Real traces from NASA and MSNBC Publications –IPTPS 2002 –Pervasive Computing 2002
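A minimal sketch of the per-object dissemination-tree state described above: each replica keeps soft state only for its parent and direct children, so an update propagates recursively without any node holding global membership. Class and method names are illustrative.

```python
class ReplicaNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent            # None at the data source (tree root)
        self.children = []              # direct children only

    def join(self, child):
        child.parent = self
        self.children.append(child)

    def disseminate(self, update):
        # Apply locally, then forward to direct children; each child recurses,
        # so per-node state stays proportional to 1 + number of direct children.
        self.apply(update)
        for child in self.children:
            child.disseminate(update)

    def apply(self, update):
        print(f"{self.name} applied update {update!r}")
```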
Effects of the Non-Uniform Size of URLs Replication cost constraint: bytes (rather than # of replicas) Similar trends exist –Per-URL replication outperforms per-Website replication dramatically –Spatial clustering with Euclidean distance and popularity-based clustering are very cost-effective
Diagram of Internet Iso-bar: end hosts grouped into clusters A, B and C around landmarks
Diagram of Internet Iso-bar: end-host clusters A, B and C, each with a monitor (landmark); distance probes from each monitor to its hosts and among monitors
Real Internet Measurement Data NLANR Active Measurement Project data set –119 sites in the US (106 after filtering out mostly-offline sites) –Round-trip time (RTT) between every pair of hosts every minute –Raw data: 6/24/00 – 12/3/01 Keynote measurement data –Measure TCP performance from about 100 agents –Heterogeneous core network: various ISPs –Heterogeneous access network: »Dial up 56K, DSL and high-bandwidth business connections –Targets »Web site perspective: 40 most popular Web servers »27 Internet Data Centers (IDCs)
Related Work Internet content delivery systems –Web caching »Client-initiated »Server-initiated –Pull-based Content Delivery Networks (CDNs) –Push-based CDNs Update dissemination –IP multicast –Application-level multicast Network E2E Distance Monitoring Systems
Web Proxy Caching (clients, local DNS servers and proxy cache servers in ISP 1 and ISP 2; Web content server) 1. GET request 2. GET request if cache miss 3. Response 4. Response
Pull-based CDN (clients, local DNS servers and local CDN servers in ISP 1 and ISP 2; CDN name server; Web content server) 1. GET request 2. Request for hostname resolution 3. Reply: local CDN server IP address 4. Local CDN server IP address 5. GET request 6. GET request if cache miss 7. Response 8. Response
Push-based CDN (clients, local DNS servers and local CDN servers in ISP 1 and ISP 2; CDN name server; Web content server) 0. Push replicas 1. GET request 2. Request for hostname resolution 3. Reply: nearby replica server or Web server IP address 4. Redirected server IP address 5. GET request (to the Web server if no replica yet) 6. Response
Internet Content Delivery Systems
Properties | Web caching (client initiated) | Web caching (server initiated) | Pull-based CDNs (Akamai) | Push-based CDNs | SCAN
Scalability for request redirection | Pre-configured in browser | Use Bloom filters to exchange replica locations | Centralized CDN name server | Centralized CDN name server | Decentralized P2P location
Efficiency (# of caches or replicas) | No cache sharing among proxies | Cache sharing | No replica sharing among edge servers | Replica sharing | Replica sharing
Network-awareness | No | No | Yes, unscalable monitoring system | No | Yes, scalable monitoring system
Coherence support | No | No | Yes | No | Yes
Previous Work: Update Dissemination No inter-domain IP multicast Application-level multicast (ALM) is unscalable –Root maintains states for all children (Narada, Overcast, ALMI, RMX) –Root handles all “join” requests (Bayeux) –Root splitting is a common solution, but suffers from consistency overhead
Design Principles Scalability –No centralized point of control: P2P location services, Tapestry –Reduce management states: minimize # of replicas, object clustering –Distributed load balancing: capacity constraints Adaptation to clients’ dynamics –Dynamic distribution/deletion of replicas with regard to clients’ QoS constraints –Incremental clustering Network-awareness and fault-tolerance (WNMMS) –Distance estimation: Internet Iso-bar –Anomaly detection and diagnostics
Comparison of Content Delivery Systems (cont’d)
Properties | Web caching (client initiated) | Web caching (server initiated) | Pull-based CDNs (Akamai) | Push-based CDNs | SCAN
Distributed load balancing | No | Yes | Yes | No | Yes
Dynamic replica placement | Yes | Yes | Yes | No | Yes
Network-awareness | No | No | Yes, unscalable monitoring system | No | Yes, scalable monitoring system
No global network topology assumption | Yes | Yes | Yes | No | Yes
Network-awareness (cont’d) Loss/congestion prediction –Maximize the true positive rate and minimize the false positive rate Orthogonal loss/congestion path discovery –Without the underlying topology –How stable is such orthogonality? »Degradation of orthogonality over time Reactive and proactive adaptation for SCAN