Towards a Scalable, Adaptive and Network-aware Content Distribution Network
Yan Chen, EECS Department, UC Berkeley

Outline
– Motivation and challenges
– Our contributions: the SCAN system
– Case study: tomography-based overlay network monitoring system
– Conclusions

Motivation
The Internet has evolved into a commercial infrastructure for service delivery
– Web delivery, VoIP, streaming media, …
Challenges for Internet-scale services
– Scalability: 600M users, 35M Web sites, 2.1 Tb/s
– Efficiency: bandwidth, storage, management
– Agility: dynamic clients/network/servers
– Security, etc.
Focus on content delivery: Content Distribution Network (CDN)
– 4 billion Web pages in total, growing by 7M pages daily
– Annual traffic growth of 200% for the next 4 years

How a CDN Works

Challenges for CDN
Replica location
– Find nearby replicas with good DoS attack resilience
Replica deployment
– Dynamics, efficiency
– Client QoS and server capacity constraints
Replica management
– Scalability of replica index state maintenance
Adaptation to network congestion/failures
– Scalability and accuracy of overlay monitoring

SCAN: Scalable Content Access Network
– Provision: dynamic replication + update multicast tree building
– Replica management: (incremental) content clustering
– Network end-to-end distance monitoring: Internet Iso-bar (latency), TOM (loss rate)
– Network DoS-resilient replica location: Tapestry

Replica Location
Existing work and problems
– Centralized, replicated and distributed directory services
– No security benchmarking: which one has the best DoS attack resilience?
Solution
– Proposed the first simulation-based network DoS resilience benchmark
– Applied it to compare three directory services
– DHT-based distributed directory services have the best resilience in practice
Publication
– 3rd Int. Conf. on Info. and Comm. Security (ICICS), 2001

Replica Placement/Maintenance
Existing work and problems
– Static placement
– Dynamic but inefficient placement
– No coherence support
Solution
– Dynamically place a close-to-optimal # of replicas under client QoS (latency) and server capacity constraints
– Self-organize replicas into a scalable application-level multicast tree for disseminating updates
– Uses only the overlay network topology
Publication
– IPTPS 2002, Pervasive Computing 2002

Replica Management
Existing work and problems
– Cooperative access for good efficiency requires maintaining replica indices
– Per-Website replication: scalable, but poor performance
– Per-URL replication: good performance, but unscalable
Solution
– Clustering-based replication reduces the overhead significantly without sacrificing much performance
– Proposed a unique online Web object popularity prediction scheme based on hyperlink structures
– Online incremental clustering and replication to push replicas before they are accessed
Publication
– ICNP 2002, IEEE J-SAC 2003

Adaptation to Network Congestion/Failures
Existing work and problems
– Latency estimation
» Clustering-based: based on network proximity, inaccurate
» Coordinate-based: assumes symmetric distances, unscalable to update
– General metrics: n² measurements for n end hosts
Solution
– Latency: Internet Iso-bar, clustering based on latency similarity to a small number of landmarks
– Loss rate: Tomography-based Overlay Monitoring (TOM), which selectively monitors a basis set of O(n log n) paths to infer the loss rates of the other paths
Publication
– Internet Iso-bar: SIGMETRICS PER 2002
– TOM: SIGCOMM IMC 2003
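The slide describes Internet Iso-bar only as clustering hosts by how similar their latencies to a small set of landmarks are. A minimal sketch of that idea follows. It assumes Euclidean distance between landmark-latency vectors and a greedy "leader" clustering rule with a fixed threshold. Both are illustrative choices, not the published Iso-bar algorithm, and the RTT values are hypothetical.

```python
import math

def landmark_distance(u, v):
    """Euclidean distance between two hosts' landmark-latency vectors (ms).
    Assumption: the real system may use a different similarity measure."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_by_landmarks(vectors, threshold):
    """Greedy leader clustering: a host joins the first cluster whose leader's
    landmark vector is within `threshold`; otherwise it starts a new cluster."""
    leaders = []   # (host, vector) of each cluster leader, in creation order
    clusters = {}  # leader host -> list of member hosts
    for host, vec in vectors.items():
        for leader, lvec in leaders:
            if landmark_distance(vec, lvec) <= threshold:
                clusters[leader].append(host)
                break
        else:  # no leader close enough: this host leads a new cluster
            leaders.append((host, vec))
            clusters[host] = [host]
    return clusters

# Hypothetical RTTs (ms) from each host to three landmarks.
rtts = {
    "h1": [10, 80, 120],
    "h2": [12, 78, 118],
    "h3": [90, 15, 60],
    "h4": [88, 18, 62],
}
print(cluster_by_landmarks(rtts, threshold=10.0))
```

Hosts in the same cluster can then share one latency estimate, so only a few landmark measurements per host are needed instead of all n² pairs.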

SCAN Architecture
Leverage a Distributed Hash Table, Tapestry, for
– Distributed, scalable location with guaranteed success
– Search with locality
[Figure: data plane (data source, Web server, SCAN servers, replicas, caches, clients; "always update" vs. "adaptive coherence") above the network plane (Tapestry mesh); components: replica location, dynamic replication/update and replica management, overlay network monitoring]

Methodology
– Inputs: network topology, Web workload, network end-to-end latency measurement
– Analytical evaluation, algorithm design and realistic simulation (iterate)
– PlanetLab tests

Case Study: Tomography-based Overlay Network Monitoring

TOM Outline
– Goal and problem formulation
– Algebraic modeling and basic algorithms
– Scalability analysis
– Practical issues
– Evaluation
– Application: adaptive overlay streaming media
– Conclusions

Existing Work
General metrics: RON (n² measurements)
Latency estimation
– Clustering-based: IDMaps, Internet Iso-bar, etc.
– Coordinate-based: GNP, ICS, Virtual Landmarks
Network tomography
– Focuses on inferring the characteristics of physical links rather than E2E paths
– Limited measurements lead to an under-constrained system with unidentifiable links
Goal: a scalable, adaptive and accurate overlay monitoring system to detect E2E congestion/failures

Problem Formulation
Given an overlay of n end hosts and O(n²) paths, how can we select a minimal subset of paths to monitor so that the loss rates/latencies of all other paths can be inferred?
Assumptions:
– The topology is measurable
– Only E2E paths can be measured, not individual links

Our Approach
Select a basis set of k paths that fully describes all O(n²) paths (k ≪ O(n²))
– Monitor the loss rates of the k paths, and infer the loss rates of all other paths
– Applicable to any additive metric, such as latency
[Figure: end hosts report topology and measurements to an Overlay Network Operation Center]

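Algebraic Model
Path loss rate p, link loss rate l. Assuming independent link losses, a path is loss-free only if every link on it is loss-free:
$$1 - p = \prod_{j \in \text{path}} (1 - l_j)$$
Taking logarithms makes the metric additive:
$$b = \log\frac{1}{1-p} = \sum_{j=1}^{s} v_j x_j, \qquad x_j = \log\frac{1}{1-l_j}, \qquad v_j \in \{0, 1\}$$
[Figure: sample overlay with nodes A, B, C, D, links 1, 2, 3 and path p1]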
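A quick numerical check of the log transform, assuming independent link losses: the product form for the path loss rate and the sum of per-link values log(1/(1 − l_j)) must agree. The link loss rates are hypothetical.

```python
import math

def path_loss_rate(link_loss_rates):
    """Loss rate of a path whose links drop packets independently:
    the path is loss-free only if every link is loss-free."""
    survive = 1.0
    for l in link_loss_rates:
        survive *= 1.0 - l
    return 1.0 - survive

def to_additive(loss_rate):
    """Map a loss rate to the additive metric log(1 / (1 - loss))."""
    return math.log(1.0 / (1.0 - loss_rate))

def from_additive(b):
    """Inverse map: recover a loss rate from the additive metric."""
    return 1.0 - math.exp(-b)

links = [0.01, 0.02, 0.005]           # hypothetical link loss rates l_j
p = path_loss_rate(links)             # path loss rate p
x = [to_additive(l) for l in links]   # per-link additive metrics x_j
b = to_additive(p)                    # path metric b
print(round(p, 6), math.isclose(b, sum(x)))
```

Because b is a plain sum over the links, all paths together form a linear system, which is what makes the rest of the algebra possible.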

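Putting All Paths Together
In total, r = O(n²) paths and s links, with s ≪ r. Stacking one equation per path gives
$$G x = b, \qquad G \in \{0,1\}^{r \times s}, \quad x \in \mathbb{R}^{s}, \quad b \in \mathbb{R}^{r}$$
where row i of G marks the links traversed by path i.
[Figure: sample overlay with nodes A, B, C, D and links 1, 2, 3]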

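Sample Path Matrix
In the sample overlay (nodes A, B, C, D; links 1, 2, 3; paths b1, b2, b3), x1 − x2 is unknown, so x1 and x2 cannot be computed individually
– The vectors v with Gv = 0 form the null space of G
– To separate identifiable from unidentifiable components: x = x_G + x_N, with x_G in the path/row space (measured) and x_N in the null space (unmeasured)
[Figure: in (x1, x2, x3) coordinates, (1,1,0) lies in the path/row space and (1,−1,0) spans the null space]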
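To make the null-space argument concrete, assume a path matrix whose rows are b1 = links {1, 2}, b2 = link {3} and b3 = links {1, 2, 3}. This routing is a hypothetical choice consistent with the vectors (1,1,0) and (1,−1,0) on the slide. The sketch below shows that two different link-metric vectors differing only along (1,−1,0) produce identical path measurements. Integer metrics (e.g., millisecond latencies) keep the arithmetic exact.

```python
def matvec(G, x):
    """Path metrics b = G x for a 0/1 path matrix G."""
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]

# Hypothetical path matrix: rows are paths b1..b3, columns are links 1..3.
G = [
    [1, 1, 0],  # b1 uses links 1 and 2
    [0, 0, 1],  # b2 uses link 3
    [1, 1, 1],  # b3 uses links 1, 2 and 3
]
null_vec = [1, -1, 0]                  # direction the paths cannot see
x_a = [30, 10, 20]                     # one candidate set of link metrics
x_b = [a + 5 * n for a, n in zip(x_a, null_vec)]   # shifted along the null space

print(matvec(G, null_vec))             # zero for every path
print(matvec(G, x_a), matvec(G, x_b))  # identical measurements
```

Since both candidates explain the measurements equally well, only the combination x1 + x2 (and x3) can be identified, never x1 and x2 separately.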

Intuition through Topology Virtualization
Virtual links: minimal path segments whose loss rates can be uniquely identified
– Can fully describe all paths
– x_G is composed of virtual links
All E2E paths lie in the path space, i.e., G x_N = 0
[Figure: virtualization of the sample overlay (links 1, 2, 3) into virtual links 1 and 2; (1,1,0) lies in the path/row space (measured), (1,−1,0) in the null space (unmeasured)]
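The split x = x_G + x_N can be computed by orthogonally projecting x onto the row space of G. A minimal sketch using Gram-Schmidt on the rows of the same hypothetical path matrix as above (b1 = links {1, 2}, b2 = link {3}, b3 = links {1, 2, 3}); the link metrics are also hypothetical. A production system would use a numerically robust factorization such as QR instead.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def row_space_basis(G, eps=1e-9):
    """Orthonormal basis of the row (path) space via Gram-Schmidt."""
    basis = []
    for row in G:
        w = [float(v) for v in row]
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip rows that depend on earlier ones
            basis.append([wi / norm for wi in w])
    return basis

def split(G, x):
    """Return (x_G, x_N): projection of x onto the row space and the remainder."""
    x_G = [0.0] * len(x)
    for q in row_space_basis(G):
        c = dot(x, q)
        x_G = [gi + c * qi for gi, qi in zip(x_G, q)]
    x_N = [xi - gi for xi, gi in zip(x, x_G)]
    return x_G, x_N

G = [[1, 1, 0], [0, 0, 1], [1, 1, 1]]   # hypothetical sample path matrix
x_G, x_N = split(G, [30.0, 10.0, 20.0])
print([round(v, 6) for v in x_G], [round(v, 6) for v in x_N])
print([round(dot(row, x_N), 9) for row in G])   # G x_N is (numerically) zero
```

Here x_G spreads the identifiable sum x1 + x2 = 40 evenly over links 1 and 2, which corresponds to treating links 1 and 2 as a single virtual link.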

More Examples
Real links (solid) and all of the overlay paths (dotted) traversing them, before and after virtualization into virtual links
[Figure: two example topologies with their virtual links (1′, 2′, …); Rank(G) = 2 for the first and Rank(G) = 3 for the second]

Basic Algorithms
Select k = rank(G) linearly independent paths to monitor
– Use QR decomposition
– Leverage sparse matrices: time O(rk²) and memory O(k²)
» E.g., 79 sec for n = 300 (r = 44,850) and k = 2,541
Compute the loss rates of the other paths
– Time O(k²) and memory O(k²)
» E.g., 1.89 sec for the example above
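The two steps can be sketched end to end on a toy matrix. This sketch swaps in exact Gaussian elimination (Python's `fractions.Fraction`) for the QR decomposition named on the slide, ignores sparsity, and does not achieve the quoted O(rk²) cost. Inference uses the minimal-norm solution x_G of Ḡ x = b̄ (which stays in the row space) and then evaluates b = G x_G for every path. The path matrix and measured values are hypothetical.

```python
from fractions import Fraction

def rank_of(rows):
    """Rank of a list of 0/1 path rows via exact Gaussian elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rank = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(rank, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            f = m[i][c] / m[rank][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

def select_basis_paths(G):
    """Keep each path that increases the rank; yields k = rank(G) paths."""
    chosen = []
    for i, row in enumerate(G):
        if rank_of([G[j] for j in chosen] + [row]) > len(chosen):
            chosen.append(i)
    return chosen

def solve(A, y):
    """Solve a square, nonsingular system A z = y exactly (Gauss-Jordan)."""
    n = len(A)
    m = [[Fraction(v) for v in A[i]] + [Fraction(y[i])] for i in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for i in range(n):
            if i != c:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n] for row in m]

def infer_all_paths(G, chosen, b_measured):
    """Minimal-norm x_G solving G_bar x = b_bar, then b = G x_G for every path."""
    Gb = [G[i] for i in chosen]
    # x_G = G_bar^T z with (G_bar G_bar^T) z = b_bar keeps x_G in the row space.
    GGt = [[sum(a * c for a, c in zip(r1, r2)) for r2 in Gb] for r1 in Gb]
    z = solve(GGt, b_measured)
    x_G = [sum(z[i] * Gb[i][j] for i in range(len(Gb))) for j in range(len(G[0]))]
    return [sum(g * x for g, x in zip(row, x_G)) for row in G]

G = [[1, 1, 0], [0, 0, 1], [1, 1, 1]]   # hypothetical path matrix (rows = paths)
chosen = select_basis_paths(G)          # path 2 is the sum of paths 0 and 1
b_all = infer_all_paths(G, chosen, [40, 20])
print(chosen, [int(v) for v in b_all])
```

Only the k selected paths are measured; each inferred b can be turned back into a loss rate with p = 1 − e^(−b).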

Scalability Analysis
Is k ≪ O(n²)?
For a power-law Internet topology
– When the majority of end hosts are on the overlay: k = O(n) (with proof)
– When a small portion of end hosts are on the overlay:
» If the Internet were a pure hierarchy (tree): k = O(n)
» If the Internet had no hierarchy at all (worst case, clique): k = O(n²)
» The Internet has a moderate hierarchical structure [TGJ+02]: for reasonably large n (e.g., 100), k = O(n log n) (extensive linear regression tests on both synthetic and real topologies)
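As a worked check on what k ≪ O(n²) means in practice, the figures quoted elsewhere in this talk give the following ratios. The n = 300 case counts unordered host pairs, matching r = 44,850; the PlanetLab case counts directed paths. This is only arithmetic on the reported numbers.

```latex
$$ r = \binom{300}{2} = \frac{300 \times 299}{2} = 44{,}850, \qquad \frac{k}{r} = \frac{2{,}541}{44{,}850} \approx 5.7\% $$
$$ r = 51 \times 50 = 2{,}550 \ \text{(directed paths)}, \qquad \frac{k}{r} = \frac{872}{2{,}550} \approx 34.2\% $$
```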

Practical Issues
Tolerance of topology measurement errors
– Router aliases
– Incomplete routing info
Measurement load balancing
– Randomly order the paths before scanning them to select the basis set
Adaptation to topology changes
– Designed efficient algorithms for incremental updates
– Add/remove a path: O(k²) time (vs. O(n²k²) to reinitialize)
– Add/remove end hosts, and routing changes
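A minimal sketch of two of these ideas, under stated assumptions. First, adding a path incrementally: the new row is reduced against the rows already selected, and it needs monitoring only if a nonzero residual remains. Second, load balancing: paths are shuffled before selection. This reduction costs O(k·s) per added path rather than the O(k²) quoted on the slide, path removal is not shown, and `PathBasis` and `select_with_random_order` are hypothetical names.

```python
import random
from fractions import Fraction

class PathBasis:
    """Incrementally maintained set of linearly independent (monitored) paths.
    Each stored row is reduced against all earlier ones, keyed by its pivot column."""

    def __init__(self):
        self.pivots = {}     # pivot column -> reduced row, in insertion order
        self.monitored = []  # ids of the paths selected for monitoring

    def _reduce(self, row):
        r = [Fraction(v) for v in row]
        for col, prow in self.pivots.items():
            if r[col] != 0:
                f = r[col] / prow[col]
                r = [a - f * b for a, b in zip(r, prow)]
        return r

    def add_path(self, path_id, row):
        """Return True if the new path is independent and must be monitored."""
        r = self._reduce(row)
        pivot = next((c for c, v in enumerate(r) if v != 0), None)
        if pivot is None:
            return False  # inferable from already-monitored paths
        self.pivots[pivot] = r
        self.monitored.append(path_id)
        return True

def select_with_random_order(paths, seed=0):
    """Shuffle paths before selection so the monitoring load is not biased
    toward paths (and hosts) that happen to come first."""
    order = list(paths)
    random.Random(seed).shuffle(order)
    basis = PathBasis()
    for pid in order:
        basis.add_path(pid, paths[pid])
    return basis.monitored

paths = {"b1": [1, 1, 0], "b2": [0, 0, 1], "b3": [1, 1, 1]}   # hypothetical
print(sorted(select_with_random_order(paths, seed=1)))
```

Which paths end up monitored depends on the shuffle, but the number of selected paths is always rank(G).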

Evaluation Metrics
Path loss rate estimation accuracy
– Absolute error |p − p′|
– Error factor [BDPT02]
– Lossy path inference: coverage and false positive ratio
Measurement load balancing
– Coefficient of variation (CV)
– Maximum vs. mean ratio (MMR)
Speed of setup, update and adaptation
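The metrics above can be sketched as small functions. Several details are assumptions made only for illustration. The error factor follows the clamped-ratio form associated with [BDPT02], with a default ε = 0.002. Coverage and false positive ratio use the 5% lossy threshold mentioned in the backup slides. CV uses the population standard deviation.

```python
import statistics

def error_factor(p, p_hat, eps=0.002):
    """Error factor: ratio of actual to estimated loss rate (or its inverse),
    with both clamped below at eps so tiny rates do not dominate."""
    p, p_hat = max(eps, p), max(eps, p_hat)
    return max(p / p_hat, p_hat / p)

def lossy_inference(actual, inferred, threshold=0.05):
    """Coverage: share of truly lossy paths that are flagged lossy.
    False positive ratio: share of flagged paths that are not truly lossy."""
    truly = {i for i, p in enumerate(actual) if p >= threshold}
    flagged = {i for i, p in enumerate(inferred) if p >= threshold}
    coverage = len(truly & flagged) / len(truly) if truly else 1.0
    false_pos = len(flagged - truly) / len(flagged) if flagged else 0.0
    return coverage, false_pos

def cv(load):
    """Coefficient of variation of per-host measurement load."""
    return statistics.pstdev(load) / statistics.mean(load)

def mmr(load):
    """Maximum-to-mean ratio of per-host measurement load."""
    return max(load) / statistics.mean(load)

actual = [0.0, 0.06, 0.2, 0.01]     # hypothetical real loss rates
inferred = [0.0, 0.07, 0.03, 0.06]  # hypothetical inferred loss rates
print(error_factor(0.01, 0.02), lossy_inference(actual, inferred), mmr([1, 2, 3]))
```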

Evaluation
Extensive simulations
Experiments on PlanetLab
– 51 hosts, each from a different organization
– 51 × 50 = 2,550 paths
– On average k = 872
Areas and domains (# of hosts)
– US (40): .edu 33, .org 3, .net 2, .gov 1, .us 1
– International (11): Europe (6): France 1, Sweden 1, Denmark 1, Germany 1, UK 2; Asia (2): Taiwan 1, Hong Kong 1; Canada 2; Australia 1
Results on accuracy
– Avg real loss rate: 0.023
– Absolute error: mean 0.0027; 90% < 0.014
– Error factor: mean 1.1; 90% < 2.0
– On average 248 out of 2,550 paths have no or incomplete routing information
– No router aliases resolved

Evaluation (cont’d)
Results on speed
– Path selection (setup): 0.75 sec
– Path loss rate calculation: 0.16 sec for all 2,550 paths
Results on load balancing
– Significantly reduces CV and MMR, by up to a factor of 7.3
[Figure: with vs. without load balancing]

Motivation
Traditional streaming media systems treat the network as a black box
– Adaptation is performed only at the transmission end points
Overlay relay can effectively bypass congestion/failures
Built an adaptive streaming media system that leverages
– TOM for real-time path info
– An overlay network for adaptive packet buffering and relay

Adaptive Overlay Streaming Media
[Figure: overlay relay among UC Berkeley, UC San Diego, Stanford and HP Labs; X marks the congested path]
– Implemented with a Winamp client and a SHOUTcast server
– Congestion introduced with a Packet Shaper
– Skip-free playback: server buffering and rewinding
– Total adaptation time < 4 seconds

Adaptive Streaming Media Architecture

Summary
A tomography-based overlay network monitoring system
– Selectively monitors a basis set of O(n log n) paths to infer the loss rates of O(n²) paths
– Works in real time, adapts to topology changes, has good load balancing and tolerates topology errors
Both simulation and real Internet experiments are promising
Built an adaptive overlay streaming media system on top of TOM
– Bypasses congestion/failures for smooth playback within seconds

Tie Back to SCAN
– Provision: dynamic replication + update multicast tree building
– Replica management: (incremental) content clustering
– Network end-to-end distance monitoring: Internet Iso-bar (latency), TOM (loss rate)
– Network DoS-resilient replica location: Tapestry

Contributions of My Thesis
Replica location
– Proposed the first simulation-based network DoS resilience benchmark and quantified three types of directory services
Dynamically place a close-to-optimal # of replicas
– Self-organize replicas into a scalable app-level multicast tree for disseminating updates
Cluster objects to significantly reduce the management overhead with little performance sacrifice
– Online incremental clustering and replication to adapt to changes in users’ access patterns
Scalable overlay network monitoring

Thank you!

Backup Materials

Existing CDNs Fail to Address These Challenges
– Non-cooperative replication is inefficient
– No coherence for dynamic content
– Unscalable network monitoring: O(M × N), where M is the # of client groups and N is the # of server farms

Network Topology and Web Workload
Network topology
– Pure-random, Waxman and transit-stub synthetic topologies
– An AS-level topology from 7 widely dispersed BGP peers
Web workload (avg, min, max)
– MSNBC: Aug–Oct 1999, 10–11am; # requests 1.5M (642K, 1.7M); # clients 129K (69K, 150K); # client groups 15.6K (10K, 17K)
– NASA: Jul–Aug 1995, all day; # requests 79K (61K, 101K); # clients 5,940 (4,781, 7,671); # client groups 2,378 (1,784, 3,011)
– Aggregate MSNBC Web clients by BGP prefix
» BGP tables from a BBNPlanet router
– Aggregate NASA Web clients by domain name
– Map the client groups onto the topology
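Aggregating clients by BGP prefix amounts to a longest-prefix match of each client address against the routing table's prefixes. A minimal sketch with Python's standard `ipaddress` module follows. The prefixes and client addresses are hypothetical documentation ranges, not entries from the BBNPlanet table, and the linear scan stands in for the prefix trie a real router-table lookup would use.

```python
import ipaddress
from collections import defaultdict

def group_clients(client_ips, prefixes):
    """Map each client IP to its longest matching prefix (one client group per prefix)."""
    # Longest prefixes first, so the first match is the longest match.
    nets = sorted((ipaddress.ip_network(p) for p in prefixes),
                  key=lambda n: n.prefixlen, reverse=True)
    groups = defaultdict(list)
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        match = next((n for n in nets if addr in n), None)
        groups[str(match) if match else "unmatched"].append(ip)
    return dict(groups)

# Hypothetical prefixes and clients (documentation address ranges).
prefixes = ["198.51.100.0/24", "198.51.100.128/25", "203.0.113.0/24"]
clients = ["198.51.100.7", "198.51.100.200", "203.0.113.9", "192.0.2.1"]
print(group_clients(clients, prefixes))
```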

Network E2E Latency Measurement
NLANR Active Measurement Project data set
– 111 sites in America, Asia, Australia and Europe
– Round-trip time (RTT) between every pair of hosts every minute
– 17M measurements daily
– Raw data: Jun.–Dec. 2001, Nov. 2002
Keynote measurement data
– Measures TCP performance from about 100 worldwide agents
– Heterogeneous core network: various ISPs
– Heterogeneous access network:
» Dial-up 56K, DSL and high-bandwidth business connections
– Targets:
» 40 most popular Web servers + 27 Internet Data Centers
– Raw data: Nov.–Dec. 2001, Mar.–May 2002

Internet Content Delivery Systems
Properties: Web caching (client initiated) | Web caching (server initiated) | Conventional CDNs (Akamai) | SCAN
– Replica access: Non-cooperative | Cooperative (Bloom filter) | Non-cooperative | Cooperative
– Load balancing: No | Yes
– Pull/push: Pull | Push | Pull | Push
– Transparent to clients: No | Yes
– Coherence support: No | Yes
– Network-awareness: No | Yes, unscalable monitoring system | Yes, scalable monitoring system

Absolute and Relative Errors
For each experiment, compute the 95th-percentile absolute and relative errors over the estimates for all 2,550 paths

Lossy Path Inference Accuracy
– 90 out of 100 runs have coverage over 85% and a false positive ratio below 10%
– Many of the errors are caused by boundary effects around the 5% lossy threshold

PlanetLab Experiment Results
Loss rate distribution
– [0, 0.05): 95.9% of paths
– Lossy paths [0.05, 1.0]: 4.1% of paths, distributed as [0.05, 0.1) 15.2%, [0.1, 0.3) 31.0%, [0.3, 0.5) 23.9%, [0.5, 1.0) 4.3%, 1.0 25.6%
Metrics
– Absolute error |p − p′|:
» Average 0.0027 for all paths, 0.0058 for lossy paths
– Relative error [BDPT02]
– Lossy path inference: coverage and false positive ratio
On average k = 872 out of 2,550

Experiments on PlanetLab
51 hosts, each from a different organization
– 51 × 50 = 2,550 paths
Areas and domains (# of hosts)
– US (40): .edu 33, .org 3, .net 2, .gov 1, .us 1
– International (11): Europe (6): France 1, Sweden 1, Denmark 1, Germany 1, UK 2; Asia (2): Taiwan 1, Hong Kong 1; Canada 2; Australia 1
Simultaneous loss rate measurement
– 300 trials, 300 msec each
– In each trial, send a 40-byte UDP packet to every other host
Simultaneous topology measurement
– Traceroute
Experiments: 6/24–6/27
– 100 experiments in peak hours

Motivation
With single-node relay
Loss rate improvement
– Among 10,980 lossy paths:
» 5,705 paths (52.0%) have their loss rate reduced by 0.05 or more
» 3,084 paths (28.1%) change from lossy to non-lossy
Throughput improvement
– Estimated with [formula]
– 60,320 paths (24%) have a non-zero loss rate, so their throughput is computable
– Among them, 32,939 paths (54.6%) have improved throughput, and 13,734 paths (22.8%) have their throughput doubled or more
Implication: use overlay paths to bypass congestion or failures
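The slide does not state how relayed loss rates were computed. A minimal sketch, assuming independent losses on the two overlay hops, picks the single relay host that minimizes the end-to-end loss rate and falls back to the direct path when no relay helps. The loss matrix is hypothetical.

```python
def relay_loss(p_sr, p_rd):
    """Loss rate of the two-hop overlay path s -> r -> d (independent hops)."""
    return 1.0 - (1.0 - p_sr) * (1.0 - p_rd)

def best_single_relay(loss, s, d):
    """Return (relay, loss_rate) minimizing loss from s to d; relay None means direct."""
    best = (None, loss[s][d])
    for r in range(len(loss)):
        if r in (s, d):
            continue
        via = relay_loss(loss[s][r], loss[r][d])
        if via < best[1]:
            best = (r, via)
    return best

# Hypothetical pairwise loss rates among three hosts.
loss = [
    [0.0, 0.01, 0.20],
    [0.01, 0.0, 0.02],
    [0.20, 0.02, 0.0],
]
relay, p = best_single_relay(loss, 0, 2)
print(relay, round(p, 4))
```

With TOM supplying the full loss matrix, this search needs no extra probing; the relay choice is made from inferred path loss rates.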

SCAN
– Coherence for dynamic content
– Cooperative clustering-based replication
– Scalable network monitoring: O(M + N)
[Figure: replicas placed on servers s1, s4, s5]

Problem Formulation
Subject to a given total replication cost (e.g., # of URL replicas), find a scalable, adaptive replication strategy that reduces the average access cost

SCAN: Scalable Content Access Network
[Figure: CDN applications (e.g., streaming media) on top of SCAN. Components: provision (cooperative clustering-based replication), user behavior/workload monitoring, coherence (update multicast tree construction), network performance monitoring (network distance/congestion/failure estimation). Red: my work; black: out of scope]

Evaluation of an Internet-scale System
Analytical evaluation
Realistic simulation
– Network topology
– Web workload
– Network end-to-end latency measurement
Network topology
– Pure-random, Waxman and transit-stub synthetic topologies
– A real AS-level topology from 7 widely dispersed BGP peers

Web Workload
Per site (avg, min, max)
– MSNBC: Aug–Oct 1999, 10–11am; # requests 1.5M (642K, 1.7M); # clients 129K (69K, 150K); # client groups 15.6K (10K, 17K)
– NASA: Jul–Aug 1995, all day; # requests 79K (61K, 101K); # clients 5,940 (4,781, 7,671); # client groups 2,378 (1,784, 3,011)
– World Cup: May–Jul 1998, all day; # requests 29M (1M, 73M); # clients 103K (13K, 218K); # client groups N/A
Aggregate MSNBC Web clients by BGP prefix
– BGP tables from a BBNPlanet router
Aggregate NASA Web clients by domain name
Map the client groups onto the topology

Simulation Methodology
Network topology
– Pure-random, Waxman and transit-stub synthetic topologies
– An AS-level topology from 7 widely dispersed BGP peers
Web workload (avg, min, max)
– MSNBC: Aug–Oct 1999, 10–11am; # requests 1.5M (642K, 1.7M); # clients 129K (69K, 150K); # client groups 15.6K (10K, 17K)
– NASA: Jul–Aug 1995, all day; # requests 79K (61K, 101K); # clients 5,940 (4,781, 7,671); # client groups 2,378 (1,784, 3,011)
– Aggregate MSNBC Web clients by BGP prefix
» BGP tables from a BBNPlanet router
– Aggregate NASA Web clients by domain name
– Map the client groups onto the topology

Online Incremental Clustering
Predict access patterns based on semantics
– Simplify to popularity prediction
– Groups of URLs with similar popularity? Use hyperlink structures!
» Groups of siblings
» Groups of the same hyperlink depth: the smallest # of links from the root
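Both hyperlink-based groupings are simple graph computations. Hyperlink depth, defined on the slide as the smallest number of links from the root, is exactly a breadth-first-search distance. Sibling groups are the sets of URLs linked from the same page. A minimal sketch over a hypothetical site graph follows; it shows only the grouping step, not the popularity prediction or replication built on top of it.

```python
from collections import deque, defaultdict

def group_by_hyperlink_depth(links, root):
    """Group URLs by hyperlink depth: smallest number of links from the root (BFS)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        for nxt in links.get(url, []):
            if nxt not in depth:  # first visit in BFS order = shortest link distance
                depth[nxt] = depth[url] + 1
                queue.append(nxt)
    groups = defaultdict(list)
    for url, d in depth.items():
        groups[d].append(url)
    return dict(groups)

def group_siblings(links):
    """Group URLs linked from the same page (siblings); a URL may sit in several groups."""
    return {page: sorted(children) for page, children in links.items() if children}

# Hypothetical site graph: page -> pages it links to.
links = {
    "/": ["/news", "/sports"],
    "/news": ["/news/a", "/news/b"],
    "/sports": ["/news/a", "/sports/c"],
}
print(group_by_hyperlink_depth(links, "/"))
print(group_siblings(links))
```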

Challenges for CDN
Over-provisioning for replication
– Provide good QoS to clients (e.g., latency bound, coherence)
– Small # of replicas with small delay and bandwidth consumption for updates
Replica management
– Scalability: billions of replicas if replicating per URL
» O(10⁴) URLs/server, O(10⁵) CDN edge servers in O(10³) networks
– Adaptation to the dynamics of content providers and customers
Monitoring
– User workload monitoring
– End-to-end network distance/congestion/failure monitoring
» Measurement scalability
» Inference accuracy and stability

SCAN Architecture
Leverage Decentralized Object Location and Routing (DOLR), Tapestry, for
– Distributed, scalable location with guaranteed success
– Search with locality
Soft-state maintenance of the dissemination tree (for each object)
[Figure: data plane (data source, Web server, SCAN servers, replicas, caches, clients; "always update" vs. "adaptive coherence") above the network plane (Tapestry mesh); components: request location, dynamic replication/update and content management]

Wide-area Network Measurement and Monitoring System (WNMMS)
– Select a subset of SCAN servers to be monitors
– E2E estimation of distance, congestion and failures
[Figure: network plane with client clusters A, B and C, monitors and SCAN edge servers; distance is measured from each host to its monitor and among monitors]

Dynamic Provisioning
Dynamic replica placement
– Meets clients’ latency and servers’ capacity constraints
– Close-to-minimal # of replicas
Self-organize replicas into an app-level multicast tree
– Small delay and bandwidth consumption for update multicast
– Each node only maintains state for its parent and direct children
Evaluated by simulation with
– Synthetic traces, with various sensitivity analyses
– Real traces from NASA and MSNBC
Publication
– IPTPS 2002
– Pervasive Computing 2002
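The key scalability property here is that each replica keeps state only for its parent and direct children. A minimal sketch of update dissemination under that constraint follows. It models only forwarding along the tree plus a version check to drop stale or duplicate updates; the placement algorithm, soft-state refresh and Tapestry routing are not shown, and `TreeNode` is a hypothetical name.

```python
class TreeNode:
    """Replica in an app-level dissemination tree; keeps only parent and children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.version = 0  # latest update version applied at this replica
        if parent is not None:
            parent.children.append(self)

    def deliver(self, version, log):
        """Apply an update and forward it to direct children only."""
        if version <= self.version:
            return  # stale or duplicate update
        self.version = version
        log.append(self.name)
        for child in self.children:
            child.deliver(version, log)

root = TreeNode("source")
a = TreeNode("a", root)
b = TreeNode("b", root)
c = TreeNode("c", a)
log = []
root.deliver(1, log)
print(log)
```

No node needs the full membership list, which avoids the root bottleneck of application-level multicast schemes in which the root tracks every member.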

Effects of the Non-Uniform Size of URLs
– Replication cost constraint: bytes
– Similar trends exist
» Per-URL replication dramatically outperforms per-Website replication
» Spatial clustering with Euclidean distance and popularity-based clustering are very cost-effective

Access/Deployment Mechanisms
[Figure: design space for content delivery, placing existing CDNs and SCAN]
– Provisioning (replica placement): pull (existing CDNs) vs. push (SCAN); cooperative vs. non-cooperative; granularity per Website, per cluster or per object
– Network monitoring: ad hoc pair-wise monitoring, O(M×N), vs. tomography-based monitoring, O(M+N)
– Coherence support: unicast, IP multicast, app-level multicast on a P2P DHT

Web Proxy Caching
[Figure: clients, local DNS servers and proxy cache servers in ISP 1 and ISP 2, plus the Web content server]
1. GET request
2. GET request if cache miss
3. Response
4. Response

Conventional CDN: Non-cooperative Pull
[Figure: clients, local DNS servers and local CDN servers in ISP 1 and ISP 2, a CDN name server and the Web content server]
1. GET request
2. Request for hostname resolution
3. Reply: local CDN server IP address
4. Local CDN server IP address
5. GET request
6. GET request if cache miss
7. Response
8. Response
Inefficient replication

SCAN: Cooperative Push
[Figure: clients, local DNS servers and local CDN servers in ISP 1 and ISP 2, a CDN name server and the Web content server]
0. Push replicas
1. GET request
2. Request for hostname resolution
3. Reply: nearby replica server or Web server IP address
4. Redirected server IP address
5. GET request (to the Web content server if no replica exists yet)
6. Response
Significantly reduces the # of replicas and the update cost

Internet Content Delivery Systems
Properties: Web caching (client initiated) | Web caching (server initiated) | Pull-based CDNs (Akamai) | Push-based CDNs | SCAN
– Scalability for request redirection: Pre-configured in browser | Use Bloom filter to exchange replica locations | Centralized CDN name server | Decentralized P2P location
– Efficiency (# of caches or replicas): No cache sharing among proxies | Cache sharing | No replica sharing among edge servers | Replica sharing
– Network-awareness: No | Yes, unscalable monitoring system | No | Yes, scalable monitoring system
– Coherence support: No | Yes | No | Yes
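Server-initiated Web caching is listed as using a Bloom filter to exchange replica locations: each cache advertises a compact bit-array summary of what it holds instead of the full URL list. A minimal Bloom filter sketch follows. It derives hash positions from SHA-256 and uses arbitrary sizes; these choices are illustrative, not those of any particular caching system, and the URLs are hypothetical.

```python
import hashlib

class BloomFilter:
    """Compact set summary: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# A cache summarizes the (hypothetical) URLs it holds and shares the bit array.
summary = BloomFilter()
for url in ["http://example.com/a", "http://example.com/b"]:
    summary.add(url)
print(summary.might_contain("http://example.com/a"))
```

A peer that gets a positive answer forwards the request to that cache; a false positive costs one wasted lookup, while a negative answer is always correct.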

Previous Work: Update Dissemination
– No inter-domain IP multicast
– Application-level multicast (ALM) is unscalable
» The root maintains state for all children (Narada, Overcast, ALMI, RMX)
» The root handles all "join" requests (Bayeux)
» Splitting the root is the common solution, but it suffers consistency overhead

Comparison of Content Delivery Systems (cont’d)
Properties: Web caching (client initiated) | Web caching (server initiated) | Pull-based CDNs (Akamai) | Push-based CDNs | SCAN
– Distributed load balancing: No | Yes | No | Yes
– Dynamic replica placement: Yes | No | Yes
– Network-awareness: No | Yes, unscalable monitoring system | No | Yes, scalable monitoring system
– No global network topology assumption: Yes | No | Yes
