Presentation is loading. Please wait.

Presentation is loading. Please wait.

Towards a Scalable, Adaptive and Network-aware Content Distribution Network Yan Chen EECS Department UC Berkeley.

Similar presentations


Presentation on theme: "Towards a Scalable, Adaptive and Network-aware Content Distribution Network Yan Chen EECS Department UC Berkeley."— Presentation transcript:

1 Towards a Scalable, Adaptive and Network-aware Content Distribution Network Yan Chen EECS Department UC Berkeley

2 Outline Motivation and Challenges Our Contributions: SCAN system Case Study: Tomography-based overlay network monitoring system Conclusions

3 Motivation The Internet has evolved to become a commercial infrastructure for service delivery –Web delivery, VoIP, streaming media … Challenges for Internet-scale services –Scalability: 600M users, 35M Web sites, 2.1Tb/s –Efficiency: bandwidth, storage, management –Agility: dynamic clients/network/servers –Security, etc. Focus on content delivery - Content Distribution Network (CDN) –Totally 4 Billion Web pages, daily growth of 7M pages –Annual traffic growth of 200% for next 4 years

4 How CDN Works

5 Challenges for CDN Replica Location –Find nearby replicas with good DoS attack resilience Replica Deployment –Dynamics, efficiency –Client QoS and server capacity constraints Replica Management –Replica index state maintenance scalability Adaptation to Network Congestion/Failures –Overlay monitoring scalability and accuracy

6 Provision: Dynamic Replication + Update Multicast Tree Building Replica Management: (Incremental) Content Clustering Network End-to-End Distance Monitoring Internet Iso-bar: latency TOM: loss rate Network DoS Resilient Replica Location: Tapestry SCAN: Scalable Content Access Network

7 Replica Location Existing Work and Problems –Centralized, Replicated and Distributed Directory Services –No security benchmarking, which one has the best DoS attack resilience? Solution –Proposed the first simulation-based network DoS resilience benchmark –Applied it to compare three directory services –DHT-based Distributed Directory Services has best resilience in practice Publication –3 rd Int. Conf. on Info. and Comm. Security (ICICS), 2001

8 Replica Placement/Maintenance Existing Work and Problems –Static placement –Dynamic but inefficient placement –No coherence support Solution –Dynamically place close to optimal # of replicas with clients QoS (latency) and servers capacity constraints –Self-organize replica into a scalable application-level multicast for disseminating updates –With overlay network topology only Publication –IPTPS 2002, Pervasive Computing 2002

9 Existing Work and Problems –Cooperative access for good efficiency requires maintaining replica indices –Per Website replication, scalable, but poor performance –Per URL replication, good performance, but unscalable Solution –Clustering-based replication reduces the overhead significantly without sacrificing much performance –Proposed a unique online Web object popularity prediction scheme based on hyperlink structures –Online incremental clustering and replication to push replicas before accessed Publication –ICNP 2002, IEEE J-SAC 2003 Replica Management

10 Adaptation to Network Congestion/Failures Existing Work and Problems –Latency estimation »Clustering-based: network proximity based, inaccurate »Coordinate-based: symmetric distance, unscalable to update –General metrics: n 2 measurement for n end hosts Solution –Latency: Internet Iso-bar - clustering based on latency similarity to a small number of landmarks –Loss rate: Tomography-based Overlay Monitoring (TOM) - selectively monitor a basis set of O(n logn) paths to infer the loss rates of other paths Publication –Internet Iso-bar: SIGMETRICS PER 2002 –TOM: SIGCOMM IMC 2003

11 SCAN Architecture Leverage Distributed Hash Table - Tapestry for –Distributed, scalable location with guaranteed success –Search with locality data plane network plane data source Web server SCAN server client replica always update cache Tapestry mesh Replica Location Dynamic Replication/Update and Replica Management adaptive coherence Overlay Network Monitoring

12 Methodology Network topology Web workload Network end-to-end latency measurement Analytical evaluation Algorithm design Realistic simulation iterate PlanetLab tests

13 Case Study: Tomography-based Overlay Network Monitoring

14 TOM Outline Goal and Problem Formulation Algebraic Modeling and Basic Algorithms Scalability Analysis Practical Issues Evaluation Application: Adaptive Overlay Streaming Media Conclusions

15 Existing Work General Metrics: RON (n 2 measurement) Latency Estimation –Clustering-based: IDMaps, Internet Isobar, etc. –Coordinate-based: GNP, ICS, Virtual Landmarks Network tomography –Focusing on inferring the characteristics of physical links rather than E2E paths –Limited measurements -> under-constrained system, unidentifiable links Goal: a scalable, adaptive and accurate overlay monitoring system to detect e2e congestion/failures

16 Problem Formulation Given an overlay of n end hosts and O(n 2 ) paths, how to select a minimal subset of paths to monitor so that the loss rates/latency of all other paths can be inferred. Assumptions: Topology measurable Can only measure the E2E path, not the link

17 Our Approach Select a basis set of k paths that fully describe O(n 2 ) paths (k «O(n 2 )) Monitor the loss rates of k paths, and infer the loss rates of all other paths Applicable for any additive metrics, like latency End hosts Overlay Network Operation Center topology measurements

18 Algebraic Model Path loss rate p, link loss rate l A D C B 1 2 3 p1p1

19 Putting All Paths Together Totally r = O(n 2 ) paths, s links, s «r A D C B 1 2 3 p1p1 … =

20 Sample Path Matrix x 1 - x 2 unknown => cannot compute x 1, x 2 Set of vectors form null space To separate identifiable vs. unidentifiable components: x = x G + x N A D C B 1 2 3 b1b1 b2b2 b3b3 (1,-1,0) x2x2 x1x1 x3x3 (1,1,0) path/row space (measured) null space (unmeasured)

21 Intuition through Topology Virtualization Virtual links: Minimal path segments whose loss rates uniquely identified Can fully describe all paths x G is composed of virtual links A D C B 1 2 3 b1b1 b2b2 b3b3 (1,-1,0) x2x2 x1x1 x3x3 (1,1,0) path/row space (measured) null space (unmeasured) 1 2 Virtualization Virtual links All E2E paths are in path space, i.e., Gx N = 0

22 More Examples Real links (solid) and all of the overlay paths (dotted) traversing them Virtualization Virtual links 1 23 1’2’ Rank(G)=2 1 2 1 2 3 1’ 2’ 4 Rank(G)=3 3’ 4’ 1 2 3

23 Basic Algorithms Select k = rank(G) linearly independent paths to monitor –Use QR decomposition –Leverage sparse matrix: time O(rk 2 ) and memory O(k 2 ) »E.g., 79 sec for n = 300 (r = 44850) and k = 2541 Compute the loss rates of other paths –Time O(k 2 ) and memory O(k 2 ) »E.g., 1.89 sec for the example above … = … =

24 Scalability Analysis k « O(n 2 ) ? For a power-law Internet topology When the majority of end hosts are on the overlay When a small portion of end hosts are on overlay –If Internet a pure hierarchical structure (tree): k = O(n) –If Internet no hierarchy at all (worst case, clique): k = O(n 2 ) –Internet has moderate hierarchical structure [TGJ+02] k = O(n) (with proof) For reasonably large n, (e.g., 100), k = O(nlogn) (extensive linear regression tests on both synthetic and real topologies)

25 TOM Outline Goal and Problem Formulation Algebraic Modeling and Basic Algorithms Scalability Analysis Practical Issues Evaluation Application: Adaptive Overlay Streaming Media Summary

26 Practical Issues Topology measurement errors tolerance –Router aliases –Incomplete routing info Measurement load balancing –Randomly order the paths for scan and selection of Adaptive to topology changes –Designed efficient algorithms for incrementally update –Add/remove a path: O(k 2 ) time (O(n 2 k 2 ) for reinitialize) –Add/remove end hosts and Routing changes

27 Path loss rate estimation accuracy –Absolute error |p – p’ | –Error factor [BDPT02] –Lossy path inference: coverage and false positive ratio Measurement load balancing –Coefficient of variation (CV) –Maximum vs. mean ratio (MMR) Speed of setup, update and adaptation Evaluation Metrics

28 Areas and Domains # of hosts US (40).edu33.org3.net2.gov1.us1 Interna- tional (11) Europe (6) France1 Sweden1 Denmark1 Germany1 UK2 Asia (2) Taiwan1 Hong Kong1 Canada2 Australia1 Evaluation Extensive Simulations Experiments on PlanetLab –51 hosts, each from different organizations –51 × 50 = 2,550 paths –On average k = 872 Results on Accuracy –Avg real loss rate: 0.023 –Absolute error mean: 0.0027 90% < 0.014 –Error factormean: 1.1 90% < 2.0 –On average 248 out of 2550 paths have no or incomplete routing information –No router aliases resolved

29 Evaluation (cont’d) Results on Speed –Path selection (setup): 0.75 sec –Path loss rate calculation: 0.16 sec for all 2550 paths Results on Load Balancing –Significantly reduce CV and MMR, up to a factor of 7.3 With load balancing Without load balancing

30 TOM Outline Goal and Problem Formulation Algebraic Modeling and Basic Algorithms Scalability Analysis Practical Issues Evaluation Application: Adaptive Overlay Streaming Media Conclusions

31 Motivation Traditional streaming media systems treat the network as a black box Adaptation only performed at the transmission end points Overlay relay can effectively bypass congestion/failures Built an adaptive streaming media system that leverages –TOM for real-time path info –An overlay network for adaptive packet buffering and relay

32 X UC Berkeley UC San Diego Stanford HP Labs Adaptive Overlay Streaming Media Implemented with Winamp client and SHOUTcast server Congestion introduced with a Packet Shaper Skip-free playback: server buffering and rewinding Total adaptation time < 4 seconds

33 Adaptive Streaming Media Architecture

34 Summary A tomography-based overlay network monitoring system –Selectively monitor a basis set of O(n logn) paths to infer the loss rates of O(n 2 ) paths –Works in real-time, adaptive to topology changes, has good load balancing and tolerates topology errors Both simulation and real Internet experiments promising Built adaptive overlay streaming media system on top of TOM –Bypass congestion/failures for smooth playback within seconds

35 Tie Back to SCAN Provision: Dynamic Replication + Update Multicast Tree Building Replica Management: (Incremental) Content Clustering Network End-to-End Distance Monitoring Internet Iso-bar: latency TOM: loss rate Network DoS Resilient Replica Location: Tapestry

36 Contribution of My Thesis Replica location –Proposed the first simulation-based network DoS resilience benchmark and quantify three types of directory services Dynamically place close to optimal # of replicas –Self-organize replicas into a scalable app-level multicast tree for disseminating updates Cluster objects to significantly reduce the management overhead with little performance sacrifice –Online incremental clustering and replication to adapt to users’ access pattern changes Scalable overlay network monitoring

37 Thank you !

38 Backup Materials

39 Existing CDNs Fail to Address these Challenges Non-cooperative replication inefficient No coherence for dynamic content Unscalable network monitoring - O(M × N) M: # of client groups, N: # of server farms X

40 Network Topology and Web Workload Network Topology –Pure-random, Waxman & transit-stub synthetic topology –An AS-level topology from 7 widely-dispersed BGP peers Web Workload Web Site PeriodDuration# Requests avg –min-max # Clients avg –min-max # Client groups avg –min-max MSNBCAug-Oct/199910–11am1.5M–642K–1.7M129K–69K–150K15.6K-10K-17K NASAJul-Aug/1995All day79K-61K-101K5940-4781-76712378-1784-3011 –Aggregate MSNBC Web clients with BGP prefix »BGP tables from a BBNPlanet router –Aggregate NASA Web clients with domain names –Map the client groups onto the topology

41 Network E2E Latency Measurement NLANR Active Measurement Project data set –111 sites on America, Asia, Australia and Europe –Round-trip time (RTT) between every pair of hosts every minute –17M daily measurement –Raw data: Jun. – Dec. 2001, Nov. 2002 Keynote measurement data –Measure TCP performance from about 100 worldwide agents –Heterogeneous core network: various ISPs –Heterogeneous access network: »Dial up 56K, DSL and high-bandwidth business connections –Targets »40 most popular Web servers + 27 Internet Data Centers –Raw data: Nov. – Dec. 2001, Mar. – May 2002

42 Properties Web caching (client initiated) Web caching (server initiated) Conventional CDNs (Akamai) SCAN Replica access Non- cooperative Cooperative (bloomfilter) Non- cooperative Cooperative Load balancing No Yes Pull/push PullPushPullPush Transparent to clients No Yes Coherence support No Yes Network- awareness No Yes, unscalable monitoring system Yes, scalable monitoring system Internet Content Delivery Systems

43 Absolute and Relative Errors For each experiment, get its 95 percentile absolute and relative errors for estimation of 2,550 paths

44 Lossy Path Inference Accuracy 90 out of 100 runs have coverage over 85% and false positive less than 10% Many caused by the 5% threshold boundary effects

45 Loss rate distribution Metrics –Absolute error |p – p’ |: »Average 0.0027 for all paths, 0.0058 for lossy paths –Relative error [BDPT02] –Lossy path inference: coverage and false positive ratio On average k = 872 out of 2550 loss rate [0, 0.05) lossy path [0.05, 1.0] (4.1%) [0.05, 0.1)[0.1, 0.3)[0.3, 0.5)[0.5, 1.0)1.0 %95.9%15.2%31.0%23.9%4.3%25.6% PlanetLab Experiment Results

46 Areas and Domains # of hosts US (40).edu33.org3.net2.gov1.us1 Interna- tional (11) Europe (6) France1 Sweden1 Denmark1 Germany1 UK2 Asia (2) Taiwan1 Hong Kong1 Canada2 Australia1 Experiments on Planet Lab 51 hosts, each from different organizations –51 × 50 = 2,550 paths Simultaneous loss rate measurement –300 trials, 300 msec each –In each trial, send a 40-byte UDP pkt to every other host Simultaneous topology measurement –Traceroute Experiments: 6/24 – 6/27 –100 experiments in peak hours

47 Motivation With single node relay Loss rate improvement –Among 10,980 lossy paths: –5,705 paths (52.0%) have loss rate reduced by 0.05 or more –3,084 paths (28.1%) change from lossy to non-lossy Throughput improvement –Estimated with –60,320 paths (24%) with non-zero loss rate, throughput computable –Among them, 32,939 (54.6%) paths have throughput improved, 13,734 (22.8%) paths have throughput doubled or more Implications: use overlay path to bypass congestion or failures

48 SCAN Coherence for dynamic content Cooperative clustering-based replication X Scalable network monitoring O(M+N) s1, s4, s5

49 Problem Formulation Subject to certain total replication cost (e.g., # of URL replicas) Find a scalable, adaptive replication strategy to reduce avg access cost

50 CDN Applications (e.g. streaming media) SCAN: Scalable Content Access Network Provision: Cooperative Clustering-based Replication User Behavior/ Workload Monitoring Coherence: Update Multicast Tree Construction Network Performance Monitoring Network Distance/ Congestion/ Failure Estimation red: my work, black: out of scope

51 Evaluation of Internet-scale System Analytical evaluation Realistic simulation –Network topology –Web workload –Network end-to-end latency measurement Network topology –Pure-random, Waxman & transit-stub synthetic topology –A real AS-level topology from 7 widely-dispersed BGP peers

52 Web Workload Web Site PeriodDuration# Requests avg –min-max # Clients avg –min-max # Client groups avg –min-max MSNBCAug-Oct/199910–11am1.5M–642K–1.7M129K–69K–150K15.6K-10K-17K NASAJul-Aug/1995All day79K-61K-101K5940-4781-76712378-1784-3011 World Cup May-Jul/1998All day29M – 1M – 73M103K–13K–218KN/A Aggregate MSNBC Web clients with BGP prefix –BGP tables from a BBNPlanet router Aggregate NASA Web clients with domain names Map the client groups onto the topology

53 Simulation Methodology Network Topology –Pure-random, Waxman & transit-stub synthetic topology –An AS-level topology from 7 widely-dispersed BGP peers Web Workload Web Site PeriodDuration# Requests avg –min-max # Clients avg –min-max # Client groups avg –min-max MSNBCAug-Oct/199910–11am1.5M–642K–1.7M129K–69K–150K15.6K-10K-17K NASAJul-Aug/1995All day79K-61K-101K5940-4781-76712378-1784-3011 –Aggregate MSNBC Web clients with BGP prefix »BGP tables from a BBNPlanet router –Aggregate NASA Web clients with domain names –Map the client groups onto the topology

54 Online Incremental Clustering Predict access patterns based on semantics Simplify to popularity prediction Groups of URLs with similar popularity? Use hyperlink structures! –Groups of siblings –Groups of the same hyperlink depth: smallest # of links from root

55 Challenges for CDN Over-provisioning for replication –Provide good QoS to clients (e.g., latency bound, coherence) –Small # of replicas with small delay and bandwidth consumption for update Replica Management –Scalability: billions of replicas if replicating in URL »O(10 4 ) URLs/server, O(10 5 ) CDN edge servers in O(10 3 ) networks –Adaptation to dynamics of content providers and customers Monitoring –User workload monitoring –End-to-end network distance/congestion/failures monitoring »Measurement scalability »Inference accuracy and stability

56 SCAN Architecture Leverage Decentralized Object Location and Routing (DOLR) - Tapestry for –Distributed, scalable location with guaranteed success –Search with locality Soft state maintenance of dissemination tree (for each object) data plane network plane data source Web server SCAN server client replica always update adaptive coherence cache Tapestry mesh Request Location Dynamic Replication/Update and Content Management

57 Cluster A Clients Cluster B Monitors Cluster C Distance measured from a host to its monitor Distance measured among monitors SCAN edge servers Wide-area Network Measurement and Monitoring System (WNMMS) Select a subset of SCAN servers to be monitors E2E estimation for Distance Congestion Failures network plane

58 Dynamic Provisioning Dynamic replica placement –Meeting clients’ latency and servers’ capacity constraints –Close-to-minimal # of replicas Self-organized replicas into app-level multicast tree –Small delay and bandwidth consumption for update multicast –Each node only maintains states for its parent & direct children Evaluated based on simulation of –Synthetic traces with various sensitivity analysis –Real traces from NASA and MSNBC Publication –IPTPS 2002 –Pervasive Computing 2002

59 Effects of the Non-Uniform Size of URLs Replication cost constraint : bytes Similar trends exist –Per URL replication outperforms per Website dramatically –Spatial clustering with Euclidean distance and popularity- based clustering are very cost-effective 1 2 3 4

60 Provisioning (replica placement) Network Monitoring Coherence Support Ad hoc pair-wise monitoring O(M×N) Tomography -based monitoring O(M+N) IP multicast App-level multicast on P2P DHT Unicast SCAN: Scalable Content Access Network Granularity SCANPush Existing CDNsPull CooperativeNon-cooperative Per object Per Website Per cluster Access/Deployment Mechanisms

61 Clien t Local DNS serverProxy cache server Web content server Client Local DNS server Proxy cache server 1.GET request 4. Response 2.GET request if cache miss 3. Response ISP 2 ISP 1 Web Proxy Caching

62 CDN name server Client 1 Local DNS serverLocal CDN server 1. GET request 4. local CDN server IP address Web content server Client 2 Local DNS server Local CDN server 2. Request for hostname resolution 3. Reply: local CDN server IP address 5.GET request 8. Response 6.GET request if cache miss ISP 2 ISP 1 Conventional CDN: Non-cooperative Pull 7. Response Inefficient replication

63 CDN name server Client 1 Local DNS serverLocal CDN server 1. GET request 4. Redirected server IP address Web content server Client 2 Local DNS server Local CDN server 2. Request for hostname resolution 3. Reply: nearby replica server or Web server IP address ISP 2 ISP 1 5. GET request 6. Response 5.GET request if no replica yet SCAN: Cooperative Push 0. Push replicas Significantly reduce the # of replicas and update cost

64 Internet Content Delivery Systems Scalability for request redirection Pre- configured in browser Use Bloom filter to exchange replica locations Centralized CDN name server Decentra- lized P2P location Properties Web caching (client initiated) Web caching (server initiated) Pull-based CDNs (Akamai) Push- based CDNs SCAN Efficiency (# of caches or replicas) No cache sharing among proxies Cache sharing No replica sharing among edge servers Replica sharing Network- awareness No Yes, unscalable monitoring system NoYes, scalable monitoring system Coherence support No YesNoYes

65 Previous Work: Update Dissemination No inter-domain IP multicast Application-level multicast (ALM) unscalable –Root maintains states for all children (Narada, Overcast, ALMI, RMX) –Root handles all “join” requests (Bayeux) –Root split is common solution, but suffers consistency overhead

66 Comparison of Content Delivery Systems (cont’d) Properties Web caching (client initiated) Web caching (server initiated) Pull-based CDNs (Akamai) Push- based CDNs SCAN Distributed load balancing NoYes NoYes Dynamic replica placement Yes NoYes Network- awareness No Yes, unscalable monitoring system NoYes, scalable monitoring system No global network topology assumption Yes NoYes


Download ppt "Towards a Scalable, Adaptive and Network-aware Content Distribution Network Yan Chen EECS Department UC Berkeley."

Similar presentations


Ads by Google