CDNs Content Outsourcing via Generalized Communities Dimitrios Katsaros, Ph.D. Heraklion, March 20th, Dept. of Computer & Communication Engineering,

Presentation transcript:

CDNs Content Outsourcing via Generalized Communities Dimitrios Katsaros, Ph.D. Heraklion, March 20th, Dept. of Computer & Communication Engineering, University of Dept. of Informatics, Aristotle University

2 Outline of the talk
A summary of my research
Latest results: “CDNs Content Outsourcing via Generalized Communities” (IEEE Transactions on Knowledge & Data Engineering)
PRIMITIVE: Community Identification
METHOD: Content Outsourcing for CDNs
GOAL: Access Latency Reduction & Robustness

3 My Research Areas (chronological info)
WIRELESS NETWORKS
Mobile & Pervasive Computing. Data Management: Caching (’04), Air-Indexing (’07). Data Dissemination: Broadcast Scheduling (’04). Prediction: Mobility Prediction (’03+’08), Prefetching (’03).
Mobile Ad Hoc Networks: Content-based Multimedia Retrieval (’05+’08), Broadcasting (’06+’08).
Wireless Sensor Networks: Sensor Network Clustering (’07), (Distr+Local) Data Indexing (’06+’08), Cooperative Caching (’07+’08), Data Dissemination (’08).
WIRED NETWORKS
Conventional and Streaming Media Distribution in the Web: Replication (’03), Prefetching (’01+’02+’03), Caching (’04).
Overlay and P2P Networks: Content Distribution Networks (’05+’06), Content Placement in CDNs (’07+’08), Indexing & Query Routing in P2P (in progress), Distributed Structures over P2P (in progress).
Web Information Retrieval and Data Mining: Web Link Mining (’05), Web Ranking (’07+’08), Rank Aggregation (’07+’08), Social Network Analysis (’07+’08), Bibliometrics (’06+’07+’08).

4 Research areas: Ultimately → ??? Overlay Nets Mobile/Pervasive Computing Sensors Ad Hoc Information Retrieval Web Location Tracking Caching & Air-Indexing Peer-to-Peer Networks Content Distribution Networks Caching & Prefetching & Replication & Semistructured Data & Web views Web Ranking & Search Engines Social Network Analysis Cooperative Caching & Sensor Node Clustering & Distributed Indexing & Coverage/Connectivity & Flash storage & Content-Based MIR Broadcasting & Data Dissemination Webcasting INTELLIGENCE Pervasive Web

5 Content Outsourcing The problem: flash crowds The solution: CDNs Reactive vs proactive solutions Community identification The CiBC algorithm Evaluation

6 A problem… Feb 3, 2004: Google linked banner to “julia fractals” Users clicking directed to Australian University web site …University’s network link overloaded, web server taken down temporarily…

7 The problem strikes again! Feb 4, 2004: Slashdot ran the story about Google …Site taken down temporarily…again

8 The response from down under… later…Paul Bourke asks: “They have hundreds (thousands?) of servers worldwide that distribute their traffic load. If even a small percentage of that traffic is directed to a single server … what chance does it have?” → Help him ←

9 Existing approaches
Client-side proxying: Squid, Summary Cache, hierarchical cache, CoDeeN, Squirrel, Backslash, PROOFS, … Problem: not 100% coverage.
Throw money at the problem: load-balanced servers, fast network connections. Problem: can’t afford it or don’t anticipate the need.
Content Distribution Networks (CDNs): Akamai, Digital Island, Mirror Image, …

10 End User Origin Server Origin Server From Internet Mazes to …

11 Content distribution [Figure labels: Sydney, Seattle, San Jose, Denver, Tokyo, Los Angeles, Hong Kong, Dallas, Miami, Atlanta, New York, Chicago, Paris, Stockholm, Zurich, Amsterdam, Toronto, Boston, Washington D.C., London, Singapore, Frankfurt]

12 Content Distribution Networks (CDNs)

13 Types of CDNs
Uncooperative pull: Akamai
Cooperative pull: Coral
Uncooperative push: X
Cooperative push: first in IEEE JSAC’03, and what is described here today

14 Comparison of Outsourcing Policies
Columns: Replication redundancy, Commun. cost, Update cost, Temporal coherency
Uncooperative pull: High, Low
Cooperative pull: Low, High, Medium
Uncooperative push: High, Low, Medium
Cooperative push: Low, Medium, Low, High

15 Cooperative push
What to push? Frequently accessed content (IEEE JSAC’03). Hard to predict what will be popular! Popularity changes rapidly, too! Request statistics? That is a reactive approach. Can we devise a proactive solution?
Where to store the pushed content? Easy; there are a lot of replica placement algorithms.

16 Communities as “attractors”

17 Web-site communities DO exist: hollins.edu (Antonis Sidiropoulos et al., WWW Journal, 11(1), 2008)

18 “Hard” (max-flow) communities COMMUNITY: a subset of the nodes of a graph, with the property that: (for each node of the community) The number of links to other nodes belonging to the community is larger than the number of links to nodes NOT belonging to the community

19 “Hard”, but inefficient

20 Generalized communities … COMMUNITY: a subset of the nodes of a graph, with the property that (summing over the nodes of the community) the sum of all degrees within the community is larger than the sum of all degrees toward the rest of the graph
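The difference between the “hard” and the generalized definition can be made concrete with a small check. The sketch below is illustrative only: the adjacency-set representation, the function names, and the toy graph are assumptions, not taken from the talk.

```python
# Minimal sketch (assumed representation): undirected graph as dict node -> set of neighbours.

def inside_outside(graph, community, v):
    """Return (links from v into the community, links from v to the rest of the graph)."""
    inside = sum(1 for u in graph[v] if u in community)
    return inside, len(graph[v]) - inside

def is_hard_community(graph, community):
    """'Hard' definition: every member has more links inside than outside."""
    return all(i > o for i, o in (inside_outside(graph, community, v) for v in community))

def is_generalized_community(graph, community):
    """Generalized definition: the summed inside degree exceeds the summed outside degree."""
    pairs = [inside_outside(graph, community, v) for v in community]
    return sum(i for i, _ in pairs) > sum(o for _, o in pairs)

# Toy graph: a triangle a-b-c, where c also links to three outside pages d, e, f.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"},
    "c": {"a", "b", "d", "e", "f"},
    "d": {"c"}, "e": {"c"}, "f": {"c"},
}
triangle = {"a", "b", "c"}
print(is_hard_community(graph, triangle))         # c has 2 links inside but 3 outside
print(is_generalized_community(graph, triangle))  # 6 inside in total versus 3 outside
```

In this toy case the triangle fails the per-node test because of page c alone, yet it passes the aggregate test, which is the kind of relaxation the generalized definition allows.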

21 Social Network Analysis
A social network is a social structure used to describe social relations (Wikipedia).
The history of social network analysis goes back more than 100 years (Durkheim 1893, Cooley 1909), so it is older than everybody here. [book: Stanley Wasserman & Katherine Faust]
1. Mathematical Representation
2. Structural & Locational Properties: Centrality (Betweenness centrality)
3. Roles & Positions
4. Dyadic & Triadic Methods

22 Betweenness Centrality
σ_uw = σ_wu: number of shortest paths from u ∈ V to w ∈ V (σ_uu = 0)
σ_uw(v): number of shortest paths from u to w that some vertex v ∈ V lies on
Betweenness Centrality NI(v) of a vertex v is:
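The formula on this slide did not survive in the transcript. Assuming the slide uses the standard shortest-path betweenness definition with the symbols introduced above, it would read:

```latex
NI(v) \;=\; \sum_{\substack{u,\,w \in V \\ u \neq v \neq w}} \frac{\sigma_{uw}(v)}{\sigma_{uw}}
```

Whether each pair (u, w) is counted once or in both directions is a convention that the transcript does not settle; it only scales every value by a factor of two.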

23 Betweenness Centrality in sample graphs W R U P A C X Y T V Q B

24 Betweenness Centrality in sample graphs
[Figure: node labels with NI values in parentheses]
Numbered graph: 13 (0), 15 (0), 20 (0), 19 (0), 17 (1), 1 (0), 2 (0), 3 (68), 6 (0), 5 (0), 4 (96), 7 (156), 14 (233), 12 (0), 8 (26), 18 (97), 16 (131), 11 (0), 10 (0), 9 (0)
Lettered graph: W (3.33), R (9.33), U (54), P (41), A (6.67), C (0), X (0), Y (0), T (1.33), V (1.33), Q (8), B (13)
Nodes with large NI:
Articulation nodes (in bridges), e.g., 3, 4, 7, 16, 18
Nodes with large fanout, e.g., 14, 8, U

25 Betweenness centrality in … [ WEB ] Performing graph clustering and recognizing communities in Web site graphs

26 CiBC Method Target: is true CiBC method: Building “cliques” and clusters around representative (pole) nodes (with low CB)

27 CiBC Method [Table columns: ID, NI index] Phase 1: NI Computation, O(nm). Phase 2: Initialization of cliques, O(n).
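The O(nm) bound quoted for Phase 1 matches what a Brandes-style computation achieves on unweighted graphs. The sketch below assumes that approach (the slide does not name the algorithm), an undirected graph stored as a dict of neighbour lists, and that each unordered pair is counted once; all names are illustrative.

```python
from collections import deque

def betweenness(graph):
    """Brandes-style betweenness for an undirected, unweighted graph.

    graph: dict mapping each node to an iterable of its neighbours.
    Returns a dict node -> NI value, counting each unordered pair once (assumption).
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: c / 2 for v, c in bc.items()}

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(path))
```

On the three-node path, only the middle node lies between the other two, so it is the only node with a non-zero value.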


32 CiBC Method. Phase 3: Clique Merging & Creation of Communities. Complexity: O(l^2), where l is the number of cliques. [Figure: cliques A, B, C, D and their 4×4 matrix; recovered rows: A = 3 3 0 0, B = 3 3 1 1, C = 0 1 3 4]

33 CiBC Method. Phase 3: Clique Merging & Creation of Communities. [Figure: cliques A, B, C, D and their matrix]

34 CiBC Method. Phase 3: Clique Merging & Creation of Communities. [Figure: cliques A, B, C and their 3×3 matrix; recovered rows: A = 3 3 0, B = 3 3 2]

35 CiBC Method. Phase 3: Clique Merging & Creation of Communities. [Figure: cliques A, B, C and their matrix]

36 CiBC Method. Phase 3: Clique Merging & Creation of Communities. Phase 4: Check constraints. [Figure: cliques A, C and their 2×2 matrix; recovered row: A = 9 2]

37 Evaluation … Need for: Web site graphs; CDN topology; networking issues; request streams (roaming over the site graph). It is impossible to find real data for all of these, so simulators are used for each of them, to compensate for the lack of any of the above.

38 Simulators Web site graphs Simulating the growth process of the Web Request streams Random surfer (following links + teleportation) CDN CDNSim (
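A request stream of the “random surfer” kind can be sketched in a few lines. This is only an illustration of the idea of following links with occasional teleportation; the function name, the teleport probability of 0.15, and the graph format are assumptions, not parameters reported in the talk.

```python
import random

def random_surfer(graph, n_requests, teleport=0.15, seed=0):
    """Generate a page-request stream over a site graph.

    graph: dict mapping each page to a list of pages it links to.
    teleport: probability of jumping to a random page instead of following
              a link (hypothetical default; the talk gives no value).
    """
    rng = random.Random(seed)
    pages = sorted(graph)
    current = rng.choice(pages)
    stream = []
    for _ in range(n_requests):
        stream.append(current)
        links = graph[current]
        if not links or rng.random() < teleport:
            current = rng.choice(pages)   # teleportation (also used at dead ends)
        else:
            current = rng.choice(links)   # follow one outgoing hyperlink
    return stream

site = {"home": ["news", "about"], "news": ["home"], "about": []}
print(random_surfer(site, 10))
```

Fixing the seed keeps a simulation run repeatable, which helps when the same stream has to be replayed against several outsourcing policies.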

39 Competing methods Communities-based methods Clique Percolation Method (CPM) Correlation Clustering Communities identification method (C3i) Simple Web Caching (LRU) No CDN (only the origin server) Full Replication

40 Metrics
Mean Response Time (MRT): the expected time for a request to be satisfied.
Response time CDF: the Cumulative Distribution Function (CDF) denotes the probability of having response times lower than or equal to a given response time.
Replica Factor (RF): the number of replica objects stored across the whole CDN infrastructure, as a percentage of the total number of outsourced objects.
Byte Hit Ratio (BHR)
Independent parameters: a) surrogates’ cache size, b) graph assortativity.
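Three of these metrics are simple to compute from a request log. The sketch below assumes a log of response times and of (size, hit) pairs; the byte hit ratio is taken in its usual sense (bytes served by the surrogates over bytes requested), which the slide itself does not spell out.

```python
def mean_response_time(times):
    """MRT: the average of the observed response times."""
    return sum(times) / len(times)

def response_time_cdf(times, t):
    """Empirical CDF: the fraction of requests answered within time t."""
    return sum(1 for x in times if x <= t) / len(times)

def byte_hit_ratio(requests):
    """BHR under the usual definition (assumption): bytes served from a
    surrogate divided by all bytes requested.

    requests: list of (size_in_bytes, served_by_surrogate) pairs.
    """
    total_bytes = sum(size for size, _ in requests)
    hit_bytes = sum(size for size, was_hit in requests if was_hit)
    return hit_bytes / total_bytes

times = [1.0, 2.0, 3.0, 4.0]
print(mean_response_time(times))       # 2.5
print(response_time_cdf(times, 2.0))   # 0.5
print(byte_hit_ratio([(100, True), (300, False)]))  # 0.25
```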

41 Situations examined Regular traffic Network delay dominates the other components Flash crowd event TCP setup delay + network delay dominate the other components

42 Regular traffic: MRT vs. comm. strength

43 Regular traffic: BHR vs. comm. strength

44 Regular traffic: MRT vs. cache size

45 Surge of requests: CiBC

46 Surge of requests: CPM

47 Surge of requests: C3i

48 Surge of requests: LRU

49 Discussion CDNs: industrial interest for them Content outsourcing: significant issue Proactive content outsourcing Discovery of communities Placement to surrogate servers CiBC prevails

50 References
Our work: D. Katsaros, G. Pallis, K. Stamos, A. Sidiropoulos, A. Vakali, Y. Manolopoulos. “CDNs Content Outsourcing via Generalized Communities”. IEEE Transactions on Knowledge and Data Engineering,
State-of-the-art competing method [CPM community identification method]: G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. “Uncovering the overlapping community structure of complex networks in nature and society”. Nature, 435(7043):814–818, 2005.

Thanks to my collaborators at A.U.Th Thank you for your attention! Questions?

52 Generalized Web Page Community
A subgraph U(V_u, E_u) of a Web site graph G constitutes a generalized Web page community if the sum of all degrees within the community U is larger than the sum of all degrees toward the rest of the graph G.
The hard Web page community (Flake et al.)
A subgraph U(V_u, E_u) of a Web site graph G constitutes a hard Web page community if every node v of U has more links to nodes belonging to U than to nodes not belonging to U.
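The two conditions can be written side by side. The notation d^in(v, U) and d^out(v, U), for the number of links of v toward nodes inside and outside U respectively, is assumed here because the slide's own formulas did not survive in the transcript:

```latex
% Generalized community: one aggregate inequality over the whole subgraph
\sum_{v \in V_u} d^{\mathrm{in}}(v, U) \;>\; \sum_{v \in V_u} d^{\mathrm{out}}(v, U)

% Hard community: the inequality must hold for every node individually
d^{\mathrm{in}}(v, U) \;>\; d^{\mathrm{out}}(v, U) \qquad \forall\, v \in V_u
```

Summing the per-node inequality over all members gives the aggregate one, so every hard community is also a generalized community, but not the other way round.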

53 The CiBC Algorithm (1/4)
The Web site is represented by a Web graph G = (V, E), whose nodes are the Web pages and whose edges depict the hyperlinks among Web pages.
Input: the Web site graph.
Output: a set of Web page communities; these communities constitute the set of objects which are outsourced to the surrogate servers.
Phase I: Computation of Betweenness Centrality. The concept of Betweenness Centrality (BC) is used to select the pole nodes, i.e., the nodes with the lowest Betweenness Centrality.
Phase II: Nodes Accumulation around Pole Nodes. Nodes are accumulated around the identified pole nodes by making use of Web graph properties; a set of Web page communities is created.

54 The CiBC Algorithm (2/4) Betweenness Centrality (BC) reflects the amount of control exerted by a given Web page over the interactions between the other Web pages in the Web server content structure.

55 The CiBC Algorithm (3/4) Phase I: Computation of BC
Compute the Betweenness Centrality (BC) of the Web graph’s nodes.
Nodes with high BC reside at the center of the clusters; nodes with low BC reside at their borders.
Sort the nodes in ascending order of their BC values.

56 The CiBC Algorithm (4/4) Phase II: Accumulation around Pole Nodes
The pole nodes are taken in ascending order of BC value, starting with the one with the lowest value.
It is checked whether the selected pole node already belongs to a group. If not, it forms a new, distinct group together with the nodes that are directly connected to it.
The group is then expanded with the nodes traversed by the BFS algorithm; BFS expands the groups uniformly.
The resulting kernel groups are processed (merged/deleted) in order to create generalized Web page communities.
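The accumulation step can be sketched as a bounded BFS from each pole node. This is a simplified illustration rather than the authors' implementation: the `max_depth` stopping rule, the tie-breaking by node name, and the precomputed BC values in the example are all assumptions, and the later merge/delete processing of kernel groups is not shown.

```python
from collections import deque

def kernel_groups(graph, bc, max_depth=1):
    """Grow groups around pole nodes, visited in ascending BC order.

    graph: dict node -> list of neighbours; bc: dict node -> BC value.
    max_depth: how many BFS levels a pole may claim (hypothetical parameter).
    """
    group_of = {}
    groups = []
    for pole in sorted(graph, key=lambda v: (bc[v], str(v))):
        if pole in group_of:          # already accumulated by an earlier pole
            continue
        gid = len(groups)
        members = [pole]
        group_of[pole] = gid
        queue = deque([(pole, 0)])
        while queue:
            v, depth = queue.popleft()
            if depth == max_depth:
                continue
            for w in graph[v]:
                if w not in group_of:  # claim only pages not yet in a group
                    group_of[w] = gid
                    members.append(w)
                    queue.append((w, depth + 1))
        groups.append(members)
    return groups

# Two triangles joined by the bridge c-d; BC values assumed precomputed.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
bc = {"a": 0, "b": 0, "c": 6, "d": 6, "e": 0, "f": 0}
print(kernel_groups(graph, bc))
```

Because the low-BC border pages are processed first, each triangle is claimed as a whole before either bridge node gets a chance to act as a pole.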

57 Performance Evaluation: Examined Methods
Clique Percolation Method (CPM): The outsourced objects obtained by the CPM correspond to k-clique percolation clusters in the network. A k-clique percolation cluster is a sub-graph containing k-cliques (complete sub-graphs of size k) that can all reach each other through chains of k-clique adjacency, where two k-cliques are said to be adjacent if they share k - 1 nodes. Experiments have shown that this method is quite efficient when it is applied on large graphs.
Web caching scheme (LRU): The objects are stored reactively in proxy cache servers. We consider that each proxy cache server follows the LRU (Least Recently Used) replacement policy, since this is the typical policy of popular proxy servers (e.g., Squid).
No Replication (W/O CDN): All the objects are placed on the origin server and there are no CDN or proxy servers. This policy represents the “worst-case” scenario.
Full Replication (FR): All the objects have been outsourced to all the CDN’s surrogate servers. This (unrealistic) policy represents the “optimal-case” scenario.
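For reference, the reactive LRU baseline amounts to the following behaviour. This is a generic size-aware LRU sketch, not the simulator's code; the class name and the byte-based capacity are assumptions.

```python
from collections import OrderedDict

class LRUCache:
    """Size-aware Least Recently Used cache (illustrative sketch)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()   # object id -> size, oldest first

    def request(self, obj, size):
        """Return True on a hit. On a miss, store the object reactively,
        evicting least recently used objects until it fits."""
        if obj in self.items:
            self.items.move_to_end(obj)      # mark as most recently used
            return True
        if size <= self.capacity:
            while self.used + size > self.capacity:
                _, old_size = self.items.popitem(last=False)
                self.used -= old_size
            self.items[obj] = size
            self.used += size
        return False

cache = LRUCache(capacity_bytes=2)
print([cache.request(o, 1) for o in "abacbc"])
```

The first request for any object is always a miss, which is why a purely reactive scheme cannot help at the onset of a flash crowd.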

58 Performance evaluation parameters Simulation Testbed

59 Content Replication Problem
Lat-cdn: the outsourced objects are placed on surrogate servers with respect to the total network’s latency, without taking into account the objects’ popularity (LA-WEB 2005).
il2p: the outsourced objects are placed on surrogate servers by integrating both the network’s latency and the objects’ load (ICDE workshop 2006).

60 Content Replication Problem: Problem Formulation
The content replication problem is to select the placement x that minimizes a cost function defined over the following quantities:
D_ik(x) is the “distance” to a replica of object k from surrogate server i under the placement x; the distance reflects the latency (the elapsed time between when a user issues a request and when it receives the response).
N is the number of surrogate servers, K is the number of outsourced objects, λ_i is the request rate for surrogate server i, and p_k is the probability that a client will request the object k.
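The objective itself is missing from the transcript. A plausible reading, assuming the usual expected-latency form built from exactly the symbols defined on this slide, is:

```latex
\min_{x} \;\; \sum_{i=1}^{N} \lambda_i \sum_{k=1}^{K} p_k \, D_{ik}(x)
```

Under this reading, each surrogate server i contributes the latency of every object k, weighted by how often i receives requests (λ_i) and how likely a request is to target k (p_k).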

61 The Lat-cdn Algorithm: The Flowchart
Start: all the “outsourced objects” are stored in the origin server and all the CDN’s surrogate servers are empty.
Step 1: for each outsourced object, find the best surrogate server on which to place it (the one that produces the minimum network latency).
Step 2: from all the outsourced object – surrogate server pairs produced in the previous step, select the one that produces the largest network latency, and place this object on that surrogate server.
Check: have the surrogate servers become full? If no, repeat from Step 1; if yes, stop.
Result: the final placement of the outsourced objects on the CDN infrastructure.
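The loop in this flowchart can be sketched as a greedy procedure. The version below is a simplification under stated assumptions: latencies are given as a static table `latency[k][i]` (not re-estimated after each placement), capacities are in the same units as object sizes, and the loop stops when no object fits anywhere. The function and parameter names are illustrative.

```python
def lat_cdn(latency, size, capacity):
    """Greedy latency-driven placement (illustrative sketch).

    latency[k][i]: latency if object k is served from surrogate i (assumed static).
    size[k]: size of object k; capacity[i]: storage of surrogate i.
    Returns dict surrogate -> set of objects placed on it.
    """
    free = dict(capacity)
    placement = {i: set() for i in capacity}
    while True:
        best_pairs = []
        for k, row in latency.items():
            # Step 1: best (minimum-latency) surrogate that can still take k.
            options = [(lat, i) for i, lat in row.items()
                       if k not in placement[i] and size[k] <= free[i]]
            if options:
                lat, i = min(options)
                best_pairs.append((lat, k, i))
        if not best_pairs:            # nothing fits any more: surrogates are full
            return placement
        # Step 2: among those pairs, place the one with the largest latency.
        _, k, i = max(best_pairs)
        placement[i].add(k)
        free[i] -= size[k]

latency = {"x": {"s1": 5, "s2": 9}, "y": {"s1": 2, "s2": 3}}
print(lat_cdn(latency, size={"x": 1, "y": 1}, capacity={"s1": 2, "s2": 1}))
```

Object popularity never enters the selection, which is the property the il2p variant later changes.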

62 The il2p ( i ntegration of l atency and l oad object p lacement) Algorithm Main idea Considering that all the outsourced objects are initially placed on an origin server, the content replication problem is separated into two sub-problems: Choice of the best surrogate server to replicate an outsourced object (based on the network’s latency) Arrangement priorities for outsourced objects replication (based on the objects’ load)

63 The il2p Algorithm: Arrangement priorities for outsourced objects replication
From the objects assigned to a single server, we replicate the object k which has the maximum utility value:
Utility_Value_k = load_k * latency_k, where load_k = access_rate_k * s_k
latency_k is the latency that the object k produces if it is replicated to the surrogate server determined by the previous step, load_k is the total load due to object k, access_rate_k is the number of accesses of object k per unit time, and s_k is the size of object k.

64 The il2p Algorithm: The Flowchart
Start: all the “outsourced objects” are stored in the origin server and all the CDN’s surrogate servers are empty.
Step 1: for each outsourced object, find the best surrogate server on which to place it (the one that produces the minimum network latency).
Step 2: from all the outsourced object – surrogate server pairs produced in the previous step, select the one with the maximum utility value, and place this object on that surrogate server.
Check: have the surrogate servers become full? If no, repeat from Step 1; if yes, stop.
Result: the final placement of the outsourced objects on the CDN infrastructure.
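A compact sketch of this loop follows, using utility = access rate × size × latency as the selection key. As with any illustration of this kind, several things are assumed: a static latency table `latency[k][i]`, capacities in the same units as object sizes, termination when nothing fits, and hypothetical function and parameter names.

```python
def il2p(latency, size, access_rate, capacity):
    """Greedy placement by utility = load * latency (illustrative sketch).

    latency[k][i]: latency if object k is served from surrogate i (assumed static).
    size[k], access_rate[k]: per-object size and accesses per unit time.
    capacity[i]: storage of surrogate i.
    """
    free = dict(capacity)
    placement = {i: set() for i in capacity}
    while True:
        candidates = []
        for k, row in latency.items():
            # Sub-problem 1: best surrogate for k, judged by latency alone.
            options = [(lat, i) for i, lat in row.items()
                       if k not in placement[i] and size[k] <= free[i]]
            if options:
                lat, i = min(options)
                load = access_rate[k] * size[k]
                candidates.append((load * lat, k, i))
        if not candidates:            # surrogates are full
            return placement
        # Sub-problem 2: replicate the object with the maximum utility first.
        _, k, i = max(candidates)
        placement[i].add(k)
        free[i] -= size[k]

latency = {"x": {"s1": 5, "s2": 9}, "y": {"s1": 2, "s2": 3}}
result = il2p(latency, size={"x": 1, "y": 1},
              access_rate={"x": 1, "y": 10}, capacity={"s1": 2, "s2": 1})
print(result)
```

In the example, the heavily requested object y is replicated on both surrogates before the rarely requested x gets any space, even though x has the higher latency.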