Efficient and Robust Computation of Resource Clusters in the Internet


Chuang Liu, Ian Foster
University of Chicago, Argonne National Laboratory

2 What is the Problem?
• Locate a set of resources with particular network connections in the Internet.
• Q1: Find a set of R resources very close to each other:
  – The network latency between any pair of those resources is less than L milliseconds.
• Q2: Find a set of R resources far from each other:
  – The network latency between any pair of those resources is more than L milliseconds.

3 Challenges
• The problem is NP-hard.
• It requires a large number of latency measurements.
• Unstable networks and resources may cause individual measurements to fail.
• Latency data is noisy because network resources are shared among users.

4 Intuition of Our Method
• Clustering
  – Cluster: a set of resources with small latencies among each other and much larger latencies to resources outside the cluster.
  – Result of clustering: a partition of the resources in which each part is a cluster, called the cluster structure.
• Search based on the cluster structure
  – Q1: search for resources within one cluster.
  – Q2: search for resources from different clusters.

5 Contributions
• An effective clustering algorithm
  – It can find the cluster structure even when the latency measurements are incomplete.
  – The cluster structure is stable in a dynamic environment.
• An efficient search algorithm
  – It answers Q1 and Q2 using the cluster structure.
  – It achieves order-of-magnitude performance improvements.

6 Outline
• Problems
• > Clustering Algorithm
  – Markov Cluster Algorithm
  – Stability of the Cluster Structure
  – Robustness of Clustering Algorithm
• Search Algorithm
• Performance Evaluation
• Summary

7 Model of the Problem
• Represent the network connections among resources as a weighted graph:
  – each resource is a node;
  – there is an edge between every pair of nodes;
  – the weight of each edge is the reciprocal of the measured latency.
• In this graph, a cluster is a set of nodes connected by heavy-weight edges.
• How do we find the cluster structure in the graph?
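The graph model can be sketched in a few lines of Python. This is a minimal sketch with several assumptions. Latencies arrive as a dictionary keyed by resource pairs, in milliseconds. `None` marks a failed measurement. Latency is treated as symmetric. The function name `build_weight_graph` and the resource names are hypothetical. The slide puts an edge between every pair of nodes, but this sketch leaves out edges whose measurement is missing, so it can also run on incomplete data.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]


def build_weight_graph(
    latency_ms: Dict[Pair, Optional[float]],
) -> Dict[str, Dict[str, float]]:
    """Turn pairwise latencies into an undirected weighted graph.

    Each resource becomes a node. Each measured pair becomes an edge whose
    weight is 1 / latency, so low-latency links become heavy edges.
    Assumption: missing (None) or non-positive measurements produce no edge.
    """
    graph: Dict[str, Dict[str, float]] = {}
    for (a, b), lat in latency_ms.items():
        graph.setdefault(a, {})
        graph.setdefault(b, {})
        if lat is None or lat <= 0:
            continue  # skip failed or invalid measurements
        weight = 1.0 / lat
        graph[a][b] = weight
        graph[b][a] = weight  # assumption: latency is symmetric
    return graph


if __name__ == "__main__":
    # Hypothetical resources and latencies, for illustration only.
    lat = {("chicago", "argonne"): 2.0, ("chicago", "tokyo"): 150.0, ("argonne", "tokyo"): None}
    g = build_weight_graph(lat)
    print(g["chicago"]["argonne"])  # 0.5
```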

8 The Markov Cluster Algorithm
• Algorithm developed by S. van Dongen
  – A walker departs from a node.
  – In each step it moves to an adjacent node, choosing the outgoing edge at random with a bias toward heavy-weight edges.
  – After k steps, calculate the probability that a random walk starting at node i ends at node j. If the probability is large, put i and j in the same cluster.
• The walker tends to stay within the same cluster because it chooses high-weight edges with high probability.
• Granularity parameter G
  – A larger G makes the algorithm create smaller clusters, whose members have lower latency to each other.
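The random-walk description above can be turned into a small Markov Cluster (MCL) sketch. It follows the commonly described MCL loop, not the authors' code:

1. Expansion: square the transition matrix.
2. Inflation: raise each entry to a power, then renormalize each row.
3. Repeat until the matrix stops changing.

Several choices here are assumptions, not taken from the slides:

- The slide's granularity parameter G is treated as MCL's inflation exponent. A larger exponent yields finer clusters.
- Each self-loop is set to the node's heaviest edge.
- The pruning threshold is 1e-5 and the attractor cutoff is 0.01.

The pure-Python matrix product costs O(n³) per iteration. For the 400 PlanetLab nodes a vectorized version would be preferable.

```python
from typing import Dict, FrozenSet, List, Set

Matrix = List[List[float]]


def _normalize_rows(m: Matrix) -> Matrix:
    """Scale each row to sum to 1 (rows that sum to 0 are left unchanged)."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total for x in row] if total > 0 else row[:])
    return out


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(weights: Matrix, inflation: float = 2.0, max_iter: int = 100,
        tol: float = 1e-6, prune: float = 1e-5) -> List[Set[int]]:
    """MCL sketch on a symmetric weight matrix with zero diagonal.

    Assumption: `inflation` plays the role of the slide's granularity G.
    """
    n = len(weights)
    m = [row[:] for row in weights]
    for i in range(n):
        heaviest = max(weights[i])
        m[i][i] = heaviest if heaviest > 0 else 1.0  # self-loop damps oscillation
    p = _normalize_rows(m)  # p[i][j]: probability of stepping from i to j
    for _ in range(max_iter):
        expanded = _matmul(p, p)  # expansion: two random-walk steps
        # Inflation: boost large transitions, drop tiny ones, renormalize.
        inflated = [[x ** inflation if x > prune else 0.0 for x in row] for row in expanded]
        nxt = _normalize_rows(inflated)
        delta = max(abs(nxt[i][j] - p[i][j]) for i in range(n) for j in range(n))
        p = nxt
        if delta < tol:
            break
    # Nodes whose walks end on the same set of attractor nodes form one cluster.
    groups: Dict[FrozenSet[int], Set[int]] = {}
    for i in range(n):
        attractors = frozenset(j for j in range(n) if p[i][j] > 0.01)
        groups.setdefault(attractors, set()).add(i)
    return list(groups.values())


if __name__ == "__main__":
    # Two tight triangles joined by one weak edge (2-3).
    w = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0.01, 0, 0],
         [0, 0, 0.01, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
    print(sorted(sorted(c) for c in mcl(w)))
```

The weight matrix can be built directly from reciprocal latencies, as in the graph model on the previous slide.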

9 Resources on PlanetLab
• 400 computers
• 200 sites
• End-to-end pairwise latencies collected by Jeremy Stribling

10 Cluster Structure of Resources on PlanetLab
• Cluster structures at different granularities, from coarse to fine:
  – East America, West America, Central America, East Asia, Southern Europe, etc.
  – California, Texas, China, Korea, etc.
  – San Jose (HP, UCB, Stanford), Boston (BU, MIT), etc.

11 Outline
• Problems
• Clustering Algorithm
  – Markov Cluster Algorithm
  – > Stability of the Cluster Structure
  – Robustness of Clustering Algorithm
• Search Algorithm
• Performance Evaluation
• Summary

12 Stability of the Cluster Structure
• Latency among resources changes over time because of the dynamic nature of the Internet.
• Questions
  – Will the computed cluster structures change over time?
  – Will the difference become larger over time?

13 Stability of the Cluster Structure
• Latency measurements were collected at the beginning of each hour over a 5-day period.
  – 24 × 5 = 120 data sets in total
• Compute a cluster structure for each data set and calculate the differences between these structures.
• Difference metric: the D value
  – D is the proportion of nodes that must be exchanged to transform one of the two cluster structures into the other.
  – D ranges from 0 to 1.
  – A small D means the cluster structure is stable.
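The slide defines D only informally. One plausible reading, assumed here, is the partition-transfer distance. Cluster labels are matched one-to-one so that as many nodes as possible keep their cluster. Every node left outside the matching must be moved, and D is the number of moved nodes divided by the total number of nodes.

The function name `d_value` is hypothetical. The brute-force matching over permutations is fine for a handful of clusters. For many clusters, an assignment solver gives the same matching in polynomial time, for example SciPy's `linear_sum_assignment` with `maximize=True`.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Hashable


def d_value(a: Dict[Hashable, Hashable], b: Dict[Hashable, Hashable]) -> float:
    """Fraction of nodes that must move to turn partition `a` into `b`.

    `a` and `b` map each node to a cluster label; they must cover the same nodes.
    Assumption: D is the partition-transfer distance normalized by node count.
    """
    if set(a) != set(b):
        raise ValueError("both cluster structures must cover the same nodes")
    n = len(a)
    if n == 0:
        return 0.0
    # overlap[(x, y)] = number of nodes in cluster x of `a` and cluster y of `b`
    overlap = Counter((a[v], b[v]) for v in a)
    labels_a = sorted(set(a.values()), key=str)
    labels_b = sorted(set(b.values()), key=str)
    size = max(len(labels_a), len(labels_b))
    # Pad with None so every cluster can be matched to "nothing".
    labels_a += [None] * (size - len(labels_a))
    labels_b += [None] * (size - len(labels_b))
    # Keep the matching that leaves the most nodes where they are.
    best_kept = max(
        sum(overlap.get((x, y), 0) for x, y in zip(labels_a, perm))
        for perm in permutations(labels_b)
    )
    return (n - best_kept) / n


if __name__ == "__main__":
    before = {1: "x", 2: "x", 3: "y", 4: "y"}
    after = {1: "p", 2: "p", 3: "p", 4: "q"}
    print(d_value(before, after))  # 0.25: moving node 3 turns one into the other
```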

14 Histogram of D Values
• Difference between each cluster structure and the one computed from data collected 1 hour earlier.
• About 30% of cluster structures differ by less than 10% (D < 0.1) from the structure one hour earlier; more than 60% differ by between 10% and 15%.
• Will the computed cluster structures change over time?
  – Yes, but not much.

15 Histogram of D Values
• Compare each clustering result with those computed from data collected two and four hours earlier.
• D values do not grow as the time interval increases.
  – The distribution of D values is similar for intervals of 1, 2, and 4 hours.
• Will the difference become larger over time?
  – No, not within a few hours.

16 Conclusion
• The cluster structure of PlanetLab resources is relatively stable in the short term (a few hours).
• Therefore the cluster structure does not need to be rebuilt frequently.

17 Outline
• Problems
• Clustering Algorithm
  – Markov Cluster Algorithm
  – Stability of the Cluster Structure
  – > Robustness of Clustering Algorithm
• Search Algorithm
• Performance Evaluation
• Summary

18 Robustness of Clustering Algorithm
• Only a subset of all possible latency measurements is available.
  – On PlanetLab, from … to …, 25% to 30% of measurements were available most of the time. Occasionally, only 10–15% were available.

19 Robustness of Clustering Algorithm
• Question
  – Can the clustering algorithm find the right cluster structure from an incomplete set of measurements?

20 Experimental Result
• Compute cluster structures using 10% to 90% of all measurements.
• Use the D value to measure the difference from the structure computed from all measurements.
• The clustering algorithm remains effective when run on an incomplete set of measurements.
[Table: D value versus fraction of measurements used, from 90% down to 10%]
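The experiment drops a share of the measurements before clustering. The sketch below shows one way to do that, assuming measurements are dictionary entries keyed by resource pair and that dropped pairs simply disappear, as if they had failed. The function name `subsample_measurements` is hypothetical, and uniform random dropping is an assumption; real PlanetLab failures need not be uniform.

```python
import random
from typing import Dict, Tuple

Pair = Tuple[str, str]


def subsample_measurements(latency_ms: Dict[Pair, float], fraction: float,
                           seed: int = 0) -> Dict[Pair, float]:
    """Keep a uniformly random `fraction` of the pairwise measurements.

    Assumption: missing data is modeled as uniformly random failures.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed -> reproducible subsets
    pairs = sorted(latency_ms)  # deterministic order before sampling
    keep = rng.sample(pairs, round(fraction * len(pairs)))
    return {pair: latency_ms[pair] for pair in keep}
```

Running the experiment then takes three steps for each fraction from 0.9 down to 0.1. First, subsample the measurements. Second, cluster the partial data. Third, compare the result with the full-data structure using D.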

21 Outline
• Problems
• Clustering Algorithm
• > Search Algorithm
• Performance Evaluation
• Summary

22 Traditional Tree Search Algorithm
• Start with an empty set.
• Repeatedly pick, from the available resources, one resource that has the required connections to all current members of the set, and add it to the set.
• If no such resource exists, roll back the addition made in the previous step.
• Finish when the set contains all required resources.
• The problem is NP-hard.
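The steps above describe a backtracking search. A compact Python sketch follows, assuming the "required connection" is given as a pairwise predicate `ok(a, b)`:

- For Q1, `ok` is latency < L.
- For Q2, `ok` is latency > L.

The name `tree_search` is hypothetical. Candidates are tried in index order so each combination is visited at most once.

```python
from typing import Callable, List, Optional, Sequence


def tree_search(resources: Sequence[str], r: int,
                ok: Callable[[str, str], bool]) -> Optional[List[str]]:
    """Backtracking search for r resources whose every pair satisfies ok(a, b).

    Worst-case time is exponential in the number of resources.
    """
    chosen: List[str] = []

    def extend(start: int) -> bool:
        if len(chosen) == r:
            return True  # set is complete
        for idx in range(start, len(resources)):
            cand = resources[idx]
            # Add cand only if it is compatible with every current member.
            if all(ok(cand, member) for member in chosen):
                chosen.append(cand)
                if extend(idx + 1):
                    return True
                chosen.pop()  # roll back and try the next candidate
        return False

    return list(chosen) if extend(0) else None


if __name__ == "__main__":
    # Hypothetical latencies (ms); Q1 with L = 10 ms and R = 3.
    lat = {frozenset(p): v for p, v in [(("a", "b"), 5), (("a", "c"), 50), (("b", "c"), 6),
                                        (("a", "d"), 4), (("b", "d"), 3), (("c", "d"), 70)]}
    print(tree_search(["a", "b", "c", "d"], 3, lambda x, y: lat[frozenset((x, y))] < 10))
```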

23 Modified Tree Search Algorithm
• Q1: pick resources from the same cluster.
• Q2: pick resources from different clusters.
• This reduces the search space dramatically.
  – The search space is the set of possible combinations of resources.
[Table: search-space ratio at three granularity settings: 1.4E-4, 3.6E-6, 1E-6]
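The search-space reduction can be estimated by counting combinations. The sketch below rests on three assumptions:

- Q1 answers lie entirely inside one cluster.
- Q2 answers take at most one resource per cluster. This is one reading of "different clusters".
- Counting final combinations approximates the space the tree search explores. The count is not the number of nodes the tree search actually visits.

The function names are hypothetical.

```python
from itertools import combinations
from math import comb, prod
from typing import Sequence


def full_space(n: int, r: int) -> int:
    """Unrestricted search space: all r-subsets of n resources."""
    return comb(n, r)


def q1_space(cluster_sizes: Sequence[int], r: int) -> int:
    """Q1 restricted to single clusters: r-subsets drawn from one cluster."""
    return sum(comb(s, r) for s in cluster_sizes)


def q2_space(cluster_sizes: Sequence[int], r: int) -> int:
    """Q2 restricted to distinct clusters: pick r clusters, one resource each."""
    return sum(prod(group) for group in combinations(cluster_sizes, r))


if __name__ == "__main__":
    sizes = [4, 4, 4]  # hypothetical: 12 resources in 3 clusters
    n, r = sum(sizes), 3
    print(full_space(n, r), q1_space(sizes, r), q2_space(sizes, r))  # 220 12 64
    print(q1_space(sizes, r) / full_space(n, r))  # ratio for Q1
```

With more, smaller clusters (finer granularity), the Q1 count shrinks rapidly relative to C(N, R). That trend is consistent with the ratios in the table.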

24 Outline
• Problems
• Clustering Algorithm
• Search Algorithm
• > Performance Evaluation
• Summary

25 Benchmark Queries
• Q1 searches for R resources such that the latency between any two of them is smaller than L.
• Q2 searches for R resources such that the latency between any two of them is larger than L.
• We generate 1000 Q1 queries and 1000 Q2 queries by choosing values of R and L at random.
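A benchmark like this can be sketched as random (R, L) pairs plus a checker for candidate answers. The slides do not give the sampling ranges. The ranges `r_range=(2, 10)` and `l_range=(5.0, 200.0)` ms are hypothetical placeholders, as are the names `Query`, `make_queries` and `satisfies`. The checker assumes every needed pairwise latency is present.

```python
import random
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Query:
    kind: str     # "Q1": all pairs closer than L; "Q2": all pairs farther than L
    r: int        # number of resources requested
    l_ms: float   # latency bound in milliseconds


def make_queries(count: int, kind: str, r_range: Tuple[int, int] = (2, 10),
                 l_range: Tuple[float, float] = (5.0, 200.0), seed: int = 0) -> List[Query]:
    """Draw random (R, L) pairs; the default ranges are placeholders."""
    rng = random.Random(seed)
    return [Query(kind, rng.randint(*r_range), rng.uniform(*l_range)) for _ in range(count)]


def satisfies(q: Query, chosen: Sequence[str], lat: Dict[FrozenSet[str], float]) -> bool:
    """Check a candidate answer against the query's pairwise latency bound."""
    if len(chosen) != q.r:
        return False
    for a, b in combinations(chosen, 2):
        d = lat[frozenset((a, b))]  # assumes the measurement exists
        if q.kind == "Q1" and not d < q.l_ms:
            return False
        if q.kind == "Q2" and not d > q.l_ms:
            return False
    return True


if __name__ == "__main__":
    benchmark = make_queries(1000, "Q1") + make_queries(1000, "Q2", seed=1)
    print(len(benchmark))  # 2000
```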

26 Performance of Q1
• Cumulative distribution of execution time
• Our algorithm answers 80% of queries in less than a few milliseconds.

Time needed to answer 70% and 90% of queries:
Algorithm   70%      90%
Tree        0.6 s    26 s
Modified    1.6 ms   0.4 s
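The 70% and 90% columns read off the cumulative distribution of execution times. Each column is the smallest time within which that fraction of queries finished. A minimal nearest-rank sketch is below. The function name `time_to_answer` is hypothetical, and the slides do not say which percentile convention was used.

```python
import math
from typing import Sequence


def time_to_answer(times_s: Sequence[float], fraction: float) -> float:
    """Smallest time t such that at least `fraction` of queries finished
    within t (nearest-rank percentile of the execution-time CDF)."""
    if not times_s or not 0 < fraction <= 1:
        raise ValueError("need at least one time and 0 < fraction <= 1")
    ordered = sorted(times_s)
    # Round first so float noise (e.g. 0.7 * 10) cannot bump the rank up by one.
    rank = math.ceil(round(fraction * len(ordered), 9))  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    times = [0.002, 0.001, 0.4, 0.0016, 26.0]  # hypothetical execution times (s)
    print(time_to_answer(times, 0.9), time_to_answer(times, 0.5))
```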

27 Performance of Q2
• Cumulative distribution of execution time

Time needed to answer 70% and 90% of queries:
Algorithm   70%      90%
Tree        0.6 s    26 s
Heuristic   5 s      52 s

28 Summary
• The Markov Cluster algorithm can determine the cluster structure from incomplete latency measurements.
• The cluster structure is relatively stable in an Internet environment over periods of a few hours.
• A heuristic search algorithm answers Q1 and Q2 using the cluster structure.
• The algorithm achieves order-of-magnitude performance improvements.

29 Contact
• Chuang Liu:
• Paper details:
• Thank you

30 Model of the Problem

31 The Markov Cluster Algorithm
• Random walk
  – A walker departs from a node.
  – In each step, it selects an outgoing edge at random with probability proportional to the edge weight, and moves to the other end of that edge.
• Intuition
  – The walker stays within the same cluster with high probability because it tends to choose high-weight edges.
  – Calculate the probability that a random walk starting at node i ends at node j after k steps, and put i and j in the same cluster if the probability is large.
  – Granularity parameter G
• Developed by S. van Dongen
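The k-step probability described above equals entry (i, j) of Pᵏ, where P is the weight matrix with each row normalized to sum to 1. A small pure-Python sketch follows. The function names are hypothetical. As on the slide, this sketch leaves the choice of k and of the "large probability" threshold open.

```python
from typing import List

Matrix = List[List[float]]


def transition_matrix(weights: Matrix) -> Matrix:
    """Row-normalize edge weights: P[i][j] = w_ij / sum_k w_ik.

    An isolated node (all-zero row) gets an all-zero row.
    """
    p = []
    for row in weights:
        total = sum(row)
        p.append([w / total if total > 0 else 0.0 for w in row])
    return p


def k_step_probabilities(weights: Matrix, k: int) -> Matrix:
    """Return P^k; entry [i][j] is the probability that a walker starting
    at node i is at node j after exactly k steps."""
    p = transition_matrix(weights)
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0 = I
    for _ in range(k):
        result = [[sum(result[i][m] * p[m][j] for m in range(n)) for j in range(n)]
                  for i in range(n)]
    return result


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2
    print(k_step_probabilities(path, 2)[0])  # [0.5, 0.0, 0.5]
```

Nodes i and j would then share a cluster when `k_step_probabilities(w, k)[i][j]` exceeds the chosen threshold.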

32 Capability to Find Resources
• Q1
• Q2
[Tables for Q1 and Q2: Result found / No result / Timeout for the Tree and Modified algorithms]

33 Cluster Structure of Resources on PlanetLab
• Clusters
  – East America, West America, Central America, East Asia, Southern Europe, etc.
  – California, Texas, China, Korea, etc.
  – San Jose (HP, UCB, Stanford), Boston (BU, MIT), etc.
[Table: granularity G, number of clusters, and median latency (ms)]