Charalampos (Babis) E. Tsourakakis Brown University Brown University May 22 nd 2014 Brown University1.

Slides:



Advertisements
Similar presentations
CMU SCS I2.2 Large Scale Information Network Processing INARC 1 Overview Goal: scalable algorithms to find patterns and anomalies on graphs 1. Mining Large.
Advertisements

Charalampos (Babis) E. Tsourakakis KDD 2013 KDD'131.
A Model of Computation for MapReduce
CHARALAMPOS E. TSOURAKAKIS SCHOOL OF COMPUTER SCIENCE CARNEGIE MELLON UNIVERSITY Fast counting of triangles in large networks without counting: Algorithms.
Multicut Lower Bounds via Network Coding Anna Blasiak Cornell University.
Information Networks Graph Clustering Lecture 14.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Xiaowei Ying, Xintao Wu, Daniel Barbara Spectrum based Fraud Detection in Social Networks 1.
Spectrum Based RLA Detection Spectral property : the eigenvector entries for the attacking nodes,, has the normal distribution with mean and variance bounded.
1 Modularity and Community Structure in Networks* Final project *Based on a paper by M.E.J Newman in PNAS 2006.
On the Spread of Viruses on the Internet Noam Berger Joint work with C. Borgs, J.T. Chayes and A. Saberi.
Object Detection by Matching Longin Jan Latecki. Contour-based object detection Database shapes: …..
Fast FAST By Noga Alon, Daniel Lokshtanov And Saket Saurabh Presentation by Gil Einziger.
Los Angeles September 27, 2006 MOBICOM Localization in Sparse Networks using Sweeps D. K. Goldenberg P. Bihler M. Cao J. Fang B. D. O. Anderson.
Introduction to Approximation Algorithms Lecture 12: Mar 1.
Graph Sparsifiers by Edge-Connectivity and Random Spanning Trees Nick Harvey U. Waterloo C&O Joint work with Isaac Fung TexPoint fonts used in EMF. Read.
Online Graph Avoidance Games in Random Graphs Reto Spöhel Diploma Thesis Supervisors: Martin Marciniszyn, Angelika Steger.
CPSC 689: Discrete Algorithms for Mobile and Wireless Systems Spring 2009 Prof. Jennifer Welch.
Randomized Algorithms and Randomized Rounding Lecture 21: April 13 G n 2 leaves
Clustering Social Networks Isabelle Stanton, University of Virginia Joint work with Nina Mishra, Robert Schreiber, and Robert E. Tarjan.
Online Vertex Colorings of Random Graphs Without Monochromatic Subgraphs Reto Spöhel, ETH Zurich Joint work with Martin Marciniszyn.
Computing Sketches of Matrices Efficiently & (Privacy Preserving) Data Mining Petros Drineas Rensselaer Polytechnic Institute (joint.
The community-search problem and how to plan a successful cocktail party Mauro SozioAris Gionis Max Planck Institute, Germany Yahoo! Research, Barcelona.
Randomness in Computation and Communication Part 1: Randomized algorithms Lap Chi Lau CSE CUHK.
Charalampos (Babis) E. Tsourakakis WAW 2010, Stanford 16 th December ‘10 WAW '101.
Finding a maximum independent set in a sparse random graph Uriel Feige and Eran Ofek.
Eigenvectors of random graphs: nodal domains James R. Lee University of Washington Yael Dekel and Nati Linial Hebrew University TexPoint fonts used in.
Models of Influence in Online Social Networks
Neighbourhood Sampling for Local Properties on a Graph Stream A. Pavan, Iowa State University Kanat Tangwongsan, IBM Research Srikanta Tirthapura, Iowa.
Graph Sparsifiers Nick Harvey University of British Columbia Based on joint work with Isaac Fung, and independent work of Ramesh Hariharan & Debmalya Panigrahi.
Efficient Gathering of Correlated Data in Sensor Networks
Dense subgraphs of random graphs Uriel Feige Weizmann Institute.
Approximation Algorithms for NP-hard Combinatorial Problems Magnús M. Halldórsson Reykjavik University
Finding dense components in weighted graphs Paul Horn
Theory of Computing Lecture 15 MAS 714 Hartmut Klauck.
Greedy Approximation Algorithms for finding Dense Components in a Graph Paper by Moses Charikar Presentation by Paul Horn.
A Clustering Algorithm based on Graph Connectivity Balakrishna Thiagarajan Computer Science and Engineering State University of New York at Buffalo.
Graph Sparsifiers Nick Harvey Joint work with Isaac Fung TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A.
DATA MINING LECTURE 13 Pagerank, Absorbing Random Walks Coverage Problems.
An Efficient Algorithm for Enumerating Pseudo Cliques Dec/18/2007 ISAAC, Sendai Takeaki Uno National Institute of Informatics & The Graduate University.
Randomized Composable Core-sets for Submodular Maximization Morteza Zadimoghaddam and Vahab Mirrokni Google Research New York.
Lecture 3 1.Different centrality measures of nodes 2.Hierarchical Clustering 3.Line graphs.
Artur Czumaj DIMAP DIMAP (Centre for Discrete Maths and it Applications) Computer Science & Department of Computer Science University of Warwick Testing.
Data Structures and Algorithms in Parallel Computing Lecture 7.
1 Latency-Bounded Minimum Influential Node Selection in Social Networks Incheol Shin
Graphs, Vectors, and Matrices Daniel A. Spielman Yale University AMS Josiah Willard Gibbs Lecture January 6, 2016.
Estimating PageRank on Graph Streams Atish Das Sarma (Georgia Tech) Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)
Complexity and Efficient Algorithms Group / Department of Computer Science Testing the Cluster Structure of Graphs Christian Sohler joint work with Artur.
Sampling in Graphs Alexandr Andoni (Microsoft Research)
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
NOTE: To change the image on this slide, select the picture and delete it. Then click the Pictures icon in the placeholder to insert your own image. Fast.
Laplacian Matrices of Graphs: Algorithms and Applications ICML, June 21, 2016 Daniel A. Spielman.
CHAPTER SIX T HE P ROBABILISTIC M ETHOD M1 Zhang Cong 2011/Nov/28.
Laplacian Matrices of Graphs: Algorithms and Applications ICML, June 21, 2016 Daniel A. Spielman.
Graph clustering to detect network modules
Random Walk for Similarity Testing in Complex Networks
Cohesive Subgraph Computation over Large Graphs
Finding Dense and Connected Subgraphs in Dual Networks
DOULION: Counting Triangles in Massive Graphs with a Coin
Minimum Spanning Tree 8/7/2018 4:26 AM
June 2017 High Density Clusters.
Density Independent Algorithms for Sparsifying
MST in Log-Star Rounds of Congested Clique
CIS 700: “algorithms for Big Data”
Randomized Algorithms CS648
CSCI B609: “Foundations of Data Science”
On the effect of randomness on planted 3-coloring models
3.3 Network-Centric Community Detection
Lecture 6: Counting triangles Dynamic graphs & sampling
Approximate Graph Mining with Label Costs
Presentation transcript:

Charalampos (Babis) E. Tsourakakis Brown University Brown University May 22 nd 2014 Brown University1

 Introduction  Finding near-cliques in graphs  Conclusion Brown University2

b) Internet (AS) c) Social networks a) World Wide Web d) Braine) Airline Brown University f) Communication 3

Daniel Spielman “Graph theory is the new calculus” Used in analyzing: log files, user browsing behavior, telephony data, webpages, shopping history, language translation, images … Brown University4

genes tumors Gene Expression data Protein interactions aCGH data 5

 Big data is not about creating huge data warehouses.  The true goal is to create value out of data  How do we design better marketing strategies?  How do people establish connections and how does the underlying social network structure affect the spread of ideas or diseases?  Why do some mutations cause cancer whereas others don’t? Brown University Unprecedented opportunities for answering long-standing and emerging problems come with unprecedented challenges 6

Imperial College Research topics Modelling Q1: Real-world networks Q2: Graph mining problems Q3: Cancer progression (joint work with NIH) Algorithm design Q4: Efficient algorithm design ( RAM, MapReduce, streaming) Q5: Average case analysis Q6: Machine learning Implementations and Applications Q7: Efficient implementations for Petabyte-sized graphs. Q8: Mining large-scale datasets (graphs and biological datasets)

 Introduction  Finding near-cliques in graphs  Conclusion Brown University8

9 K4

Brown University10

Brown University A single edge achieves always maximum possible f e Densest subgraph problem k-Densest subgraph problem DalkS (Damks) 11

 Solvable in polynomial time (Goldberg, Charikar, Khuller-Saha)  Fast ½-approximation algorithm (Charikar)  Remove iteratively the smallest degree vertex  Remark: For the k-densest subgraph problem the best known approximation is O(n 1/4 ) (Bhaskara et al.) Brown University12

Brown University13

Brown University14

Brown University15

Brown University16

 Motivating question Can we combine the best of both worlds? A) Formulation solvable in polynomial time. B) Consistently succeeds in finding near- cliques? Yes! [T. ’14] Brown University17

Brown University Whenever the densest subgraph problem fails to output a near-clique, use the triangle densest subgraph instead!

Brown University19

Brown University20 Type 3 Type 1 Type 2

Brown University21

..To a max flow computation on this network Brown University22 s t A=V(G) B=T(G) tvtv 2 1 3α3α v

s A1A1 B1B1 t A2A B2B2 Min-(s,t) cut Imperial College

s A1A B1B1 t A2A B2B2 We pay 0 for each type 3 triangle in a minimum st cut Brown University24

s A1A B1B1 t A2A B2B2 2 s A1A B1B1 t A2A B2B2 1 1 We pay 2 for each type 2 triangle in a minimum st cut Brown University25

s A1A B2B2 t A2A B1B1 1 We pay 1 for each type 1 triangle in a minimum st cut Brown University26

Brown University27

Brown University28

Brown University29

Brown University Theorem: There exists an efficient MapReduce algorithm which runs for any ε>0 in O(log(n)/ε) rounds and provides a 1/(3+3ε) approximation to the triangle densest subgraph problem. 30

Brown University31

Brown University32

 Our techniques generalize to maximizing the average k-clique density for any constant k. Brown University33 s t A=V(G) B=C(G) cvcv k-1 1 kαkα v

A C B [Wasserman Faust ’94] Friends of friends tend to become friends themselves! Brown University34 Social networks are abundant in triangles. E.g., Jazz network n=198, m=2,742, T=143,192  Triangle counting appears in many applications!

Brown University35 Degree-triangle correlations Empirical observation Spammers/sybil accounts have small clustering coefficients. Used by [Becchetti et al., ‘08], [Yang et al., ‘11] to find Web Spam and fake accounts respectively The neighborhood of a typical spammer (in red)

AlonYusterZwick Asymptotically the fastest algorithm but not practical for large graphs. In practice, one of the iterator algorithms are preferred. Node Iterator (count the edges among the neighbors of each vertex) Edge Iterator (count the common neighbors of the endpoints of each edge) Both run asymptotically in O(mn) time. Brown University36

 r independent samples of three distinct vertices Brown University37 X=1 X=0 T3T3 T2T2 T1T1 T0T0

 r independent samples of three distinct vertices Brown University 38 Then the following holds: with probability at least 1-δ Works for dense graphs. e.g., T 3 n 2 logn

 (Yosseff, Kumar, Sivakumar ‘02) require n 2 /polylogn edges  More follow up work:  (Jowhari, Ghodsi ‘05)  (Buriol, Frahling, Leondardi, Marchetti, Spaccamela, Sohler ‘06)  (Becchetti, Boldi, Castillio, Gionis ‘08)  ….. Brown University39

Brown University Keep only 3! 3 eigenvalues of adjacency matrix i-th eigenvector Political Blogs [T.’08] 40

 Approximate a given graph G with a sparse graph H, such that H is close to G in a certain notion.  Examples: Cut preserving Benczur-Karger Spectral Sparsifier Spielman-Teng Brown University41

 t: number of triangles.  T: triangles in sparsified graph, essentially our estimate.  Δ: maximum number of triangles an edge is contained in.  Δ=O(n)  t max : maximum number of triangles a vertex is contained in.  t max =Ο(n 2 ) Brown University42

Brown University43 Gary L. Miller CMU Mihail N. Kolountzakis University of Crete Joint work with:

Brown University44

Brown University45 …. t/Δ Δ

Brown University46 …. t=n/3

 Notice that speedups are quadratic in p if we use any classic iterator counting algorithm.  Expected Speedup: 1/p 2  To see why, let R be the running time of Node Iterator after the sparsification: Therefore, expected speedup: Brown University47

Brown University48 Can we do even better? Yes, [Pagh, T.]

Brown University49 Rasmus Pagh, U. of Copenhagen Joint work with:

Brown University50

Brown University51

Brown University52

Brown University53 …. t/Δ Δ

Brown University54 …. t=n/3

Brown University55

Brown University56 1 k+1 2 Every graph on n vertices with max. degree Δ(G) =k is (k+1) -colorable with all color classes differing at size by at most 1. ….

 Create an auxiliary graph where each triangle is a vertex and two vertices are connected iff the corresponding triangles share a vertex.  Invoke Hajnal-Szemerédi theorem and apply Chernoff bound per each chromatic class. Finally, take a union bound. Q.E.D. Brown University57

Brown University58 Pr(X i =1|rest are monochromatic) =p ≠ Pr(X i =1)=p 2

 This algorithm is easy to implement in the MapReduce and streaming computational models.  See also Suri, Vassilvitski ‘11  As noted by Cormode, Jowhari [TCS’14] this results in the state of the art streaming algorithm in practice as it uses O(mΔ/Τ+m/T 0.5 ) space. Compare with Braverman et al’ [ICALP’13], space usage O(m/T 1/3 ). Brown University59

 Introduction  Finding near-cliques in graphs  Conclusion Brown University60

 Faster exact triangle-densest subgraph algorithm.  How do approximate triangle counting methods affect the quality of our algorithms for the triangle densest subgraph problem?  How do we extract efficiently all subgraphs whose density exceeds a given threshold? Brown University61

Acknowledgements Philip Klein Yannis Koutis Vahab Mirrokni Clifford Stein Eli Upfal ICERM Imperial College

Brown University63

Brown University64