Correlation Clustering Nikhil Bansal Joint Work with Avrim Blum and Shuchi Chawla.

2 Clustering. Say we want to cluster n objects of some kind (documents, images, text strings), but we don't have a meaningful way to project them into Euclidean space. Idea of Cohen, McCallum, Richman: use past data to train up f(x,y) = same/different. Then run f on all pairs and try to find the most consistent clustering.

3 The problem [figure: four records to be clustered — Harry Bovik, H. Bovik, Harry B., Tom X.]

4 The problem (+: same, -: different). Train up f(x,y) = same/different; run f on all pairs. [figure: +/- edges among the four records]

5 The problem (+: same, -: different). A totally consistent instance: 1. + edges inside clusters, 2. - edges between clusters. [figure]

6 The problem (+: same, -: different). Train up f(x,y) = same/different; run f on all pairs. [figure: a different labeling of the same records]

7 The problem (+: same, -: different). Train up f(x,y) = same/different; run f on all pairs; find the most consistent clustering. [figure: a clustering with a disagreement edge]

8 The problem (+: same, -: different). [figure: the same example, another clustering with a disagreement]
9 The problem. Given a complete graph on n vertices, each edge labeled + or -. Goal: find a partition of the vertices as consistent as possible with the edge labels, i.e. maximize #(agreements) or minimize #(disagreements). There is no k: the number of clusters could be anything.

10 The problem. Noise removal: there is some true clustering, but some edges are incorrect; we still want to do well. Agnostic learning: there is no inherent clustering; try to find the best representation using a hypothesis class with limited representation power. E.g., research communities via the collaboration graph.

11 Our results. A constant-factor approximation for minimizing disagreements. A PTAS for maximizing agreements. Results for the random-noise case.

12 PTAS for maximizing agreements. Easy to get ½ of the edges. Goal: additive approximation of εn². Standard approach: draw a small sample, guess the partition of the sample, and compute the partition of the remainder. Can do this directly, or plug into the general property tester of [GGR]. Running time is doubly exponential in 1/ε, or singly exponential with a bad exponent.

13 Minimizing disagreements. Goal: get a constant-factor approximation. Problem: even if we can find a cluster that is as good as the best cluster of OPT, we are headed toward O(log n) [a set-cover-like analysis]. We need a way of lower-bounding D_OPT.

14 Lower-bounding idea: bad triangles. Consider a triangle with two + edges and one - edge. Any clustering has to disagree with at least one of these three edges.

15 Lower-bounding idea: bad triangles. If there are several edge-disjoint bad triangles, any clustering makes a mistake on each one, so D_OPT ≥ #{edge-disjoint bad triangles}. [figure: vertices 1-5 with edge-disjoint bad triangles (1,2,3) and (3,4,5)] (Not tight.)

16 Lower-bounding idea: bad triangles. If there are several such edge-disjoint bad triangles, a mistake is forced on each one: D_OPT ≥ #{edge-disjoint bad triangles}. How can we use this? [figure: edge-disjoint bad triangles (1,2,3), (3,4,5)]
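
The lower bound is easy to certify in code: any collection of edge-disjoint bad triangles (two + edges, one - edge) witnesses that many forced disagreements. This sketch is my illustration under the slide's definitions; a greedy (not maximum) collection still gives a valid lower bound on D_OPT. `sign` is an assumed map from an edge (as a frozenset) to its label.

```python
from itertools import combinations

def greedy_bad_triangles(vertices, sign):
    """Greedily collect edge-disjoint bad triangles.

    A triangle is bad if it has exactly two '+' edges and one '-' edge:
    any clustering must disagree with at least one of its three edges.
    sign: dict mapping frozenset({u, v}) -> '+' or '-'.
    Returns a list of edge-disjoint bad triangles; its length is a
    lower bound on the optimal number of disagreements D_OPT.
    """
    used = set()     # edges already claimed by a chosen triangle
    found = []
    for tri in combinations(vertices, 3):
        edges = [frozenset(e) for e in combinations(tri, 2)]
        if any(e in used for e in edges):
            continue
        pluses = sum(1 for e in edges if sign[e] == '+')
        if pluses == 2:          # two + edges and one - edge
            used.update(edges)
            found.append(tri)
    return found
```

On the slide's 5-vertex example (a + path 1-2-3-4-5, all other edges -), the greedy sweep recovers the two edge-disjoint bad triangles (1,2,3) and (3,4,5).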

17 δ-clean clusters. Given a clustering, a vertex v is δ-good if it has few disagreements with its cluster C = C(v): |N⁻(v) ∩ C| < δ|C| and |N⁺(v) \ C| < δ|C| (+: similar, -: dissimilar). A cluster C is δ-clean if all v ∈ C are δ-good. Essentially, N⁺(v) ≈ C(v).
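
The δ-good and δ-clean conditions translate directly into code. This sketch assumes a hypothetical `plus_of` map from each vertex to its set of positive neighbors; since the graph is complete, every other vertex not in `plus_of[v]` is a negative neighbor of v.

```python
def is_delta_good(v, C, plus_of, delta):
    """v is delta-good w.r.t. its cluster C if it has few disagreements:
    fewer than delta*|C| negative edges inside C, and fewer than
    delta*|C| positive edges leaving C."""
    inside_minus = sum(1 for u in C if u != v and u not in plus_of[v])
    outside_plus = sum(1 for u in plus_of[v] if u not in C)
    return inside_minus < delta * len(C) and outside_plus < delta * len(C)

def is_delta_clean(C, plus_of, delta):
    """A cluster C is delta-clean if every vertex in it is delta-good."""
    return all(is_delta_good(v, C, plus_of, delta) for v in C)
```

For example, a cluster that is a pure + clique with no + edges leaving it is δ-clean for any δ > 0; giving one member a + edge to an outside vertex can break δ-goodness for small δ.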

18 Observation: any δ-clean clustering is an 8-approximation for δ < 1/4. Idea: charge mistakes to bad triangles. Intuitively, for each wrong edge (u,v) there are enough choices of a third vertex w forming a bad triangle.

19 Observation: any δ-clean clustering is an 8-approximation for δ < 1/4. Idea: charge mistakes to bad triangles; for each wrong edge (u,v) there are enough choices of w, so we can find edge-disjoint bad triangles. A similar argument handles + edges between two clusters (using a vertex u' in the other cluster).

20 General structure of the argument. Any δ-clean clustering is an 8-approximation for δ < 1/4. Consequence: we just need to produce a δ-clean clustering for some δ < 1/4.

21 General structure of the argument. Any δ-clean clustering is an 8-approximation for δ < 1/4. Consequence: we just need to produce a δ-clean clustering for some δ < 1/4. Bad news: this may not be possible!

22 General structure of the argument. Any δ-clean clustering is an 8-approximation for δ < 1/4. Consequence: we just need to produce a δ-clean clustering for some δ < 1/4. Bad news: this may not be possible! Approach: 1) Show there is a clustering OPT(δ) of a special form, whose clusters are either δ-clean or singletons, and which makes not many more mistakes than OPT. 2) Produce something "close" to OPT(δ).

23 Existence of OPT(δ) [figure: OPT with clusters C1, C2]

24 Existence of OPT(δ). Identify the δ/3-bad vertices. [figure: OPT's clusters C1, C2 with their δ/3-bad vertices marked]

25 Existence of OPT(δ). 1) Move the δ/3-bad vertices out as singletons. 2) If "many" (a ≥ δ/3 fraction) of a cluster's vertices are δ/3-bad, split the whole cluster. The result OPT(δ) consists of singletons and δ-clean clusters, and D_OPT(δ) = O(1) · D_OPT. [figure: clusters C1, C2 before and after moving/splitting]
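
The OPT(δ) construction can be sketched as a transformation of any clustering. This is my illustration of the two rules, with a hypothetical `plus_of` adjacency map; the δ-good test from the δ-clean slide is repeated here so the sketch is self-contained.

```python
def delta_good(x, C, plus_of, delta):
    """x is delta-good w.r.t. cluster C: fewer than delta*|C| negative
    edges inside C and fewer than delta*|C| positive edges leaving C."""
    inside_minus = sum(1 for u in C if u != x and u not in plus_of[x])
    outside_plus = sum(1 for u in plus_of[x] if u not in C)
    return inside_minus < delta * len(C) and outside_plus < delta * len(C)

def make_opt_delta(clusters, plus_of, delta):
    """Rule 1: move delta/3-bad vertices out as singletons.
    Rule 2: if at least a delta/3 fraction of a cluster is delta/3-bad,
    split the whole cluster into singletons."""
    out = []
    for C in clusters:
        bad = {v for v in C if not delta_good(v, C, plus_of, delta / 3)}
        if len(bad) >= (delta / 3) * len(C):
            out.extend({v} for v in C)   # split the whole cluster
        else:
            out.append(C - bad)          # delta-clean by the slide's argument
            out.extend({v} for v in bad)
    return out
```

The δ value below is chosen only to exercise both branches of the construction; the approximation argument itself needs δ < 1/4.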

26 Main result [figure: OPT(δ) with its δ-clean clusters]

27 Main result. Guarantee of the algorithm: non-singleton clusters are 11δ-clean, and its singletons are a subset of OPT(δ)'s singletons. [figure: OPT(δ)'s δ-clean clusters vs. the algorithm's 11δ-clean clusters]

28 Main result. Choose δ = 1/44, so the non-singleton clusters are ¼-clean. Bound mistakes among non-singletons by bad triangles, and mistakes involving singletons by those of OPT(δ). Guarantee: non-singletons are 11δ-clean; singletons are a subset of OPT(δ)'s singletons.

29 Main result. Choose δ = 1/44, so the non-singleton clusters are ¼-clean. Bound mistakes among non-singletons by bad triangles, and mistakes involving singletons by those of OPT(δ). Approximation ratio: 9/δ.

30 Open problems. Can we get a small constant? Is D_OPT ≤ 2 · #{edge-disjoint bad triangles}? Is D_OPT ≤ 2 · #{fractional edge-disjoint bad triangles}? Extend to {-1, 0, +1} weights: approximate to a constant or even a log factor? Extend to the weighted case? Related: the clique partitioning problem and cutting-plane algorithms [Grötschel et al.].

34 Nice features of the formulation. There is no k (OPT can have anywhere from 1 to n clusters). If a perfect solution exists, it is easy to find: C(v) = N⁺(v). Easy to get agreement on ½ of the edges.
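
The "perfect solution is easy to find" claim can be checked directly: in a fully consistent instance, every cluster equals C(v) = {v} ∪ N⁺(v) for each of its members. A sketch (my illustration, with a hypothetical `plus_of` map of positive neighborhoods):

```python
def perfect_clustering(vertices, plus_of):
    """If a fully consistent clustering exists, return it as a
    vertex -> cluster-id dict; otherwise return None.
    In a consistent instance every member u of a cluster satisfies
    {u} | plus_of[u] == the cluster itself."""
    assigned = {}
    cid = 0
    for v in vertices:
        if v in assigned:
            continue
        cluster = {v} | plus_of[v]
        for u in cluster:
            if u in assigned or {u} | plus_of[u] != cluster:
                return None      # inconsistent instance
        for u in cluster:
            assigned[u] = cid
        cid += 1
    return assigned
```

When the instance is noisy, this test fails and one must fall back to the approximation algorithm.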

35 Algorithm. 1. Pick a vertex v; let C(v) = N⁺(v). 2. Modify C(v): (a) remove 3δ-bad vertices from C(v); (b) add 7δ-good vertices into C(v). 3. Delete C(v). Repeat until done, or until the above only produces empty clusters. 4. Output the nodes left over as singletons.
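
A compact sketch of the whole algorithm, using the 3δ and 7δ thresholds from the slide. This is my illustration, not the authors' code: the vertex order, including v itself in C(v), and the stopping rule are simplifying assumptions, and `plus_of` is a hypothetical map from each vertex to its positive neighbors.

```python
def delta_good(x, C, plus_of, delta):
    """x is delta-good w.r.t. C: fewer than delta*|C| negative edges
    inside C and fewer than delta*|C| positive edges leaving C."""
    inside_minus = sum(1 for u in C if u != x and u not in plus_of[x])
    outside_plus = sum(1 for u in plus_of[x] if u not in C)
    return inside_minus < delta * len(C) and outside_plus < delta * len(C)

def cautious_cluster(vertices, plus_of, delta=1/44):
    remaining = set(vertices)
    clusters = []
    while remaining:
        made_cluster = False
        for v in sorted(remaining):
            C = ({v} | plus_of[v]) & remaining
            # (a) vertex removal: drop members that are 3*delta-bad
            C = {x for x in C if delta_good(x, C, plus_of, 3 * delta)}
            # (b) vertex addition: pull in 7*delta-good outsiders
            C |= {y for y in remaining - C
                  if delta_good(y, C, plus_of, 7 * delta)}
            if C:
                clusters.append(C)
                remaining -= C
                made_cluster = True
                break
        if not made_cluster:
            break
    # leftover vertices become singleton clusters
    clusters.extend({v} for v in remaining)
    return clusters
```

On a clean instance made of two + cliques with only - edges between them, the sketch recovers the two cliques as clusters.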

36 Step 1. Choose v; let C = the + neighbors of v. [figure: OPT's clusters C1, C2 and the candidate cluster C around v]

37 Step 2, vertex removal phase: if x is 3δ-bad, set C = C − {x}. [figure] 1) No vertex in C1 is removed. 2) All vertices in C2 are removed.

38 Step 3, vertex addition phase: add the 7δ-good vertices to C. [figure] 1) All remaining vertices of C1 are added. 2) None in C2 is added. 3) The resulting cluster C is 11δ-clean.

39 Case 2: v is a singleton in OPT(δ). Choose v, C = the + neighbors of v; the same idea works. [figure]
