Presentation transcript:

Why do we want a good ratio anyway? Approximation stability and proxy objectives Avrim Blum Carnegie Mellon University Based on work joint with Pranjal Awasthi, Nina Balcan, Anupam Gupta, Or Sheffet, and Santosh Vempala Approximation Algorithms: the last decade and the next, Princeton/DIMACS, 2011

Theme of this talk: Not to suggest that we don’t want a good approximation ratio, but to suggest thinking about why we want one, especially when the problem at hand is hard to approximate in the worst case, and especially if the objective is really a proxy (perhaps for something we can’t measure).

Theme of this talk: Theory says a lot of the most interesting problems we want to solve are (NP-)hard, even hard to approximate well. But there may be assumptions we are implicitly making about the structure of the input, if the objective is a proxy. If we make these explicit up front, we can give the algorithm more to work with, and potentially get around hardness barriers.

Clustering

Clustering comes up in many places: Given a set of documents or search results, cluster them by topic. Given a collection of protein sequences, cluster them by function. Given a set of images of people, cluster them by who is in them. …

Standard approach: Come up with some set of features (like words in a document, various image features, …). Use these to view the data as points in a metric space. Run a clustering algorithm on the points. Hope it gives a good output.

Standard theoretical approach: Come up with some set of features (like words in a document, various image features, …). Use these to view the data as points in a metric space. Pick some objective to optimize, like k-median, k-means, min-sum, …
–E.g., k-median asks: find center points c_1, c_2, …, c_k to minimize Σ_x min_i d(x, c_i).
–k-means asks: find c_1, c_2, …, c_k to minimize Σ_x min_i d²(x, c_i).
–Min-sum asks: find k clusters minimizing the sum of intra-cluster distances.
Develop an algorithm to (approximately) optimize this objective. (E.g., the best known approximation for k-median is 3+ε [AGKMMP04]; for k-means it is 9+ε; for min-sum it is (log n)^{1+ε}. Beating 1 + 1/e is NP-hard [JMS02].) Can we do better… on the cases where doing better would matter?
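To make these objectives concrete, here is a minimal sketch (my own illustration, not code from the talk) of computing the k-median and k-means costs of a candidate set of centers; the Euclidean metric and the array shapes are illustrative assumptions.

```python
import numpy as np

def kmedian_cost(points, centers):
    """k-median objective: sum over points of the distance to the nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)  # shape (n, k)
    return dists.min(axis=1).sum()

def kmeans_cost(points, centers):
    """k-means objective: sum over points of the squared distance to the nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)  # shape (n, k)
    return (dists.min(axis=1) ** 2).sum()
```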

Why do we want to get a c=2 or c=1.1 approx? Remember, what we really wanted was to cluster proteins by function, etc. We are implicitly hoping that getting a c-approximation to our objective will allow us to get most points correct. –This is an assumption about how the distance measure and objective relate to the clustering we are looking for. –What happens if you make it explicit?

Why do we want to get a c=2 or c=1.1 approx? Assume: all c-approximations are ε-close (as clusterings) to the unknown target. I.e., getting a c-approximation to the objective implies getting ε-error with respect to the real goal. Question: does this buy you anything? Answer: Yes (for clustering with the k-median, k-means, or min-sum objectives). –For any constant c>1, can use this to get O(ε)-close to the target, even though getting a c-approximation may be NP-hard (for min-sum, large clusters were needed; improved by [Balcan-Braverman]). –For k-means and k-median, can actually get a c-approximation (and therefore get ε-close), if cluster sizes > εn.

Why do we want to get a c=2 or c=1.1 approx? –Could say: maybe I just believe that c-approximations of the kind my algorithm finds are close (but then what’s special about them?). –Or: maybe just that most c-approximations are close (a good question for future work!). –But for now: what mileage do we get if we assume that all good approximations to our objective are close to the target?

Approximation-stability: An instance is (c,ε)-approximation-stable for objective Φ if any c-approximation to Φ has error ≤ ε. –“Error” is in terms of distance in solution space; for clustering, we use the fraction of points you would have to reassign to match the target. How are we going to use this to cluster well if we don’t know how to get a c-approximation? Will show one result from [Balcan-Blum-Gupta09] for getting error O(ε/(c-1)) under stability with respect to the k-median objective.
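As a concrete reading of this error measure, here is a small sketch (my own illustration; the brute-force search over label permutations is only sensible for small k, and a min-cost matching would be used in general):

```python
from itertools import permutations
import numpy as np

def clustering_error(labels, target, k):
    """Fraction of points that must be reassigned to turn `labels` into `target`,
    minimized over permutations of the k cluster names."""
    labels, target = np.asarray(labels), np.asarray(target)
    best = len(labels)
    for perm in permutations(range(k)):
        mismatches = int(np.sum(np.array(perm)[labels] != target))
        best = min(best, mismatches)
    return best / len(labels)
```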

Clustering from (c,ε) k-median stability: For simplicity, say the target is the k-median optimum, and for now, that all cluster sizes are > 2εn. For any x, let w(x) = distance to its own center and w₂(x) = distance to the 2nd-closest center. Let w_avg = avg_x w(x), so OPT = n·w_avg. Then: –At most εn points can have w₂(x) < (c-1)·w_avg/ε. –At most 5εn/(c-1) points can have w(x) ≥ (c-1)·w_avg/(5ε). All the rest (the good points) have a big gap.
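Roughly, the first claim uses stability directly: if more than εn points had w₂(x) < (c-1)·w_avg/ε, reassigning εn of them to their second-closest centers would increase the cost by less than (c-1)·OPT, giving a c-approximation that differs from the target on εn points, essentially contradicting stability. The second claim is an averaging (Markov) argument over the definition of w_avg:

```latex
\sum_x w(x) = \mathrm{OPT} = n\,w_{\mathrm{avg}}
\;\Longrightarrow\;
\#\Bigl\{x : w(x) \ge \tfrac{(c-1)\,w_{\mathrm{avg}}}{5\varepsilon}\Bigr\}
\;\le\; \frac{n\,w_{\mathrm{avg}}}{(c-1)\,w_{\mathrm{avg}}/(5\varepsilon)}
\;=\; \frac{5\varepsilon n}{c-1}.
```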

Clustering from (c,ε) k-median stability: Define the critical distance d_crit = (c-1)·w_avg/(5ε). So, a 1−O(ε/(c-1)) fraction of the points (the good points) are within d_crit of their own center; any two good points in the same cluster are within 2·d_crit of each other, while good points in different clusters are more than 4·d_crit apart. –At most εn points can have w₂(x) < (c-1)·w_avg/ε. –At most 5εn/(c-1) points can have w(x) ≥ (c-1)·w_avg/(5ε). All the rest (the good points) have a big gap.

Clustering from (c,ε) k-median stability: So if we define a graph G connecting any two points within distance ≤ 2·d_crit, then: –Good points within a cluster form a clique. –Good points in different clusters have no common neighbors. So, the world now looks like: one clique of good points per target cluster, plus a small number of bad points scattered about.

Clustering from (c,ε) k-median stability: If furthermore all clusters have size > 2b+1, where b = # bad points = O(εn/(c-1)), then: –Create a graph H connecting x,y if they share > b neighbors in common in G. –Output the k largest components in H. (This only makes mistakes on bad points.)
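A minimal sketch of this stage of the algorithm, assuming for illustration that c, ε, and w_avg (i.e., OPT/n) are known; the actual algorithm in [BBG09] removes these assumptions and handles the details more carefully.

```python
import numpy as np

def bbg_threshold_cluster(D, k, c, eps, w_avg):
    """Threshold-graph stage under (c,eps) k-median stability (illustrative sketch).
    D: (n x n) matrix of pairwise distances.  Returns labels in {0..k-1}, -1 = unassigned."""
    n = D.shape[0]
    d_crit = (c - 1) * w_avg / (5 * eps)
    b = int(eps * n + 5 * eps * n / (c - 1))       # bound on the number of bad points

    G = (D <= 2 * d_crit)                          # graph G: points within 2*d_crit
    np.fill_diagonal(G, False)
    common = G.astype(int) @ G.astype(int)         # common[x, y] = # common neighbors in G
    H = common > b                                 # graph H: share more than b neighbors

    # Output the k largest connected components of H.
    labels = -np.ones(n, dtype=int)
    seen = np.zeros(n, dtype=bool)
    comps = []
    for s in range(n):
        if seen[s]:
            continue
        stack, comp = [s], []
        seen[s] = True
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in np.nonzero(H[u])[0]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        comps.append(comp)
    for i, comp in enumerate(sorted(comps, key=len, reverse=True)[:k]):
        labels[comp] = i
    return labels
```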

Clustering from (c,ε) k-median stability: If the clusters are not so large, then we need to be a bit more careful, but can still get error O(ε/(c-1)). We could have some clusters dominated by bad points… Actually, we just need to modify the algorithm a bit (but we won’t go into that here).

O(ε)-close → ε-close: Back to the large-cluster case: we can improve to get ε-close (for any c>1, but “large” depends on c). Idea: there are really two kinds of bad points. –At most εn are “confused”: w₂(x) − w(x) < (c-1)·w_avg/ε. –The rest are not confused, just far: w(x) ≥ (c-1)·w_avg/(5ε). A non-confused bad point may have w(x) large, but it still has w₂(x) − w(x) ≥ 5·d_crit, so we can recover the non-confused ones…

O(ε)-close → ε-close: –Given the output C’ from the algorithm so far, reclassify each x into the cluster of lowest median distance. –The median is controlled by the good points, which will pull the non-confused points in the right direction. This is a bit like 2 rounds of the k-means/Lloyd’s algorithm.
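A short sketch of this reassignment step (illustrative only; D and the label conventions are assumptions carried over from the sketch above):

```python
import numpy as np

def reclassify_by_median(D, labels):
    """Move each point to the cluster whose current members are closest to it
    in median distance.  D: pairwise distances; labels: first-stage clustering."""
    ks = [k for k in np.unique(labels) if k >= 0]
    new_labels = labels.copy()
    for x in range(D.shape[0]):
        med = {k: np.median(D[x, labels == k]) for k in ks}
        new_labels[x] = min(med, key=med.get)
    return new_labels
```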

Extensions [Awasthi-B-Sheffet’10]: The approximation-stability condition “all ε-far solutions are not c-approximations” implies the condition “deleting a center of OPT is not a c-approximation” whenever all target clusters have size ≥ εn (since that implies a (k-1)-clustering can’t be ε-close), and the latter is then a strictly weaker condition.

Extensions [Awasthi-B-Sheffet’10]: Under the weaker condition (“deleting a center of OPT is not a c-approximation”), for k-median/k-means we get a PTAS: a (1+α)-approximation in polynomial time (exponential in 1/α and 1/(c-1)). This implies getting an ε-close solution under the original condition (set 1+α = c).

What about other problems?

Nash equilibria? Sparsest cut? Phylogenetic Trees?

What about other problems? Nash equilibria: What if the reason we want to find an approximate Nash equilibrium is to predict how people will play? Then it’s natural to focus on games where all approximate equilibria are close to each other. Does this make the problem easier to solve? (Picture: all ε-equilibria lie inside a ball of radius Δ around some (p*, q*).)

What about other problems? Nash equilibria: We can say a little: we get a bound in terms of Δ and ε; the best general bound is n^{O(log n/ε²)} [Lipton, Markakis, Mehta]. Question: get poly-time for all Δ = O(ε)? [Balcan-Braverman]: Extensions to the reversed-quantifier condition. Connections to perturbation-stability. A lower bound of Δ = O(ε^{1/4}) for the condition to help for ε-Nash.

What about other problems? Sparsest cut: partition the vertices of a graph G into (A,B) to minimize e(A,B)/(|A|·|B|). The best approximation is O((log n)^{1/2}) [ARV]. What if the reason you wanted a good cut was to partition cats from dogs, segment an image, etc. (edges represent similarity)? We are implicitly hoping a good approximation implies low error… What if we assume any 10-approximation has error ≤ ε?
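For concreteness, a tiny sketch of the (uniform) sparsest-cut objective just stated; the edge-list representation is an assumption for illustration:

```python
def sparsest_cut_value(nodes, edges, A):
    """Uniform sparsest-cut objective e(A,B)/(|A|*|B|) for a candidate side A.
    nodes: all vertices; edges: list of (u, v) pairs; A: a subset of nodes."""
    A = set(A)
    B = set(nodes) - A
    crossing = sum(1 for u, v in edges if (u in A) != (v in A))
    return crossing / (len(A) * len(B))
```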

What about other problems? Phylogenetic trees: Trying to reconstruct evolutionary trees. This is often posed as a Steiner-tree-like optimization problem, but really our goal is to get a structure close to the correct answer.

Summary: For clustering, we can say “if the data has the property that a 1.1-approximation to [pick one: k-median, k-means, min-sum] would be sufficient to have error ε, then we can get error O(ε)” …even though you might think NP-hardness results for approximating these objectives would preclude this. It makes sense to examine this for other optimization problems where the objective function may be a proxy for something else: Nash equilibria, sparsest cut, evolutionary trees? Or, to weaken the assumption: this suggests sub-objectives for approximation algorithms.

Further open questions (clustering): For the results described to be interesting, we need c-1 > ε. –The general error bound is O(ε/(c-1)). –The “large cluster” condition of [BBG] requires cluster size > 15εn/(c-1). –The deletion condition for the PTAS of [ABS] is implied by approximation-stability when clusters > εn; poly-time if c-1 is constant. Q: what if we only assume that a (1+ε)-approximation is ε-close?