Multiplicative weights method: A meta-algorithm with applications to linear and semidefinite programming. Sanjeev Arora, Princeton University. Based upon: Fast algorithms for Approximate SDP [FOCS '05]; O(√log n) approximation to SPARSEST CUT in Õ(n²) time [FOCS '04]; The multiplicative weights update method and its applications ['05]. See also recent papers by Hazan and Kale.

Multiplicative update rule (long history). Applications: approximate solutions to LPs and SDPs, flow problems, online learning (boosting), derandomization and Chernoff bounds, online convex optimization, computational geometry, metric embeddings, portfolio management… (see our survey). Setting: n agents with weights w_1, w_2, …, w_n. Update the weights according to performance: w_i^{t+1} ← w_i^t (1 + ε · performance of i).
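As a concrete illustration (not code from the talk), here is a minimal Python sketch of the update rule; the function name, eps, and the sample performance vector are all hypothetical names chosen for the example.

    # Minimal sketch of the basic MW update: n agents, one weight per agent,
    # each multiplied by (1 + eps * performance of that agent).
    def mw_update(weights, performance, eps=0.1):
        """One MW round: w_i <- w_i * (1 + eps * performance_i)."""
        return [w * (1 + eps * p) for w, p in zip(weights, performance)]

    weights = [1.0] * 5                                  # n = 5 agents, all start at 1
    weights = mw_update(weights, [1, 0, -1, 0.5, 0])     # reward/penalize by performance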

Simplest setting: predicting the market. N "experts" on TV; $1 for a correct prediction, $0 for an incorrect one. Can we perform as well as the best expert?

Weighted majority algorithm [LW '94]. "Predict according to the weighted majority." Multiplicative update (initially all w_i = 1): if expert i predicted correctly, w_i^{t+1} ← w_i^t; if incorrectly, w_i^{t+1} ← w_i^t (1 − ε). Claim: #mistakes by algorithm ≈ 2(1+ε) · (#mistakes by best expert). Proof sketch: let the potential Φ^t = sum of weights = Σ_i w_i^t (initially n). If the algorithm predicts incorrectly, then Φ^{t+1} ≤ Φ^t − εΦ^t/2, so Φ^T ≤ (1 − ε/2)^{m(A)} · n, where m(A) = #mistakes by the algorithm. Also Φ^T ≥ (1 − ε)^{m_i} for every expert i, which gives m(A) ≤ 2(1+ε) m_i + O(log n / ε).
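A minimal Python sketch of the weighted majority algorithm as described on this slide (the 0/1 prediction and outcome format, and the function name, are assumptions made for the example):

    def weighted_majority(expert_preds, outcomes, eps=0.5):
        """Weighted majority sketch: expert_preds[t][i] in {0,1} is expert i's
        prediction at time t, outcomes[t] in {0,1} is the truth.
        Returns the number of mistakes made by the algorithm."""
        n = len(expert_preds[0])
        w = [1.0] * n
        mistakes = 0
        for preds, truth in zip(expert_preds, outcomes):
            # predict according to the weighted majority of the experts
            vote_for_1 = sum(wi for wi, p in zip(w, preds) if p == 1)
            guess = 1 if vote_for_1 >= sum(w) / 2 else 0
            mistakes += (guess != truth)
            # multiplicative update: keep correct experts, shrink incorrect ones
            w = [wi if p == truth else wi * (1 - eps) for wi, p in zip(w, preds)]
        return mistakes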

Generalized weighted majority [A., Hazan, Kale '05]. Setup: n agents (experts) and a set of events (possibly infinite); when event j occurs, expert i receives payoff M(i, j).

Generalized weighted majority [AHK '05]. Algorithm: play a distribution (p_1, …, p_n) on the experts. Payoff for event j: Σ_i p_i M(i, j). Update rule: p_i^{t+1} ← p_i^t (1 + ε · M(i, j)). Claim: after T iterations, algorithm payoff ≥ (1 − ε) · (best expert's payoff) − O(log n / ε).
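A hedged Python sketch of the generalized algorithm (the payoffs M[i][j] are assumed to lie in [−1, 1] so the weights stay nonnegative; the function name and interface are illustrative):

    def generalized_mw(M, events, eps=0.1):
        """M[i][j] in [-1, 1] is expert i's payoff on event j; `events` is the
        observed sequence of event indices. Plays a distribution p over experts
        each round and returns the algorithm's total expected payoff."""
        n = len(M)
        w = [1.0] * n
        total = 0.0
        for j in events:
            s = sum(w)
            p = [wi / s for wi in w]                        # current distribution on experts
            total += sum(p[i] * M[i][j] for i in range(n))  # payoff for event j
            w = [w[i] * (1 + eps * M[i][j]) for i in range(n)]  # update rule
        return total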

Where MW shows up: game playing and online optimization, Lagrangean relaxation, gradient descent, Chernoff bounds, games with matrix payoffs, fast solutions to LPs and SDPs, boosting.

Common features of MW algorithms: "competition" amongst n experts; appearance of terms like exp(−ε Σ_t (performance at time t)); time to get ε-approximate solutions proportional to 1/ε².

Boosting and AdaBoost (Schapire '91, Freund-Schapire '97). "Weak learning ⇒ strong learning." Input: 0/1 vectors with a Yes/No label, e.g., (white, tall, vegetarian, smoker, …) labeled "has disease"; (nonwhite, short, vegetarian, nonsmoker, …) labeled "no disease". Desired: a short OR-of-ANDs (i.e., DNF) formula that describes a γ fraction of the data (if one exists), e.g., (white ∧ vegetarian) ∨ (nonwhite ∧ nonsmoker). γ = 1 − ε is "strong learning"; γ = ½ + ε is "weak learning". Main idea in boosting: data points are "experts", weak learners are "events."
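A schematic Python sketch of the boosting idea in the slide's MW language (the weak_learner interface, 0/1 labels, the fixed number of rounds, and the constant down-weighting factor are assumptions made for illustration, not the exact AdaBoost scheme):

    def boost(points, labels, weak_learner, rounds=50, eps=0.1):
        """Data points play the role of "experts"; each weak hypothesis returned by
        weak_learner(points, labels, w) is an "event". Points that the current weak
        hypothesis already classifies correctly have their weight reduced, focusing
        later rounds on the hard points. Labels are assumed to be 0/1."""
        w = [1.0] * len(points)
        hypotheses = []
        for _ in range(rounds):
            h = weak_learner(points, labels, w)          # weak learner on weighted data
            hypotheses.append(h)
            w = [wi * (1 - eps) if h(x) == y else wi     # down-weight points h got right
                 for wi, x, y in zip(w, points, labels)]
        # final "strong" hypothesis: majority vote over the weak hypotheses
        return lambda x: int(sum(h(x) for h in hypotheses) > len(hypotheses) / 2)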

Approximate solutions to convex programming: e.g., Plotkin-Shmoys-Tardos '91, Young '97, Garg-Koenemann '99, Fleischer '99. The MW meta-algorithm gives a unified view. Note: distribution ↔ convex combination {p_i} with Σ_i p_i = 1; if P is a convex set and each x_i ∈ P, then Σ_i p_i x_i ∈ P.

Solving LPs (feasibility). Given constraints a_1 · x ≥ b_1, a_2 · x ≥ b_2, …, a_m · x ≥ b_m over a convex domain P, maintain a weight w_k on each constraint. Oracle: given weights w_1, …, w_m, return some x ∈ P satisfying the single weighted constraint Σ_k w_k (a_k · x − b_k) ≥ 0 (each oracle answer is an "event" for MW).

Solving LPs (feasibility), continued. When the oracle returns x_0, update the constraint weights multiplicatively: w_k ← w_k · [1 − ε (a_k · x_0 − b_k)/ρ] for k = 1, …, m, where ρ = width = max_k |a_k · x_0 − b_k|. Final solution = average of the x vectors returned by the oracle.
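A Python sketch of this feasibility loop under the stated assumptions (the oracle interface, the fixed iteration count T, and the parameter names are illustrative; rho is the width bound from the slide):

    def mw_lp_feasibility(A, b, oracle, rho, eps=0.1, T=1000):
        """A: list of m constraint rows a_k; b: list of m right-hand sides.
        oracle(w) must return some x in P with sum_k w_k*(a_k . x - b_k) >= 0,
        or None if no such x exists (then the original system is infeasible).
        Returns the average of the oracle's answers."""
        m, n = len(A), len(A[0])
        w = [1.0] * m
        xs = []
        for _ in range(T):
            x = oracle(w)
            if x is None:
                return None
            xs.append(x)
            for k in range(m):
                # w_k <- w_k * (1 - eps*(a_k . x - b_k)/rho): constraints the point
                # violates (negative slack) gain weight, well-satisfied ones lose weight
                slack = sum(A[k][i] * x[i] for i in range(n)) - b[k]
                w[k] *= (1 - eps * slack / rho)
        return [sum(x[i] for x in xs) / len(xs) for i in range(n)]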

Performance guarantees. In O(ρ² log(n)/ε²) iterations, the average x is ε-feasible. Packing-covering LPs [Plotkin, Shmoys, Tardos '91] (a covering problem): does there exist x ∈ P with a_j · x ≥ 1 for all j = 1, 2, …, m? We want to find x ∈ P s.t. a_j · x ≥ 1 − ε, assuming 0 ≤ a_j · x ≤ ρ for all x ∈ P. The MW algorithm gets an ε-feasible x in O(ρ log(n)/ε²) iterations.

Connection to Chernoff bounds and derandomization. Deterministic approximation algorithms for 0/1 packing/covering problems à la Raghavan-Thompson: solve the LP relaxation of the integer program to obtain a solution (y_i); randomized rounding (repeated O(log n) times): x_i ← 1 with prob. y_i, x_i ← 0 with prob. 1 − y_i. Derandomize using pessimistic estimators of the form exp(Σ_i t · f(y_i)). Young ['95]: "Randomized rounding without solving the LP." The MW update rule mimics the pessimistic estimator.

Application 2: semidefinite programming (Klein-Lu '97). Constraints a_1 · x ≥ b_1, a_2 · x ≥ b_2, …, a_m · x ≥ b_m with x ∈ P, where the a_j and x are symmetric matrices in R^{n×n} and P = {x : x is psd, tr(x) = 1}. Oracle: maximize Σ_j w_j (a_j · x) over P (an eigenvalue computation!).

The promise of SDPs… 0.878-approximation to MAX-CUT (GW '94), 7/8-approximation to MAX-3SAT (KZ '00), √log n-approximation to SPARSEST CUT and BALANCED SEPARATOR (ARV '04), √log n-approximation to MIN-2CNF DELETION and MIN-UNCUT (ACMM '05), MIN-VERTEX SEPARATOR (FHL '06), etc. The pitfall… high running times, as high as n^4.5 in recent works.

Solving SDP relaxations more efficiently using MW [AHK '05]:
Problem | Using interior point | Our result
MAX QP (e.g. MAX-CUT) | Õ(n^3.5) | Õ(n^1.5 N/ε^2.5) or Õ(n^3/(α* ε^3.5))
HAPLOFREQ | Õ(n^4) | Õ(n^2.5/ε^2.5)
SCP | Õ(n^4) | Õ(n^1.5 N/ε^4.5)
EMBEDDING | Õ(n^4) | Õ(n^3/(d^5 ε^3.5))
SPARSEST CUT | Õ(n^4.5) | Õ(n^3/ε^2)
MIN UNCUT etc. | Õ(n^4.5) | Õ(n^3.5/ε^2)

Key difference between efficient and not-so-efficient implementations of the MW idea: Width management. (e.g., the difference between PST’91 and GK’99)

Recall: the issue of width. MW with the oracle (return x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0) needs Õ(ρ²/ε²) iterations to obtain an ε-feasible x, where ρ = max_k |a_k · x − b_k|. ρ can be too large!

Issue 1: dealing with width. If only a few constraints have high width, run a hybrid of MW and ellipsoid/Vaidya: poly(m, log(ρ/ε)) iterations suffice to obtain an ε-feasible x.

Dealing with width (contd). Hybrid of MW and a dual ellipsoid/Vaidya method: Õ(ρ_L²/ε²) iterations to obtain an ε-feasible x, where ρ_L ≪ ρ.

Issue 2: efficient implementation of the oracle: fast eigenvalue computation via matrix sparsification. The Lanczos algorithm effectively uses the sparsity of C. Random sampling gives a matrix C′ with O(√n Σ_ij |C_ij| / ε) non-zero entries and ‖C − C′‖ ≤ ε. Similar to Achlioptas-McSherry ['01], but better in some situations (and with an easier analysis).
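A rough Python sketch of entry-wise random sampling in the spirit of this slide (the exact sampling probabilities in AHK differ; the constants, probabilities, and rescaling here are assumptions made for illustration):

    import random

    def sparsify(C, eps):
        """Keep entry C[i][j] with probability proportional to |C[i][j]|, rescaled so
        that the sparse matrix C' is an unbiased estimate of C; the expected number
        of retained entries is on the order of sqrt(n) * sum_ij |C_ij| / eps."""
        n = len(C)
        total = sum(abs(C[i][j]) for i in range(n) for j in range(n))
        Cp = [[0.0] * n for _ in range(n)]
        if total == 0:
            return Cp
        budget = (n ** 0.5) * total / eps        # target number of nonzero entries
        for i in range(n):
            for j in range(n):
                p = min(1.0, budget * abs(C[i][j]) / total)
                if p > 0 and random.random() < p:
                    Cp[i][j] = C[i][j] / p       # rescale so E[C'] = C
        return Cp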

An O(n²)-time algorithm to compute an O(√log n)-approximation to SPARSEST CUT [A., Hazan, Kale '04] (a combinatorial algorithm; improves upon the O(n^4.5) algorithm of A., Rao, Vazirani).

Sparsest Cut. (Figure: an example graph with a cut S achieving the sparsest-cut value 2/5.) O(log n) approximation [Leighton-Rao '88]; O(√log n) approximation [A., Rao, Vazirani '04]; O(√log n) approximation in O(n²) time (actually, finds expander flows) [A., Hazan, Kale '05].

MW algorithm to find expander flows. Events: triples (s, w, z) of weights on vertices, edges, and cuts. Experts: pairs of vertices (i, j). Payoff (for weights d_{i,j} on the experts): defined via shortest paths between i and j according to the edge weights w_e and the cuts separating i and j (the exact formula appeared as a figure on the slide). Fact: if the events are chosen optimally, the distribution d_{i,j} on experts converges to a demand graph which is an "expander flow" [by the results of Arora-Rao-Vazirani '04, this suffices to produce an approximate sparsest cut].

New: online games with matrix payoffs (joint w/ Satyen Kale '06). The payoff is a matrix, and so is the "distribution" on experts! Uses matrix analogues of the usual inequalities: 1 + x ≤ e^x becomes I + A ⪯ e^A. Can be used to give a new "primal-dual" view of SDP relaxations (⇒ easier, faster approximation algorithms).

New: faster algorithms for online learning and portfolio management (Agarwal-Hazan '06, Agarwal-Hazan-Kalai-Kale '06). A framework for online optimization inspired by Newton's method (2nd-order optimization); note that MW ≈ gradient descent. Yields fast algorithms for portfolio management and other online optimization problems.

Open problems: Better approaches to width management? Faster running times? Lower bounds? THANK YOU

Connection to Chernoff bounds and derandomization. Deterministic approximation algorithms à la Raghavan-Thompson. Packing/covering IP with variables x_i ∈ {0,1}: does there exist x ∈ P with f_j(x) ≥ 0 for all j ∈ [m]? Solve the LP relaxation using variables y_i ∈ [0, 1]. Randomized rounding: x_i = 1 w.p. y_i, x_i = 0 w.p. 1 − y_i. Chernoff: O(log m) sampling iterations suffice.
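One common reading of the rounding step, as a Python sketch for a covering problem (the interface and the union-of-roundings interpretation are assumptions made for illustration, not necessarily the exact Raghavan-Thompson scheme):

    import random

    def round_covering(y, reps):
        """Repeat the basic rounding `reps` ~ O(log m) times and set x_i = 1 if any
        repetition turned it on; by Chernoff-type bounds each covering constraint
        then holds with high probability (at the cost of an O(log m) blow-up)."""
        x = [0] * len(y)
        for _ in range(reps):
            for i, yi in enumerate(y):
                if random.random() < yi:
                    x[i] = 1
        return x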

Derandomization [Young '95]. The rounding can be derandomized using exp(t Σ_j f_j(x)) as a pessimistic estimator of the failure probability. By minimizing the estimator in every iteration, we mimic the random experiment, so O(log m) iterations suffice. The structure of the estimator obviates the need to solve the LP: "randomized rounding without solving the linear program." Punchline: the resulting algorithm is the MW algorithm!

Weighted majority [LW '94] (analysis recap). If the algorithm lost at time t, then Φ^{t+1} ≤ (1 − ε/2) Φ^t. At time T: Φ^T ≤ (1 − ε/2)^{#mistakes} · Φ^0. Overall: #mistakes ≤ log(n)/ε + (1 + ε) · m_i, where m_i = #mistakes of expert i.

Semidefinite programming. Here a_j and x are symmetric matrices in R^{n×n}, with x ⪰ 0 and (assume) tr(x) ≤ 1. Set P = {x : x ⪰ 0, tr(x) ≤ 1}. Oracle: maximize Σ_j w_j (a_j · x) over P. Optimum: x = vv^T, where v is the top eigenvector of Σ_j w_j a_j.

Efficiently implementing the oracle. Optimum: x = vv^T, where v is the top eigenvector of some matrix C. In fact, it suffices to find a vector v such that v^T C v ≥ 0. The Lanczos algorithm with a random starting vector is ideal for this. Advantage: it uses only matrix-vector products, so it exploits sparsity (see also the sparsification procedure). Use the analysis of Kuczynski and Wozniakowski ['92].
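A Python sketch of this eigenvector oracle using scipy's Lanczos-based eigsh (the wrapper name and the dense-numpy setup are illustrative; in practice C would be sparse, or represented only through matrix-vector products):

    import numpy as np
    from scipy.sparse.linalg import eigsh

    def sdp_oracle(A_list, w):
        """Given symmetric matrices A_j and weights w_j, form C = sum_j w_j A_j and
        return x = v v^T (a rank-one psd matrix with trace 1), where v is the
        eigenvector for the largest eigenvalue of C. eigsh runs a Lanczos-type
        iteration and only needs matrix-vector products with C."""
        C = sum(wj * Aj for wj, Aj in zip(w, A_list))
        vals, vecs = eigsh(C, k=1, which='LA')   # largest algebraic eigenvalue
        v = vecs[:, 0]
        return np.outer(v, v), vals[0]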