1 Multiplicative Weights Update Method Boaz Kaminer Andrey Dolgin Based on: Arora S., Hazan E. and Kale S., “The Multiplicative Weights Update Method: a Meta Algorithm and Applications”, unpublished manuscript.

2 Motivation Meta-Algorithm Many Applications Useful Approximations Proofs

3 Content Basic Algorithm – Weighted Majority Generalized Weighted Majority Algorithm Applications:  General Guidelines  Linear Programming Approximations  Zero-Sum Games Approximations  Machine Learning Summary and Conclusions

4 The Basic Algorithm – Weighted Majority N ‘experts’, each giving a prediction at every iteration. At each iteration, each ‘expert’s’ weight is updated based on the quality of its prediction. N. Littlestone and M.K. Warmuth. The Weighted Majority Algorithm. Information and Computation, 108(2):212–261, 1994.
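
As a concrete illustration of the rule above, here is a minimal Python sketch of the deterministic Weighted Majority algorithm (the function and variable names are mine, not from the slides): each expert that errs has its weight multiplied by (1 − ε), and the algorithm predicts by weighted majority vote.

```python
import numpy as np

def weighted_majority(expert_predictions, outcomes, eps=0.1):
    """Deterministic Weighted Majority.

    expert_predictions: array of shape (T, n) with the 0/1 predictions of n experts
    outcomes:           array of shape (T,) with the true 0/1 outcomes
    eps:                update parameter; every wrong expert's weight is
                        multiplied by (1 - eps)
    Returns the number of mistakes the algorithm makes.
    """
    T, n = expert_predictions.shape
    w = np.ones(n)                       # start with uniform weights
    mistakes = 0
    for t in range(T):
        preds = expert_predictions[t]
        # predict 1 iff the experts voting 1 hold at least half the total weight
        vote = 1 if w[preds == 1].sum() >= w.sum() / 2 else 0
        if vote != outcomes[t]:
            mistakes += 1
        # multiplicative update: penalize every expert that was wrong
        w[preds != outcomes[t]] *= (1 - eps)
    return mistakes
```

Theorem #1 below bounds the number of mistakes this loop makes relative to the best single expert.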

5 Approximation Theorem #1 Theorem #1:  After t steps, let m_i^t be the number of mistakes of ‘expert’ i and m^t be the number of mistakes the entire algorithm has made. Then the following bound holds for every i: m^t ≤ 2·ln(n)/ε + 2(1+ε)·m_i^t.

6 Proof of Theorem #1 Define the potential function Φ^t = Σ_i w_i^t, so Φ^1 = n. On each mistake, at least half the total weight (the experts that erred) is multiplied by 1−ε, so the potential function decreases by a factor of at least 1−ε/2: Φ^{t+1} ≤ Φ^t·(1−ε/2). By induction: Φ^t ≤ n·(1−ε/2)^{m^t}. Since the weight of expert i satisfies (1−ε)^{m_i^t} = w_i^t ≤ Φ^t, taking logarithms and using −ln(1−ε) ≤ ε+ε² and −ln(1−ε/2) ≥ ε/2 (valid for ε ≤ 1/2) gives the bound of Theorem #1.

7 The Generalized Algorithm Denote by P the set of events/outcomes. Assume a penalty matrix M: M(i,j) is the penalty of ‘expert’ i for outcome j ∈ P, with all penalties lying in [−ρ, ρ]. At each round t, play the distribution D^t(i) = w_i^t / Σ_k w_k^t and update w_i^{t+1} = w_i^t·(1−ε)^{M(i,j^t)/ρ} if M(i,j^t) ≥ 0, and w_i^{t+1} = w_i^t·(1+ε)^{−M(i,j^t)/ρ} if M(i,j^t) < 0.

8 The Generalized Algorithm – cont. The expected penalty for outcome j^t under distribution D^t is M(D^t, j^t) = Σ_i D^t(i)·M(i, j^t). The total expected loss after T rounds is Σ_{t=1}^T M(D^t, j^t).
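
A hedged Python sketch of the generalized update loop on slides 7–8 (the `choose_outcome` callback and all names are my own, not from the paper): the distribution D^t is proportional to the weights, and each expert's weight is multiplied by (1−ε)^{M(i,j)/ρ} or (1+ε)^{−M(i,j)/ρ} depending on the sign of its penalty.

```python
import numpy as np

def multiplicative_weights(M, choose_outcome, T, eps=0.1, rho=1.0):
    """Generic Multiplicative Weights Update loop.

    M:              n x m penalty matrix with entries in [-rho, rho];
                    M[i, j] is the penalty of expert i for outcome j
    choose_outcome: callback D -> j; the environment picks an outcome j
                    after seeing the current distribution D over the experts
    Returns the list of distributions D^1..D^T and the total expected loss.
    """
    n, m = M.shape
    w = np.ones(n)
    total_loss = 0.0
    dists = []
    for t in range(T):
        D = w / w.sum()                  # D^t(i) = w_i^t / sum_k w_k^t
        dists.append(D)
        j = choose_outcome(D)            # outcome j^t
        total_loss += D @ M[:, j]        # expected penalty M(D^t, j^t)
        x = M[:, j] / rho                # scale penalties into [-1, 1]
        w = w * np.where(x >= 0, (1 - eps) ** x, (1 + eps) ** (-x))
    return dists, total_loss
```

Theorem #2 below bounds `total_loss` against the cumulative penalty of every fixed expert.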

9 Approximation Theorem #2 Theorem #2:  Let ε ≤ 1/2. After T rounds, for every ‘expert’ i: Σ_t M(D^t, j^t) ≤ ρ·ln(n)/ε + (1+ε)·Σ_{t: M(i,j^t) ≥ 0} M(i, j^t) + (1−ε)·Σ_{t: M(i,j^t) < 0} M(i, j^t).

10 Theorem #2 - Proof Proof:  Define the potential function Φ^t = Σ_i w_i^t. By convexity of the exponential function: (1−ε)^x ≤ (1−εx) for x ∈ [0,1], and (1+ε)^{−x} ≤ (1−εx) for x ∈ [−1,0]. Hence Φ^{t+1} = Σ_i w_i^{t+1} ≤ Σ_i w_i^t·(1 − ε·M(i,j^t)/ρ) = Φ^t·(1 − ε·M(D^t,j^t)/ρ) ≤ Φ^t·exp(−ε·M(D^t,j^t)/ρ).

11 Proof of Theorem #2 – cont. Proof – cont.:  After T rounds: Φ^{T+1} ≤ n·exp(−(ε/ρ)·Σ_{t=1}^T M(D^t,j^t)).  For every i: Φ^{T+1} ≥ w_i^{T+1} = (1−ε)^{Σ_{≥0} M(i,j^t)/ρ}·(1+ε)^{−Σ_{<0} M(i,j^t)/ρ}.  Use: −ln(1−ε) ≤ ε+ε² and ln(1+ε) ≥ ε−ε² (valid for ε ≤ 1/2); taking logarithms and rearranging yields the bound of Theorem #2.

12 Corollaries Another possible weights update rule is w_i^{t+1} = w_i^t·(1 − ε·M(i,j^t)/ρ), the linear factor used as an upper bound in the proof above; it gives the same guarantees. For the error parameter δ > 0:  #3: If ε ≤ min{δ/4ρ, 1/2}, then after T = 2ρ·ln(n)/(εδ) rounds, the following bound on the average expected loss holds for every i: (1/T)·Σ_t M(D^t, j^t) ≤ δ + (1/T)·Σ_t M(i, j^t).  #4: If ε = min{δ/4ρ, 1/2}, then after T = 16ρ²·ln(n)/δ² rounds, the same bound on the average expected loss holds for every i.
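
As a quick sanity check of Corollaries #3/#4, this small helper (my own snippet, not part of the slides) computes ε and the number of rounds T from δ, ρ and n:

```python
import math

def mw_parameters(delta, rho, n):
    """eps and T as in Corollaries #3/#4: eps = min(delta/(4*rho), 1/2) and
    T = ceil(16 * rho**2 * ln(n) / delta**2), which make the average expected
    penalty come within delta of the best expert's average penalty."""
    eps = min(delta / (4 * rho), 0.5)
    T = math.ceil(16 * rho ** 2 * math.log(n) / delta ** 2)
    return eps, T

print(mw_parameters(0.1, 1.0, 100))   # -> (0.025, 7369)
```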

13 Gains instead of losses: In some situations, the entries of the matrix M may specify gains instead of losses. Now our goal is to gain as much as possible in comparison to the gain of the best expert. We can get an algorithm for this case simply by considering the matrix M’ = −M instead of M. The update rule becomes: w_i^{t+1} = w_i^t·(1+ε)^{M(i,j^t)/ρ} when M(i,j^t) ≥ 0, and w_i^{t+1} = w_i^t·(1−ε)^{−M(i,j^t)/ρ} when M(i,j^t) < 0.  #5: After T rounds, for every ‘expert’ i: Σ_t M(D^t, j^t) ≥ −ρ·ln(n)/ε + (1−ε)·Σ_{t: M(i,j^t) ≥ 0} M(i, j^t) + (1+ε)·Σ_{t: M(i,j^t) < 0} M(i, j^t).

14 Applications General guidelines Approximately Solving Linear Programs Approximately Solving Zero-Sum Games Machine Learning

15 General Guidelines for Applications Let each ‘expert’ represent one constraint. The penalty of an ‘expert’ is proportional to how well its constraint is satisfied by the current solution:  the update thus concentrates weight on ‘experts’ whose constraints are poorly satisfied.

16 Linear Programming (LP) Consider the feasibility problem: does there exist x ∈ P with Ax ≥ b? Consider c = Σ_i p_i·A_i and d = Σ_i p_i·b_i for some distribution p_1, p_2, …, p_m, and find x ∈ P with c^T x ≥ d (a Lagrangian relaxation): solve the single constraint c^T x ≥ d instead of Ax ≥ b.  We only need to check the feasibility of one constraint rather than m!  Using this “oracle”, the following algorithm either: yields an approximately feasible solution, i.e. A_i x ≥ b_i − δ for all i and some small δ > 0, or, failing that, proves that the system is infeasible. S. A. Plotkin, D. B. Shmoys, and E. Tardos. Fast approximation algorithms for fractional packing and covering problems. In Proceedings of the 32nd Annual IEEE Symposium on Foundations of Computer Science, pp. 495–504, 1991.

17 Lagrangian Relaxation

18 General Framework for LP Each of the m constraints is represented by an “expert”. The penalty of the “expert” corresponding to constraint i for event x is A_i x − b_i. Assume that the oracle’s responses x ∈ P satisfy A_i x − b_i ∈ [−ρ, ρ] for all i, for some known width parameter ρ; thus the penalties lie in [−ρ, ρ]. Run the Multiplicative Weights Update algorithm for T steps, using ε = δ/4ρ. The answer is the average of the oracle’s responses, x̄ = (1/T)·Σ_{t=1}^T x^t.

19 Algorithm Results for LP If the oracle returns a feasible x for c^T x ≥ d in every iteration, then the expected penalty c^T x^t − d is nonnegative in every round, and the bound on the average expected loss gives, for every i: 0 ≤ δ + A_i x̄ − b_i. Thus we have an approximately feasible solution, A_i x̄ ≥ b_i − δ. If in some iteration the oracle declares infeasibility of c^T x ≥ d, we conclude that the original system Ax ≥ b is infeasible (any feasible x would also satisfy this nonnegative combination of the constraints). The number of iterations T is proportional to ρ² (T = O(ρ²·ln(m)/δ²)).
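
Putting slides 16–19 together, here is a hedged Python sketch of the LP feasibility framework; the `oracle` callback, the parameter choices (ε = δ/4ρ and T as in Corollary #4) and all names are my own assumptions rather than code from the paper.

```python
import numpy as np

def mw_lp_feasibility(A, b, oracle, rho, delta):
    """Approximate feasibility of  Ax >= b, x in P  via Multiplicative Weights.

    A, b:   the m constraints A_i x >= b_i
    oracle: callback (c, d) -> some x in P with c @ x >= d, or None if no
            such x exists (then the original system is infeasible)
    rho:    width parameter, |A_i x - b_i| <= rho for every oracle answer
    delta:  target violation; on success the result satisfies A_i x >= b_i - delta
    """
    m = len(b)
    eps = delta / (4.0 * rho)                                  # as on slide 18
    T = int(np.ceil(16 * rho ** 2 * np.log(m) / delta ** 2))   # Corollary #4
    w = np.ones(m)
    xs = []
    for t in range(T):
        p = w / w.sum()                            # distribution over constraints
        c, d = p @ A, p @ b                        # Lagrangian relaxation
        x = oracle(c, d)
        if x is None:
            return None                            # Ax >= b is infeasible
        xs.append(x)
        # penalty of constraint i is (A_i x - b_i)/rho in [-1, 1]; violated
        # constraints (negative penalty) gain weight, satisfied ones lose it
        pen = np.clip((A @ x - b) / rho, -1.0, 1.0)
        w = w * np.where(pen >= 0, (1 - eps) ** pen, (1 + eps) ** (-pen))
    return np.mean(xs, axis=0)                     # x-bar, approximately feasible
```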

20 Packing and Covering Problems Set packing problem:  Suppose we have a finite set S and a list of subsets of S.  The set packing problem asks whether some k subsets in the list are pairwise disjoint (in other words, no two of them intersect).  Constraint for each element: Σ_{i: element ∈ S_i} x_i ≤ 1. Set covering problem:  Given a universe U and a family S of subsets of U, a cover C is a subfamily C ⊆ S of sets whose union is U.  Constraint for each element: Σ_{i: element ∈ S_i} x_i ≥ 1.

21 Fractional Set Packing and Covering For an error parameter ε > 0, a point x ∈ P is an approximate solution to:  the packing problem if Ax ≤ (1+ε)b  the covering problem if Ax ≥ (1−ε)b Reminder: the running time depends on ε⁻¹ and on the width ρ = max_i {max_{x ∈ P} (A_i x / b_i)}.

22 Multi-commodity Flow Problems The multi-commodity flow problem is a network flow problem with multiple commodities (goods) flowing through the network, each with its own source and sink nodes:  Capacity constraints  Flow conservation constraints  Demand satisfaction constraints (Figure: an example network with sources S1, S2 and sinks D1, D2.)

23 Multi-commodity Flow Problems Multi-commodity flow problems are captured by packing/covering LPs and thus can be approximately solved using the framework outlined above.  LP formulation: max Σ_p f_p, s.t. Σ_{p: e ∈ p} f_p ≤ c_e for every edge e  This requires solving k shortest-path problems (one per commodity) in each iteration.  Unfortunately, the running time then depends on the edge capacities (as opposed to the logarithm of the capacities), so the algorithm is not even polynomial-time. A modification:  Weight the edges with w_e  The “event” consists of routing along the chosen path p^t only as much flow as is allowed by its minimum-capacity edge (c_{p^t})  The penalty incurred by edge e is M(e, p^t) = c_{p^t}/c_e  Start with w_e^1 = δ, update w_e^{t+1} = w_e^t·(1+ε)^{c_{p^t}/c_e}, and terminate when some w_e^T ≥ 1.
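
Below is a rough, hedged Python sketch of the modified routine described on this slide, simplified to route each commodity along its current shortest path under the edge weights w_e. The graph representation, the Dijkstra helper, and the final rescaling step (dividing by the worst capacity violation so the flow becomes feasible) are my own simplifications, not the algorithm's exact analysis.

```python
import heapq
from collections import defaultdict

def shortest_path(edges, lengths, s, t):
    """Dijkstra over directed edges given as (u, v) pairs; lengths[i] is the
    length of edge i.  Returns the edge indices on a shortest s-t path, or
    None if t is unreachable."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
    dist, prev = {s: 0.0}, {}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, idx in adj[u]:
            nd = d + lengths[idx]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, idx
                heapq.heappush(pq, (nd, v))
    if t not in prev:
        return None
    path, node = [], t
    while node != s:
        path.append(prev[node])
        node = edges[prev[node]][0]
    return path[::-1]

def mw_flow(edges, cap, pairs, eps=0.1, delta=0.01, max_iter=1000):
    """Sketch of the modified routine: each round, route every commodity along
    its shortest path under the edge weights w_e, send the bottleneck capacity
    c_p of that path, and multiply every edge weight on the path by
    (1+eps)**(c_p / c_e).  Stop once some weight reaches 1, then scale the
    accumulated flow down so that every capacity constraint is respected."""
    m = len(edges)
    w = [delta] * m                       # w_e^1 = delta
    flow = [0.0] * m
    for _ in range(max_iter):
        if max(w) >= 1.0:                 # terminate when some w_e^T >= 1
            break
        for (s, t) in pairs:              # one shortest-path problem per commodity
            path = shortest_path(edges, w, s, t)
            if path is None:
                continue
            c_p = min(cap[e] for e in path)          # bottleneck capacity on the path
            for e in path:
                flow[e] += c_p
                w[e] *= (1 + eps) ** (c_p / cap[e])  # MW update on the edges
    violation = max(flow[e] / cap[e] for e in range(m))
    scale = max(violation, 1.0)           # divide by the worst capacity violation
    return [f / scale for f in flow]
```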

24 Applications in Zero-Sum Games: Approximating Solutions Approximating the zero-sum game value: let δ > 0 be an error parameter. We wish to find mixed row and column strategies D_final, P_final whose payoffs are within δ of the game value λ*. Each “expert” corresponds to a single pure row strategy; thus a distribution on the experts corresponds to a mixed row strategy. Y. Freund and R. E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79–103, 1999.

25 Applications in Zero-Sum Games: Approximating Solutions Set ε = δ/4 and run for T = 16·ln(n)/δ² iterations. By Corollary #4, for any mixed row strategy D: (1/T)·Σ_t M(D^t, j^t) ≤ δ + (1/T)·Σ_t M(D, j^t). Specifically, for the optimal strategy D* we have M(D*, j) ≤ λ* for any j, so the average penalty is at most λ* + δ. Thus you get a δ-approximation to the game value, taking D_final to be the mixed strategy D^t that achieved the best result.
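
A hedged Python sketch of this procedure (the best-responding column player, and returning the average of the D^t rather than the single best D^t as on the slide, are my own simplifications):

```python
import numpy as np

def approx_game_value(M, delta):
    """Approximate the value of a zero-sum game with penalty matrix M in [0,1].

    The row player (minimizer) runs Multiplicative Weights over its pure
    strategies; each round the column player best-responds.  Returns the
    averaged row strategy and the estimated game value.
    """
    n, _ = M.shape
    eps = delta / 4.0
    T = int(np.ceil(16 * np.log(n) / delta ** 2))
    w = np.ones(n)
    D_sum = np.zeros(n)
    round_values = []
    for t in range(T):
        D = w / w.sum()                    # current mixed row strategy D^t
        j = int(np.argmax(D @ M))          # column player's best response j^t
        round_values.append(D @ M[:, j])   # penalty paid this round
        w = w * (1 - eps) ** M[:, j]       # MW update (penalties lie in [0, 1])
        D_sum += D
    return D_sum / T, float(np.mean(round_values))

# tiny check: matching pennies has game value 0.5
M = np.array([[0.0, 1.0], [1.0, 0.0]])
D_final, value = approx_game_value(M, delta=0.05)
print(D_final, value)                      # roughly [0.5, 0.5] and ~0.5
```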

26 Applications in Machine Learning Boosting:  A machine-learning meta-algorithm for performing supervised learning. Boosting is based on the question posed by Kearns: can a set of weak learners be combined into a single strong learner? Adaptive Boosting (boosting by sampling):  Fix a training set of N examples: the “experts” correspond to the samples in the training set.  Repeatedly run a weak learning algorithm on different distributions defined on this set: the “events” correspond to the set of all hypotheses that can be generated by the weak learning algorithm.  The penalty for expert x is 1 or 0 depending on whether h(x) = c(x), so correctly classified examples lose weight and later rounds focus on the examples that are still misclassified.  The final hypothesis has error at most δ under the uniform distribution on the training set. Yoav Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, August 1997.
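
A hedged Python sketch of the boosting-by-sampling view (the `weak_learner` callback, the fixed ε, and the plain majority vote are my own simplifications; AdaBoost itself uses adaptive, round-dependent weights both in the update and in the final vote):

```python
import numpy as np

def boost_by_sampling(X, y, weak_learner, T, eps=0.25):
    """Boosting viewed as Multiplicative Weights over the training examples.

    X, y:         training examples and their +/-1 labels c(x)
    weak_learner: callback (X, y, D) -> hypothesis h, where h(X) returns +/-1
                  predictions and h does better than random under distribution D
    Each round, correctly classified examples are 'penalized' (their weight
    shrinks), so later rounds concentrate on the examples that are still
    misclassified.  The final hypothesis is a plain majority vote.
    """
    n = len(y)
    w = np.ones(n)
    hypotheses = []
    for t in range(T):
        D = w / w.sum()                    # distribution over training examples
        h = weak_learner(X, y, D)
        hypotheses.append(h)
        correct = (h(X) == y)              # penalty 1 if h(x) = c(x), else 0
        w[correct] *= (1 - eps)
    def final_hypothesis(X_new):
        votes = np.sum([h(X_new) for h in hypotheses], axis=0)
        return np.where(votes >= 0, 1, -1)
    return final_hypothesis
```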

27 Summary, Conclusions, Way Ahead An approximation algorithm for difficult constrained optimization problems was presented. Several convergence theorems were presented. Several application areas for approximation were mentioned:  Combinatorial optimization  Game theory  Machine learning We believe this methodology can be used for solving constrained optimization problems through the Cross-Entropy method.