Much Faster Algorithms for Matrix Scaling

Presentation transcript:

Much Faster Algorithms for Matrix Scaling
Zeyuan Allen-Zhu, Yuanzhi Li, Rafael Oliveira, Avi Wigderson

Matrix Scaling and Balancing via Box-Constrained Newton’s Method and Interior Point Methods
Michael Cohen, Aleksander Mądry, Dimitris Tsipras, Adrian Vladu

Matrix Scaling: given a nonnegative matrix A and target vectors r, c, find positive diagonal matrices X, Y so that M = XAY has the prescribed row and column sums, i.e. M1 = r and Mᵀ1 = c.

Matrix Balancing: given a nonnegative matrix A, find a positive diagonal matrix X so that M = XAX⁻¹ has equal row and column sums, i.e. M1 = Mᵀ1.

(The slide illustrates both definitions with small 2×2 numerical examples.)
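A minimal NumPy sketch of these two conditions (not code from the talk; the function names and the diagonal scalings x, y are illustrative):

```python
import numpy as np

def is_scaled(A, x, y, r, c, tol=1e-8):
    """Matrix scaling condition: M = X A Y has row sums r and column sums c."""
    M = np.diag(x) @ A @ np.diag(y)
    return np.allclose(M.sum(axis=1), r, atol=tol) and np.allclose(M.sum(axis=0), c, atol=tol)

def is_balanced(A, x, tol=1e-8):
    """Matrix balancing condition: M = X A X^(-1) has equal row and column sums."""
    M = np.diag(x) @ A @ np.diag(1.0 / x)
    return bool(np.allclose(M.sum(axis=1), M.sum(axis=0), atol=tol))
```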

Why Care?

- Preconditioning linear systems: A z = b becomes (XAY) Y⁻¹z = Xb.
- Approximating the permanent of nonnegative matrices: Per(A) = Per(XAY) / (Per(X) Per(Y)), and if XAY is doubly stochastic then exp(-n) ≤ Per(XAY) ≤ 1.
- Detecting perfect matchings: if A is the adjacency matrix of a bipartite graph, a perfect matching exists ⟺ Per(A) ≠ 0.

Why Care?

- Intensively studied in the scientific computing literature: [Wilkinson ’59], [Osborne ’60], [Sinkhorn ’64], [Parlett, Reinsch ’69], [Kalantari, Khachiyan ’15], [Schulman, Sinclair ’15], …
- Matrix balancing routines are implemented in MATLAB and R.
- Generalizations (operator scaling) are related to Polynomial Identity Testing: [Gurvits ’04], [Garg, Gurvits, Oliveira, Wigderson ’17], …

Speaker note: Wilkinson, numerical analysis.

Generalized Matrix Balancing via Convex Optimization

Captures the problem’s difficulty; solves matrix scaling via a simple reduction.

Set M = exp(X) A exp(-X), with row sums rM = M1 and column sums cM = Mᵀ1.
Goal: rM - cM = d.

This goal is the stationarity condition of a nice convex function:
f(x) = ∑ij Aij exp(xi - xj) - ∑i di xi,   ∇f(x) = rM - cM - d.
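A short NumPy sketch of this objective and its gradient, assuming a square nonnegative matrix A and a target vector d (illustrative function names, not code from the papers):

```python
import numpy as np

def balancing_objective(A, d, x):
    """f(x) = sum_ij A_ij exp(x_i - x_j) - sum_i d_i x_i."""
    M = np.exp(x)[:, None] * A * np.exp(-x)[None, :]   # M = exp(X) A exp(-X)
    return M.sum() - d @ x

def balancing_gradient(A, d, x):
    """grad f(x) = r_M - c_M - d, the residual of the generalized balancing condition."""
    M = np.exp(x)[:, None] * A * np.exp(-x)[None, :]
    return M.sum(axis=1) - M.sum(axis=0) - d
```

Driving this gradient to zero is exactly reaching the goal rM - cM = d.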

Equivalent Nonlinear Flow Problem

Ohm’s Law: fuv = Auv (xu - xv)
“Nonlinear Ohm’s Law”: fuv = Auv exp(xu - xv)

(The slide shows a small example network with vertex potentials, edge flows, and demands at s and t; edge weights play the role of capacitances.)

Speaker note: For those of you who like graph problems, there is a very nice interpretation of this generalized matrix balancing problem. I place electric potentials on the graph’s vertices. These potentials induce flows according to Ohm’s Law, and these flows route the demand. If instead I replace Ohm’s Law with this nonlinear law, where the flow on an edge is proportional to the exponential of the potential difference across it, then I obtain a problem that is equivalent to matrix balancing. This sort of intuition is actually very useful for deriving some of these results.
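To make the equivalence concrete, a small sketch (illustrative names) showing that under the nonlinear Ohm’s law the net flow leaving each vertex equals rM - cM, so routing a demand d is the same as driving the gradient of f to zero:

```python
import numpy as np

def nonlinear_flows(A, x):
    """Edge flows under the 'nonlinear Ohm's law': f_uv = A_uv * exp(x_u - x_v)."""
    return A * np.exp(x)[:, None] * np.exp(-x)[None, :]

def net_outflow(A, x):
    """Net flow leaving each vertex; equals r_M - c_M for M = exp(X) A exp(-X)."""
    F = nonlinear_flows(A, x)
    return F.sum(axis=1) - F.sum(axis=0)
```

Potentials x route the demand d exactly when net_outflow(A, x) equals d.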

Generalized Matrix Balancing via Convex Optimization (recap)

Captures the difficulty of both problems; solves matrix scaling via a simple reduction.

M = exp(X) A exp(-X), rM = M1, cM = Mᵀ1.
Exact goal: rM - cM = d.  Approximate goal: |rM - cM - d| ≤ ε.

f(x) = nice convex function, ∇f(x) = rM - cM - d.

General Convex Optimization Framework

f(x) = nice convex function, ∇f(x) = rM - cM - d.
Taylor expansion: f(x + Δ) = f(x) + ∇f(x)ᵀΔ + ½ ΔᵀHxΔ + …

First-order methods: Δ = arg min over |Δ| ≤ c of the first-order approximation.
Sinkhorn/Osborne iterations are instantiations of this framework (coordinate descent); for matrix balancing, O(m + nε^{-2}) [Ostrovsky, Rabani, Yousefi ’17].

Second-order methods: Δ = arg min over |Δ| ≤ c of the second-order approximation.
Previous bound: Õ(n^4 log ε^{-1}) [Kalantari, Khachiyan, Shokoufandeh ’97].
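As an illustration of the coordinate-descent view, a minimal Sinkhorn sketch for the doubly stochastic case, assuming a square, strictly positive A; this is the classical iteration, not the accelerated methods of the talk:

```python
import numpy as np

def sinkhorn(A, num_iters=1000, tol=1e-9):
    """Alternately rescale rows and columns of a positive square matrix A
    so that diag(x) A diag(y) becomes (approximately) doubly stochastic."""
    n = A.shape[0]
    x = np.ones(n)   # diagonal of X
    y = np.ones(n)   # diagonal of Y
    for _ in range(num_iters):
        M = x[:, None] * A * y[None, :]
        x /= M.sum(axis=1)               # make row sums 1
        M = x[:, None] * A * y[None, :]
        y /= M.sum(axis=0)               # make column sums 1
        M = x[:, None] * A * y[None, :]
        if np.abs(M.sum(axis=1) - 1.0).max() < tol:
            break
    return x, y
```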

Our Results [AZLOW ’17], [CMTV ’17]

First-order methods: Accelerated Gradient Descent, O(m n^{1/3} ε^{-2/3}).
Second-order methods: Interior Point Method, Õ(m^{3/2} log ε^{-1}); Box-Constrained Newton Method, a new second-order framework, with running time Õ((m + n^{4/3}) log κ(X*)) in [AZLOW ’17] and Õ(m log κ(X*)) in [CMTV ’17].

κ(X*) = condition number of the matrix that yields perfect balancing.

Speaker note: In the two papers we tackle both of these types of methods, but the coolest result, which appears in both works, is a new framework for second-order optimization that we call the box-constrained Newton method (and it is essentially identical in both papers).

Generalized Matrix Balancing via Convex Optimization

Can we use second-order information to obtain a good solution in few iterations?

M = exp(X) A exp(-X), rM = M1, cM = Mᵀ1.
f(x) = nice convex function, ∇f(x) = rM - cM - d.
Hessian: Hx = diag(rM + cM) - (M + Mᵀ).

f(x + Δ) ≈ f(x) + ∇f(x)ᵀΔ + ½ ΔᵀHxΔ   (*)

The Hessian matrix is a graph Laplacian, so Hx⁻¹b can be computed in Õ(m) time [Spielman-Teng ’08, …].
If |Δ|∞ ≤ 1 then Hx ≈O(1) Hx+Δ (the Hessians are spectrally equivalent up to constant factors).

(*) Valid whenever the Hessian does not change too much along the line between x and x+Δ.
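A sketch of this Hessian and of the linear solve it requires; the dense pseudoinverse below is only a stand-in for the nearly-linear-time Laplacian solvers the slide cites, and the function names are illustrative:

```python
import numpy as np

def balancing_hessian(A, x):
    """H_x = diag(r_M + c_M) - (M + M^T): symmetric with zero row sums, i.e. a graph Laplacian."""
    M = np.exp(x)[:, None] * A * np.exp(-x)[None, :]
    r, c = M.sum(axis=1), M.sum(axis=0)
    return np.diag(r + c) - (M + M.T)

def solve_hessian(H, b):
    """Solve H y = b for b orthogonal to the all-ones vector.
    A fast Laplacian solver does this in roughly O(m) time; here we just use a pseudoinverse."""
    return np.linalg.pinv(H) @ b
```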

Box-Constrained Newton’s Method

f(x + Δ) ≈ f(x) + ∇f(x)ᵀΔ + ½ ΔᵀHxΔ
Key idea: if |Δ|∞ ≤ 1 then Hx ≈O(1) Hx+Δ.

Suppose we can exactly minimize the second-order approximation over |Δ|∞ ≤ 1.
Goal: show that moving to the minimizer inside the box makes a lot of progress,
f(x) - f(x+Δ) ≥ (1/10) (f(x) - f(x+Δ*)),
where Δ is the minimizer of the quadratic approximation in the L∞ region and Δ* is the minimizer of f in the L∞ region.

Box-Constrained Newton’s Method

R∞ = max over x with f(x) ≤ f(x0) of |x - x*|∞, an absolute upper bound on how far we ever are from the optimum in L∞.

By convexity, the best step inside the unit L∞ box recovers at least a 1/|x - x*|∞ fraction of the remaining gap:
f(x) - f(x+Δ*) ≥ (f(x) - f(x*)) / |x - x*|∞ ≥ (f(x) - f(x*)) / R∞.

Hence we get arbitrarily close to x* in Õ(R∞) iterations.

Box-Constrained Newton’s Method (recap)

R∞ = max over x with f(x) ≤ f(x0) of |x - x*|∞.
f(x + Δ) ≈ f(x) + ∇f(x)ᵀΔ + ½ ΔᵀHxΔ, and if |Δ|∞ ≤ 1 then Hx ≈O(1) Hx+Δ.

Suppose we can exactly minimize the second-order approximation over |Δ|∞ ≤ 1. Each step satisfies f(x) - f(x+Δ) ≥ (1/10)(f(x) - f(x+Δ*)), with Δ the minimizer of the quadratic approximation and Δ* the minimizer of f in the L∞ region, so Õ(R∞) box-constrained quadratic minimizations suffice.

Box-Constrained Newton’s Method with a k-oracle

Same setup: f(x + Δ) ≈ f(x) + ∇f(x)ᵀΔ + ½ ΔᵀHxΔ, and Hx ≈O(1) Hx+Δ whenever |Δ|∞ ≤ 1.

Exactly minimizing the second-order approximation over |Δ|∞ ≤ 1 would give Õ(R∞) box-constrained quadratic minimizations, but it is unclear how to solve this subproblem fast. Instead, relax the L∞ constraint by a factor of k and outsource the subproblem to a k-oracle; this yields Õ(kR∞) box-constrained quadratic minimizations.
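A compact sketch of the resulting outer loop under stated assumptions. SciPy’s bound-constrained L-BFGS-B is used purely as a stand-in for the exact box-constrained quadratic minimizer / k-oracle (the papers use the nearly-linear-time constructions cited on the next slide); all names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def box_newton_balance(A, d, x0, outer_iters=50, box=1.0):
    """Box-constrained Newton sketch for min_x f(x) = sum_ij A_ij exp(x_i - x_j) - d.x."""
    x = x0.copy()
    n = len(x)
    for _ in range(outer_iters):
        M = np.exp(x)[:, None] * A * np.exp(-x)[None, :]
        g = M.sum(axis=1) - M.sum(axis=0) - d                     # gradient r_M - c_M - d
        H = np.diag(M.sum(axis=1) + M.sum(axis=0)) - (M + M.T)    # Laplacian Hessian
        # Minimize the quadratic model g.D + 0.5 D.H.D over the L_inf box |D|_inf <= box.
        quad = lambda delta: g @ delta + 0.5 * delta @ H @ delta
        grad = lambda delta: g + H @ delta
        res = minimize(quad, np.zeros(n), jac=grad, method="L-BFGS-B",
                       bounds=[(-box, box)] * n)
        x = x + res.x
    return x
```

With a genuine k-oracle the returned step only satisfies |Δ|∞ ≤ k, which is why the iteration count becomes Õ(kR∞).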

The k-oracle

Input: a graph Laplacian L and a vector b.
Ideally: output the exact minimizer of the box-constrained quadratic model (defined by L and b) over |Δ|∞ ≤ 1.
Instead: output some Δ with |Δ|∞ ≤ k that makes comparable progress on the quadratic model.

[AZLOW ’17]: based on the approximate max-flow algorithm of [CKMST ’11], Õ(m + n^{4/3}) time.
[CMTV ’17]: based on the Laplacian solver of [LPS ’15], Õ(m) time.

Conclusions and Future Outlook

- Nearly-linear time algorithms for matrix scaling and balancing.
- New framework for second-order optimization: used Hessian smoothness while avoiding self-concordance. Can we use any of these ideas for faster interior point methods?
- The dependence on the condition number, log κ(X*), is given by the R∞ bound. If we want to detect perfect matchings, R∞ = Θ(n). Is there a way to improve this dependence, e.g. to (log κ(X*))^{1/2}?
- We saw an extension of Laplacian solving. What else is there? Better primitives for convex optimization?

(Speaker note: add a slide before the conclusion; in particular we improved a lot of other problems.)

Thank You!