Much Faster Algorithms for Matrix Scaling
  Zeyuan Allen-Zhu, Yuanzhi Li, Rafael Oliveira, Avi Wigderson
Matrix Scaling and Balancing via Box-Constrained Newton’s Method and Interior Point Methods
  Michael Cohen, Aleksander Mądry, Dimitris Tsipras, Adrian Vladu
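Matrix Scaling
  Given A and target vectors r, c, find diagonal X, Y such that M = X A Y satisfies M 1 = r and M^T 1 = c
Matrix Balancing
  Given A, find diagonal X such that M = X A X^-1 satisfies M 1 = M^T 1
  [Figure: small numerical examples of X A Y = M and X A X^-1 = M]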
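
The balancing definition above can be checked numerically. This is a minimal sketch assuming NumPy; the 2×2 matrix, the scaling vector, and the helper names `balance_with` and `is_balanced` are illustrative choices, not the example shown on the slide.

```python
import numpy as np

def balance_with(A, x):
    """Return M = X A X^{-1} for X = diag(x), i.e. M[i, j] = x[i] * A[i, j] / x[j]."""
    x = np.asarray(x, dtype=float)
    return (x[:, None] * A) / x[None, :]

def is_balanced(M, tol=1e-12):
    """True when every row sum equals the matching column sum (M 1 = M^T 1)."""
    return np.allclose(M.sum(axis=1), M.sum(axis=0), atol=tol)

# Illustrative input (an assumption, not taken from the slide).
A = np.array([[0.0, 4.0],
              [1.0, 0.0]])
M = balance_with(A, [1.0, 2.0])

print(M)               # both off-diagonal entries become 2
print(is_balanced(A))  # False: row sums (4, 1) differ from column sums (1, 4)
print(is_balanced(M))  # True
```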
Why Care?
  Preconditioning linear systems
    A z = b  becomes  (XAY) Y^-1 z = X b
  Approximating the permanent of nonnegative matrices
    Per(A) = Per(XAY) / (Per(X) Per(Y))
    XAY doubly stochastic: exp(-n) ≤ Per(XAY) ≤ 1
  Detecting perfect matchings
    A: adjacency matrix of a bipartite graph
    ∃ perfect matching ⇔ Per(A) ≠ 0
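
The two identities above (the permanent under diagonal scaling, and the preconditioned linear system) can be sanity-checked on a small instance. This is a hedged sketch assuming NumPy; the brute-force `permanent` helper and the random test data are assumptions made for illustration and are only practical for tiny n.

```python
import itertools
import numpy as np

def permanent(A):
    """Brute-force permanent: sum over all permutations (O(n!), tiny n only)."""
    n = A.shape[0]
    return sum(
        np.prod([A[i, p[i]] for i in range(n)])
        for p in itertools.permutations(range(n))
    )

rng = np.random.default_rng(0)
n = 4
A = rng.uniform(0.1, 1.0, size=(n, n))   # illustrative nonnegative matrix
x = rng.uniform(0.5, 2.0, size=n)        # diagonal of X
y = rng.uniform(0.5, 2.0, size=n)        # diagonal of Y
scaled = x[:, None] * A * y[None, :]     # X A Y

# Per(A) = Per(XAY) / (Per(X) Per(Y)); the permanent of a diagonal
# matrix is the product of its diagonal entries.
lhs = permanent(A)
rhs = permanent(scaled) / (np.prod(x) * np.prod(y))
print(np.isclose(lhs, rhs))

# Preconditioning: solve (XAY) w = X b, then recover z = Y w.
b = rng.uniform(size=n)
w = np.linalg.solve(scaled, x * b)
z = y * w
print(np.allclose(A @ z, b))

# The uniform doubly stochastic matrix respects exp(-n) <= Per <= 1.
J = np.full((n, n), 1.0 / n)
print(np.exp(-n) <= permanent(J) <= 1.0)
```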
Why Care?
  Intensively studied in the scientific computing literature:
    [Wilkinson ’59], [Osborne ’60], [Sinkhorn ’64], [Parlett, Reinsch ’69], [Kalantari, Khachiyan ’15], [Schulman, Sinclair ’15], …
  Matrix balancing routines are implemented in MATLAB and R
  Generalizations (operator scaling) are related to Polynomial Identity Testing:
    [Gurvits ’04], [Garg, Gurvits, Oliveira, Wigderson ’17], …
  Speaker note: Wilkinson, numerical analysis.
Generalized Matrix Balancing via Convex Optimization
  M = exp(X) A exp(-X),  r_M = M 1,  c_M = M^T 1
  Goal: r_M - c_M = d  (d = 0 is plain matrix balancing)
  f(x) = ∑_ij A_ij exp(x_i - x_j) - ∑_i d_i x_i   (nice convex function)
  ∇f(x) = r_M - c_M - d
  Captures the problem’s difficulty
  Solves matrix scaling via simple reduction
  [Figure: numerical example of exp(X) A exp(-X) = M]
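
The objective and its gradient can be written down directly from the formulas on this slide. The sketch below assumes NumPy and random illustrative data; the function names `objective` and `gradient` are hypothetical. It compares the closed-form gradient r_M - c_M - d against central finite differences.

```python
import numpy as np

def objective(A, d, x):
    """f(x) = sum_ij A_ij exp(x_i - x_j) - sum_i d_i x_i."""
    M = A * np.exp(x[:, None] - x[None, :])   # M = exp(X) A exp(-X)
    return M.sum() - d @ x

def gradient(A, d, x):
    """grad f(x) = r_M - c_M - d, with r_M = M 1 and c_M = M^T 1."""
    M = A * np.exp(x[:, None] - x[None, :])
    return M.sum(axis=1) - M.sum(axis=0) - d

rng = np.random.default_rng(1)
n = 5
A = rng.uniform(0.1, 1.0, size=(n, n))   # illustrative nonnegative matrix
d = rng.normal(size=n)
d -= d.mean()                            # demands sum to zero
x = 0.5 * rng.normal(size=n)

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
fd = np.array([
    (objective(A, d, x + eps * e) - objective(A, d, x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])
print(np.max(np.abs(fd - gradient(A, d, x))))   # small discretization error
```

At a point where `gradient` vanishes, the scaled matrix M satisfies the goal r_M - c_M = d exactly.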
Equivalent Nonlinear Flow Problem
  Ohm’s Law: f_uv = A_uv (x_u - x_v)
  “Nonlinear Ohm’s Law”: f_uv = A_uv exp(x_u - x_v)
  * edge weights = capacitances
  [Figure: example graph with source s and sink t, showing edge weights, vertex potentials, and the induced flows]
  Speaker notes: For those of you who like graph problems, there is a very nice interpretation of this generalized matrix balancing problem. I'm going to place electric potentials on the graph's vertices. These potentials induce flows according to Ohm's law, and these flows route the demand. If I instead replace Ohm's law with this nonlinear law, where the flows are proportional to the exponential of the change in potential across edges, then I obtain a problem that is equivalent to matrix balancing. This sort of intuition is actually very useful for deriving some of these results.
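
The flow interpretation can be made concrete on a toy graph. This is a sketch under stated assumptions: NumPy, a 3-vertex weighted graph and demand vector invented for illustration, and a dense pseudo-inverse standing in for a fast Laplacian solver. It computes potentials whose linear (Ohm's law) flows route the demand, and defines the net outflow under the nonlinear law, which equals r_M - c_M.

```python
import numpy as np

# Illustrative undirected graph: symmetric edge weights and a demand vector.
W = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
d = np.array([1.0, 0.0, -1.0])           # demands sum to zero

# Linear case: potentials solve the Laplacian system L x = d.
L = np.diag(W.sum(axis=1)) - W
x = np.linalg.pinv(L) @ d                # dense stand-in for a Laplacian solver
flow = W * (x[:, None] - x[None, :])     # Ohm's law: f_uv = W_uv (x_u - x_v)
net_out = flow.sum(axis=1)               # net flow leaving each vertex
print(np.allclose(net_out, d))           # the flows route the demand

def nonlinear_net_outflow(A, x):
    """Net outflow under f_uv = A_uv exp(x_u - x_v) on each directed edge u -> v.

    Outgoing minus incoming flow at every vertex, which is r_M - c_M
    for M = exp(X) A exp(-X)."""
    F = A * np.exp(x[:, None] - x[None, :])
    return F.sum(axis=1) - F.sum(axis=0)

# With symmetric weights and equal potentials the nonlinear flows cancel.
print(nonlinear_net_outflow(W, np.zeros(3)))
```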
Generalized Matrix Balancing via Convex Optimization
  M = exp(X) A exp(-X),  r_M = M 1,  c_M = M^T 1
  Goal (exact): r_M - c_M = d
  Goal (approximate): |r_M - c_M - d| ≤ ε
  f(x) = nice convex function
  ∇f(x) = r_M - c_M - d
  Captures difficulty of both problems
  Solves matrix scaling via simple reduction
  [Figure: numerical example of exp(X) A exp(-X) = M]
Generalized Matrix Balancing via Convex Optimization
  f(x) = nice convex function,  ∇f(x) = r_M - c_M - d
  General convex optimization framework:
    f(x + Δ) = f(x) + ∇f(x)^T Δ + ½ Δ^T H_x Δ + …
    First order methods:  Δ = arg min_{|Δ| ≤ c} …
    Second order methods: Δ = arg min_{|Δ| ≤ c} …
  [Ostrovsky, Rabani, Yousefi ’17] Matrix balancing: O(m + n ε^-2)
  Sinkhorn/Osborne iterations are instantiations of this framework (coordinate descent)
  [Kalantari, Khachiyan, Shokoufandeh ’97] Õ(n^4 log ε^-1)
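
To illustrate the remark that Osborne iterations are coordinate descent on f, here is a minimal sketch of the classical cyclic Osborne update for the case d = 0. It assumes NumPy and a small matrix with positive off-diagonal entries; the function name, the sweep count, and the test matrix are illustrative assumptions, and the per-step recomputation of M is chosen for clarity rather than speed.

```python
import numpy as np

def osborne_balance(A, sweeps=200):
    """Cyclic coordinate descent on f(x) = sum_ij A_ij exp(x_i - x_j), d = 0.

    Restricted to coordinate i, f changes as r_i e^t + c_i e^{-t} (plus a
    constant), where r_i and c_i are the off-diagonal row and column sums
    of M. The exact minimizer is t = log(c_i / r_i) / 2."""
    n = A.shape[0]
    off = A - np.diag(np.diag(A))        # diagonal entries never affect balance
    x = np.zeros(n)
    for _ in range(sweeps):
        for i in range(n):
            M = off * np.exp(x[:, None] - x[None, :])
            r_i, c_i = M[i].sum(), M[:, i].sum()
            x[i] += 0.5 * np.log(c_i / r_i)
    return x

# Illustrative matrix with positive off-diagonal entries (an assumption).
A = np.array([[0.0, 4.0, 1.0],
              [1.0, 0.0, 2.0],
              [3.0, 1.0, 0.0]])
x = osborne_balance(A)
M = A * np.exp(x[:, None] - x[None, :])
print(np.max(np.abs(M.sum(axis=1) - M.sum(axis=0))))   # imbalance after balancing
```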
Our Results  [AZLOW ’17], [CMTV ’17]
  First order methods
    Accelerated gradient descent: O(m n^(1/3) ε^(-2/3))
  Second order methods
    Interior point method: Õ(m^(3/2) log ε^-1)
    Box-constrained Newton method (new second-order framework, essentially identical in both papers)
      [AZLOW ’17]: Õ((m + n^(4/3)) log κ(X*))
      [CMTV ’17]:  Õ(m log κ(X*))
  κ(X*) = condition number of the matrix that yields perfect balancing
  Speaker notes: In the two papers we tackle both of these types of methods, but the coolest result, which appears in both works, is a new framework for second-order optimization that we call the box-constrained Newton method. It is essentially identical in both papers.
Generalized Matrix Balancing via Convex Optimization
  Can we use second order information to obtain a good solution in few iterations?
  M = exp(X) A exp(-X),  r_M = M 1,  c_M = M^T 1
  f(x) = nice convex function
  ∇f(x) = r_M - c_M - d
  H_x = diag(r_M + c_M) - (M + M^T)
  f(x + Δ) ≈ f(x) + ∇f(x)^T Δ + ½ Δ^T H_x Δ   (*)
  (*) whenever the Hessian does not change too much along the line between x and x + Δ
  If |Δ|_∞ ≤ 1 then H_x ≈_O(1) H_{x+Δ}
  The Hessian matrix is a graph Laplacian: can compute H_x^-1 b in Õ(m) time [Spielman-Teng ’08, …]
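
The Hessian formula and the stability claim can be checked on a small instance. This sketch assumes NumPy and random illustrative data; the function names are hypothetical. The explicit constants e^-2 and e^2 are a worked-out instance of the O(1) on the slide: when |Δ|_∞ ≤ 1, each exponent x_i - x_j moves by at most 2, so every edge weight of the Laplacian changes by a factor between e^-2 and e^2.

```python
import numpy as np

def scaled_matrix(A, x):
    return A * np.exp(x[:, None] - x[None, :])   # M = exp(X) A exp(-X)

def gradient(A, d, x):
    M = scaled_matrix(A, x)
    return M.sum(axis=1) - M.sum(axis=0) - d

def hessian(A, x):
    """H_x = diag(r_M + c_M) - (M + M^T)."""
    M = scaled_matrix(A, x)
    return np.diag(M.sum(axis=1) + M.sum(axis=0)) - (M + M.T)

rng = np.random.default_rng(2)
n = 5
A = rng.uniform(0.1, 1.0, size=(n, n))   # illustrative data
d = np.zeros(n)
x = 0.5 * rng.normal(size=n)
H = hessian(A, x)

# Laplacian structure: symmetric, rows sum to zero, off-diagonals nonpositive.
off_diag = H - np.diag(np.diag(H))
print(np.allclose(H, H.T), np.allclose(H.sum(axis=1), 0.0), np.all(off_diag <= 0))

# The formula agrees with finite differences of the gradient.
eps = 1e-6
fd = np.array([
    (gradient(A, d, x + eps * e) - gradient(A, d, x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])
print(np.allclose(fd, H, atol=1e-5))

# Stability inside the unit box: e^-2 H_x <= H_{x+Delta} <= e^2 H_x (PSD order).
delta = rng.uniform(-1.0, 1.0, size=n)
H2 = hessian(A, x + delta)
low = np.linalg.eigvalsh(H2 - np.exp(-2.0) * H).min()
high = np.linalg.eigvalsh(np.exp(2.0) * H - H2).min()
print(low >= -1e-9, high >= -1e-9)
```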
Box-Constrained Newton’s Method
  f(x + Δ) ≈ f(x) + ∇f(x)^T Δ + ½ Δ^T H_x Δ
  Key idea: if |Δ|_∞ ≤ 1 then H_x ≈_O(1) H_{x+Δ}
  Suppose we can exactly minimize the second order approximation over |Δ|_∞ ≤ 1
  Goal: show that moving to the minimizer inside the box makes a lot of progress:
    f(x) - f(x + Δ) ≥ 1/10 (f(x) - f(x + Δ*))
    Δ  = minimizer of the quadratic approximation in the L∞ region
    Δ* = minimizer of f in the L∞ region
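Box-Constrained Newton’s Method
  R∞ = max_{x : f(x) ≤ f(x0)} |x - x*|_∞
  f(•) - f(•) ≥ f(•) - f(•)
  f(•) - f(•) ≥ (f(•) - f(•)) / |• - •|_∞
  [The points • were colored markers in the slide figure and are not recoverable from the text]
  |• - •|_∞ has absolute upper bound R∞
  Arbitrarily close to the optimum in Õ(R∞) iterations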
Box-Constrained Newton’s Method
  R∞ = max_{x : f(x) ≤ f(x0)} |x - x*|_∞
  f(x + Δ) ≈ f(x) + ∇f(x)^T Δ + ½ Δ^T H_x Δ
  Key idea: if |Δ|_∞ ≤ 1 then H_x ≈_O(1) H_{x+Δ}
  Suppose we can exactly minimize the second order approximation over |Δ|_∞ ≤ 1
    f(x) - f(x + Δ) ≥ 1/10 (f(x) - f(x + Δ*))
    Δ  = minimizer of the quadratic approximation in the L∞ region
    Δ* = minimizer of f in the L∞ region
  Õ(R∞) box-constrained quadratic minimizations
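
The outer loop of this method can be sketched on a toy instance. The code below is an illustration under loudly stated assumptions, not the algorithm of either paper: it assumes NumPy, a tiny dense matrix, and d = 0. The inner solver is plain projected gradient descent on the box, swapped in as a stand-in for exact box-constrained minimization. The quadratic term is scaled by e^2, a concrete choice of the O(1) Hessian-stability constant, so that the model upper-bounds f inside the box and every step decreases f.

```python
import numpy as np

def f_grad_hess(A, d, x):
    """Objective, gradient r_M - c_M - d, and Laplacian Hessian at x."""
    M = A * np.exp(x[:, None] - x[None, :])
    r, c = M.sum(axis=1), M.sum(axis=0)
    return M.sum() - d @ x, r - c - d, np.diag(r + c) - (M + M.T)

def box_quadratic_step(g, H, inner_iters=500):
    """Approximately minimize g^T D + (e^2 / 2) D^T H D over |D|_inf <= 1.

    Projected gradient descent with step 1/lambda_max: a simple stand-in
    solver (an assumption), started from D = 0 so the model value never
    rises above zero."""
    Hs = np.exp(2.0) * H
    step = 1.0 / max(np.linalg.eigvalsh(Hs)[-1], 1e-12)
    D = np.zeros_like(g)
    for _ in range(inner_iters):
        D = np.clip(D - step * (g + Hs @ D), -1.0, 1.0)
    return D

def box_newton(A, d, outer_iters=150):
    """Repeat: build the quadratic model at x, move to its box minimizer."""
    x = np.zeros(A.shape[0])
    values = []
    for _ in range(outer_iters):
        f, g, H = f_grad_hess(A, d, x)
        values.append(f)
        x = x + box_quadratic_step(g, H)
    return x, values

# Illustrative instance (an assumption): positive off-diagonals, d = 0.
A = np.array([[0.0, 4.0, 1.0],
              [1.0, 0.0, 2.0],
              [3.0, 1.0, 0.0]])
d = np.zeros(3)
x, values = box_newton(A, d)
_, g_final, _ = f_grad_hess(A, d, x)
print(np.max(np.abs(g_final)))   # remaining imbalance |r_M - c_M - d|
```

The e^2 damping keeps the sketch simple and safe but makes each step conservative; it is a design choice for this illustration only.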
Box-Constrained Newton’s Method
  R∞ = max_{x : f(x) ≤ f(x0)} |x - x*|_∞
  f(x + Δ) ≈ f(x) + ∇f(x)^T Δ + ½ Δ^T H_x Δ
  Key idea: if |Δ|_∞ ≤ 1 then H_x ≈_O(1) H_{x+Δ}
  Suppose we can exactly minimize the second order approximation over |Δ|_∞ ≤ 1
    Õ(R∞) box-constrained quadratic minimizations
    Unclear how to solve this fast
  Instead, relax the L∞ constraint by a factor of k and outsource to a k-oracle
    Õ(k R∞) box-constrained quadratic minimizations
k-oracle
  Input: graph Laplacian L, vector b
  Ideally: output [formula missing from the slide text]
  Instead: output [formula missing from the slide text]
  [AZLOW ’17]: based on approximate max flow algorithm [CKMST ’11], Õ(m + n^(4/3))
  [CMTV ’17]: based on Laplacian solver [LPS ’15], Õ(m)
Conclusions and Future Outlook
  Nearly-linear time algorithms for matrix scaling and balancing
  New framework for second order optimization
    Used Hessian smoothness while avoiding self-concordance
    Can we use any of these ideas for faster interior point methods?
  Dependence on the condition number, log κ(X*), is given by the R∞ bound
    If we want to detect perfect matchings, R∞ = Θ(n)
    Is there a way to improve this dependence, e.g. to (log κ(X*))^(1/2)?
  We saw an extension of Laplacian solving. What else is there? Better primitives for convex optimization?
  Note: add a slide before the conclusion; in particular, we improved a lot of other problems.
Thank You!