1
Algorithm Frameworks Based on Adaptive Sampling
Richard Peng, Georgia Tech
2
Outline
Sampling
Input-Sparsity Time Regression
Connections with Graphs
Sparsified Squaring
3
Randomized Algorithms
Original → sample (smaller) → compute on the sample → bring the answer back.
Examples: quick sort/select, geometric cuttings, Nyström method, …
Repeating this process gives the recursive routines that underlie many efficient algorithms.
4
Example: Quick Select. Problem: find the k-th smallest element of a list.
Reduction: pick a random pivot. Recovery: bring the result back.
Structure-preserving lemma: pivoting halves n in expectation (Original → Problem 1 → … → Problem d).
More control: O(log n) random pivots halve the list w.h.p.
A list is 1-dimensional; the focus of this talk is doing this for more complex objects.
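A minimal Python sketch of this reduction/recovery pattern for quick select (the function name and test data are illustrative, not from the slides):

import random

def quickselect(arr, k):
    """Return the k-th smallest element (k is 1-indexed).

    Reduction: partition around a random pivot.
    Recovery: recurse only into the part that must contain the answer."""
    pivot = random.choice(arr)
    less = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    if k <= len(less):
        return quickselect(less, k)
    if k <= len(less) + len(equal):
        return pivot
    return quickselect(greater, k - len(less) - len(equal))

print(quickselect([7, 2, 9, 4, 1, 5], 3))  # prints 4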
5
Adaptive Sampling. This talk:
Regression in O(nnz + poly(d)) time
O(m log^c n) time solvers for linear systems in graph Laplacians, Lx = b
Adaptive sampling: a sequence of randomized sampling steps, each of whose distributions is calculated based on the output of the previous ones.
Runtime measured using O(·), worst-case analysis.
6
Compressing Tall & Thin Matrices
n-by-d matrix A with nnz non-zeros, nnz > n >> d: find A' with fewer rows s.t. ║Ax║ ≈ ║A'x║ ∀x (≈: relative-error approximation).
Application: solve min_x ║Ax − b║ by solving min_x ║A'x − b'║ instead.
║Ax║₂² = xᵀAᵀAx, and AᵀA is a d-by-d matrix, so an exact A' can be read off a QR factorization of AᵀA.
When n >> d and nnz > poly(d), the cost is dominated by computing AᵀA: O(nnz × d).
7
Row Sampling. Go from n to n' rows: keep row aᵢ with probability pᵢ; if picked, rescale to keep expectation.
Uniform sampling: pᵢ = n'/n. Issue: a matrix with only one non-zero row.
Norm sampling: pᵢ = n' ║aᵢ║₂² / ║A║_F². Issue: a column with a single (small) entry.
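A small NumPy sketch of the keep-and-rescale step above, using norm sampling for the probabilities (function name, sizes, and the target row count are illustrative):

import numpy as np

def row_sample(A, p):
    # Keep row i with probability p[i]; rescale kept rows by 1/sqrt(p[i])
    # so that A'^T A' equals A^T A in expectation.
    keep = np.random.rand(A.shape[0]) < p
    return A[keep] / np.sqrt(p[keep])[:, None]

A = np.random.randn(1000, 5)
n_target = 200
p = np.minimum(1.0, n_target * np.sum(A**2, axis=1) / np.sum(A**2))  # norm sampling
A_prime = row_sample(A, p)
print(A_prime.shape, np.linalg.norm(A_prime.T @ A_prime - A.T @ A))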
8
Matrix-Chernoff Bounds
τ: L₂ statistical leverage scores, τᵢ = aᵢᵀ(AᵀA)⁺aᵢ, where aᵢ is row i of A and M⁺ denotes the pseudo-inverse of M.
Interpretation: ease of 'reconstructing' aᵢ: τᵢ = min ║y║₂² s.t. yA = aᵢ.
[RV `07], [Tropp `12] (Rudelson, Vershynin, Tropp): pᵢ ≥ τᵢ · O(log d) gives A ≈ A' w.h.p.
[Foster `49]: Σᵢ τᵢ = rank ≤ d, so O(d log d) rows suffice.
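A NumPy sketch of these leverage scores: for a full-column-rank A with thin QR factorization A = QR, τᵢ equals the squared norm of row i of Q, so they can be read off directly (the test matrix is an arbitrary illustration):

import numpy as np

A = np.random.randn(1000, 5)
Q, _ = np.linalg.qr(A)         # thin QR: columns of Q are orthonormal
tau = np.sum(Q**2, axis=1)     # tau_i = a_i^T (A^T A)^+ a_i = ||q_i||_2^2
print(tau.sum())               # ~ rank(A) <= d, matching [Foster `49]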
9
Sampling More [Talagrand `90] “Embedding subspaces of L1 into LN1”: non-linear, matrix, analog, O(dlogd) objects for d dimensions Michel Talagrand ║x║2 ║x║1 [CP `15] sampling probabilities: Lewis weights, p = 1: w s.t. wi2 = aiT(AtW1A)-1ai W = diag(w) Recursive definition, but can compute How to interpret?
10
Outline
Sampling
Input-Sparsity Time Regression
Connections with Graphs
Sparsified Squaring
11
How to Compute Leverage Scores?
τᵢ = aᵢᵀ(AᵀA)⁺aᵢ
Given A' ≈ A with ≈ d rows, the leverage scores of A can be estimated in O(nnz(A) + d^(ω+θ)) time.
Chicken-and-egg problem: finding A' ≈ A needs leverage scores, and efficient leverage-score computation needs A' ≈ A.
12
Size Reduction Using Sketching
A' = SA for an S that is easy to evaluate:
[DMMW `12]: fast Hadamard transform
[CW `13], [NN `13]: count sketch for L₂
[SW `11], [MM `13], [WZ `15]: L₁ sketches
(Clarkson, Drineas, Mahoney, Magdon-Ismail, Nguyen, Nelson, Sohler, Woodruff, Zhang)
13
Adaptive Row Sampling
[Avron `11]: sketch and precondition — reduce error via iterative methods.
[LMP `13] (Li, Miller, Peng), iterative row sampling:
Find an approximation A'
Use A' to find leverage scores of A
Sample A to get the result A"
14
Adaptive Row Sampling. Pick a random subset of rows/columns, compute on the subset, then extend the result onto the full matrix.
Uniform sampling does not give spectral approximations! Fix: this kind of error can be absorbed through post-processing.
15
Uniform Sampling + Post-Process
(Michael B. Cohen, Yin-Tat Lee, Cameron Musco, Christopher Musco, Aaron Sidford)
Structural theorem: pick half the rows as A'; using A' to estimate leverage scores for A gives an expected total of ≤ 2d.
Implication: an O(nnz + d^(ω+θ)) time algorithm.
16
Solution to Chicken and Egg
Structural theorem: picking half the rows as A' gives an expected total leverage-score estimate ≤ 2d.

rowSample(A):
  A' ← sample(A, 1/2)
  A" ← rowSample(A')
  return sample(A, computeProb(A, A"))

Reduction: a uniformly random half. Recovery: resample all of A (n → n/2 → n/4 → …, with Στ = O(d) at every level).
Runtime recurrence: T(nnz) = T(nnz/2) + O(nnz + d^(ω+θ)) = O(nnz + d^(ω+θ)).
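A dense NumPy sketch of this recursion; it keeps the uniform-half reduction and resampling recovery, but uses an exact pseudo-inverse where the fast algorithm would use sketching (all names, the oversampling factor, and the base-case size are illustrative):

import numpy as np

def estimate_scores(A, B):
    # Estimate leverage scores of A's rows using (B^T B)^+ from the smaller sample B.
    M = np.linalg.pinv(B.T @ B)
    return np.minimum(1.0, np.einsum('ij,jk,ik->i', A, M, A))

def recursive_row_sample(A, oversample=20, base=100):
    n, d = A.shape
    if n <= base:
        return A
    half = A[np.random.rand(n) < 0.5]                   # reduction: keep a uniform half
    B = recursive_row_sample(half, oversample, base)    # recurse on the half
    p = np.minimum(1.0, oversample * estimate_scores(A, B))
    keep = np.random.rand(n) < p                        # recovery: resample all of A
    return A[keep] / np.sqrt(p[keep])[:, None]

A = np.random.randn(5000, 5)
print(recursive_row_sample(A).shape)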
17
Outline
Sampling
Input-Sparsity Time Regression
Connections with Graphs
Sparsified Squaring
18
This Framework also solves
Maxflow: the maximum number of edge-disjoint s-t paths.
[P `16]: 1+ε approximation in O(m log^c n · ε⁻³) time.

RecursiveApproxMaxflow(G):
  H ← ultra-sparsify(G, O(log^c n))          (reduction)
  Gs ← rake-compress(H)
  Rs ← approximator(Gs)                      (via recursive calls)
  R ← extend(G, Rs)                          (recovery)
  return approximatorFlow(G, R)
19
Graph sparsification: are dense graphs with m = n² edges sometimes necessary? For connectivity, < n edges always suffice.
20
Preserving More (Benczúr, Karger)
[BK `96]: for ANY G, one can get H with O(n log n) edges s.t. G ≈ H on all cuts.
How: keep edge e with probability pₑ and rescale if kept to maintain expectation — the same as importance sampling.
21
Algebraic Representation of Graphs
Edge-vertex incidence matrix B (m rows, n columns for a graph with n vertices and m edges): Bₑᵤ = ±1 if u is an endpoint of e.
Graph Laplacian L (n rows/columns, O(m) non-zeros): diagonal = degrees, off-diagonal = −weights.
L is the Gram matrix of B: L = BᵀB.
22
Spectral Similarity: L_G ≈ L_H, i.e. ║B_G x║₂ ≈ ║B_H x║₂ ∀x (where ║y║₂² = Σᵢ yᵢ²).
Bₑᵤ = ±1 if u is an endpoint of e, 0 otherwise; for edge e = uv, (Bₑ,: x)² = (xᵤ − xᵥ)².
For x ∈ {0, 1}ⱽ, an edge contributes (1 − 0)² = 1 if cut and (1 − 1)² = 0 if not, so ║B_G x║₂² = size of the cut given by x. Hence spectral similarity implies G ≈ H on all cuts.
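A tiny NumPy check of these identities on a 4-cycle (the graph is an arbitrary illustration):

import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]     # a 4-cycle
n, m = 4, len(edges)
B = np.zeros((m, n))                         # edge-vertex incidence matrix
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = 1.0, -1.0

L = B.T @ B                                  # Laplacian: degrees on the diagonal, -1 off-diagonal
x = np.array([1.0, 1.0, 0.0, 0.0])           # indicator of the cut {0,1} vs {2,3}
print(x @ L @ x, np.linalg.norm(B @ x)**2)   # both equal 2, the number of cut edges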
23
Leverage Scores on Graphs
(Spielman, Srivastava)
[RV `07], [Tropp `12], [SS `08]: sampling with pₑ ≥ O(log n) × wₑ × (effective resistance of e) produces a spectral approximation.
Example: a path attached to a clique — path edges have effective resistance 1, clique edges about 1/n.
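A dense sketch of this sampling rule; the effective resistances are computed with a pseudo-inverse here, whereas nearly-linear time algorithms estimate them with fast solvers (the constant c and the K8 test graph are illustrative):

import numpy as np
import itertools

def spectral_sparsify(B, w, c=4.0):
    # R_e = b_e^T L^+ b_e with L = B^T diag(w) B; keep edge e w.p. p_e >= c log(n) w_e R_e.
    n = B.shape[1]
    L = B.T @ (w[:, None] * B)
    R = np.einsum('ej,jk,ek->e', B, np.linalg.pinv(L), B)
    p = np.minimum(1.0, c * np.log(n) * w * R)
    keep = np.random.rand(len(w)) < p
    return B[keep], w[keep] / p[keep]          # rescale kept edges to keep expectation

edges = list(itertools.combinations(range(8), 2))   # complete graph K8
B = np.zeros((len(edges), 8))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = 1.0, -1.0
H_B, H_w = spectral_sparsify(B, np.ones(len(edges)))
print(len(edges), H_B.shape[0])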
24
Adaptive Sampling on Graphs
[KKT `94] (Karger, Klein, Tarjan): minimum spanning trees in O(m) expected time.
Contract vertices to form G
Uniformly sample edges to form H          (reduction)
Find the MST of H recursively: T_H
Use T_H to trim G, getting G'             (recovery)
Find the MST of G' recursively
25
Difficulties With Graphs
Highly connected graphs need global steps; long paths / trees need many steps.
Each is easy on its own: iterative methods, blocking flows, Gaussian elimination, data structures. Efficient algorithms must handle both simultaneously.
26
Outline
Sampling
Input-Sparsity Time Regression
Connections with Graphs
Sparsified Squaring
27
The Laplacian Paradigm
(Spielman, Teng)
Directly related: elliptic systems Lx = b.
A few iterations: eigenvectors, heat kernels.
Many iterations / modified algorithms: graph problems, image processing.
28
Use Matlab? Trilinos? Lapack?
A sequence of (adaptively) generated linear systems: optimization problem → linear system → solver.
Typical solver feedback: "…we suggest rerunning the program a few times and/or using a different solver. An alternate solver based on incomplete Cholesky factorization is provided…"
(Kevin Deweese)
29
Simplification. Rest of this talk: a provably nearly-linear time solver for L = I − A, where A is a random-walk matrix (adjust/rescale so the diagonal is I; add to the diagonal to make it full rank).
30
Lower Bound. Iterative methods in one line:
L⁻¹ = (I − A)⁻¹ = I + A + A² + A³ + …
Graph-theoretic interpretation: each term is one more step of the walk (b, Ab, A²b, …, A^diameter · b), so Ω(diameter) steps are needed.
31
Repeated squaring: A¹⁶ = (((A²)²)²)², 4 operations.
(I − A)⁻¹ = I + A + A² + A³ + … = (I + A)(I + A²)(I + A⁴)…
A: one step of the random walk; A²: a 2-step random walk — still a graph, but a dense matrix!
[PS `14]: a sparsifier of A² can be obtained in O(m log^c n) time.
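A quick NumPy check of the product form of the inverse above, on an arbitrary matrix with spectral radius below 1:

import numpy as np

n = 5
A = np.random.rand(n, n)
A = 0.4 * A / np.abs(A).sum(axis=1).max()     # force spectral radius < 1
I = np.eye(n)

prod, P = I.copy(), A.copy()
for _ in range(30):                           # (I + A)(I + A^2)(I + A^4)... covers I + A + ... + A^(2^30 - 1)
    prod = prod @ (I + P)
    P = P @ P                                 # repeated squaring
print(np.linalg.norm(prod - np.linalg.inv(I - A)))   # ~ 0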
32
Sparsified squaring: I − A₁ ≈_ε I − A₀², I − A₂ ≈_ε I − A₁², …, I − Aᵢ ≈_ε I − Aᵢ₋₁², with I − A_d ≈ I.
≈: spectral / condition-number approximation, which implies cut approximation.
Convergence: (approximately) the same as repeated squaring, so d = O(log(mixing time)) suffices.
33
Similar to: approximate matrix multiplication, multiscale methods, NC algorithms for shortest path, logspace connectivity ([Reingold `05]) via deterministic squaring ([RV `05]).
Connectivity vs. parallel solver:
Iteration: Aᵢ₊₁ ≈ Aᵢ², until |A_d| is small.
Size reduction: low degree (connectivity) vs. sparse graph (solver).
Method: derandomized (connectivity) vs. randomized (solver).
Solution transfer (solver only): (I − Aᵢ)xᵢ = bᵢ.
34
[PS `14] Linear System Solver
x = Solve(A₀, …, A_d, b):
  b₀ ← b
  For i = 1 to d: bᵢ ← (I + Aᵢ₋₁) bᵢ₋₁
  x_d ← b_d
  For i = d − 1 downto 0: xᵢ ← ½ [bᵢ + (I + Aᵢ) xᵢ₊₁]
  return x₀

Reduction: sparsified squaring. Recovery: smoothing.
Runtime: O(m log^c n · log³(mixing time)).
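A dense NumPy sketch of this solver that uses exact squaring in place of sparsified squaring, so only the reduction/recovery structure is faithful (the chain length d and the symmetric test matrix are arbitrary):

import numpy as np

def solve_chain(A, b, d=12):
    I = np.eye(len(b))
    chain = [A]
    for _ in range(d):                        # A_{i+1} = A_i^2; [PS `14] sparsifies this step
        chain.append(chain[-1] @ chain[-1])
    bs = [b]
    for i in range(1, d + 1):                 # reduction: b_i = (I + A_{i-1}) b_{i-1}
        bs.append((I + chain[i - 1]) @ bs[-1])
    x = bs[d]                                 # I - A_d ~ I, so x_d ~ b_d
    for i in range(d - 1, -1, -1):            # recovery (smoothing): x_i = 1/2 [b_i + (I + A_i) x_{i+1}]
        x = 0.5 * (bs[i] + (I + chain[i]) @ x)
    return x

n = 6
A = np.random.rand(n, n); A = (A + A.T) / 2
A = 0.4 * A / np.abs(A).sum(axis=1).max()     # symmetric, spectral radius < 1
b = np.random.randn(n)
x = solve_chain(A, b)
print(np.linalg.norm((np.eye(n) - A) @ x - b))   # ~ 0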
35
Analysis Of Parallel Solver
(I − Aᵢ)⁻¹ = ½ [I + (I + Aᵢ)(I − Aᵢ²)⁻¹(I + Aᵢ)]
Chain: (I − Aᵢ₊₁)⁻¹ ≈_ε (I − Aᵢ²)⁻¹
Induction: Zᵢ₊₁ ≈_α (I − Aᵢ₊₁)⁻¹, so Zᵢ₊₁ ≈_(α+ε) (I − Aᵢ²)⁻¹
Set Zᵢ ← ½ [I + (I + Aᵢ) Zᵢ₊₁ (I + Aᵢ)]
Composition: Zᵢ ≈_(α+ε) (I − Aᵢ)⁻¹
Total error = dε = ε · O(log κ)
36
Dense Intermediate Objects
Matrix powers, matrix inverses, transitive closures, LU factorizations: cost-prohibitive to store or compute explicitly, but a sparse version can be accessed.
37
Higher Powers (Cheng, Cheng, Liu, Peng, Teng). A: random walk; Aᵏ: a k-step random walk.
There is a sparse graph close to Aᵏ; [CCLPT `15]: this sparsifier can be computed in nearly-linear time.
Implication: more interpretable factorizations, e.g. (I − A)⁻¹ = (I + A/2)(I − ¾A² − ¼A³)⁻¹(I + A/2).
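A quick numerical check of this factorization (note the cube in the middle factor) on an arbitrary matrix with spectral radius below 1:

import numpy as np

n = 5
A = np.random.rand(n, n)
A = 0.4 * A / np.abs(A).sum(axis=1).max()
I = np.eye(n)

lhs = np.linalg.inv(I - A)
mid = np.linalg.inv(I - 0.75 * (A @ A) - 0.25 * (A @ A @ A))
rhs = (I + A / 2) @ mid @ (I + A / 2)
print(np.linalg.norm(lhs - rhs))   # ~ 0: the factorization is exact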
38
Sparse Block Cholesky (Kyng, Lee, Peng, Sachdeva, Spielman)
[KLPRS `16]: repeatedly eliminate some variables and sparsify the intermediate matrices.
O(m log n) time; extends to connection Laplacians, which can be viewed as having complex weights.
39
Even More Adaptive (Kyng, Sachdeva)
[KS `16]: per-entry pivoting, more or less incomplete Cholesky (ichol).
Running time bound: O(m log³ n). OPEN: improve this.
40
Open Questions
Other forms of adaptive sampling: low-rank approximation (L₁?), linear programs?
Nearly-linear time algorithms for: wider classes of linear systems, directed maximum flow.
Intermediate questions: squaring-based flow algorithms, sparsifying directed graphs?