Presentation transcript:

Fast Algorithms for Structured Sparsity
Piotr Indyk
Joint work with C. Hegde and L. Schmidt (MIT), and J. Kane, L. Lu, and D. Hohl (Shell)

Sparsity in data
Data is often sparse. In this talk, “data” means some x ∈ R^n.
[Figures: an n-pixel Hubble image (cropped) and an n-pixel seismic image]
Such data can be specified by the values and locations of its k large coefficients (2k numbers).
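As a minimal illustration of this 2k-number description (a sketch assuming NumPy; the signal x and sparsity k are placeholders):

```python
import numpy as np

def top_k_representation(x, k):
    """Return the locations and values of the k largest-magnitude entries of x:
    the 2k numbers (k indices + k values) specifying a k-sparse approximation."""
    idx = np.argsort(np.abs(x))[-k:]       # locations of the k largest coefficients
    return idx, x[idx]

def from_top_k(idx, vals, n):
    """Rebuild the k-sparse approximation of the original length-n signal."""
    xk = np.zeros(n)
    xk[idx] = vals
    return xk
```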

Sparsity in data
Data is often sparsely expressed using a suitable linear transformation, e.g., a wavelet transform mapping pixels to (mostly small) wavelet coefficients.
[Figure: pixels → wavelet transform → large wavelet coefficients]
Such data can be specified by the values and locations of its k large wavelet coefficients (2k numbers).
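For instance, keeping only the k largest wavelet coefficients (a sketch assuming the PyWavelets package `pywt`; the wavelet choice is illustrative):

```python
import numpy as np
import pywt

def wavelet_sparsify(x, k, wavelet="db2"):
    """Keep only the k largest-magnitude wavelet coefficients of x, then invert."""
    coeffs = pywt.wavedec(x, wavelet)                       # list of coefficient arrays
    magnitudes = np.concatenate([np.abs(c) for c in coeffs])
    thresh = np.sort(magnitudes)[-k]                        # k-th largest magnitude
    coeffs = [np.where(np.abs(c) >= thresh, c, 0.0) for c in coeffs]
    return pywt.waverec(coeffs, wavelet)
```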

Sparse = good
Applications:
- Compression: JPEG, JPEG 2000 and relatives
- De-noising: the “sparse” component is information, the “dense” component is noise
- Machine learning
- Compressive sensing: recovering data from few measurements (more later)
- …

Beyond sparsity
The notion of sparsity captures simple primary structure, but the locations of large coefficients often exhibit rich secondary structure.

Today
- Structured sparsity: models
- Examples: block sparsity, tree sparsity, constrained EMD, clustered sparsity
- Efficient algorithms: how to extract structured sparse representations quickly
- Applications: (approximation-tolerant) model-based compressive sensing

Modeling sparse structure [Blumensath-Davies’09]
Def: specify a list of p allowable sparsity patterns M = {Ω_1, …, Ω_p}, where Ω_i ⊆ [n] and |Ω_i| ≤ k.
Then a structured sparsity model is the space of signals supported on one of the patterns in M:
  {x ∈ R^n | ∃ Ω_i ∈ M : supp(x) ⊆ Ω_i}
[Figure: example with n = 5, k = 2, p = 4]
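Concretely, such a model can be stored as an explicit list of allowed supports; a minimal sketch of a membership test (the patterns below are illustrative, with n = 5, k = 2, p = 4, and are not the ones from the slide's figure):

```python
import numpy as np

# Illustrative model: n = 5, k = 2, p = 4 allowed supports.
patterns = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]

def in_model(x, patterns):
    """x belongs to the model iff supp(x) is contained in some allowed pattern."""
    support = set(np.flatnonzero(x))
    return any(support <= omega for omega in patterns)
```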

Model I: Block sparsity
“Large coefficients cluster together.”
Parameters: k, b (block length) and l (number of blocks).
The range {1…n} is partitioned into blocks B_1, …, B_{n/b}, each of length b.
M contains all unions of l blocks, i.e., M = { B_{i_1} ∪ … ∪ B_{i_l} : i_1, …, i_l ∈ {1…n/b} }.
Total sparsity: k = bl.
[Figure: example with b = 3, l = 1, k = 3]
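A minimal sketch of the corresponding projection (block thresholding), assuming NumPy and that n is a multiple of the block length b; the function name is illustrative:

```python
import numpy as np

def block_sparse_project(x, b, l):
    """Keep the l blocks (of length b) with the largest energy; zero out the rest."""
    blocks = x.reshape(-1, b)                 # n/b blocks of length b
    energy = (blocks ** 2).sum(axis=1)        # per-block energy ||x_{B_i}||_2^2
    keep = np.argsort(energy)[-l:]            # indices of the l heaviest blocks
    out = np.zeros_like(blocks)
    out[keep] = blocks[keep]
    return out.reshape(-1)
```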

Model II: Tree sparsity
“Large coefficients cluster on a tree.”
Parameters: k, t.
Coefficients are nodes in a full t-ary tree.
M is the set of all rooted connected subtrees of size k.

Model III: Graph sparsity
Parameters: k, g, and a graph G.
Coefficients are nodes in G.
M contains all subgraphs with k nodes that form g connected components.

Algorithms?
A structured sparsity model specifies a hypothesis class for signals of interest. For an arbitrary input signal x, a model projection oracle extracts structure by returning the “closest” signal in the model:
  M(x) = argmin_{Ω∈M} ||x - x_Ω||_2
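When the model is given as an explicit list of allowed supports (as in the earlier sketch), the oracle can be written down directly; this brute-force version is only practical when the number of patterns p is small:

```python
import numpy as np

def model_project(x, patterns):
    """Exact model projection: among the allowed supports, keep the one capturing
    the most energy of x (equivalently, minimizing ||x - x_Omega||_2)."""
    best, best_energy = None, -1.0
    for omega in patterns:
        idx = np.array(sorted(omega))          # indices of this allowed support
        energy = float(np.sum(x[idx] ** 2))
        if energy > best_energy:
            best, best_energy = idx, energy
    x_proj = np.zeros_like(x)
    x_proj[best] = x[best]
    return x_proj
```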

Algorithms for model projection
Good news: several important models admit projection oracles with polynomial time complexity:
- Blocks: block thresholding, linear time O(n)
- Trees: dynamic programming, “rectangular” time O(nk) [Cartis-Thompson ’13]
Bad news:
- Polynomial time is not enough. E.g., consider a ‘moderate’ problem: n = 10 million, k = 5% of n. Then nk > 5 × 10^12.
- For some models (e.g., graph sparsity), model projection is NP-hard.

Approximation to the rescue
Instead of finding an exact solution to the projection M(x) = argmin_{Ω∈M} ||x - x_Ω||_2, we solve it approximately (and much faster). What does “approximately” mean?
- (Tail) ||x - T(x)||_2 ≤ C_T · min_{Ω∈M} ||x - x_Ω||_2
- (Head) ||H(x)||_2 ≥ C_H · max_{Ω∈M} ||x_Ω||_2
The choice depends on the application:
- Tail: works great if the approximation error is small
- Head: meaningful output even if the approximation is not good
- For the compressive sensing application we need both!
Note: T(x) and H(x) might not report k-sparse vectors, but in all examples in this talk they do report O(k)-sparse vectors from a (slightly larger) model.

We will see
Tree sparsity:
- Exact: O(kn) [Cartis-Thompson, SPL’13]
- Approximate (head/tail): O(n log^2 n) [Hegde-Indyk-Schmidt, ICALP’14]
Graph sparsity:
- Approximate: O(n log^4 n) [Hegde-Indyk-Schmidt, ICML’15] (based on Goemans-Williamson’95)

Tree sparsity [Hegde-Indyk-Schmidt’14]
- (Tail) ||x - T(x)||_2 ≤ C_T · min_{Ω∈Tree} ||x - x_Ω||_2
- (Head) ||H(x)||_2 ≥ C_H · max_{Ω∈Tree} ||x_Ω||_2
Runtime and guarantee of known algorithms:
- Bohanec-Bratko ’94: O(n^2), exact
- Cartis-Thompson ’13: O(nk), exact
- Baraniuk-Jones ’94: O(n log n), ?
- Donoho ’97: O(n), ?
- Hegde-Indyk-Schmidt ’14: O(n log n + k log^2 n), approximate head and approximate tail

Exact algorithm (tree sparsity)
Define OPT[t, i] = max_{|Ω| ≤ t, Ω a tree rooted at i} ||x_Ω||_2^2.
Recurrence:
- If t > 0: OPT[t, i] = max_{0 ≤ s ≤ t-1} ( OPT[s, left(i)] + OPT[t-1-s, right(i)] ) + x_i^2
- If t = 0: OPT[0, i] = 0
Space: on level l there are 2^l nodes, each storing n/2^l values, i.e., 2^l · n/2^l = n per level; total O(n log n).
Running time: on a level l with 2^l < k the work is n · 2^l; on the level with 2^l ≈ k it is nk; on level l+1 it is nk/2, and so on. Total: O(nk).
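A direct, unoptimized rendering of this recurrence (a sketch assuming a complete binary tree stored in heap order, so the children of node i are 2i+1 and 2i+2; it returns only the optimal value and does not implement the level-by-level bookkeeping behind the time/space bounds above):

```python
from functools import lru_cache

def tree_sparse_opt(x, k):
    """Memoized evaluation of OPT[t, i]: the best energy of a subtree of size t
    rooted at node i, following the recurrence above. Recovering the actual
    support would additionally require standard backtracking."""
    n = len(x)

    @lru_cache(maxsize=None)
    def opt(t, i):
        if t == 0 or i >= n:
            return 0.0
        left, right = 2 * i + 1, 2 * i + 2
        # Split the remaining budget t - 1 between the two subtrees.
        return x[i] ** 2 + max(opt(s, left) + opt(t - 1 - s, right) for s in range(t))

    return opt(k, 0)      # best k-node subtree rooted at the root (node 0)
```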

Approximate algorithms
- Approximate “tail” oracle. Idea: Lagrangian relaxation + Pareto curve analysis.
- Approximate “head” oracle. Idea: submodular maximization.

Head approximation
Want to approximate: ||H(x)||_2 ≥ C_H · max_{Ω∈Tree} ||x_Ω||_2
Lemma: the optimal tree can always be broken up into disjoint trees of size O(log n).
Greedy approach: compute exact projections with sparsity level O(log n), then assemble the pieces.

Head approximation (continued)
Want to approximate: ||H(x)||_2 ≥ C_H · max_{Ω∈Tree} ||x_Ω||_2
Greedy approach:
- Pre-processing: execute the DP for exact tree sparsity with sparsity level O(log n)
- Repeat k/log n times:
  - select the best O(log n)-sparse tree
  - set the corresponding coordinates to zero
  - update the data structure
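In outline, the greedy loop looks as follows (a sketch; `best_small_tree` is a hypothetical helper standing in for the exact DP restricted to sparsity O(log n), and the iteration count is the rough k/log n from the slide):

```python
import math
import numpy as np

def head_approx_tree(x, k, best_small_tree):
    """Greedy head approximation: repeatedly take the best O(log n)-sparse rooted
    subtree of the residual, zero out its coordinates, and continue."""
    n = len(x)
    s = max(1, int(math.log2(n)))              # per-iteration sparsity budget
    residual = x.copy()
    support = set()
    for _ in range(max(1, k // s)):
        omega = best_small_tree(residual, s)   # indices of the best s-sparse subtree
        support.update(omega)
        residual[list(omega)] = 0.0            # remove the selected mass, then repeat
    out = np.zeros_like(x)
    out[list(support)] = x[list(support)]
    return out
```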

Graph sparsity
Specification:
- Parameters: k, g, and a graph G
- Coefficients are nodes in G
- M contains all subgraphs with k nodes that form g connected components
- Can be generalized by adding edge weight constraints
Model projection here is NP-hard.

Approximation algorithms (graph sparsity)
Consider g = 1 (a single component): min_{|Ω| ≤ k, Ω a tree} ||x_{[n]∖Ω}||_2^2
Lagrangian relaxation: min_{Ω a tree} ||x_{[n]∖Ω}||_2^2 + λ|Ω|
This is the Prize-Collecting Steiner Tree problem!
- 2-approximation algorithm [Goemans-Williamson’95]
- Nearly-linear running time using a dynamic edge-splitting idea [Cole, Hariharan, Lewenstein, Porat, 2001]
[Hegde-Indyk-Schmidt, ICML’15]:
- Improve the runtime
- Use it to solve the head/tail formulation for the weighted graph sparsity model
- Leads to a measurement-optimal compressive sensing scheme for a wide collection of models
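Schematically, the tail oracle can be phrased as a search over the Lagrange multiplier λ, each step calling a prize-collecting Steiner tree routine (a sketch, not the algorithm from the paper; `pcst_solver` is a hypothetical stand-in for, e.g., a Goemans-Williamson-style 2-approximation):

```python
def graph_tail_sketch(x, G, k, pcst_solver, iters=30):
    """Binary search over the Lagrange multiplier.

    pcst_solver(prizes, G, lam) is assumed to return a set of node indices forming
    a tree that approximately minimizes (missed prize) + lam * (tree size).
    Larger lam -> smaller trees, so we search for a lam whose tree has about k nodes."""
    prizes = [float(xi) ** 2 for xi in x]      # node prizes = squared coefficients
    lo, hi = 0.0, max(prizes) + 1.0
    best = set()
    for _ in range(iters):
        lam = (lo + hi) / 2.0
        tree = pcst_solver(prizes, G, lam)
        if len(tree) > k:
            lo = lam                           # tree too large: raise the penalty
        else:
            best, hi = tree, lam               # feasible: try a smaller penalty
    return best
```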

Compressive sensing
Setup:
- Data/signal in n-dimensional space: x. E.g., x is a 256×256 image, so n = 65536.
- Compress x into Ax, where A is an m × n “measurement” or “sketch” matrix, m << n.
Goal: recover an “approximation” x* of a k-sparse x from y = Ax + e, i.e., ||x* - x|| ≤ C ||e||.
Want:
- Good compression (small m = m(k, n)): m = O(k log(n/k)) [Candes-Romberg-Tao’04, …]
- Efficient algorithms for encoding and recovery: O(n log n) [Needell-Tropp’08, Indyk-Ruzic’08, …]
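For reference, a minimal recovery sketch via iterative hard thresholding (IHT), assuming NumPy; the step size and iteration count are illustrative, not tuned:

```python
import numpy as np

def iht(A, y, k, iters=100):
    """Iterative hard thresholding: gradient step on 0.5 * ||y - A x||^2,
    then keep only the k largest-magnitude entries."""
    m, n = A.shape
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # conservative step size
    x = np.zeros(n)
    for _ in range(iters):
        x = x + step * A.T @ (y - A @ x)       # gradient step
        drop = np.argsort(np.abs(x))[:-k]      # everything except the top k entries
        x[drop] = 0.0                          # hard thresholding
    return x

# Illustrative usage with a random Gaussian measurement matrix:
# n, m, k = 1000, 200, 20
# A = np.random.randn(m, n) / np.sqrt(m)
# y = A @ x_true                               # x_true is a k-sparse signal
# x_hat = iht(A, y, k)
```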

Model-based compressive sensing
Setup:
- Data/signal in n-dimensional space: x. E.g., x is a 256×256 image, so n = 65536.
- Compress x into Ax, where A is an m × n “measurement” or “sketch” matrix, m << n.
Goal: recover an “approximation” x* of a k-sparse x from y = Ax + e, i.e., ||x* - x|| ≤ C ||e||.
Want:
- Good compression (small m = m(k, n)): m = O(k log(n/k)) [Blumensath-Davies’09]
- Efficient algorithms for encoding and recovery: exact model projection [Baraniuk-Cevher-Duarte-Hegde, IT’08]; approximate model projection [Hegde-Indyk-Schmidt, SODA’14]
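In the model-based setting, the plain top-k thresholding step is replaced by a (possibly approximate) model projection; a hedged sketch in the spirit of model-based IHT, reusing a projection oracle such as the ones sketched earlier (an illustration, not the exact algorithm from the cited papers):

```python
import numpy as np

def model_iht(A, y, project, iters=100):
    """Model-based iterative hard thresholding (sketch): the same gradient step as
    plain IHT, but pruning is done by a model projection oracle `project`, e.g. a
    block/tree/graph projection or its head/tail approximations."""
    m, n = A.shape
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # conservative step size
    x = np.zeros(n)
    for _ in range(iters):
        x = project(x + step * A.T @ (y - A @ x))
    return x
```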

Graph sparsity: experiments [Hegde-Indyk-Schmidt, ICML’15]

Running time

Conclusions/Open Problems
We have seen:
- Approximation algorithms for structured sparsity in near-linear time
- Applications to compressive sensing
There is more: “Fast Algorithms for Structured Sparsity”, EATCS Bulletin, 2015.
Some open questions:
- Structured matrices: our analysis assumes matrices with i.i.d. entries, while our experiments use partial Fourier/Hadamard matrices. It would be nice to extend the analysis to partial Fourier/Hadamard or sparse matrices.
- Hardness: can we prove that exact tree sparsity requires kn time (under 3SUM, SETH, etc.)?