High-dimensional Error Analysis of Regularized M-Estimators
Ehsan Abbasi, Christos Thrampoulidis, Babak Hassibi
Allerton Conference, Wednesday September 30,
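Linear Regression Model
Estimate an unknown signal from noisy linear measurements:
$y = A x_0 + z$,
where $A \in \mathbb{R}^{m \times n}$ is the measurement/design matrix, $x_0 \in \mathbb{R}^n$ is the unknown signal, and $z \in \mathbb{R}^m$ is the noise vector.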
M-estimators
For some convex loss function $\mathcal{L}$, solve
$\hat{x} = \arg\min_{x} \mathcal{L}(y - A x)$.
Examples: maximum-likelihood (ML) estimators (e.g., least squares, least absolute deviations), the Huber loss, etc.
Classical theory: Fisher information, consistency, asymptotic normality, the Cramér-Rao bound, ML, robust statistics, the Huber loss, the optimal loss, ...
Why revisit & what changes?
Traditional: $m \to \infty$, but the ambient dimension $n$ is fixed.
Modern: $n$ is increasingly large (machine learning, image processing, sensor/social networks, DNA microarrays, ...).
Compressive sensing: fewer measurements than unknowns, $m < n$.
Structured signals: sparse, low-rank, block-sparse, slowly varying, ...
Regularized M-estimators: $\hat{x} = \arg\min_{x} \mathcal{L}(y - A x) + \lambda f(x)$.
The regularizer $f$ is structure-inducing, convex, and typically non-smooth: $\ell_1$, nuclear, $\ell_1/\ell_2$ norms, total variation, ..., atomic norms.
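To make the regularized M-estimator $\arg\min_x \mathcal{L}(y - Ax) + \lambda f(x)$ concrete, here is a minimal sketch of one instance. It uses the squared loss $\mathcal{L}(v) = \tfrac{1}{2}\|v\|_2^2$ with $f = \|\cdot\|_1$ (the LASSO) and solves it by proximal gradient descent (ISTA). The solver, the problem sizes, the noise level, and $\lambda$ are illustrative assumptions. `lasso_ista` is a hypothetical helper, not code from the talk.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (coordinate-wise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(A, y, lam, n_iter=2000):
    """Minimize 0.5*||y - A x||_2^2 + lam*||x||_1 by proximal gradient (ISTA).

    Hypothetical helper: squared loss L(v) = 0.5*||v||^2 and regularizer f = ||.||_1.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the loss gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = -A.T @ (y - A @ x)           # gradient of the smooth loss term
        x = soft_threshold(x - step * grad, step * lam)  # prox step on lam*||x||_1
    return x


rng = np.random.default_rng(0)
m, n, k = 100, 50, 5                        # illustrative sizes (assumption)
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[:k] = rng.standard_normal(k)             # k-sparse signal
z = 0.05 * rng.standard_normal(m)           # Gaussian noise (assumption)
y = A @ x0 + z
lam = 0.02
x_hat = lasso_ista(A, y, lam)
print("squared error:", np.sum((x_hat - x0) ** 2))
```

Replacing `soft_threshold` with the proximal operator of another regularizer gives the corresponding regularized estimator, as long as the loss stays smooth.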
Classical Question, Modern Regime: New Results & Phenomena
High-dimensional proportional regime: $m, n \to \infty$ with $m/n \to \delta \in (0, \infty)$.
The question goes back to the '50s (Huber, Kolmogorov, ...). So far there have been only very recent advances, covering special instances under strict assumptions. There is no general theory!
Assumption: $A$ has iid Gaussian entries. This is a benchmark in CS/statistics theory (cf. universality).
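Contribution
Assume $m, n \to \infty$ at a proportional rate $m/n \to \delta$, that $A$ has iid Gaussian entries, and that mild regularity conditions hold on $\mathcal{L}$, $p_z$, $f$, and $p_{x_0}$.
Then, with probability one, the (appropriately normalized) squared error $\|\hat{x} - x_0\|_2^2$ converges to a deterministic limit. This limit is determined by the unique solution of a system of four nonlinear equations in four unknowns.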
The Equations
Let's parse them to get some insight.
The Explicit Ones
The measurement ratio $\delta$ and the regularization parameter $\lambda$ appear in the equations explicitly.
The Loss and the Regularizer
The loss function $\mathcal{L}$ and the regularizer $f$ appear through their Moreau envelope approximations. In the traditional regime, the functions themselves appear instead of their Moreau envelopes.
The Distributions
The noise pdf $p_z$ enters through its convolution with a Gaussian density. This is a completely new phenomenon compared to the traditional regime.
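The Expected Moreau Envelope
The effect of $f$ and $p_{x_0}$ on the error performance of the M-estimator is summarized by the expected Moreau envelope of $f$, averaged over $x_0$ perturbed by Gaussian noise. This function is (strictly) convex and continuously differentiable, even if $f$ is non-differentiable! It generalizes the "Gaussian width", the "Gaussian distance squared", and the "statistical dimension". The same holds for $\mathcal{L}$ and $p_z$.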
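Here is a minimal Monte Carlo sketch of an expected Moreau envelope. It assumes the form $\mathbb{E}[e_f(x_0 + c\,h; \tau)]$ with $h$ standard Gaussian and $e_f(x;\tau) = \min_v f(v) + \frac{1}{2\tau}(x-v)^2$. It also assumes the scalar regularizer $f = |\cdot|$, whose envelope is the Huber function, and a hypothetical sparse prior for $x_0$. The prior, the constants $c$ and $\tau$, and the helper names are illustrative assumptions.

```python
import numpy as np


def moreau_abs(x, tau):
    """Moreau envelope of f(v) = |v| with parameter tau (the Huber function)."""
    ax = np.abs(x)
    return np.where(ax <= tau, x ** 2 / (2 * tau), ax - tau / 2)


def expected_moreau_abs(c, tau, x0, h):
    """Monte Carlo estimate of E[e_f(x0 + c*h; tau)] for f = |.| (hypothetical helper).

    x0 and h are arrays of samples of the signal entry and of a standard Gaussian.
    """
    return np.mean(moreau_abs(x0 + c * h, tau))


rng = np.random.default_rng(1)
N = 200_000
# Illustrative sparse prior (assumption): x0 = 0 w.p. 0.9, else N(0, 1).
x0 = np.where(rng.random(N) < 0.9, 0.0, rng.standard_normal(N))
h = rng.standard_normal(N)

c = 0.5
taus = [0.01, 0.1, 0.5, 1.0]
vals = [expected_moreau_abs(c, t, x0, h) for t in taus]  # same samples for every tau
f_val = np.mean(np.abs(x0 + c * h))                      # E[f(x0 + c*h)] itself
for t, v in zip(taus, vals):
    print(f"tau={t:<5} expected envelope={v:.4f}  (E f = {f_val:.4f})")
```

The same samples are reused for every $\tau$. As a result, the estimates are nonincreasing in $\tau$ and never exceed the estimate of $\mathbb{E} f$, which mirrors the pointwise properties of the envelope.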
Reminder: Moreau Envelopes
The Moreau-Yosida envelope of $f$ evaluated at $x$ with parameter $\tau > 0$ is
$e_f(x; \tau) := \min_{v} \; f(v) + \frac{1}{2\tau}\|x - v\|_2^2$.
It always underestimates $f$ at $x$, and the smaller $\tau$ is, the closer it is to $f$. It is a smooth approximation: always continuously differentiable in both $x$ and $\tau$ (even if $f$ is non-differentiable), and jointly convex in $x$ and $\tau$. The optimal $v$ is unique (the proximal operator). Everything extends to functions $f$ of vector arguments.
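Here is a quick numerical check of these properties for the scalar case $f(v) = |v|$. In this case the envelope is the Huber function and the proximal operator is soft thresholding. The brute-force grid minimization is only an illustrative device (an assumption of this sketch), not a recommended way to compute envelopes.

```python
import numpy as np


def envelope_bruteforce(f, x, tau, v_grid):
    """e_f(x; tau) = min_v f(v) + (x - v)^2 / (2 tau), minimized over a grid of v."""
    obj = f(v_grid)[None, :] + (x[:, None] - v_grid[None, :]) ** 2 / (2 * tau)
    idx = np.argmin(obj, axis=1)
    return obj[np.arange(len(x)), idx], v_grid[idx]  # envelope value and grid argmin


def huber(x, tau):
    """Closed form of the Moreau envelope of |.| with parameter tau."""
    return np.where(np.abs(x) <= tau, x ** 2 / (2 * tau), np.abs(x) - tau / 2)


def soft_threshold(x, tau):
    """Proximal operator of tau * |.|: the unique minimizer v."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


x = np.linspace(-3, 3, 121)
v_grid = np.linspace(-5, 5, 10001)
tau = 0.7
env, v_star = envelope_bruteforce(np.abs, x, tau, v_grid)

print("max |grid envelope - Huber|:", np.max(np.abs(env - huber(x, tau))))
print("max |grid argmin - soft threshold|:", np.max(np.abs(v_star - soft_threshold(x, tau))))
print("underestimates |x| everywhere:", bool(np.all(huber(x, tau) <= np.abs(x))))
```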
Examples
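Set Indicator Function
For the indicator function of a closed convex set $C$, the Moreau envelope is $e(x; \tau) = \frac{1}{2\tau}\operatorname{dist}^2(x, C)$. The expected Moreau envelope therefore reduces to an expected squared distance of a Gaussian vector to $C$, a quantity closely related to the Gaussian width.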
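For a concrete instance, take $C = \mathbb{R}^n_+$, the nonnegative orthant (an illustrative choice). Here projection onto $C$ is coordinate-wise clipping. The Moreau envelope of the indicator of $C$ is $\operatorname{dist}^2(x, C)/(2\tau)$. The sketch evaluates that formula and estimates the Gaussian squared distance $\mathbb{E}[\operatorname{dist}^2(g, C)]$ by Monte Carlo. For this cone the exact value is $n/2$, since each coordinate contributes $\mathbb{E}[\min(g_i, 0)^2] = 1/2$. Function names are hypothetical.

```python
import numpy as np


def dist2_to_orthant(x):
    """Squared Euclidean distance from each row of x to the nonnegative orthant C = R^n_+."""
    return np.sum(np.minimum(x, 0.0) ** 2, axis=-1)


def moreau_indicator_orthant(x, tau):
    """Moreau envelope of the indicator of R^n_+: dist^2(x, C) / (2 tau)."""
    return dist2_to_orthant(x) / (2 * tau)


rng = np.random.default_rng(2)
n, N = 10, 200_000
g = rng.standard_normal((N, n))
est = dist2_to_orthant(g).mean()    # Monte Carlo estimate of E[dist^2(g, C)]
print(f"E dist^2(g, R^n_+) ~ {est:.3f}  (exact value n/2 = {n / 2})")
```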
Summarizing Key Features
Squared error of general regularized M-estimators.
Minimal and generic regularity assumptions: non-smooth, heavy tails, non-separable, ...
Key role of expected Moreau envelopes: strictly convex and smooth; they generalize known geometric summary parameters.
Observation: the equations can be solved fast by a simple iterative scheme!
Simulations
Optimal tuning?
Non-smooth losses
Non-smooth losses
Optimal loss?
Non-smooth losses
Consistent estimators?
Heavy-tailed noise
Huber loss function + iid Cauchy noise: robustness?
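Here is a small simulation sketch of the robustness question. It assumes a tall design ($m \gg n$) and no regularizer, so the contrast between losses is easy to see. It compares least squares with a Huber-loss M-estimator fitted by plain gradient descent. The sizes, the Huber threshold $k = 1.345$ (a common default), and the helper names are illustrative assumptions.

```python
import numpy as np


def huber_psi(r, k):
    """Derivative of the Huber loss with threshold k: clip the residual to [-k, k]."""
    return np.clip(r, -k, k)


def huber_regression(A, y, k=1.345, n_iter=500):
    """Minimize sum_i huber_k(y_i - a_i^T x) by gradient descent (hypothetical helper)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # huber'' <= 1, so ||A||_2^2 bounds the curvature
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = -A.T @ huber_psi(y - A @ x, k)
        x = x - step * grad
    return x


rng = np.random.default_rng(3)
m, n = 2000, 10
A = rng.standard_normal((m, n))
x0 = rng.standard_normal(n)
z = rng.standard_cauchy(m)                   # iid Cauchy noise: no finite mean or variance
y = A @ x0 + z

x_ls = np.linalg.lstsq(A, y, rcond=None)[0]  # least squares for comparison
x_hub = huber_regression(A, y)
print("least-squares error:", np.linalg.norm(x_ls - x0))
print("Huber error:        ", np.linalg.norm(x_hub - x0))
```

On such draws, the least-squares error is typically much larger than the Huber error, because a handful of huge Cauchy samples dominate the least-squares fit.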
Non-separable loss
Square-root LASSO: $\hat{x} = \arg\min_x \|y - A x\|_2 + \lambda \|x\|_1$.
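The square-root LASSO loss $\mathcal{L}(v) = \|v\|_2$ does not split into a sum over coordinates, and neither does its Moreau envelope. Here is a minimal sketch using the standard closed forms for the Euclidean norm: its proximal operator is block soft thresholding, and its envelope is a Huber function of $\|x\|_2$. The random-competitor check is only an illustration; all function names are hypothetical.

```python
import numpy as np


def prox_l2norm(x, tau):
    """Proximal operator of tau * ||.||_2 (block soft thresholding)."""
    nrm = np.linalg.norm(x)
    return np.zeros_like(x) if nrm <= tau else (1.0 - tau / nrm) * x


def moreau_l2norm(x, tau):
    """Moreau envelope of ||.||_2: a Huber function applied to ||x||_2."""
    nrm = np.linalg.norm(x)
    return nrm ** 2 / (2 * tau) if nrm <= tau else nrm - tau / 2


def envelope_objective(v, x, tau):
    """||v||_2 + ||x - v||_2^2 / (2 tau); its minimum over v is the envelope."""
    return np.linalg.norm(v) + np.sum((x - v) ** 2) / (2 * tau)


rng = np.random.default_rng(4)
x = rng.standard_normal(5)
tau = 0.8
p = prox_l2norm(x, tau)
env = moreau_l2norm(x, tau)
print(env, envelope_objective(p, x, tau))         # equal: the prox attains the envelope
cands = p + 0.1 * rng.standard_normal((1000, 5))  # random competitors near the prox
best_cand = min(envelope_objective(v, x, tau) for v in cands)
print(best_cand >= env)                           # no competitor does better
```

Unlike coordinate-wise soft thresholding, the proximal step rescales the whole vector by a single factor $(1 - \tau/\|x\|_2)_+$. This is where the coupling across coordinates shows up.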
Beyond Gaussian Designs
The analysis framework applies directly to elliptically distributed $A$, with modified equations. For the LASSO, we have extended the ideas to IRO matrices. Universality over iid entries: empirical observation.
Convex Gaussian Min-max Theorem
Apply the CGMT: write the M-estimator as a primary optimization (PO) and relate it to a simpler auxiliary optimization (AO).
Theorem (CGMT) [TAH'15, TOH'15]
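For reference, here is a sketch of the CGMT statement as given in [TOH'15]; the notation is assumed here and may differ from the slides. It takes $G \in \mathbb{R}^{m\times n}$, $g \in \mathbb{R}^m$, $h \in \mathbb{R}^n$ with iid $\mathcal{N}(0,1)$ entries, compact sets $\mathcal{S}_w, \mathcal{S}_u$, and a continuous $\psi$. The last line formally shows how the M-estimator becomes a (PO), assuming a convex loss written through its convex conjugate $\mathcal{L}^*$, the change of variable $w = x - x_0$, and a suitably scaled Gaussian $A$ (so that $-A$ plays the role of $G$).

```latex
\begin{aligned}
\text{(PO)}\quad \Phi(G) &:= \min_{w \in \mathcal{S}_w} \max_{u \in \mathcal{S}_u}\; u^\top G w + \psi(w,u),\\
\text{(AO)}\quad \phi(g,h) &:= \min_{w \in \mathcal{S}_w} \max_{u \in \mathcal{S}_u}\; \|w\|_2\, g^\top u - \|u\|_2\, h^\top w + \psi(w,u),\\
&\mathbb{P}\big(\Phi(G) < c\big) \le 2\,\mathbb{P}\big(\phi(g,h) \le c\big) \quad \text{for all } c \in \mathbb{R},\\
&\text{and if } \mathcal{S}_w, \mathcal{S}_u \text{ are convex and } \psi \text{ is convex-concave:}\\
&\mathbb{P}\big(|\Phi(G) - \mu| > t\big) \le 2\,\mathbb{P}\big(|\phi(g,h) - \mu| \ge t\big) \quad \text{for all } \mu \in \mathbb{R},\ t > 0.\\[4pt]
\text{M-estimator:}\quad &\min_{w}\, \max_{u}\; u^\top(-A)\, w + u^\top z - \mathcal{L}^*(u) + \lambda f(x_0 + w), \qquad w = x - x_0.
\end{aligned}
```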
Proof Diagram
M-estimator → (PO) via duality → (AO) via the CGMT → (DO): a deterministic min-max optimization in 4 variables → first-order optimality conditions → the equations.
Related Literature
[El Karoui 2013, 2015]: ridge regularization, smooth loss, no structured $x_0$; elliptical distributions; iid entries beyond Gaussian.
[Donoho, Montanari 2013]: no regularizer; smooth and strongly convex loss; bounded noise.
Conclusions
Master theorem for general M-estimators: minimal assumptions; 4 nonlinear equations with a unique solution and a fast iterative solution (why?); summary parameters: expected Moreau envelopes.
Opportunities, lots to be asked: Optimal loss function? Optimal regularizer? When can we be consistent? How to optimally tune the regularization parameter?
LASSO: linear = non-linear [TAH'15 NIPS].
The CGMT framework is powerful: non-linear measurements $y = g(Ax_0)$; beyond squared-error analysis; apply the CGMT for different sets S [TAYH'15 ICASSP].