1 Robust Nonparametric Regression by Controlling Sparsity
Gonzalo Mateos and Georgios B. Giannakis
ECE Department, University of Minnesota
Acknowledgments: NSF grants no. CCF , EECS
May 24, 2011

2 Nonparametric regression
Given a training data set T = {(x_i, y_i)}, function estimation allows predicting the response at new inputs
 - Estimate the unknown function f from the training data set
If one trusts data more than any parametric model, then go nonparametric regression:
 - f lives in a (possibly infinite-dimensional) space of "smooth" functions
Ill-posed problem
 - Workaround: regularization [Tikhonov '77], [Wahba '90]
 - RKHS H with reproducing kernel K(.,.) and norm ||.||_H
Our focus
 - Nonparametric regression robust against outliers
 - Robustness by controlling sparsity
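Not part of the original slide, but as a concrete point of reference, here is a minimal sketch of the non-robust regularized kernel estimator just described, assuming a Gaussian reproducing kernel and synthetic 1-D data; all function names and parameter values are illustrative.

```python
import numpy as np

def gaussian_kernel(x1, x2, bw=0.2):
    """Gaussian (RBF) kernel matrix between two sets of 1-D inputs."""
    return np.exp(-(x1[:, None] - x2[None, :]) ** 2 / (2.0 * bw ** 2))

def kernel_ridge_fit(x, y, mu=1e-2, bw=0.2):
    """Regularized estimate: min_f sum_i (y_i - f(x_i))^2 + mu*||f||_H^2.
    By the representer theorem f(.) = sum_j beta_j K(., x_j), so beta solves (K + mu*I) beta = y."""
    K = gaussian_kernel(x, x, bw)
    return np.linalg.solve(K + mu * np.eye(len(x)), y)

def kernel_ridge_predict(x_new, x, beta, bw=0.2):
    """Evaluate the fitted function at new inputs."""
    return gaussian_kernel(x_new, x, bw) @ beta

# Toy usage: smooth fit to noisy (outlier-free) samples
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 50))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)
beta = kernel_ridge_fit(x, y)
y_hat = kernel_ridge_predict(x, x, beta)
```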

3 Our work in context
Robust nonparametric regression
 - Huber's function [Zhu et al '08]
 - No systematic way to select thresholds
Robustness and sparsity in linear (parametric) regression
 - Huber's M-type estimator as Lasso [Fuchs '99]; contamination model
 - Bayesian framework [Jin-Rao '10], [Mitra et al '10]; rigid choice of
Noteworthy applications
 - Load curve data cleansing [Chen et al '10]
 - Spline-based PSD cartography [Bazerque et al '09]

4 Variational LTS
Least-trimmed squares (LTS) regression [Rousseeuw '87]: minimize the sum of the s smallest squared residuals
Variational (V)LTS counterpart: minimize over f in the RKHS the sum of the s smallest squared residuals r_[i]^2(f), where r_[i]^2(f) is the i-th order statistic among the squared residuals
 - The N - s largest residuals are discarded
Q: How should we go about minimizing (VLTS)? (VLTS) is nonconvex; existence of minimizer(s)?
A: Try all subsamples of size s, solve a regularized LS fit on each, and pick the best
Simple but intractable beyond small problems
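To make the brute-force answer concrete, a hedged sketch under the same illustrative assumptions as the kernel sketch above (Gaussian kernel, 1-D inputs): it enumerates all size-s subsamples, so it is only feasible for very small N, exactly as the slide warns. The function name and parameters are mine.

```python
import numpy as np
from itertools import combinations

def vlts_brute_force(x, y, s, mu=1e-2, bw=0.2):
    """Variational LTS by exhaustive search: for every size-s subsample, fit a regularized
    kernel estimate on it and keep the fit with the smallest trimmed cost."""
    kern = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * bw ** 2))
    best_cost, best_fit = np.inf, None
    for subset in combinations(range(len(x)), s):
        idx = np.array(subset)
        K = kern(x[idx], x[idx])
        beta = np.linalg.solve(K + mu * np.eye(s), y[idx])   # regularized LS on the subsample
        r = y[idx] - K @ beta
        cost = r @ r + mu * beta @ K @ beta                  # trimmed residuals + smoothness penalty
        if cost < best_cost:
            best_cost, best_fit = cost, (idx, beta)
    return best_fit, best_cost
```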

5 Modeling outliers
Outlier variables o_i s.t. o_i != 0 if sample i is an outlier, and o_i = 0 otherwise
 - Nominal data obey y_i = f(x_i) + eps_i; outliers obey something else, captured by y_i = f(x_i) + o_i + eps_i
Remarks
 - Both f and the o_i are unknown
 - If outliers are sporadic, then the vector o := [o_1, ..., o_N]' is sparse!
Natural (but intractable) nonconvex estimator: penalize the number of nonzero entries of o
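A hedged sketch of the contaminated data model just described: nominal samples obey y_i = f(x_i) + eps_i, while a sparse subset of samples carries an extra outlier term o_i. The true function, noise level, and outlier magnitudes below are illustrative choices, not the paper's.

```python
import numpy as np

def generate_contaminated_data(n=100, outlier_frac=0.1, sigma=0.1, seed=0):
    """Nominal model y = f(x) + eps, plus a sparse outlier vector o hitting a few samples."""
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0.0, 1.0, n))
    f_true = np.sin(2 * np.pi * x)                       # unknown smooth function (illustrative)
    eps = sigma * rng.standard_normal(n)                 # nominal i.i.d. noise
    o = np.zeros(n)                                      # outlier vector: sparse by assumption
    idx = rng.choice(n, size=int(outlier_frac * n), replace=False)
    o[idx] = rng.uniform(1.0, 3.0, size=idx.size) * rng.choice([-1, 1], size=idx.size)
    return x, f_true + eps + o, f_true, o
```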

6 VLTS as sparse regression
Lagrangian form (P0): minimize over f and o the cost sum_i (y_i - f(x_i) - o_i)^2 + mu*||f||_H^2 + lambda_0*||o||_0
Proposition 1: If (f_hat, o_hat) solves (P0) with lambda_0 chosen s.t. ||o_hat||_0 = N - s, then f_hat solves (VLTS) too.
The equivalence
 - Formally justifies the regression model and its estimator (P0)
 - Ties sparse regression with robust estimation
 - Tuning parameter lambda_0 controls sparsity in o, i.e., the number of outliers

7 Just relax!
(P0) is NP-hard; relax the l0-(pseudo)norm of o to its closest convex approximation, the l1-norm:
(P1): minimize over f and o the cost sum_i (y_i - f(x_i) - o_i)^2 + mu*||f||_H^2 + lambda*sum_i |o_i|
 - (P1) is convex, and thus efficiently solved
 - The sparsity-controlling role of lambda is central
Q: Does (P1) yield robust estimates f_hat?
A: Yup! Minimizing (P1) over o in closed form yields a variational M-type estimator with Huber's loss; the Huber estimator is a special case

8 Alternating minimization
(P1) is jointly convex in (f, o); an AM solver cycles between the two blocks:
 - With o fixed, update f by a kernel ridge fit to the outlier-compensated data y - o
 - With f fixed, update o by soft-thresholding the residuals y_i - f(x_i)
Remarks
 - Single Cholesky factorization of (K + mu*I), reused across iterations
 - Soft-thresholding with threshold lambda
 - Reveals the intertwining between outlier identification and function estimation with outlier-compensated data
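My reading of the AM solver on this slide, as a hedged sketch: it assumes (P1) takes the representer-theorem form 0.5*||y - K@beta - o||^2 + 0.5*mu*beta'K beta + lambda*||o||_1, with the Cholesky factorization of (K + mu*I) computed once and reused. Names and iteration counts are illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def soft(z, t):
    """Soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def robust_am(K, y, lam, mu, n_iter=200):
    """Alternating minimization for
        min_{beta,o} 0.5*||y - K@beta - o||^2 + 0.5*mu*beta@K@beta + lam*||o||_1."""
    n = len(y)
    chol = cho_factor(K + mu * np.eye(n))            # factor once, reuse every iteration
    o = np.zeros(n)
    for _ in range(n_iter):
        beta = cho_solve(chol, y - o)                # f-step: ridge fit on outlier-compensated data
        o = soft(y - K @ beta, lam)                  # o-step: soft-threshold the residuals
    return beta, o
```

Entries of the returned o that stay at zero mark samples treated as outlier-free; nonzero entries flag (and estimate) the outliers.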

9 Lassoing outliers
Alternative to AM: minimize (P1) over f in closed form first; what remains is a Lasso problem in o [Tibshirani '94]
Proposition 2: With y_tilde and X_tilde suitably defined from the kernel matrix and mu, o_hat solves the Lasso in o, and the minimizers of (P1) are fully determined by o_hat.
Enables effective methods to select lambda
 - Lasso solvers return the entire robustification path (RP)
 - Cross-validation (CV) fails with multiple outliers [Hampel '86]
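A hedged sketch of the Lasso route, under the same (P1) form assumed in the AM sketch: eliminating the function estimate leaves a Lasso in o whose design matrix is a square root of mu*(K + mu*I)^{-1}. The helper name is mine, and scikit-learn's Lasso objective carries a 1/(2n) factor, hence the alpha rescaling.

```python
import numpy as np
from sklearn.linear_model import Lasso

def outlier_lasso(K, y, lam, mu):
    """Eliminate beta from (P1); up to constants, what remains is
        min_o  0.5*mu*(y - o)' (K + mu*I)^{-1} (y - o) + lam*||o||_1,
    i.e. a Lasso with X'X = mu*(K + mu*I)^{-1} and response X @ y."""
    n = len(y)
    A = mu * np.linalg.inv(K + mu * np.eye(n))
    X = np.linalg.cholesky(A).T                      # X'X = A
    # scikit-learn minimizes (1/(2n))*||r - X w||^2 + alpha*||w||_1, so alpha = lam / n
    o_hat = Lasso(alpha=lam / n, fit_intercept=False, max_iter=50000).fit(X, X @ y).coef_
    beta_hat = np.linalg.solve(K + mu * np.eye(n), y - o_hat)   # recover the function estimate
    return beta_hat, o_hat
```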

10 Robustification paths
Lasso path of solutions is piecewise linear in lambda
 - LARS returns the whole RP [Efron '03]
 - Same cost as a single LS fit
Lasso is simple in the scalar case
 - Coordinate descent is fast! [Friedman '07]
 - Exploits warm starts, sparsity
 - Other solvers: SpaRSA [Wright et al '09], SPAMS [Mairal et al '10]
Leverage these solvers over a 2-D grid: values of mu, and for each mu, values of lambda
[Figure: Lasso coefficient paths as a function of lambda]
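Not the paper's exact path algorithm, but a hedged sketch of the grid/warm-start idea: sweep lambda from large to small and reuse each solution to warm-start the next solve. It assumes an am_solver variant of the earlier AM sketch that accepts an initial o (a one-line change); all names are illustrative.

```python
import numpy as np

def robustification_path(K, y, lam_grid, mu, am_solver):
    """Trace o_hat(lambda) over a descending grid of lambda values with warm starts."""
    path, o = [], np.zeros(len(y))
    for lam in sorted(lam_grid, reverse=True):        # largest lambda first: o_hat starts at 0
        beta, o = am_solver(K, y, lam, mu, o_init=o)  # warm start from the previous solution
        path.append((lam, o.copy()))
    return path
```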

11 Selecting lambda and mu
Relies on the RP and knowledge of the data model
 - Number of outliers known: from the RP, obtain the range of lambda s.t. the number of nonzero entries of o_hat matches it. Discard the identified outliers and use CV to determine mu
 - Variance of the nominal noise known: from the RP, for each (lambda, mu) on the grid, obtain an entry of the sample variance matrix from the residuals of the samples deemed outlier-free. The best (lambda, mu) are those whose sample variance is closest to the known noise variance
 - Variance of the nominal noise unknown: replace it above with a robust estimate, e.g., based on the median absolute deviation (MAD)
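A hedged sketch of the variance-matching rule just described (function and parameter names are mine): for each grid point, run the robust solver, compute the sample variance of the residuals over the samples flagged as outlier-free, and keep the pair whose sample variance is closest to the nominal noise variance.

```python
import numpy as np

def select_lambda_mu(K, y, lam_grid, mu_grid, sigma2, solver):
    """Pick (lam, mu) whose inlier residual variance best matches the nominal variance sigma2.
    `solver(K, y, lam, mu)` is assumed to return (beta, o) as in the earlier sketches."""
    best, best_gap = None, np.inf
    for mu in mu_grid:
        for lam in lam_grid:
            beta, o = solver(K, y, lam, mu)
            inliers = np.isclose(o, 0.0)             # samples not flagged as outliers
            if inliers.sum() < 2:
                continue
            gap = abs((y - K @ beta)[inliers].var(ddof=1) - sigma2)
            if gap < best_gap:
                best, best_gap = (lam, mu), gap
    return best

# If sigma2 is unknown, a robust proxy from pilot residuals r can replace it:
#   sigma_mad = 1.4826 * np.median(np.abs(r - np.median(r)));  sigma2 = sigma_mad**2
```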

12 Nonconvex regularization
Nonconvex penalty terms approximate the l0-(pseudo)norm in (P0) better than the l1-norm
 - Options: SCAD [Fan-Li '01], or sum-of-logs [Candes et al '08]
 - Iterative linearization-minimization of the penalty around the current iterate
Remarks
 - Initialize with the (P1) solution; reuse lambda and mu
 - Bias reduction (cf. adaptive Lasso [Zou '06])
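A hedged sketch of the refinement via the sum-of-logs surrogate: each outer step linearizes the log penalty around the current outlier estimate, which yields a weighted-l1 problem solved here by the same alternating scheme as before. Initialization with the (P1) solution follows the slide's remark; the constant delta and iteration counts are illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def refine_reweighted(K, y, lam, mu, o_init, n_outer=5, n_inner=100, delta=1e-3):
    """Iterative linearization-minimization of a sum-of-logs penalty on o (cf. Candes et al. '08):
    weights w_i = 1/(|o_i| + delta) turn each outer step into a weighted-l1 problem."""
    n = len(y)
    chol = cho_factor(K + mu * np.eye(n))
    o = o_init.copy()
    for _ in range(n_outer):
        w = 1.0 / (np.abs(o) + delta)                # linearize the log penalty at the current o
        for _ in range(n_inner):
            beta = cho_solve(chol, y - o)
            o = soft(y - K @ beta, lam * w)          # weighted soft-thresholding
    return beta, o
```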

13 Robust thin-plate splines
Specialize to thin-plate splines [Duchon '77], [Wahba '80]
 - Smoothing penalty is only a seminorm (its nullspace contains the affine functions)
 - Still, Proposition 2 holds for appropriately redefined y_tilde and X_tilde
Solution:
 - Radial basis function expansion with phi(r) = r^2 log r
 - Augment with a member of the nullspace of the penalty
 - Given o_hat, the remaining unknowns are found in closed form
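A hedged sketch of the (non-robust) smoothing thin-plate spline solution in 2-D, assuming the standard radial basis phi(r) = r^2 log r and an affine nullspace term; in the robust variant, y would be replaced by the outlier-compensated data as in the AM sketch. Names and the regularization value are illustrative.

```python
import numpy as np

def tps_basis(X1, X2):
    """Thin-plate radial basis phi(r) = r^2 * log(r) between 2-D point sets (phi(0) = 0)."""
    r = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0, r ** 2 * np.log(r), 0.0)

def tps_fit(X, y, mu=1e-3):
    """Closed-form smoothing thin-plate spline:
    f(x) = sum_j beta_j * phi(||x - x_j||) + a0 + a1*x1 + a2*x2, where the affine part
    spans the nullspace of the smoothing penalty."""
    n = X.shape[0]
    Phi = tps_basis(X, X)
    P = np.hstack([np.ones((n, 1)), X])                  # nullspace (affine) basis
    A = np.block([[Phi + mu * np.eye(n), P],
                  [P.T, np.zeros((3, 3))]])
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(3)]))
    return sol[:n], sol[n:]                              # (beta, affine coefficients)
```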

14 Simulation setup
Training set: noisy samples of a Gaussian-mixture surface; examples drawn i.i.d.
Nominal data: y_i = f(x_i) + eps_i with eps_i i.i.d. (variance known)
Outliers: drawn i.i.d. for a fraction of the samples
[Figures: true function and training data]
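A hedged sketch of a training set in the spirit of this slide: a 2-D Gaussian-mixture "true function", i.i.d. inputs, nominal Gaussian noise of known variance, and a sparse set of i.i.d. outliers. All mixture parameters, fractions, and sample sizes are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
X = rng.uniform(0.0, 1.0, size=(N, 2))                        # i.i.d. 2-D inputs
centers = [np.array([0.3, 0.3]), np.array([0.7, 0.7])]
scales, weights = [0.15, 0.2], [1.0, -0.8]
f_true = sum(w * np.exp(-np.sum((X - c) ** 2, axis=1) / (2 * s ** 2))
             for c, s, w in zip(centers, scales, weights))     # Gaussian-mixture surface
sigma = 0.05                                                   # nominal noise std (assumed known)
y = f_true + sigma * rng.standard_normal(N)                    # nominal samples
out_idx = rng.choice(N, size=int(0.1 * N), replace=False)
y[out_idx] += rng.uniform(-2.0, 2.0, size=out_idx.size)        # sporadic i.i.d. outliers
```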

15 Robustification paths
Grid parameters: lambda and mu swept over a grid
[Figure: robustification paths, with outlier and inlier coefficient trajectories distinguished]
Paths obtained using SpaRSA [Wright et al '09]

16 Results
[Figures: true function, nonrobust predictions, robust predictions, refined predictions]
Effectiveness in rejecting outliers is apparent

17 Generalization capability
Figures of merit
 - Training error: average loss over the training set
 - Test error: average loss over an independent test set
In all cases, 100% outlier identification success rate
Nonconvex refinement leads to consistently lower test error

18 Load curve data cleansing
Load curve: electric power consumption recorded periodically
Reliable data: key to realize the smart grid vision
[Figure: Uruguay's aggregate power consumption (MW)]
Deviations from nominal models (outliers)
 - Faulty meters, communication errors
 - Unscheduled maintenance, strikes, sporting events
B-splines for load curve prediction and denoising [Chen et al '10]

19 Real data tests
[Figures: nonrobust predictions, robust predictions, refined predictions]

20 Concluding summary
Robust nonparametric regression
 - VLTS as l0-(pseudo)norm regularized regression (NP-hard)
 - Convex relaxation: a variational M-type estimator, solvable as a Lasso
Controlling sparsity amounts to controlling the number of outliers
 - The sparsity-controlling role of lambda is central
 - Selection of lambda and mu using the Lasso robustification paths
 - Different options dictated by the available knowledge of the data model
Refinement via nonconvex penalty terms
 - Bias reduction and improved generalization capability
Real data tests for load curve cleansing