Presentation transcript:

1 Sparsity Control for Robustness and Social Data Analysis
Gonzalo Mateos, ECE Department, University of Minnesota
Acknowledgments: Profs. Georgios B. Giannakis, M. Kaveh, G. Sapiro, N. Sidiropoulos, and N. Waller; MURI grant (AFOSR FA)
Minneapolis, MN, December 9, 2011

2 Learning from “Big Data”
`Data are widely available, what is scarce is the ability to extract wisdom from them' (Hal Varian, Google's chief economist)
Big data are: big, fast, productive, revealing, ubiquitous, smart, messy
K. Cukier, ``Harnessing the data deluge,'' Nov.

3 Social-Computational Systems
Complex systems of people and computers
The vision: understand and engineer SoCS; preference measurement (PM), analysis, management
The means: leverage the dual role of sparsity
- Complexity control through variable selection
- Robustness to outliers

4 Conjoint analysis
Goal: learn the consumer's utility function from preference data
- Optimal design and positioning of new products
- Strategy: describe products by a set of attributes, `parts'
- Linear utilities: `How much is each part worth?'
Used in marketing, healthcare, psychology [Green-Srinivasan '78]
Success story [Wind et al '89]; attributes: room size, TV options, restaurant, transportation

5 Modeling preliminaries
Respondents (e.g., consumers) rate profiles; each profile comprises attributes
Linear utility: estimate the vector of partworths
Conjoint data collection formats:
(M1) Metric ratings
(M2) Choice-based conjoint data
Online SoCS-based preference data grow exponentially
- Inconsistent/corrupted/irrelevant data: outliers
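The slide's equations do not survive the transcript. A standard form of the linear-utility model and the metric-ratings format (M1), with all symbols supplied here rather than read off the slide, is:

```latex
% Each profile x_n stacks the attribute levels; w collects the partworths;
% e_{rn} is nominal noise in respondent r's rating of profile n.
u(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x}
\qquad\text{(M1)}\quad
y_{rn} = \mathbf{w}^{\top}\mathbf{x}_{n} + e_{rn}
```

Under (M2), only the chosen profile within each offered set is observed, not a numerical rating.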

6 Robustifying PM
Least-trimmed squares (LTS) [Rousseeuw '87]
- The largest residuals are discarded
Q: How should we go about minimizing the nonconvex (LTS) cost?
A: Try all subsets of the prescribed size, solve, and pick the best
- Simple but intractable beyond small problems
- Near-optimal solvers [Rousseeuw '06], RANSAC [Fischler-Bolles '81]
G. Mateos, V. Kekatos, and G. B. Giannakis, ``Exploiting sparsity in model residuals for robust conjoint analysis,'' Marketing Sci., Dec. 2011 (submitted).
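The (LTS) cost is also stripped from the transcript; its textbook form, with r²₍ₙ₎(w) denoting the n-th smallest squared residual and s the number of retained residuals, is:

```latex
\hat{\mathbf{w}}_{\mathrm{LTS}}
  = \arg\min_{\mathbf{w}} \sum_{n=1}^{s} r_{[n]}^{2}(\mathbf{w}),
\qquad
r_{n}(\mathbf{w}) = y_{n} - \mathbf{w}^{\top}\mathbf{x}_{n}
```

Trying all subsets means solving a least-squares fit for each of the N-choose-s candidate subsets, hence the intractability.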

7 Modeling outliers
Outlier variables: nonzero if the datum is an outlier, zero otherwise
- Nominal ratings obey (M1); outliers something else
- Both the partworths and the outliers are unknown, typically sparse!
Natural (but intractable) nonconvex estimator
Related: ε-contamination [Fuchs '99], Bayesian model [Jin-Rao '10]
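A reconstruction of the outlier-aware model, with symbols assumed here since the slide's are lost:

```latex
% o_n acts as an outlier variable attached to each datum.
y_{n} = \mathbf{w}^{\top}\mathbf{x}_{n} + o_{n} + e_{n},
\qquad
o_{n}
\begin{cases}
  \neq 0, & \text{if datum } n \text{ is an outlier} \\
  = 0,    & \text{otherwise}
\end{cases}
```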

8 LTS as sparse regression
Lagrangian form (P0): the tuning parameter controls sparsity in the number of outliers
- (P0) formally justifies the preference model and its estimator
- (P0) ties sparse regression with robust estimation
Proposition 1: If (w, o) solves (P0) with the tuning parameter chosen so that the number of nonzero outliers equals the number of trimmed residuals, then w solves (LTS).
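In the notation assumed above (not recoverable from the transcript), the Lagrangian form reads:

```latex
% (P0): the l0-(pseudo)norm of the outlier vector o counts how many
% residuals are effectively discarded; lambda_0 tunes that count.
(\mathrm{P0})\quad
\min_{\mathbf{w},\,\mathbf{o}}
  \;\|\mathbf{y} - \mathbf{X}\mathbf{w} - \mathbf{o}\|_{2}^{2}
  + \lambda_{0}\,\|\mathbf{o}\|_{0}
```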

9 Just relax!
(P0) is NP-hard; relax, e.g., [Tropp '06]
(P1) is convex, and thus efficiently solved
- The role of the sparsity-controlling parameter is central
Q: Does (P1) yield robust estimates?
A: Yes! The Huber estimator is a special case
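The relaxation swaps the ℓ0 pseudo-norm for the ℓ1 norm (again in notation assumed here):

```latex
% (P1): convex surrogate of (P0); lambda_1 controls outlier sparsity.
(\mathrm{P1})\quad
\min_{\mathbf{w},\,\mathbf{o}}
  \;\|\mathbf{y} - \mathbf{X}\mathbf{w} - \mathbf{o}\|_{2}^{2}
  + \lambda_{1}\,\|\mathbf{o}\|_{1}
```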

10 Lassoing outliers
Minimizing (P1) over the partworths first shows it suffices to solve a Lasso problem [Tibshirani '94]
- Data-driven methods to select the tuning parameter
- Lasso solvers return the entire robustification path (RP)
Proposition 2: the minimizers of (P1) follow from the Lasso solution
[Plot: robustification path, outlier coefficients vs. decreasing tuning parameter]
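As a concrete illustration, here is a minimal block-coordinate-descent sketch for (P1); this is not the authors' code, and the function and variable names are assumptions:

```python
import numpy as np

def lasso_outliers(X, y, lam, n_iter=200):
    """Block coordinate descent for the surrogate
       min_{w,o}  0.5*||y - X@w - o||^2  +  lam*||o||_1.
    Large entries of o survive the shrinkage and flag outliers."""
    o = np.zeros(len(y))
    X_pinv = np.linalg.pinv(X)                 # cached pseudoinverse for the LS step
    for _ in range(n_iter):
        w = X_pinv @ (y - o)                   # exact minimization over w (least squares)
        r = y - X @ w                          # residuals at the current w
        o = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)  # prox of lam*||.||_1
    return w, o
```

Sweeping lam from large to small and recording o traces the robustification path the slide refers to.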

11 Nonconvex regularization
Nonconvex penalty terms approximate the ℓ0-(pseudo)norm in (P0) better
Options: SCAD [Fan-Li '01], or sum-of-logs [Candes et al '08]
Iterative linearization-minimization of the concave penalty around the current iterate
- Initialize with the (P1) solution
- Bias reduction (cf. adaptive Lasso [Zou '06])
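For the sum-of-logs choice, linearization yields the familiar reweighted ℓ1 update of [Candes et al '08]; δ > 0 and the iteration index k are notation supplied here:

```latex
% Sum-of-logs penalty and its linearization around the iterate o^{(k)}:
\sum_{n} \log\!\left(1 + |o_{n}|/\delta\right)
\;\;\Longrightarrow\;\;
\mathbf{o}^{(k+1)} = \arg\min_{\mathbf{w},\,\mathbf{o}}
  \|\mathbf{y}-\mathbf{X}\mathbf{w}-\mathbf{o}\|_{2}^{2}
  + \lambda \sum_{n} \frac{|o_{n}|}{|o_{n}^{(k)}| + \delta}
```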

12 Comparison with RANSAC
[Plots: estimation performance on i.i.d. nominal data contaminated with outliers]

13 Nonparametric regression
If one trusts data more than any parametric model, go nonparametric: the regression function lives in a space of “smooth” functions
- Interactions among attributes? Not captured by linear utilities; driven by complex mechanisms that are hard to model
Ill-posed problem
- Workaround: regularization [Tikhonov '77], [Wahba '90]
- RKHS with a reproducing kernel and its induced norm
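The slide's variational problem is lost; the standard RKHS-regularized formulation (notation assumed here) is:

```latex
% Regularized nonparametric regression in an RKHS H with kernel K;
% by the representer theorem the minimizer is a finite kernel expansion.
\min_{f \in \mathcal{H}}
  \sum_{n=1}^{N} \bigl(y_{n} - f(\mathbf{x}_{n})\bigr)^{2}
  + \mu \|f\|_{\mathcal{H}}^{2},
\qquad
\hat{f}(\mathbf{x}) = \sum_{n=1}^{N} \beta_{n} K(\mathbf{x}, \mathbf{x}_{n})
```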

14 Function approximation
[Panels: true function; nonrobust predictions; robust predictions; refined predictions]
- Effectiveness in rejecting outliers is apparent
G. Mateos and G. B. Giannakis, ``Robust nonparametric regression via sparsity control with application to load curve data cleansing,'' IEEE Trans. Signal Process., 2012.

15 Load curve data cleansing
Load curve: electric power consumption recorded periodically
- Reliable data: key to realize the smart grid vision [Hauser '09]
[Plot: Uruguay's power consumption (MW)]
Outlier sources: faulty meters, communication errors, unscheduled maintenance, strikes, sport events
B-splines for load curve prediction and denoising [Chen et al '10]

16 NorthWrite data
Energy consumption of a government building ('05-'10)
- Robust smoothing spline estimator [plot: consumption vs. hours]
- Outliers: “building operational transition shoulder periods”
- No manual labeling of outliers [Chen et al '10]
Data: courtesy of NorthWrite Energy Group, provided by Prof. V. Cherkassky

17 Principal Component Analysis
Motivation: (statistical) learning from high-dimensional data
[Images: DNA microarray; traffic surveillance]
Principal component analysis (PCA) [Pearson 1901]
- Extraction of low-dimensional data structure
- Data compression and reconstruction
- PCA is non-robust to outliers [Jolliffe '86]
Our goal: robustify PCA by controlling outlier sparsity

18 Our work in context
Robust PCA
- Robust covariance matrix estimators [Campbell '80], [Huber '81]
- Computer vision [Xu-Yuille '95], [De la Torre-Black '03]
- Low-rank matrix recovery from sparse errors, e.g., [Wright et al '09]
Contemporary applications tied to SoCS
- Anomaly detection in IP networks [Huang et al '07], [Kim et al '09]
- Video surveillance, e.g., [Oliver et al '99]
- Matrix completion for collaborative filtering, e.g., [Candes et al '09]

19 PCA formulations
Training data; component analysis model
Two equivalent views:
- Minimum reconstruction error, via a compression operator and a reconstruction operator
- Maximum variance
Solution: the dominant eigenvectors of the sample covariance
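A standard statement of the two formulations; the slide's symbols are not in the transcript, so this notation is assumed:

```latex
% Given centered data y_n in R^p, fit a rank-q subspace U (U^T U = I_q)
% with scores chi_n; the two problems share the same solution, namely
% the q dominant eigenvectors of the sample covariance.
\min_{\mathbf{U},\{\boldsymbol{\chi}_{n}\}}
  \sum_{n=1}^{N} \|\mathbf{y}_{n} - \mathbf{U}\boldsymbol{\chi}_{n}\|_{2}^{2}
\quad\Longleftrightarrow\quad
\max_{\mathbf{U}^{\top}\mathbf{U}=\mathbf{I}_{q}}
  \operatorname{tr}\!\bigl(\mathbf{U}^{\top}\hat{\boldsymbol{\Sigma}}_{y}\,\mathbf{U}\bigr)
```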

20 Robustifying PCA
Outlier-aware model: interpret as a blind preference model with latent profiles
(P2): row-wise norm regularization flags outlying data vectors
- The ℓ0-norm counterpart is tied to (LTS PCA)
- (P2) subsumes an optimal (vector) Huber estimator
- ℓ1-norm regularization for entry-wise outliers
G. Mateos and G. B. Giannakis, ``Robust PCA as bilinear decomposition with outlier sparsity regularization,'' IEEE Trans. Signal Process., Nov. 2011 (submitted).
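A hedged reconstruction of (P2), with Y holding the data rows and O the outlier rows; the exact notation is assumed, not read off the slide:

```latex
% (P2): bilinear low-rank fit plus row-wise (group) sparsity on outliers.
(\mathrm{P2})\quad
\min_{\mathbf{m},\mathbf{U},\mathbf{X},\mathbf{O}}
  \;\bigl\|\mathbf{Y} - \mathbf{1}\mathbf{m}^{\top}
        - \mathbf{X}\mathbf{U}^{\top} - \mathbf{O}\bigr\|_{F}^{2}
  + \lambda \sum_{n=1}^{N} \|\mathbf{o}_{n}\|_{2}
```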

21 Alternating minimization of (P2)
- Low-rank update: SVD of the outlier-compensated data
- Outlier update: row-wise vector soft-thresholding
Proposition 3: Algorithm 1's iterates converge to a stationary point of (P2).
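A minimal sketch of the two alternating updates, assuming the (P2)-style cost above; names and details are mine, not the authors' Algorithm 1:

```python
import numpy as np

def robust_pca(Y, q, lam, n_iter=50):
    """Alternating minimization for
       min_{m,L,O} ||Y - 1*m^T - L - O||_F^2 + lam * sum_n ||O[n,:]||_2
    subject to rank(L) <= q."""
    O = np.zeros(Y.shape)
    for _ in range(n_iter):
        # Low-rank update: rank-q truncated SVD of the outlier-compensated data.
        m = (Y - O).mean(axis=0)
        U, s, Vt = np.linalg.svd(Y - O - m, full_matrices=False)
        L = (U[:, :q] * s[:q]) @ Vt[:q]
        # Outlier update: row-wise vector soft-thresholding of the residual rows.
        R = Y - m - L
        norms = np.maximum(np.linalg.norm(R, axis=1, keepdims=True), 1e-12)
        O = np.maximum(1.0 - lam / (2.0 * norms), 0.0) * R
    return m, L, O
```

Rows whose residual norm exceeds the threshold come out with nonzero O rows, i.e., flagged outlying data vectors.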

22 Video surveillance
Data:
[Frames: original; PCA reconstruction; robust PCA reconstruction; detected `outliers']

23 Big Five personality factors
Five dimensions of personality traits [Goldberg '93], [Costa-McCrae '92]
- Discovered through factor analysis
- WEIRD subjects
Big Five Inventory (BFI)
- Short questionnaire (44 items) to measure the Big Five
- Rate 1-5, e.g., `I see myself as someone who... is talkative', `... is full of energy'
Handbook of personality: Theory and research, O. P. John, R. W. Robins, and L. A. Pervin, Eds., New York, NY: Guilford Press.

24 BFI data
Eugene-Springfield community sample [Goldberg '08]: subjects, item responses, factors
Robust PCA identifies 8 outlying subjects
- Validated via `inconsistency' scores, e.g., VRIN [Tellegen '88]
Data: courtesy of Prof. L. Goldberg, provided by Prof. N. Waller

25 Online robust PCA
Motivation: real-time data and memory limitations
Exponentially-weighted robust PCA
- At each time step, past outlier estimates are not re-estimated
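A plausible form of the exponentially-weighted cost; the forgetting factor β ∈ (0, 1] and the rest of the notation are reconstructions, not taken from the slide:

```latex
\min \;\sum_{\tau=1}^{n} \beta^{\,n-\tau}
  \Bigl[ \|\mathbf{y}_{\tau} - \mathbf{m} - \mathbf{U}\boldsymbol{\chi}_{\tau}
          - \mathbf{o}_{\tau}\|_{2}^{2}
        + \lambda \|\mathbf{o}_{\tau}\|_{2} \Bigr]
```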

26 Online PCA in action
[Plots: tracking performance on nominal data and on injected outliers]

27 Robust kernel PCA
Kernel (K)PCA [Scholkopf '97]
- Challenge: the feature space can be infinite-dimensional
- Kernel trick: inner products in feature space are evaluated via the kernel in input space
[Diagram: input space mapped to feature space]
Related to spectral clustering
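The kernel trick in its standard form, with the feature map φ and kernel K as generic placeholders:

```latex
% All computations need only kernel evaluations, never phi explicitly.
K(\mathbf{x}_{i}, \mathbf{x}_{j})
  = \langle \phi(\mathbf{x}_{i}),\, \phi(\mathbf{x}_{j}) \rangle
```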

28 Unveiling communities
Data: network of NCAA football teams (nodes) and Fall '00 games (edges); graph kernel
- Identified exactly: Big 10, Big 12, ACC, SEC, Big East
- Outliers: independent teams
ARI = 0.8967 (adjusted Rand index)

29 Spectrum cartography
Goal: find a map giving the spectrum at each position
Idea: sensors collaborate to form a spatial map of the spectrum
Approach: basis expansion model, nonparametric basis pursuit
[Maps: original vs. estimated spectrum]
J. A. Bazerque, G. Mateos, and G. B. Giannakis, ``Group-Lasso on splines for spectrum cartography,'' IEEE Trans. Signal Process., Oct. 2011.

30 Distributed adaptive algorithms
Issues and significance:
- Fast-varying (non-)stationary processes
- Unavailability of statistical information
- Online incorporation of sensor data
- Noisy communication links
- Improved learning through cooperation
Technical approaches:
- Consensus-based in-network operation in ad hoc WSNs
- Distributed optimization using alternating-direction methods
- Online learning of statistics using stochastic approximation
- Performance analysis via stochastic averaging
[Diagram: wireless sensor network]
G. Mateos, I. D. Schizas, and G. B. Giannakis, ``Distributed recursive least-squares for consensus-based in-network adaptive estimation,'' IEEE Trans. Signal Process., Nov. 2009.

31 Unveiling network anomalies
Approach: flag anomalies across flows and time via sparsity and low rank
Payoff: ensure high performance, QoS, and security in IP networks
[Plots: anomalies across flows and time; enhanced detection capabilities]
M. Mardani, G. Mateos, and G. B. Giannakis, ``Unveiling network anomalies across flows and time via sparsity and low rank,'' IEEE Trans. Inf. Theory, Dec. 2011 (submitted).

32 Concluding summary
Control sparsity in model residuals for robust learning
[Diagram: intersection of outlier-resilient estimation, signal processing, and the Lasso]
Research issues addressed:
- Sparsity control for robust metric and choice-based PM
- Kernel-based nonparametric utility estimation
- Robust (kernel) principal component analysis
- Scalable distributed real-time implementations
Application domains:
- Preference measurement and conjoint analysis
- Psychometrics, personality assessment
- Video surveillance
- Social and power networks
Experimental validation with GPIPP personality ratings (~6M)
- Gosling-Potter Internet Personality Project (GPIPP)