1
Handling Outliers and Missing Data in Statistical Data Models Kaushik Mitra Date: 17/1/2011 ECSU Seminar, ISI
2
Statistical Data Models
Goal: find structure in data
Applications: finance, engineering, the sciences (e.g., biology) – wherever we deal with data
Some examples: regression, matrix factorization
Challenges: outliers and missing data
3
Outliers Are Quite Common
Google search results for 'male faces'
4
Need to Handle Outliers Properly
Removing salt-and-pepper (outlier) noise
(Figure: noisy image, Gaussian-filtered image, desired result)
5
Missing Data Problem
Completing missing tracks in structure from motion
(Figure: incomplete tracks, tracks completed by a sub-optimal method, desired result)
6
Our Focus
Outliers in regression: linear regression and kernel regression
Matrix factorization in the presence of missing data
7
Robust Linear Regression for High-Dimensional Problems
8
What is Regression?
Find the functional relation between y and x, where x is the independent variable and y the dependent variable
Given data: (y_i, x_i) pairs
Model: y = f(x, w) + n
Estimate w, then predict y for a new x
9
Robust Regression
Real-world data are corrupted with outliers, and outliers make estimates unreliable
Robust regression: estimate the unknown parameter w in the presence of outliers
Combinatorial problem: with N data points and k outliers there are C(N, k) ways to choose the outlier set
10
Prior Work
Combinatorial algorithms: random sample consensus (RANSAC), least median of squares (LMedS); exponential in the dimension
M-estimators: robust cost functions, but prone to local minima
11
Robust Linear Regression Model
Linear regression model: y_i = x_i^T w + e_i, with e_i Gaussian noise
Proposed robust model: e_i = n_i + s_i, where n_i is inlier noise (Gaussian) and s_i is outlier noise (sparse)
Matrix-vector form: y = Xw + n + s, with y = [y_1, ..., y_N]^T, X = [x_1^T; ...; x_N^T] an N×D matrix, w = [w_1, ..., w_D]^T, and n, s the stacked inlier and outlier noise vectors
Estimate w and s
12
Simplification
Objective (as in RANSAC): find the w that minimizes the number of outliers
Eliminate w from the model y = Xw + n + s: premultiply by a matrix C with CX = 0 (possible since N ≥ D)
Cy = CXw + Cs + Cn, so z = Cs + g, where z = Cy and g = Cn is Gaussian
Problem becomes: solve for s, identify the outliers, then estimate w by least squares on the inliers
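A minimal sketch of this elimination step, assuming X is a tall N×D design matrix with N ≥ D; the helper name and the use of a full SVD are illustrative choices, not necessarily the talk's implementation:

```python
import numpy as np

def eliminate_w(X, y):
    """Project out the regression parameter w from y = Xw + n + s.

    Returns C (an (N-D) x N matrix with CX = 0) and z = Cy, which obeys
    z = Cs + g with g Gaussian, so only the sparse outlier vector s remains.
    """
    N, D = X.shape
    # Full SVD: the last N - D left singular vectors are orthogonal to range(X).
    U, _, _ = np.linalg.svd(X, full_matrices=True)
    C = U[:, D:].T          # (N - D) x N, orthonormal rows, C @ X == 0
    z = C @ y
    return C, z
```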
13
Relation to Sparse Learning
Solve for the sparsest s consistent with z = Cs + g: a combinatorial problem
This is the sparse basis selection / sparse learning problem
Two approaches: basis pursuit (Chen, Donoho, Saunders 1995) and Bayesian sparse learning (Tipping 2001)
14
Basis Pursuit Robust Regression (BPRR)
Solve the basis pursuit denoising problem (Chen et al. 1995): minimize ||s||_1 subject to ||z - Cs||_2 ≤ ε
Convex problem with cubic complexity, O(N^3)
From compressive sensing theory (Candes 2005), this is equivalent to the original combinatorial problem if s is sparse and C satisfies the restricted isometry property (RIP)
Isometry: ||s_1 - s_2|| ≈ ||C(s_1 - s_2)||; restricted: to the class of sparse vectors
In general, there are no such guarantees for our problem
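A hedged sketch of the BPRR pipeline just described; cvxpy stands in for whatever L1 solver was actually used, and eps and tau are illustrative tuning parameters, not values from the talk:

```python
import numpy as np
import cvxpy as cp

def bprr(X, y, eps=0.1, tau=1e-3):
    N, D = X.shape
    U, _, _ = np.linalg.svd(X, full_matrices=True)
    C = U[:, D:].T                      # CX = 0, so z = Cy = Cs + g
    z = C @ y
    s = cp.Variable(N)
    cp.Problem(cp.Minimize(cp.norm1(s)),
               [cp.norm(z - C @ s, 2) <= eps]).solve()
    inliers = np.abs(s.value) < tau     # small |s_i| marks an inlier
    # Ordinary least squares on the detected inliers recovers w.
    w, *_ = np.linalg.lstsq(X[inliers], y[inliers], rcond=None)
    return w, ~inliers                  # parameter estimate and outlier mask
```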
15
Bayesian Sparse Robust Regression (BSRR)
Sparse Bayesian learning technique (Tipping 2001)
Puts a sparsity-promoting prior on s
Likelihood: p(z|s) = N(Cs, εI)
Solves the MAP problem for p(s|z)
Cubic complexity: O(N^3)
16
Setup for Empirical Studies
Synthetically generated data
Performance criterion: angle between the ground-truth and estimated hyperplanes
17
Vary Outlier Fraction
BSRR performs well in all dimensions; combinatorial algorithms like RANSAC, MSAC, and LMedS are not practical in high dimensions
(Plots: dimension = 2, 8, 32)
18
Facial Age Estimation
FG-NET dataset: 1002 images of 82 subjects
Regression: y is the age, x is a geometric feature vector
19
Outlier Removal by BSRR
Label data as inliers and outliers; BSRR detected 177 outliers in 1002 images
Leave-one-out testing, BSRR mean absolute error (MAE): inlier MAE 3.73, outlier MAE 19.14, overall MAE 6.45
20
Summary for Robust Linear Regression
Modeled outliers as a sparse variable
Formulated robust regression as a sparse learning problem: BPRR and BSRR
BSRR gives the best performance
Limitation: the linear regression model; next, the kernel model
21
Robust RVM Using Sparse Outlier Model
22
Relevance Vector Machine (RVM)
RVM model: y = Σ_j w_j k(x, x_j) + e, where k(·, ·) is a kernel function
Examples of kernels: k(x_i, x_j) = (x_i^T x_j)^2 (polynomial kernel); k(x_i, x_j) = exp(-||x_i - x_j||^2 / 2σ^2) (Gaussian kernel)
Kernel trick: k(x_i, x_j) = ψ(x_i)^T ψ(x_j), i.e., map x_i to the feature space ψ(x_i)
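The two kernels named on the slide, written out as a small illustrative sketch (function names are mine):

```python
import numpy as np

def poly_kernel(X1, X2):
    # k(x_i, x_j) = (x_i^T x_j)^2, a homogeneous degree-2 polynomial kernel
    return (X1 @ X2.T) ** 2

def gaussian_kernel(X1, X2, sigma=1.0):
    # k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-sq / (2 * sigma**2))
```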
23
RVM: A Bayesian Approach
Bayesian ingredients: a prior distribution p(w) and a likelihood
Prior specification: p(w) is a sparsity-promoting prior, p(w_i) = 1/|w_i|
Why sparse? Use a smaller subset of the training data for prediction, as in the support vector machine
Likelihood: Gaussian noise, which is non-robust and susceptible to outliers
24
Robust RVM Model
Original RVM model: y = Kw + e, with e Gaussian noise
Explicitly model outliers: e_i = n_i + s_i, where n_i is inlier noise (Gaussian) and s_i is outlier noise (sparse and heavy-tailed)
Matrix-vector form: y = Kw + n + s
Parameters to be estimated: w and s
25
Robust RVM Algorithms
y = [K | I] w_s + n, where w_s = [w^T s^T]^T is a sparse vector
Two approaches: Bayesian and optimization
26
Robust Bayesian RVM (RB-RVM)
Prior specification: w and s independent, p(w, s) = p(w)p(s); sparsity-promoting prior for s, p(s_i) = 1/|s_i|
Solve for the posterior p(w, s|y); prediction uses the inferred w
Computation: a bigger RVM, with w_s instead of w and [K | I] instead of K (see the sketch below)
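A short sketch of the "bigger RVM" construction, assuming a Gaussian kernel with an illustrative width sigma; any standard RVM / sparse Bayesian learning solver could then be run unchanged on the returned design matrix:

```python
import numpy as np

def augmented_design(X_train, sigma=1.0):
    # N x N Gaussian kernel matrix over the training inputs
    sq = np.sum((X_train[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * sigma ** 2))
    # Augmented N x 2N design [K | I]; the solver's weight vector is w_s = [w; s]
    return np.hstack([K, np.eye(K.shape[0])])
```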
27
Basis Pursuit RVM (BP-RVM)
Optimization approach: the combinatorial problem is replaced by its closest convex approximation
From compressive sensing theory, this gives the same solution if [K | I] satisfies the RIP; in general, this cannot be guaranteed
28
Experimental Setup
29
Prediction: Asymmetric Outliers Case
30
Image Denoising
Salt-and-pepper noise acts as outliers
Regression formulation: treat the image as a surface over the 2D grid, with y the intensity and x the 2D grid position
The denoised image is obtained by prediction
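A minimal sketch of this regression formulation, assuming a grayscale image array; it only rearranges pixels into (position, intensity) training pairs for a robust regressor such as RB-RVM:

```python
import numpy as np

def image_to_regression_data(img):
    rows, cols = np.indices(img.shape)
    X = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)  # 2D grid positions
    y = img.ravel().astype(float)                                     # intensities
    return X, y
```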
31
Salt and Pepper Noise
32
Some More Results
(Comparison: RVM, RB-RVM, median filter)
33
Age Estimation from Facial Images RB-RVM detected 90 outliers Leave-one-person-out testing
34
Summary for Robust RVM
Modeled outliers as sparse variables
Jointly estimated the parameters and the outliers
The Bayesian approach gives very good results
35
Limitations of Regression
Regression, y = f(x, w) + n, models noise only in y, which is not always reasonable: all variables can have noise
Collect the data as M = [x_1 x_2 … x_N]
Principal component analysis (PCA): [x_1 x_2 … x_N] = AB^T, with A the principal components and B the coefficients
M = AB^T is a matrix factorization (our next topic)
36
Matrix Factorization in the Presence of Missing Data
37
Applications in Computer Vision
Matrix factorization: M = AB^T
Applications: building 3D models from images
Geometric approach (multiple views): structure from motion (SfM)
Photometric approach (multiple lightings): photometric stereo
38
Matrix Factorization
Applications in vision: affine structure from motion (SfM) and photometric stereo
In affine SfM, the measurement matrix M of image coordinates (x_ij, y_ij) factors as M = CS^T, a rank-4 matrix; in photometric stereo, M = NS^T with rank 3
Solution for complete data: SVD, M = USV^T; truncate S to rank r and set A = US^0.5, B = VS^0.5
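A sketch of the complete-data SVD factorization on the slide (an illustrative helper, not the talk's code):

```python
import numpy as np

def svd_factorize(M, r):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    S_half = np.sqrt(s[:r])        # split the singular values between the factors
    A = U[:, :r] * S_half          # A = U S^0.5, restricted to rank r
    B = Vt[:r, :].T * S_half       # B = V S^0.5
    return A, B                    # M ≈ A @ B.T
```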
39
Missing Data Scenario
Missing feature tracks in SfM; specularities and shadows in photometric stereo
(Figure: incomplete feature tracks)
40
Challenges in the Missing Data Scenario
SVD can no longer be used
Solve a weighted factorization problem, where W is a binary weight matrix (1 for observed entries) and λ a regularization parameter (see the objective below)
Challenges: the problem is non-convex, and the Newton's-method-based algorithm (Buchanan et al. 2005) is very slow
Goal: design an algorithm that is fast (handles large-scale data) and flexible enough to handle additional constraints, e.g., orthonormality constraints in orthographic SfM
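For reference, a plausible form of the objective referred to above, assuming the usual weighted low-rank formulation with the Hadamard product; the exact regularizer used in the talk may differ:

```latex
\min_{A,B}\ \bigl\| W \odot (M - AB^{T}) \bigr\|_{F}^{2}
\;+\; \lambda \bigl( \|A\|_{F}^{2} + \|B\|_{F}^{2} \bigr),
\qquad
W_{ij} = \begin{cases} 1 & (i,j)\ \text{observed} \\ 0 & \text{otherwise.} \end{cases}
```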
41
Proposed Solution
Formulate matrix factorization as a low-rank semidefinite program (LRSDP)
LRSDP: a fast implementation of SDP (Burer, 2001) based on a quasi-Newton algorithm
Advantages of the proposed formulation: solves large-scale matrix factorization problems and handles additional constraints
42
Low-Rank Semidefinite Programming (LRSDP)
Stated in terms of a variable R and constants C (cost matrix) and A_l, b_l (constraint data); see the standard form below
Challenge: formulating matrix factorization as an LRSDP, i.e., designing C, A_l, and b_l
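The standard low-rank (Burer–Monteiro) SDP form, which is presumably the program the slide's equation showed: the positive-semidefinite variable X is replaced by X = RR^T and the program is written directly in R:

```latex
\min_{R}\ \operatorname{tr}\!\bigl(C\,RR^{T}\bigr)
\quad \text{subject to} \quad
\operatorname{tr}\!\bigl(A_{l}\,RR^{T}\bigr) = b_{l}, \qquad l = 1,\dots,k.
```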
43
Matrix Factorization as LRSDP: Noiseless Case
We want to formulate the noiseless factorization problem, i.e., recover A and B with (AB^T)_ij = M_ij on the observed entries, as an LRSDP
In the resulting formulation, C is the identity matrix and the A_l are indicator matrices, one per observed entry (see the reconstruction sketched below)
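A plausible reconstruction of the missing formulas, based on the hints on the slide (C is the identity, the A_l are indicator matrices) and the standard MF-LRSDP construction; the talk's exact formulation may differ:

```latex
\min_{A,B}\ \|A\|_{F}^{2} + \|B\|_{F}^{2}
\quad \text{s.t.} \quad (AB^{T})_{ij} = M_{ij},\ (i,j)\in\Omega
\qquad\Longrightarrow\qquad
R = \begin{bmatrix} A \\ B \end{bmatrix},\quad
\min_{R}\ \operatorname{tr}\!\bigl(I\,RR^{T}\bigr)
\ \text{ s.t. }\ \operatorname{tr}\!\bigl(A_{ij}\,RR^{T}\bigr) = M_{ij},\ (i,j)\in\Omega.
```

Here Ω is the set of observed entries and A_ij is the symmetric indicator matrix that picks out the (i, j) entry of the off-diagonal block AB^T of RR^T; note that tr(RR^T) = ||A||_F^2 + ||B||_F^2, which is why C is the identity.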
44
Affine SfM
Dinosaur sequence, 72% missing data
MF-LRSDP gives the best reconstruction
45
Photometric Stereo
Face sequence, 42% missing data
MF-LRSDP and damped Newton give the best results
46
Additional Constraints: Orthographic Factorization Dinosaur sequence
47
Summary
Formulated missing-data matrix factorization as an LRSDP: handles large-scale problems and additional constraints
Overall summary: two statistical data models
Regression in the presence of outliers: the role of sparsity
Matrix factorization in the presence of missing data: low-rank semidefinite programming
48
Thank you! Questions?
49
Robust Bayesian RVM (RB-RVM)
Prior specification: w and s independent, p(w, s) = p(w)p(s)
Hierarchical prior for w and s, with the hyperparameters α_i and β_i uniformly distributed
True nature of the prior: p(s_i) = 1/|s_i|, which is sparsity promoting
50
RB-RVM: Inference and Prediction
First estimate α, β, and σ
The prior and conditional are Gaussian, so the posterior p(w, s|y) is Gaussian, specified by its mean and covariance
Prediction: use the inferred w; the predictive distribution of y is also Gaussian
51
RB-RVM: Fast Algorithm
A bigger RVM: w_s instead of w, [K | I] instead of K
Fast implementation (Tipping et al., 2003)
52
Vary Outlier Fraction
53
Uniqueness and Global Minima
When is the factorization unique? Matrix completion theory (Candes, 2008): enough observed entries, O(r n^1.2 log n), and M dense
Global minima are reached under the above conditions (empirically)
54
Empirical Evaluations
Synthetic data: generate random matrices A, B of size n×r, obtain M = AB^T, reveal a fraction of the entries, and add noise N(0, σ^2)
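A sketch of this synthetic-data protocol; the observed fraction and noise level shown are illustrative defaults, not values from the talk:

```python
import numpy as np

def make_synthetic(n=500, r=5, frac_observed=0.3, sigma=0.0, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, r))
    B = rng.standard_normal((n, r))
    M = A @ B.T                                   # ground-truth rank-r matrix
    W = rng.random((n, n)) < frac_observed        # binary mask of revealed entries
    M_obs = np.where(W, M + sigma * rng.standard_normal((n, n)), 0.0)
    return M, M_obs, W
```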
55
Noiseless Case
n = 500, r = 5; damped Newton is too slow to run
A trial counts as a successful reconstruction when the recovered matrix is close enough to the ground truth
MF-LRSDP gives the best reconstruction results, followed by OptSpace and alternation
56
Vary Size, Rank, and Noise Variance
MF-LRSDP gives better reconstruction results across different sizes and ranks
The noise performance of all algorithms is similar