
1 Handling Outliers and Missing Data in Statistical Data Models Kaushik Mitra Date: 17/1/2011 ECSU Seminar, ISI

2 Statistical Data Models Goal: Find structure in data Applications – Finance – Engineering – Sciences (biological) – Wherever we deal with data Some examples – Regression – Matrix factorization Challenges: Outliers and missing data

3 Outliers Are Quite Common Google search results for 'male faces'

4 Need to Handle Outliers Properly Removing salt-and-pepper (outlier) noise (figure panels: noisy image, Gaussian-filtered image, desired result)

5 Missing Data Problem Missing tracks in structure from motion: completing missing tracks (figure panels: incomplete tracks, tracks completed by a sub-optimal method, desired result)

6 Our Focus Outliers in regression – Linear regression – Kernel regression Matrix factorization in the presence of missing data

7 Robust Linear Regression for High-Dimensional Problems

8 What is Regression? Regression – Find the functional relation between y and x (x: independent variable, y: dependent variable) – Given data: (y_i, x_i) pairs Model: y = f(x, w) + n – Estimate w – Predict y for a new x
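
A minimal sketch of this setup for the linear case, assuming illustrative sizes and noise level (none of these numbers come from the slides):

```python
# Generate (y_i, x_i) pairs with y = x^T w + n, estimate w by least squares,
# and predict y for a new x. Sizes and noise level are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.standard_normal((N, D))                  # independent variables x_i (rows)
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(N)    # y = f(x, w) + n

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # estimate w
x_new = rng.standard_normal(D)
y_pred = x_new @ w_hat                           # predict y for a new x
```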

9 Robust Regression Real-world data is corrupted with outliers Outliers make estimates unreliable Robust regression – Unknowns: parameter w and the outliers – Combinatorial problem: with N data points and k outliers there are C(N,k) possible outlier sets

10 Prior Work Combinatorial algorithms – Random sample consensus (RANSAC) – Least Median of Squares (LMedS) – Exponential in dimension M-estimators – Robust cost functions – Prone to local minima

11 Robust Linear Regression Model Linear regression model: y_i = x_i^T w + e_i – e_i: Gaussian noise Proposed robust model: e_i = n_i + s_i – n_i: inlier noise (Gaussian) – s_i: outlier noise (sparse) Matrix-vector form: y = Xw + n + s, where y = [y_1, …, y_N]^T, X has rows x_i^T, n = [n_1, …, n_N]^T, s = [s_1, …, s_N]^T, and w = [w_1, …, w_D]^T Estimate w and s
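
A small simulation of this observation model, with illustrative dimensions and outlier magnitudes (assumptions, not values from the talk):

```python
# Robust observation model y = Xw + n + s: dense Gaussian inlier noise n plus
# a sparse outlier vector s that is nonzero on only k entries.
import numpy as np

rng = np.random.default_rng(0)
N, D, k = 200, 5, 20                           # k outliers out of N points (assumed)
X = rng.standard_normal((N, D))
w = rng.standard_normal(D)
n = 0.05 * rng.standard_normal(N)              # inlier (Gaussian) noise
s = np.zeros(N)                                # sparse outlier vector
outlier_idx = rng.choice(N, size=k, replace=False)
s[outlier_idx] = rng.normal(0.0, 5.0, size=k)  # large gross errors
y = X @ w + n + s                              # observed responses
```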

12 Simplification Objective (as in RANSAC): find the w that minimizes the number of outliers Eliminate w from the model y = Xw + n + s Premultiply by a matrix C with CX = 0 (possible when N ≥ D) – Cy = CXw + Cs + Cn – z = Cs + g – g: Gaussian The problem becomes: solve for s -> identify outliers -> least squares on the inliers -> w
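
The elimination step can be sketched with a null-space computation; scipy.linalg.null_space is used here as one convenient way to obtain such a C (an implementation choice, not necessarily the one used in the talk):

```python
# Choose C whose rows span the left null space of X, so that CX = 0
# (this needs N >= D); then z = Cy = Cs + Cn no longer depends on w.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
N, D = 200, 5
X = rng.standard_normal((N, D))
C = null_space(X.T).T            # (N - D) x N, rows orthogonal to the columns of X
assert np.allclose(C @ X, 0.0)

y = rng.standard_normal(N)       # stands in for y = Xw + n + s
z = C @ y                        # reduced measurements z = Cs + g
```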

13 Relation to Sparse Learning Solve for the sparsest s consistent with z = Cs + g – Combinatorial problem Sparse basis selection / sparse learning Two approaches – Basis Pursuit (Chen, Donoho, Saunders 1995) – Bayesian Sparse Learning (Tipping 2001)

14 Basis Pursuit Robust Regression (BPRR) Solve the l1-relaxed problem – Basis Pursuit Denoising (Chen et al. 1995) – Convex problem – Cubic complexity: O(N^3) From compressive sensing theory (Candes 2005) – Equivalent to the original problem if s is sparse and C satisfies the Restricted Isometry Property (RIP) Isometry: ||s_1 - s_2|| ≈ ||C(s_1 - s_2)|| Restricted: to the class of sparse vectors In general, no guarantees for our problem
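
A hedged end-to-end sketch of the BPRR idea, using scikit-learn's Lasso as an l1 stand-in for basis pursuit denoising (the regularization weight and the outlier threshold below are assumptions):

```python
# BPRR sketch: eliminate w, recover the sparse outlier vector s with an
# l1-regularized fit, flag large entries of s as outliers, and refit w by
# least squares on the remaining inliers.
import numpy as np
from scipy.linalg import null_space
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N, D, k = 200, 5, 20
X = rng.standard_normal((N, D))
w_true = rng.standard_normal(D)
s_true = np.zeros(N)
s_true[rng.choice(N, size=k, replace=False)] = rng.normal(0.0, 5.0, size=k)
y = X @ w_true + 0.05 * rng.standard_normal(N) + s_true

C = null_space(X.T).T                          # CX = 0
z = C @ y                                      # z = Cs + g

lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
s_hat = lasso.fit(C, z).coef_                  # sparse estimate of s
inliers = np.abs(s_hat) < 1.0                  # assumed outlier threshold
w_hat, *_ = np.linalg.lstsq(X[inliers], y[inliers], rcond=None)
```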

15 Bayesian Sparse Robust Regression (BSRR) Sparse Bayesian learning technique (Tipping 2001) – Puts a sparsity-promoting prior on s: p(s_i) ∝ 1/|s_i| – Likelihood: p(z|s) = N(Cs, εI) – Solves the MAP problem for p(s|z) – Cubic complexity: O(N^3)

16 Setup for Empirical Studies Synthetically generated data Performance criterion – Angle between the ground-truth and estimated hyperplanes
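
One way to compute this criterion, treating the parameter vectors as the hyperplane normals (a small helper written for illustration):

```python
# Angle (in degrees) between the ground-truth and estimated hyperplane normals.
import numpy as np

def hyperplane_angle_deg(w_true, w_hat):
    cos = abs(w_true @ w_hat) / (np.linalg.norm(w_true) * np.linalg.norm(w_hat))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```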

17 Vary Outlier Fraction BSRR performs well in all dimensions Combinatorial algorithms like RANSAC, MSAC, and LMedS are not practical in high dimensions (plots for dimension = 2, 8, and 32)

18 Facial Age Estimation FG-NET dataset: 1002 images of 82 subjects Regression – y: age – x: geometric feature vector

19 Outlier Removal by BSRR Label data as inliers and outliers Detected 177 outliers in 1002 images Leave-one-out testing: BSRR inlier MAE 3.73, outlier MAE 19.14, overall MAE 6.45

20 Summary for Robust Linear Regression Modeled outliers as a sparse variable Formulated robust regression as a sparse learning problem – BPRR and BSRR BSRR gives the best performance Limitation: linear regression model – addressed next with a kernel model

21 Robust RVM Using Sparse Outlier Model

22 Relevance Vector Machine (RVM) RVM model: y = Σ_j w_j k(x, x_j) + e – k: kernel function Examples of kernels – k(x_i, x_j) = (x_i^T x_j)^2: polynomial kernel – k(x_i, x_j) = exp(-||x_i - x_j||^2 / 2σ^2): Gaussian kernel Kernel trick: k(x_i, x_j) = ψ(x_i)^T ψ(x_j) – Map x_i to the feature space ψ(x_i)
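
The two example kernels, written as Gram-matrix builders (a plain NumPy sketch):

```python
# Polynomial and Gaussian kernels evaluated between all pairs of rows of X1 and X2.
import numpy as np

def polynomial_kernel(X1, X2):
    return (X1 @ X2.T) ** 2                        # k(xi, xj) = (xi^T xj)^2

def gaussian_kernel(X1, X2, sigma=1.0):
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))  # exp(-||xi - xj||^2 / 2σ^2)
```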

23 RVM: A Bayesian Approach Bayesian approach – Prior distribution: p(w) – Likelihood: p(y|w) Prior specification – p(w): sparsity-promoting prior, p(w_i) ∝ 1/|w_i| – Why sparse? Use a smaller subset of the training data for prediction, as in the support vector machine Likelihood – Gaussian noise – Non-robust: susceptible to outliers

24 Robust RVM Model Original RVM model – e: Gaussian noise Explicitly model outliers: e_i = n_i + s_i – n_i: inlier noise (Gaussian) – s_i: outlier noise (sparse and heavy-tailed) Matrix-vector form: y = Kw + n + s Parameters to be estimated: w and s

25 Robust RVM Algorithms y = [K | I] w_s + n – w_s = [w^T s^T]^T: a sparse vector Two approaches – Bayesian – Optimization
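
A sketch of the augmented design matrix used in both approaches (sizes are illustrative; K here is random and only stands in for the kernel matrix):

```python
# Build [K | I] so that y ≈ [K | I] @ w_s with w_s = [w; s]; any sparse solver
# applied to this augmented system estimates the weights w and outliers s jointly.
import numpy as np

rng = np.random.default_rng(0)
N = 100
K = rng.standard_normal((N, N))        # stands in for the kernel (Gram) matrix
K_aug = np.hstack([K, np.eye(N)])      # [K | I], size N x 2N
```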

26 Robust Bayesian RVM (RB-RVM) Prior specification – w and s independent: p(w, s) = p(w)p(s) – Sparsity-promoting prior for s: p(s_i) ∝ 1/|s_i| Solve for the posterior p(w, s|y) Prediction: use the w inferred above Computation: a bigger RVM – w_s instead of w – [K | I] instead of K

27 Basis Pursuit RVM (BP-RVM) Optimization approach – The exact problem is combinatorial – Use its closest convex approximation From compressive sensing theory – Same solution if [K | I] satisfies RIP – In general, cannot be guaranteed

28 Experimental Setup

29 Prediction : Asymmetric Outliers Case

30 Image Denoising Salt-and-pepper noise – Outliers Regression formulation – Image as a surface over the 2D grid – y: intensity – x: 2D grid position Denoised image obtained by prediction
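
A sketch of how the denoising problem is cast as regression (the noisy image here is random, just to show the shapes):

```python
# Each pixel becomes a training pair: x = its 2-D grid coordinate, y = its
# intensity. A robust kernel regressor fit on (X, y) and evaluated back on the
# grid yields the denoised image.
import numpy as np

noisy = np.random.default_rng(0).random((64, 64))   # stands in for the noisy image
rows, cols = np.mgrid[0:noisy.shape[0], 0:noisy.shape[1]]
X = np.column_stack([rows.ravel(), cols.ravel()])   # x: 2-D grid positions
y = noisy.ravel()                                   # y: pixel intensities
# fit e.g. RB-RVM on (X, y), predict on X, and reshape to noisy.shape
```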

31 Salt and Pepper Noise

32 Some More Results (figure panels: RVM, RB-RVM, median filter)

33 Age Estimation from Facial Images RB-RVM detected 90 outliers Leave-one-person-out testing

34 Summary for Robust RVM Modeled outliers as sparse variables Jointly estimated the parameters and the outliers The Bayesian approach gives very good results

35 Limitations of Regression Regression: y = f(x, w) + n – Noise in only y – Not always reasonable: all variables may have noise – M = [x_1 x_2 … x_N] Principal component analysis (PCA): [x_1 x_2 … x_N] = AB^T – A: principal components – B: coefficients M = AB^T: matrix factorization (our next topic)

36 Matrix Factorization in the presence of Missing Data

37 Applications in Computer Vision Matrix factorization: M = AB^T Applications: building 3-D models from images – Geometric approach (multiple views): structure from motion (SfM) – Photometric approach (multiple lightings): photometric stereo

38 Matrix Factorization Applications in vision – Affine structure from motion (SfM): the measurement matrix M with entries (x_ij, y_ij) factors as M = CS^T, a rank-4 matrix – Photometric stereo: M = NS^T, rank 3 Solution: SVD – M = USV^T – Truncate S to rank r – A = US^0.5, B = VS^0.5
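
The fully observed (no missing data) SVD solution as a short sketch:

```python
# Rank-r factorization via the SVD: truncate the singular values and split
# them evenly between the two factors, A = U S^0.5 and B = V S^0.5.
import numpy as np

def svd_factorize(M, r):
    U, svals, Vt = np.linalg.svd(M, full_matrices=False)
    s_half = np.sqrt(svals[:r])
    A = U[:, :r] * s_half          # A = U S^0.5
    B = Vt[:r].T * s_half          # B = V S^0.5
    return A, B                    # M ≈ A @ B.T
```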

39 Missing Data Scenario Missed feature tracks in SfM Specularities and shadows in photometric stereo (figure: incomplete feature tracks)

40 Challenges in the Missing Data Scenario Can't use the SVD Solve: min over A, B of ||W ⊙ (M - AB^T)||_F^2 + λ(||A||_F^2 + ||B||_F^2) – W: binary weight matrix, λ: regularization parameter Challenges – Non-convex problem – Newton's-method-based algorithm (Buchanan et al. 2005) is very slow Design an algorithm that is – Fast (handles large-scale data) – Flexible enough to handle additional constraints, e.g. orthonormality constraints in orthographic SfM
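
For reference, a sketch of the simple "alternation" baseline for this weighted objective (this is a baseline the later experiments compare against, not the proposed LRSDP method; λ and the iteration count are illustrative):

```python
# Alternating least squares for min ||W ⊙ (M - A B^T)||_F^2 + λ(||A||_F^2 + ||B||_F^2),
# where W is the binary observation mask. Each row of A (then of B) has a
# closed-form ridge-regression update that uses only the observed entries.
import numpy as np

def als_missing_data(M, W, r, lam=0.1, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    m, n = M.shape
    A = rng.standard_normal((m, r))
    B = rng.standard_normal((n, r))
    for _ in range(iters):
        for i in range(m):                         # update row i of A
            obs = W[i] > 0
            Bo = B[obs]
            A[i] = np.linalg.solve(Bo.T @ Bo + lam * np.eye(r), Bo.T @ M[i, obs])
        for j in range(n):                         # update row j of B
            obs = W[:, j] > 0
            Ao = A[obs]
            B[j] = np.linalg.solve(Ao.T @ Ao + lam * np.eye(r), Ao.T @ M[obs, j])
    return A, B
```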

41 Proposed Solution Formulate matrix factorization as a low-rank semidefinite program (LRSDP) – LRSDP: a fast implementation of SDP based on a quasi-Newton algorithm (Burer, 2001) Advantages of the proposed formulation – Solves large-scale matrix factorization problems – Handles additional constraints

42 Low-Rank Semidefinite Programming (LRSDP) Stated as: minimize C • (RR^T) over R subject to A_l • (RR^T) = b_l – Variable: R – Constants: C (cost matrix), A_l and b_l (constraint data) Challenge – Formulating matrix factorization as an LRSDP – Designing C, A_l, b_l

43 Matrix Factorization as LRSDP: Noiseless Case We want to formulate: find A, B with (AB^T)_ij = M_ij on the observed entries while keeping ||A||_F^2 + ||B||_F^2 small As an LRSDP over the stacked variable R = [A; B], whose Gram matrix RR^T contains AB^T as an off-diagonal block LRSDP formulation: C is the identity matrix, each A_l an indicator matrix selecting one observed entry

44 Affine SfM Dinosaur sequence (72% missing data) MF-LRSDP gives the best reconstruction

45 Photometric Stereo Face sequence (42% missing data) MF-LRSDP and damped Newton give the best results

46 Additional Constraints: Orthographic Factorization Dinosaur sequence

47 Summary Formulated missing-data matrix factorization as an LRSDP – Handles large-scale problems – Handles additional constraints Overall summary – Two statistical data models: regression in the presence of outliers (role of sparsity) and matrix factorization in the presence of missing data (low-rank semidefinite programming)

48 Thank you! Questions?

49 Robust Bayesian RVM (RB-RVM) Prior specification – w and s independent: p(w, s) = p(w)p(s) – Hierarchical priors for w and s, with hyperparameters α_i and β_i uniformly distributed True nature of the prior: p(s_i) ∝ 1/|s_i|, sparsity promoting

50 RB-RVM: Inference and Prediction First estimate α, β, and σ The prior and the conditional are Gaussian – The posterior p(w, s|y) is Gaussian, specified by its mean and covariance Prediction – Use the w inferred above – The predicted y is also Gaussian

51 RB-RVM: Fast Algorithm Bigger RVM – w_s instead of w – [K | I] instead of K Fast implementation (Tipping et al., 2003)

52 Vary Outlier Fraction

53 Uniqueness and Global Minima When is the factorization unique? – Matrix completion theory (Candes, 2008): enough observed entries, O(r n^1.2 log n), and M dense Global minima – Reached under the above conditions (empirical)

54 Empirical Evaluations Synthetic data – Generate random matrices A, B of size n×r – Obtain M = AB^T – Reveal a fraction of the entries – Add noise N(0, σ^2)
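
A sketch of this synthetic protocol (the revealed fraction and noise level below are placeholders, not the exact values used in the experiments):

```python
# Random rank-r ground truth, a binary mask W revealing a fraction of entries,
# and additive Gaussian noise on the observed entries.
import numpy as np

rng = np.random.default_rng(0)
n, r, frac, sigma = 500, 5, 0.2, 1e-3
A = rng.standard_normal((n, r))
B = rng.standard_normal((n, r))
M = A @ B.T                                      # ground-truth low-rank matrix
W = (rng.random((n, n)) < frac).astype(float)    # 1 = revealed entry
M_obs = W * (M + sigma * rng.standard_normal((n, n)))
```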

55 Noiseless Case n = 500, r = 5 (damped Newton too slow to run) A reconstruction is counted as successful when the recovered matrix is sufficiently close to the ground truth MF-LRSDP gives the best reconstruction results, followed by OptSpace and alternation

56 Vary Size, Rank and Noise Variance MF-LRSDP gives better reconstruction results across different sizes and ranks The noise performance of all algorithms is similar

