Estimating Intrinsic Dimension
Justin Eberhardt
UMD, Mathematics and Statistics
Advisor: Dr. Kang James

Outline
- Introduction
- Nearest Neighbor Estimators
  - Regression Estimator
  - Maximum Likelihood Estimator
  - Revised Maximum Likelihood Estimator
- Comparison
- Summary

Intrinsic Dimension
Definition: the least number of parameters required to generate a dataset; equivalently, the minimum number of dimensions that describes the dataset without significant loss of its features.

Ex 1: Intrinsic Dimension
[Figure: a Swiss-roll surface in 3-D (axes x, y, z) is flattened (unrolled) into the plane (axes x, y).]
Int Dim = 2

Ex 2: Intrinsic Dimension
A 28 x 28 pixel image: one image is 784-dimensional.

Ex 2: Intrinsic Dimension
[Figure from the Isomap Project (J. Tenenbaum & J. Langford, Stanford): 2-D embedding of the 28 x 28 images, with regions labeled "Top & Bottom Loop" and "No Loop".]
Int Dim = 2

Applications
- Biometrics: facial recognition, fingerprints, iris
- Genetics

Why do we need to reduce dimensionality?
- Low-dimensional datasets are more efficient to store and process.
- Not even supercomputers can handle very high-dimensional matrices.
- Data in 1, 2, and 3 dimensions can be visualized.

Ex: Facial Recognition in MN
- 5 million people
- 2 images per person (front and profile)
- 1028 x 1028 pixels per image (about 1 megapixel)
Total memory required:
- n = 5,000,000
- p = (2)(1028)(1028) = 2.11 million dimensions
- Matrix size: (5 x 10^6)(2.11 x 10^6) = about 10^13 cells (roughly 10 trillion)
- Memory: 2 bytes per cell x 10^13 cells = 2 x 10^13 bytes = about 20 terabytes
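A quick back-of-the-envelope check of this arithmetic (a sketch; the 2-bytes-per-pixel storage assumption is inferred from the 20 TB figure on the slide):

```python
n = 5_000_000                    # people
p = 2 * 1028 * 1028              # two 1028 x 1028 images per person, flattened
cells = n * p                    # entries in the n x p data matrix
bytes_total = 2 * cells          # assumes 2 bytes (16-bit grayscale) per pixel
print(f"p      = {p:,} dimensions")              # 2,113,568 (~2.11 million)
print(f"cells  = {cells:.2e}")                   # ~1.06e13
print(f"memory = {bytes_total / 1e12:.0f} TB")   # ~21 TB, i.e. roughly 20 terabytes
```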

Intrinsic Dimension Estimators
Objective: to find a simple formula that uses nearest-neighbor (NN) information to quickly estimate intrinsic dimension.

Intrinsic Dimension Estimators
Project description: through simulation, we will compare the effectiveness of three proposed NN intrinsic dimension estimators.

Intrinsic Dimension Estimators
Note: traditional methods for estimating intrinsic dimension, such as PCA, fail on non-linear manifolds.
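A quick illustration of this point (a sketch of my own, not from the slides): sample a Swiss roll in 3-D and look at the PCA eigenvalue spectrum. All three eigenvalues are of comparable size, so a variance cutoff would report 3 dimensions even though the manifold is intrinsically 2-D.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
theta = 1.5 * np.pi * (1 + 2 * rng.random(n))    # roll angle
height = 20 * rng.random(n)                      # position along the roll's axis
X = np.column_stack([theta * np.cos(theta), height, theta * np.sin(theta)])

# PCA via eigenvalues of the sample covariance matrix (largest first)
eigvals = np.linalg.eigvalsh(np.cov(X.T))[::-1]
print(eigvals / eigvals.sum())   # all three components carry substantial variance
```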

Intrinsic Dimension Estimators
Nearest-neighbor methods:
- Regression Estimator (K. Pettis, T. Bailey, A. Jain & R. Dubes, 1979)
- Maximum Likelihood Estimator (E. Levina & P. Bickel, 2005)
- Revised Maximum Likelihood Estimator (D. MacKay & Z. Ghahramani, 2005)

Distance Matrix
D =
[ 0        d_{1,2}  d_{1,3}  ...  d_{1,n} ]
[ d_{2,1}  0        d_{2,3}  ...  d_{2,n} ]
[ d_{3,1}  d_{3,2}  0        ...  d_{3,n} ]
[ ...                                     ]
[ d_{n,1}  d_{n,2}  d_{n,3}  ...  0       ]
D_{i,j}: Euclidean distance from x_i to x_j (for example, d_{2,3} is the distance from x_2 to x_3).

Nearest Neighbor Matrix
T =
[ 0  t_{1,2}  t_{1,3}  ...  t_{1,n} ]
[ 0  t_{2,2}  t_{2,3}  ...  t_{2,n} ]
[ 0  t_{3,2}  t_{3,3}  ...  t_{3,n} ]
[ ...                               ]
[ 0  t_{n,2}  t_{n,3}  ...  t_{n,n} ]
T_{i,k}: Euclidean distance between x_i and the k-th NN of x_i (for example, t_{2,k} is the distance between x_2 and the k-th NN of x_2).

Notation
- m: intrinsic dimension
- p: dimension of the raw dataset
- n: number of observations
- f(x): density (pdf) at observation x
- T_{x,k} (or T_k): distance from observation x to its k-th NN
- N(t,x): number of observations within distance t of observation x

Notation (illustration)
[Figure: with p = 2, m = 1, and n = 12 points, a circle of radius t around x contains three points, at distances t_1, t_2, t_3, so N(t,x) = 3.]
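These quantities are straightforward to compute directly. A minimal sketch (my own, using NumPy); note that in this sketch T drops the zero self-distance, so T[i, k-1] is the distance from x_i to its k-th nearest neighbor:

```python
import numpy as np

def nn_quantities(X):
    """X: (n, p) data matrix. Returns the distance matrix D and sorted NN distances T."""
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))     # D[i, j] = Euclidean distance from x_i to x_j
    T = np.sort(D, axis=1)[:, 1:]              # T[i, k-1] = distance from x_i to its k-th NN
    return D, T

def count_within(D, i, t):
    """N(t, x_i): number of other observations within distance t of x_i."""
    return int(np.sum((D[i] <= t) & (D[i] > 0)))

# toy usage
X = np.random.default_rng(0).normal(size=(12, 2))
D, T = nn_quantities(X)
print(T[1, 2], count_within(D, 1, 0.5))        # distance from x_2 to its 3rd NN; N(0.5, x_2)
```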

NN Regression Estimator
Derivation outline:
1. Density of the distance to the k-th NN (single observation, approximated as Poisson)
2a. Expected distance to the k-th NN (single observation)
2b. Sample-averaged distance to the k-th NN
3. Expected value of the sample-averaged k-th NN distance

Regression Estimator
Trinomial distribution -> binomial distribution -> pdf of the distance to the k-th NN (equations shown as images on the slide).
Assumptions: f(x) is constant, n is large, f(x)·V_t is small.

Regression Estimator
Approximate as Poisson; this gives the expected distance to the k-th NN (equations shown as images on the slide).

Regression Estimator
The expected averaged k-th-NN distance is expressed in terms of the constants C_n and G_{k,m} (equations shown as images on the slide); m is then estimated using simple linear regression.
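A minimal sketch of this regression idea (my own code, not the presenter's): the expected averaged distance to the k-th NN grows roughly like k^(1/m), so the slope of log(mean k-th-NN distance) against log k estimates 1/m. The G_{k,m} correction term used by Pettis et al. is omitted here.

```python
import numpy as np

def regression_dim_estimate(T, k_max):
    """T[i, k-1] = distance from x_i to its k-th NN (as in the earlier sketch)."""
    ks = np.arange(1, k_max + 1)
    mean_tk = T[:, :k_max].mean(axis=0)                    # sample-averaged k-th NN distances
    slope, _ = np.polyfit(np.log(ks), np.log(mean_tk), 1)  # fit log(mean T_k) against log k
    return 1.0 / slope                                     # slope is approximately 1/m

# e.g. with T computed from a Swiss roll sample, regression_dim_estimate(T, 20) comes out near 2
```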

Ex: Swiss Roll Dataset
[Figure: regression fit for the Swiss roll; the value 0.49 appears on the plot (likely the fitted slope, i.e. 1/m, giving m near 2).]

Datasets
- Faces: raw dimension = 4096, Int Dim ~ 3 to 5
- Gaussian Sphere: raw dim = 3, Int Dim = 3
- Swiss Roll: raw dim = 3, Int Dim = 2
- Double Swiss Roll: raw dim = 3, Int Dim = 2

Results: Regression Estimator (K = N/100)
[Results shown as a figure on the slide for the datasets above, including Faces; the legible values are approximately 3.0 and 2.0.]

NN Maximum Likelihood Estimator
Derivation outline:
1. Counting process: binomial (approximated as Poisson)
2. Joint counting probability and joint occurrence density
3. Log-likelihood function
4. (The MLE itself; this step's content appears only as an image in the transcript.)

Maximum Likelihood Estimator
N(t,x) = number of observations within distance t of x.
The number of observations between distances r and s is binomial (equation shown as an image on the slide).
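The equation referenced here is an image in the transcript. A hedged reconstruction in the spirit of Levina & Bickel (2005), assuming the density f is approximately constant near x:

$$
N(s,x) - N(r,x) \;\sim\; \mathrm{Bin}\!\bigl(n,\; f(x)\,[V(s) - V(r)]\bigr),
\qquad V(t) = \frac{\pi^{m/2}}{\Gamma(m/2 + 1)}\, t^{m},
$$

which for small volumes is well approximated by an inhomogeneous Poisson process with rate lambda(t) = f(x)·V_m·m·t^(m-1), where V_m is the volume of the unit ball in R^m.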

Maximum Likelihood Estimator (equations shown as images on the slide)

Joint pdf of Distances to K NN (equations shown as images on the slide)

Log-Likelihood Function (equations shown as images on the slide)
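The equations on this and the previous slide are images in the transcript. As a hedged reconstruction of the standard result from Levina & Bickel (2005): maximizing the Poisson-process log-likelihood of the distances to the first k nearest neighbors of a point x gives the closed-form per-point estimator

$$
\hat{m}_k(x) \;=\; \left[\frac{1}{k-1}\sum_{j=1}^{k-1}\log\frac{T_k(x)}{T_j(x)}\right]^{-1}.
$$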

Combining the per-point MLEs (a sketch of both follows below):
- Averaging over the n observations (E. Levina & P. Bickel)
- Averaging the inverses over the n observations, using the MLE (D. MacKay & Z. Ghahramani)
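A minimal sketch of the two combining rules above (my own code, not the presenter's), built on the per-point estimator just given and the T matrix from the earlier sketch:

```python
import numpy as np

def mle_per_point(T, k):
    """Levina-Bickel per-point estimates m_hat_k(x_i); T[i, k-1] is the distance to the k-th NN."""
    logs = np.log(T[:, k - 1][:, None] / T[:, :k - 1])   # log(T_k / T_j) for j = 1, ..., k-1
    return (k - 1) / logs.sum(axis=1)

def mle_levina_bickel(T, k):
    """Global estimate: average the per-point estimates over the n observations."""
    return mle_per_point(T, k).mean()

def mle_mackay_ghahramani(T, k):
    """Global estimate: average the inverses of the per-point estimates, then invert."""
    return 1.0 / (1.0 / mle_per_point(T, k)).mean()

# e.g. mle_mackay_ghahramani(T, k=10) on a Swiss roll sample comes out near 2
```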

Results: MLE Estimator (Revised, MacKay & Ghahramani) (K = N/100)
[Results shown as a figure on the slide for the four datasets, including Faces; the legible values are approximately 3.0, 2.0, and 2.1.]

Comparison
[Six comparison slides; the plots comparing the estimators are shown as figures.]

Isomap
[Figure shown on the slide.]

Summary
- The regression and revised MLE estimators share similar characteristics when the intrinsic dimension is small.
- As the intrinsic dimension increases, the estimators become more dependent on K.
- The distribution type does not appear to be highly influential when the intrinsic dimension is small.

Thank You!
Dr. Kang James & Dr. Barry James
Dr. Steve Trogdon

Example: Swiss Roll Data, Int Dim = 2