Maximal Data Piling: Visual similarity of the FLD and MDP direction vectors? Can show (Ahn & Marron 2009), for d < n, that the directions are the same! How can this be? Note that the lengths are different. Study from the transformation viewpoint (a numerical sketch follows below).
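A minimal numerical check of the d < n claim, as a sketch with made-up data: it assumes the MDP direction of Ahn & Marron (2009), the pseudoinverse of the global scatter applied to the mean difference, while FLD uses the within-class scatter; the two come out parallel.

```python
# Sketch: verify that the FLD and MDP directions agree for d < n.
# Assumption (not shown on the slide): v_MDP = pinv(total scatter) @ (mean1 - mean2),
# per Ahn & Marron (2009); v_FLD = inv(within-class scatter) @ (mean1 - mean2).
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 100                                  # d < n: classical regime
X1 = rng.normal(0.0, 1.0, size=(n // 2, d))    # class 1 (rows = data points)
X2 = rng.normal(2.0, 1.0, size=(n // 2, d))    # class 2, shifted mean

diff = X1.mean(0) - X2.mean(0)                 # mean-difference vector
A1, A2 = X1 - X1.mean(0), X2 - X2.mean(0)
Sw = A1.T @ A1 + A2.T @ A2                     # pooled within-class scatter
Xall = np.vstack([X1, X2])
Xc = Xall - Xall.mean(0)
St = Xc.T @ Xc                                 # total (global) scatter

v_fld = np.linalg.solve(Sw, diff)              # FLD direction
v_mdp = np.linalg.pinv(St) @ diff              # MDP direction

cos = v_fld @ v_mdp / (np.linalg.norm(v_fld) * np.linalg.norm(v_mdp))
print(cos)                                     # ~1.0: same direction, different length
```

The agreement is explained by the scatter decomposition: the total scatter equals the within-class scatter plus a rank-one multiple of the mean-difference outer product, so (by Sherman-Morrison) applying either inverse to the mean difference gives proportional vectors.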

Maximal Data Piling: Recall the transformation view of FLD:

Maximal Data Piling: Include the corresponding MDP transformation: both give the same result!

Maximal Data Piling: Details. [Figure panels: FLD separating-plane normal vector; Within-Class PC1, PC2; Global PC1, PC2.]

Maximal Data Piling: Acknowledgement: this viewpoint, i.e. the insight into why FLD = MDP for low-dimensional data, was suggested by Daniel Peña.

Maximal Data Piling: Fun example: rotate from the PCA to the MDP directions.

Maximal Data Piling: MDP for other class labellings:
Always exists.
Separation is bigger for natural clusters.
Could be used for clustering:
–Consider all directions
–Find the one that makes the largest gap
–Very hard optimization problem
–Over 2^(n-2) possible directions
(A brute-force sketch for tiny n follows below.)
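A brute-force sketch of the clustering idea, under the same assumed MDP formula as above (global-scatter pseudoinverse times mean difference); the data, gap measure, and helper names are illustrative, and the exhaustive search is only feasible for very small n, which is exactly the slide's point.

```python
# Sketch: clustering by MDP gap. Enumerate every 2-class labelling of a tiny
# HDLSS data set (fixing point 0's label avoids counting each split twice),
# compute the MDP direction for each, and keep the labelling with the biggest gap.
import itertools
import numpy as np

def mdp_direction(X, labels):
    """Assumed MDP direction: pinv(total scatter) @ (mean difference)."""
    diff = X[labels].mean(0) - X[~labels].mean(0)
    Xc = X - X.mean(0)
    return np.linalg.pinv(Xc.T @ Xc) @ diff

def gap(X, labels, v):
    """Gap between the two piles after projecting onto unit-length v."""
    p = X @ (v / np.linalg.norm(v))
    return p[labels].min() - p[~labels].max()

rng = np.random.default_rng(0)
n, d = 8, 50                                    # HDLSS: d > n
X = np.vstack([rng.normal(0, 1, (4, d)),        # cluster A
               rng.normal(3, 1, (4, d))])       # cluster B

best_gap, best_labels = -np.inf, None
for bits in itertools.product([False, True], repeat=n - 1):
    labels = np.array((True,) + bits)
    if labels.all():                            # skip the trivial one-cluster split
        continue
    g = gap(X, labels, mdp_direction(X, labels))
    if g > best_gap:
        best_gap, best_labels = g, labels

print(best_labels, best_gap)                    # the natural split gives the largest gap
```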

Maximal Data Piling: A point of terminology (Ahn & Marron 2009): MDP is "maximal" in 2 senses:
1. the number of data points piled
2. the size of the gap (within the subspace generated by the data)

Maximal Data Piling: Recurring, over-arching issue: HDLSS space is a weird place.

Maximal Data Piling: Usefulness for classification?
First glance: terrible, not generalizable.
HDLSS view: maybe OK?
Simulation result: good performance for autocorrelated errors.
Reason: not known.

Kernel Embedding: Aizerman, Braverman and Rozonoer (1964). Motivating idea: extend the scope of linear discrimination by adding nonlinear components to the data (embedding in a higher-dimensional space). Better use of the name: nonlinear discrimination?

Kernel Embedding: Toy examples. In 1-d, linear separation splits the domain into only 2 parts.

Kernel Embedding: But in the "quadratic embedded domain", linear separation can give 3 parts.

Kernel Embedding: But in the quadratic embedded domain, linear separation can give 3 parts:
–the original data space lies in a 1-d manifold, a very sparse region
–the curvature of the manifold gives better linear separation
–can have any 2 break points (2 points determine a line)
(See the sketch below.)
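A tiny sketch of the 1-d quadratic embedding, with made-up numbers: a single linear boundary in the embedded (x, x^2) plane cuts the original line into three parts.

```python
# Sketch: embed 1-d points as (x, x^2); one linear boundary in the embedded
# plane (here x^2 = 1) splits the original line into 3 parts, i.e. gives
# 2 break points, at x = -1 and x = +1.
import numpy as np

x = np.linspace(-3, 3, 13)                 # 1-d toy points
phi = np.column_stack([x, x ** 2])         # quadratic embedding

w, c = np.array([0.0, 1.0]), 1.0           # linear rule: w . phi(x) vs. c
inside = phi @ w < c                       # True on (-1, 1), False on both tails
for xi, lab in zip(x, inside):
    print(f"{xi:+.1f} -> {'class 1' if lab else 'class 2'}")
```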

Kernel Embedding: Stronger effects for higher-order polynomial embedding. E.g. for cubic, linear separation can give 4 parts (or fewer).

Kernel Embedding: Stronger effects for higher-order polynomial embedding:
–the original space lies in a 1-d manifold, even sparser in higher dimensions
–the curvature gives improved linear separation
–can have any 3 break points (3 points determine a plane)
Note: relatively few "interesting separating planes".

Kernel Embedding: General view: for the original data matrix (columns = data points), add rows of nonlinear functions of the coordinates, i.e. embed in a higher-dimensional space (see the sketch below).
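A sketch of "adding rows" for a degree-2 embedding of a d x n data matrix: append the squares and pairwise products of the coordinates. The function name and toy matrix are illustrative, not from the slides.

```python
# Sketch: quadratic embedding of a d x n data matrix by appending rows.
import numpy as np
from itertools import combinations

def quadratic_embed(X):
    """X is d x n (columns = data points); append rows x_j^2 and x_j * x_k."""
    d, _ = X.shape
    rows = [X, X ** 2]                            # original coordinates + squares
    for j, k in combinations(range(d), 2):        # pairwise cross terms
        rows.append((X[j] * X[k])[None, :])
    return np.vstack(rows)

X = np.random.default_rng(1).normal(size=(2, 5))  # 2 x 5 toy matrix
print(quadratic_embed(X).shape)                   # (5, 5): 2 + 2 + 1 rows
```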

Kernel Embedding: Embedded Fisher Linear Discrimination: writing $\Phi$ for the embedding map, with embedded class means $\bar{\Phi}^{(1)}, \bar{\Phi}^{(2)}$ and pooled within-class covariance $\hat{\Sigma}_w$, choose Class 1 for any $x_0$ when, in the embedded space,
$$\left( \Phi(x_0) - \frac{\bar{\Phi}^{(1)} + \bar{\Phi}^{(2)}}{2} \right)^{T} \hat{\Sigma}_w^{-1} \left( \bar{\Phi}^{(1)} - \bar{\Phi}^{(2)} \right) > 0 .$$
The image of the class boundaries in the original space is nonlinear, which allows more complicated class regions. Can also do Gaussian Likelihood Ratio (or others). Compute the image by classifying points from the original space.

Kernel Embedding: Visualization for toy examples:
Have the linear discriminant in the embedded space.
Study its effect in the original data space via the implied nonlinear regions.
Approach: use a test set in the original space (a dense, equally spaced grid), apply the embedded discrimination rule, and color using the result (a sketch follows below).
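A sketch of that recipe with made-up parallel-cloud data: fit FLD in a quadratic embedded space, classify a dense grid from the original space, and color by the result to reveal the implied nonlinear boundary.

```python
# Sketch: visualize the implied nonlinear regions of embedded FLD.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
X1 = rng.normal([-1.0, 0.0], 0.4, size=(50, 2))    # class 1 cloud
X2 = rng.normal([+1.0, 0.0], 0.4, size=(50, 2))    # class 2 cloud

def embed(P):
    """Degree-2 polynomial embedding of 2-d points (rows)."""
    x, y = P[:, 0], P[:, 1]
    return np.column_stack([x, y, x ** 2, y ** 2, x * y])

E1, E2 = embed(X1), embed(X2)
diff = E1.mean(0) - E2.mean(0)
Sw = np.cov(E1, rowvar=False) + np.cov(E2, rowvar=False)   # within-class covariance
w = np.linalg.solve(Sw, diff)                               # embedded FLD direction
c = w @ (E1.mean(0) + E2.mean(0)) / 2                       # midpoint threshold

# Dense, equally spaced grid in the ORIGINAL space, colored by the rule
xs = np.linspace(-3, 3, 200)
XX, YY = np.meshgrid(xs, xs)
G = np.column_stack([XX.ravel(), YY.ravel()])
Z = (embed(G) @ w > c).reshape(XX.shape)

plt.contourf(XX, YY, Z, alpha=0.3)                          # implied class regions
plt.scatter(*X1.T, label="class 1")
plt.scatter(*X2.T, label="class 2")
plt.legend(); plt.show()
```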

Kernel Embedding: Polynomial Embedding, Toy Example 1: Parallel Clouds.

[Figure sequence: PC1 direction, shown for the original data and then for polynomial embeddings of increasing degree.]

Note: the embedding is always bad for PC1; we don't want the direction of greatest variation.

Kernel Embedding: Polynomial Embedding, Toy Example 1: Parallel Clouds.

[Figure sequence: FLD, shown for the original data and then for polynomial embeddings of increasing degree.]

All stable and very good.

Kernel Embedding: Polynomial Embedding, Toy Example 1: Parallel Clouds.

[Figure sequence: GLR, shown for the original data and then for polynomial embeddings of increasing degree.]

Unstable, subject to overfitting.