Classification on Manifolds
Suman K. Sen, joint work with Dr. J. S. Marron and Dr. Mark Foskey

2 Outline
Introduction
  - Introduction to M-reps
  - Goal
Classification
  - Popular methods of classification
  - Why do M-reps need special treatment?
Proposed Approach
Results
Future Research

3 Two groups: Patients (56) and Controls (26). Can we say anything about the shape variation between the two groups? Styner et al. (2004): m-rep model of the hippocampus.

4 Hippocampus

5 Visualization: Change of shape along separating direction
- Blue: correctly classified; Magenta: misclassified
- +'s and 0's are the two classes
- X-axis: scores along the separating direction
- Right panel: projected model

6 DWD in Face Recognition (cont.)
Registered data:
- Shifts and scale manually chosen to align eyes and mouth
- Still large variation
- Can you see males vs. females?

7 DWD in Face Recognition (cont.)
DWD direction:
- Good separation
- Images "make sense"
- Garbage at the ends? (extrapolation effects?)

8 Visualization: Change of shape along the separating direction. Separation is better; the shape change is a flattening at the top.

9 M-reps Medial Atom

10 M-reps

11 Definitions
Geodesic: curve locally minimizing the distance between points.
Exponential map Exp_p(X): maps a point X ∈ T_pM onto the manifold M along geodesics.

12 ExpMaps and LogMaps
For X ∈ T_pM there exists a geodesic γ_X(t) with X as its initial velocity: γ_X(t) = Exp_p(tX).
||∂γ_X/∂t||(t) = ||X||, so the exponential map preserves distance.
LogMap: the inverse of the ExpMap; d(x, y) = ||Log_x(y)|| = ||Log_y(x)||.
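The unit sphere S^2 is the simplest concrete case of these maps (and the one the later sphere illustrations rely on). Below is a minimal numpy sketch of Exp_p, Log_p, and the geodesic distance for unit vectors; the function names and the restriction to S^2 are my own illustration, not the M-rep geometry used in the talk.

```python
import numpy as np

def exp_map(p, v, eps=1e-12):
    """Exp_p(v) on the unit sphere: follow the geodesic from p with initial velocity v."""
    nv = np.linalg.norm(v)
    if nv < eps:
        return p
    return np.cos(nv) * p + np.sin(nv) * (v / nv)

def log_map(p, q, eps=1e-12):
    """Log_p(q) on the unit sphere: tangent vector at p pointing to q, with ||Log_p(q)|| = d(p, q)."""
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)            # geodesic distance d(p, q)
    w = q - cos_t * p                   # component of q orthogonal to p
    nw = np.linalg.norm(w)
    if nw < eps:
        return np.zeros_like(p)
    return theta * (w / nw)

# d(x, y) = ||Log_x(y)||: quarter circle between the north pole and a point on the equator
p, q = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
assert np.isclose(np.linalg.norm(log_map(p, q)), np.pi / 2)
assert np.allclose(exp_map(p, log_map(p, q)), q)
```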

13 Literature Review on M-reps
The medial locus was proposed by Blum (1967). Its properties were studied in 2D by Blum and Nagel (1978) and in 3D by Nackman and Pizer (1985). Pizer et al. (1999) describe discrete M-reps. Yushkevich et al. (2003) describe continuous M-reps. Terriberry and Gerig (2006) treat continuous M-reps with branching.

14 Classification
X_i: attributes describing individuals (i = 1, ..., n); n: number of individuals.
Y_i: class label ∈ {1, 2, ..., K}; K: number of classes.
NOTE: We work with only two groups, and for mathematical convenience Y ∈ {1, -1}.
Goal of classification: given a set of (X_i, Y_i), find a rule f(x) that assigns a new individual to a group on the basis of its attributes X.

15 Classification: Popular Methods
- Mean Difference (MD): assigns a new observation to the class whose mean is closest to it.
- Fisher (1936): an improvement over MD; the optimal rule when the two classes come from Normal distributions with the same covariance matrix. Now called Fisher Linear Discrimination (FLD).
- Vapnik (1982, 1995): Support Vector Machine (SVM). Also see Burges (1998).
- Marron et al. (2004): Distance Weighted Discrimination (DWD); unlike SVM it does not suffer from "data piling" and improves generalizability in High Dimension Low Sample Size (HDLSS) situations.
- Kernel embedding: linear classification done after embedding the data in a higher-dimensional space. See Schölkopf and Smola (2002).

16 Classification
These methods give us:
a) a separating plane,
b) a normal vector (the separating direction),
c) projections of the data on the separating direction.

17 A different approach on manifolds?
- Difficult to describe separating surfaces
- No inner products
- Easier to calculate distances

18 Approaches generally taken
Approach: Flatten; e.g., in the case of a cylinder (R × S^1).
  Drawback: The geodesic distance is not used.
Approach: Do Euclidean statistics on the tangent plane at the overall mean (Fletcher et al., 2003, 2004).
  Drawback: The choice of the base point at which the tangent plane is constructed matters.
Approach: Treat the points as if the data were embedded in a higher-dimensional Euclidean space R^d.
  Drawback: The separating surface is not contained in the manifold; moreover, the projections of the data on the separating directions are not interpretable.

19 Importance of Geodesic Distance

20 Choice of Base Point
Black and blue points represent different groups. The figure shows that the choice of base point has a significant effect.
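To make the base-point effect concrete: the Euclidean distances between log-mapped points in the tangent plane change as the base point moves, even though the geodesic distance on the manifold does not. A small check on S^2, reusing log_map from the earlier sketch (my own toy example, not the slide's figure):

```python
import numpy as np
# reuses log_map from the S^2 sketch above

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
for base in (np.array([0.0, 0.0, 1.0]),              # north pole
             (x + y) / np.linalg.norm(x + y)):       # geodesic midpoint of x and y
    vx, vy = log_map(base, x), log_map(base, y)
    print(np.round(base, 2), "tangent-plane distance:", round(float(np.linalg.norm(vx - vy)), 3))
# prints ~2.221 at the pole and ~1.571 at the midpoint,
# while the geodesic distance d(x, y) = pi/2 ~ 1.571 regardless of base point
```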

21 Meaningful Projections

22 It Is Important To Work On the Manifold

23 Proposed Approach in Manifolds
Key concept: control points (representatives of a class). Use distance from the control points.

25 Proposed Approach in Manifolds

26 Proposed Approach in Manifolds
Key concept: control points (representatives of a class). Use distance from the control points.
Goal: find "good" control points. For example, on the sphere, the control points corresponding to the red boundary separate the data better.

27 Decision Function
f(x) = d^2(c_{-1}, x) - d^2(c_1, x)
If f(x) > 0, then x is assigned to class 1; else x is assigned to class -1.
If y f(x) > 0 the decision is correct (y is the class label, +1 or -1); if y f(x) < 0 it is wrong.
NOTE: H = {x : f(x) = 0} is the separating boundary.
[Figure: level sets f(x) = 0 and f(x) = 1, control points c_1 and c_{-1}, and the boundary H.]
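In code, this rule needs only geodesic distances to the two control points. A minimal sketch on S^2 (the distance function and control points are placeholders; on M-reps the relevant distance would be the product-manifold geodesic distance instead):

```python
import numpy as np

def sphere_dist(x, y):
    """Geodesic distance between unit vectors on S^2."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def f(x, c_plus, c_minus):
    """f(x) = d^2(c_{-1}, x) - d^2(c_1, x); positive means x is closer to c_1."""
    return sphere_dist(c_minus, x) ** 2 - sphere_dist(c_plus, x) ** 2

def classify(x, c_plus, c_minus):
    """Assign class +1 if f(x) > 0, else class -1."""
    return 1 if f(x, c_plus, c_minus) > 0 else -1
```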

28 Proposed Methods
1) Geodesic Mean Difference (GMD): analogous to the Mean Difference method, the two control points are taken to be the geodesic means of the two classes.
2) Iterative Tangent Plane SVM (ITanSVM): standard SVM done on a tangent plane, with the base point carefully chosen through iterative steps.
3) Manifold SVM (MSVM): a generalization of the SVM criterion to manifolds.
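GMD needs a geodesic (Frechet/Karcher) mean for each class. A standard iterative sketch, reusing exp_map and log_map from the S^2 example above; the iteration count, tolerance, and the sphere itself are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np
# reuses exp_map and log_map from the S^2 sketch above

def geodesic_mean(points, n_iter=100, tol=1e-10):
    """Karcher mean: repeatedly average the log-mapped points and shoot back with Exp."""
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(n_iter):
        delta = np.mean([log_map(mu, x) for x in points], axis=0)  # mean tangent vector at mu
        if np.linalg.norm(delta) < tol:
            break
        mu = exp_map(mu, delta)
    return mu

# GMD control points: the geodesic means of the two classes, e.g.
# c_plus, c_minus = geodesic_mean(X[y == 1]), geodesic_mean(X[y == -1])
```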

29 ITanSVM: The Algorithm
1) Calculate the mean of each of the 2 classes, then compute the mean (c) of these two means. Construct the tangent plane at c.

30 ITanSVM: The Algorithm 2) Compute the SVM decision line on the tangent plane

31 ITanSVM: The Algorithm
3) Find the pair of points such that a) the SVM line is the perpendicular bisector of the segment joining the points, and b) the distances from the new points to the old points are minimal.

32 ITanSVM: The Algorithm 4) Map these 2 new points back to the manifold.
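Putting steps 1-4 together, here is a hedged sketch of the ITanSVM loop on S^2, reusing geodesic_mean, exp_map, and log_map from the earlier sketches and a linear SVM from scikit-learn. The closed-form construction in step 3 is my own least-squares reading of conditions (a) and (b); the authors' exact construction may differ.

```python
import numpy as np
from sklearn.svm import SVC
# reuses exp_map, log_map, geodesic_mean from the earlier S^2 sketches

def itansvm(X, y, n_iter=10):
    c1 = geodesic_mean(X[y == 1])                        # step 1: mean of each class ...
    c2 = geodesic_mean(X[y == -1])
    for _ in range(n_iter):
        base = geodesic_mean(np.array([c1, c2]))         # ... and their mean c = base point
        V = np.array([log_map(base, x) for x in X])      # data in the tangent plane at c
        svm = SVC(kernel="linear", C=1e3).fit(V, y)      # step 2: SVM on the tangent plane
        w, b = svm.coef_[0], float(svm.intercept_[0])
        w_hat = w / np.linalg.norm(w)
        c0 = -b / np.linalg.norm(w)                      # hyperplane offset along w_hat
        t1, t2 = log_map(base, c1), log_map(base, c2)    # current control points, tangent coords
        a_w, b_w = t1 @ w_hat, t2 @ w_hat
        mid = 0.5 * ((t1 - a_w * w_hat) + (t2 - b_w * w_hat)) + c0 * w_hat
        s = 0.5 * (a_w - b_w)                            # step 3: pair bisected by the SVM plane,
        c1 = exp_map(base, mid + s * w_hat)              #         as close as possible to the old pair
        c2 = exp_map(base, mid - s * w_hat)              # step 4: map the new pair back to S^2
    return c1, c2
```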

33 Manifold SVM (MSVM): the setup
Decision function: the distance of a point x_i from the separating boundary H determined by c_1 and c_{-1}.
Goal: find c_1 and c_{-1} that maximize the minimum distance of the training points to H (one of the ways of looking at SVM that generalizes to manifolds).
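As a toy restatement of that criterion on S^2: the equidistant boundary H of a pair (c_1, c_{-1}) on the unit sphere is the great circle with pole n = (c_1 - c_{-1}) / ||c_1 - c_{-1}||, and the signed geodesic distance of a point x to it is arcsin(n . x). The random-search optimizer below is my own stand-in for the authors' optimization, included only to show the max-min objective:

```python
import numpy as np

def margin(x, label, c1, c2):
    """Signed geodesic distance of x to the boundary H = {d(c1, .) = d(c2, .)} on S^2."""
    n = c1 - c2
    n = n / np.linalg.norm(n)                      # pole of the equidistant great circle
    return label * np.arcsin(np.clip(n @ x, -1.0, 1.0))

def msvm_random_search(X, y, n_trials=5000, seed=0):
    """Toy MSVM: maximize the minimum margin over randomly drawn control-point pairs."""
    rng = np.random.default_rng(seed)
    best_val, best_pair = -np.inf, None
    for _ in range(n_trials):
        c1, c2 = rng.normal(size=(2, 3))
        c1, c2 = c1 / np.linalg.norm(c1), c2 / np.linalg.norm(c2)
        val = min(margin(x, yi, c1, c2) for x, yi in zip(X, y))
        if val > best_val:
            best_val, best_pair = val, (c1, c2)
    return best_pair, best_val
```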

34 SVM
Separating rule (w, b) between 2 groups: w·x + b = 0
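For reference, the Euclidean version this generalizes: fit a linear SVM and read off (w, b). A small scikit-learn example; the two-blob data is made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, size=(20, 2)),    # class -1 blob
               rng.normal(+2.0, 1.0, size=(20, 2))])   # class +1 blob
y = np.array([-1] * 20 + [1] * 20)

svm = SVC(kernel="linear").fit(X, y)
w, b = svm.coef_[0], float(svm.intercept_[0])
print("separating rule w.x + b = 0 with w =", w, "and b =", b)
print("predicted class of the origin:", svm.predict([[0.0, 0.0]])[0])
```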

35 Results: Hippocampi Data

36 Results: Hippocampi Data
Separation shown by the different methods: GMD, MSVM, TSVM, ITanSVM.

37 Results: Generated Ellipsoids
25 randomly distorted (bending, twisting, tapering) ellipsoids. Two groups:
- 11 with a negative twist parameter
- 14 with a positive twist parameter

38 Results: Generated Ellipsoids

39 Results: Generated Ellipsoids

40 Future Research
Extend DWD to manifold data. Marron et al. (2004): Distance Weighted Discrimination (DWD); unlike SVM it does not suffer from "data piling" and improves generalizability in HDLSS situations.

41 Future Research
Application to Diffusion Tensor Imaging data (at each voxel the observed datum is a 3×3 positive definite matrix). Develop MSVM for the multi-category case.

42 THANKS

43 Results: Hippocampi Data
d(c_1, c_{-1}) vs. λ