1 Pattern Recognition: Statistical and Neural Lonnie C. Ludeman Lecture 14 Oct 14, 2005 Nanjing University of Science & Technology

2 Lecture 14 Topics
1. Review structures of Optimal Classifier
2. Define linear functions, hyperplanes, boundaries, unit normals, various distances
3. Use of Linear Discriminant functions for defining classifiers: Examples

3 Motivation!

4 Optimum Decision Rules: 2-class Gaussian (Review 1)

Case 1: $K_1 \neq K_2$ (Quadratic Processing)

if $-(x - M_1)^T K_1^{-1}(x - M_1) + (x - M_2)^T K_2^{-1}(x - M_2) > T_1$ decide $C_1$, otherwise decide $C_2$

Case 2: $K_1 = K_2 = K$ (Linear Processing)

if $(M_1 - M_2)^T K^{-1} x > T_2$ decide $C_1$, otherwise decide $C_2$

5 Optimum Decision Rules: 2-class Gaussian (cont.) (Review 2)

Case 3: $K_1 = K_2 = K = \sigma^2 I$ (Linear Processing)

if $(M_1 - M_2)^T x > T_3$ decide $C_1$, otherwise decide $C_2$
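As a minimal numerical sketch of how these two-class Gaussian rules could be evaluated, the snippet below implements Cases 1-3 directly. The means, covariances, and thresholds are placeholder values chosen for illustration, not parameters from the lecture.

```python
import numpy as np

# Illustrative parameters (not from the lecture): two Gaussian classes in 2-D
M1, M2 = np.array([1.0, 1.0]), np.array([-1.0, 0.0])
K1 = np.array([[2.0, 0.3], [0.3, 1.0]])
K2 = np.array([[1.0, 0.0], [0.0, 1.5]])
T1 = T2 = T3 = 0.0   # placeholder thresholds; in general set by the priors and costs

def quadratic_rule(x):
    """Case 1 (K1 != K2): compare the difference of quadratic forms against T1."""
    q1 = (x - M1) @ np.linalg.inv(K1) @ (x - M1)
    q2 = (x - M2) @ np.linalg.inv(K2) @ (x - M2)
    return "C1" if (-q1 + q2) > T1 else "C2"

def linear_rule(x, K):
    """Case 2 (K1 = K2 = K): linear processing (M1 - M2)^T K^{-1} x versus T2."""
    return "C1" if (M1 - M2) @ np.linalg.inv(K) @ x > T2 else "C2"

def linear_rule_spherical(x):
    """Case 3 (K = sigma^2 I): the common covariance drops out of the comparison."""
    return "C1" if (M1 - M2) @ x > T3 else "C2"

x = np.array([0.5, 0.2])
print(quadratic_rule(x), linear_rule(x, K1), linear_rule_spherical(x))
```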

6 M-Class General Gaussian: MPE and MAP (Review 3)

Case 1: covariance matrices $K_j$ not all equal

$Q_j(x) = (x - M_j)^T K_j^{-1}(x - M_j) - 2\ln P(C_j) + \ln|K_j|$

Select class $C_j$ if $Q_j(x)$ is MINIMUM.

Case 2: $K_1 = K_2 = \dots = K_M = K$

$L_j(x) = M_j^T K^{-1} x - \tfrac{1}{2} M_j^T K^{-1} M_j + \ln P(C_j)$

Select class $C_j$ if $L_j(x)$ is MAXIMUM.
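A sketch of both M-class rules in code, using an invented three-class problem purely to show the mechanics of computing $Q_j(x)$ and $L_j(x)$ and selecting the minimum or maximum.

```python
import numpy as np

# Hypothetical 3-class problem in 2-D (parameters invented for illustration)
means  = [np.array([0.0, 0.0]), np.array([3.0, 1.0]), np.array([-2.0, 2.0])]
covs   = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]]), 0.5 * np.eye(2)]
priors = [0.5, 0.3, 0.2]

def classify_general(x):
    """Case 1: select the class index j with MINIMUM Q_j(x)."""
    Q = [(x - M) @ np.linalg.inv(K) @ (x - M) - 2 * np.log(P) + np.log(np.linalg.det(K))
         for M, K, P in zip(means, covs, priors)]
    return int(np.argmin(Q))

def classify_equal_cov(x, K):
    """Case 2 (common K): select the class index j with MAXIMUM L_j(x)."""
    Kinv = np.linalg.inv(K)
    L = [M @ Kinv @ x - 0.5 * M @ Kinv @ M + np.log(P)
         for M, P in zip(means, priors)]
    return int(np.argmax(L))

x = np.array([1.0, 0.5])
print(classify_general(x), classify_equal_cov(x, np.eye(2)))
```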

7 M-Class General Gaussian: Bayes (Review 4)

The Bayes decision rule is determined from a set of $y_i(x)$, where each class is modeled as

$C_k: X \sim N(M_k, K_k)$ with prior $P(C_k)$, and

$p(x|C_k) = \dfrac{1}{(2\pi)^{N/2} |K_k|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - M_k)^T K_k^{-1}(x - M_k)\right)$

8 $y_i(x) = \sum_{j=1}^{M} C_{ij}\, \dfrac{P(C_j)}{(2\pi)^{N/2} |K_j|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - M_j)^T K_j^{-1}(x - M_j)\right)$

Taking the ln of the $y_i(x)$ for this case does not simplify to a linear or quadratic processor. The structure of the optimum classifier uses a sum of exp(quadratic forms) and thus is a special form of nonlinear processing using quadratic forms. (Review 5)
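A small sketch of this general Bayes rule with a cost matrix: compute $y_i(x)$ for each candidate decision and choose the minimum. The cost matrix and class parameters below are illustrative assumptions, not the lecture's numbers.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 3-class setup (not the lecture's numbers)
means  = [np.array([0.0, 0.0]), np.array([3.0, 1.0]), np.array([-2.0, 2.0])]
covs   = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]]), 0.5 * np.eye(2)]
priors = np.array([0.5, 0.3, 0.2])
C = np.array([[0.0, 1.0, 2.0],   # C[i, j] = cost of deciding C_i when C_j is true
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])

def bayes_decision(x):
    """Choose the decision i that minimizes y_i(x) = sum_j C[i, j] p(x|C_j) P(C_j)."""
    likelihoods = np.array([multivariate_normal.pdf(x, mean=M, cov=K)
                            for M, K in zip(means, covs)])
    y = C @ (likelihoods * priors)   # vector of y_i(x), i = 1..M
    return int(np.argmin(y))

print(bayes_decision(np.array([1.0, 0.5])))
```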

9 Gaussian assumptions lead to linear and quadratic processing. Reasons for studying linear, quadratic, and other special forms of nonlinear processing:

If Gaussian, we can find or learn a usable decision rule, and the rule is optimum.
If non-Gaussian, we can still find or learn a usable decision rule; however, the rule is NOT necessarily optimum.

10 Linear functions

One variable:    $f(x_1) = w_1 x_1 + w_2$
Two variables:   $f(x_1, x_2) = w_1 x_1 + w_2 x_2 + w_3$
Three variables: $f(x_1, x_2, x_3) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4$

11
$w_1 x_1 + w_2 = 0$   (Constant)
$w_1 x_1 + w_2 x_2 + w_3 = 0$   (Line)
$w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 = 0$   (Plane)
$w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 = 0$   (?)   Answer = Hyperplane

12 Hyperplanes

An n-dimensional hyperplane: $w_1 x_1 + w_2 x_2 + \dots + w_n x_n + w_{n+1} = 0$

Define $x = [x_1, x_2, \dots, x_n]^T$ and $w_0 = [w_1, w_2, \dots, w_n]^T$.

An alternative representation of the hyperplane is $w_0^T x + w_{n+1} = 0$.

13 Hyperplanes as boundaries for regions

$R^+ = \{\, x : w_0^T x + w_{n+1} > 0 \,\}$   Positive side of the hyperplane boundary
$R^- = \{\, x : w_0^T x + w_{n+1} < 0 \,\}$   Negative side of the hyperplane boundary
$w_0^T x + w_{n+1} = 0$   Hyperplane boundary

14

15 Definitions

(1) Unit normal: $u = \dfrac{w_0}{\lVert w_0 \rVert}$

16 (2) Distance from a point y to the hyperplane: $D(y) = \dfrac{|\,w_0^T y + w_{n+1}\,|}{\lVert w_0 \rVert}$

(3) Distance from the origin to the hyperplane: $D(0) = \dfrac{|\,w_{n+1}\,|}{\lVert w_0 \rVert}$
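A short sketch of these hyperplane quantities in code; the weight vector, offset, and test point below are arbitrary illustrative values.

```python
import numpy as np

w0  = np.array([2.0, -1.0, 0.5])   # w_0 = [w_1, ..., w_n]^T (illustrative)
wn1 = 3.0                          # offset term w_{n+1}

def unit_normal(w0):
    """(1) Unit normal to the hyperplane w_0^T x + w_{n+1} = 0."""
    return w0 / np.linalg.norm(w0)

def distance_point_to_hyperplane(y, w0, wn1):
    """(2) Distance from a point y to the hyperplane."""
    return abs(w0 @ y + wn1) / np.linalg.norm(w0)

def distance_origin_to_hyperplane(w0, wn1):
    """(3) Distance from the origin to the hyperplane."""
    return abs(wn1) / np.linalg.norm(w0)

y = np.array([1.0, 2.0, -1.0])
print(unit_normal(w0))
print(distance_point_to_hyperplane(y, w0, wn1), distance_origin_to_hyperplane(w0, wn1))
```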

17 (4) Linear Discriminant Functions: $d(x) = w^T x_a$, where $x_a = [x_1, x_2, \dots, x_n, 1]^T$ is the augmented pattern vector and $w = [w_1, w_2, \dots, w_n, w_{n+1}]^T$ is the weight vector.

18 Linear Decision Rule: 2-Class Case, using a single linear discriminant function (no claim of optimality!!!)

Given $d(x) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + w_{n+1}$, for a vector x:
decide $C_1$ if $d(x) > 0$ and $C_2$ if $d(x) < 0$; on the boundary $d(x) = 0$ decide randomly.
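A minimal sketch of this single-discriminant rule; the weight values are made up for illustration.

```python
import numpy as np

w = np.array([1.5, -2.0, 0.5])   # [w_1, w_2, w_{n+1}] for a 2-D pattern space (illustrative)

def linear_decision(x, w):
    """Decide C1 if d(x) > 0, C2 if d(x) < 0, and randomly on the boundary d(x) = 0."""
    d = w[:-1] @ x + w[-1]       # d(x) = w_1 x_1 + ... + w_n x_n + w_{n+1}
    if d > 0:
        return "C1"
    if d < 0:
        return "C2"
    return np.random.choice(["C1", "C2"])   # on the boundary decide randomly

print(linear_decision(np.array([0.4, 1.2]), w))
```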

19

20 Linear Decision Rule: 2-Class Case, using two linear discriminant functions

Given two discriminant functions $d_1(x)$ and $d_2(x)$, define the decision rule in terms of their signs, except on the boundaries $d_1(x) = 0$ and $d_2(x) = 0$, where we decide randomly between $C_1$ and $C_2$.

21 Decision regions (2-class case) using two linear discriminant functions and AND logic

22 Decision regions (2-class case) using two linear discriminant functions (continued)

23 Decision regions (2-class case) alternative formulation using two linear discriminant functions

24 Decision regions (2-class case) using the alternative form of two linear discriminant functions, equivalent to the previous formulation

25 Decision regions (3-class case) using two linear discriminant functions

26 Decision regions (4-class case) using two linear discriminant functions

27 Decision region $R_1$ (M-class case) using K linear discriminant functions
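The region figures from these slides are not reproduced in the transcript, but one common way to build a region such as $R_1$ from several linear discriminant functions is to intersect (AND) their positive half-spaces. A sketch under that assumption, with invented coefficients:

```python
import numpy as np

# Hypothetical discriminants d_k(x) = A[k] @ x + b[k] whose positive sides bound R_1
A = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
b = np.array([0.0, 0.0, 2.0])

def in_region_R1(x):
    """x belongs to R_1 if every discriminant is positive (AND of half-spaces)."""
    return bool(np.all(A @ x + b > 0))

print(in_region_R1(np.array([0.5, 0.5])))   # True: all three d_k(x) > 0
print(in_region_R1(np.array([3.0, 3.0])))   # False: d_3(x) = -4 < 0
```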

28 Example: Piecewise linear boundaries Given the following discriminant functions

29 Example Continued

Define the following decision rule: if ($d_1(x) > 0$ AND $d_2(x) > 0$) OR ($d_3(x) > 0$ AND $d_4(x) > 0$ AND $d_5(x) > 0$ AND $d_6(x) > 0$), then decide x comes from class $C_1$; on the boundaries decide randomly; otherwise decide $C_2$.

Show the decision regions in the two-dimensional pattern space.
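A sketch of this piecewise-linear rule in code. The six discriminant functions $d_1(x)$ through $d_6(x)$ were given on the previous slide and are not preserved in this transcript, so the coefficients below are placeholders; only the AND/OR logic follows the stated rule.

```python
import numpy as np

# Placeholder coefficients for d_1(x), ..., d_6(x), with d_k(x) = A[k] @ x + b[k];
# the example's actual discriminants appeared on the preceding slide.
A = np.array([[ 1.0,  0.0], [ 0.0,  1.0], [-1.0,  0.0],
              [ 0.0, -1.0], [ 1.0,  1.0], [ 1.0, -1.0]])
b = np.array([0.0, 0.0, 4.0, 4.0, -1.0, 2.0])

def decide(x):
    d = A @ x + b
    if np.any(d == 0):                       # on a boundary: decide randomly
        return np.random.choice(["C1", "C2"])
    group1 = d[0] > 0 and d[1] > 0                              # d1 > 0 AND d2 > 0
    group2 = d[2] > 0 and d[3] > 0 and d[4] > 0 and d[5] > 0    # d3, ..., d6 all > 0
    return "C1" if (group1 or group2) else "C2"

print(decide(np.array([1.0, 1.0])))   # d1 and d2 are positive here, so class C1
```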

30 Solution:

31 Lecture 14 Summary
1. Reviewed structures of Optimal Classifier
2. Defined linear functions, hyperplanes, boundaries, unit normals, various distances
3. Used Linear Discriminant functions for defining classifiers: Examples

32 End of Lecture 14