Bayesian Classification


Bayesian Classification: A Reference (slides by G. A. Tagliarini, PhD)

Example 1: a beach scene image (not reproduced in the transcript)

Example 1 (continued): Overall objective: count the number of people on the beach. Intermediate objectives: reduce the search space, and segment the image into three zones (classes): surf, beach, and building.

Example 1 (continued): Consider a randomly selected pixel x from the image. Suppose the a priori probabilities with respect to the three classes are P(x is in the building area) ≈ 0.17, P(x is in the beach area) ≈ 0.58, and P(x is in the surf area) ≈ 0.25. What decision rule minimizes error?
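With only the priors to go on, the minimum-error rule is to always choose the most probable class. A minimal sketch using the figures above (the class labels are simply the region names from the example):

```python
# Minimal sketch: with only prior probabilities available, the
# minimum-error decision rule always picks the class with the largest prior.
priors = {"building": 0.17, "beach": 0.58, "surf": 0.25}

decision = max(priors, key=priors.get)          # class with the maximum prior
error_rate = round(1.0 - priors[decision], 2)   # probability of deciding wrongly

print(decision, error_rate)                     # beach 0.42
```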

Example 1: Suppose additional information about a property of the pixel (or its neighborhood), such as color, brightness, or variability, is available. Can such knowledge aid classification? For instance, what is the probability that pixel x came from the beach area given that the pixel is red, i.e., P(beach | red)?

Example 1: Consider the hypothetical regional color distributions over hue h (distribution plots not reproduced in the transcript).

Example 1: The joint probability that a randomly selected pixel is from the beach area and has hue h is p(beach, h) = p(h | beach) P(beach) = P(beach | h) p(h). Solving for P(beach | h) gives P(beach | h) = p(h | beach) P(beach) / p(h), where p(h) = p(h | building) P(building) + p(h | beach) P(beach) + p(h | surf) P(surf). Notes: (1) as a check on the manipulation, consider a playing card: p(red and king) = 2/52, and indeed p(red | king) p(king) = (2/4)(4/52) = 1/26 = p(king | red) p(red) = (2/26)(26/52). (2) p(h) merely accumulates a weighted average of the occurrences of hue h, since the regions are assumed to be mutually exclusive (non-overlapping) in this example.
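A sketch of the same computation in code. The hue likelihoods p(h | class) below are made-up placeholders, since the actual regional color distributions live in the missing figure; only the priors come from the slides.

```python
# Assumed likelihoods p(h | class) for one observed hue h; the real values
# would be read off the regional color distributions in the figure.
priors      = {"building": 0.17, "beach": 0.58, "surf": 0.25}
likelihoods = {"building": 0.20, "beach": 0.70, "surf": 0.10}

# Evidence p(h): total probability over the mutually exclusive regions.
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Bayes' rule: P(class | h) = p(h | class) * P(class) / p(h).
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}
print(posteriors)

# Sanity check mirroring the playing-card example on the slide:
# p(red and king) = p(red | king) p(king) = p(king | red) p(red) = 2/52.
assert abs((2/4) * (4/52) - (2/26) * (26/52)) < 1e-12
```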

A General Formulation: Bayes' rule, P(wi | x) = p(x | wi) P(wi) / p(x), where the evidence is p(x) = Σj p(x | wj) P(wj).

A Casual Formulation: the prior probability reflects knowledge of the relative frequency of instances of a class; the likelihood is a measure of the probability that a measurement value occurs in a class; the evidence is a scaling term.

Forming a Classifier: create discriminant functions gi(x), one for each class i = 1, ..., c (they are not unique); they partition the measurement space with crisp boundaries; assign x to class k if gk(x) > gj(x) for all j ≠ k; for a minimum-error classifier, gi(x) = P(wi | x).
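A sketch of the discriminant-function view: one score function per class and an arg-max decision. The two toy discriminants below are placeholders for illustration; in the minimum-error case they would be the class posteriors P(wi | x).

```python
from typing import Callable, Sequence

def classify(x, discriminants: Sequence[Callable]) -> int:
    """Assign x to the class whose discriminant g_i(x) is largest."""
    scores = [g(x) for g in discriminants]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy one-dimensional discriminants for illustration only.
g = [lambda x: -abs(x - 0.0),   # class 0 scores high near x = 0
     lambda x: -abs(x - 5.0)]   # class 1 scores high near x = 5

print(classify(1.2, g), classify(4.1, g))   # 0 1
```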

Equivalent Discriminants: if f is monotone increasing, the collection hi(x) = f(gi(x)), i = 1, ..., c, forms an equivalent family of discriminant functions, e.g., hi(x) = ln gi(x).

Gaussian Distributions

Gaussian Distributions: Details
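The density formulas on these two slides did not survive the transcript; presumably they show the d-dimensional normal density, which for class wi reads

p(x | \omega_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right)

with mean vector \mu_i and covariance matrix \Sigma_i.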

Discriminants for the Normal Density: recall the classifier functions gi(x). Assuming the measurements are normally distributed, substitute the normal density for p(x | wi) in each gi (equations omitted in the transcript).

Some Algebra to Simplify the Discriminants: since the normal density is an exponential, we take the natural logarithm to rewrite the first term.

Some Algebra to Simplify the Discriminants (continued)

The Discriminants (Finally!)
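The final expressions are lost in the transcript; taking gi(x) = ln p(x | wi) + ln P(wi) with the normal density above gives the standard form

g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i),

where the constant -(d/2) ln 2π can be dropped since it is the same for every class.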

Special Case 1: Σi = σ²I

Special Case 1: Σi = σ²I. If the classes are equally likely, the discriminants depend only upon the distances to the means. A diagonal covariance matrix implies the feature components are statistically independent. A constant diagonal implies the class measurements have identical variability in each dimension; hence the class distributions are spherical in d-dimensional space. The discriminant functions define hyperplanes orthogonal to the line segments joining the distribution means.
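For reference, with Σi = σ²I the general discriminant above reduces (up to class-independent constants) to

g_i(x) = -\frac{\|x - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i)

and, after discarding the term -x^T x / (2σ²) common to all classes,

g_i(x) = \frac{1}{\sigma^2} \mu_i^T x - \frac{1}{2\sigma^2} \mu_i^T \mu_i + \ln P(\omega_i),

which is linear in x; with equal priors it acts as a minimum-distance (nearest-mean) classifier.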

Special Case 1: Σi = σ²I (illustration omitted)

Special Case 2: Σi = Σ

Special Case 2: Σi = Σ. Since Σ may possess nonzero off-diagonal elements and unequal diagonal elements, the measurement distributions lie in hyperellipsoids. The discriminant hyperplanes are often not orthogonal to the segments joining the class means.

Special Case 2: Σi = Σ. The quadratic term x^T Σ^{-1} x is independent of i and may be eliminated.
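For reference, with a shared covariance Σ the discriminant reduces (again up to class-independent terms) to

g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma^{-1} (x - \mu_i) + \ln P(\omega_i)

and, after eliminating the quadratic term x^T Σ^{-1} x common to all classes,

g_i(x) = \mu_i^T \Sigma^{-1} x - \frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \ln P(\omega_i),

which is linear in x, so the class boundaries are hyperplanes.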

Case 3: Σi arbitrary. The resulting discriminant is quadratic in x, so the decision surfaces can be hyperplanes, hyperparaboloids, hyperellipsoids, hyperspheres, or combinations of these!

Example 2, a problem. Exemplars (transposed): for w1, {(2, 6), (3, 4), (3, 8), (4, 6)}; for w2, {(1, -2), (3, 0), (3, -4), (5, -2)}. Calculated means (transposed): m1 = (3, 6), m2 = (3, -2).

Example 2: Covariance Matrices

Example 2: Covariance Matrices (continued)

Example 2: Inverse and Determinant for Each of the Covariance Matrices
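The matrices themselves are lost in the transcript. A sketch of how they could be computed from the listed exemplars follows; it assumes the maximum-likelihood (divide-by-N) covariance estimate, so the slides' values may differ if they divide by N - 1 instead.

```python
import numpy as np

# Exemplars from the slides, one row per sample.
w1 = np.array([[2, 6], [3, 4], [3, 8], [4, 6]], dtype=float)
w2 = np.array([[1, -2], [3, 0], [3, -4], [5, -2]], dtype=float)

for name, W in (("class w1", w1), ("class w2", w2)):
    mean = W.mean(axis=0)
    cov = np.cov(W, rowvar=False, bias=True)   # divide-by-N estimate (assumed)
    print(name, "mean:", mean)
    print(name, "covariance:\n", cov)
    print(name, "inverse:\n", np.linalg.inv(cov))
    print(name, "determinant:", np.linalg.det(cov))
```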

Example 2: A Discriminant Function for Class 1

Example 2 (figure omitted from the transcript)

Example 2: A Discriminant Function for Class 2

Example 2 (figure omitted from the transcript)

Example 2: The Class Boundary

Example 2: A Quadratic Separator
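Since the separator itself is in the missing figure, here is a sketch that evaluates the two quadratic discriminants numerically and classifies a couple of test points. It assumes the divide-by-N covariances from the previous sketch and equal priors, so the constants may differ from the slides.

```python
import numpy as np

def g(x, mean, cov, prior):
    """Quadratic Gaussian discriminant; the class-independent -(d/2) ln(2*pi)
    term is omitted since it does not affect the comparison."""
    d = x - mean
    return (-0.5 * d @ np.linalg.inv(cov) @ d
            - 0.5 * np.log(np.linalg.det(cov))
            + np.log(prior))

m1, S1 = np.array([3.0, 6.0]), np.array([[0.5, 0.0], [0.0, 2.0]])
m2, S2 = np.array([3.0, -2.0]), np.array([[2.0, 0.0], [0.0, 2.0]])

# The class boundary is the set of x where g1(x) = g2(x); points are assigned
# to whichever discriminant is larger.
for x in (np.array([3.0, 4.0]), np.array([3.0, 0.0])):
    winner = 1 if g(x, m1, S1, 0.5) > g(x, m2, S2, 0.5) else 2
    print(x, "-> class", winner)   # [3. 4.] -> class 1, [3. 0.] -> class 2
```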

Example 2: Plot of the Discriminant

Summary: Steps for Building a Bayesian Classifier. Collect class exemplars; estimate the class a priori probabilities; estimate the class means; form the covariance matrices and find the inverse and determinant of each; form the discriminant function for each class.

Using the Classifier: obtain a measurement vector x; evaluate the discriminant function gi(x) for each class i = 1, ..., c; decide x is in class j if gj(x) > gi(x) for all i ≠ j.
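Tying the summary steps together, a minimal end-to-end sketch (the class and method names are hypothetical, and the same divide-by-N covariance assumption applies):

```python
import numpy as np

class BayesClassifier:
    """Follows the summary steps: from class exemplars, estimate priors,
    means, and covariances; classify by the largest Gaussian discriminant."""

    def fit(self, exemplars):                  # exemplars: list of (N_i, d) arrays
        n = sum(len(W) for W in exemplars)
        self.priors = [len(W) / n for W in exemplars]
        self.means = [W.mean(axis=0) for W in exemplars]
        self.covs = [np.cov(W, rowvar=False, bias=True) for W in exemplars]
        return self

    def predict(self, x):
        scores = []
        for prior, mu, cov in zip(self.priors, self.means, self.covs):
            d = x - mu
            scores.append(-0.5 * d @ np.linalg.inv(cov) @ d
                          - 0.5 * np.log(np.linalg.det(cov)) + np.log(prior))
        return int(np.argmax(scores))          # index of the winning class

# Usage with the Example 2 exemplars: a point near m1 lands in class 0 (w1).
w1 = np.array([[2, 6], [3, 4], [3, 8], [4, 6]], dtype=float)
w2 = np.array([[1, -2], [3, 0], [3, -4], [5, -2]], dtype=float)
clf = BayesClassifier().fit([w1, w2])
print(clf.predict(np.array([3.0, 4.0])))       # 0
```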