Bayesian Classification


Bayesian Classification: A Reference (slides by G. A. Tagliarini, PhD)

Example 1: a beach scene image (not reproduced in the transcript)

Example 1 (continued): Overall objective: count the number of people on the beach. Intermediate objectives: reduce the search space, and segment the image into three zones (classes): surf, beach, and building.

Example 1 (continued): Consider a randomly selected pixel x from the image. Suppose the a priori probabilities with respect to the three classes are P(x is in the building area) ≈ 0.17, P(x is in the beach area) ≈ 0.58, and P(x is in the surf area) ≈ 0.25. What decision rule minimizes error?
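With only the priors to go on, the minimum-error rule is to always choose the most probable class. A minimal sketch using the figures above (the class labels are simply the region names from the example):

```python
# Minimal sketch: with only prior probabilities available, the
# minimum-error decision rule always picks the class with the largest prior.
priors = {"building": 0.17, "beach": 0.58, "surf": 0.25}

decision = max(priors, key=priors.get)          # class with the maximum prior
error_rate = round(1.0 - priors[decision], 2)   # probability of deciding wrongly

print(decision, error_rate)                     # beach 0.42
```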

Example 1: Suppose additional information about a property of the pixel (or its neighborhood), such as color, brightness, or variability, is available. Can such knowledge aid classification? For instance, what is the probability that pixel x came from the beach area given that the pixel is red, i.e., P(beach | red)?

Example 1: Consider the hypothetical regional color distributions over hue h (distribution plots not reproduced in the transcript).

Example 1: The joint probability that a randomly selected pixel is from the beach area and has hue h is p(beach, h) = p(h | beach) P(beach) = P(beach | h) p(h). Solving for P(beach | h) gives P(beach | h) = p(h | beach) P(beach) / p(h), where p(h) = p(h | building) P(building) + p(h | beach) P(beach) + p(h | surf) P(surf). Notes: (1) as a check on the manipulation, consider a playing card: p(red and king) = 2/52, and indeed p(red | king) p(king) = (2/4)(4/52) = 1/26 = p(king | red) p(red) = (2/26)(26/52). (2) p(h) merely accumulates a weighted average of the occurrences of hue h, since the regions are assumed to be mutually exclusive (non-overlapping) in this example.
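A sketch of the same computation in code. The hue likelihoods p(h | class) below are made-up placeholders, since the actual regional color distributions live in the missing figure; only the priors come from the slides.

```python
# Assumed likelihoods p(h | class) for one observed hue h; the real values
# would be read off the regional color distributions in the figure.
priors      = {"building": 0.17, "beach": 0.58, "surf": 0.25}
likelihoods = {"building": 0.20, "beach": 0.70, "surf": 0.10}

# Evidence p(h): total probability over the mutually exclusive regions.
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Bayes' rule: P(class | h) = p(h | class) * P(class) / p(h).
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}
print(posteriors)

# Sanity check mirroring the playing-card example on the slide:
# p(red and king) = p(red | king) p(king) = p(king | red) p(red) = 2/52.
assert abs((2/4) * (4/52) - (2/26) * (26/52)) < 1e-12
```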

A General Formulation: Bayes' rule, P(wi | x) = p(x | wi) P(wi) / p(x), where the evidence is p(x) = Σj p(x | wj) P(wj).

A Casual Formulation: the prior probability reflects knowledge of the relative frequency of instances of a class; the likelihood is a measure of the probability that a measurement value occurs in a class; the evidence is a scaling term.

Forming a Classifier: create discriminant functions gi(x), one for each class i = 1, ..., c (they are not unique); they partition the measurement space with crisp boundaries; assign x to class k if gk(x) > gj(x) for all j ≠ k; for a minimum-error classifier, gi(x) = P(wi | x).
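A sketch of the discriminant-function view: one score function per class and an arg-max decision. The two toy discriminants below are placeholders for illustration; in the minimum-error case they would be the class posteriors P(wi | x).

```python
from typing import Callable, Sequence

def classify(x, discriminants: Sequence[Callable]) -> int:
    """Assign x to the class whose discriminant g_i(x) is largest."""
    scores = [g(x) for g in discriminants]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy one-dimensional discriminants for illustration only.
g = [lambda x: -abs(x - 0.0),   # class 0 scores high near x = 0
     lambda x: -abs(x - 5.0)]   # class 1 scores high near x = 5

print(classify(1.2, g), classify(4.1, g))   # 0 1
```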

Equivalent Discriminants: if f is monotone increasing, the collection hi(x) = f(gi(x)), i = 1, ..., c, forms an equivalent family of discriminant functions, e.g., hi(x) = ln gi(x).

Gaussian Distributions

Gaussian Distributions: Details
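The density formulas on these two slides did not survive the transcript; presumably they show the d-dimensional normal density, which for class wi reads

p(x | \omega_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right)

with mean vector \mu_i and covariance matrix \Sigma_i.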

Discriminants for the Normal Density: recall the classifier functions gi(x). Assuming the measurements are normally distributed, substitute the normal density for p(x | wi) in each gi (equations omitted in the transcript).

Some Algebra to Simplify the Discriminants: since the normal density is an exponential, we take the natural logarithm to rewrite the first term.

Some Algebra to Simplify the Discriminants (continued)

The Discriminants (Finally!)
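The final expressions are lost in the transcript; taking gi(x) = ln p(x | wi) + ln P(wi) with the normal density above gives the standard form

g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i),

where the constant -(d/2) ln 2π can be dropped since it is the same for every class.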

Special Case 1: Σi = σ²I

Special Case 1: Σi = σ²I. If the classes are equally likely, the discriminants depend only upon the distances to the means. A diagonal covariance matrix implies the feature components are statistically independent. A constant diagonal implies the class measurements have identical variability in each dimension; hence the class distributions are spherical in d-dimensional space. The discriminant functions define hyperplanes orthogonal to the line segments joining the distribution means.
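For reference, with Σi = σ²I the general discriminant above reduces (up to class-independent constants) to

g_i(x) = -\frac{\|x - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i)

and, after discarding the term -x^T x / (2σ²) common to all classes,

g_i(x) = \frac{1}{\sigma^2} \mu_i^T x - \frac{1}{2\sigma^2} \mu_i^T \mu_i + \ln P(\omega_i),

which is linear in x; with equal priors it acts as a minimum-distance (nearest-mean) classifier.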

Special Case 1: Σi = σ²I (illustration omitted)

Special Case 2: Σi = Σ

Special Case 2: Σi = Σ. Since Σ may possess nonzero off-diagonal elements and unequal diagonal elements, the measurement distributions lie in hyperellipsoids. The discriminant hyperplanes are often not orthogonal to the segments joining the class means.

Special Case 2: Σi = Σ. The quadratic term x^T Σ^{-1} x is independent of i and may be eliminated.
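For reference, with a shared covariance Σ the discriminant reduces (again up to class-independent terms) to

g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma^{-1} (x - \mu_i) + \ln P(\omega_i)

and, after eliminating the quadratic term x^T Σ^{-1} x common to all classes,

g_i(x) = \mu_i^T \Sigma^{-1} x - \frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \ln P(\omega_i),

which is linear in x, so the class boundaries are hyperplanes.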

Case 3: Σi arbitrary. The resulting discriminant is quadratic in x, so the decision surfaces can be hyperplanes, hyperparaboloids, hyperellipsoids, hyperspheres, or combinations of these!

Example 2, a problem. Exemplars (transposed): for w1, {(2, 6), (3, 4), (3, 8), (4, 6)}; for w2, {(1, -2), (3, 0), (3, -4), (5, -2)}. Calculated means (transposed): m1 = (3, 6), m2 = (3, -2).

Example 2: Covariance Matrices

Example 2: Covariance Matrices (continued)

Example 2: Inverse and Determinant for Each of the Covariance Matrices
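The matrices themselves are lost in the transcript. A sketch of how they could be computed from the listed exemplars follows; it assumes the maximum-likelihood (divide-by-N) covariance estimate, so the slides' values may differ if they divide by N - 1 instead.

```python
import numpy as np

# Exemplars from the slides, one row per sample.
w1 = np.array([[2, 6], [3, 4], [3, 8], [4, 6]], dtype=float)
w2 = np.array([[1, -2], [3, 0], [3, -4], [5, -2]], dtype=float)

for name, W in (("class w1", w1), ("class w2", w2)):
    mean = W.mean(axis=0)
    cov = np.cov(W, rowvar=False, bias=True)   # divide-by-N estimate (assumed)
    print(name, "mean:", mean)
    print(name, "covariance:\n", cov)
    print(name, "inverse:\n", np.linalg.inv(cov))
    print(name, "determinant:", np.linalg.det(cov))
```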

Example 2: A Discriminant Function for Class 1

Example 2 (figure omitted from the transcript)

Example 2: A Discriminant Function for Class 2

Example 2 (figure omitted from the transcript)

Example 2: The Class Boundary

Example 2: A Quadratic Separator
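Since the separator itself is in the missing figure, here is a sketch that evaluates the two quadratic discriminants numerically and classifies a couple of test points. It assumes the divide-by-N covariances from the previous sketch and equal priors, so the constants may differ from the slides.

```python
import numpy as np

def g(x, mean, cov, prior):
    """Quadratic Gaussian discriminant; the class-independent -(d/2) ln(2*pi)
    term is omitted since it does not affect the comparison."""
    d = x - mean
    return (-0.5 * d @ np.linalg.inv(cov) @ d
            - 0.5 * np.log(np.linalg.det(cov))
            + np.log(prior))

m1, S1 = np.array([3.0, 6.0]), np.array([[0.5, 0.0], [0.0, 2.0]])
m2, S2 = np.array([3.0, -2.0]), np.array([[2.0, 0.0], [0.0, 2.0]])

# The class boundary is the set of x where g1(x) = g2(x); points are assigned
# to whichever discriminant is larger.
for x in (np.array([3.0, 4.0]), np.array([3.0, 0.0])):
    winner = 1 if g(x, m1, S1, 0.5) > g(x, m2, S2, 0.5) else 2
    print(x, "-> class", winner)   # [3. 4.] -> class 1, [3. 0.] -> class 2
```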

Example 2: Plot of the Discriminant

Summary: Steps for Building a Bayesian Classifier. Collect class exemplars; estimate the class a priori probabilities; estimate the class means; form the covariance matrices and find the inverse and determinant of each; form the discriminant function for each class.

Using the Classifier: obtain a measurement vector x; evaluate the discriminant function gi(x) for each class i = 1, ..., c; decide x is in class j if gj(x) > gi(x) for all i ≠ j.
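Tying the summary steps together, a minimal end-to-end sketch (the class and method names are hypothetical, and the same divide-by-N covariance assumption applies):

```python
import numpy as np

class BayesClassifier:
    """Follows the summary steps: from class exemplars, estimate priors,
    means, and covariances; classify by the largest Gaussian discriminant."""

    def fit(self, exemplars):                  # exemplars: list of (N_i, d) arrays
        n = sum(len(W) for W in exemplars)
        self.priors = [len(W) / n for W in exemplars]
        self.means = [W.mean(axis=0) for W in exemplars]
        self.covs = [np.cov(W, rowvar=False, bias=True) for W in exemplars]
        return self

    def predict(self, x):
        scores = []
        for prior, mu, cov in zip(self.priors, self.means, self.covs):
            d = x - mu
            scores.append(-0.5 * d @ np.linalg.inv(cov) @ d
                          - 0.5 * np.log(np.linalg.det(cov)) + np.log(prior))
        return int(np.argmax(scores))          # index of the winning class

# Usage with the Example 2 exemplars: a point near m1 lands in class 0 (w1).
w1 = np.array([[2, 6], [3, 4], [3, 8], [4, 6]], dtype=float)
w2 = np.array([[1, -2], [3, 0], [3, -4], [5, -2]], dtype=float)
clf = BayesClassifier().fit([w1, w2])
print(clf.predict(np.array([3.0, 4.0])))       # 0
```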