Advanced Pattern Recognition


Advanced Pattern Recognition
Team B, 2018.03.13
TO NGUYEN NHAT MINH, PHAM THANH NAM, PARK JOON RYEOUL, KIM MI SUN

Contents
Bayes rule
Bayes classifier
Covariance
Euclidean distance and Mahalanobis distance

Bayes Rule

Conditional Probability
[Venn diagram of events A and B]
P(A|B) = P(A ∩ B) / P(B): the probability of A given that B has occurred.
Examples: conditional probability for a weather forecast; conditional probability for colored balls.

Conditional Probability – Example  

Bayes rule
P(A|B) = P(B|A) P(A) / P(B)

Conditional Probability

                      Women (A) = 80   Men (A') = 120   Total
  French (B) = 90           50               40            90
  German (B') = 110         30               80           110
  Total                     80              120           200

P(Women = A) = 80/200      P(Men = A') = 120/200
P(French = B) = 90/200     P(German = B') = 110/200

The probability that a woman student attended the French class: P(French | Women) = 50/80
The probability that a man student attended the French class: P(French | Men) = 40/120
The probability that a German-class student is a woman: P(Women | German) = 30/110
The probability that a French-class student is a man: P(Men | French) = 40/90
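
As a quick check, Bayes' rule ties these numbers together:
P(Women | French) = P(French | Women) P(Women) / P(French) = (50/80)(80/200) / (90/200) = 50/90,
which matches counting directly: 50 of the 90 French-class students are women.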

Bayes rule - Example
The probability of developing cancer is 1%, and the detection test is 90% accurate.
Someone gets a positive result from the cancer test. What is the probability that he actually has cancer?
Precondition (accuracy of the detection): P(Result: Positive | Develop cancer) = 0.90
Question: P(Develop cancer | Result: Positive) = ?
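
The slide does not state the test's false-positive rate, so as a sketch assume the test is also 90% accurate on healthy people (a 10% false-positive rate). Bayes' rule then gives
P(cancer | positive) = P(positive | cancer) P(cancer) / P(positive)
= (0.90)(0.01) / [(0.90)(0.01) + (0.10)(0.99)]
= 0.009 / 0.108 ≈ 0.083,
so even after a positive result the probability of actually having cancer is only about 8.3%, because the disease is rare.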

Bayes Classifier

Terminology
State of nature ω (class label): e.g., ω1 for sea bass, ω2 for salmon.
Probabilities P(ω1) and P(ω2) (priors): prior knowledge of how likely it is to get a sea bass or a salmon.
Probability density function p(x) (evidence): how frequently we will measure a pattern with feature value x (e.g., x corresponds to length).

Terminology (cont'd)
Conditional probability density p(x|ωj) (likelihood): how frequently we will measure a pattern with feature value x, given that the pattern belongs to class ωj.

Terminology (cont'd)
Conditional probability P(ωj|x) (posterior): the probability that the fish belongs to class ωj given feature value x. Ultimately, we are interested in computing P(ωj|x) for each class ωj.

Decision Rule Using Prior Probabilities Only
Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2.
Probability of error: P(error) = min[P(ω1), P(ω2)].
This rule favors the most likely class and makes the same decision every time, i.e., it is optimal if no other information is available.

Conditional Probabilities
Can we improve the decision? Use the length measurement of a fish.
Define p(x|ωj) as the conditional probability density: the probability of x given that the state of nature is ωj, for j = 1, 2.
p(x|ω1) and p(x|ω2) describe the difference in length between the populations of sea bass and salmon.

Posterior Probabilities
Suppose we know P(ωj) and p(x|ωj) for j = 1, 2, and measure the length of a fish (x).
Define P(ωj|x) as the a posteriori probability: the probability of the state of nature being ωj given the measurement of feature value x.
Use Bayes' rule to convert the prior probability to the posterior probability: P(ωj|x) = p(x|ωj) P(ωj) / p(x).

Decision Rule Using Conditional Probabilities
Using Bayes' rule: P(ωj|x) = p(x|ωj) P(ωj) / p(x), where p(x) = Σj p(x|ωj) P(ωj).
Decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2. Equivalently, decide ω1 if p(x|ω1) P(ω1) > p(x|ω2) P(ω2).

Probability of Error
The probability of error is defined as P(error|x) = min[P(ω1|x), P(ω2|x)].
The average probability of error is P(error) = ∫ P(error|x) p(x) dx.
The Bayes decision rule minimizes the average probability of error.

Example: Fish Sorting
Known knowledge: salmon's length has distribution N(5, 1); sea bass' length has distribution N(10, 4).
If a random variable is N(μ, σ²), its density is p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)).
The class-conditional densities are therefore p(x | salmon) = N(x; 5, 1) and p(x | bass) = N(x; 10, 4).

Example: Fish Sorting
Fix the measured length x. Viewed as a function of the class, p(x | class) is the likelihood function.

Example: Fish Sorting
Suppose a fish has length 7. How do we classify it?

Example: Fish Sorting
Choose the class which maximizes the likelihood. The decision boundary is the length at which the two likelihoods are equal.
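
Plugging x = 7 into the class-conditional densities above gives p(7 | salmon) = (1/√(2π)) exp(−(7 − 5)²/2) ≈ 0.054 and p(7 | bass) = (1/(2√(2π))) exp(−(7 − 10)²/8) ≈ 0.065, so the maximum-likelihood rule (no priors yet) labels a length-7 fish as sea bass.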

Example: Fish Sorting
Priors: P(salmon) = 2/3, P(bass) = 1/3.
With these priors added to the previous model, how should we classify a fish of length 7?

Example: Fish Sorting - Bayes Decision Rule
Likelihood functions: p(length | salmon), p(length | bass)
Priors: P(salmon), P(bass)
Posterior: compare P(salmon | length) with P(bass | length); decide salmon if P(salmon | length) > P(bass | length), otherwise bass.
The decision boundary is the length at which the two posteriors are equal.
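
A minimal Python sketch of this decision rule, using the distributions and priors from the slides; the function names and the numeric boundary search are illustrative, not part of the original slides.

import numpy as np
from scipy.stats import norm

# Class-conditional densities from the slides: length | salmon ~ N(5, 1), length | bass ~ N(10, 4)
# (N(mu, sigma^2) is parameterized by the variance, so the scales are 1 and 2).
def p_len_given_salmon(x):
    return norm.pdf(x, loc=5.0, scale=1.0)

def p_len_given_bass(x):
    return norm.pdf(x, loc=10.0, scale=2.0)

P_salmon, P_bass = 2.0 / 3.0, 1.0 / 3.0   # priors from the slides

def classify(x):
    # Bayes decision rule: pick the class with the larger posterior.
    # The evidence p(x) is the same for both classes, so it is enough to
    # compare likelihood * prior.
    return "salmon" if p_len_given_salmon(x) * P_salmon > p_len_given_bass(x) * P_bass else "bass"

print(classify(7.0))   # salmon: the prior of 2/3 outweighs the slightly higher bass likelihood

# Rough numeric search for the decision boundary between the two class means.
lengths = np.linspace(5.0, 10.0, 100001)
gap = p_len_given_salmon(lengths) * P_salmon - p_len_given_bass(lengths) * P_bass
print(lengths[np.argmin(np.abs(gap))])   # about 7.2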

Covariance

Covariance
Correlation analysis is used to quantify the association between two continuous variables (e.g., between an independent and a dependent variable, or between two independent variables). Such a relationship can be positive, negative, or absent, and the basic quantity for expressing it numerically is the covariance.

Covariance
Covariance is a measure of the joint variability of two random variables. The sign of the covariance shows the tendency in the linear relationship between the variables. The formula is:

Cov(x, y) = (1/n) Σ_{i=1}^{n} (x_i − m_x)(y_i − m_y)

where x and y are random variables, m_x and m_y are the expected values (means) of x and y, and n is the number of items in the data set.
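
A short Python sketch of this formula (the function name is illustrative):

import numpy as np

def covariance(x, y):
    # Cov(x, y) = (1/n) * sum_i (x_i - m_x)(y_i - m_y), as defined above
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))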

Example of Covariance
The magnitude of the covariance is not easy to interpret because it is not normalized and hence depends on the magnitudes of the variables.

             Play   Study   Grade   Play-E(p)   Study-E(s)   Grade-E(g)
  Person 1    12      1      15        4.2        -4.4        -35.4
  Person 2     9      5      50        1.2        -0.4         -0.4
  Person 3    10      3      22        2.2        -2.4        -28.4
  Person 4     6      8      72       -1.8         2.6         21.6
  Person 5     2     10      93       -5.8         4.6         42.6
  Average    7.8    5.4    50.4

             (Play-E(p))(Study-E(s))   (Play-E(p))(Grade-E(g))   (Study-E(s))(Grade-E(g))
  Person 1          -18.48                    -148.68                     155.76
  Person 2           -0.48                      -0.48                       0.16
  Person 3           -5.28                     -62.48                      68.16
  Person 4           -4.68                     -38.88                      56.16
  Person 5          -26.68                    -247.08                     195.96
  Average           -11.12                     -99.52                      95.24
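
Applying the covariance() sketch above to this table reproduces the averages in the last row:

play  = [12, 9, 10, 6, 2]
study = [1, 5, 3, 8, 10]
grade = [15, 50, 22, 72, 93]
print(covariance(play, study))   # -11.12
print(covariance(play, grade))   # -99.52
print(covariance(study, grade))  #  95.24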

Example of Covariance
The magnitude of the covariance is not easy to interpret because it is not normalized and hence depends on the magnitudes of the variables.

             Height   Weight   Vision   Height-E(h)   Weight-E(w)   Vision-E(v)
  Person 1    1.77      75      1.7        0.046           7           0.24
  Person 2    1.69      63      1.5       -0.034          -5           0.04
  Person 3    1.70      54      1.6       -0.024         -14           0.14
  Person 4    1.74      71      1.1        0.016           3          -0.36
  Person 5    1.72      77      1.4       -0.004           9          -0.06
  Average    1.724      68     1.46

             (Height-E(h))(Weight-E(w))   (Height-E(h))(Vision-E(v))   (Weight-E(w))(Vision-E(v))
  Person 1            0.322                        0.01104                        1.68
  Person 2            0.17                        -0.00136                       -0.2
  Person 3            0.336                       -0.00336                       -1.96
  Person 4            0.048                       -0.00576                       -1.08
  Person 5           -0.036                        0.00024                       -0.54
  Average             0.168                        0.00016                       -0.42

Correlation coefficient
The correlation coefficient is the covariance normalized by the standard deviations: r(x, y) = Cov(x, y) / (σ_x σ_y). It always lies between −1 and +1.

Example of Covariance
The covariance matrix contains the covariance of every pair of variables; C(a, a) is V(a), the variance of a. The correlation coefficient is the normalized covariance.

Covariance matrix (Play, Study, Grade):
            Play      Study     Grade
  Play      15.2     -11.12    -99.52
  Study    -11.12     13.3      95.24
  Grade    -99.52     95.24   1085.3

Covariance matrix (Height, Weight, Vision):
            Height    Weight    Vision
  Height    0.00103   0.168     0.00016
  Weight    0.168    90        -0.42
  Vision    0.00016  -0.42      0.053

Correlation matrix (Play, Study, Grade):
            Play       Study      Grade
  Play      1         -0.78209   -0.77484
  Study    -0.78209    1          0.792718
  Grade    -0.77484    0.792718   1

Correlation matrix (Height, Weight, Vision):
            Height     Weight     Vision
  Height    1          0.551784   0.021655
  Weight    0.551784   1         -0.1923
  Vision    0.021655  -0.1923     1
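
For reference, numpy can produce a correlation matrix directly. A small sketch for the Play/Study/Grade data; note that np.corrcoef normalizes variances and covariances with one consistent denominator, whereas the table above divides variances by n−1 but covariances by n, so its off-diagonal values come out somewhat different (e.g., about −0.98 for Play vs. Study).

import numpy as np

# Correlation coefficient = covariance / (std_x * std_y), computed column-wise.
data = np.array([[12, 1, 15],
                 [9, 5, 50],
                 [10, 3, 22],
                 [6, 8, 72],
                 [2, 10, 93]], dtype=float)   # columns: Play, Study, Grade
print(np.corrcoef(data, rowvar=False))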

Euclidean distance Mahalanobis distance

Euclidean distance
The Euclidean distance between two points x = (x1, ..., xn) and y = (y1, ..., yn) is d(x, y) = √((x1 − y1)² + ... + (xn − yn)²).

Euclidean distance

Euclidean norm
The Euclidean norm of a vector x is ‖x‖ = √(x1² + ... + xn²), i.e., its Euclidean distance from the origin.

Euclidean distance
Euclidean distance has some limitations on real data sets, whose features often have different variances and nonzero covariance: it treats all directions equally and ignores the correlation between features.

Mahalanobis distance
The Mahalanobis distance of a point x from a distribution with mean μ and covariance matrix S is d_M(x) = √((x − μ)ᵀ S⁻¹ (x − μ)). It rescales each direction by the variability of the data, so it accounts for the covariance that Euclidean distance ignores.

Mahalanobis distance

Mahalanobis distance

Use of Covariance
Covariance is rarely used as it is; rather, it is used to compute the correlation coefficient, to perform dimensionality reduction, and to compute the Mahalanobis distance. Both dimensionality reduction and the Mahalanobis distance are computed from the covariance matrix.

[Worked data set: two-dimensional samples (x, y), listed with their deviations from the sample means (0.77, 1.51), squared deviations, and cross-products used to build the covariance matrix on the next slide.]

Mean: (0.77, 1.51)

Covariance matrix S:
   0.11936   0.10652
   0.10652   0.31893

Inverse of S (computed in Excel with MMINVERSE):
   16.1674   -6.0508
   -6.05076   5.40006

Two test points, both at Euclidean distance 1 from the mean:
  a = (0.17, 0.71):  a − mean = (−0.6, −0.8),  Euclidean distance = 1
  b = (0.17, 2.31):  b − mean = (−0.6, 0.8),   Euclidean distance = 1

Mahalanobis distance (computed with TRANSPOSE and MMULT):
  a:  S⁻¹(a − mean) = (−4.86, −0.69),    (a − mean)ᵀ S⁻¹ (a − mean) = 3.467581484,   distance = 1.86214
  b:  S⁻¹(b − mean) = (−14.54, 7.9505),  (b − mean)ᵀ S⁻¹ (b − mean) = 15.08503124,   distance = 3.88395

Although both points are at the same Euclidean distance from the mean, a lies along the direction in which the data co-vary and gets the smaller Mahalanobis distance, while b lies against that direction and gets the larger one.
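
A small Python sketch reproducing this comparison, taking the mean, the inverse covariance matrix, and the two points as read from the slide:

import numpy as np

mean = np.array([0.77, 1.51])
S_inv = np.array([[16.1674, -6.0508],      # inverse covariance matrix as given above
                  [-6.05076, 5.40006]])

def euclidean(p):
    return float(np.linalg.norm(p - mean))

def mahalanobis(p):
    d = p - mean
    return float(np.sqrt(d @ S_inv @ d))

a = np.array([0.17, 0.71])
b = np.array([0.17, 2.31])
print(euclidean(a), euclidean(b))      # both 1.0
print(mahalanobis(a), mahalanobis(b))  # about 1.862 and 3.884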

Thanks. Any questions?