Pattern Recognition and Image Analysis


Pattern Recognition and Image Analysis Dr. Manal Helal – Fall 2014 Lecture 3

BAYES DECISION THEORY In Action

Recap Example

Example (cont.)

Example (cont.): Assign colours to objects.

Example (cont.)

Example (cont.)

Example (cont.)

Example (cont.): Assign colour to pen objects.

Example (cont.)

Example (cont.): Assign colour to paper objects.

Example (cont.)

Example (cont.)

Bayes Discriminant Functions

Bayes Discriminant Functions: a set of discriminant functions gi(x), i = 1, ..., c, assigns a feature vector x to one of the classes 1 ... c (the class whose gi(x) is largest). The Bayesian choices are 1) minimum error rate classification and 2) minimum risk classification; the special cases 3) Euclidean distance and 4) Mahalanobis distance discriminant functions were given last week. Other geometric functions are introduced in the following slides, and many more exist in the literature.
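As a quick reminder, since the slide's formulas did not survive in this transcript, these are the standard textbook forms. The minimum error rate discriminant is

gi(x) = ln p(x|ωi) + ln P(ωi),

and, for Gaussian classes with equal priors, the Euclidean and Mahalanobis special cases reduce to

gi(x) = -||x - μi||^2   (identical spherical covariances, Σi = σ^2 I),
gi(x) = -(x - μi)^T Σ^(-1) (x - μi)   (identical but arbitrary covariances, Σi = Σ).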

DISCRIMINANT FUNCTIONS: 5. DECISION SURFACES. If regions Ri and Rj are contiguous, then g(x) ≡ P(ωi|x) - P(ωj|x) = 0 is the surface separating the two regions. On one side of it g(x) is positive (+), on the other negative (-). It is known as a decision surface.

If f(·) is monotonically increasing, the decision rule remains the same if we use gi(x) ≡ f(P(ωi|x)); such a gi(x) is a discriminant function. In general, discriminant functions can be defined independently of the Bayesian rule. They lead to suboptimal solutions, yet, if chosen appropriately, they can be computationally more tractable.
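For instance, because the logarithm is monotonically increasing, the following standard choices (added here for completeness) all implement the same Bayes classifier:

gi(x) = P(ωi|x),   gi(x) = p(x|ωi) P(ωi),   gi(x) = ln p(x|ωi) + ln P(ωi).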

Case 5: Decision Surface

6. BAYESIAN CLASSIFIER FOR NORMAL DISTRIBUTIONS. The multivariate Gaussian pdf is p(x|ωi) = (2π)^(-ℓ/2) |Σi|^(-1/2) exp(-½ (x - μi)^T Σi^(-1) (x - μi)), where μi = E[x] is the mean vector and Σi = E[(x - μi)(x - μi)^T] is called the covariance matrix.

ln(·) is monotonic. Define: gi(x) = ln(p(x|ωi) P(ωi)) = ln p(x|ωi) + ln P(ωi). For the Gaussian pdf this gives gi(x) = -½ (x - μi)^T Σi^(-1) (x - μi) + ln P(ωi) + Ci, where Ci = -(ℓ/2) ln 2π - ½ ln|Σi|. Example:

That is, gi(x) is a quadratic function of x, and the decision surfaces gi(x) - gj(x) = 0 are quadrics: ellipsoids, parabolas, hyperbolas, pairs of lines. For example:
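A minimal MATLAB sketch of this quadratic (Gaussian) discriminant; the means, covariances and priors below are made up for illustration and are not the values used in the slides:

    % Quadratic Bayes discriminant for two Gaussian classes (illustrative parameters)
    mu1 = [0; 0];  S1 = [1 0; 0 1];  P1 = 0.5;
    mu2 = [3; 3];  S2 = [2 0; 0 1];  P2 = 0.5;

    x = [1; 2];                       % feature vector to classify

    % g_i(x) = -1/2 (x-mu_i)' * inv(S_i) * (x-mu_i) - 1/2 ln|S_i| + ln P(omega_i)
    g = @(x, mu, S, P) -0.5*(x-mu)'*(S\(x-mu)) - 0.5*log(det(S)) + log(P);

    if g(x, mu1, S1, P1) > g(x, mu2, S2, P2)
        disp('x assigned to class 1')
    else
        disp('x assigned to class 2')
    end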

Case 6: Hyper-planes

Case 7: Arbitrary

EXAMPLE:

EXAMPLE (cont.):

EXAMPLE (cont.): Find the discriminant function for the first class.

EXAMPLE (cont.): Find the discriminant function for the first class.

EXAMPLE (cont.): Similarly, find the discriminant function for the second class.

EXAMPLE (cont.): The decision boundary:

EXAMPLE (cont.): The decision boundary:

EXAMPLE (cont.): Using MATLAB we can draw the decision boundary:

>> s = 'x^2-10*x-4*x*y+8*y-1+2*log(2)';
>> ezplot(s)
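Note: on newer MATLAB releases (R2016b onwards), where passing a character string to ezplot is discouraged, the same implicit curve can be drawn with an anonymous function and fimplicit:

>> fimplicit(@(x,y) x.^2 - 10*x - 4*x.*y + 8*y - 1 + 2*log(2))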

EXAMPLE (cont.): Using MATLAB we can draw the decision boundary:

EXAMPLE (cont.):

Voronoi Tessellation

Receiver Operating Characteristics: another measure of the distance (overlap) between two Gaussian distributions. It has found great use in medicine, radar detection and other fields.

Receiver Operating Characteristics

Receiver Operating Characteristics

Receiver Operating Characteristics

Receiver Operating Characteristics: If both the diagnosis and the test are positive, the case is called a true positive (TP). The probability of a TP is estimated by counting the true positives in the sample and dividing by the sample size. If the diagnosis is positive and the test is negative, the case is called a false negative (FN). False positive (FP) and true negative (TN) are defined similarly.

Receiver Operating Characteristics: The values described above are used to calculate different measures of the quality of the test. The first one is the sensitivity, SE, which is the probability of a positive test among the patients who have a positive diagnosis.

Receiver Operating Characteristics Specificity, SP, is the probability of having a negative test among the patients who have a negative diagnosis.
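A minimal MATLAB sketch of these two measures; the counts below are made up for illustration and are not taken from the slides:

    % Counts from a hypothetical sample (illustrative only)
    TP = 80;  FN = 20;   % positive diagnosis: test positive / test negative
    TN = 90;  FP = 10;   % negative diagnosis: test negative / test positive

    SE = TP / (TP + FN);   % sensitivity: P(positive test | positive diagnosis)
    SP = TN / (TN + FP);   % specificity: P(negative test | negative diagnosis)

    fprintf('Sensitivity = %.2f, Specificity = %.2f\n', SE, SP)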

Receiver Operating Characteristics Example:

Receiver Operating Characteristics Example (cont.):

Receiver Operating Characteristics Overlap in distributions:

BAYESIAN NETWORKS. Bayes probability chain rule: p(x_ℓ, x_(ℓ-1), ..., x_1) = p(x_ℓ | x_(ℓ-1), ..., x_1) p(x_(ℓ-1) | x_(ℓ-2), ..., x_1) ··· p(x_2 | x_1) p(x_1). Assume now that the conditional dependence of each xi is limited to a subset Ai of the features appearing in each of the product terms. That is, p(x_1, ..., x_ℓ) = p(x_1) ∏ p(xi | Ai), where Ai ⊆ {x_(i-1), ..., x_1}.

For example, if ℓ = 6, we could assume a restricted set of dependencies of this form. The above is a generalization of Naïve Bayes: for Naïve Bayes the assumption is Ai = Ø for i = 1, 2, ..., ℓ, so the factorization reduces to p(x_1, ..., x_ℓ) = ∏ p(xi).

A graphical way to portray conditional dependencies is given below. According to this figure: x6 is conditionally dependent on x4, x5; x5 on x4; x4 on x1, x2; x3 on x2; and x1, x2 are conditionally independent of the other variables. For this case:
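Reading the parent sets off this figure, the joint probability factorizes as follows (reconstructed here from the dependencies listed above, since the slide's own equation is not in the transcript):

p(x1, ..., x6) = p(x6 | x5, x4) p(x5 | x4) p(x4 | x1, x2) p(x3 | x2) p(x1) p(x2).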

Bayesian Networks Definition: A Bayesian network is a directed acyclic graph (DAG) whose nodes correspond to random variables. Each node is associated with a set of conditional probabilities (densities) p(xi|Ai), where xi is the variable associated with the node and Ai is the set of its parents in the graph. A Bayesian network is specified by: 1) the marginal probabilities of its root nodes, and 2) the conditional probabilities of the non-root nodes, given their parents, for ALL possible combinations.

The figure below is an example of a Bayesian network from the medical applications field. It models conditional dependencies for an example concerning smoking (S), the tendency to develop cancer (C) and heart disease (H), together with variables corresponding to heart (H1, H2) and cancer (C1, C2) medical tests.

Once a DAG has been constructed, the joint probability can be obtained by multiplying the marginal (root nodes) and the conditional (non-root nodes) probabilities. Training: Once a topology is given, probabilities are estimated via the training data set. There are also methods that learn the topology. Probability Inference: This is the most common task that Bayesian networks help us to solve efficiently. Given the values of some of the variables in the graph, known as evidence, the goal is to compute the conditional probabilities for some of the other variables, given the evidence.

Example: Consider the Bayesian network of the figure. From the probabilities given in the figure, P(y1) = P(y1|x1) P(x1) + P(y1|x0) P(x0) = 0.40 * 0.60 + 0.30 * 0.40 = 0.24 + 0.12 = 0.36. a) If x is measured to be x = 1 (x1), compute P(w = 0 | x = 1), i.e. P(w0|x1). b) If w is measured to be w = 1 (w1), compute P(x = 0 | w = 1), i.e. P(x0|w1).

For a), a set of calculations is required that propagates information from node x to node w; it turns out that P(w0|x1) = 0.63. For b), the propagation is reversed in direction; it turns out that P(x0|w1) = 0.4. In general, the required inference information is computed via a combined process of "message passing" among the nodes of the DAG. Complexity: for singly connected graphs, message-passing algorithms have complexity linear in the number of nodes.
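A minimal MATLAB sketch of the forward step shown in the example above; it uses only the numbers that appear on the slide (the full conditional tables needed to reproduce P(w0|x1) = 0.63 and P(x0|w1) = 0.4 are not in this transcript):

    % Root prior of x and conditional of y given x, as given on the slide
    Px1 = 0.60;  Px0 = 0.40;          % P(x=1), P(x=0)
    Py1_x1 = 0.40;  Py1_x0 = 0.30;    % P(y=1|x=1), P(y=1|x=0)

    % Marginalise x out to obtain P(y=1)
    Py1 = Py1_x1*Px1 + Py1_x0*Px0;    % = 0.36
    fprintf('P(y1) = %.2f\n', Py1)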

Practical Labs: On Moodle you will find two Bayesian classification examples: image classification and text classification.