1
Pattern Recognition and Image Analysis
Dr. Manal Helal – Fall 2014 Lecture 3
2
BAYES DECISION THEORY In Action 2
3
Recap Example
4
Example (cont.)
5
Example (cont.) Assign colours to objects.
6
Example (cont.)
7
Example (cont.)
8
Example (cont.)
9
Example (cont.) Assign colour to pen objects.
10
Example (cont.)
11
Example (cont.) Assign colour to paper objects.
12
Example (cont.)
13
Example (cont.)
14
Bayes Discriminant Functions
15
Bayes Discriminant Functions gi(x), i = 1, …, c, assign a feature vector x to one of the classes 1, …, c. 1. Minimum Error Rate Classification 2. Minimum Risk Classification. Special cases: 3) Euclidean Distance and 4) Mahalanobis Distance discriminant functions, given last week. Other geometric functions are introduced in the following slides, and many others appear in the literature.
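As a reminder of cases 3 and 4, here is a minimal MATLAB sketch of Euclidean- and Mahalanobis-distance discriminant functions; the mean vectors, covariance matrix and test point are illustrative assumptions, not values from the slides.

% Distance-based discriminant functions (illustrative values).
mu1 = [1; 1];  mu2 = [4; 4];      % class mean vectors (assumed)
S   = [1.2 0.4; 0.4 1.0];         % common covariance matrix (assumed)
x   = [2; 3];                     % feature vector to classify

% Case 3: Euclidean-distance discriminant (Sigma = sigma^2*I, equal priors)
g_euc = @(x, mu) -norm(x - mu)^2;
% Case 4: Mahalanobis-distance discriminant (common covariance S)
g_mah = @(x, mu) -(x - mu)' / S * (x - mu);

[~, class_euclidean]   = max([g_euc(x, mu1), g_euc(x, mu2)])
[~, class_mahalanobis] = max([g_mah(x, mu1), g_mah(x, mu2)])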
16
DISCRIMINANT FUNCTIONS 5. DECISION SURFACES
If the regions Ri, Rj are contiguous, then gij(x) ≡ P(ωi | x) − P(ωj | x) = 0 is the surface separating them. On one side of it gij(x) is positive (+), on the other it is negative (−). It is known as the decision surface.
17
If f(·) is monotonic, the rule remains the same if we use gi(x) ≡ f(P(ωi | x)) as a discriminant function. In general, discriminant functions can be defined independently of the Bayesian rule. They lead to suboptimal solutions, yet, if chosen appropriately, they can be computationally more tractable.
18
Case 5: Decision Surface
19
6. BAYESIAN CLASSIFIER FOR NORMAL DISTRIBUTIONS
Multivariate Gaussian pdf: p(x | ωi) = 1/((2π)^(ℓ/2) |Σi|^(1/2)) · exp(−(1/2)(x − μi)^T Σi^(−1)(x − μi)), where μi = E[x] is the mean vector of class ωi and Σi = E[(x − μi)(x − μi)^T] is the ℓ×ℓ covariance matrix.
20
ln(·) is monotonic. Define: gi(x) = ln(p(x | ωi) P(ωi)) = ln p(x | ωi) + ln P(ωi) = −(1/2)(x − μi)^T Σi^(−1)(x − μi) + ln P(ωi) + Ci, where Ci does not depend on x. Example: for Σi = σ²I this reduces to gi(x) = −(1/(2σ²)) ||x − μi||² + ln P(ωi) + C.
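A minimal MATLAB sketch of evaluating the Gaussian density and the corresponding log-discriminant for one class; all numeric values below are illustrative assumptions.

% Multivariate Gaussian pdf and its log-discriminant (illustrative values).
mu = [0; 0];                     % mean vector of class omega_i (assumed)
S  = [2 0.5; 0.5 1];             % covariance matrix Sigma_i (assumed)
Pw = 0.5;                        % prior P(omega_i) (assumed)
x  = [1; -0.5];                  % feature vector

l = length(x);                   % dimensionality
d = x - mu;
p = exp(-0.5 * d' / S * d) / ((2*pi)^(l/2) * sqrt(det(S)))   % p(x | omega_i)

% g_i(x) = ln p(x | omega_i) + ln P(omega_i)
g = -0.5 * d' / S * d - 0.5*log(det(S)) - (l/2)*log(2*pi) + log(Pw)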
21
That is, gi(x) is a quadratic function of x and the decision surfaces are quadrics: ellipsoids, parabolas, hyperbolas, pairs of lines. For example:
22
Case 6: Hyper-planes
23
Case 7: Arbitrary
24
EXAMPLE:
25
EXAMPLE (cont.):
26
EXAMPLE (cont.): Find the discriminant function for the first class.
27
EXAMPLE (cont.): Find the discriminant function for the first class.
28
EXAMPLE (cont.): Similarly, find the discriminant function for the second class.
29
EXAMPLE (cont.): The decision boundary:
30
EXAMPLE (cont.): The decision boundary:
31
Using MATLAB we can draw the decision boundary:
>> s = 'x^2-10*x-4*x*y+8*y-1+2*log(2)';
>> ezplot(s)
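By default ezplot draws the curve over [-2*pi, 2*pi] in both variables; a plotting window can also be passed explicitly, for example (the window below is an arbitrary illustrative choice):
>> ezplot(s, [-2 12 -6 6])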
32
EXAMPLE (cont.): Using MATLAB we can draw the decision boundary:
33
EXAMPLE (cont.):
34
Voronoi Tessellation
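The tessellation can be visualised directly with MATLAB's built-in voronoi function; a minimal sketch using randomly generated prototype points (the data are purely illustrative).

% Voronoi tessellation induced by a set of prototype (mean) vectors.
rng(0);                       % reproducible illustrative data
P = rand(10, 2);              % 10 prototype points in 2-D (assumed)
voronoi(P(:,1), P(:,2));      % each cell = points closer to that prototype than to any other
title('Voronoi tessellation of the prototypes');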
35
Receiver Operating Characteristics
Another measure of the distance between two Gaussian distributions; it has found great use in medicine, radar detection and other fields.
36
Receiver Operating Characteristics
37
Receiver Operating Characteristics
38
Receiver Operating Characteristics
39
Receiver Operating Characteristics
If both diagnosis and test are positive, it is called a true positive (TP). The probability of a TP occurring is estimated by counting the true positives in the sample and dividing by the sample size. If the diagnosis is positive and the test is negative, it is called a false negative (FN). False positive (FP) and true negative (TN) are defined similarly.
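A minimal MATLAB sketch of these counting estimates, assuming diagnosis and test outcomes are stored as 0/1 vectors; the vectors and their values are purely illustrative.

% Illustrative 0/1 data: 1 = positive, 0 = negative (assumed sample).
diagnosis = [1 1 1 0 0 0 1 0 1 0]';   % ground-truth diagnosis per patient
test      = [1 0 1 0 1 0 1 0 1 0]';   % test outcome per patient
N  = length(diagnosis);               % sample size
TP = sum(diagnosis == 1 & test == 1) / N    % true-positive probability estimate
FN = sum(diagnosis == 1 & test == 0) / N    % false negative
FP = sum(diagnosis == 0 & test == 1) / N    % false positive
TN = sum(diagnosis == 0 & test == 0) / N    % true negative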
40
Receiver Operating Characteristics
The values described are used to calculate different measures of the quality of the test. The first one is sensitivity, SE, which is the probability of having a positive test among the patients who have a positive diagnosis.
41
Receiver Operating Characteristics
Specificity, SP, is the probability of having a negative test among the patients who have a negative diagnosis.
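Sensitivity and specificity follow directly from the four cell counts; a minimal MATLAB sketch with assumed (illustrative) patient counts.

% Illustrative confusion counts (assumed numbers of patients per cell).
nTP = 40; nFN = 10; nFP = 20; nTN = 30;
SE = nTP / (nTP + nFN)    % sensitivity: P(test positive | diagnosis positive)
SP = nTN / (nTN + nFP)    % specificity: P(test negative | diagnosis negative)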
42
Receiver Operating Characteristics
Example:
43
Receiver Operating Characteristics
Example (cont.):
44
Receiver Operating Characteristics
Overlap in distributions:
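When the two class-conditional distributions overlap, sweeping the decision threshold traces out the ROC curve (SE versus 1 - SP); a minimal MATLAB sketch with two assumed one-dimensional Gaussian populations, all numbers illustrative.

% ROC curve from two overlapping 1-D Gaussian score distributions (assumed).
rng(1);
neg = 0.0 + 1.0*randn(1000, 1);            % scores of the negative class
pos = 1.5 + 1.0*randn(1000, 1);            % scores of the positive class
thr = linspace(-4, 6, 200);                % decision thresholds to sweep
SE  = arrayfun(@(t) mean(pos > t), thr);   % sensitivity at each threshold
FPR = arrayfun(@(t) mean(neg > t), thr);   % 1 - specificity at each threshold
plot(FPR, SE); xlabel('1 - SP'); ylabel('SE'); title('ROC curve');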
45
BAYESIAN NETWORKS Bayes Probability Chain Rule
By the Bayes probability chain rule, p(x1, x2, …, xℓ) = p(xℓ | xℓ−1, …, x1) · p(xℓ−1 | xℓ−2, …, x1) ··· p(x2 | x1) · p(x1). Assume now that the conditional dependence for each xi is limited to a subset Ai of the features appearing in each of the product terms. That is: p(x1, x2, …, xℓ) = p(x1) · ∏ (i = 2 to ℓ) p(xi | Ai), where Ai ⊆ {xi−1, xi−2, …, x1}.
46
For example, if ℓ = 6, then we could assume p(x6 | x5, …, x1) = p(x6 | x5, x4), so that A6 = {x5, x4}. The above is a generalization of Naïve Bayes. For Naïve Bayes the assumption is Ai = Ø for i = 1, 2, …, ℓ, i.e. p(x1, x2, …, xℓ) = ∏ (i = 1 to ℓ) p(xi).
47
A graphical way to portray conditional dependencies is given below
According to this figure we have that:
x6 is conditionally dependent on x4, x5,
x5 on x4,
x4 on x1, x2,
x3 on x2,
x1, x2 are conditionally independent of the other variables.
For this case: p(x1, x2, …, x6) = p(x6 | x4, x5) · p(x5 | x4) · p(x4 | x1, x2) · p(x3 | x2) · p(x1) · p(x2).
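As a toy illustration, the factorised joint can be evaluated by multiplying the corresponding table entries; the conditional-probability tables below are invented placeholders, only the factorisation structure comes from the figure.

% Joint probability from the factorisation implied by the DAG:
% p(x1,...,x6) = p(x1) p(x2) p(x3|x2) p(x4|x1,x2) p(x5|x4) p(x6|x4,x5).
% Binary variables; value v is stored at index v+1. All numbers are placeholders.
Px1 = [0.4 0.6];                                   % p(x1)
Px2 = [0.7 0.3];                                   % p(x2)
Px3_2  = [0.8 0.2; 0.1 0.9];                       % Px3_2(x2+1, x3+1)        = p(x3 | x2)
Px4_12 = cat(3, [0.5 0.3; 0.6 0.2], ...
                [0.5 0.7; 0.4 0.8]);               % Px4_12(x1+1, x2+1, x4+1) = p(x4 | x1, x2)
Px5_4  = [0.9 0.1; 0.4 0.6];                       % Px5_4(x4+1, x5+1)        = p(x5 | x4)
Px6_45 = cat(3, [0.7 0.5; 0.6 0.1], ...
                [0.3 0.5; 0.4 0.9]);               % Px6_45(x4+1, x5+1, x6+1) = p(x6 | x4, x5)

x = [1 0 1 1 0 1];                                 % one particular assignment of (x1, ..., x6)
i = x + 1;                                         % convert values to MATLAB indices
p = Px1(i(1)) * Px2(i(2)) * Px3_2(i(2), i(3)) * ...
    Px4_12(i(1), i(2), i(4)) * Px5_4(i(4), i(5)) * Px6_45(i(4), i(5), i(6))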
48
Bayesian Networks Definition: A Bayesian Network is a directed acyclic graph (DAG) where the nodes correspond to random variables. Each node is associated with a set of conditional probabilities (densities), p(xi|Ai), where xi is the variable associated with the node and Ai is the set of its parents in the graph. A Bayesian Network is specified by: The marginal probabilities of its root nodes. The conditional probabilities of the non-root nodes, given their parents, for ALL possible combinations.
49
The figure below is an example of a Bayesian Network corresponding to a paradigm from the medical applications field. This Bayesian network models conditional dependencies for an example concerning smokers (S), tendencies to develop cancer (C) and heart disease (H), together with variables corresponding to heart (H1, H2) and cancer (C1, C2) medical tests.
50
Once a DAG has been constructed, the joint probability can be obtained by multiplying the marginal (root nodes) and the conditional (non-root nodes) probabilities. Training: Once a topology is given, probabilities are estimated via the training data set. There are also methods that learn the topology. Probability Inference: This is the most common task that Bayesian networks help us to solve efficiently. Given the values of some of the variables in the graph, known as evidence, the goal is to compute the conditional probabilities for some of the other variables, given the evidence.
51
Example: Consider the Bayesian network of the figure:
P(y1) = P(y1|x1) · P(x1) + P(y1|x0) · P(x0) = 0.40 · 0.60 + 0.30 · 0.40 = 0.24 + 0.12 = 0.36
a) If x is measured to be x = 1 (x1), compute P(w = 0 | x = 1) [P(w0|x1)].
b) If w is measured to be w = 1 (w1), compute P(x = 0 | w = 1) [P(x0|w1)].
52
For a), a set of calculations is required that propagates from node x to node w. It turns out that P(w0|x1) = 0.63. For b), the propagation is reversed in direction. It turns out that P(x0|w1) = 0.4. In general, the required inference information is computed via a combined process of "message passing" among the nodes of the DAG. Complexity: for singly connected graphs, message-passing algorithms have complexity linear in the number of nodes.
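A minimal MATLAB sketch of both computations by direct propagation along the chain x -> y -> z -> w; the probability tables are assumed values, chosen only to be consistent with the numbers quoted above (P(y1) = 0.36, P(w0|x1) ≈ 0.63, P(x0|w1) ≈ 0.4), not read off the original figure.

% Chain Bayesian network x -> y -> z -> w, binary variables (assumed CPT values).
Px   = [0.40 0.60];              % [P(x=0)  P(x=1)]
Py_x = [0.70 0.30; 0.60 0.40];   % Py_x(x+1, y+1) = P(y | x)
Pz_y = [0.40 0.60; 0.75 0.25];   % Pz_y(y+1, z+1) = P(z | y)
Pw_z = [0.70 0.30; 0.55 0.45];   % Pw_z(z+1, w+1) = P(w | z)

% a) P(w = 0 | x = 1): propagate forward from x to w.
py_given_x1 = Py_x(2, :);            % distribution of y given x = 1
pz_given_x1 = py_given_x1 * Pz_y;    % distribution of z given x = 1
pw_given_x1 = pz_given_x1 * Pw_z;    % distribution of w given x = 1
P_w0_given_x1 = pw_given_x1(1)       % ~0.63

% b) P(x = 0 | w = 1): propagate for both x values, then apply Bayes' rule.
pw_given_x0 = (Py_x(1, :) * Pz_y) * Pw_z;
P_w1 = Px(1) * pw_given_x0(2) + Px(2) * pw_given_x1(2);
P_x0_given_w1 = Px(1) * pw_given_x0(2) / P_w1    % ~0.40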
53
Practical Labs: On Moodle you will find two Bayesian classification examples: Image Classification and Text Classification.