1
Bayesian Decision Theory (Classification) Speaker: 虞台文
2
Contents: Introduction, Generalized Bayesian Decision Rule, Discriminant Functions, The Normal Distribution, Discriminant Functions for the Normal Populations, Minimax Criterion, Neyman-Pearson Criterion
3
Bayesian Decision Theory (Classification) Introduction
4
What is Bayesian Decision Theory? A mathematical foundation for decision making. It uses a probabilistic approach to help make decisions (e.g., classification) so as to minimize the risk (cost).
5
Preliminaries and Notations: ω, a state of nature; P(ω_i), the prior probability; x, the feature vector; p(x|ω_i), the class-conditional density; P(ω_i|x), the posterior probability.
6
Bayes Rule: P(ω_j|x) = p(x|ω_j) P(ω_j) / p(x), where the evidence is p(x) = Σ_j p(x|ω_j) P(ω_j).
7
Decision: the evidence p(x) is the same for every class, so it is unimportant in making the decision; only the products p(x|ω_j) P(ω_j) matter.
8
Decision: Decide ω_i if P(ω_i|x) > P(ω_j|x) for all j ≠ i. Equivalently, decide ω_i if p(x|ω_i) P(ω_i) > p(x|ω_j) P(ω_j) for all j ≠ i. Special cases: 1. P(ω_1) = P(ω_2) = ... = P(ω_c): the decision depends only on the likelihoods p(x|ω_i). 2. p(x|ω_1) = p(x|ω_2) = ... = p(x|ω_c): the decision depends only on the priors P(ω_i).
9
Two Categories: Decide ω_1 if P(ω_1|x) > P(ω_2|x); otherwise decide ω_2. Equivalently, decide ω_1 if p(x|ω_1) P(ω_1) > p(x|ω_2) P(ω_2); otherwise decide ω_2. Special cases: 1. P(ω_1) = P(ω_2): decide ω_1 if p(x|ω_1) > p(x|ω_2); otherwise decide ω_2. 2. p(x|ω_1) = p(x|ω_2): decide ω_1 if P(ω_1) > P(ω_2); otherwise decide ω_2.
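To make the two-category rule concrete, here is a minimal sketch in Python; the two univariate Gaussian class-conditional densities, the prior values, and the helper name decide are assumptions made only for illustration, not part of the original slides.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical class-conditional densities and priors (illustration only).
p1 = norm(loc=0.0, scale=1.0)   # p(x | omega_1)
p2 = norm(loc=2.0, scale=1.0)   # p(x | omega_2)
P1, P2 = 2/3, 1/3               # priors P(omega_1), P(omega_2)

def decide(x):
    """Decide omega_1 if p(x|omega_1)P(omega_1) > p(x|omega_2)P(omega_2), else omega_2."""
    return 1 if p1.pdf(x) * P1 > p2.pdf(x) * P2 else 2

print([decide(x) for x in (-1.0, 1.0, 3.0)])
```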
10
Example (figure): decision regions R_1 and R_2 for the case P(ω_1) = P(ω_2).
11
Example (figure): decision regions R_1 and R_2 for P(ω_1) = 2/3, P(ω_2) = 1/3. Decide ω_1 if p(x|ω_1) P(ω_1) > p(x|ω_2) P(ω_2); otherwise decide ω_2.
12
Classification Error: Consider two categories and decide ω_1 if P(ω_1|x) > P(ω_2|x); otherwise decide ω_2. The probability of error given x is then P(error|x) = min[ P(ω_1|x), P(ω_2|x) ].
13
Classification Error (continued): the average probability of error is P(error) = ∫ P(error|x) p(x) dx = ∫ min[ P(ω_1|x), P(ω_2|x) ] p(x) dx, which the Bayes decision rule minimizes.
14
Bayesian Decision Theory (Classification) Generalized Bayesian Decision Rule
15
The Generalization: Ω = {ω_1, ..., ω_c}, a set of c states of nature; A = {α_1, ..., α_a}, a set of a possible actions; λ(α_i|ω_j), the loss incurred for taking action α_i when the true state of nature is ω_j. We want to minimize the expected loss in making a decision. The risk can be zero.
16
Conditional Risk: Given x, the expected loss (risk) associated with taking action α_i is R(α_i|x) = Σ_j λ(α_i|ω_j) P(ω_j|x).
17
0/1 Loss Function: λ(α_i|ω_j) = 0 if i = j and 1 if i ≠ j. Under this loss, R(α_i|x) = Σ_{j≠i} P(ω_j|x) = 1 − P(ω_i|x), so minimizing the risk is the same as maximizing the posterior.
18
Decision. Bayesian decision rule: for every x, take the action that minimizes the conditional risk, α*(x) = argmin_i R(α_i|x).
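A small sketch of this rule, assuming a hypothetical 2 x 2 loss matrix and hypothetical posterior values: compute the conditional risks R(α_i|x) and take the action with the smallest one.

```python
import numpy as np

# Hypothetical loss matrix lam[i, j] = loss for taking action alpha_i when the
# true state is omega_j, and hypothetical posteriors P(omega_j | x).
lam = np.array([[0.0, 2.0],    # action alpha_1
                [1.0, 0.0]])   # action alpha_2
posteriors = np.array([0.3, 0.7])   # P(omega_1|x), P(omega_2|x)

# Conditional risk R(alpha_i | x) = sum_j lam[i, j] * P(omega_j | x)
risks = lam @ posteriors
best_action = int(np.argmin(risks))          # 0-based index of the minimum-risk action
print(risks, "-> take action alpha_%d" % (best_action + 1))
```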
19
Overall Risk. For a decision function α(x) that maps each x to an action, the overall risk is R = ∫ R(α(x)|x) p(x) dx. The Bayesian decision rule, which minimizes R(α(x)|x) for every x, is the optimal one to minimize the overall risk; its resulting overall risk is called the Bayes risk.
20
Two-Category Classification. Actions: α_1 (decide ω_1) and α_2 (decide ω_2). States of nature: ω_1 and ω_2. Loss function: λ_ij = λ(α_i|ω_j), the loss for taking action α_i when the true state of nature is ω_j.
21
Two-Category Classification: R(α_1|x) = λ_11 P(ω_1|x) + λ_12 P(ω_2|x) and R(α_2|x) = λ_21 P(ω_1|x) + λ_22 P(ω_2|x). Perform α_1 if R(α_2|x) > R(α_1|x); otherwise perform α_2.
22
Two-Category Classification: Perform α_1 if R(α_2|x) > R(α_1|x), i.e., if (λ_21 − λ_11) P(ω_1|x) > (λ_12 − λ_22) P(ω_2|x); otherwise perform α_2. Both factors (λ_21 − λ_11) and (λ_12 − λ_22) are normally positive, so the posterior probabilities are scaled before comparison.
23
Two-Category Classification: Replacing the posteriors by p(x|ω_i) P(ω_i) (the evidence p(x) is irrelevant), perform α_1 if (λ_21 − λ_11) p(x|ω_1) P(ω_1) > (λ_12 − λ_22) p(x|ω_2) P(ω_2); otherwise perform α_2.
24
Two-Category Classification: Perform α_1 if the likelihood ratio p(x|ω_1)/p(x|ω_2) exceeds the threshold [(λ_12 − λ_22) P(ω_2)] / [(λ_21 − λ_11) P(ω_1)]; otherwise perform α_2. This slide will be recalled later.
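The likelihood-ratio form of the rule can be sketched as follows; the losses, priors, and Gaussian densities below are made-up values chosen only to show how the threshold is formed and applied.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical losses, priors, and class-conditional densities (illustration only).
l11, l12, l21, l22 = 0.0, 2.0, 1.0, 0.0
P1, P2 = 0.5, 0.5
p1 = norm(0.0, 1.0)    # p(x | omega_1)
p2 = norm(2.0, 1.0)    # p(x | omega_2)

# Threshold on the likelihood ratio p(x|omega_1)/p(x|omega_2).
theta = ((l12 - l22) * P2) / ((l21 - l11) * P1)

def act(x):
    """Perform alpha_1 when the likelihood ratio exceeds the threshold theta."""
    return 1 if p1.pdf(x) / p2.pdf(x) > theta else 2

print(theta, [act(x) for x in (-1.0, 0.5, 3.0)])
```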
25
Bayesian Decision Theory (Classification) Discriminant Functions
26
The Multicategory Classification (figure: a network that computes g_1(x), g_2(x), ..., g_c(x) from the input x and outputs an action, e.g., a classification). Assign x to ω_i if g_i(x) > g_j(x) for all j ≠ i. The g_i(x)'s are called the discriminant functions. How should the discriminant functions be defined?
27
Simple Discriminant Functions. Minimum-risk case: g_i(x) = −R(α_i|x). Minimum-error-rate case: g_i(x) = P(ω_i|x), or equivalently g_i(x) = p(x|ω_i) P(ω_i) or g_i(x) = ln p(x|ω_i) + ln P(ω_i). If f(·) is a monotonically increasing function, then the f(g_i(·))'s are also discriminant functions.
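For the minimum-error-rate case, here is a hedged sketch using g_i(x) = ln p(x|ω_i) + ln P(ω_i); the Gaussian class models and priors are assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical Gaussian class models and priors (illustration only).
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs  = [np.eye(2), np.eye(2)]
priors = np.array([0.6, 0.4])

def discriminants(x):
    """Minimum-error-rate discriminants g_i(x) = ln p(x|omega_i) + ln P(omega_i)."""
    return np.array([multivariate_normal(m, S).logpdf(x) + np.log(P)
                     for m, S, P in zip(means, covs, priors)])

x = np.array([1.0, 1.2])
g = discriminants(x)
print(g, "-> assign to omega_%d" % (np.argmax(g) + 1))
```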
28
Decision Regions Two-category example Decision regions are separated by decision boundaries.
29
Bayesian Decision Theory (Classification) The Normal Distribution
30
Basics of Probability. Discrete random variable X (assume integer-valued): probability mass function (pmf) p(x) = P(X = x); cumulative distribution function (cdf) F(x) = P(X ≤ x) = Σ_{k ≤ x} p(k). Continuous random variable X: probability density function (pdf) p(x), which is not a probability itself; cumulative distribution function (cdf) F(x) = P(X ≤ x) = ∫_{−∞}^{x} p(t) dt.
31
Expectations. Let g be a function of the random variable X: E[g(X)] = Σ_x g(x) p(x) in the discrete case, or ∫ g(x) p(x) dx in the continuous case. The k-th moment: E[X^k]. The k-th central moment: E[(X − E[X])^k]. The 1st moment is the mean E[X].
32
Important Expectations. Mean: μ = E[X]. Variance: Var[X] = σ² = E[(X − μ)²]. Fact: Var[X] = E[X²] − (E[X])².
33
Entropy: H(p) = −∫ p(x) ln p(x) dx (or −Σ_x p(x) ln p(x) for a discrete distribution). The entropy measures the fundamental uncertainty in the value of points selected randomly from a distribution.
34
Univariate Gaussian Distribution. X ~ N(μ, σ²): p(x) = (1/(√(2π) σ)) exp(−(x − μ)²/(2σ²)), with E[X] = μ and Var[X] = σ² (figure: the bell-shaped density with marks at μ ± σ, ±2σ, ±3σ). Properties: 1. It maximizes the entropy among distributions with a given mean and variance. 2. Central limit theorem: sums of many independent random variables tend toward a Gaussian.
35
Random Vectors. A d-dimensional random vector X = (X_1, ..., X_d)^T. Vector mean: μ = E[X]. Covariance matrix: Σ = E[(X − μ)(X − μ)^T].
36
Multivariate Gaussian Distribution. A d-dimensional random vector X ~ N(μ, Σ): p(x) = (1/((2π)^{d/2} |Σ|^{1/2})) exp(−(1/2)(x − μ)^T Σ^{-1} (x − μ)), with E[X] = μ and E[(X − μ)(X − μ)^T] = Σ.
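As a sanity check of the density formula, the following sketch evaluates p(x) directly and compares it with scipy's implementation; the parameters μ, Σ, and the test point x are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters (illustration only).
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 1.5])

# Density from the formula p(x) = (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^T Sigma^-1 (x-mu))
d = len(mu)
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)
p_formula = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

# Cross-check against scipy's implementation.
p_scipy = multivariate_normal(mu, Sigma).pdf(x)
print(p_formula, p_scipy)   # the two values should agree
```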
37
Properties of N(μ, Σ). Let X ~ N(μ, Σ) be a d-dimensional random vector and let Y = A^T X, where A is a d × k matrix. Then Y ~ N(A^T μ, A^T Σ A).
39
On Parameters of N(μ, Σ). For X ~ N(μ, Σ): μ_i = E[X_i], σ_ii = Var[X_i], and σ_ij = E[(X_i − μ_i)(X_j − μ_j)] = Cov[X_i, X_j].
40
More on the Covariance Matrix. Σ is symmetric and positive semidefinite, so it admits the eigendecomposition Σ = Φ Λ Φ^T, where Φ is an orthonormal matrix whose columns are the eigenvectors of Σ, and Λ is the diagonal matrix of eigenvalues.
41
Whitening Transform. For X ~ N(μ, Σ) and Y = A^T X we have Y ~ N(A^T μ, A^T Σ A). Let A_w = Φ Λ^{-1/2}; then A_w^T Σ A_w = Λ^{-1/2} Φ^T (Φ Λ Φ^T) Φ Λ^{-1/2} = I.
42
Whitening Transform (continued). With A_w = Φ Λ^{-1/2}, the transform Y = A_w^T X gives Y ~ N(A_w^T μ, I): a projection onto the eigenvectors followed by a linear scaling, which whitens the distribution.
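A short sketch of the whitening transform, assuming an arbitrary positive-definite Σ: build A_w = Φ Λ^{-1/2} from the eigendecomposition and verify that A_w^T Σ A_w is the identity; the covariance and mean used are illustrative only.

```python
import numpy as np

# Hypothetical covariance (illustration only); must be symmetric positive definite.
Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])

# Eigendecomposition Sigma = Phi Lambda Phi^T.
eigvals, Phi = np.linalg.eigh(Sigma)
A_w = Phi @ np.diag(eigvals ** -0.5)     # whitening matrix A_w = Phi Lambda^(-1/2)

# A_w^T Sigma A_w should be (numerically) the identity matrix.
print(np.round(A_w.T @ Sigma @ A_w, 6))

# Applying it to samples: rows of X are draws from N(mu, Sigma); rows of Y are whitened.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, 2.0], cov=Sigma, size=1000)
Y = X @ A_w
print(np.round(np.cov(Y, rowvar=False), 2))   # close to the identity
```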
43
Mahalanobis Distance. For X ~ N(μ, Σ), the quadratic form r² = (x − μ)^T Σ^{-1} (x − μ) is the squared Mahalanobis distance from x to μ; the contours of constant density are ellipsoids of constant r², whose size depends on the value of r².
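A minimal sketch computing the squared Mahalanobis distance for assumed parameters, cross-checked against scipy.spatial.distance.mahalanobis (which returns r, not r²).

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Hypothetical parameters (illustration only).
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([1.0, -1.0])

# Squared Mahalanobis distance r^2 = (x - mu)^T Sigma^-1 (x - mu).
diff = x - mu
r2 = diff @ np.linalg.solve(Sigma, diff)

# Cross-check with scipy.
r = mahalanobis(x, mu, np.linalg.inv(Sigma))
print(r2, r ** 2)   # the two values should agree
```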
45
Bayesian Decision Theory (Classification) Discriminant Functions for the Normal Populations
46
Minimum-Error-Rate Classification. Assume the class-conditional densities are Gaussian, X_i ~ N(μ_i, Σ_i). Then g_i(x) = ln p(x|ω_i) + ln P(ω_i) = −(1/2)(x − μ_i)^T Σ_i^{-1} (x − μ_i) − (d/2) ln 2π − (1/2) ln |Σ_i| + ln P(ω_i).
47
Three Cases. Case 1: Σ_i = σ²I, the classes are centered at different means, and their feature components are pairwise independent with the same variance. Case 2: Σ_i = Σ, the classes are centered at different means but share the same covariance. Case 3: Σ_i arbitrary.
48
Case 1: Σ_i = σ²I. Here Σ_i^{-1} = (1/σ²)I, and the terms that do not depend on i (the (d/2) ln 2π and (1/2) ln |Σ_i| terms) are irrelevant, so g_i(x) = −||x − μ_i||²/(2σ²) + ln P(ω_i).
49
Case 1: Σ_i = σ²I (continued). Expanding ||x − μ_i||² = x^T x − 2 μ_i^T x + μ_i^T μ_i and dropping the x^T x term, which is common to all classes, gives the linear discriminant g_i(x) = w_i^T x + w_i0 with w_i = μ_i/σ² and w_i0 = −μ_i^T μ_i/(2σ²) + ln P(ω_i).
50
Boundary between ω_i and ω_j: the set of points x where g_i(x) = g_j(x).
51
Case 1: Σ_i = σ²I. The boundary between ω_i and ω_j satisfies w^T(x − x_0) = 0 with w = μ_i − μ_j and x_0 = (μ_i + μ_j)/2 − [σ²/||μ_i − μ_j||²] ln[P(ω_i)/P(ω_j)] (μ_i − μ_j). The decision boundary is a hyperplane perpendicular to the line between the means; the second term of x_0 is 0 if P(ω_i) = P(ω_j), in which case x_0 is the midpoint between the means.
52
Case 1: Σ_i = σ²I. If the priors are equal, the rule reduces to assigning x to the class with the nearest mean ||x − μ_i||: a minimum distance classifier (template matching).
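A sketch of the Case 1 classifier with assumed means, a shared σ², and priors; with equal priors it reduces to the nearest-mean rule shown at the end. All numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical class means, shared variance sigma^2, and priors (illustration only).
means = np.array([[0.0, 0.0],
                  [3.0, 0.0],
                  [0.0, 3.0]])
sigma2 = 1.0
priors = np.array([0.5, 0.3, 0.2])

def g_case1(x):
    """Case 1 linear discriminants g_i(x) = w_i^T x + w_i0, with
    w_i = mu_i / sigma^2 and w_i0 = -mu_i^T mu_i / (2 sigma^2) + ln P(omega_i)."""
    w = means / sigma2
    w0 = -np.sum(means * means, axis=1) / (2 * sigma2) + np.log(priors)
    return w @ x + w0

x = np.array([1.5, 0.5])
g = g_case1(x)
print(g, "-> assign to omega_%d" % (np.argmax(g) + 1))

# With equal priors this reduces to the minimum-distance (nearest-mean) classifier.
dists = np.linalg.norm(means - x, axis=1)
print("nearest mean: omega_%d" % (np.argmin(dists) + 1))
```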
53
Case 1: Σ_i = σ²I (figure).
55
Demo
56
Case 2: Σ_i = Σ. The term (1/2) ln |Σ| is common to all classes and irrelevant, so g_i(x) = −(1/2)(x − μ_i)^T Σ^{-1}(x − μ_i) + ln P(ω_i); the quadratic term is the squared Mahalanobis distance from x to μ_i. If P(ω_i) = P(ω_j) for all i, j, the prior term is also irrelevant and x is assigned to the class with the smallest Mahalanobis distance.
57
Case 2: Σ_i = Σ (continued). Expanding the quadratic form, the x^T Σ^{-1} x term is common to all classes and irrelevant, leaving the linear discriminant g_i(x) = w_i^T x + w_i0 with w_i = Σ^{-1} μ_i and w_i0 = −(1/2) μ_i^T Σ^{-1} μ_i + ln P(ω_i).
58
Case 2: Σ_i = Σ (continued). The boundary between ω_i and ω_j is the hyperplane w^T(x − x_0) = 0 with w = Σ^{-1}(μ_i − μ_j) and x_0 = (μ_i + μ_j)/2 − [ln(P(ω_i)/P(ω_j)) / ((μ_i − μ_j)^T Σ^{-1}(μ_i − μ_j))] (μ_i − μ_j); this hyperplane is generally not perpendicular to the line between the means.
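A sketch of the Case 2 linear discriminants with an assumed shared covariance; the means, covariance, and priors are illustrative only.

```python
import numpy as np

# Hypothetical class means, shared covariance, and priors (illustration only).
means = np.array([[0.0, 0.0],
                  [2.0, 2.0]])
Sigma = np.array([[1.5, 0.4],
                  [0.4, 1.0]])
priors = np.array([0.5, 0.5])

Sigma_inv = np.linalg.inv(Sigma)

def g_case2(x):
    """Case 2 linear discriminants g_i(x) = w_i^T x + w_i0, with
    w_i = Sigma^-1 mu_i and w_i0 = -0.5 mu_i^T Sigma^-1 mu_i + ln P(omega_i)."""
    scores = []
    for mu, P in zip(means, priors):
        w = Sigma_inv @ mu
        w0 = -0.5 * mu @ Sigma_inv @ mu + np.log(P)
        scores.append(w @ x + w0)
    return np.array(scores)

x = np.array([1.2, 0.8])
g = g_case2(x)
print(g, "-> assign to omega_%d" % (np.argmax(g) + 1))
```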
60
Demo
61
Case 3: Σ_i arbitrary. Only the (d/2) ln 2π term is irrelevant, so g_i(x) = x^T W_i x + w_i^T x + w_i0 with W_i = −(1/2) Σ_i^{-1}, w_i = Σ_i^{-1} μ_i, and w_i0 = −(1/2) μ_i^T Σ_i^{-1} μ_i − (1/2) ln |Σ_i| + ln P(ω_i). The quadratic term x^T W_i x, which was absent in Cases 1 and 2, makes the decision surfaces hyperquadrics, e.g., hyperplanes, hyperspheres, hyperellipsoids, and hyperhyperboloids.
62
Case 3: Σ_i ≠ Σ_j. Non-simply connected decision regions can arise even in one dimension for Gaussians with unequal variances.
63
Case 3: Σ_i ≠ Σ_j (figure).
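A sketch of the Case 3 quadratic discriminants; the per-class means, covariances, and priors are assumptions made only to illustrate the formulas above.

```python
import numpy as np

# Hypothetical per-class means, covariances, and priors (illustration only).
means = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]
covs  = [np.array([[1.0, 0.0], [0.0, 1.0]]),
         np.array([[3.0, 0.8], [0.8, 0.5]])]
priors = [0.5, 0.5]

def g_case3(x):
    """Case 3 quadratic discriminants g_i(x) = x^T W_i x + w_i^T x + w_i0, with
    W_i = -0.5 Sigma_i^-1, w_i = Sigma_i^-1 mu_i, and
    w_i0 = -0.5 mu_i^T Sigma_i^-1 mu_i - 0.5 ln|Sigma_i| + ln P(omega_i)."""
    scores = []
    for mu, S, P in zip(means, covs, priors):
        S_inv = np.linalg.inv(S)
        W = -0.5 * S_inv
        w = S_inv @ mu
        w0 = -0.5 * mu @ S_inv @ mu - 0.5 * np.log(np.linalg.det(S)) + np.log(P)
        scores.append(x @ W @ x + w @ x + w0)
    return np.array(scores)

x = np.array([1.0, 0.5])
g = g_case3(x)
print(g, "-> assign to omega_%d" % (np.argmax(g) + 1))
```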
65
Demo
66
Multi-Category Classification
67
Bayesian Decision Theory (Classification) Minimax Criterion
68
Bayesian Decision Rule: Two-Category Classification. Decide ω_1 if the likelihood ratio p(x|ω_1)/p(x|ω_2) exceeds the threshold [(λ_12 − λ_22) P(ω_2)] / [(λ_21 − λ_11) P(ω_1)]. The minimax criterion deals with the case where the prior probabilities are unknown.
69
Basic Concept of Minimax: design the decision rule to perform as well as possible under the worst-case prior probabilities (the maximum loss), i.e., minimize the maximum possible overall risk.
70
Overall Risk. With fixed decision regions R_1 and R_2, the overall risk can be written as a function of P(ω_1): R = λ_22 + (λ_12 − λ_22) ∫_{R_1} p(x|ω_2) dx + P(ω_1) [ (λ_11 − λ_22) + (λ_21 − λ_11) ∫_{R_2} p(x|ω_1) dx − (λ_12 − λ_22) ∫_{R_1} p(x|ω_2) dx ].
74
For a fixed decision boundary, the overall risk is linear in P(ω_1): R = a·P(ω_1) + b, where both the slope a and the intercept b depend on the setting of the decision boundary.
75
Overall Risk. For the minimax solution, the boundary is chosen so that the slope a is 0; the risk then equals the constant term b = R_mm, the minimax risk, which is independent of the value of P(ω_i).
76
Minimax Risk: R_mm = λ_22 + (λ_12 − λ_22) ∫_{R_1} p(x|ω_2) dx = λ_11 + (λ_21 − λ_11) ∫_{R_2} p(x|ω_1) dx.
77
Error Probability. Using the 0/1 loss function, the overall risk becomes the error probability: P(error) = P(ω_1) ∫_{R_2} p(x|ω_1) dx + P(ω_2) ∫_{R_1} p(x|ω_2) dx.
78
Minimax Error-Probability. Using the 0/1 loss function, the minimax condition becomes P(1|2) = ∫_{R_1} p(x|ω_2) dx = ∫_{R_2} p(x|ω_1) dx = P(2|1): the two kinds of error are made equal, and the minimax error probability equals their common value.
79
Minimax Error-Probability (figure): the two class-conditional densities for ω_1 and ω_2 with decision regions R_1 and R_2 chosen so that the error areas P(1|2) and P(2|1) are equal.
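A sketch of finding the minimax boundary numerically for two assumed univariate Gaussian class-conditional densities; a single decision threshold t is assumed (decide ω_1 below it, ω_2 above it), and the root of P(2|1) − P(1|2) is located with scipy.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# Hypothetical univariate Gaussian class-conditional densities (illustration only).
p1 = norm(0.0, 1.0)    # p(x | omega_1), decide omega_1 for x < t (region R_1)
p2 = norm(2.0, 1.5)    # p(x | omega_2), decide omega_2 for x > t (region R_2)

# Minimax (0/1 loss): choose the boundary t so that
# P(2|1) = P(x > t | omega_1) equals P(1|2) = P(x < t | omega_2).
def error_gap(t):
    return p1.sf(t) - p2.cdf(t)

t_star = brentq(error_gap, -10.0, 10.0)    # root of the gap between the two errors
print("boundary:", t_star)
print("P(2|1) =", p1.sf(t_star), " P(1|2) =", p2.cdf(t_star))
# The common value is the minimax error probability, independent of the priors.
```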
81
Bayesian Decision Theory (Classification) Neyman-Pearson Criterion
82
Bayesian Decision Rule: Two-Category Classification. Decide ω_1 if the likelihood ratio p(x|ω_1)/p(x|ω_2) exceeds the threshold. The Neyman-Pearson criterion deals with the case where both the loss function and the prior probabilities are unknown.
83
Signal Detection Theory. Signal detection theory evolved from the development of communications and radar equipment during the first half of the last century. It migrated to psychology, initially as part of sensation and perception, in the 1950s and 1960s as an attempt to understand features of human behavior when detecting very faint stimuli that were not being explained by traditional theories of thresholds.
84
The situation of interest: a person is faced with a stimulus (signal) that is very faint or confusing and must decide whether the signal is there or not. What makes the situation difficult is the presence of other activity that resembles the signal; we call this activity noise.
85
Example Noise is present both in the environment and in the sensory system of the observer. The observer reacts to the momentary total activation of the sensory system, which fluctuates from moment to moment, as well as responding to environmental stimuli, which may include a signal.
86
Example: A radiologist is examining a CT scan, looking for evidence of a tumor. It is a hard job, because there is always some uncertainty. There are four possible outcomes: hit (tumor present and doctor says "yes"), miss (tumor present and doctor says "no"), false alarm (tumor absent and doctor says "yes"), and correct rejection (tumor absent and doctor says "no"). Miss and false alarm are the two types of error.
87
The Four Cases. Signal (tumor) absent (ω_1), decision no (α_1): correct rejection, P(1|1). Signal absent (ω_1), decision yes (α_2): false alarm, P(2|1). Signal present (ω_2), decision no (α_1): miss, P(1|2). Signal present (ω_2), decision yes (α_2): hit, P(2|2). Signal detection theory was developed to help us understand how a continuous and ambiguous signal can lead to a binary yes/no decision.
88
Decision Making (figure): two overlapping distributions, noise (ω_1) and noise + signal (ω_2), separated by the discriminability d'. A criterion, set according to expectancy (decision bias), divides the axis into "no" (α_1) and "yes" (α_2) responses; the area of ω_2 beyond the criterion is the hit rate P(2|2) and the area of ω_1 beyond it is the false-alarm rate P(2|1).
89
ROC Curve (Receiver Operating Characteristic): the plot of the hit rate P_H = P(2|2) against the false-alarm rate P_FA = P(2|1) as the criterion is varied.
90
Neyman-Pearson Criterion: maximize the hit rate P_H = P(2|2) subject to the false-alarm constraint P_FA = P(2|1) ≤ a.
91
Likelihood Ratio Test: decide ω_2 ("yes") if p(x|ω_2)/p(x|ω_1) > T, where T is a threshold chosen to meet the P_FA constraint (≤ a). How is T determined?
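One way to determine T numerically, sketched for assumed equal-variance Gaussian noise and noise-plus-signal models: because the likelihood ratio is monotone in x here, the constraint P_FA = a fixes a threshold x_T on x, from which T follows. The distributions and the value of a are assumptions for the sketch.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical equal-variance Gaussian noise / noise+signal models (illustration only).
p1 = norm(0.0, 1.0)    # noise alone, omega_1
p2 = norm(1.5, 1.0)    # noise + signal, omega_2
a = 0.05               # allowed false-alarm probability

# With equal variances the likelihood ratio p(x|omega_2)/p(x|omega_1) is increasing
# in x, so the test "ratio > T" is equivalent to "x > x_T".  Choose x_T so that
# P_FA = P(x > x_T | omega_1) = a, then read off the corresponding T.
x_T = p1.ppf(1.0 - a)
T = p2.pdf(x_T) / p1.pdf(x_T)          # likelihood-ratio threshold achieving P_FA = a

P_FA = p1.sf(x_T)
P_H = p2.sf(x_T)                       # hit rate obtained under this constraint
print("x_T =", x_T, " T =", T, " P_FA =", P_FA, " P_H =", P_H)
```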
92
Likelihood Ratio Test (figure): the densities p(x|ω_1) and p(x|ω_2) with decision regions R_1 and R_2; the area of p(x|ω_2) over R_2 is P_H and the area of p(x|ω_1) over R_2 is P_FA.
93
Neyman-Pearson Lemma. Consider the aforementioned rule φ with T chosen to give P_FA(φ) = a. There is no decision rule φ' such that P_FA(φ') ≤ a and P_H(φ') > P_H(φ). Proof sketch: let φ' be any decision rule with P_FA(φ') ≤ a. Wherever φ(x) = 1 the test gives p(x|ω_2) − T p(x|ω_1) ≥ 0 and φ(x) − φ'(x) ≥ 0; wherever φ(x) = 0 both factors reverse sign. Hence (φ(x) − φ'(x))(p(x|ω_2) − T p(x|ω_1)) ≥ 0 for every x, and integrating gives [P_H(φ) − P_H(φ')] − T [P_FA(φ) − P_FA(φ')] ≥ 0. Since P_FA(φ') ≤ a = P_FA(φ) and T > 0, it follows that P_H(φ) ≥ P_H(φ'), which proves the lemma.