Introduction to Probability and Bayesian Decision Making
Soo-Hyung Kim, Department of Computer Science, Chonnam National University
2 Outline: Bayesian Decision Making, Definition of Probability, Conditional Probability, Bayes’ Theorem, Probability Distribution, Gaussian Random Variable, Naïve Bayesian Decision, References
3 Bayesian Decision Making (1/2) Male/Female Classification: given a priori data of (height, sex) pairs, (168, m) (146, f) (173, m) (160, f) (157, m) (156, f) (163, m) (159, f) (162, m) (149, f), what is the sex of a person whose height is 160?
4 Bayesian Decision Making (2/2) UCI-Iris Data Classification: given a dataset of 150 tuples of 4 numeric attributes (sepal length, sepal width, petal length, petal width; Min/Max/Mean/SD/Correlation statistics per attribute), with three types of class, Iris Setosa, Iris Versicolour, and Iris Virginica, what is the class of a new data sample?
5 UCI-Iris Data
6 Definition of Probability (1/4) Experiment & Probability. Experiment = procedure + observation. Sample: a possible outcome of an experiment. Sample space Ω: the set of all samples = {s1, s2, …, sN}. Event: a set of samples (a subset A of Ω). Probability: a value associated with an event, P(A).
7 Definition of Probability (2/4) Various Definitions of Probability. Classical definition: P(A) = |A| / |Ω|, where the samples are all equally likely. Axiomatic model: probability defined by the Kolmogorov axioms on the next slide.
8 Definition of Probability (3/4) Probability Axioms (A. N. Kolmogorov): 1. For any event A, P(A) ≥ 0. 2. P(Ω) = 1. 3. For any countable collection A1, A2, … of mutually exclusive events, P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …
9 Definition of Probability (4/4) Properties of Probability: P(∅) = 0; P(Aᶜ) = 1 − P(A); P(A ∪ B) = P(A) + P(B) − P(A ∩ B); if A & B are mutually exclusive, P(A ∪ B) = P(A) + P(B); if A ⊂ B, P(A) ≤ P(B).
10 Conditional Probability (1/3) Probability of event A given the occurrence of event B: P(A|B) = P(A ∩ B) / P(B), defined for P(B) > 0. Independence: A & B are independent events if P(A ∩ B) = P(A) P(B), in which case P(A|B) = P(A).
11 Conditional Probability (2/3) Properties of P(A|B): 1. For any events A & B, P(A|B) ≥ 0. 2. P(B|B) = 1. 3. If A = A1 ∪ A2 ∪ … where A1, A2, … are mutually exclusive, then P(A|B) = P(A1|B) + P(A2|B) + …
12 Conditional Probability (3/3) Total Probability Law. Event space: a set {B1, B2, …, Bm} of events that are mutually exclusive (Bi ∩ Bj = ∅ for i ≠ j) and collectively exhaustive (B1 ∪ B2 ∪ … ∪ Bm = Ω). For an event space {B1, B2, …, Bm} with P(Bi) > 0, P(A) = Σi P(A|Bi) P(Bi).
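The total probability law above can be checked numerically. The sketch below uses a hypothetical two-event partition {B1, B2}; all the probability values are made-up placeholders for illustration.

```python
# Total probability law over a hypothetical partition {B1, B2}.
# All numbers below are invented for illustration only.
p_B = [0.6, 0.4]          # P(B1), P(B2): mutually exclusive, collectively exhaustive
p_A_given_B = [0.2, 0.5]  # P(A|B1), P(A|B2)

# P(A) = sum_i P(A|Bi) P(Bi)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
print(p_A)  # 0.2*0.6 + 0.5*0.4 = 0.32
```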
13 Bayes’ Theorem (1/2) From the definition of conditional probability, P(B|A) = P(A|B) P(B) / P(A). If the set {C1, C2, …, Cm} is an event space then, from the total probability law, P(Cj|A) = P(A|Cj) P(Cj) / Σi P(A|Ci) P(Ci).
14 Bayes’ Theorem (2/2) Posterior Probability. Example application: A = coughing; set of diseases C = {C1 (flu), C2 (hyperlipidemia), …, Cm (lung cancer)}. To decide which disease a coughing patient has, compare P(flu | cough), P(hyperlipidemia | cough), …, P(lung cancer | cough). Generalization: choose the Cj with the maximum posterior probability.
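The cough/disease example can be sketched as a direct application of Bayes’ theorem. The priors and likelihoods below are hypothetical placeholders, not values from the slides.

```python
# Bayes' theorem over an event space of diseases, given the symptom "cough".
# Priors P(Cj) and likelihoods P(cough|Cj) are hypothetical placeholders.
priors = {"flu": 0.70, "hyperlipidemia": 0.25, "lung cancer": 0.05}
likelihood = {"flu": 0.60, "hyperlipidemia": 0.05, "lung cancer": 0.40}

# Denominator via the total probability law: P(cough)
p_cough = sum(likelihood[c] * priors[c] for c in priors)

# Posterior P(Cj | cough) for each disease; pick the maximum
posterior = {c: likelihood[c] * priors[c] / p_cough for c in priors}
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

Because every posterior shares the same denominator P(cough), the decision only needs the products P(cough|Cj) P(Cj).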
15 Probability Distribution. Probability model P(·): a function that assigns a probability to each sample. Representations: histogram, table, mathematical formula.
16 Random Variable. A function that assigns a real value to each element in the sample space Ω: X : si → x, where si ∈ Ω, x ∈ R. Example: if si = aaraaa, X(si) = 5 (the number of a’s). Probability model for a discrete random variable: PK(k), the probability mass function (PMF). Probability model for a continuous random variable: fX(x), the probability density function (PDF).
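The slide’s string example can be written out directly: a random variable is just a function from samples to real numbers, and a PMF follows by counting how often each value occurs. The four-string sample space here is hypothetical (only "aaraaa" comes from the slide).

```python
from collections import Counter

def X(s):
    # The random variable from the slide: number of 'a' characters in sample s
    return s.count("a")

print(X("aaraaa"))  # 5, as on the slide

# A hypothetical, equally likely sample space and the induced PMF P_X(k)
samples = ["aaraaa", "ara", "rrr", "aa"]
values = [X(s) for s in samples]
pmf = {k: count / len(samples) for k, count in Counter(values).items()}
print(pmf)
```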
17 Cumulative Distribution Function (CDF): FR(r) = P(R ≤ r).
18 PMF vs PDF. PMF: PK(k) = P(K = k). PDF: fX(x), with P(a ≤ X ≤ b) = ∫ from a to b of fX(x) dx.
19 Gaussian Random Variable (1/6) The PDF of a Gaussian random variable X has the form fX(x) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²)), where μ is the mean and σ is the standard deviation (σ > 0).
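The density above can be evaluated directly; a minimal sketch using only the standard library:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian PDF f_X(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The density peaks at the mean and is symmetric about it.
print(gaussian_pdf(0.0, 0.0, 1.0))  # 1/sqrt(2*pi) ~ 0.3989
```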
20 Gaussian Random Variable (2/6) Example #1: 10 pairs of (height, sex): (168, m) (146, f) (173, m) (160, f) (157, m) (156, f) (163, m) (159, f) (162, m) (149, f). MLE for the PDF of H (height) for the class m: μ̂ is the sample mean of the male heights and σ̂² their 1/n sample variance, giving μm = 164.6 and σm² = 29.84.
21 Gaussian Random Variable (3/6) MLE for the PDF of H (height) for the class f: μf = 154.0, σf² = 30.8. Classification of a person whose height is 160: compare the posteriors P(m | 160) and P(f | 160), and classify the data as male (with a probability of 0.59).
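The whole male/female example can be sketched end to end: fit one Gaussian per class by MLE, then take the posterior at height 160 with equal priors. The exact posterior value depends on the variance convention (1/n versus 1/(n−1)), so the sketch only commits to the decision, not to the slide’s 0.59.

```python
import math

def gaussian_pdf(x, mu, var):
    # 1-D Gaussian density parameterized by variance
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mle(xs):
    # Maximum-likelihood estimates: sample mean and 1/n sample variance
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

# The slide's 10 (height, sex) pairs
data = [(168, "m"), (146, "f"), (173, "m"), (160, "f"), (157, "m"),
        (156, "f"), (163, "m"), (159, "f"), (162, "m"), (149, "f")]

mu_m, var_m = mle([h for h, s in data if s == "m"])  # 164.6, 29.84
mu_f, var_f = mle([h for h, s in data if s == "f"])  # 154.0, 30.8

# Posterior with equal priors P(m) = P(f) = 0.5
x = 160
lm, lf = gaussian_pdf(x, mu_m, var_m), gaussian_pdf(x, mu_f, var_f)
p_male = lm / (lm + lf)
print("male" if p_male > 0.5 else "female", round(p_male, 2))
```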
23 Gaussian Random Variable (4/6) The PDF of an n-D Gaussian random vector X has the form fX(x) = (1 / ((2π)^(n/2) |CX|^(1/2))) exp(−½ (x − μX)ᵀ CX⁻¹ (x − μX)), where μX is the mean vector and CX is the covariance matrix, with cij = Cov(xi, xj) = E(xi xj) − μi μj.
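The n-D density above is a few lines of NumPy; a minimal sketch, checked against the known value 1/(2π) for a 2-D standard Gaussian at its mean:

```python
import numpy as np

def mvn_pdf(x, mu, C):
    """n-D Gaussian density with mean vector mu and covariance matrix C."""
    n = len(mu)
    d = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** n * np.linalg.det(C))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(C) @ d)

# 2-D standard Gaussian: identity covariance, zero mean
mu = np.array([0.0, 0.0])
C = np.eye(2)
print(mvn_pdf(np.array([0.0, 0.0]), mu, C))  # 1/(2*pi) ~ 0.1592
```

With a diagonal CX the off-diagonal cij = 0, and the n-D density factors into a product of n one-dimensional Gaussian densities; this is exactly the structure the naïve Bayes assumption imposes later.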
24 Gaussian Random Variable (5/6) Example #2: UCI-Iris data. Learning phase: MLE of the PDF of the 4-D random vector X for each individual class, using part of the data (e.g., 30 out of 50 samples per class). Generalization (testing) phase: using samples that were not used in learning, classify x into the class Cj having the maximum posterior probability, i.e., P(Cj | x) ≥ P(Ci | x) for all i.
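The learning/testing procedure on the slide can be sketched as follows. This assumes scikit-learn is available to load the Iris data; the classifier itself (one full 4-D Gaussian per class, equal priors, maximum posterior) is implemented with NumPy, using the first 30 samples of each class for training as the slide suggests.

```python
import numpy as np
from sklearn.datasets import load_iris  # assumes scikit-learn is installed

X, y = load_iris(return_X_y=True)

# Learning phase: per-class MLE of a 4-D Gaussian from 30 of each class's 50 samples
train_idx = np.concatenate([np.where(y == c)[0][:30] for c in range(3)])
test_idx = np.setdiff1d(np.arange(len(y)), train_idx)

params = {}
for c in range(3):
    Xc = X[train_idx][y[train_idx] == c]
    # bias=True gives the 1/n (maximum-likelihood) covariance estimate
    params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False, bias=True))

def log_density(x, mu, C):
    # Log of the n-D Gaussian density, dropping the constant (2*pi)^(n/2) term
    d = x - mu
    return -0.5 * (np.log(np.linalg.det(C)) + d @ np.linalg.inv(C) @ d)

# Testing phase: maximum posterior; with equal priors this is maximum likelihood
pred = [max(params, key=lambda c: log_density(x, *params[c])) for x in X[test_idx]]
acc = np.mean(pred == y[test_idx])
print(f"accuracy on held-out samples: {acc:.2f}")
```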
25 Gaussian Random Variable (6/6)
26 Naïve Bayesian Decision. The accuracy of a Bayesian decision depends on how well the class-conditional PDF f(x1, …, xn | C) is estimated, and estimating a full n-D density requires many samples. The independence assumption makes the estimation tractable: f(x1, …, xn | C) ≈ f(x1 | C) f(x2 | C) … f(xn | C), so only n one-dimensional PDFs need to be estimated per class.
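Under the independence assumption, the class score is just a product of 1-D Gaussian densities times the prior. A minimal sketch with two hypothetical classes; all means, variances, and priors below are made-up numbers, not fitted from any dataset.

```python
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def naive_bayes_score(x, mus, variances, prior):
    """Naive assumption: f(x1,...,xn | C) ~= product over i of f(xi | C)."""
    score = prior
    for xi, mu, var in zip(x, mus, variances):
        score *= gaussian_pdf(xi, mu, var)
    return score

# Hypothetical per-feature parameters for two classes: (means, variances, prior)
class_params = {
    "A": ([5.0, 3.4], [0.12, 0.14], 0.5),
    "B": ([5.9, 2.8], [0.27, 0.10], 0.5),
}
x = [5.1, 3.3]
scores = {c: naive_bayes_score(x, *p) for c, p in class_params.items()}
print(max(scores, key=scores.get))  # "A": x is close to class A's per-feature means
```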
27 UCI Data
28 References (textbooks):
- R.D. Yates and D.J. Goodman, Probability and Stochastic Processes, 2nd ed., Wiley.
- 송홍엽 and 정하봉, 확률과 랜덤변수 및 랜덤과정 (Probability, Random Variables, and Random Processes), 교보문고.
- R.E. Walpole et al., Probability and Statistics for Engineers and Scientists, 7th ed., Prentice Hall.
- W. Mendenhall, Probability and Statistics, 12th ed., Thomson Brooks/Cole.
- 신양우, 기초확률론 (Elementary Probability Theory), 경문사.