1
Ch 1. Introduction (Latter Part)
Pattern Recognition and Machine Learning, C. M. Bishop, 2006
Summarized by J.W. Ha
Biointelligence Laboratory, Seoul National University
http://bi.snu.ac.kr/
2
Contents
1.4 The Curse of Dimensionality
1.5 Decision Theory
1.6 Information Theory
3
1.4 The Curse of Dimensionality
The High-Dimensionality Problem
Ex. A pipeline carrying a mixture of oil, water, and gas
- 3 classes (homogeneous, annular, laminar)
- 12 input variables
- Scatter plot of x6 vs. x7
- Predict the class of a new point x
- Simple, naive approach: divide the input space into cells and assign the majority class of the cell containing x
4
1.4 The Curse of Dimensionality (Cont'd)
The Shortcomings of the Naive Approach
- The number of cells increases exponentially with the dimensionality D.
- An exponentially large training set is needed to keep the cells from being empty.
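As a rough illustration of the first point, a minimal sketch (my own; the function name is illustrative) of how the cell count explodes with dimensionality when each input variable is split into a fixed number of bins:

```python
# Number of cells when each of D input variables is divided into `bins` intervals.
# The count grows exponentially in D, so for the 12-variable oil-flow example most
# cells would contain no training points at all.
def num_cells(bins: int, dims: int) -> int:
    return bins ** dims

for d in (1, 2, 3, 12):
    print(f"D = {d:2d}: {num_cells(10, d):,} cells with 10 bins per variable")
```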
5
1.4 The Curse of Dimensionality (Cont'd)
Polynomial Curve Fitting (Order M)
- As D increases, the number of coefficients grows proportionally to D^M.
The Volume of a High-Dimensional Sphere
- Is concentrated in a thin shell near the surface.
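A small sketch (my own; assuming the volume of a D-dimensional sphere scales as r^D) showing how quickly the volume concentrates in a thin outer shell:

```python
# Fraction of a unit D-dimensional sphere's volume lying in the shell 1 - eps <= r <= 1.
# Since V_D(r) is proportional to r**D, this fraction is 1 - (1 - eps)**D,
# which approaches 1 even for small eps as D grows.
def shell_fraction(eps: float, dims: int) -> float:
    return 1.0 - (1.0 - eps) ** dims

for d in (1, 2, 20, 200):
    print(f"D = {d:3d}: {shell_fraction(0.05, d):.3f} of the volume lies in the outer 5% shell")
```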
6
1.4 The Curse of Dimensionality (Cont'd)
Gaussian Distribution in High Dimensions
- For large D, the probability mass of a Gaussian concentrates in a thin shell at a particular radius.
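A quick numerical check of this behaviour (my own sketch using NumPy): sample from a standard Gaussian in increasing dimension and look at where the radii of the samples fall.

```python
import numpy as np

# For a standard D-dimensional Gaussian, most of the probability mass lies in a
# thin shell of radius roughly sqrt(D), even though the density peaks at the origin.
rng = np.random.default_rng(0)
for d in (1, 2, 20, 200):
    samples = rng.standard_normal((10_000, d))
    radii = np.linalg.norm(samples, axis=1)
    print(f"D = {d:3d}: mean radius {radii.mean():6.2f} (sqrt(D) = {d ** 0.5:6.2f}), std {radii.std():.2f}")
```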
7
1.5 Decision Theory
Making Optimal Decisions
- Inference step & decision step
- Select the class with the higher posterior probability
Minimizing the Misclassification Rate
- Objective: minimize p(mistake), shown as the colored area in the figure (see the formula below)
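For reference, the quantity behind the colored area (restated from the corresponding equations in Bishop's text, since the slide's figure is not reproduced here) is, for two classes,

p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1) = ∫_R1 p(x, C2) dx + ∫_R2 p(x, C1) dx,

and it is minimized by assigning each x to the class with the larger posterior p(Ck|x).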
8
1.5 Decision Theory (Cont'd)
Minimizing the Expected Loss
- The cost of a misclassification differs from class to class.
- Introduce a loss function (cost function).
- Objective: minimize the expected loss (see the sketch below).
The Reject Option
- Threshold θ
- Reject if the largest posterior probability is below θ.
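A minimal sketch (my own; the function name, loss matrix, and threshold are made up for illustration) of the resulting rule: choose the class j that minimizes the expected loss Σ_k L_kj p(Ck|x), and reject when the largest posterior falls below θ.

```python
import numpy as np

def decide(posteriors: np.ndarray, loss: np.ndarray, theta: float):
    """Return the chosen class index, or None to reject.

    posteriors: p(C_k | x) for each class k.
    loss:       loss[k, j] = cost of deciding class j when the true class is k.
    theta:      reject if the largest posterior is below this threshold.
    """
    if posteriors.max() < theta:
        return None                       # reject option
    expected_loss = loss.T @ posteriors   # expected loss of each possible decision j
    return int(np.argmin(expected_loss))

# Deciding class 1 when the true class is 0 is 100x more costly than the reverse.
loss = np.array([[0.0, 100.0],
                 [1.0, 0.0]])
print(decide(np.array([0.30, 0.70]), loss, theta=0.9))  # None: rejected
print(decide(np.array([0.05, 0.95]), loss, theta=0.9))  # 0: the loss matrix overrides the posterior
```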
9
1.5 Decision Theory (Cont'd)
Inference and Decision - Three Distinct Approaches
1. Posterior probabilities via generative models
- Model the data distribution by computing p(x|Ck) for each class
- Obtain p(Ck) and p(x) to get p(Ck|x) via Bayes' rule
- Can generate synthetic data points
- Computationally the most demanding approach
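The quantities involved (Bayes' rule, restated here for completeness):

p(Ck|x) = p(x|Ck) p(Ck) / p(x),  where  p(x) = Σ_k p(x|Ck) p(Ck).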
10
1.5 Decision Theory (Cont'd)
2. Discriminative models using the posterior
- Obtain the posterior p(Ck|x) directly
- Classify new input data from the posterior
- Sufficient when only classification is needed
3. Discriminant functions
- Map an input x directly to a class label
11
1.5 Decision Theory (Cont'd)
Why do we compute the posterior?
1. Minimizing risk
- The loss matrix may change frequently
2. Reject option
3. Compensating for class priors
- Useful when the class probabilities differ greatly between classes
- The posterior is proportional to the prior
4. Combining models
- Split the problem into subproblems and obtain a posterior for each (items 3 and 4 are sketched in code below)
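A minimal sketch (my own; the function names and numbers are illustrative) of items 3 and 4: rescale posteriors when the training-set class priors differ from the true ones, and combine two models that see different inputs under a conditional-independence assumption, p(Ck | xA, xB) ∝ p(Ck | xA) p(Ck | xB) / p(Ck).

```python
import numpy as np

def rebalance(posteriors, train_priors, true_priors):
    """Correct posteriors from a model trained on artificially balanced classes:
    divide out the training-set priors, multiply by the true priors, renormalize."""
    p = np.asarray(posteriors) / np.asarray(train_priors) * np.asarray(true_priors)
    return p / p.sum()

def combine(post_a, post_b, priors):
    """Combine two models with different inputs, assuming the inputs are
    conditionally independent given the class."""
    p = np.asarray(post_a) * np.asarray(post_b) / np.asarray(priors)
    return p / p.sum()

# Trained on 50/50 classes, but the true class frequencies are 1% vs. 99%.
print(rebalance([0.8, 0.2], train_priors=[0.5, 0.5], true_priors=[0.01, 0.99]))
print(combine([0.8, 0.2], [0.6, 0.4], priors=[0.5, 0.5]))
```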
12
1.5 Decision Theory (Cont'd)
Loss Functions for Regression
- Extension to a vector t of multiple target variables
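The squared-loss formulation behind this slide (restated from Bishop's text, since the slide's formulas are not reproduced here):

E[L] = ∫∫ {y(x) - t}^2 p(x, t) dx dt,  minimized by the conditional mean  y(x) = E_t[t|x] = ∫ t p(t|x) dt,

and for a vector t of target variables the optimal prediction is likewise y(x) = E_t[t|x].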
13
1.5 Decision Theory (Cont'd)
Minkowski Loss
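The Minkowski loss generalizes the squared loss (restated from Bishop's text):

E[L_q] = ∫∫ |y(x) - t|^q p(x, t) dx dt,

whose minimizer is the conditional mean for q = 2, the conditional median for q = 1, and the conditional mode in the limit q → 0.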
14
1.6 Information Theory
Entropy
- Low-probability events correspond to high information content: h(x) = -log2 p(x)
- Entropy is the expectation of the information content.
- The higher the entropy, the larger the uncertainty.
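A small sketch (my own; the function name is illustrative) of the definition above: entropy as the expected information content, highest for a uniform distribution and low for a peaked one.

```python
import numpy as np

def entropy(p):
    """Entropy in bits: H = -sum_x p(x) * log2 p(x), treating 0 * log 0 as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-(p[nz] * np.log2(p[nz])).sum())

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 states -> 2.0 bits
print(entropy([0.97, 0.01, 0.01, 0.01]))  # peaked distribution   -> about 0.24 bits
```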
15
1.6 Information Theory (Cont'd)
Maximum-Entropy Configuration for a Continuous Variable
- The distribution that maximizes the differential entropy is the Gaussian.
- Use Lagrange multipliers to find the maximum-entropy distribution.
Conditional Entropy: H[x,y] = H[y|x] + H[x]
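For completeness, the maximum-entropy result above (restated from Bishop's text): among distributions with fixed mean and variance, the differential entropy is maximized by the Gaussian, for which H[x] = 1/2 {1 + ln(2πσ^2)}.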
16
1.6 Information Theory (Cont'd)
Relative Entropy (Kullback-Leibler Divergence)
- Approximate an unknown distribution p(x) with an approximating distribution q(x).
Convexity and Jensen's Inequality
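The definition and key property (restated from Bishop's text):

KL(p||q) = -∫ p(x) ln{ q(x) / p(x) } dx ≥ 0, with equality if and only if p(x) = q(x),

which follows from Jensen's inequality f(E[x]) ≤ E[f(x)] for a convex function f.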
17
1.6 Information Theory (Cont'd)
Mutual Information
- I[x, y] = H[x] - H[x|y] = H[y] - H[y|x]
- If x and y are independent, I[x, y] = 0.
- The reduction in the uncertainty about x by virtue of being told the value of y.
- The relative entropy between the joint distribution and the product of the marginals.
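Equivalently (restated from Bishop's text),

I[x, y] = KL( p(x, y) || p(x) p(y) ) = -∫∫ p(x, y) ln{ p(x) p(y) / p(x, y) } dx dy,

which is zero exactly when the joint distribution factorizes, i.e., when x and y are independent.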