Ch 1. Introduction
Pattern Recognition and Machine Learning, C. M. Bishop
Summarized by J.W. Ha
Biointelligence Laboratory, Seoul National University
(C) 2006, SNU Biointelligence Lab
1.4 The Curse of Dimensionality
1.5 Decision Theory
1.6 Information Theory
The Curse of Dimensionality
The High-Dimensionality Problem
Ex. Oil flow data: a mixture of oil, water, and gas
- 3 classes (homogeneous, annular, laminar)
- 12 input variables
- Scatter plot of x6 against x7
- Predict the class of a new point X
- Simple and naïve approach: divide the input space into regular cells and assign X the majority class of the training points in the same cell
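A minimal sketch of the naïve cell-based classifier, assuming the inputs lie in a known bounding box that is split into a regular grid of m cells per axis, and that a new point takes the majority class of the training points in its cell. The helper names (`cell_index`, `fit_cells`, `predict`) and the toy 2-D data are hypothetical, not the oil flow data.

```python
from collections import Counter, defaultdict

def cell_index(x, lo, hi, m):
    """Map point x to the tuple of grid indices of its cell (m cells per axis)."""
    idx = []
    for xi, a, b in zip(x, lo, hi):
        k = int((xi - a) / (b - a) * m)
        idx.append(min(max(k, 0), m - 1))  # clamp points on or outside the box edges
    return tuple(idx)

def fit_cells(X, y, lo, hi, m):
    """Count the class labels of the training points that fall in each cell."""
    counts = defaultdict(Counter)
    for x, label in zip(X, y):
        counts[cell_index(x, lo, hi, m)][label] += 1
    return counts

def predict(counts, x, lo, hi, m):
    """Majority class of x's cell, or None when no training point landed there."""
    cell = counts.get(cell_index(x, lo, hi, m))
    return cell.most_common(1)[0][0] if cell else None

# Hypothetical 2-D training data with two classes
X = [(0.1, 0.1), (0.2, 0.15), (0.15, 0.3), (0.8, 0.9), (0.85, 0.7)]
y = ["A", "A", "A", "B", "B"]
LO, HI, M = (0.0, 0.0), (1.0, 1.0), 2
counts = fit_cells(X, y, LO, HI, M)
print(predict(counts, (0.3, 0.2), LO, HI, M))  # cell (0, 0) -> "A"
print(predict(counts, (0.9, 0.8), LO, HI, M))  # cell (1, 1) -> "B"
print(predict(counts, (0.9, 0.1), LO, HI, M))  # cell (1, 0) is empty -> None
```

The empty cell in the last call shows why this approach needs many training points: a query landing in a cell with no data gets no prediction at all.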
The Curse of Dimensionality (Cont’d)
The Shortcomings of the Naïve Approach
- The number of cells increases exponentially with the dimensionality D of the input space.
- A very large training data set is needed so that the cells are not empty.
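A quick count, assuming m = 3 divisions per axis and a hypothetical budget of 1,000 training points: a regular grid has m^D cells, so even if every training point landed in a different cell, at most N / m^D of the cells could be non-empty.

```python
def num_cells(m, d):
    """Cells in a regular grid with m divisions along each of d input axes."""
    return m ** d

def max_occupied_fraction(n_points, m, d):
    """Upper bound on the fraction of cells that n_points training points can occupy."""
    return min(1.0, n_points / num_cells(m, d))

N_POINTS, M = 1000, 3  # hypothetical training-set size and grid resolution
for d in (1, 2, 3, 12):
    print(f"D={d:2d}  cells={num_cells(M, d):7d}  "
          f"max occupied fraction={max_occupied_fraction(N_POINTS, M, d):.4f}")
```

With the 12 input variables of the oil flow example, this grid already has 531,441 cells, so fewer than 0.2% of them can contain any training point.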
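The Curse of Dimensionality (Cont’d)
Polynomial Curve Fitting (Order M)
- As the input dimensionality D increases, the number of independent coefficients grows proportionally to $D^M$: a power law rather than an exponential, but still rapidly unwieldy.
The Volume of a High-Dimensional Sphere
- The fraction of the volume of a sphere of radius $r = 1$ that lies between $r = 1 - \epsilon$ and $r = 1$ is $1 - (1 - \epsilon)^D$.
- For large D this fraction approaches 1 even for small $\epsilon$: the volume is concentrated in a thin shell near the surface.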
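Both growth rates can be checked numerically. The sketch below takes the number of independent coefficients of a full polynomial of order M in D variables to be the binomial coefficient C(D + M, M), which grows like D^M for fixed M, and evaluates the shell fraction 1 - (1 - ε)^D. The function names and the chosen values of D, M, and ε are illustrative.

```python
from math import comb

def num_poly_coefficients(d, m):
    """Independent coefficients of a polynomial of order m in d inputs (all monomials of degree <= m)."""
    return comb(d + m, m)

def shell_fraction(eps, d):
    """Fraction of the volume of a unit sphere in d dimensions lying between radius 1 - eps and 1."""
    return 1.0 - (1.0 - eps) ** d

for d in (1, 2, 12, 100):
    print(f"D={d:3d}  coefficients (M=3)={num_poly_coefficients(d, 3):7d}  "
          f"shell fraction (eps=0.05)={shell_fraction(0.05, d):.3f}")
```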
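The Curse of Dimensionality (Cont’d)
Gaussian Distribution
[Figure: probability density with respect to radius r of a Gaussian distribution for various values of the dimensionality D]
- In a high-dimensional space, most of the probability mass of a Gaussian lies within a thin shell at a specific radius.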
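A small Monte Carlo sketch of this concentration effect, using only Python's standard library. It draws points from a standard (zero-mean, unit-covariance) Gaussian in D dimensions: the radius clusters around roughly √D while its absolute spread stays roughly constant, so the relative spread shrinks as D grows. The sample size and seed are arbitrary choices.

```python
import math
import random

def sample_radii(d, n, seed=0):
    """Distances from the origin of n points drawn from a standard Gaussian in d dimensions."""
    rng = random.Random(seed)
    return [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))) for _ in range(n)]

def mean_and_std(values):
    """Sample mean and (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std

for d in (1, 2, 20, 200):
    mean, std = mean_and_std(sample_radii(d, 2000))
    print(f"D={d:3d}  mean radius={mean:6.2f}  std={std:.2f}  std/mean={std / mean:.3f}")
```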
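Decision Theory
Making Optimal Decisions
- Inference step: use the training data to learn $p(x, C_k)$ or $p(C_k|x)$. Decision step: use these probabilities to make an optimal class assignment.
- Select the class with the higher posterior probability.
Minimizing the Misclassification Rate
- $p(\text{mistake}) = \int_{\mathcal{R}_1} p(x, C_2)\,dx + \int_{\mathcal{R}_2} p(x, C_1)\,dx$
- MAP: assigning each x to the class with the largest posterior minimizes this probability, i.e., the colored area in the figure.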
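A discrete sketch of why MAP minimizes the misclassification rate, assuming a hypothetical joint distribution p(x, C_k) over four input values instead of a continuous x. Since p(C_k|x) is proportional to p(x, C_k), picking the larger joint probability is the MAP rule, and a brute-force search over all 2^4 decision rules confirms that none does better.

```python
from itertools import product

# Hypothetical joint distribution p(x, C_k) over four discrete inputs x = 0..3 (sums to 1)
JOINT = {
    "C1": [0.30, 0.15, 0.05, 0.02],
    "C2": [0.03, 0.10, 0.15, 0.20],
}
N_INPUTS = 4

def misclassification_rate(decision):
    """p(mistake): total p(x, C_k) over inputs x that the rule assigns to a class other than C_k."""
    return sum(p for k, ps in JOINT.items() for x, p in enumerate(ps) if decision[x] != k)

def map_decision():
    """Assign each x to the class with the largest p(x, C_k), i.e. the largest posterior p(C_k | x)."""
    return [max(JOINT, key=lambda k: JOINT[k][x]) for x in range(N_INPUTS)]

best = map_decision()
print(best, misclassification_rate(best))
# Exhaustive check: no other assignment of the 4 inputs to classes has a lower error
print(all(misclassification_rate(d) >= misclassification_rate(best)
          for d in product(JOINT, repeat=N_INPUTS)))
```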
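Decision Theory (Cont’d)
Minimizing the Expected Loss
- The damage caused by a misclassification differs from class to class.
- Introduce a loss function (cost function) $L_{kj}$: the loss incurred when an input whose true class is $C_k$ is assigned to class $C_j$.
- The decision rule changes from MAP to minimizing the expected loss: assign x to the class $j$ that minimizes $\sum_k L_{kj}\, p(C_k|x)$.
The Reject Option
- Choose a threshold $\theta$.
- Reject (make no decision) if the largest posterior probability $\max_k p(C_k|x)$ is less than $\theta$.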
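A minimal two-class sketch of both ideas, assuming an illustrative loss matrix in which missing a disease is 1000 times worse than a false alarm (the class names, numbers, and function names are hypothetical). The first function implements the minimum-expected-loss rule; the second implements the reject option with threshold θ.

```python
CLASSES = ["cancer", "normal"]
# Illustrative loss matrix: LOSS[k][j] = loss of deciding class j when the true class is k
LOSS = [[0.0, 1000.0],   # true cancer: missing it is very costly
        [1.0, 0.0]]      # true normal: a false alarm costs little

def min_expected_loss_decision(posterior):
    """Pick the class j minimizing sum_k LOSS[k][j] * p(C_k | x); also return each class's expected loss."""
    risks = [sum(LOSS[k][j] * posterior[k] for k in range(len(CLASSES)))
             for j in range(len(CLASSES))]
    best = min(range(len(CLASSES)), key=lambda j: risks[j])
    return CLASSES[best], risks

def decide_with_reject(posterior, theta):
    """Reject option: return None when the largest posterior is below theta, else the MAP class."""
    k = max(range(len(CLASSES)), key=lambda i: posterior[i])
    return None if posterior[k] < theta else CLASSES[k]

# MAP would say "normal" here, but the asymmetric loss makes "cancer" the better decision
print(min_expected_loss_decision([0.01, 0.99]))
print(decide_with_reject([0.55, 0.45], theta=0.8))  # too uncertain: rejected (None)
```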
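Decision Theory (Cont’d)
Inference and Decision
- Three distinct approaches:
1. Generative models: model the class-conditional densities $p(x|C_k)$ and the priors $p(C_k)$, then obtain the posterior probabilities $p(C_k|x)$ using Bayes' theorem.
2. Discriminative models: model the posterior probabilities $p(C_k|x)$ directly.
3. Discriminant function: find a function that maps each input x directly onto a class label, without computing posterior probabilities.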
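A sketch of the generative route (approach 1), assuming hypothetical one-dimensional Gaussian class-conditional densities p(x|C_k) with equal variances and made-up priors; the posterior then follows from Bayes' theorem. A discriminative model (approach 2) would instead fit p(C_k|x) directly, for example with logistic regression.

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Hypothetical generative model: prior p(C_k) and Gaussian class-conditional p(x | C_k)
MODEL = {
    "C1": {"prior": 0.7, "mean": 0.0, "var": 1.0},
    "C2": {"prior": 0.3, "mean": 2.0, "var": 1.0},
}

def posterior(x):
    """Bayes' theorem: p(C_k | x) = p(x | C_k) p(C_k) / sum_j p(x | C_j) p(C_j)."""
    joint = {k: m["prior"] * gaussian_pdf(x, m["mean"], m["var"]) for k, m in MODEL.items()}
    evidence = sum(joint.values())  # p(x)
    return {k: v / evidence for k, v in joint.items()}

for x in (-1.0, 1.0, 3.0):
    print(x, {k: round(p, 3) for k, p in posterior(x).items()})
```

At x = 1, halfway between the two means, the likelihoods are equal and the posterior simply reproduces the priors.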
1.5 Decision Theory (Cont’d)
Reasons to Compute the Posterior
1. Minimizing risk
2. Reject option
3. Compensating for class priors
4. Combining models
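Reason 3 can be made concrete. Suppose a classifier's posteriors were learned on a training set whose class frequencies differ from those in the population (for example, a deliberately balanced set). One can then divide by the training-set class fractions, multiply by the population fractions, and renormalize. A minimal sketch with hypothetical numbers and a hypothetical function name:

```python
def adjust_for_priors(posterior_train, prior_train, prior_true):
    """Rescale posteriors by prior_true / prior_train for each class, then renormalize to sum to 1."""
    unnorm = [p * t / s for p, s, t in zip(posterior_train, prior_train, prior_true)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical: trained on a balanced set, but the rare class has prior 0.001 in the population
print(adjust_for_priors([0.9, 0.1], prior_train=[0.5, 0.5], prior_true=[0.001, 0.999]))
```

Here the rare class drops from 0.9 to below 0.01: posteriors trained on balanced data overstate how likely the rare class is in practice.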
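Decision Theory (Cont’d)
Loss Function for Regression
- Expected squared loss: $\mathbb{E}[L] = \iint \{y(x) - t\}^2 p(x, t)\, dx\, dt$
- The optimal prediction is the conditional mean $y(x) = \mathbb{E}_t[t|x]$.
- The same result holds for a vector of multiple target variables $\mathbf{t}$: $\mathbf{y}(x) = \mathbb{E}_{\mathbf{t}}[\mathbf{t}|x]$.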
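For the squared loss, the optimal prediction follows from a short calculus-of-variations step: set the functional derivative of E[L] with respect to y(x) to zero and solve. A sketch of the derivation:

```latex
\begin{aligned}
\mathbb{E}[L] &= \iint \{y(x) - t\}^2 \, p(x, t) \, dx \, dt \\
\frac{\delta \mathbb{E}[L]}{\delta y(x)} &= 2 \int \{y(x) - t\} \, p(x, t) \, dt = 0 \\
\Rightarrow \quad y(x) &= \frac{\int t \, p(x, t) \, dt}{p(x)} = \int t \, p(t \mid x) \, dt = \mathbb{E}_t[t \mid x]
\end{aligned}
```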
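1.5 Decision Theory (Cont’d)
Minkowski Loss
- Generalization of the squared loss: $\mathbb{E}[L_q] = \iint |y(x) - t|^q p(x, t)\, dx\, dt$
- The minimizer is the conditional mean for $q = 2$, the conditional median for $q = 1$, and the conditional mode as $q \to 0$.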
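A numerical sketch of these three cases. It assumes a hypothetical set of samples of t observed at one fixed x, so the empirical average stands in for the conditional expectation, and it finds the best prediction y with a simple grid search. The minimizers land on the sample mean (q = 2), the median (q = 1), and, for a small q such as 0.1, the mode.

```python
def minkowski_risk(y, samples, q):
    """Average Minkowski loss |y - t|^q over samples of the target t."""
    return sum(abs(y - t) ** q for t in samples) / len(samples)

def best_prediction(samples, q, lo=0.0, hi=12.0, steps=1200):
    """Grid search (step 0.01 by default) for the y minimizing the average Minkowski loss."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda y: minkowski_risk(y, samples, q))

T = [1.0, 1.0, 4.0, 5.0, 10.0]  # hypothetical samples: mean 4.2, median 4, mode 1
for q in (2.0, 1.0, 0.1):
    print(q, best_prediction(T, q))
```

A grid search is used only to keep the sketch dependency-free; for q < 1 the loss is not convex, which is one reason a closed form or a gradient method is less convenient there.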
Information Theory
Entropy
- $H[x] = -\sum_x p(x) \log_2 p(x)$ (in bits)
- The noiseless coding theorem states that the entropy is a lower bound on the number of bits needed to transmit the state of a random variable.
- Higher entropy, larger uncertainty
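A short sketch computing entropy in bits for an eight-state variable. A uniform distribution needs 3 bits per state, while a skewed distribution with probabilities (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64) has an entropy of only 2 bits, which by the noiseless coding theorem is the shortest achievable average code length. The function name is illustrative.

```python
import math

def entropy_bits(probs):
    """H[x] = -sum_x p(x) log2 p(x) in bits, written as sum p log2(1/p); states with p = 0 contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

uniform8 = [1 / 8] * 8
skewed8 = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 64, 1 / 64, 1 / 64, 1 / 64]
print(entropy_bits(uniform8))  # 3 bits: every state equally likely
print(entropy_bits(skewed8))   # 2 bits: less uncertainty, shorter average code
print(entropy_bits([1.0]))     # 0 bits: no uncertainty at all
```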
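Information Theory (Cont’d)
Maximum Entropy Configuration for a Continuous Variable
- Constraints: $\int p(x)\,dx = 1$, $\int x\,p(x)\,dx = \mu$, $\int (x - \mu)^2 p(x)\,dx = \sigma^2$
- Result: the distribution that maximizes the differential entropy is the Gaussian $\mathcal{N}(x|\mu, \sigma^2)$, with $H[x] = \frac{1}{2}\{1 + \ln(2\pi\sigma^2)\}$
Conditional Entropy
- $H[x, y] = H[y|x] + H[x]$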
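A small numeric check of both statements, in nats. It compares three densities with the same variance σ² (the uniform and Laplace entropies use their standard closed forms) and checks the chain rule on a hypothetical 2×2 joint table. The Gaussian comes out highest, consistent with the maximum-entropy result; all function names are illustrative.

```python
import math

def gaussian_entropy(sigma):
    """Differential entropy of N(mu, sigma^2): 0.5 * (1 + ln(2 pi sigma^2))."""
    return 0.5 * (1.0 + math.log(2.0 * math.pi * sigma ** 2))

def uniform_entropy(sigma):
    """Uniform density with variance sigma^2 has width sqrt(12) sigma; entropy = ln(width)."""
    return math.log(math.sqrt(12.0) * sigma)

def laplace_entropy(sigma):
    """Laplace density with variance sigma^2 has scale b = sigma / sqrt(2); entropy = 1 + ln(2b)."""
    return 1.0 + math.log(2.0 * sigma / math.sqrt(2.0))

def chain_rule_terms(pxy):
    """Return H[x, y], H[y|x], H[x] (nats) for a joint table pxy[i][j] = p(x = i, y = j)."""
    px = [sum(row) for row in pxy]
    h_xy = -sum(p * math.log(p) for row in pxy for p in row if p > 0)
    h_y_given_x = -sum(p * math.log(p / px[i]) for i, row in enumerate(pxy) for p in row if p > 0)
    h_x = -sum(p * math.log(p) for p in px if p > 0)
    return h_xy, h_y_given_x, h_x

for s in (0.5, 1.0, 2.0):
    print(s, gaussian_entropy(s), laplace_entropy(s), uniform_entropy(s))

h_xy, h_y_given_x, h_x = chain_rule_terms([[0.1, 0.2], [0.3, 0.4]])
print(h_xy, h_y_given_x + h_x)  # equal up to rounding: H[x, y] = H[y|x] + H[x]
```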
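Information Theory (Cont’d)
Relative Entropy (Kullback-Leibler Divergence)
- $\mathrm{KL}(p\|q) = -\int p(x) \ln\left\{\frac{q(x)}{p(x)}\right\} dx$
Convex Functions (Jensen’s Inequality)
- For a convex function $f$: $f(\mathbb{E}[x]) \le \mathbb{E}[f(x)]$
- Applied to the convex function $-\ln$, this gives $\mathrm{KL}(p\|q) \ge 0$, with equality if and only if $p(x) = q(x)$.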
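A discrete sketch of both facts, assuming hypothetical three-state distributions p and q, with a sum replacing the integral. KL(p||q) is positive when p ≠ q, zero when they coincide, and not symmetric; the Jensen gap E[f(x)] - f(E[x]) is non-negative for a convex f such as f(x) = x².

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) ln(p(x) / q(x)) in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_gap(f, xs, weights):
    """E[f(x)] - f(E[x]) for a discrete distribution; non-negative whenever f is convex."""
    mean = sum(w * x for x, w in zip(xs, weights))
    return sum(w * f(x) for x, w in zip(xs, weights)) - f(mean)

p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]
print(kl_divergence(p, q), kl_divergence(q, p))  # both positive, but different
print(kl_divergence(p, p))                       # zero when the distributions match
print(jensen_gap(lambda x: x * x, [-1.0, 0.0, 3.0], [0.2, 0.5, 0.3]))  # positive for convex f
```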
1.6 Information Theory (Cont’d)
Mutual Information
- $I[x, y] = \mathrm{KL}(p(x, y)\,\|\,p(x)p(y)) = H[x] - H[x|y] = H[y] - H[y|x]$
- If x and y are independent, $I[x, y] = 0$.
- Interpretation: the reduction in the uncertainty about x by virtue of being told the value of y
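A small sketch for discrete variables, assuming a hypothetical joint table pxy[i][j] = p(x = i, y = j). It computes I[x, y] as the KL divergence between the joint and the product of the marginals, checks that it vanishes for an independent table, and confirms I[x, y] = H[x] - H[x|y] for a dependent one. The function names are illustrative.

```python
import math

def entropy(probs):
    """H = -sum p ln p in nats, with 0 ln 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def marginals(pxy):
    """Row sums p(x) and column sums p(y) of the joint table."""
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return px, py

def mutual_information(pxy):
    """I[x, y] = sum p(x, y) ln(p(x, y) / (p(x) p(y)))."""
    px, py = marginals(pxy)
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(pxy) for j, p in enumerate(row) if p > 0)

def conditional_entropy_x_given_y(pxy):
    """H[x|y] = -sum p(x, y) ln p(x|y), where p(x|y) = p(x, y) / p(y)."""
    _, py = marginals(pxy)
    return -sum(p * math.log(p / py[j]) for row in pxy for j, p in enumerate(row) if p > 0)

independent = [[a * b for b in (0.4, 0.6)] for a in (0.3, 0.7)]  # p(x, y) = p(x) p(y)
dependent = [[0.4, 0.1], [0.1, 0.4]]
print(mutual_information(independent))  # ~0 (up to rounding)
px, _ = marginals(dependent)
print(mutual_information(dependent), entropy(px) - conditional_entropy_x_given_y(dependent))
```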