Ch 1. Introduction (Latter). Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Summarized by J.W. Ha, Biointelligence Laboratory, Seoul National University.


Contents
1.4 The Curse of Dimensionality
1.5 Decision Theory
1.6 Information Theory

1.4 The Curse of Dimensionality
The High-Dimensionality Problem
Ex. mixture of oil, water, and gas in a pipe
- 3 classes (homogeneous, annular, laminar)
- 12 input variables
- Scatter plot of x6 and x7
- Predict the class of a new point x
- Simple and naïve approach: divide the input space into cells and classify by majority vote within the cell containing x

1.4 The Curse of Dimensionality (Cont'd)
Shortcomings of the Naïve Approach
- The number of cells increases exponentially with the input dimensionality D.
- A very large training data set is needed to keep the cells from being empty.
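A minimal numeric sketch of this growth (illustrative only; the choice of 10 bins per dimension is an assumption, not from the slides):

bins = 10
for D in (1, 2, 3, 7, 12):        # D = 12 matches the oil-flow example above
    print(D, bins ** D)           # cell count grows as bins**D: 10, 100, ..., 10**12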

1.4 The Curse of Dimensionality (Cont'd)
Polynomial Curve Fitting (order M)
- As D increases, the number of coefficients grows proportionally to D^M.
The Volume of a High-Dimensional Sphere
- The volume is concentrated in a thin shell near the surface.
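Concretely, since the volume of a D-dimensional sphere of radius r scales as V_D(r) = K_D r^D, the fraction of the volume lying in a thin shell between radii 1 - ε and 1 is

(V_D(1) - V_D(1 - ε)) / V_D(1) = 1 - (1 - ε)^D

which tends to 1 as D grows, even for small ε.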

1.4 The Curse of Dimensionality (Cont'd)
Gaussian Distribution
- In high dimensions, the probability mass of a Gaussian likewise concentrates in a thin shell at a particular radius from the mean.
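A small sketch of this effect (assumes NumPy; not part of the original slides): radii of samples from a standard D-dimensional Gaussian concentrate around sqrt(D).

import numpy as np

rng = np.random.default_rng(0)
for D in (1, 20, 500):
    x = rng.standard_normal((10000, D))        # samples from a standard D-dim Gaussian
    r = np.linalg.norm(x, axis=1)              # distance of each sample from the mean
    print(D, round(r.mean(), 2), round(r.std(), 2))
    # mean grows like sqrt(D) while the spread stays O(1),
    # so the relative width of the shell shrinks as D grows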

1.5 Decision Theory
Making Optimal Decisions
- Inference step & decision step
- Select the class with the higher posterior probability
Minimizing the Misclassification Rate
- Objective: minimize the misclassification probability (the colored overlap region in the figure)
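For two classes, the misclassification probability being minimized can be written as

p(mistake) = p(x ∈ R_1, C_2) + p(x ∈ R_2, C_1) = ∫_{R_1} p(x, C_2) dx + ∫_{R_2} p(x, C_1) dx

which is minimized by assigning each x to the class with the larger posterior p(C_k|x).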

1.5 Decision Theory (Cont'd)
Minimizing the Expected Loss
- The damage caused by misclassification differs from class to class.
- Introduce a loss function (cost function).
- Objective: minimize the expected loss.
The Reject Option
- Threshold θ
- Reject if the largest posterior probability is below θ.
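With a loss matrix L_kj (the loss incurred by assigning a point of true class C_k to class C_j), the quantity being minimized is

E[L] = Σ_k Σ_j ∫_{R_j} L_kj p(x, C_k) dx

which is minimized by assigning each x to the class j with the smallest Σ_k L_kj p(C_k|x).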

1.5 Decision Theory (Cont'd)
Inference and Decision
- Three distinct approaches
1. Generative models (obtain the posterior probabilities)
- Model the data distribution by computing p(x|C_k) for each class
- Obtain p(C_k) and p(x), then get p(C_k|x) via Bayes' rule (see the formula below, after approach 3)
- Can generate synthetic data points
- Computationally demanding

1.5 Decision Theory (Cont'd)
2. Discriminative models (use the posterior directly)
- Obtain the posterior p(C_k|x) directly
- Classify new input data
- Appropriate when only classification is needed
3. Discriminant functions
- Map each input x directly to a class label
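For the generative approach (1 above), the posterior probabilities follow from Bayes' theorem:

p(C_k|x) = p(x|C_k) p(C_k) / p(x),   where   p(x) = Σ_k p(x|C_k) p(C_k)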

1.5 Decision Theory (Cont'd)
Why compute the posterior probabilities?
1. Minimizing risk
- When the loss matrix changes frequently, only the decision step needs to be revised.
2. Reject option
3. Compensating for class priors
- Useful when the class prior probabilities differ greatly
- The posterior is proportional to the prior, so it can be rescaled to match the true priors
4. Combining models
- Separate the problem into subproblems and obtain a posterior from each
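For reason 4, if two inputs x_I and x_B are assumed conditionally independent given the class (a naive-Bayes style assumption), their posteriors can be combined as

p(C_k | x_I, x_B) ∝ p(x_I | C_k) p(x_B | C_k) p(C_k) ∝ p(C_k | x_I) p(C_k | x_B) / p(C_k)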

1.5 Decision Theory (Cont'd)
Loss Functions for Regression
- Extends to a vector t of multiple target variables.
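For a single target t with the common squared loss, the expected loss is

E[L] = ∫∫ { y(x) - t }^2 p(x, t) dx dt

and the minimizing prediction is the conditional mean y(x) = E[t|x] = ∫ t p(t|x) dt; the same form carries over to a target vector t.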

1.5 Decision Theory (Cont'd)
Minkowski Loss
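The Minkowski loss generalizes the squared loss to an arbitrary power q:

E[L_q] = ∫∫ | y(x) - t |^q p(x, t) dx dt

Its minimizer is the conditional mean for q = 2, the conditional median for q = 1, and the conditional mode as q → 0.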

1.6 Information Theory
Entropy
- Low-probability events correspond to high information content: h(x) = -log2 p(x)
- Entropy is the expectation of the information content.
- Higher entropy means larger uncertainty.
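A minimal sketch (not from the slides) of entropy as the expected information content, in bits:

import math

def entropy(p):
    # H[x] = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute nothing
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: the most uncertain two-outcome distribution
print(entropy([0.9, 0.1]))   # ~0.47 bits: lower entropy, lower uncertainty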

1.6 Information Theory (Cont'd)
Maximum Entropy Configuration for a Continuous Variable
- The distribution that maximizes the differential entropy is the Gaussian.
- Lagrange multipliers are used to perform the constrained maximization.
Conditional Entropy
- H[x, y] = H[y|x] + H[x]
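Maximizing the differential entropy H[x] = -∫ p(x) ln p(x) dx subject to normalization and fixed mean μ and variance σ² (via Lagrange multipliers) gives the Gaussian, whose differential entropy is

H[x] = (1/2) { 1 + ln(2π σ²) }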

1.6 Information Theory (Cont'd)
Relative Entropy (Kullback-Leibler Divergence)
- Model an unknown distribution p(x) with an approximating distribution q(x).
Convex Functions and Jensen's Inequality
- Jensen's inequality shows that the KL divergence is non-negative.
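The relative entropy between p and the approximation q is

KL(p || q) = -∫ p(x) ln { q(x) / p(x) } dx

By Jensen's inequality applied to the convex function -ln, KL(p || q) ≥ 0, with equality if and only if p(x) = q(x).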

1.6 Information Theory (Cont'd)
Mutual Information
- I[x, y] = H[x] - H[x|y] = H[y] - H[y|x]
- If x and y are independent, I[x, y] = 0.
- The reduction in uncertainty about x by virtue of being told the value of y.
- The relative entropy between the joint distribution and the product of the marginals.
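Equivalently, the mutual information is the KL divergence between the joint distribution and the product of the marginals:

I[x, y] = KL( p(x, y) || p(x) p(y) ) = -∫∫ p(x, y) ln { p(x) p(y) / p(x, y) } dx dy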