PATTERN RECOGNITION AND MACHINE LEARNING
Laboratory for Social & Neural Systems Research (SNS)
Institute of Empirical Research in Economics (IEW)
22-09-2010


PATTERN RECOGNITION AND MACHINE LEARNING
Laboratory for Social & Neural Systems Research (SNS)
Institute of Empirical Research in Economics (IEW)
Computational Neuroeconomics and Neuroscience

Course schedule

- Density Estimation, Bayesian Inference (Ch. 2): Adrian Etter, Marco Piccirelli, Giuseppe Ugazio
- Linear Models for Regression (Ch. 3): Susanne Leiberg, Grit Hein
- Linear Models for Classification (Ch. 4): Friederike Meyer, Chaohui Guo
- Kernel Methods I: Gaussian Processes (Ch. 6): Kate Lomakina
- Kernel Methods II: SVM and RVM (Ch. 7): Christoph Mathys, Morteza Moazami
- Probabilistic Graphical Models (Ch. 8): Justin Chumbley

Course schedule (continued)

- Mixture Models and EM (Ch. 9): Bastiaan Oud, Tony Williams
- Approximate Inference I: Deterministic Approximations (Ch. 10): Falk Lieder
- Approximate Inference II: Stochastic Approximations (Ch. 11): Kay Brodersen
- Inference on Continuous Latent Variables: PCA, Probabilistic PCA, ICA (Ch. 12): Lars Kasper
- Sequential Data: Hidden Markov Models, Linear Dynamical Systems (Ch. 13): Chris Burke, Yosuke Morishima

CHAPTER 1: PROBABILITY, DECISION, AND INFORMATION THEORY
Sandra Iglesias
Laboratory for Social & Neural Systems Research (SNS)
Institute of Empirical Research in Economics (IEW)
Computational Neuroeconomics and Neuroscience

Outline
- Introduction
- Probability Theory
  - Probability Rules
  - Bayes’ Theorem
  - Gaussian Distribution
- Decision Theory
- Information Theory

Pattern recognition

Computer algorithms for the automatic discovery of regularities in data, and for the use of these regularities to take actions such as classifying the data into different categories.

Data (patterns) are classified based either on
- a priori knowledge, or
- statistical information extracted from the patterns.

Machine learning

'How can we program systems to automatically learn and to improve with experience?'

The machine is programmed to learn from an incomplete set of examples (the training set); the core objective of a learner is to generalize from its experience.

Polynomial Curve Fitting
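The model behind this slide is Bishop's order-M polynomial (PRML Eq. 1.1), reproduced here since the slide's equation image did not survive:

y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j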

Sum-of-Squares Error Function
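In PRML's notation (Eq. 1.2), the error function this slide plots is

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2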

Plots of polynomials

Over-fitting

Root-Mean-Square (RMS) Error (defined below):
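The RMS error as defined in PRML (Eq. 1.3), where \mathbf{w}^* is the minimizer of E(\mathbf{w}):

E_{RMS} = \sqrt{2 E(\mathbf{w}^*) / N}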

Regularization

Penalize large coefficient values (shown for M = 9):
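The regularized sum-of-squares error (PRML Eq. 1.4), where the coefficient λ controls the strength of the penalty:

\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2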

Regularization: small λ vs. large λ (M = 9)

Outline
- Introduction
- Probability Theory
- Decision Theory
- Information Theory

Probability Theory

Uncertainty arises from:
- noise on measurements
- the finite size of data sets

Probability theory provides a consistent framework for the quantification and manipulation of uncertainty.

Probability Theory

Joint probability, marginal probability, conditional probability

Probability Theory

Consider N trials over two variables, X (values x_i, i = 1, ..., M) and Y (values y_j, j = 1, ..., L):
- n_ij: number of trials in which X = x_i and Y = y_j
- c_i: number of trials in which X = x_i, irrespective of the value of Y
- r_j: number of trials in which Y = y_j, irrespective of the value of X
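From these counts the probabilities follow as in PRML (Eqs. 1.5-1.8):

p(X = x_i, Y = y_j) = \frac{n_{ij}}{N}, \qquad
p(X = x_i) = \frac{c_i}{N}, \qquad
p(Y = y_j \mid X = x_i) = \frac{n_{ij}}{c_i}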

Probability Theory: Sum Rule

Probability Theory: Product Rule

The Rules of Probability

Sum rule and product rule (see below)
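The two rules in PRML's notation, reconstructed here since the equation images were lost:

\text{sum rule:} \quad p(X) = \sum_{Y} p(X, Y)

\text{product rule:} \quad p(X, Y) = p(Y \mid X)\, p(X)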

Bayes’ Theorem

T. Bayes (1701-1761), P.-S. Laplace (1749-1827)

Symmetry of the joint distribution: p(X, Y) = p(Y, X)

Bayes’ Theorem

posterior ∝ likelihood × prior

Example application: the polynomial curve fitting problem.
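The theorem itself, with the denominator given by the sum rule (PRML Eqs. 1.12-1.13):

p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}, \qquad
p(X) = \sum_{Y} p(X \mid Y)\, p(Y)

For curve fitting this reads p(\mathbf{w} \mid \mathcal{D}) \propto p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w}).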

Probability Densities
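The definitions this slide illustrated, in PRML's notation: the probability of x falling in an interval, and the cumulative distribution function:

p(x \in (a, b)) = \int_a^b p(x)\, dx, \qquad
P(z) = \int_{-\infty}^{z} p(x)\, dx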

Expectations

The expectation of f(x) is the average value of the function f(x) under a probability distribution p(x), defined below for discrete and continuous distributions.
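The two definitions, as in PRML Eqs. 1.33-1.34:

\mathbb{E}[f] = \sum_{x} p(x)\, f(x) \quad \text{(discrete)}, \qquad
\mathbb{E}[f] = \int p(x)\, f(x)\, dx \quad \text{(continuous)}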

The Gaussian Distribution
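The univariate Gaussian with mean μ and variance σ², as in PRML Eq. 1.46:

\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}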

Gaussian Parameter Estimation

Likelihood function (see below)
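For N i.i.d. observations \mathbf{x} = (x_1, \dots, x_N)^T, the likelihood is (PRML Eq. 1.53):

p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2)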

Maximum (Log) Likelihood
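Taking the log of the likelihood and maximizing gives the sample mean and sample variance (PRML Eqs. 1.54-1.56):

\ln p(\mathbf{x} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi)

\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad
\sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{ML})^2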

Curve Fitting Re-visited
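The probabilistic view of curve fitting assumes Gaussian noise with precision β = 1/σ² around the polynomial prediction (PRML Eq. 1.60):

p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\big(t \mid y(x, \mathbf{w}), \beta^{-1}\big)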

Maximum Likelihood

Determine \mathbf{w}_{ML} by minimizing the sum-of-squares error E(\mathbf{w}): under the Gaussian noise model, maximizing the likelihood with respect to \mathbf{w} is equivalent to minimizing the sum-of-squares error.

Outline
- Introduction
- Probability Theory
- Decision Theory
- Information Theory

Decision Theory

Used together with probability theory to make optimal decisions.
- Input vector x, target vector t
- Regression: t is continuous
- Classification: t consists of class labels
- The summary of the associated uncertainty is given by the joint distribution p(x, t)
- Inference problem: obtain p(x, t) from data
- Decision problem: make a specific prediction for the value of t, and take specific actions based on t

Inference step: determine either p(x, t) or p(t | x). Decision step: for a given x, determine the optimal t.

Medical Diagnosis Problem

- X-ray image of a patient: does the patient have cancer or not?
- Input vector x: the set of pixel intensities
- Output variable t: whether cancer or not (C1 = cancer, C2 = no cancer)
- The general inference problem is to determine the joint distribution p(x, C_k), which gives the most complete description of the situation
- In the end we need to decide whether to give treatment or not; decision theory helps us do this

Bayes’ Decision

How do probabilities play a role in making a decision? Given the input x and classes C_k, apply Bayes’ theorem (below). Each quantity in Bayes’ theorem can be obtained from the joint distribution p(x, C_k), either by marginalizing or by conditioning with respect to the appropriate variable.
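The posterior class probability, in PRML's notation:

p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{p(x)}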

Minimum Expected Loss

Example: classify medical images as ‘cancer’ or ‘normal’. The two kinds of mistake are of unequal importance, so the loss (or cost) function is given by a loss matrix indexed by truth (rows) and decision (columns); utility is the negative of loss. The decision regions are chosen to minimize the average loss.
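The expected loss, and PRML's example loss matrix L_kj (true class in rows: cancer, normal; decision in columns: cancer, normal), where the decision regions R_j are chosen to minimize E[L]:

\mathbb{E}[L] = \sum_{k} \sum_{j} \int_{R_j} L_{kj}\, p(x, C_k)\, dx, \qquad
L = \begin{pmatrix} 0 & 1000 \\ 1 & 0 \end{pmatrix}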

Why Separate Inference and Decision?

The classification problem is broken into two separate stages:
- Inference stage: training data is used to learn a model for p(C_k | x)
- Decision stage: the posterior probabilities are used to make optimal class assignments

Three distinct approaches to solving decision problems:
1. Generative models
2. Discriminative models
3. Discriminant functions

Generative models

1. Solve the inference problem of determining the class-conditional densities p(x | C_k) for each class separately, and the prior probabilities p(C_k)
2. Use Bayes’ theorem to determine the posterior probabilities p(C_k | x)
3. Use decision theory to determine class membership

Discriminative models

1. Solve the inference problem of determining the posterior class probabilities p(C_k | x) directly
2. Use decision theory to determine class membership

Discriminant functions

Find a function f(x) that maps each input x directly onto a class label, e.g. in a two-class problem f(·) is binary valued: f = 0 represents C1 and f = 1 represents C2.
→ Probabilities play no role.

Decision Theory for Regression

Inference step: determine p(x, t).
Decision step: for a given x, make an optimal prediction, y(x), for t.
Loss function: see below.
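The expected squared loss, which is minimized by the conditional mean (PRML Eqs. 1.86-1.89):

\mathbb{E}[L] = \iint \{ y(x) - t \}^2\, p(x, t)\, dx\, dt, \qquad
y(x) = \mathbb{E}[t \mid x]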

Outline
- Introduction
- Probability Theory
- Decision Theory
- Information Theory

Information theory

Quantification of information, based on probability theory. Information as degree of surprise:
- highly improbable event → a lot of information
- highly probable event → less information
- certain event → no information

The most important quantity: entropy.
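The information content of observing x, as defined in PRML (Eq. 1.92):

h(x) = -\log_2 p(x)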

Entropy

Entropy is the average amount of information expected, weighted by the probability of the random variable; it quantifies the uncertainty involved when we encounter this random variable. (Figure: H[x] as a function of p(x).)
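The definition, in nats (PRML Eq. 1.98; base-2 logarithms give bits):

H[x] = -\sum_{x} p(x) \ln p(x)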

The Kullback-Leibler Divergence

A non-symmetric measure of the difference between two probability distributions, also called relative entropy.
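For distributions p and q (PRML Eq. 1.113); it is non-negative and not symmetric:

KL(p \,\|\, q) = -\int p(x) \ln \frac{q(x)}{p(x)}\, dx, \qquad
KL(p \,\|\, q) \ne KL(q \,\|\, p)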

Mutual Information

Two sets of variables, x and y:
- if they are independent, the joint distribution factorizes: p(x, y) = p(x) p(y)
- if they are not independent, we can ask how close they are to being independent

Mutual Information

Mutual information measures the mutual dependence of x and y, i.e. the information they share; it is related to the conditional entropy.
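Mutual information is the KL divergence between the joint distribution and the product of the marginals (PRML Eqs. 1.120-1.121):

I[x, y] = KL\big(p(x, y) \,\|\, p(x)\, p(y)\big) = H[x] - H[x \mid y] = H[y] - H[y \mid x]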

Course schedule

- Probability, Decision, and Information Theory (Ch. 1)
- Density Estimation, Bayesian Inference (Ch. 2)
- Linear Models for Regression (Ch. 3)
- Linear Models for Classification (Ch. 4)
- Kernel Methods I: Gaussian Processes (Ch. 6)
- Kernel Methods II: SVM and RVM (Ch. 7)
- Probabilistic Graphical Models (Ch. 8)
- Mixture Models and EM (Ch. 9)
- Approximate Inference I: Deterministic Approximations (Ch. 10)
- Approximate Inference II: Stochastic Approximations (Ch. 11)
- Inference on Continuous Latent Variables: PCA, Probabilistic PCA, ICA (Ch. 12)
- Sequential Data: Hidden Markov Models, Linear Dynamical Systems (Ch. 13)