Machine Learning CMPT 726 Simon Fraser University CHAPTER 1: INTRODUCTION.

Outline
- Comments on general approach
- Probability theory: joint, conditional and marginal probabilities; random variables; functions of random variables
- Bernoulli distribution (coin tosses): maximum likelihood estimation; Bayesian learning with a conjugate prior
- The Gaussian distribution: maximum likelihood estimation; Bayesian learning with a conjugate prior
- More probability theory: entropy; KL divergence

Our Approach The course generally follows statistics, though machine learning is very interdisciplinary. Emphasis on predictive models: guess the value(s) of target variable(s), i.e. "pattern recognition". Generally a Bayesian approach, as in the text. Compared to standard Bayesian statistics: more complex models (neural nets, Bayes nets), more discrete variables, and more emphasis on algorithms and efficiency.

Things Not Covered Within statistics: hypothesis testing; frequentist theory; learning theory. Other types of data (not random samples): relational data; scientific data (automated scientific discovery). Action + learning = reinforcement learning. Could be optional – what do you think?

Probability Theory: Apples and Oranges

Probability Theory: Marginal Probability, Conditional Probability, Joint Probability
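The slide's rendered formulas did not survive the transcript. As a hedged reconstruction of the standard counting definitions (the symbols n_ij, c_i, and N are introduced here, not in the transcript): if n_ij is the number of trials with X = x_i and Y = y_j, c_i = sum_j n_ij, and N is the total number of trials, then:

```latex
% Joint, marginal, and conditional probabilities defined from counts
% (reconstruction of the standard definitions; n_ij, c_i, N as in the lead-in).
\begin{align}
  p(X = x_i, Y = y_j)     &= \frac{n_{ij}}{N}   && \text{joint} \\
  p(X = x_i)              &= \frac{c_i}{N} = \sum_j p(X = x_i, Y = y_j) && \text{marginal} \\
  p(Y = y_j \mid X = x_i) &= \frac{n_{ij}}{c_i} && \text{conditional}
\end{align}
```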

Probability Theory: Sum Rule and Product Rule

The Rules of Probability: Sum Rule and Product Rule
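In symbols (the slide's rendered formulas were lost in the transcript; these are the standard rules):

```latex
% Sum rule (marginalization) and product rule for two random variables.
\begin{align}
  p(X)    &= \sum_{Y} p(X, Y)     && \text{sum rule} \\
  p(X, Y) &= p(Y \mid X)\, p(X)   && \text{product rule}
\end{align}
```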

Bayes’ Theorem: posterior ∝ likelihood × prior
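Written out (the rendered formula was lost; this is the standard statement, with the evidence obtained from the sum and product rules):

```latex
% Bayes' theorem: the posterior is the likelihood times the prior,
% normalized by the evidence p(X).
\begin{align}
  p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)},
  \qquad p(X) = \sum_{Y} p(X \mid Y)\, p(Y)
\end{align}
```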

Bayes’ Theorem: Model Version. Let M be a model and E be evidence. Then P(M | E) is proportional to P(M) × P(E | M). Intuition: the prior P(M) measures how plausible the model (theory) is a priori, before seeing any evidence; the likelihood P(E | M) measures how well the model explains the evidence.

Probability Densities
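The slide's formulas were images; as a sketch of the standard definitions it covers, for a continuous variable x:

```latex
% Probability density, its basic properties, and the cumulative distribution.
\begin{align}
  p(x \in (a, b)) &= \int_a^b p(x)\, \mathrm{d}x \\
  p(x) &\geq 0, \qquad \int_{-\infty}^{\infty} p(x)\, \mathrm{d}x = 1 \\
  P(z) &= \int_{-\infty}^{z} p(x)\, \mathrm{d}x && \text{cumulative distribution}
\end{align}
```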

Transformed Densities
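If x = g(y) is a change of variables, the density picks up a Jacobian factor (a reconstruction of the standard result the slide shows; ordinary functions transform without this factor):

```latex
% Change of variables for a probability density.
\begin{align}
  p_y(y) = p_x(x) \left| \frac{\mathrm{d}x}{\mathrm{d}y} \right|
         = p_x\big(g(y)\big)\, \big| g'(y) \big|
\end{align}
```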

Expectations: Conditional Expectation (discrete); Approximate Expectation (discrete and continuous)
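The corresponding formulas (standard definitions, reconstructed because the slide's equations were lost):

```latex
% Expectation of f(x) in the discrete and continuous cases, the conditional
% expectation, and the Monte Carlo approximation from N samples x_1,...,x_N.
\begin{align}
  \mathbb{E}[f] &= \sum_x p(x)\, f(x),
  \qquad \mathbb{E}[f] = \int p(x)\, f(x)\, \mathrm{d}x \\
  \mathbb{E}_x[f \mid y] &= \sum_x p(x \mid y)\, f(x) \\
  \mathbb{E}[f] &\simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n)
\end{align}
```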

Expectations are Linear Let aX + bY + c be a linear combination of two random variables X and Y (itself a random variable). Then E[aX + bY + c] = aE[X] + bE[Y] + c. This holds whether or not X and Y are independent. It is a good exercise to prove it.
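A quick numerical sanity check, not a proof: the sketch below (assuming NumPy is available; the sampling scheme and constants are illustrative choices, not from the slides) estimates both sides of the identity from samples of dependent X and Y.

```python
# Numerical check of linearity of expectation: E[aX + bY + c] = aE[X] + bE[Y] + c.
# X and Y are deliberately made dependent to illustrate that independence is not
# required. Two independent batches of samples are used so the comparison is not
# trivially exact by the linearity of the sample mean itself.
import numpy as np

def sample_xy(rng, n):
    """Draw n dependent samples of (X, Y): Y is built from X plus noise."""
    x = rng.normal(loc=1.0, scale=2.0, size=n)
    y = 0.5 * x + rng.exponential(scale=1.0, size=n)
    return x, y

rng = np.random.default_rng(0)
a, b, c = 2.0, -3.0, 5.0
n = 1_000_000

x1, y1 = sample_xy(rng, n)   # batch 1: estimate E[aX + bY + c]
x2, y2 = sample_xy(rng, n)   # batch 2: estimate E[X] and E[Y]

lhs = np.mean(a * x1 + b * y1 + c)
rhs = a * np.mean(x2) + b * np.mean(y2) + c

print(f"E[aX+bY+c] ~ {lhs:.3f}   aE[X]+bE[Y]+c ~ {rhs:.3f}")
# The two estimates agree up to Monte Carlo error even though X and Y are dependent.
```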

Variances and Covariances Think about the difference between two grade distributions: 1. Everybody gets a B. 2. Some students get a C, 10 get an A, and the rest get a B. The average is the same – how do we quantify the difference? Prove the variance identity given below. Hint: use the linearity of expectation.
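The slide's rendered formulas were lost; these are the standard definitions it most likely refers to, with the second equality for the variance being the identity the exercise asks you to prove:

```latex
% Variance measures spread around the mean; covariance measures how two
% variables vary together. The second equality in each line follows from
% linearity of expectation.
\begin{align}
  \operatorname{var}[X]    &= \mathbb{E}\big[(X - \mathbb{E}[X])^2\big]
                            = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \\
  \operatorname{cov}[X, Y] &= \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big]
                            = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]
\end{align}
```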