
Bayesian Learning 1 of (probably) 2

Administrivia
- Readings 1 back today. Good job overall, but watch your spelling/grammar! Nice analyses, though. Possible fruit for the final proj?
- HW2 assigned today. Due Oct 12 (2 weeks). Do start early...

ML trivia of the day... Which data mining techniques have you used in a successfully deployed application?

Bayesian Classification

Assumptions
- “‘Assume’ makes an ass out of ‘U’ and ‘me’” -- bull****. Assumptions about data are unavoidable; you learn faster/better when you know (assume) more about the data.
- Decision tree: axis orthogonality; accuracy cost (0/1 loss)
- k-NN: distance function/metric; accuracy cost
- LSE: linear separator; squared error cost

Assumptions (cont’d)
- SVMs: linear separator; high-dimensional projection via a kernel function (a generalized inner product/cosine); max-margin cost function

Specifying assumptions
Bayesian learning assumes:
- Data were generated by some stochastic process
- We can write down (some) mathematical form for that process (CDF/PDF/PMF)
- That mathematical form is parameterized
- We have some “prior beliefs” about those parameters
Essentially, it is an attempt to make assumptions explicit and to divorce them from the learning algorithm. In practice, it is not a single learning algorithm but a recipe for generating problem-specific algorithms.

Example
F = {height, weight}; C = {male, female}
- Q1: Any guesses about the individual distributions of height/weight by class? What probability function (PDF)?
- Q2: What about the joint distribution?
- Q3: What about the means of each? Reasonable guesses for the upper/lower bounds on the means?

Some actual data*
* Actual synthesized data, anyway...
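As a minimal sketch (with illustrative Gaussian parameters, not the values actually used for the figure), two-class height/weight data like this could be synthesized as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class-conditional Gaussians over (height in cm, weight in kg);
# these parameter values are illustrative assumptions only.
params = {
    "female": {"mean": [165.0, 68.0], "std": [6.0, 9.0]},
    "male":   {"mean": [178.0, 84.0], "std": [7.0, 10.0]},
}

n_per_class = 200
X_parts, y = [], []
for label, p in params.items():
    # Independent Gaussians per feature (diagonal covariance) for simplicity.
    samples = rng.normal(loc=p["mean"], scale=p["std"], size=(n_per_class, 2))
    X_parts.append(samples)
    y += [label] * n_per_class

X = np.vstack(X_parts)   # shape (400, 2): rows are [height, weight]
```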

General idea
- Find a probability distribution that describes each class of data
- Find a decision surface in terms of those probability distributions

H/W data as PDFs

Or, if you prefer...

General idea (cont’d)
- Find a probability distribution that describes each class of data
- Find a decision surface in terms of those probability distributions
- What would be a good rule?

The Bayes optimal decision
For 0/1 loss (accuracy), it is provable that the optimal decision is to choose the class with the largest posterior: pick c1 over c2 when f(x|c1) P(c1) > f(x|c2) P(c2). Equivalently, it’s sometimes useful to use the log odds ratio test: decide c1 when log[f(x|c1) P(c1)] - log[f(x|c2) P(c2)] > 0.
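A small sketch of this decision rule for two classes with known class-conditional densities and priors; the 1-d Gaussian densities and the parameter values here are placeholder assumptions, not anything specified on the slide.

```python
import numpy as np
from scipy.stats import norm

# Two classes with assumed 1-d Gaussian class-conditional densities and priors.
class_models = {
    "c1": {"pdf": norm(loc=165.0, scale=6.0).pdf, "prior": 0.5},
    "c2": {"pdf": norm(loc=178.0, scale=7.0).pdf, "prior": 0.5},
}

def bayes_decide(x):
    # 0/1-loss optimal rule: pick the class maximizing f(x|c) * P(c).
    return max(class_models,
               key=lambda c: class_models[c]["pdf"](x) * class_models[c]["prior"])

def log_odds(x):
    # log[f(x|c1) P(c1)] - log[f(x|c2) P(c2)]; decide c1 iff this is > 0.
    score = lambda c: np.log(class_models[c]["pdf"](x)) + np.log(class_models[c]["prior"])
    return score("c1") - score("c2")

print(bayes_decide(170.0), log_odds(170.0))
```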

Bayes decisions in pictures
[Figure: class-conditional densities f(x1|c1) and f(x1|c2) plotted against x1, with the resulting decision regions labeled c1, c2, c1 along the axis.]

Bayesian learning process
So where do the probability distributions come from? The art of Bayesian data modeling is:
- Deciding what probability models to use
- Figuring out how to find the parameters
In Bayesian learning, the “learning” is (almost) all in finding the parameters.
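As one concrete version of “finding the parameters” (assuming the Gaussian model of the next slides, and using plain maximum-likelihood fits rather than full posteriors), per-class parameter estimation might look like this sketch:

```python
import numpy as np

def fit_class_gaussians(X, y):
    """Per-class maximum-likelihood Gaussian fit: mean vector, covariance, class prior."""
    y = np.asarray(y)
    fitted = {}
    for label in np.unique(y):
        Xc = X[y == label]
        fitted[label] = {
            "mean": Xc.mean(axis=0),
            "cov": np.cov(Xc, rowvar=False),   # sample covariance of the class
            "prior": len(Xc) / len(y),
        }
    return fitted

# e.g. model = fit_class_gaussians(X, y) with the synthesized data from above
```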

Back to the H/W data

Prior knowledge
- Gaussian (a.k.a. normal or bell curve) is a reasonable assumption for this data; other distributions are better for other data
- We can make reasonable guesses about the means (probably not -3 kg or 2 million light-years)
- Assumptions like these are called model assumptions (Gaussian) and parameter priors (means)
- How do we incorporate these into learning?
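One standard way to fold a parameter prior into the learning, for the simplest case of a 1-d Gaussian with known variance, is a conjugate Normal prior on the mean; this is a general fact about Gaussians rather than something stated on the slide:

```latex
% Assume x_1, \dots, x_n \sim \mathcal{N}(\mu, \sigma^2) with \sigma^2 known,
% and a prior belief \mu \sim \mathcal{N}(\mu_0, \tau^2).  The posterior is
% \mu \mid x_{1:n} \sim \mathcal{N}(\mu_n, \tau_n^2), where
\mu_n = \frac{\sigma^2 \mu_0 + n \tau^2 \bar{x}}{\sigma^2 + n \tau^2},
\qquad
\tau_n^2 = \frac{\sigma^2 \tau^2}{\sigma^2 + n \tau^2}.
```

The posterior mean is a precision-weighted blend of the prior guess μ0 and the sample mean, so a stronger prior (small τ²) or fewer data points pulls the estimate toward μ0.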

5 minutes of math...
Our friend the Gaussian distribution. In 1 dimension:
  p(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))
- Mean: μ
- Std deviation: σ
- Both parameters are scalar
- Usually, we talk about the variance σ² rather than the std dev

5 minutes of math...
In d dimensions:
  p(x | μ, Σ) = (2π)^(−d/2) |Σ|^(−1/2) exp(−(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ))
where:
- Mean vector: μ (a d-dimensional vector)
- Covariance matrix: Σ (d × d, symmetric, positive definite)
- Determinant of covariance: |Σ|
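A short sketch evaluating this d-dimensional density numerically; the 2-d mean vector and covariance matrix below are assumed values for illustration only.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x, for x, mu in R^d and Sigma a d x d covariance matrix."""
    d = len(mu)
    diff = x - mu
    norm_const = (2.0 * np.pi) ** (-d / 2.0) * np.linalg.det(Sigma) ** (-0.5)
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    return norm_const * np.exp(-0.5 * quad)

# Assumed 2-d example values (height/weight style units), for illustration only:
mu = np.array([165.0, 68.0])
Sigma = np.array([[36.0, 20.0],
                  [20.0, 81.0]])
print(gaussian_pdf(np.array([170.0, 70.0]), mu, Sigma))
```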

Exercise
- For the 1-d Gaussian, given two classes with means μ1 and μ2 and std devs σ1 and σ2:
  - Find a description of the decision point if the std devs are the same but the means differ
  - And if the means are the same but the std devs differ
- For the d-dim Gaussian:
  - What shapes are the isopotentials? Why?
  - Repeat the above exercise for the d-dim Gaussian
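A worked sketch for the first part of the exercise, under the extra assumption of equal class priors:

```latex
% With equal priors and \sigma_1 = \sigma_2 = \sigma, the decision point solves
\log \frac{f(x \mid c_1)}{f(x \mid c_2)}
  = \frac{(x - \mu_2)^2 - (x - \mu_1)^2}{2\sigma^2}
  = \frac{(\mu_1 - \mu_2)\bigl(2x - \mu_1 - \mu_2\bigr)}{2\sigma^2}
  = 0
  \quad\Longrightarrow\quad
  x^{*} = \frac{\mu_1 + \mu_2}{2}.
```

So with equal priors and equal standard deviations, the decision point sits exactly halfway between the two class means.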