AI in Game Programming, IT University of Copenhagen: Statistical Learning Methods. Marco Loog.

Presentation transcript:

Statistical Learning Methods, Marco Loog

Introduction  Agents can handle uncertainty by using the methods of probability and decision theory  But first they must learn their probabilistic theories of the world from experience...

Key Concepts  Data: evidence, i.e., instantiation of one or more random variables describing the domain  Hypotheses: probabilistic theories of how the domain works

Outline  Bayesian learning  Maximum a posteriori and maximum likelihood learning  Instance-based learning  Neural networks

Bayesian Learning  Let D be all data, with observed value d; then the probability of a hypothesis h_i, using Bayes' rule, is P(h_i | d) = α P(d | h_i) P(h_i)  For prediction about a quantity X: P(X | d) = Σ_i P(X | d, h_i) P(h_i | d) = Σ_i P(X | h_i) P(h_i | d)

Bayesian Learning  For prediction about a quantity X: P(X | d) = Σ_i P(X | d, h_i) P(h_i | d) = Σ_i P(X | h_i) P(h_i | d)  No single best-guess hypothesis

Bayesian Learning  Simply calculates the probability of each hypothesis, given the data, and makes predictions based on this  I.e., predictions are based on all hypotheses, weighted by their probabilities, rather than on only a 'single best' hypothesis

Candy  Suppose five kinds of bags of candies  10% are h1: 100% cherry candies  20% are h2: 75% cherry candies + 25% lime candies  40% are h3: 50% cherry candies + 50% lime candies  20% are h4: 25% cherry candies + 75% lime candies  10% are h5: 100% lime candies  We observe candies drawn from some bag

Mo' Candy  We observe candies drawn from some bag  Assume observations are i.i.d., e.g. because there are many candies in the bag  Assume we don't like the green lime candy  Important questions  What kind of bag is it? h1, h2, ..., h5?  What flavor will the next candy be?
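Both questions can be answered directly with Bayes' rule. Below is a minimal Python sketch of the candy example; the priors and per-bag lime probabilities come from the slide above, while the ten-lime observation sequence is an arbitrary illustration.

```python
# Minimal sketch of Bayesian learning on the candy example.
# Each hypothesis h_i fixes P(lime) for candies drawn from the bag.
priors = {"h1": 0.10, "h2": 0.20, "h3": 0.40, "h4": 0.20, "h5": 0.10}
p_lime = {"h1": 0.00, "h2": 0.25, "h3": 0.50, "h4": 0.75, "h5": 1.00}

def bayes_update(posterior, observation):
    """One Bayes-rule step: P(h_i | d) is proportional to P(d | h_i) P(h_i)."""
    unnorm = {h: (p_lime[h] if observation == "lime" else 1.0 - p_lime[h]) * p
              for h, p in posterior.items()}
    z = sum(unnorm.values())
    return {h: u / z for h, u in unnorm.items()}

def predict_lime(posterior):
    """P(next = lime | d) = sum_i P(lime | h_i) P(h_i | d)."""
    return sum(p_lime[h] * p for h, p in posterior.items())

posterior = dict(priors)
for candy in ["lime"] * 10:                 # suppose we keep drawing limes
    posterior = bayes_update(posterior, candy)
    print({h: round(p, 3) for h, p in posterior.items()},
          round(predict_lime(posterior), 3))
```

Running this shows the posterior mass shifting toward h5 and P(next = lime | d) approaching 1 as limes keep appearing.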

Posterior Probability of Hypotheses

Posterior Probability of Hypotheses  The true hypothesis will eventually dominate the Bayesian prediction [the prior is of no influence in the long run]  More importantly [maybe not for us?]: the Bayesian prediction is optimal

The Price for Being Optimal  For real learning problems the hypothesis space is large, possibly infinite  The summation / integration over hypotheses cannot be carried out  Resort to approximate or simplified methods

Maximum A Posteriori  Common approximation method: make predictions based on the single most probable hypothesis  I.e., take the h_i that maximizes P(h_i | d)  Such a MAP hypothesis is approximately Bayesian, i.e., P(X | d) ≈ P(X | h_MAP) [the more evidence, the better the approximation]
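As a rough illustration of how close the MAP shortcut can be to the full Bayesian prediction, here is a small sketch; the posterior values below are made-up illustration numbers, not computed from any particular observation sequence.

```python
# MAP approximation versus the full Bayesian prediction on the candy hypotheses.
# Posterior values are illustrative only.
posterior = {"h1": 0.00, "h2": 0.05, "h3": 0.25, "h4": 0.40, "h5": 0.30}
p_lime    = {"h1": 0.00, "h2": 0.25, "h3": 0.50, "h4": 0.75, "h5": 1.00}

h_map = max(posterior, key=posterior.get)                      # argmax_i P(h_i | d)
map_pred   = p_lime[h_map]                                     # P(X | h_MAP)
bayes_pred = sum(p_lime[h] * p for h, p in posterior.items())  # P(X | d)

print(h_map, map_pred, bayes_pred)   # h4, 0.75 vs. 0.7375
```

With a reasonably peaked posterior the two predictions nearly coincide, which is the sense in which MAP is 'approximately Bayesian'.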

Hypothesis Prior  Both in Bayesian learning and in MAP learning, the hypothesis prior plays an important role  If the hypothesis space is too expressive, overfitting can occur [see also Chapter 18]  The prior is used to penalize complexity [instead of explicitly limiting the space]: the more complex the hypothesis, the lower the prior probability  If enough evidence is available, a complex hypothesis is eventually chosen [if necessary]

Maximum Likelihood Approximation  For enough data, the prior becomes irrelevant  Maximum likelihood [ML] learning: choose the h_i that maximizes P(d | h_i)  I.e., simply get the best fit to the data  Identical to MAP for a uniform prior P(h_i)  Also reasonable if all hypotheses are of the same complexity  ML is the 'standard' [non-Bayesian / 'classical'] statistical learning method

E.g.  Bag from a new manufacturer; fraction θ of red cherry candies; any θ between 0 and 1 is possible  Suppose we unwrap N candies: c cherries and l = N − c limes  Likelihood: P(d | h_θ) = θ^c (1 − θ)^l  Maximize for θ using the log likelihood L(θ) = c log θ + l log(1 − θ); setting dL/dθ = 0 gives θ = c / N
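A quick numerical check of this closed form, sketched in Python; the counts c = 7 and l = 3 are arbitrary example values.

```python
import numpy as np

# ML estimate of the cherry fraction theta: c cherries and l limes in N draws.
c, l = 7, 3                                   # example counts (illustration only)
N = c + l

thetas = np.linspace(1e-6, 1 - 1e-6, 100001)
log_lik = c * np.log(thetas) + l * np.log(1 - thetas)   # log P(d | h_theta)

print(thetas[np.argmax(log_lik)])   # numerical maximizer, about 0.7
print(c / N)                        # analytic ML estimate: theta = c / N
```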

E.g. 2  Gaussian model [often denoted by N(µ, σ)]  The log likelihood is given by L(µ, σ) = −(N/2) log(2πσ²) − Σ_j (x_j − µ)² / (2σ²)  If σ is known, find the maximum likelihood estimate for µ [the sample mean]  If µ is known, find the maximum likelihood estimate for σ [σ² = (1/N) Σ_j (x_j − µ)²]
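A sketch of these estimators on synthetic data; the true µ = 2.0 and σ = 1.5 below are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # synthetic sample; true mu = 2.0, sigma = 1.5

# Setting the derivatives of the log likelihood to zero gives the closed forms below.
mu_ml = x.mean()                                 # ML estimate of mu: the sample mean
sigma_ml = np.sqrt(np.mean((x - mu_ml) ** 2))    # ML estimate of sigma (mu plugged in)

print(mu_ml, sigma_ml)                           # close to 2.0 and 1.5
```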

Halfway Summary and Additional Remarks  Full Bayesian learning gives best possible predictions but is intractable  MAP selects single best hypothesis; prior is still used  Maximum likelihood assumes uniform prior, OK for large data sets  Choose parameterized family of models to describe the data  Write down likelihood of data as function of parameters  Write down derivative of log likelihood w.r.t. each parameter  Find parameter values such that the derivatives are zero  ML estimation may be hard / impossible; modern optimization techniques help  In games, data often becomes available sequentially; not necessary to train in one go
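For the last point, a minimal sketch of sequential estimation: the ML estimate of a mean can be updated one sample at a time instead of refitting on the whole data set. The sample values are arbitrary illustration numbers.

```python
class RunningMean:
    """Incremental ML estimate of a mean: mu_n = mu_{n-1} + (x_n - mu_{n-1}) / n."""

    def __init__(self):
        self.mu, self.n = 0.0, 0

    def observe(self, x):
        self.n += 1
        self.mu += (x - self.mu) / self.n
        return self.mu

est = RunningMean()
for sample in [2.1, 1.8, 2.4, 2.0]:      # samples arriving one at a time
    print(est.observe(sample))           # equals the batch mean of the samples seen so far
```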

Outline  Bayesian learning √  Maximum a posteriori and maximum likelihood learning √  Instance-based learning  Neural networks

Instance-Based Learning  So far we saw statistical learning as parameter learning, i.e., given a specific parameter-dependent family of probability models, fit it to the data by tweaking parameters  Often simple and effective  Fixed complexity  Maybe good for very little data

Instance-Based Learning  So far we saw statistical learning as parameter learning  Nonparametric learning methods allow the hypothesis complexity to grow with the data  "The more data we have, the 'wigglier' the hypothesis can be"

Nearest-Neighbor Method  Key idea: properties of an input point x are likely to be similar to those of points in the neighborhood of x  E.g. classification: estimate the unknown class of x using the classes of neighboring points  Simple, but how does one define what a neighborhood is?  One solution: find the k nearest neighbors  But now the problem is how to decide what nearest is...

k Nearest-Neighbor Classification  Check the class / output label of your k nearest neighbors and simply take [for example] the number of neighbors having class label x, divided by k, as the posterior probability of class label x  When assigning a single label: take the MAP label!
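A minimal sketch of this procedure in Python; the toy 2-D data set, the Euclidean distance, and k = 3 are illustrative choices.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    """k-nearest-neighbor classification with Euclidean distance.

    posterior[c] is the fraction of the k nearest neighbors with label c,
    used as an estimate of P(c | x); the MAP label is the most frequent one.
    """
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    posterior = dict(zip(labels, counts / k))
    return max(posterior, key=posterior.get), posterior

# Toy 2-D data: two small clusters with labels 0 and 1 (illustration only).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([0.8, 0.8]), X, y, k=3))   # MAP label 1
```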

kNN Probability Density Estimation

Kernel Models  Idea: put a little density function [a kernel] on every data point and take the [normalized] sum of these  Somewhat similar to kNN  Often provides comparable performance
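A minimal one-dimensional sketch with Gaussian kernels; the data points and the bandwidth are illustrative choices.

```python
import numpy as np

def kernel_density(x, data, bandwidth=0.5):
    """Parzen / kernel density estimate: the average of a small Gaussian placed on
    every data point, evaluated at x."""
    z = (x - data) / bandwidth
    return np.mean(np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi)))

data = np.array([1.1, 1.3, 0.9, 2.8, 3.1, 3.0])    # toy 1-D sample (illustration only)
for query in (1.0, 2.0, 3.0):
    print(query, kernel_density(query, data))      # density is high near the two clusters
```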

Probability Density Estimation

Outline  Bayesian learning √  Maximum a posteriori and maximum likelihood learning √  Instance-based learning √  Neural networks

Neural Networks and Games

So First... Neural Networks  According to Robert Hecht-Nielsen, a neural network is simply "a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs"  Simply...  We skip the biology for now  And provide the bare basics
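In the spirit of Hecht-Nielsen's definition, one such 'simple processing element' can be sketched in a few lines: a weighted sum of the inputs squashed by a sigmoid. The weights and inputs below are arbitrary illustration values.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One simple processing element: weighted sum of inputs, squashed by a sigmoid."""
    activation = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-activation))

# Illustration only: two inputs and hand-picked weights.
print(neuron(np.array([0.5, -1.0]), np.array([2.0, 1.0]), bias=0.1))   # about 0.52
```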