CSCI 5822 Probabilistic Models of Human and Machine Learning

CSCI 5822 Probabilistic Models of Human and Machine Learning
Mike Mozer
Department of Computer Science and Institute of Cognitive Science
University of Colorado at Boulder

Learning Structure and Parameters

The principle:
- Treat the network structure, S^h, as a discrete random variable
- Calculate the structure posterior
- Integrate over uncertainty in structure to predict

The practice:
- Computing the marginal likelihood, p(D|S^h), can be difficult
- Learning structure can be impractical due to the large number of hypotheses (more than exponential in the number of nodes)
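The equations behind these bullets are not in the transcript; a standard way to write them (following Heckerman's tutorial notation, reconstructed here rather than copied from the slide) is:

    p(S^h \mid D) \propto p(S^h)\, p(D \mid S^h)

    p(\mathbf{x}_{N+1} \mid D) = \sum_{S^h} p(S^h \mid D) \int p(\mathbf{x}_{N+1} \mid \theta_s, S^h)\, p(\theta_s \mid D, S^h)\, d\theta_s

i.e., predictions average over both structure uncertainty and parameter uncertainty.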

[Figure omitted; source: www.bayesnets.com]

Approaches to Structure Learning

- Model selection: find a good model and treat it as the correct model
- Selective model averaging: select a manageable number of candidate models and pretend that these models are exhaustive

Experimentally, both of these approaches produce good results, i.e., good generalization.

SLIDES STOLEN FROM DAVID HECKERMAN

Interpretation of Marginal Likelihood

Using the chain rule for probabilities, maximizing the marginal likelihood also maximizes sequential prediction ability!

Relation to leave-one-out cross validation. Problems with cross validation:
- it can overfit the data, possibly because of interchanges (each item is used for training and for testing each other item)
- it has a hard time dealing with temporal sequence data
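The chain-rule identity the slide relies on (standard, reconstructed here because the equation is missing from the transcript):

    p(D \mid S^h) = \prod_{n=1}^{N} p(\mathbf{x}_n \mid \mathbf{x}_1, \ldots, \mathbf{x}_{n-1}, S^h)

so a structure with high marginal likelihood is exactly one that predicts each case well from the cases that precede it.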

Coin Example
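The slide's figure is missing from the transcript. For a coin with a Beta(α_h, α_t) prior on θ = P(heads) and data D containing #h heads and #t tails (N = #h + #t), the closed-form marginal likelihood presumably being illustrated is:

    p(D \mid S^h) = \frac{\Gamma(\alpha_h + \alpha_t)}{\Gamma(\alpha_h + \alpha_t + N)} \cdot \frac{\Gamma(\alpha_h + \#h)}{\Gamma(\alpha_h)} \cdot \frac{\Gamma(\alpha_t + \#t)}{\Gamma(\alpha_t)}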

When the coin example is extended to a node in a Bayes net, α_h, α_t, #h, and #t are all indexed by these conditions, i.e., by the configuration of the node's parents.

[Slide shows the general marginal-likelihood formula, annotated with its indices: # parent configurations, # node states, # nodes]
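The annotated formula itself did not survive the transcript; the standard closed form (the Bayesian-Dirichlet marginal likelihood from Heckerman's tutorial, reconstructed here) is:

    p(D \mid S^h) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}

where n is the # of nodes, q_i the # of parent configurations of node i, r_i the # of states of node i, N_{ijk} the count of cases with node i in state k and its parents in configuration j, and α_{ij} = Σ_k α_{ijk}, N_{ij} = Σ_k N_{ijk}.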

Computation of Marginal Likelihood

An efficient closed-form solution exists if:
- there is no missing data (including no hidden variables),
- the parameters θ are mutually independent,
- the local distribution functions come from the exponential family (binomial, Poisson, gamma, Gaussian, etc.), and
- the priors are conjugate.

Computation of Marginal Likelihood

Approximation techniques must be used otherwise; e.g., for missing data one can use Gibbs sampling or the Gaussian approximation described earlier.

Using Bayes' theorem:
1. Evaluate the numerator directly and estimate the denominator using Gibbs sampling.
2. For large amounts of data, the numerator can be approximated by a multivariate Gaussian.
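The rearrangement of Bayes' theorem the slide refers to (the equation is missing from the transcript; this is the standard form in Heckerman's tutorial, valid for any fixed parameter configuration θ̃_s):

    p(D \mid S^h) = \frac{p(D \mid \tilde{\theta}_s, S^h)\, p(\tilde{\theta}_s \mid S^h)}{p(\tilde{\theta}_s \mid D, S^h)}

The numerator is easy to evaluate pointwise; Gibbs sampling estimates the denominator (the parameter posterior), while the Gaussian approach instead approximates the numerator, viewed as a function of θ, by a multivariate Gaussian so the integral over θ can be done analytically.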

Structure Priors

- Hypothesis equivalence: identify the equivalence class of a given network structure
- All possible structures equally likely
- Partial specification: required and prohibited arcs (based on causal knowledge)
- Ordering of variables + independence assumptions
  - ordering based on, e.g., temporal precedence
  - presence or absence of arcs mutually independent -> n(n-1)/2 arc priors
- p(m) ~ similarity(m, prior belief net)
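As a quick check of the n(n-1)/2 count (my arithmetic, not the slide's): with a fixed ordering of n = 5 variables there are 5·4/2 = 10 ordering-consistent pairs, each of which independently does or does not carry an arc, so the structure prior factors into 10 independent arc priors.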

Parameter Priors

- all uniform: Beta(1,1)
- use a prior belief net
- parameters depend only on local structure
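The slide does not show the construction, but the usual prior-network recipe in Heckerman's tutorial sets the Dirichlet pseudo-counts to

    \alpha_{ijk} = \alpha \cdot p(X_i = k, \mathbf{Pa}_i = j \mid S^h_c)

where α is an equivalent sample size and S^h_c is the prior belief net; since this quantity involves only node i and its parents, the parameter priors indeed depend only on local structure.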

Model Search

Finding the belief net structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995).

Sequential search:
- add, remove, or reverse arcs
- ensure no directed cycles
- efficient in that changes to arcs affect only some components of p(D|M)

Heuristic methods (see the sketch below):
- greedy
- greedy with restarts
- MCMC / simulated annealing
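As a concrete illustration of the greedy option (my sketch, not from the course materials), the Python below hill-climbs over DAGs: it scores candidate structures with the closed-form log marginal likelihood for discrete data under uniform Dirichlet priors (α = 1 per cell, an assumed choice) and accepts single arc additions, removals, or reversals that improve the score until no change helps. The variable names and the toy data are illustrative.

```python
# Minimal greedy hill-climbing structure search over DAGs.
# Score: closed-form log marginal likelihood for discrete data with
# uniform Dirichlet pseudo-counts (alpha = 1), decomposable over nodes.
import itertools
import math
from collections import defaultdict

def local_log_ml(data, child, parents, arity, alpha=1.0):
    """Log marginal-likelihood contribution of one node given its parent set."""
    counts = defaultdict(lambda: defaultdict(int))  # parent config -> child state -> count
    for row in data:
        j = tuple(row[p] for p in sorted(parents))
        counts[j][row[child]] += 1
    r = arity[child]                                # number of child states
    score = 0.0
    for state_counts in counts.values():            # configs with zero counts contribute 0
        n_ij = sum(state_counts.values())
        score += math.lgamma(r * alpha) - math.lgamma(r * alpha + n_ij)
        for k in range(r):
            n_ijk = state_counts.get(k, 0)
            score += math.lgamma(alpha + n_ijk) - math.lgamma(alpha)
    return score

def total_score(data, structure, arity):
    """Decomposable score: sum of per-node local scores."""
    return sum(local_log_ml(data, x, ps, arity) for x, ps in structure.items())

def is_acyclic(structure):
    """Check that the node -> parent-set map encodes a DAG."""
    def visit(node, seen):
        if node in seen:
            return False
        return all(visit(p, seen | {node}) for p in structure[node])
    return all(visit(n, set()) for n in structure)

def greedy_search(data, variables, arity):
    """Accept any single arc add/remove/reverse that improves the score,
    starting from the empty graph, until a local optimum is reached."""
    structure = {v: set() for v in variables}
    best = total_score(data, structure, arity)
    improved = True
    while improved:
        improved = False
        for x, y in itertools.permutations(variables, 2):
            for op in ("add", "remove", "reverse"):
                cand = {v: set(ps) for v, ps in structure.items()}
                if op == "add" and x not in cand[y]:
                    cand[y].add(x)
                elif op == "remove" and x in cand[y]:
                    cand[y].discard(x)
                elif op == "reverse" and x in cand[y]:
                    cand[y].discard(x)
                    cand[x].add(y)
                else:
                    continue
                if not is_acyclic(cand):
                    continue
                s = total_score(data, cand, arity)
                if s > best:                         # greedy: keep any improvement
                    best, structure, improved = s, cand, True
    return structure, best

if __name__ == "__main__":
    # Toy binary data: B copies A; C is independent noise.
    data = [{"A": 1, "B": 1, "C": 0}, {"A": 1, "B": 1, "C": 1},
            {"A": 0, "B": 0, "C": 1}, {"A": 0, "B": 0, "C": 0},
            {"A": 1, "B": 1, "C": 0}, {"A": 0, "B": 0, "C": 1}]
    arity = {"A": 2, "B": 2, "C": 2}
    structure, score = greedy_search(data, ["A", "B", "C"], arity)
    print("learned parent sets:", structure, " log score:", round(score, 2))
```

Because the score decomposes over nodes, a real implementation would cache local scores and rescore only the one or two nodes whose parent sets change, which is the efficiency point made on the slide.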

[Figure: the two most likely structures]

2 × 10^10