CSCI 121 Special Topics: Bayesian Networks Lecture #4: Learning in Bayes Nets

Situation #1: Known Structure, All Variables Observable
[Figure: the burglary network (Earthquake, Burglary, Alarm, John calls, Mary calls) alongside a training table of observed true/false values for every variable in every example.]
Solution: Build probability tables directly from observations.
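To make "build the tables directly" concrete, here is a minimal sketch (not from the slides) of estimating one conditional-probability-table entry by counting; the records and numbers are made up for illustration.

```python
# Sketch (not from the slides): estimating one entry of P(Alarm | Earthquake, Burglary)
# by counting fully observed records. The records below are made up.

records = [
    {"Earthquake": False, "Burglary": False, "Alarm": False},
    {"Earthquake": False, "Burglary": True,  "Alarm": True},
    {"Earthquake": True,  "Burglary": False, "Alarm": True},
    {"Earthquake": False, "Burglary": False, "Alarm": False},
    # ... more observed examples
]

def cpt_entry(records, child, value, parents):
    """Relative frequency of child == value among records matching the parent values."""
    matching = [r for r in records if all(r[p] == v for p, v in parents.items())]
    if not matching:
        return None  # no data for this parent configuration
    return sum(r[child] == value for r in matching) / len(matching)

# Maximum-likelihood estimate of P(Alarm = T | Earthquake = F, Burglary = T):
print(cpt_entry(records, "Alarm", True, {"Earthquake": False, "Burglary": True}))
```

With everything observed, each table entry is just a relative frequency, which is why this case needs no special machinery.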

Situation #2: Known Structure, Some Variables Unobservable
[Figure: the same network, but the Alarm column of the training table is unobserved (shown as "???"); the other variables are still recorded as true/false values.]
Solution: Bayesian learning through Maximum Likelihood Estimation
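Why counting no longer works: with Alarm hidden, the likelihood of each record has to sum over the values the hidden variable might have taken. A minimal sketch (not from the slides, with made-up CPT numbers) of that computation:

```python
# Sketch (not from the slides): when Alarm is unobserved, the likelihood of a
# record marginalizes over its possible values. All CPT numbers are made up.

P_E = {True: 0.002, False: 0.998}            # P(Earthquake)
P_B = {True: 0.001, False: 0.999}            # P(Burglary)
P_A = {                                       # P(Alarm = T | Earthquake, Burglary)
    (True, True): 0.95, (True, False): 0.29,
    (False, True): 0.94, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}               # P(JohnCalls = T | Alarm)
P_M = {True: 0.70, False: 0.01}               # P(MaryCalls = T | Alarm)

def likelihood(e, b, j, m):
    """P(E=e, B=b, J=j, M=m), summing out the hidden Alarm variable."""
    total = 0.0
    for a in (True, False):
        p_alarm = P_A[(e, b)] if a else 1 - P_A[(e, b)]
        p_j = P_J[a] if j else 1 - P_J[a]
        p_m = P_M[a] if m else 1 - P_M[a]
        total += P_E[e] * P_B[b] * p_alarm * p_j * p_m
    return total

print(likelihood(False, True, True, True))
```

Maximum Likelihood Estimation then means choosing the CPT numbers that make the product of these per-record likelihoods as large as possible.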

Bayesian Learning
Given: Data D, hypotheses H_1, H_2, ..., H_n
Want: Prediction for an unknown quantity X
E.g., D = almanac for the past 100 years; H_i = "chance of rain in May is 50%"; X = how much rain tomorrow?
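The standard way to form that prediction (not spelled out on the slide, and assuming X depends on the data only through the hypotheses) is to average the hypotheses' predictions, weighted by their posteriors:

```latex
% Bayesian prediction: weight each hypothesis's prediction by its posterior.
P(X \mid D) \;=\; \sum_{i=1}^{n} P(X \mid H_i)\, P(H_i \mid D)
```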

Bayesian Learning
Maximum a posteriori (MAP) hypothesis H_MAP: the H_i that maximizes P(H_i | D).
Use Bayes' Rule: P(H_i | D) = P(D | H_i) P(H_i) / P(D)

Bayesian Learning
For a given set of hypotheses, P(D) is fixed, so we are maximizing P(D | H_i) P(H_i).
Choosing the prior P(H_i) is largely a philosophical issue: e.g., Ockham's Razor (the simplest consistent hypothesis is best).
So it all comes down to P(D | H_i): Maximum Likelihood Estimation
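Spelled out (standard definitions, not shown on the slide): the MAP and maximum-likelihood choices differ only by the prior factor, so they coincide when P(H_i) is uniform.

```latex
% MAP keeps the prior; ML drops it (equivalently, assumes a uniform prior).
H_{\mathrm{MAP}} \;=\; \arg\max_{i}\, P(D \mid H_i)\, P(H_i)
\qquad
H_{\mathrm{ML}} \;=\; \arg\max_{i}\, P(D \mid H_i)
```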

Maximum Likelihood Estimation
For many problems, P(D | H_i) can't be determined analytically (in one step, using algebra).
In such cases, iterative gradient methods can be used to explore the space of possibilities.
For example, each H_i might be a set of conditional probability table values (a.k.a. weights).
We can visualize the effect of different weight values via a "hill-climbing" metaphor.
Mathematically, this is called gradient descent and involves ... calculus!
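As a toy illustration (not the course's code), here is gradient descent on the negative log-likelihood of a single Bernoulli weight; the data are made up, and a real Bayes-net version would adjust every CPT weight the same way.

```python
# Sketch (not from the slides): gradient descent on the negative log-likelihood
# of a single weight theta = P(X = True). The observations below are made up.
import math

data = [True, True, False, True, False, True, True, True]  # observed outcomes

def neg_log_likelihood(theta):
    return -sum(math.log(theta if x else 1 - theta) for x in data)

def gradient(theta):
    # Derivative of the negative log-likelihood with respect to theta.
    n_true = sum(data)
    n_false = len(data) - n_true
    return -n_true / theta + n_false / (1 - theta)

theta, lr = 0.5, 0.01
for _ in range(500):
    theta -= lr * gradient(theta)                 # step downhill
    theta = min(max(theta, 1e-6), 1 - 1e-6)       # keep theta inside (0, 1)

print(theta)  # converges toward the ML estimate, sum(data) / len(data) = 0.75
```

Here the answer could of course be found analytically; the point of the sketch is the downhill-stepping procedure used when it cannot.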

Situation #3: Unknown Structure
[Figure: training data for the observable variables only (Earthquake, Burglary, John calls, Mary calls); no network structure connecting them is given.]
Solution: Hidden variables + structure learning

Learning Structure: "Hidden" Variables
Hidden variables: unknown internal factors that we posit to explain the relationship between observed inputs and outputs.
Why not just figure out conditional probabilities directly from observables (burglary, earthquake) to observables (John calls, Mary calls)?

Why Hidden Variables?
Hidden variables can yield a more compact model (fewer conditional-probability parameters to estimate), making learning easier.
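A rough illustration (hypothetical structure, not the slide's figure): with boolean variables, routing several causes through one hidden node needs far fewer CPT entries than wiring every cause to every effect directly.

```python
# Sketch (not from the slides): counting free CPT parameters to show why a
# hidden variable can make a model more compact. The structure below is
# hypothetical: 4 boolean causes and 3 boolean effects.

def cpt_params(n_parents):
    """Free parameters in a boolean node's CPT: one per parent configuration."""
    return 2 ** n_parents

# Without a hidden node: each of the 3 effects depends on all 4 causes.
direct = 4 * cpt_params(0) + 3 * cpt_params(4)

# With one hidden node: causes feed the hidden node; effects depend only on it.
hidden = 4 * cpt_params(0) + cpt_params(4) + 3 * cpt_params(1)

print(direct, hidden)   # 52 vs. 26 here; the gap widens quickly as fan-in grows
```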

Learning Structure
Problem: the number G(N) of possible graph structures over N nodes grows explosively with N:
N = 1: G(N) = 1 (just A)
N = 2: G(N) = 3 (A and B unconnected, A → B, A ← B)
... and the count keeps exploding for larger N.
Solution: Local (greedy hill-climbing) or global (Monte Carlo) algorithms
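A minimal sketch (not from the course) of the local, greedy option: hill-climb over edge sets, keeping whichever single-edge change most improves a score. The scoring function here is a stand-in for a real data-fit score such as a penalized log-likelihood.

```python
# Sketch (not from the slides): greedy hill-climbing over network structures.
# `score(edges)` is assumed to return a fitness value for a candidate edge set;
# a dummy score is used in the example at the bottom.

from itertools import permutations

def is_acyclic(nodes, edges):
    """Reject structures with directed cycles (simple DFS, fine for small N)."""
    def visit(n, seen):
        if n in seen:
            return False
        return all(visit(b, seen | {n}) for a, b in edges if a == n)
    return all(visit(n, frozenset()) for n in nodes)

def neighbors(nodes, edges):
    """Structures reachable by adding or deleting one edge."""
    for a, b in permutations(nodes, 2):
        e = (a, b)
        candidate = edges - {e} if e in edges else edges | {e}
        if is_acyclic(nodes, candidate):
            yield candidate

def hill_climb(nodes, score, edges=frozenset()):
    while True:
        best = max(neighbors(nodes, edges), key=score, default=edges)
        if score(best) <= score(edges):
            return edges            # local maximum reached
        edges = best

# Usage with a made-up score that simply rewards two particular edges:
nodes = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]

def dummy_score(es):
    return len(es & {("Burglary", "Alarm"), ("Alarm", "JohnCalls")})

print(hill_climb(nodes, dummy_score))
```

Greedy search like this can stall at a local maximum, which is exactly why the slide also mentions global (Monte Carlo) alternatives.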