CSCI 121 Special Topics: Bayesian Networks
Lecture #4: Learning in Bayes Nets
Situation #1: Known Structure, All Variables Observable
[Network: Burglary, Earthquake → Alarm → John calls, Mary calls, shown with a table of observed True/False values for every variable in every training example]
Solution: Build probability tables directly from observations.
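For instance, a counting estimator along these lines builds each conditional probability table as a relative frequency. This is a minimal sketch in Python; the variable names and toy data are my own illustration, not from the lecture.

from itertools import product

# Each record is one fully observed training example: variable -> True/False.
data = [
    {"Burglary": False, "Earthquake": False, "Alarm": False,
     "JohnCalls": False, "MaryCalls": False},
    {"Burglary": True,  "Earthquake": False, "Alarm": True,
     "JohnCalls": True,  "MaryCalls": False},
    # ... more observed examples ...
]

def estimate_cpt(data, child, parents):
    """Estimate P(child = True | parents) as a relative frequency of counts."""
    cpt = {}
    for assignment in product([False, True], repeat=len(parents)):
        rows = [r for r in data
                if all(r[p] == v for p, v in zip(parents, assignment))]
        if rows:  # skip parent combinations that never occur in the data
            cpt[assignment] = sum(r[child] for r in rows) / len(rows)
    return cpt

# e.g. P(Alarm | Burglary, Earthquake) estimated from the observed examples
print(estimate_cpt(data, "Alarm", ["Burglary", "Earthquake"]))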
Situation #2: Known Structure, Some Variables Unobservable
[Same network, but one variable's column in the data (e.g., Alarm) is unknown: "???" instead of observed True/False values]
Solution: Bayesian learning through Maximum Likelihood Estimation
Bayesian Learning
Given: Data D, hypotheses H1, H2, ..., Hn
Want: A prediction for an unknown quantity X
E.g., D = almanac for the past 100 years; Hi = "chance of rain in May is 50%"; X = how much rain tomorrow?
Bayesian Learning
Maximum a posteriori (MAP) hypothesis HMAP: the Hi that maximizes P(Hi | D)
Use Bayes' Rule: P(Hi | D) = P(D | Hi) P(Hi) / P(D)
Bayesian Learning
For a given set of hypotheses, P(D) is fixed, so we are maximizing P(D | Hi) P(Hi).
P(Hi) is a philosophical issue: e.g., Ockham's Razor (the simplest consistent hypothesis is best).
So it all comes down to P(D | Hi): Maximum Likelihood Estimation.
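As a concrete illustration (a small sketch of my own, not from the slides), MAP selection over a discrete hypothesis set just compares log P(D | Hi) + log P(Hi), since P(D) is the same for every hypothesis:

import math

# Each hypothesis says the chance of rain on any given day;
# the data are observed rainy (1) / dry (0) days.
data = [1, 0, 0, 1, 1, 0, 1, 1]                 # 5 rainy days out of 8
hypotheses = {0.25: 0.3, 0.50: 0.4, 0.75: 0.3}  # prior P(Hi) for each Hi

def log_posterior(p_rain, prior):
    # log P(D | Hi) + log P(Hi); the common factor P(D) is ignored
    log_lik = sum(math.log(p_rain if d else 1 - p_rain) for d in data)
    return log_lik + math.log(prior)

h_map = max(hypotheses, key=lambda p: log_posterior(p, hypotheses[p]))
print("MAP hypothesis: chance of rain =", h_map)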
Maximum Likelihood Estimation
For many problems, P(D | Hi) can't be determined analytically (in one step, using algebra). In such cases, iterative gradient methods can be used to explore the space of possibilities.
For example, each Hi might be a set of conditional probability table values (a.k.a. weights).
We can visualize the effect of different weight values via a "hill-climbing" metaphor. Mathematically, this is gradient ascent on the likelihood (equivalently, gradient descent on the negative log-likelihood), and it involves ... calculus!
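A bare-bones illustration of the idea (my own sketch, not the lecture's code): treat one conditional probability value as a weight and nudge it along the gradient of the log-likelihood until it stops improving.

# data: observed True(1)/False(0) values of one node on examples where its
# parents took a fixed setting (the names are hypothetical, for illustration)
data = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
n_true, n_false = sum(data), len(data) - sum(data)

theta = 0.5          # initial guess for P(node = True | parent setting)
learning_rate = 0.01

for step in range(1000):
    # derivative of the log-likelihood  n_true*log(theta) + n_false*log(1-theta)
    gradient = n_true / theta - n_false / (1 - theta)
    theta += learning_rate * gradient          # climb the likelihood "hill"
    theta = min(max(theta, 1e-6), 1 - 1e-6)    # keep theta a valid probability

print(theta)   # converges to the relative frequency n_true / (n_true + n_false)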
Situation #3: Unknown Structure
[Only Burglary, Earthquake, John calls, and Mary calls are given, each with an observed True/False column; no Alarm node and no links are specified]
Solution: Hidden variables + structure learning
Learning Structure: "Hidden" Variables
Hidden variables: unknown internal factors that we posit to explain the relationship between observed inputs and outputs.
Why not just figure out conditional probabilities directly from observables (Burglary, Earthquake) to observables (John calls, Mary calls)?
Why Hidden Variables?
Hidden variables can yield a more compact model, making learning easier. For example, with a hidden Alarm node the network needs only P(B), P(E), P(A | B,E), P(J | A), and P(M | A), i.e. 10 independent probability values, whereas connecting the observables directly would require larger tables such as P(J | B,E) and P(M | B,E,J), i.e. 14 values.
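The arithmetic behind that comparison can be checked in a few lines (my own illustration; it assumes Boolean nodes, so a node with k parents needs 2**k independent CPT entries):

# Count independent CPT entries for the two structures.
def n_params(parents_per_node):
    return sum(2 ** k for k in parents_per_node.values())

with_hidden = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
               "JohnCalls": 1, "MaryCalls": 1}
without_hidden = {"Burglary": 0, "Earthquake": 0,
                  "JohnCalls": 2, "MaryCalls": 3}   # M depends on B, E, and J

print(n_params(with_hidden), n_params(without_hidden))   # 10 vs. 14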
Learning Structure
Problem: The number G(N) of possible structures (directed acyclic graphs) on N nodes grows explosively with N:
N = 1: G(N) = 1 (just A)
N = 2: G(N) = 3 (A and B unconnected, A → B, or A ← B)
N = 3: G(N) = 25; N = 4: G(N) = 543; N = 5: G(N) = 29,281; ...
Solution: Local (greedy hill-climbing) or global (Monte Carlo) search algorithms
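A rough sketch of the greedy (hill-climbing) option follows. This is my own simplification rather than the lecture's algorithm: it scores a structure with a BIC-style penalized log-likelihood, considers neighbors that add or delete one edge, and does not check that candidates stay acyclic, which a real implementation must.

import math
from itertools import product, permutations

def log_likelihood(data, structure):
    """structure maps each Boolean node to a tuple of its parents."""
    total = 0.0
    for node, parents in structure.items():
        for assignment in product([False, True], repeat=len(parents)):
            rows = [r for r in data
                    if all(r[p] == v for p, v in zip(parents, assignment))]
            if not rows:
                continue
            p_true = sum(r[node] for r in rows) / len(rows)
            for r in rows:
                p = p_true if r[node] else 1 - p_true
                if p > 0:
                    total += math.log(p)
    return total

def bic_score(data, structure):
    # penalize model size so extra edges must earn their keep
    n_params = sum(2 ** len(ps) for ps in structure.values())
    return log_likelihood(data, structure) - 0.5 * math.log(len(data)) * n_params

def neighbors(structure):
    # candidate structures differing by adding or deleting a single edge
    # (a full implementation would also reverse edges and reject cycles)
    for child, parent in permutations(structure, 2):
        new = {n: tuple(ps) for n, ps in structure.items()}
        if parent in new[child]:
            new[child] = tuple(p for p in new[child] if p != parent)
        else:
            new[child] = new[child] + (parent,)
        yield new

def hill_climb(data, structure):
    best, best_score = structure, bic_score(data, structure)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(best):
            s = bic_score(data, cand)
            if s > best_score:              # take the first improving move
                best, best_score, improved = cand, s, True
                break
    return best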