Some Neat Results From Assignment 1. Assignment 1: Negative Examples (Rohit)

Some Neat Results From Assignment 1

Assignment 1: Negative Examples (Rohit)

Assignment 1: Noisy Observations (Nick)
Z: true feature vector
X: noisy observation, X ~ Normal(z, σ²)
We need to compute P(X|H)
Φ: cumulative distribution function of the Gaussian
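One way P(X|H) could be computed with Φ is sketched below. It is a minimal sketch under assumptions not stated on the slide: a single feature dimension, a hypothesis H that is an interval [a, b], and z uniform on that interval. The names `normal_cdf` and `likelihood_noisy` are hypothetical.

```python
import math

def normal_cdf(u):
    """Standard Gaussian CDF, Phi(u), via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def likelihood_noisy(x, a, b, sigma):
    """p(x | H) assuming z ~ Uniform(a, b) and x ~ Normal(z, sigma^2).

    Integrating the Gaussian density over z in [a, b] gives a
    difference of two Gaussian CDF values, divided by the interval width.
    """
    return (normal_cdf((b - x) / sigma) - normal_cdf((a - x) / sigma)) / (b - a)

if __name__ == "__main__":
    # With very little noise, an observation inside [0, 1] has density near 1/(b - a).
    print(likelihood_noisy(0.5, 0.0, 1.0, 0.001))
    # With more noise, observations just outside the interval keep some density.
    print(likelihood_noisy(1.2, 0.0, 1.0, 0.3))
```

As sigma shrinks, this approaches the noise-free likelihood: 1/(b - a) inside the interval and 0 outside.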

Assignment 1: Noisy Observations (Nick)

Guidance on Assignment 3

Guidance: Assignment 3 Part 1
MATLAB functions in the Statistics Toolbox: betacdf, betapdf, betarnd, betastat, betafit

Guidance: Assignment 3 Part 2
You will explore the role of the priors.
The Weiss model showed that priors play an important role when
- observations are noisy
- observations don’t provide strong constraints
- there aren’t many observations.

Guidance: Assignment 3 Part 3
Implement a model a bit like Weiss et al. (2002)
Goal: infer motion (velocity) of a rigid shape from observations at two instants in time.
Assume distinctive features that make it easy to identify the location of the feature at successive times.

Assignment 2 Guidance
Bx: the x displacement of the blue square (= delta x in one unit of time)
By: the y displacement of the blue square
Rx: the x displacement of the red square
Ry: the y displacement of the red square
These observations are corrupted by measurement noise: Gaussian, mean zero, std deviation σ
D: direction of motion (up, down, left, right)
Assume only possibilities are one unit of motion in any direction

Assignment 2: Generative Model Same assumptions for Bx, By. Rx conditioned on D=up is drawn from a Gaussian

Assignment 2 Math Conditional independence

Assignment 2 Implementation
Quiz: do we need to worry about the Gaussian density function normalization term?
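The Assignment 2 slides above can be tied together in a rough sketch. It assumes that Rx, Ry, Bx, By are conditionally independent given D, that "up" means +1 in y and "right" means +1 in x (the sign convention is an assumption), and that all four observations share the same σ. Under that last assumption the Gaussian constant 1/(σ√(2π)) is the same for every direction and cancels when the posterior is normalized, so the sketch leaves it out. The names `MEANS` and `direction_posterior` are hypothetical.

```python
import math

# Assumed mean displacement (x, y) for each direction of motion.
MEANS = {
    "up": (0.0, 1.0),
    "down": (0.0, -1.0),
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
}

def direction_posterior(rx, ry, bx, by, sigma, prior=None):
    """P(D | Rx, Ry, Bx, By) with independent Gaussian noise on each observation."""
    if prior is None:
        prior = {d: 1.0 / len(MEANS) for d in MEANS}
    log_score = {}
    for d, (mx, my) in MEANS.items():
        # Sum of squared errors over the four observations; the shared
        # normalization constant of the Gaussian density is omitted.
        sq_err = (rx - mx) ** 2 + (bx - mx) ** 2 + (ry - my) ** 2 + (by - my) ** 2
        log_score[d] = math.log(prior[d]) - sq_err / (2.0 * sigma ** 2)
    top = max(log_score.values())  # subtract the max for numerical stability
    unnorm = {d: math.exp(s - top) for d, s in log_score.items()}
    total = sum(unnorm.values())
    return {d: v / total for d, v in unnorm.items()}

if __name__ == "__main__":
    post = direction_posterior(rx=0.1, ry=0.9, bx=-0.2, by=1.1, sigma=0.5)
    for d, p in sorted(post.items(), key=lambda kv: -kv[1]):
        print(d, round(p, 4))
```

If the observations had different noise levels per direction, the constant would no longer cancel and would have to be kept.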

Introduction To Bayes Nets (Stuff stolen from Kevin Murphy, UBC, and Nir Friedman, HUJI)

What Do You Need To Do Probabilistic Inference In A Given Domain? Joint probability distribution over all variables in domain

Bayes Nets (a.k.a. Belief Nets)
Compact representation of joint probability distributions via conditional independence
Qualitative part: directed acyclic graph (DAG)
- Nodes: random vars.
- Edges: direct influence
Quantitative part: set of conditional probability distributions
- e.g., table of P(A | E,B) for the family of Alarm
Together: define a unique distribution in a factored form
Figure from N. Friedman: network over Earthquake, Radio, Burglary, Alarm, Call

What Is A Bayes Net?
Figure: network over Earthquake, Radio, Burglary, Alarm, Call
A node is conditionally independent of its non-descendants given its parents.
E.g., C is conditionally independent of R, E, and B given A
Notation: C ⊥ R,B,E | A
Quiz: What sort of parameter reduction do we get?
From 2^5 – 1 = 31 parameters to 1+1+2+4+2 = 10
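The parameter count in the quiz can be checked with a short sketch. It assumes all five variables are binary and uses the usual edge structure for this example (Burglary → Alarm, Earthquake → Alarm, Earthquake → Radio, Alarm → Call), which the transcript does not spell out. `PARENTS` and `num_free_parameters` are hypothetical names.

```python
# Assumed structure of the five-node example network.
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Radio": ["Earthquake"],
    "Alarm": ["Burglary", "Earthquake"],
    "Call": ["Alarm"],
}

def num_free_parameters(parents):
    """Free parameters of a Bayes net over binary variables.

    Each node needs one probability per joint setting of its parents,
    i.e. 2 ** (number of parents).
    """
    return sum(2 ** len(ps) for ps in parents.values())

if __name__ == "__main__":
    factored = num_free_parameters(PARENTS)
    full_joint = 2 ** len(PARENTS) - 1
    print("factored:", factored)      # 1 + 1 + 2 + 4 + 2
    print("full joint:", full_joint)  # 2^5 - 1
```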

Conditional Distributions Are Flexible
E.g., Earthquake and Burglary might have independent effects on Alarm
A.k.a. noisy-or
B E | P(A|B,E)
0 0 | 0
0 1 | p_E
1 0 | p_B
1 1 | p_E + p_B − p_E p_B
where p_B and p_E are alarm probability given burglary and earthquake alone
This constraint reduces # free parameters to 8!
Figure: Earthquake → Alarm ← Burglary
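The noisy-or table can be generated from just the two parameters p_B and p_E, as in the sketch below. The numeric values are made up for illustration, and `noisy_or` is a hypothetical helper name.

```python
def noisy_or(active_probs):
    """P(effect = 1) when each active cause independently triggers the effect.

    The effect stays off only if every active cause fails to trigger it.
    """
    p_all_fail = 1.0
    for p in active_probs:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail

if __name__ == "__main__":
    p_b, p_e = 0.9, 0.3  # illustrative values, not from the slides
    for b in (0, 1):
        for e in (0, 1):
            active = [p for p, on in ((p_b, b), (p_e, e)) if on]
            print(b, e, noisy_or(active))
```

With both causes active the result equals p_E + p_B − p_E p_B, matching the last row of the table.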

A Real Bayes Net: Alarm
Domain: Monitoring Intensive-Care Patients
37 variables
509 parameters … instead of 2^37
Figure from N. Friedman. Nodes: PCWP, CO, HRBP, HREKG, HRSAT, ERRCAUTER, HR, HISTORY, CATECHOL, SAO2, EXPCO2, ARTCO2, VENTALV, VENTLUNG, VENITUBE, DISCONNECT, MINVOLSET, VENTMACH, KINKEDTUBE, INTUBATION, PULMEMBOLUS, PAP, SHUNT, ANAPHYLAXIS, MINOVL, PVSAT, FIO2, PRESS, INSUFFANESTH, TPR, LVFAILURE, ERRBLOWOUTPUT, STROEVOLUME, LVEDVOLUME, HYPOVOLEMIA, CVP, BP

More Real-World Bayes Net Applications
“Microsoft’s competitive advantage lies in its expertise in Bayesian networks” -- Bill Gates, quoted in LA Times, 1996
MS Answer Wizards, (printer) troubleshooters
Medical diagnosis
Speech recognition (HMMs)
Gene sequence/expression analysis
Turbocodes (channel coding)

Why Are Bayes Nets Useful?
Factored representation may have exponentially fewer parameters than full joint
- Easier inference (lower time complexity)
- Less data required for learning (lower sample complexity)
Graph structure supports
- Modular representation of knowledge
- Local, distributed algorithms for inference and learning
- Intuitive (possibly causal) interpretation
- Strong theory about the nature of cognition or the generative process that produces observed data
  Can’t represent arbitrary contingencies among variables, so theory can be rejected by data

Reformulating Naïve Bayes As Graphical Model
Figure: D with children Rx, Ry, Bx, By; survive with children Age, Class, Gender
Equation annotations: marginalizing over D; definition of conditional probability

Review: Bayes Net
Nodes = random variables
Links = expression of joint distribution
Compare to full joint distribution by chain rule
Figure: network over Earthquake, Radio, Burglary, Alarm, Call
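The comparison with the chain rule can be written out for the five-node example. This is a sketch that assumes the usual edges for this network (B → A, E → A, E → R, A → C), which the transcript does not list explicitly. The chain rule holds for any distribution. The second line holds only under the conditional independencies encoded by the graph.

```latex
% Chain rule, ordering B, E, A, R, C (always valid):
P(B,E,A,R,C) = P(B)\,P(E \mid B)\,P(A \mid B,E)\,P(R \mid B,E,A)\,P(C \mid B,E,A,R)

% Bayes net factorization (assumed edges B -> A, E -> A, E -> R, A -> C):
P(B,E,A,R,C) = P(B)\,P(E)\,P(A \mid B,E)\,P(R \mid E)\,P(C \mid A)
```

Each factor in the second line drops the conditioning variables that are not parents of the node.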

Bayesian Analysis
Make inferences from data using probability models about quantities we want to predict
- E.g., expected age of death given 51 yr old
- E.g., latent topics in document
- E.g., What direction is the motion?
1. Set up full probability model that characterizes distribution over all quantities (observed and unobserved)
- incorporates prior beliefs
2. Condition model on observed data to compute posterior distribution
3. Evaluate fit of model to data
- adjust model parameters to achieve better fits

Inference
Computing posterior probabilities
– Probability of hidden events given any evidence
Most likely explanation
– Scenario that explains evidence
Rational decision making
– Maximize expected utility
– Value of Information
Effect of intervention
– Causal analysis
Explaining away effect
Figure from N. Friedman: network over Earthquake, Radio, Burglary, Alarm, Call
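Posterior computation and the explaining away effect can be illustrated by brute-force enumeration on a three-node fragment (Burglary → Alarm ← Earthquake). All probabilities below are made-up values for illustration, and `joint` and `posterior_burglary` are hypothetical names. This is a sketch of inference by enumeration, not an efficient algorithm.

```python
from itertools import product

# Illustrative (made-up) parameters.
P_B = 0.01  # P(Burglary = 1)
P_E = 0.02  # P(Earthquake = 1)
P_A = {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95}  # P(Alarm = 1 | B, E)

def joint(b, e, a):
    """P(B=b, E=e, A=a) from the factored form P(B) P(E) P(A | B, E)."""
    pb = P_B if b else 1.0 - P_B
    pe = P_E if e else 1.0 - P_E
    pa = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
    return pb * pe * pa

def posterior_burglary(evidence):
    """P(B = 1 | evidence); evidence maps 'e' and/or 'a' to 0 or 1."""
    num = den = 0.0
    for b, e, a in product((0, 1), repeat=3):
        world = {"b": b, "e": e, "a": a}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # skip worlds inconsistent with the evidence
        p = joint(b, e, a)
        den += p
        if b:
            num += p
    return num / den

if __name__ == "__main__":
    print("P(B=1)            =", posterior_burglary({}))
    print("P(B=1 | A=1)      =", posterior_burglary({"a": 1}))
    print("P(B=1 | A=1, E=1) =", posterior_burglary({"a": 1, "e": 1}))
```

Observing the alarm raises the probability of a burglary. Additionally observing an earthquake lowers it again, because the earthquake already accounts for the alarm.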

Conditional Independence
A node is conditionally independent of its non-descendants given its parents.
- Example?
What about conditional independence between variables that aren’t directly connected?
- e.g., Earthquake and Burglary?
- e.g., Burglary and Radio?
Figure: network over Earthquake, Radio, Burglary, Alarm, Call

d-separation
Criterion for deciding if nodes are conditionally independent.
A path from node u to node v is d-separated by a node z if the path matches one of these templates:
- u → z → v, z observed
- u ← z ← v, z observed
- u ← z → v, z observed
- u → z ← v, z unobserved (and no descendant of z observed)

d-separation
Think about d-separation as breaking a chain. If any link on a chain is broken, the whole chain is broken.
(Figure: the u, z, v templates and the corresponding x, z, y templates.)

d-separation Along Paths
Are u and v d-separated?
(Figure: example paths between u and v through nodes z, labeled "d separated" and "Not d separated".)

Conditional Independence
Nodes u and v are conditionally independent given set Z if all (undirected) paths between u and v are d-separated by Z.
(Figure: example with nodes u, v, and several nodes z.)
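The path-based criterion can be turned into a small checker, sketched here for tiny graphs only. It enumerates simple undirected paths and treats a path as blocked if any interior node blocks it: a chain or fork node that is observed, or a collider that is unobserved and has no observed descendant. The edge structure of the five-node example is an assumption, and the function names are hypothetical. Enumerating paths is exponential in general, so this is for illustration rather than practical use.

```python
PARENTS = {  # assumed structure of the five-node example
    "Burglary": [],
    "Earthquake": [],
    "Radio": ["Earthquake"],
    "Alarm": ["Burglary", "Earthquake"],
    "Call": ["Alarm"],
}

def descendants(children, node):
    """All nodes reachable from `node` by following edges forward."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def d_separated(parents, u, v, observed):
    """True if every simple undirected path from u to v is blocked given `observed`."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    neighbors = {n: set(parents[n]) | set(children.get(n, [])) for n in parents}
    observed = set(observed)

    def blocked(a, z, b):
        # z is a collider on a - z - b if both edges point into z.
        if a in parents[z] and b in parents[z]:
            return z not in observed and not (descendants(children, z) & observed)
        return z in observed  # chain or fork: blocked when z is observed

    def active_path_exists(path):
        last = path[-1]
        if last == v:
            return True
        for nxt in neighbors[last]:
            if nxt in path:
                continue
            if len(path) >= 2 and blocked(path[-2], last, nxt):
                continue
            if active_path_exists(path + [nxt]):
                return True
        return False

    return not active_path_exists([u])

if __name__ == "__main__":
    print(d_separated(PARENTS, "Burglary", "Earthquake", set()))
    print(d_separated(PARENTS, "Burglary", "Earthquake", {"Alarm"}))
    print(d_separated(PARENTS, "Burglary", "Radio", {"Alarm"}))
```

On the assumed structure, Burglary and Earthquake are d-separated with no evidence but become dependent once Alarm (or its descendant Call) is observed.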

(Figure, shown on two slides: the Alarm network, with the same nodes as in "A Real Bayes Net: Alarm".)

Sufficiency For Conditional Independence: Markov Blanket
The Markov blanket of node u consists of the parents, children, and children’s parents of u
P(u | MB(u), v) = P(u | MB(u))
(Figure: node u and its Markov blanket.)
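The definition translates directly into a few lines of code. The sketch below assumes the graph is given as a mapping from each node to its list of parents and uses the assumed edge structure of the five-node example. `markov_blanket` is a hypothetical name.

```python
PARENTS = {  # assumed structure of the five-node example
    "Burglary": [],
    "Earthquake": [],
    "Radio": ["Earthquake"],
    "Alarm": ["Burglary", "Earthquake"],
    "Call": ["Alarm"],
}

def markov_blanket(parents, u):
    """Parents, children, and children's other parents of node u."""
    children = [c for c, ps in parents.items() if u in ps]
    blanket = set(parents[u]) | set(children)
    for c in children:
        blanket |= set(parents[c])  # co-parents of each child
    blanket.discard(u)
    return blanket

if __name__ == "__main__":
    for node in PARENTS:
        print(node, sorted(markov_blanket(PARENTS, node)))
```

For Earthquake the blanket includes Burglary, which is neither a parent nor a child but shares the child Alarm.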

Probabilistic Models
Probabilistic models
  Graphical models
    Directed (Bayesian belief nets): Alarm network, State-space models, HMMs, Naïve Bayes classifier, PCA/ICA
    Undirected (Markov nets): Markov Random Field, Boltzmann machine, Ising model, Max-ent model, Log-linear models

Turning A Directed Graphical Model Into An Undirected Model Via Moralization
Moralization: connect all parents of each node and remove arrows
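Moralization as described can be sketched in a few lines. The input format (node mapped to its list of parents) and the five-node example structure are assumptions, and `moralize` is a hypothetical name.

```python
from itertools import combinations

PARENTS = {  # assumed structure of the five-node example
    "Burglary": [],
    "Earthquake": [],
    "Radio": ["Earthquake"],
    "Alarm": ["Burglary", "Earthquake"],
    "Call": ["Alarm"],
}

def moralize(parents):
    """Return the undirected edge set of the moral graph.

    Each directed edge loses its arrow, and every pair of parents
    sharing a child is connected ("married").
    """
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))
        for a, b in combinations(ps, 2):
            edges.add(frozenset((a, b)))
    return edges

if __name__ == "__main__":
    for edge in sorted(tuple(sorted(e)) for e in moralize(PARENTS)):
        print(edge)
```

On this example the only added edge is Burglary and Earthquake, the two parents of Alarm.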

Toy Example Of A Markov Net
Nodes: X_1, X_2, X_3, X_4, X_5
e.g., X_1 ⊥ X_4, X_5 | X_2, X_3
X_i ⊥ X_rest | X_nbrs
Equation labels: potential function, partition function
Maximal clique: largest subset of vertices such that each pair is connected by an edge
Figure label: clique
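The equation that the labels "potential function" and "partition function" annotate did not survive in the transcript. The standard form of a Markov net distribution, which those labels usually refer to, is given below as a reference; the symbols ψ_c, Z, and the clique index c are the conventional ones and are an assumption about what the slide showed.

```latex
% Joint distribution as a normalized product of clique potentials:
P(x_1, \dots, x_n) = \frac{1}{Z} \prod_{c} \psi_c(x_c)

% Partition function: sums the unnormalized product over all joint states.
Z = \sum_{x_1, \dots, x_n} \prod_{c} \psi_c(x_c)
```

Here each ψ_c is a nonnegative potential function over the variables in clique c.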

A Real Markov Net
Estimate P(x_1, …, x_n | y_1, …, y_n)
Ψ(x_i, y_i) = P(y_i | x_i): local evidence likelihood
Ψ(x_i, x_j) = exp(−J(x_i, x_j)): compatibility matrix
Figure labels: observed pixels (y), latent causes (x)

Example Of Image Segmentation With MRFs Sziranyi et al. (2000)

Graphical Models Are A Useful Formalism
E.g., feedforward neural net with noise, sigmoid belief net
Figure labels: input layer, hidden layer, output layer

Graphical Models Are A Useful Formalism
E.g., Restricted Boltzmann machine (Hinton)
Also known as Harmony network (Smolensky)
Figure labels: hidden units, visible units

Graphical Models Are A Useful Formalism E.g., Gaussian Mixture Model

Graphical Models Are A Useful Formalism
E.g., dynamical (time varying) models in which data arrives sequentially or output is produced as a sequence
- Dynamic Bayes nets (DBNs) can be used to model such time-series (sequence) data
- Special cases of DBNs include
  Hidden Markov Models (HMMs)
  State-space models

Hidden Markov Model (HMM)
Figure: chain X_1 → X_2 → X_3 with observations Y_1, Y_2, Y_3
X: phones/words; Y: acoustic signal
Labels: transition matrix, Gaussian observations

State-Space Model (SSM) / Linear Dynamical System (LDS)
Figure: chain X_1 → X_2 → X_3 with observations Y_1, Y_2, Y_3
X: “True” state; Y: noisy observations

Example: LDS For 2D Tracking
(Figure: graph over Q_1, Q_2, Q_3, R_1, R_2, R_3, X_1, X_2, y_1, y_2.)
Sparse linear-Gaussian system

Kalman Filtering (Recursive State Estimation In An LDS)
Figure: chain X_1 → X_2 → X_3 with observations Y_1, Y_2, Y_3
Estimate P(X_t | y_{1:t}) from P(X_{t-1} | y_{1:t-1}) and y_t
Predict: P(X_t | y_{1:t-1}) = Σ_{X_{t-1}} P(X_t | X_{t-1}) P(X_{t-1} | y_{1:t-1})
Update: P(X_t | y_{1:t}) ∝ P(y_t | X_t) P(X_t | y_{1:t-1})
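For a linear-Gaussian model the predict and update steps have closed forms, because the belief stays Gaussian and only its mean and variance need tracking (the sum over X_{t-1} becomes an integral). Below is a minimal scalar sketch. The model x_t = a·x_{t-1} + noise(q), y_t = c·x_t + noise(r) and all parameter values are assumptions chosen for illustration, and `kalman_step` is a hypothetical name.

```python
def kalman_step(mean, var, y, a=1.0, q=0.1, c=1.0, r=0.5):
    """One predict + update step of a scalar Kalman filter.

    mean, var: Gaussian belief over x_{t-1} given y_{1:t-1}
    y:         new observation y_t
    a, q:      transition coefficient and process noise variance (assumed)
    c, r:      observation coefficient and observation noise variance (assumed)
    Returns the mean and variance of the belief over x_t given y_{1:t}.
    """
    # Predict: push the belief through the dynamics.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update: weight the prediction error by the Kalman gain.
    gain = var_pred * c / (c * c * var_pred + r)
    mean_new = mean_pred + gain * (y - c * mean_pred)
    var_new = (1.0 - gain * c) * var_pred
    return mean_new, var_new

if __name__ == "__main__":
    mean, var = 0.0, 1.0  # initial belief
    for y in [0.9, 1.1, 1.0, 1.2]:  # made-up observations
        mean, var = kalman_step(mean, var, y)
        print(round(mean, 4), round(var, 4))
```

The gain lies between 0 and 1/c: noisy observations (large r) move the estimate less, while an uncertain prediction (large var_pred) moves it more.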

Mike’s Project From Last Year
IRT model (figure: variables G, X, α, P, δ; labels: student, trial, problem)

Mike’s Project From Last Year
BKT model (figure: variables X, L_0, T, τ, G, S; labels: student, trial)

Mike’s Project From Last Year
IRT+BKT model (figure: variables X, γ, σ, L_0, T, τ, α, P, δ, η, G, S; labels: student, trial, problem)