PatReco: Bayesian Networks Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005.


Definitions  Bayesian networks consist of nodes and (usually directed) arcs  Nodes (or states) represent classification classes or, more generally, events, and are described by a pdf  Arcs represent relations between nodes, e.g., cause and effect or time sequence  Two nodes that are connected only via a third node are conditionally independent given that node

When to use Bayesian nets  Bayesian networks (also called inference networks) are statistical models used for classification (or, more generally, pattern recognition) problems in which there are dependencies among the classes, e.g., time dependencies or cause-and-effect dependencies

Conditional Independence  Full independence between A and B: P(A|B) = P(A), or equivalently P(A,B) = P(A) P(B)  Conditional independence of A and B given C: P(A|B,C) = P(A|C), or equivalently P(A,B|C) = P(A|C) P(B|C)
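As a quick numeric illustration (a minimal Python sketch; all probability values are made up, not from the slides), we can build a joint distribution that factorizes as P(A,B,C) = P(A|C) P(B|C) P(C) and verify that P(A,B|C) = P(A|C) P(B|C):

# Toy example: A and B are conditionally independent given C.
# All numbers below are illustrative, not taken from the lecture.
P_C = {0: 0.4, 1: 0.6}                       # P(C)
P_A_given_C = {0: {0: 0.7, 1: 0.3},          # P(A|C): P_A_given_C[c][a]
               1: {0: 0.2, 1: 0.8}}
P_B_given_C = {0: {0: 0.5, 1: 0.5},          # P(B|C): P_B_given_C[c][b]
               1: {0: 0.9, 1: 0.1}}

# Joint built from the factorization P(A,B,C) = P(A|C) P(B|C) P(C)
joint_abc = {(a, b, c): P_A_given_C[c][a] * P_B_given_C[c][b] * P_C[c]
             for a in (0, 1) for b in (0, 1) for c in (0, 1)}

# Check conditional independence: P(A,B|C) == P(A|C) P(B|C)
for c in (0, 1):
    pc = sum(p for (a, b, cc), p in joint_abc.items() if cc == c)
    for a in (0, 1):
        for b in (0, 1):
            p_ab_given_c = joint_abc[(a, b, c)] / pc
            assert abs(p_ab_given_c - P_A_given_C[c][a] * P_B_given_C[c][b]) < 1e-12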

Conditional Independence  A, C independent given B (chain A → B → C): P(C|A,B) = P(C|B)  B, C independent given A (common cause A → B, A → C): P(B,C|A) = P(B|A) P(C|A)  A, C dependent given B (common effect A → B ← C): P(A,C|B) cannot be reduced

Three problems
1. Probability computation (use independence)
2. Training/parameter estimation: maximum likelihood (ML) if everything is observable, expectation maximization (EM) if there are missing data
3. Inference (testing): diagnosis P(cause|effect) (bottom-up), prediction P(effect|cause) (top-down)

Probability Computation For a Bayesian network that consists of N nodes:
1. Compute P(n1, n2, ..., nN) using the chain rule, starting from the "last/bottom" node and working your way up:
P(n1, n2, ..., nN) = P(nN | n1, n2, ..., nN-1) P(nN-1 | n1, n2, ..., nN-2) ... P(n2 | n1) P(n1)
2. Identify the conditional independence conditions implied by the Bayesian network topology
3. Simplify the conditional probabilities using these independence conditions

Probability Computation Example (the sprinkler network: C = cloudy, S = sprinkler, R = rain, W = wet grass, with arcs C → S, C → R and S, R → W):
Chain rule: P(C,S,R,W) = P(W|C,S,R) P(S|C,R) P(R|C) P(C)
Independent (from the topology): W of C given (S,R); S of R given C
Dependent: S, R given W
Simplified joint: P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C)
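A minimal Python sketch of this factorized computation (the CPT values below are illustrative placeholders, not numbers given in the lecture):

# Sprinkler network: C -> S, C -> R, (S, R) -> W.  Variables are binary (0/1).
# CPT values are illustrative only.
P_C = {1: 0.5, 0: 0.5}
P_S_given_C = {0: {1: 0.5, 0: 0.5}, 1: {1: 0.1, 0: 0.9}}      # P(S|C)
P_R_given_C = {0: {1: 0.2, 0: 0.8}, 1: {1: 0.8, 0: 0.2}}      # P(R|C)
P_W_given_SR = {(0, 0): {1: 0.0, 0: 1.0},                      # P(W|S,R)
                (0, 1): {1: 0.9, 0: 0.1},
                (1, 0): {1: 0.9, 0: 0.1},
                (1, 1): {1: 0.99, 0: 0.01}}

def joint(c, s, r, w):
    """P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C), using the independences above."""
    return P_W_given_SR[(s, r)][w] * P_S_given_C[c][s] * P_R_given_C[c][r] * P_C[c]

# Sanity check: the joint sums to 1 over all 16 configurations.
total = sum(joint(c, s, r, w)
            for c in (0, 1) for s in (0, 1) for r in (0, 1) for w in (0, 1))
assert abs(total - 1.0) < 1e-12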

Probability Computation  There are general algorithms for identifying cliques in a Bayesian net  Cliques are islands of conditional dependence, i.e., terms in the probability computation that cannot be further reduced; for the sprinkler example the cliques are (S,C), (W,S,R) and (R,C)

Training/Parameter Estimation  Instead of estimating the joint pdf of the whole network, the joint pdf of each of the cliques is estimated  For example, if the network joint pdf is P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C), then instead of computing P(C,S,R,W) directly we compute each of P(W|S,R), P(S|C), P(R|C), P(C) for all possible values of W, S, R, C (much simpler)

Training/Parameter Estimation  For fully observable data and discrete probabilities, compute maximum likelihood estimates of the parameters, e.g., for discrete probabilities:
P(W=1|S=1,R=0)ML = counts(W=1,S=1,R=0) / counts(W=*,S=1,R=0)

Training/Parameter Estimation  Example: the following observation tuples are given for (W,C,S,R): (1,0,1,0), (0,0,1,0), (1,1,1,0), (0,1,1,0), (1,0,1,0), (0,1,0,0), (1,0,0,1), (0,1,1,1), (1,1,1,0)  Using maximum likelihood estimation: P(W=1|S=1,R=0)ML = #(1,*,1,0) / #(*,*,1,0) = 4/6 ≈ 0.67
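A minimal Python sketch of this counting-based estimate, using the (W,C,S,R) observations listed above:

# Observations ordered as (W, C, S, R), as listed on the slide.
data = [(1,0,1,0), (0,0,1,0), (1,1,1,0), (0,1,1,0), (1,0,1,0),
        (0,1,0,0), (1,0,0,1), (0,1,1,1), (1,1,1,0)]

# ML estimate: P(W=1 | S=1, R=0) = #(W=1, S=1, R=0) / #(S=1, R=0)
num = sum(1 for (w, c, s, r) in data if w == 1 and s == 1 and r == 0)
den = sum(1 for (w, c, s, r) in data if s == 1 and r == 0)
print(num, den, num / den)   # 4 6 0.666...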

Training/Parameter Estimation  When data are unobservable or missing, the EM algorithm is employed  There are efficient implementations of the EM algorithm for Bayesian nets that operate on the clique network  When the topology of the Bayesian network is not known, structural EM can be used

Inference  There are two types of inference (testing): diagnosis P(cause|effect) (bottom-up) and prediction P(effect|cause) (top-down)  Once the parameters of the network are estimated, the joint network pdf can be evaluated for ALL possible network values  Inference is simply probability computation using the network pdf

Inference  For example: P(W=1|C=1) = P(W=1,C=1) / P(C=1), where
P(W=1,C=1) = Σ_{S,R} P(W=1, C=1, S, R)
P(C=1) = Σ_{W,S,R} P(W, C=1, S, R)
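A minimal Python sketch of this marginalization, reusing the joint() function and the illustrative CPTs from the probability-computation sketch above:

# Inference by enumeration: P(W=1 | C=1) = P(W=1, C=1) / P(C=1).
# joint(c, s, r, w) is the factorized sprinkler joint defined earlier.
p_w1_c1 = sum(joint(1, s, r, 1) for s in (0, 1) for r in (0, 1))
p_c1    = sum(joint(1, s, r, w) for s in (0, 1) for r in (0, 1) for w in (0, 1))
print(p_w1_c1 / p_c1)

# The same machinery answers the maximization form of inference
# (see the next slide), e.g. the most probable value of W given C=1:
w_star = max((0, 1), key=lambda w: sum(joint(1, s, r, w)
                                       for s in (0, 1) for r in (0, 1)))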

Inference  Efficient algorithms exist for performing inference in large networks; they operate on the clique network  Inference is often posed as a probability maximization problem, e.g., what is the most probable cause or effect? argmaxW P(W|C=1)

Continuous Case  In our examples the network nodes represented discrete events (states or classes)  Network nodes can also hold continuous variables (observations), e.g., length, energy  For the continuous case, parametric pdfs are introduced and their parameters are estimated using ML (observed data) or EM (hidden data)
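For instance, a minimal sketch of the ML case (assuming, for illustration only, a Gaussian pdf for a continuous node X with a discrete parent Y, and made-up sample data):

import math

# Fully observed samples of (y, x): discrete parent y, continuous child x.
# The numbers are made up for illustration.
samples = [(0, 1.2), (0, 0.8), (0, 1.1), (1, 3.9), (1, 4.2), (1, 4.0)]

# ML estimates of the Gaussian parameters of P(x | y) for each parent value.
params = {}
for y in {y for y, _ in samples}:
    xs = [x for yy, x in samples if yy == y]
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)   # ML (biased) variance
    params[y] = (mu, var)

def gaussian_pdf(x, mu, var):
    """Evaluate the estimated conditional pdf P(x | y) at a point x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(gaussian_pdf(1.0, *params[0]))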

Some Applications  Medical diagnosis  Computer problem diagnosis (Microsoft)  Markov chains  Hidden Markov models (HMMs)

Conclusions  Bayesian networks are used to represent dependencies between classes  The network topology defines conditional independence conditions that simplify the modeling and computation of the network pdf  Three problems: probability computation, estimation/training, inference/testing