
Dynamic Bayesian Networks
KDD Group Seminar, KDD Seminar Series, Fall 2002
Friday, August 23, 2002
Haipeng Guo, KDD Research Group
Department of Computing and Information Sciences, Kansas State University

Presentation Outline
– Introduction to State-space Models
– Dynamic Bayesian Networks (DBNs)
  – Representation
  – Inference
  – Learning
– Summary
– References

The Problem of Modeling Sequential Data
Sequential data modeling is important in many areas:
– Time series generated by a dynamic system (time-series modeling)
– Sequences generated by a one-dimensional spatial process (bio-sequences)

The Solutions
Classic approaches to time-series prediction:
– Linear models: ARIMA (auto-regressive integrated moving average), ARMAX (auto-regressive moving average with exogenous variables)
– Nonlinear models: neural networks, decision trees
Problems with the classic approaches:
– Prediction of the future is based only on a finite window of past observations
– It is difficult to incorporate prior knowledge
– They have difficulty with multi-dimensional inputs and/or outputs
State-space models:
– Assume there is some underlying hidden state of the world (the query) that generates the observations (the evidence), and that this hidden state evolves in time, possibly as a function of our inputs
– The belief state: our belief about the hidden state of the world given the observations y_1:t and inputs u_1:t up to the current time, P(X_t | y_1:t, u_1:t)
– The two most common state-space models are Hidden Markov Models (HMMs) and Kalman Filter Models (KFMs)
– Dynamic Bayesian networks (DBNs) are a more general state-space model

State-space Models: Representation
Any state-space model must define a prior P(X_1), a state-transition function P(X_t | X_t-1), and an observation function P(Y_t | X_t).
Assumptions:
– The model is first-order Markov, i.e., P(X_t | X_1:t-1) = P(X_t | X_t-1)
– The observations are conditionally first-order Markov: P(Y_t | X_t, Y_t-1) = P(Y_t | X_t)
– The model is time-invariant (homogeneous)
Representations:
– HMMs: X_t is a discrete random variable
– KFMs: X_t is a vector of continuous random variables
– DBNs: a more general and expressive language for representing state-space models

State-space Models: Inference
A state-space model defines how X_t generates Y_t and X_t+1. The goal of inference is to infer the hidden states (query) X_1:t given the observations (evidence) y_1:t.
Inference tasks:
– Filtering (monitoring): recursively estimate the belief state using Bayes' rule
  – Predict: compute P(X_t | y_1:t-1)
  – Update: compute P(X_t | y_1:t)
  – Throw away the old belief state once the prediction has been computed ("rollup")
– Smoothing: estimate a state in the past, given all the evidence up to the current time
  – Fixed-lag smoothing (hindsight): compute P(X_t-l | y_1:t), where l > 0 is the lag
– Prediction: predict the future
  – Lookahead: compute P(X_t+h | y_1:t), where h > 0 is how far we want to look ahead
– Viterbi decoding: compute the most likely sequence of hidden states given the data
  – MPE (abduction): x*_1:t = argmax_x_1:t P(x_1:t | y_1:t)
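To make the predict-update recursion concrete, here is a minimal sketch of forward filtering for a discrete-state model in plain MATLAB (not from the original slides); the transition matrix A, observation matrix B, prior, and observation sequence obs are illustrative placeholders.

  % Forward filtering: recursively compute the belief state P(X_t | y_1:t).
  % A(i,j) = P(X_t = j | X_t-1 = i), B(i,k) = P(Y_t = k | X_t = i), prior = P(X_1).
  A = [0.7 0.3; 0.2 0.8];
  B = [0.9 0.1; 0.3 0.7];
  prior = [0.5; 0.5];
  obs = [1 2 2 1];                          % observed symbols y_1:T

  belief = prior .* B(:, obs(1));           % update with the first observation
  belief = belief / sum(belief);
  for t = 2:length(obs)
      predicted = A' * belief;              % predict: P(X_t | y_1:t-1)
      belief = predicted .* B(:, obs(t));   % update with y_t
      belief = belief / sum(belief);        % normalize; the old belief state is discarded ("rollup")
  end
  disp(belief')                             % filtered belief P(X_T | y_1:T)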

State-space Models: Learning
Parameter learning (system identification) means estimating from data the parameters that define the transition model P(X_t | X_t-1) and the observation model P(Y_t | X_t).
– The usual criterion is maximum likelihood (ML)
– The goal of parameter learning is to compute
  θ*_ML = argmax_θ P(Y | θ) = argmax_θ log P(Y | θ),
  or θ*_MAP = argmax_θ [log P(Y | θ) + log P(θ)] if we include a prior on the parameters
– Two standard approaches: gradient ascent and EM (Expectation Maximization)
Structure learning is more ambitious.
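As a sketch of the EM approach, here is one Baum-Welch iteration for a discrete HMM with a single observation sequence (the numbers are illustrative assumptions; this is not code from the original talk):

  % One EM (Baum-Welch) iteration: E-step via scaled forward-backward, then M-step.
  obs = [1 2 2 1 2];  T = length(obs);
  A = [0.6 0.4; 0.3 0.7];  B = [0.8 0.2; 0.2 0.8];  prior = [0.5; 0.5];
  Q = size(A,1);

  % E-step: scaled forward-backward recursions.
  alpha = zeros(Q,T);  beta = zeros(Q,T);  c = zeros(1,T);
  alpha(:,1) = prior .* B(:,obs(1));  c(1) = sum(alpha(:,1));  alpha(:,1) = alpha(:,1)/c(1);
  for t = 2:T
      alpha(:,t) = (A' * alpha(:,t-1)) .* B(:,obs(t));
      c(t) = sum(alpha(:,t));  alpha(:,t) = alpha(:,t)/c(t);
  end
  beta(:,T) = 1;
  for t = T-1:-1:1
      beta(:,t) = A * (B(:,obs(t+1)) .* beta(:,t+1)) / c(t+1);
  end
  gamma = alpha .* beta;            % posterior marginals P(X_t = i | y_1:T)
  xi = zeros(Q,Q);                  % expected transition counts, summed over t
  for t = 1:T-1
      xi = xi + (alpha(:,t) * (B(:,obs(t+1)) .* beta(:,t+1))') .* A / c(t+1);
  end

  % M-step: re-estimate the parameters from the expected counts.
  prior = gamma(:,1);
  A = xi ./ sum(xi,2);                          % row-normalize expected transition counts
  for k = 1:size(B,2)
      B(:,k) = sum(gamma(:, obs == k), 2);      % expected emission counts for symbol k
  end
  B = B ./ sum(B,2);
  loglik = sum(log(c))              % log P(Y | theta); should increase with each iteration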

HMM: Hidden Markov Model
An HMM has one discrete hidden node and one discrete or continuous observed node per time slice.
– X: hidden variables; Y: observations
– Structure and parameters remain the same over time
The three parameters of an HMM:
– The initial state distribution P(X_1)
– The transition model P(X_t | X_t-1)
– The observation model P(Y_t | X_t)
The HMM is the simplest DBN: a discrete state variable with arbitrary dynamics and arbitrary measurements.
[Figure: hidden chain X1 -> X2 -> X3 -> X4, each X_t emitting an observation Y_t]
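As an illustration of how these three parameters define the generative model (a hypothetical sketch with made-up numbers, not from the slides), sampling a trajectory from an HMM uses exactly the prior, the transition model, and the observation model:

  % Sample a length-T state/observation trajectory from an HMM.
  prior = [0.6; 0.4];               % initial state distribution P(X_1)
  A = [0.7 0.3; 0.2 0.8];           % transition model P(X_t = j | X_t-1 = i)
  B = [0.9 0.1; 0.3 0.7];           % observation model P(Y_t = k | X_t = i)
  T = 10;
  x = zeros(1,T);  y = zeros(1,T);
  x(1) = find(rand < cumsum(prior), 1);            % draw X_1 from the prior
  y(1) = find(rand < cumsum(B(x(1),:)), 1);        % emit Y_1 given X_1
  for t = 2:T
      x(t) = find(rand < cumsum(A(x(t-1),:)), 1);  % X_t depends only on X_t-1
      y(t) = find(rand < cumsum(B(x(t),:)), 1);    % Y_t depends only on X_t
  end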

KFM: Kalman Filter Model
A KFM has the same topology as an HMM, but all the nodes are assumed to have linear-Gaussian distributions:
– x(t+1) = F*x(t) + w(t), with process noise w ~ N(0, Q) and x(0) ~ N(x_0, V_0)
– y(t) = H*x(t) + v(t), with measurement noise v ~ N(0, R)
Also known as Linear Dynamical Systems (LDSs):
– a partially observed stochastic process
– with linear dynamics and linear observations: f(a + b) = f(a) + f(b)
– both subject to Gaussian noise
The KFM is the simplest continuous DBN: a continuous state variable with linear-Gaussian dynamics and measurements.
[Figure: hidden chain X1 -> X2, each X_t emitting an observation Y_t]
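The belief-state recursion for a KFM is the familiar Kalman filter. A minimal predict-update cycle in MATLAB looks as follows; F, H, Q, R and the measurement value are illustrative assumptions, not taken from the talk.

  % One Kalman filter cycle for x(t+1) = F*x(t) + w, y(t) = H*x(t) + v.
  F = [1 1; 0 1];   Q = 0.01*eye(2);   % linear dynamics, process-noise covariance
  H = [1 0];        R = 0.25;          % linear observation, measurement-noise variance
  x = [0; 0];       V = eye(2);        % current belief: mean and covariance of P(X_t | y_1:t)
  y = 1.2;                             % new measurement y_t+1

  x_pred = F * x;                      % predict: mean of P(X_t+1 | y_1:t)
  V_pred = F * V * F' + Q;             % predict: covariance
  S = H * V_pred * H' + R;             % innovation covariance
  K = V_pred * H' / S;                 % Kalman gain
  x = x_pred + K * (y - H * x_pred);   % update: mean of P(X_t+1 | y_1:t+1)
  V = (eye(2) - K * H) * V_pred;       % update: covariance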

DBN: Dynamic Bayesian Networks
– DBNs are directed graphical models of stochastic processes
– DBNs generalize HMMs and KFMs by representing the hidden and observed state in terms of a set of state variables, which can have complex interdependencies
– The graphical structure provides an easy way to specify these conditional independencies
– A DBN is a compact parameterization of the state-space model
– DBNs extend BNs to handle temporal models
– Time-invariant: the term "dynamic" means that we are modeling a dynamic system, not that the network changes over time

DBN: A Formal Definition
Definition: A DBN is defined as a pair (B_0, B_→), where B_0 defines the prior P(Z_1) and B_→ is a two-slice temporal Bayes net (2TBN) which defines P(Z_t | Z_t-1) by means of a DAG (directed acyclic graph), as follows:
– Z(i,t) is a node at time slice t; it can be a hidden node, an observation node, or (optionally) a control node
– Pa(Z(i,t)) are the parent nodes of Z(i,t); they can be in either time slice t or t-1
– The nodes in the first slice of a 2TBN do not have parameters associated with them
– Each node in the second slice has an associated CPD (conditional probability distribution)

DBN Representation in BNT (MATLAB)
To specify a DBN, we need to define the intra-slice topology (within a slice), the inter-slice topology (between two slices), as well as the parameters for the first two slices. (Such a two-slice temporal Bayes net is often called a 2TBN.)
We can specify the topology as follows:

  intra = zeros(2);
  intra(1,2) = 1;   % node 1 in slice t connects to node 2 in slice t
  inter = zeros(2);
  inter(1,1) = 1;   % node 1 in slice t-1 connects to node 1 in slice t

We can specify the parameters as follows, where for simplicity we assume the observed node is discrete:

  Q = 2;            % num hidden states
  O = 2;            % num observable symbols
  ns = [Q O];
  dnodes = 1:2;
  bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes);
  for i=1:4
      bnet.CPD{i} = tabular_CPD(bnet, i);   % default (random) CPTs
  end

  % Equivalently, tie the CPDs across slices with equivalence classes and
  % plug in explicit (random, normalized) parameters:
  eclass1 = [1 2];
  eclass2 = [3 2];
  eclass = [eclass1 eclass2];
  bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'eclass1', eclass1, 'eclass2', eclass2);
  prior0 = normalise(rand(Q,1));
  transmat0 = mk_stochastic(rand(Q,Q));
  obsmat0 = mk_stochastic(rand(Q,O));
  bnet.CPD{1} = tabular_CPD(bnet, 1, prior0);
  bnet.CPD{2} = tabular_CPD(bnet, 2, obsmat0);
  bnet.CPD{3} = tabular_CPD(bnet, 3, transmat0);

Representation of a DBN in XML Format
– A static BN (DAG) in XMLBIF format defines the state space at time slice 1
– A transition network (DAG) spans the two time slices t and t+1; each node has an additional attribute indicating which time slice it belongs to
– Only the nodes in slice t+1 have CPDs
[XML listing omitted]

The Semantics of a DBN
– First-order Markov assumption: the parents of a node can only be in the same time slice or the previous time slice, i.e., arcs do not skip across slices
– Inter-slice arcs all go from left to right, reflecting the arrow of time
– Intra-slice arcs can be arbitrary as long as the overall DBN is a DAG
– Time-invariance assumption: the parameters of the CPDs do not change over time
– The semantics of a DBN are defined by "unrolling" the 2TBN to T time slices; the resulting joint probability distribution is then given by the product below
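The product referred to above is the standard one for a 2TBN unrolled to T slices (following Murphy's notation, with Z_t^i the i-th node in slice t and N nodes per slice):

  P(Z_t \mid Z_{t-1}) = \prod_{i=1}^{N} P\bigl(Z_t^i \mid \mathrm{Pa}(Z_t^i)\bigr),
  \qquad
  P(Z_{1:T}) = \prod_{t=1}^{T} \prod_{i=1}^{N} P\bigl(Z_t^i \mid \mathrm{Pa}(Z_t^i)\bigr)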

DBN, HMM, and KFM
– An HMM's state space consists of a single random variable; a DBN represents the hidden state in terms of a set of random variables
– A KFM requires all the CPDs to be linear-Gaussian; a DBN allows arbitrary CPDs
– HMMs and KFMs have a restricted topology; a DBN allows much more general graph structures
– DBNs generalize HMMs and KFMs and have more expressive power

DBN: Inference
The goal of inference in DBNs is to compute P(X_t | y_1:r), where t is the query time and r is the time of the last observation:
– Filtering: r = t
– Smoothing: r > t
– Prediction: r < t
– Viterbi decoding: compute the MPE, argmax_x_1:t P(x_1:t | y_1:t)

DBN Inference Algorithms
– DBN inference algorithms extend the HMM and KFM inference algorithms, and call BN inference algorithms as subroutines
– DBN inference is NP-hard
Exact inference algorithms:
– Forwards-backwards smoothing (on any discrete-state DBN)
– The frontier algorithm (sweep a Markov blanket, the frontier set F, across the DBN, first forwards and then backwards)
– The interface algorithm (use only the set of nodes with outgoing arcs to the next time slice to d-separate the past from the future)
– Kalman filtering and smoothing
Approximate inference algorithms:
– The Boyen-Koller (BK) algorithm (approximate the joint distribution over the interface as a product of marginals)
– The factored frontier (FF) algorithm
– Loopy belief propagation (LBP)
– Kalman filtering and smoothing
– Stochastic sampling algorithms: importance sampling or MCMC (offline inference); particle filtering (PF) (online inference)
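As a sketch of the particle-filtering idea (online inference by sampling), here is a minimal bootstrap particle filter in MATLAB; the nonlinear state-space model and all numbers are illustrative assumptions, not taken from the slides.

  % Bootstrap particle filter: approximate P(X_t | y_1:t) by N samples ("particles").
  % Assumed model: x_t = 0.5*x_t-1 + 8*cos(1.2*t) + w_t,  y_t = x_t^2/20 + v_t
  rng(0);
  T = 50;  N = 500;
  q = 1;  r = 1;                          % process and measurement noise variances
  x_true = zeros(1,T);  y = zeros(1,T);
  for t = 2:T                             % simulate some data to filter
      x_true(t) = 0.5*x_true(t-1) + 8*cos(1.2*t) + sqrt(q)*randn;
      y(t) = x_true(t)^2/20 + sqrt(r)*randn;
  end

  particles = randn(1,N);                 % initial particles from a broad prior
  x_est = zeros(1,T);
  for t = 2:T
      particles = 0.5*particles + 8*cos(1.2*t) + sqrt(q)*randn(1,N);  % propagate through P(X_t | X_t-1)
      w = exp(-0.5*(y(t) - particles.^2/20).^2 / r);                  % weight by likelihood P(y_t | X_t)
      w = w / sum(w);
      cdf = cumsum(w);  cdf(end) = 1;                                 % guard against round-off
      idx = arrayfun(@(u) find(u <= cdf, 1), rand(1,N));              % multinomial resampling
      particles = particles(idx);
      x_est(t) = mean(particles);                                     % filtered estimate E[X_t | y_1:t]
  end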

DBN: Learning
The techniques for learning DBNs are mostly straightforward extensions of the techniques for learning BNs.
Parameter learning:
– Offline learning
  – Parameters must be tied across time slices
  – The initial state of the dynamic system can be learned independently of the transition matrix
– Online learning
  – Add the parameters to the state space and then do online inference (filtering)
Structure learning:
– The intra-slice connectivity must be a DAG
– Learning the inter-slice connectivity is equivalent to the variable-selection problem, since for each node in slice t we must choose its parents from slice t-1
– Learning for DBNs reduces to feature selection if we assume the intra-slice connections are fixed
Learning uses inference algorithms as subroutines.
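A minimal illustration of the offline, fully observed case (a sketch with made-up sequences, not from the talk): because the transition parameters are tied across time slices, transitions from every slice pool into the same counts, while the initial-state distribution is estimated from the first slice only.

  % ML estimation of a tied transition model from fully observed state sequences.
  seqs = {[1 1 2 2 2 1], [2 2 1 1 2], [1 2 2 2 2 2]};   % hypothetical training sequences
  Q = 2;
  initCounts = zeros(Q,1);
  transCounts = zeros(Q,Q);
  for s = 1:numel(seqs)
      x = seqs{s};
      initCounts(x(1)) = initCounts(x(1)) + 1;           % only the first slice informs P(X_1)
      for t = 2:length(x)
          transCounts(x(t-1), x(t)) = transCounts(x(t-1), x(t)) + 1;  % all slices share one CPD
      end
  end
  prior = initCounts / sum(initCounts);                  % ML estimate of P(X_1)
  A = transCounts ./ sum(transCounts, 2);                % ML estimate of the tied P(X_t | X_t-1)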

DBN Learning Applications
– Learning genetic network topology using structural EM (gene pathway models)
– Inferring motifs using HHMMs (motifs are short patterns over {A, C, G, T} which occur in DNA and have certain biological significance)
– Inferring people's goals using abstract HMMs (inferring people's intentional states by observing their behavior)
– Modeling freeway traffic using coupled HMMs

Summary
– A DBN is a general state-space model for describing a stochastic dynamic system
– HMMs and KFMs are special cases of DBNs; DBNs have more expressive power
– DBN inference includes filtering, smoothing, and prediction, and uses BN inference as a subroutine
– DBN structure learning includes learning the intra-slice connections and the inter-slice connections
– DBNs have a broad range of real-world applications, especially in bioinformatics

References
– K. P. Murphy, "Dynamic Bayesian Networks: Representation, Inference and Learning," PhD thesis, UC Berkeley, Computer Science Division, July 2002.
– T. A. Stephenson, "An Introduction to Bayesian Network Theory and Usage," 2000.
– G. Zweig, "Speech Recognition with Dynamic Bayesian Networks," PhD thesis, University of California, Berkeley, 1998.
– Applications of Bayesian Networks.
– K. Murphy and S. Mian, "Modelling Gene Expression Data using Dynamic Bayesian Networks," Technical Report, University of California, Berkeley, 1999.
– N. Friedman, K. Murphy, and S. Russell, "Learning the Structure of Dynamic Probabilistic Networks," in Proceedings of UAI, 1998.
– U. Kjærulff, "A Computational Scheme for Reasoning in Dynamic Probabilistic Networks," in Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1992.
– X. Boyen and D. Koller, "Tractable Inference for Complex Stochastic Processes," in Proceedings of UAI, 1998.