WHY ARE DBNs SPARSE?
Shaunak Chatterjee and Stuart Russell, UC Berkeley

Sparsity in DBNs is counter-intuitive
Consider the unrolled version of a sample DBN: the longer-timestep DBN becomes fully connected.
[Figures: fast variables in the ∂-model and observations in the ∆-model.]

Dynamic Bayesian Networks (DBNs)

What are DBNs?
DBNs are a flexible and effective tool for representing and reasoning about stochastic systems that evolve over time. Special cases include hidden Markov models (HMMs), factorial HMMs, hierarchical HMMs, discrete-time Kalman filters and several other families of discrete-time models. The stochastic system's state is represented by a set of variables X_t for each time t ≥ 0, and the DBN represents the joint distribution over the variables {X_0, X_1, X_2, …}. Typically, it is assumed that the system's dynamics do not change over time, so the joint distribution is captured by a 2-TBN (2-Timeslice Bayesian Network), a compact graphical representation of the state prior p(X_0) and the stochastic dynamics p(X_{t+1} | X_t).

Structured Dynamics: The dynamics are represented in factored form via a collection of local conditional models p(X^i_{t+1} | Π(X^i_{t+1})), where Π(X^i_{t+1}) are the parent variables of X^i_{t+1} in slice t or t+1.

Inference in DBNs: Exact inference is tractable in a few special cases, namely HMMs and Kalman filter models. For general DBNs, the computational complexity of exact inference is exponential in the number of variables for a large enough time horizon (Murphy, 2002). Approximate inference is much more popular: the Boyen-Koller (BK) algorithm and particle filtering algorithms have been widely used. Structure learning for DBNs has also been studied (Friedman et al., 1998). However, to date, DBNs are mostly constructed by hand.

Applications: DBNs have been used extensively in speech processing, traffic modeling, modeling gene expression data, figure tracking, and numerous other applications.

Definitions:

Timescale
The timescale of a variable is the expected number of timesteps for which it stays in its current state (for a discrete state space). In a general DBN, let Π(X_{t+1}) denote the parents of X_{t+1} in the 2-TBN, excluding X_t. Let p^k_{i,j} = p(X_{t+1} = j | X_t = i, Π(X_{t+1}) = k). Then T^X_{i,k} = 1 / (1 - p^k_{i,i}) is the timescale of X in state i when its parents are in state k. Define l_X = min_{i,k} T^X_{i,k} and h_X = max_{i,k} T^X_{i,k}. In a DBN with two variables X and Y, if l_X >> h_Y then Y is a fast variable with respect to X, and the timescale separation between X and Y is the ratio l_X / h_Y. For a cluster of variables C = {X^1, …, X^n}, the timescale bounds are l_C = min_{X^i ∈ C} l_{X^i} and h_C = max_{X^i ∈ C} h_{X^i}. Larger timescale separations result in more accurate models for larger timesteps.

Stationary distribution
When l_X >> h_Y, the stationary distribution of Y given X = k is the limiting distribution of Y if X is "frozen" at k. This is the steady-state approximation of Y and is also referred to as the equilibrium distribution.
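As an illustration (not part of the original poster), the following minimal Python sketch computes the quantities defined above, assuming each variable's dynamics are supplied as conditional transition matrices; the function names and example numbers are our own.

```python
import numpy as np

def timescales(cond_transition):
    """cond_transition[k] is the transition matrix of X for parent configuration k:
    cond_transition[k][i, j] = p(X_{t+1}=j | X_t=i, parents=k).
    Returns (l_X, h_X), the min and max dwell times T^X_{i,k} = 1 / (1 - p^k_{i,i})
    over all states i and parent configurations k."""
    dwell = [1.0 / (1.0 - np.diag(P)) for P in cond_transition]
    dwell = np.concatenate(dwell)
    return dwell.min(), dwell.max()

def stationary_distribution(P):
    """Limiting distribution of a variable whose (slow) parents are frozen:
    the left eigenvector of transition matrix P with eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

# Example: a fast binary variable that flips with probability 0.4 per step.
P_fast = np.array([[0.6, 0.4],
                   [0.4, 0.6]])
l_f, h_f = timescales([P_fast])              # both equal 2.5 steps here
pi_f = stationary_distribution(P_fast)       # [0.5, 0.5]
```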
∂-model to ∆-model Conversion

Stochastic differential equations (SDEs) to DBNs
SDEs describe the stochastic dynamics of a system over an infinitesimally small timestep. DBNs are approximate representations of the SDEs over a finite timestep: approximate, because the exact model created by integrating the SDE over a finite timestep would be a completely connected DBN. Most DBNs modeling real-life stochastic processes are sparse, and this sparsity makes them more tractable. They are designed by humans, who make implicit approximations.

How large a timestep?
This is a critical decision in the design of a DBN. The timestep must be small enough that the fastest-changing variable has only a small probability of changing state per step, but such a small timestep can result in gross inefficiency; a large timestep would be far more efficient.

Key Questions
1. How do we choose an appropriate ∆?
2. What is the topology of the ∆-model? Are there generally applicable rules?
3. How can the approximation error be characterized?

Approximation scheme
Consider the 2-variable DBN in which s is a slow variable and f is a fast variable (with respect to s); ∂ denotes a short timestep and ∆ a long timestep. [Figures: the ∂-model, the exact ∆-model and the approximate ∆-model.] In the approximate ∆-model:
s → f : the stationary distribution of f for a fixed s
s → s : (P(s_{t+1} | s_t))^{∆/∂}

Topology changing rules
(Obtained by marginalizing out the intermediate time slices t+1 and t+2.)
Rule 1: If f_1 and f_2 have no cross links in the ∂-model, then they have no cross links in the ∆-model.
Rule 2: If f_1 and f_2 have cross links in the ∂-model, then they are linked in the ∆-model.
Rule 3: If s_2 is a parent of f and f is a parent of s_1 in the ∂-model, then s_2 is a parent of s_1 in the ∆-model.

Error Characterization
If the conditional probabilities of the ∂-model have a particular structure (given in the paper), then for є << 1 and times ∆/∂ up to O(1/є), the approximation scheme for s → s has an error of O(є) (proof in the paper). The error of the limiting distribution decays exponentially.

Experiments
pH control mechanism in the human body. There are 4 different timescales in this DBN; a lighter shade represents a slower timescale and a darker shade a faster timescale. [Fig. 1: accuracy of approximate models, measured as avg. L2 error of the joint belief vector. Fig. 2: accuracy in tracking the marginal distribution of pH. Speedup results are also reported.]

General Algorithm
Given the exact ∂-model, the approximation scheme can be used to create a sequence of DBNs for various values of ∆, depending on the number of timescale-separable clusters. Assume the DBN has n variables X^1, X^2, …, X^n.
1. For each variable X^i, determine l_{X^i} and h_{X^i}.
2. Cluster the variables into {C_1, …, C_m} (m ≤ n) such that є_i = h_{C_i} / l_{C_{i+1}} << 1, i.e. there is significant timescale separation between successive clusters.
3. Repeat for i = 1, 2, …, m:
   3.1. ∆_i = ∆_{i-1} · O(1/є_i).
   3.2. C_i is the fast cluster and the C_j (j > i) are slow clusters. Compute the stationary distribution of C_i conditioned on each configuration of its slower parents.
In the worst case, all C_j become fully connected in the ∆_i-model. If there are no links from C_i to C_j in the exact ∂-model, then the C_j → C_j link is exact.
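To make the conversion concrete, here is a minimal Python sketch (ours, not from the paper) of the basic ∂-to-∆ step used by the approximation scheme and by step 3 of the general algorithm, assuming the ∂-model is given as explicit transition matrices. The function names are illustrative, and whether the s → f link conditions on the old or the new value of s is left to the caller.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P with eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def approximate_delta_model(P_s, P_f_given_s, ratio):
    """Approximate ∆-model for the 2-variable slow/fast DBN.

    P_s[i, j]      = p(s_{t+1}=j | s_t=i) in the short-timestep (∂) model
    P_f_given_s[k] = transition matrix of f when s is frozen at value k
    ratio          = ∆/∂, the number of short steps per long step

    Returns:
      s -> s link: (P_s)^(∆/∂)
      s -> f link: stationary distribution of f for each fixed value of s
    """
    P_s_delta = np.linalg.matrix_power(P_s, ratio)
    f_given_s = np.stack([stationary_distribution(P_f_given_s[k])
                          for k in range(len(P_f_given_s))])
    return P_s_delta, f_given_s

# Example: s flips rarely (slow), f mixes within a few steps for either value of s.
P_s = np.array([[0.99, 0.01], [0.01, 0.99]])
P_f_given_s = [np.array([[0.3, 0.7], [0.6, 0.4]]),
               np.array([[0.8, 0.2], [0.5, 0.5]])]
P_s_60, f_ss = approximate_delta_model(P_s, P_f_given_s, ratio=60)
```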
Widely varying timescales: An overview

Chemical Reactions: Michaelis-Menten kinetics makes the quasi-steady-state assumption that the concentration of substrate-bound enzyme changes much more slowly than that of product or substrate. Recent work separates slow and fast timescales in the chemical master equation (CME), yielding separate reduced CMEs (see Gomez-Uribe et al.).

Gene Regulatory Networks: Arkin et al. proposed an abstraction methodology using rapid-equilibrium and quasi-steady-state approximations.

Mathematics and Physics: Homogenization is used to replace rapidly oscillating coefficients.

Example: Body temperature (BT) and thermometer temperature (TT)
Both variables are discretized (binary) for simplicity. Body temperature has slow dynamics compared to the thermometer, so the BT → TT link is approximated by the steady-state distribution. This is a bad approximation for a 1-second timestep but a good approximation for a 60-second timestep.
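The following small numerical sketch (our own illustration; the transition probabilities, and the choice to condition the steady-state TT distribution on the new BT value, are assumptions rather than values from the poster) shows why the steady-state approximation of the BT → TT link is poor at a 1-second timestep but accurate at a 60-second timestep.

```python
import numpy as np

def stationary_distribution(P):
    """Limiting distribution of a Markov chain with transition matrix P."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

# ∂-model with a 1-second timestep (illustrative numbers).
# BT (body temperature) is slow: it flips with probability 1e-4 per second.
P_BT = np.array([[0.9999, 0.0001],
                 [0.0001, 0.9999]])
# TT (thermometer) is fast: it moves toward the current BT within a few seconds.
P_TT_given_BT = [np.array([[0.99, 0.01], [0.20, 0.80]]),   # BT = 0 (normal)
                 np.array([[0.80, 0.20], [0.01, 0.99]])]   # BT = 1 (fever)

# Joint ∂-model transition over states (BT, TT); TT_{t+1} depends on BT_t and TT_t.
P_joint = np.zeros((4, 4))
for b in range(2):
    for t in range(2):
        for b2 in range(2):
            for t2 in range(2):
                P_joint[2*b + t, 2*b2 + t2] = P_BT[b, b2] * P_TT_given_BT[b][t, t2]

def compare(ratio, b0=1, t0=0):
    """L1 error between the exact ∆-model (matrix power of the joint ∂-model)
    and the approximate ∆-model (BT powered, TT replaced by its steady state
    given the new BT) after one long step of `ratio` seconds."""
    start = np.zeros(4); start[2*b0 + t0] = 1.0
    exact = start @ np.linalg.matrix_power(P_joint, ratio)

    bt = np.zeros(2); bt[b0] = 1.0
    bt_delta = bt @ np.linalg.matrix_power(P_BT, ratio)
    approx = np.zeros(4)
    for b2 in range(2):
        approx[2*b2:2*b2 + 2] = bt_delta[b2] * stationary_distribution(P_TT_given_BT[b2])
    return np.abs(exact - approx).sum()

print("L1 error, Delta = 1 s :", round(compare(1), 3))    # poor approximation
print("L1 error, Delta = 60 s:", round(compare(60), 3))   # close to exact
```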