Bayesian Belief Networks: Computational Considerations for the Environmental Researcher
Stephen Jensen & Alix I. Gitelman, Statistics Department, Oregon State University, August 2003

Bayesian Belief Networks (BBN)

Bayesian Belief Networks are a class of models that succinctly describe the dependencies and interactions among large sets of variables. BBN have been used extensively in fields such as artificial intelligence and the decision sciences, and they are also well suited to environmental research, with its large numbers of variables and extensive dependencies and interactions. In fact, BBN are ideally suited to studying relationships between effluent limitations and water quality (e.g., Borsuk et al., 2003).

BBN are graphical models: a set of nodes represents the random variables, and a set of vertices (edges) represents relationships between the nodes. A BBN can contain directed or undirected vertices, or even a mixture of the two.

[Figure: example graphs of each type: a directed graph, an undirected graph, and a chain graph.]

Undirected BBN have their genesis in the statistical physics work of Gibbs (1902). They have since been used to model spatial interactions (Besag, 1974) and to interpret hierarchical log-linear models by analogy to Markov random fields (Darroch et al., 1980). Directed BBN originated in path analysis (Wright, 1921) and have seen much more use recently in the guise of causal networks (Pearl, 1986; Lauritzen & Spiegelhalter, 1988).
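For a directed acyclic BBN, the graph encodes a factorization of the joint distribution into local conditional distributions, one for each node given its parents:

\[ p(x_1, \ldots, x_K) = \prod_{k=1}^{K} p\bigl(x_k \mid \mathrm{pa}(x_k)\bigr), \]

where pa(x_k) denotes the parents of node x_k in the graph. It is this local structure that the estimation algorithms discussed next exploit.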
Estimating BBN

Several algorithms and packages for estimating directed BBN have been devised, including HUGIN (Andersen et al., 1989), TETRAD (Spirtes et al., 1993) and DEAL (Bøttcher & Dethlefsen, 2003). To determine a BBN model for a set of data, a canned package such as HUGIN or DEAL can be used exclusively. These packages search across some space of likely models, using either an exhaustive search or a heuristic such as greedy search, and score the networks in some manner to determine the best candidates. They are useful for structural learning, that is, for understanding which nodes are connected to which. However, these packages provide neither the probability that an edge is included in the graph nor standard error estimates for the parameter (i.e., node probability) estimates. Furthermore, for efficiency, these packages all require frequent triangulation of the graph, which is known to be an NP-hard problem, although restriction to decomposable models requires only a single triangulation (Deshpande et al., 2001).

An extension of this method is to first use HUGIN or another package to find a set of candidate models (that is, to perform the structural learning), and then to estimate the parameters of those models by Markov chain Monte Carlo (the parameter learning phase). In this way, variability estimates for the model parameters can be calculated. However, this approach is two-stage, and the probabilities of edge inclusion remain uncalculated. Others have simply assumed a structure for the BBN and then used extant software to estimate the strength of the edges (Borsuk et al., 2003).

Reversible jump Markov chain Monte Carlo (RJMCMC; Green, 1995) is a generalization of the Metropolis-Hastings algorithm to models whose parameter spaces vary in dimension. This makes it well suited to graphical model estimation, where the number (and direction) of edges may be unknown a priori: using RJMCMC, we can accomplish both structural and parameter learning. Furthermore, edge probabilities and variability estimates are calculated as a matter of course.

RJMCMC works well for discrete undirected graphical models (Dellaportas & Forster, 1999), but it is computationally demanding; efficiency is improved by restricting the search to decomposable models. Algorithms exist for undirected decomposable graphs (UDGs) in purely continuous (Giudici & Green, 1999) and purely discrete settings (Giudici et al., 2000), and Fronk & Giudici (2000) provide an algorithm for directed acyclic graphs (DAGs). The RJMCMC algorithms for graphical models involve three steps (a schematic code sketch follows this list):

1. Select an edge for addition, removal or reversal.
2. Decide whether to make the structural change and, if so, update the structure and parameters.
3. Update the parameters in some fashion without making a structural change.

RJMCMC algorithms will often require the same re-triangulation of the graph as the traditional algorithms, making the structural updating quite computationally intensive. Restriction to decomposable models eases the computation, as the model structure can then be updated in polynomial time (Deshpande et al., 2001). Updating the parameters (step 3 above) is a simple Metropolis-Hastings step, which is comparatively quick.
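To make the structural moves concrete, here is a minimal Python sketch of steps 1 and 2 as a score-based Metropolis-Hastings search over DAG structures. It is schematic, not a faithful RJMCMC implementation: the user-supplied log_score function stands in for the full reversible-jump acceptance ratio (which also involves parameter proposals and, when the dimension changes, a Jacobian term), step 3 is omitted, and all function names here are illustrative rather than taken from BayesX or DEAL.

import itertools
import math
import random

def propose_edge_move(edges, n):
    """Step 1: toggle one directed edge (add if absent, remove if present).
    Choosing an ordered pair uniformly makes the proposal symmetric, so the
    Hastings ratio reduces to the score ratio.  Edge reversal (also listed
    on the poster) would need an explicit proposal correction, so it is
    omitted from this sketch."""
    i, j = random.sample(range(n), 2)
    proposal = set(edges)
    if (i, j) in proposal:
        proposal.discard((i, j))
    else:
        proposal.add((i, j))
    return proposal

def is_acyclic(edges, n):
    """Reject proposals that introduce a directed cycle (depth-first search)."""
    children = {v: [] for v in range(n)}
    for a, b in edges:
        children[a].append(b)
    state = {}  # 1 = on the current path, 2 = finished
    def visit(v):
        if state.get(v) == 1:
            return False          # back edge: a cycle exists
        if state.get(v) == 2:
            return True
        state[v] = 1
        ok = all(visit(c) for c in children[v])
        state[v] = 2
        return ok
    return all(visit(v) for v in range(n))

def structure_sampler(log_score, n, n_iter=10_000, seed=0):
    """Step 2: Metropolis-Hastings over DAG structures.
    log_score(edges) should return log marginal likelihood plus log prior
    for the structure; in true RJMCMC the acceptance ratio would also
    include the parameter proposal and Jacobian terms."""
    random.seed(seed)
    edges, current = set(), log_score(set())
    samples = []
    for _ in range(n_iter):
        proposal = propose_edge_move(edges, n)
        if is_acyclic(proposal, n):
            candidate = log_score(proposal)
            if math.log(random.random()) < candidate - current:
                edges, current = proposal, candidate
        samples.append(frozenset(edges))
    return samples

def edge_probabilities(samples, n):
    """Posterior edge-inclusion probabilities, computed as a matter of course."""
    counts = dict.fromkeys(itertools.permutations(range(n), 2), 0)
    for s in samples:
        for e in s:
            counts[e] += 1
    return {e: c / len(samples) for e, c in counts.items()}

For the four-node example below, n = 4, and log_score might be a BIC-type score computed from the Gaussian likelihood of the four variables; the output of edge_probabilities is the kind of edge-inclusion probability reported on this poster.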
Recommendations

Depending on the nature of prior information about the ecological system one wishes to model with a BBN, there are several computational approaches from which to choose:

- Assume “known” nodes and perform structural learning (e.g., DEAL, HUGIN).
- Assume “known” edges and perform parameter learning (e.g., WinBUGS).
- Use a two-step approach: first learn the structure, then the parameters.
- Perform structural and parameter learning simultaneously via RJMCMC (e.g., BayesX).

Tradeoffs among these approaches include computational expense (in terms of programming costs and CPU time) and the feasibility (really, the availability) of probability and uncertainty estimation.

Four Node Example: Structural Learning vs. RJMCMC Methods, or “This is all very nice, Steve, but will it work?”

In this example we consider a simple data set of four continuous variables, to which we fit a directed graphical model. The model is fit using a traditional algorithm (in this case DEAL) and also by an RJMCMC algorithm (using the package BayesX). The data come from the Mid-Atlantic Integrated Assessment (MAIA) data set. Specifically, the variables are BUGIBI, an index of biotic integrity for macro-invertebrates; LAT, latitude of the sample point; ANC, acid neutralizing capacity; and SO4, sulfate.

[Figure: candidate four-node models from the structural learning method and the reversible jump MCMC method, with edge/model probabilities (e.g., 0.130, 0.107) and relative scores (best candidate = 1) for each candidate graph.]

References

Andersen, S.K., Olesen, K.G., Jensen, F.V. and Jensen, F. (1989). HUGIN: a shell for building Bayesian belief universes for expert systems. In Shafer, G. and Pearl, J. (eds.) (1990), Readings in Uncertain Reasoning. Morgan Kaufmann, San Mateo, CA.

BayesX (software).

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B.

Borsuk, M.E., Stow, C.A. and Reckhow, K.H. (2003). Integrated approach to total maximum daily load development for Neuse River Estuary using Bayesian probability network model. J. Water Res. Pl. & Mgmt. 129.

Bøttcher, S.G. and Dethlefsen, C. (2003). DEAL: a package for learning Bayesian networks. Technical report, Department of Mathematical Sciences, Aalborg University.

BUGS (software).

Darroch, J.N., Lauritzen, S.L. and Speed, T.P. (1980). Markov fields and log-linear interaction models for contingency tables. Ann. Statist.

Dawid, A.P. and Lauritzen, S.L. (1993). Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Statist.

DEAL (software).

Dellaportas, P. and Forster, J.J. (1999). Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models. Biometrika.

Deshpande, A., Garofalakis, M. and Jordan, M.I. (2001). In Breese, J. and Koller, D. (eds.), Uncertainty in Artificial Intelligence: Proceedings of the Seventeenth Conference.

Fronk, E. and Giudici, P. (2000). Markov chain Monte Carlo model selection for DAG models. Technical report #118, Department of Political Economy and Quantitative Methods, University of Pavia.

Gansner, E., Koutsofios, E. and North, S. Drawing graphs with dot.

Gibbs, J.W. (1902). Elementary Principles in Statistical Mechanics. Yale University Press, New Haven, CT.

Giudici, P., Green, P.J. and Tarantola, C. (2000). Efficient model determination for discrete graphical models. Biometrika, to appear.

Giudici, P. and Green, P.J. (1999). Decomposable graphical Gaussian model determination. Biometrika.

Green, P.J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika.

HUGIN (software).

Lauritzen, S.L. (2003). gRaphical models: a software perspective. Slides from DSC 2003.

Lauritzen, S.L. and Spiegelhalter, D.J. (1988). Local computations with probabilities on graphical structures and their application to expert systems (with discussion). J. Roy. Statist. Soc. Ser. B.

Pearl, J. (1986). Fusion, propagation, and structuring in belief networks. Artificial Intelligence.

Spirtes, P., Glymour, C. and Scheines, R. (1993). Causation, Prediction, and Search. Springer-Verlag, New York.

TETRAD (software).

Waagepetersen, R. and Sorensen, D. (2001). A tutorial on reversible jump MCMC with a view toward QTL-mapping. International Statistical Review.

Wright, S. (1921). Correlation and causation. J. Agric. Res.

This research is funded by the U.S. EPA Science To Achieve Results (STAR) Program, Cooperative Agreement #CR. The work reported here was developed under STAR Research Assistance Agreement CR awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This poster has not been formally reviewed by EPA. The views expressed here are solely those of the authors and of STARMAP, the program they represent. EPA does not endorse any products or commercial services mentioned in this presentation.