1 in data …uncertainty and complexity in models and.

2 What do I mean by structure? The key idea is conditional independence (CI): x and z are conditionally independent given y if p(x,z|y) = p(x|y)p(z|y), implying, for example, that p(x|y,z) = p(x|y). CI turns out to be a remarkably powerful and pervasive idea in probability and statistics.
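A small numerical check of the definition may help. The sketch below builds a hypothetical joint distribution of three binary variables as p(y)p(x|y)p(z|y), so that x and z depend on each other only through y, and then verifies both the defining identity and the implied identity p(x|y,z) = p(x|y). All probability values and helper names are invented for illustration.

```python
from itertools import product

# Hypothetical binary distributions: p(y), p(x|y), p(z|y).
p_y = {0: 0.3, 1: 0.7}
p_x_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}

# Joint p(x, y, z) = p(y) p(x|y) p(z|y): x and z are CI given y by construction.
joint = {(x, y, z): p_y[y] * p_x_given_y[y][x] * p_z_given_y[y][z]
         for x, y, z in product((0, 1), repeat=3)}

def marginal(keep):
    """Sum the joint over the variables whose positions are not in `keep`."""
    out = {}
    for v, p in joint.items():
        key = tuple(v[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_xy, p_yz, p_yonly = marginal((0, 1)), marginal((1, 2)), marginal((1,))

def is_ci(tol=1e-12):
    """Check p(x,z|y) == p(x|y) p(z|y) for every configuration (x, y, z)."""
    for (x, y, z), p in joint.items():
        py = p_yonly[(y,)]
        if abs(p / py - (p_xy[(x, y)] / py) * (p_yz[(y, z)] / py)) > tol:
            return False
    return True

# The implied statement p(x|y,z) = p(x|y), checked at one point (x=1, y=0, z=1).
p_x1_given_y0_z1 = joint[(1, 0, 1)] / p_yz[(0, 1)]
p_x1_given_y0 = p_xy[(1, 0)] / p_yonly[(0,)]
```

Once z is conditioned on alongside y, it carries no further information about x.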

3 How to represent this structure? The idea of graphical modelling: we draw graphs in which nodes represent variables, connected by lines and arrows representing relationships. We separate the logical (the graph) and quantitative (the assumed distributions) aspects of the model.

4 [Diagram linking graphical models with: regression, contingency tables, spatial statistics, sufficiency, Markov chains, covariance selection, statistical physics, genetics, AI]

5 Graphical modelling [1]: assuming structure to do probability calculations; inferring structure to make substantive conclusions; structure in model building; inference about latent variables

6 Basic DAG [figure: the general form of a DAG, and an example]
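The standard general form of a DAG model is the factorisation p(v) = ∏_i p(v_i | v_pa(i)): the joint distribution is the product over nodes of each node's distribution given its parents. A minimal Python sketch evaluating this for a hypothetical three-node DAG a → c ← b with binary variables; the graph, the table names and all probabilities are ours, for illustration only.

```python
from itertools import product

# Hypothetical DAG: a -> c <- b (all variables binary).
parents = {"a": (), "b": (), "c": ("a", "b")}

# Conditional probability tables: cpt[node][parent_values][value].
cpt = {
    "a": {(): {0: 0.6, 1: 0.4}},
    "b": {(): {0: 0.7, 1: 0.3}},
    "c": {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
          (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.1, 1: 0.9}},
}

def joint_prob(assignment):
    """p(v) = product over nodes of p(v_node | v_parents(node))."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        prob *= cpt[node][pa_values][assignment[node]]
    return prob

# The factorised joint sums to 1 over all configurations.
nodes = list(parents)
total = sum(joint_prob(dict(zip(nodes, vals)))
            for vals in product((0, 1), repeat=len(nodes)))
```

The graph fixes which conditional tables are needed; the numbers inside them are the separate, quantitative part of the model.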

7 A natural DAG from genetics [figure: pedigree with genotype labels AB, AO, OO]

8 A natural DAG from genetics [figure: extended pedigree with genotype labels AB, AO, OO, AO, AB, AO]
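In such a pedigree DAG, each child's genotype node has the two parents' genotype nodes as its parents, and its conditional table follows from Mendel's law: each parent passes on one of its two alleles, each with probability 1/2, independently of the other parent. A minimal sketch for the ABO locus; the function name and the string encoding of genotypes are our own choices.

```python
from collections import Counter
from fractions import Fraction

def child_genotype_dist(mother, father):
    """p(child genotype | mother, father) under Mendelian transmission.

    Genotypes are two-character strings such as "AB" or "AO"; each parent
    transmits each of its two alleles with probability 1/2, independently.
    """
    dist = Counter()
    for m_allele in mother:
        for f_allele in father:
            genotype = "".join(sorted(m_allele + f_allele))  # unordered pair
            dist[genotype] += Fraction(1, 4)
    return dict(dist)

d = child_genotype_dist("AB", "AO")
```

For parents AB and AO, each of the child genotypes AA, AO, AB and BO has probability 1/4; exact fractions avoid rounding in the table.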

9 DNA forensics example (thanks to Julia Mortera). A blood stain is found at a crime scene. A body is found somewhere else! There is a suspect. We have DNA profiles on all three, and the crime scene sample is a ‘mixed trace’: is it a mix of the victim and the suspect?

10 DNA forensics in Hugin. Disaggregate the problem in terms of the paternal and maternal genes of both victim and suspect. Assume Hardy-Weinberg equilibrium. We have profiles on 8 STR markers, treated as independent (linkage equilibrium).
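The two assumptions translate into simple arithmetic. Under Hardy-Weinberg equilibrium, a genotype's population probability is p_a² for a homozygote aa and 2·p_a·p_b for a heterozygote ab; linkage equilibrium lets a whole profile's probability be taken as the product across markers. A sketch with invented marker names, allele labels and frequencies (not the case data):

```python
import math

def genotype_prob(genotype, freqs):
    """Hardy-Weinberg probability of an unordered genotype (allele1, allele2)."""
    a, b = genotype
    if a == b:
        return freqs[a] ** 2
    return 2 * freqs[a] * freqs[b]

def profile_prob(profile, marker_freqs):
    """Product over markers, assuming linkage equilibrium (independent markers)."""
    return math.prod(genotype_prob(g, marker_freqs[m]) for m, g in profile.items())

# Hypothetical allele frequencies at two hypothetical markers.
marker_freqs = {
    "M1": {"8": 0.2, "9": 0.1, "10": 0.3},
    "M2": {"11": 0.25, "12": 0.5},
}
profile = {"M1": ("8", "10"), "M2": ("12", "12")}
```

Here the profile probability is (2 × 0.2 × 0.3) × 0.5² = 0.03.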

11 DNA forensics. The data: 2 of the 8 markers show more than 2 alleles at the crime scene ⇒ a mixture of 2 or more people (one person carries at most 2 alleles at any marker).
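The counting argument behind this slide can be written down directly: a marker showing n distinct alleles needs at least ⌈n/2⌉ contributors, and the profile as a whole needs at least the maximum of this over markers. A sketch with a hypothetical function name and invented observations; it gives only a lower bound and ignores allele drop-out.

```python
import math

def min_contributors(alleles_by_marker):
    """Smallest number of people consistent with the distinct alleles seen.

    Each contributor has at most two alleles at each marker, so a marker with
    n distinct alleles requires at least ceil(n / 2) people.
    """
    return max(math.ceil(len(set(a)) / 2) for a in alleles_by_marker.values())

# Hypothetical crime-scene observations at three markers.
scene = {"M1": ["8", "10"], "M2": ["11", "12", "13"], "M3": ["7"]}
```

Here marker M2 shows three alleles, so at least two people contributed.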

12 DNA forensics in Hugin

13 DNA forensics Population gene frequencies for D7S820 (used as ‘prior’ on ‘founder’ nodes):

15 DNA forensics Results (suspect+victim vs. unknown+victim):
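Results like these are usually expressed as a likelihood ratio of the two hypotheses. The sketch below computes it for a single marker under simplifying assumptions: exactly two contributors, no allele drop-out or drop-in, and Hardy-Weinberg genotype probabilities for the unknown person. This per-marker textbook calculation is not the Hugin network used in the talk, and the allele labels and frequencies are hypothetical.

```python
from itertools import combinations_with_replacement

def hw_prob(a, b, freqs):
    """Hardy-Weinberg probability of the unordered genotype {a, b}."""
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def likelihood_ratio(scene, victim, suspect, freqs):
    """LR for Hp: suspect + victim vs Hd: unknown + victim, at one marker.

    Assumes exactly two contributors and that the scene shows exactly the
    union of the contributors' alleles (no drop-out, no drop-in).
    """
    scene = set(scene)
    p_hp = 1.0 if set(victim) | set(suspect) == scene else 0.0
    # Under Hd, sum over every genotype the unknown person could have.
    p_hd = sum(hw_prob(a, b, freqs)
               for a, b in combinations_with_replacement(sorted(freqs), 2)
               if set(victim) | {a, b} == scene)
    if p_hd == 0.0:
        return float("inf") if p_hp > 0 else float("nan")
    return p_hp / p_hd

freqs = {"a": 0.1, "b": 0.2, "c": 0.05, "d": 0.65}
lr = likelihood_ratio(scene={"a", "b", "c"}, victim=("a", "b"),
                      suspect=("c", "c"), freqs=freqs)
```

Under Hd the unknown person must carry allele c and nothing outside {a, b, c}, so P(E | Hd) = p_c² + 2p_a·p_c + 2p_b·p_c = 0.0325 and the ratio is about 30.8. With independent markers, per-marker ratios would multiply.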

16 Graphical modelling [2]: assuming structure to do probability calculations; inferring structure to make substantive conclusions; structure in model building; inference about latent variables

17 Conditional independence graph: draw an (undirected) edge between two variables if they are not conditionally independent given all the other variables.
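For discrete variables with a fully specified, strictly positive joint table, the edge rule can be applied directly. For each pair, check whether p(x_i, x_j | rest) factorises for every configuration of the remaining variables. The brute-force sketch below is exponential in the number of variables, so it suits only small illustrations. It is tested on a hypothetical chain in which x0 and x2 interact only through x1; the function name and all numbers are ours.

```python
from itertools import combinations, product

def ci_graph(joint, n, tol=1e-12):
    """Edges (i, j) such that x_i and x_j are NOT CI given all other variables.

    `joint` maps every binary tuple of length n to a positive probability.
    Uses the equivalent check p(i,j,r) p(r) == p(i,r) p(j,r) for every
    configuration r of the remaining variables.
    """
    def marg(keep):
        out = {}
        for v, p in joint.items():
            key = tuple(v[k] for k in keep)
            out[key] = out.get(key, 0.0) + p
        return out

    edges = set()
    for i, j in combinations(range(n), 2):
        rest = [k for k in range(n) if k not in (i, j)]
        p_r, p_ir, p_jr = marg(rest), marg([i] + rest), marg([j] + rest)
        for v, p in joint.items():
            r = tuple(v[k] for k in rest)
            if abs(p * p_r[r] - p_ir[(v[i],) + r] * p_jr[(v[j],) + r]) > tol:
                edges.add((i, j))
                break
    return edges

# Hypothetical chain x0 -> x1 -> x2, all probabilities positive.
p0 = {0: 0.4, 1: 0.6}
p1_0 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p2_1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.35, 1: 0.65}}
chain = {(a, b, c): p0[a] * p1_0[a][b] * p2_1[b][c]
         for a, b, c in product((0, 1), repeat=3)}
```

For the chain the result is the edge set {(0, 1), (1, 2)}: there is no edge between x0 and x2 because they are CI given x1.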

18 Infant mortality example. Data on infant mortality from 2 clinics, by level of ante-natal care (Bishop, Biometrics, 1969):

19 Infant mortality example Same data broken down also by clinic:

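20 Analysis of deviance [R output; most numeric entries were lost]. Columns: Df, Deviance, Resid. Df, Resid. Dev, P(>|Chi|). Terms, in order: NULL, Clinic, Ante, Survival, Clinic:Ante, Clinic:Survival, Ante:Survival, Clinic:Ante:Survival. Surviving p-value exponents: Clinic e-19, Survival e-169, Clinic:Ante e-44, Clinic:Survival e-05.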

21 Infant mortality example [graph: edges ante-clinic and clinic-survival]: survival and clinic are dependent, and ante and clinic are dependent, but survival and ante are CI given clinic.
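A conclusion of this kind can be checked with a likelihood-ratio (deviance) test of conditional independence. Within each clinic, compare each observed count O with its expected count under independence of care and survival, E = n_{ca+} n_{c+s} / n_{c++}, and sum G² = Σ 2·O·log(O/E) over all cells. The statistic has (I-1)(J-1) degrees of freedom per clinic. A sketch with a hypothetical function name; the counts are invented, not Bishop's data.

```python
import math

def g2_conditional_independence(table):
    """Deviance G^2 and degrees of freedom for care independent of outcome given clinic.

    `table[c][a][s]` holds counts for clinic c, care level a, outcome s.
    """
    g2, df = 0.0, 0
    for layer in table:                         # one 2-D table per clinic
        n = sum(map(sum, layer))
        row = [sum(r) for r in layer]
        col = [sum(r[s] for r in layer) for s in range(len(layer[0]))]
        for a, r in enumerate(layer):
            for s, obs in enumerate(r):
                exp = row[a] * col[s] / n
                if obs > 0:                     # 0 * log 0 is taken as 0
                    g2 += 2 * obs * math.log(obs / exp)
        df += (len(layer) - 1) * (len(layer[0]) - 1)
    return g2, df

# Hypothetical counts: [clinic][care: less, more][outcome: died, survived]
counts = [
    [[4, 196], [6, 294]],    # clinic 1: 2% mortality at both care levels
    [[15, 135], [3, 27]],    # clinic 2: 10% mortality at both care levels
]
g2, df = g2_conditional_independence(counts)
```

These invented counts give G² = 0 on 2 degrees of freedom, yet pooling the clinics shows higher mortality with less care (19/350 against 9/330). Clinic explains the apparent association, which is the pattern the slide describes.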

22 Prognostic factors for coronary heart disease [graph on six binary variables: strenuous physical work? family history of CHD? strenuous mental work? blood pressure > 140? smoking? ratio of β and α lipoproteins > 3?]. Analysis of a 2^6 contingency table (Edwards & Havranek, Biometrika, 1985).

23 Graphical modelling [3]: assuming structure to do probability calculations; inferring structure to make substantive conclusions; structure in model building; inference about latent variables

24 Modelling with undirected graphs. Directed acyclic graphs are a natural representation of the way we usually specify a statistical model, directionally: disease → symptom, past → future, parameters → data, …. However, sometimes (e.g. spatial models) there is no natural direction.

25 Scottish lip cancer data The rates of lip cancer in 56 counties in Scotland have been analysed by Clayton and Kaldor (1987) and Breslow and Clayton (1993) (the analysis here is based on the example in the WinBugs manual)

26 Scottish lip cancer data (2). The data include: the observed and expected cases (expected numbers based on the population and its age and sex distribution in the county); a covariate measuring the percentage of the population engaged in agriculture, fishing, or forestry; and the ‘position’ of each county, expressed as a list of adjacent counties.

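27 Scottish lip cancer data (3) [table with columns: County; observed cases; expected cases; x (% in agriculture); SMR; adjacent counties. Only fragments of the adjacency lists survive, e.g. …,9,11,… and …,24,30,33,45,55]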

28 Model for lip cancer data (1): graph [nodes: observed counts, random spatial effects, covariate, regression coefficient, relative risks]

29 Model for lip cancer data (2): distributions [data; link function; random spatial effects; priors]
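The sketch below fills in the distributions with the form commonly given for this example; treat that form as our assumption. Observed counts are O_i ~ Poisson(μ_i), with log link log μ_i = log E_i + α0 + α1·x_i + b_i. The spatial effects b have an intrinsic conditional autoregressive (CAR) prior whose log-density is, up to a constant, -(τ/2) Σ_{i~j} (b_i - b_j)² over pairs of adjacent counties. The code evaluates these two log-density terms for given parameter values. Function names and the toy map are hypothetical, and the priors on α0, α1 and τ are omitted.

```python
import math

def poisson_loglik(obs, expected, x, alpha0, alpha1, b):
    """Sum of log Poisson(O_i | mu_i), log mu_i = log E_i + alpha0 + alpha1 x_i + b_i."""
    total = 0.0
    for o, e, xi, bi in zip(obs, expected, x, b):
        mu = e * math.exp(alpha0 + alpha1 * xi + bi)
        total += o * math.log(mu) - mu - math.lgamma(o + 1)
    return total

def icar_logdensity(b, adjacency, tau):
    """Intrinsic CAR log-density, up to an additive constant, for fixed tau.

    `adjacency` is a list of neighbour lists; each pair i~j is counted once.
    """
    sq = sum((b[i] - b[j]) ** 2
             for i, nbrs in enumerate(adjacency) for j in nbrs if i < j)
    return -0.5 * tau * sq

# Hypothetical three-county map: counties 0-1 and 1-2 are adjacent.
adjacency = [[1], [0, 2], [1]]
```

The CAR term penalises differences between neighbouring counties, which is how the adjacency lists in the data enter the model.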

30 WinBugs for lip cancer data. Bugs and WinBugs are systems for estimating the posterior distribution in a Bayesian model by simulation, using MCMC. Data-analytic techniques can then be used to summarise (marginal) posteriors for parameters of interest.
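BUGS is built around Gibbs sampling. As a self-contained stand-in, the sketch below uses a different MCMC method, random-walk Metropolis, and then summarises a marginal posterior as the slide describes. The model has a single parameter: a common log relative risk θ with O_i ~ Poisson(E_i e^θ) and a flat prior. The counts, step size, run length and burn-in are all hypothetical choices.

```python
import math
import random
import statistics

def log_post(theta, obs, expected):
    """Log posterior (up to a constant) with a flat prior on theta."""
    return sum(o * theta - e * math.exp(theta) for o, e in zip(obs, expected))

def metropolis(obs, expected, n_iter=20000, step=0.3, seed=1):
    """Random-walk Metropolis for theta; returns draws after burn-in."""
    rng = random.Random(seed)
    theta, lp = 0.0, log_post(0.0, obs, expected)
    draws = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_post(prop, obs, expected)
        # accept with probability min(1, exp(lp_prop - lp)); 1 - random() is in (0, 1]
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws.append(theta)
    return draws[n_iter // 10:]               # discard the first 10% as burn-in

obs, expected = [12, 20, 18], [10.0, 15.0, 15.0]
draws = metropolis(obs, expected)
q = statistics.quantiles(draws, n=40)         # cut points at 2.5%, 5%, ..., 97.5%
summary = {"mean": statistics.mean(draws), "2.5%": q[0], "97.5%": q[-1]}
```

In this conjugate case exp(θ) has a Gamma(50, 40) posterior, so the exact posterior mean of θ (about 0.21) is available as a check on the simulation.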

31 WinBugs for lip cancer data Dynamic traces for some parameters:

32 WinBugs for lip cancer data Posterior densities for some parameters:

33 Graphical modelling [4]: assuming structure to do probability calculations; inferring structure to make substantive conclusions; structure in model building; inference about latent variables

34 Latent variable problems [table with headings: variable known / variable unknown; value set known / value set unknown; edges known / edges unknown]

35 Hidden Markov models, e.g. a hidden Markov chain [graph: hidden states z0 → z1 → z2 → z3 → z4, with observed y1, y2, y3, y4 attached to z1, …, z4]
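Inference in such a model rests on recursions along the chain. As an illustration, the forward algorithm computes the likelihood p(y1, …, yT) by summing over all hidden paths in O(T·K²) time, where K is the number of hidden states. The sketch compares it with explicit enumeration. It assumes a hypothetical two-state chain started from a distribution for z0, with each y_t depending only on z_t; all numbers are invented.

```python
from itertools import product

def forward_likelihood(init, trans, emit, ys):
    """p(y_1..y_T) for a hidden Markov chain z_0 -> z_1 -> ... -> z_T.

    init[k] = p(z_0 = k); trans[j][k] = p(z_t = k | z_{t-1} = j);
    emit[k][y] = p(y_t = y | z_t = k). Observations start at t = 1.
    """
    K = len(init)
    alpha = list(init)                       # alpha_0(k) = p(z_0 = k)
    for y in ys:
        # alpha_t(k) = sum_j alpha_{t-1}(j) trans[j][k], times emit[k][y]
        alpha = [sum(alpha[j] * trans[j][k] for j in range(K)) * emit[k][y]
                 for k in range(K)]
    return sum(alpha)

def brute_force_likelihood(init, trans, emit, ys):
    """Same quantity by explicit summation over all K**(T+1) hidden paths."""
    K, total = len(init), 0.0
    for path in product(range(K), repeat=len(ys) + 1):
        p = init[path[0]]
        for t, y in enumerate(ys, start=1):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][y]
        total += p
    return total

# Hypothetical two-state chain with binary observations.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
ys = [0, 1, 1, 0]
```

The two functions agree, but only the forward recursion remains feasible for long sequences, because the number of hidden paths grows exponentially with T.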

36 Hidden Markov models: Richardson & Green (2000) used a hidden Markov random field model for disease mapping [graph nodes: observed incidence, expected incidence, relative risk parameters, hidden MRF]

37 Larynx cancer in females in France SMRs

38 Latent variable problems [table with headings: variable known / variable unknown; value set known / value set unknown; edges known / edges unknown]

39 Ion channel model choice Hodgson and Green, Proc Roy Soc Lond A, 1999

40 Example hidden continuous-time models [state graphs: one on O1, O2, C1, C2; one on O1, O2, C1, C2, C3]
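In these models the channel's hidden state moves between open (O) and closed (C) states as a continuous-time Markov chain, and the recorded binary signal shows whether the channel is open or closed. The sketch below is a Gillespie-style simulation for a hypothetical three-state graph C1 ↔ C2 ↔ O1. The transition rates and function names are invented for illustration.

```python
import random

# Hypothetical rates (per unit time) on the state graph C1 <-> C2 <-> O1.
rates = {
    "C1": {"C2": 2.0},
    "C2": {"C1": 1.0, "O1": 3.0},
    "O1": {"C2": 1.5},
}

def simulate(rates, start, t_end, seed=0):
    """Simulate a continuous-time Markov chain; return [(jump time, new state)]."""
    rng = random.Random(seed)
    t, state, path = 0.0, start, [(0.0, start)]
    while True:
        out = rates[state]
        t += rng.expovariate(sum(out.values()))   # exponential holding time
        if t >= t_end:
            return path
        # jump to each neighbour with probability proportional to its rate
        state = rng.choices(list(out), weights=list(out.values()))[0]
        path.append((t, state))

def fraction_open(path, t_end):
    """Fraction of [0, t_end) with binary signal 1, i.e. spent in an open state."""
    ends = [t for t, _ in path[1:]] + [t_end]
    open_time = sum(t1 - t0 for (t0, s), t1 in zip(path, ends) if s.startswith("O"))
    return open_time / t_end

path = simulate(rates, "C1", t_end=20000.0)
```

Only the binary signal is observed in practice; the hidden state sequence and the rates must be inferred. The long-run open fraction should approach the stationary probability of O1, which is 4/7 for these rates.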

41 Ion channel model DAG [nodes: model indicator, transition rates, hidden state, binary signal, levels & variances, data]

42 Ion channel model DAG [as on the previous slide: levels & variances, model indicator, transition rates, hidden state, data, binary signal; shown with a state graph on O1, O2, C1, C2, C3]

43 Posterior model probabilities [figure: candidate state graphs on O1, C1, O2 and on O1, C1, C2, with their posterior probabilities]

44 ‘Alarm’ network Learning a Bayesian network, for an ICU ventilator management system, from cases on 37 variables (Spirtes & Meek, 1995)

45 Latent variable problems [table with headings: variable known / variable unknown; value set known / value set unknown; edges known / edges unknown]

46 Wisconsin students' college plans [graph nodes: ses, sex, pe, iq, cp]. 10,318 high school seniors (Sewell & Shah, 1968, and many authors since). 5 categorical variables: sex (2), socioeconomic status (4), IQ (4), parental encouragement (2), college plans (2).

47 [Graph nodes: ses, sex, pe, iq, cp] 5 categorical variables: sex (2), socioeconomic status (4), IQ (4), parental encouragement (2), college plans (2). The (vastly) most probable graph according to an exact Bayesian analysis by Heckerman (1999).
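An exact Bayesian comparison of graphs scores each candidate by its marginal likelihood, which decomposes into one term per node given its parents. As an illustration only, the sketch below uses the Cooper-Herskovits (K2) score, which assumes uniform Dirichlet priors on each conditional table; this is not necessarily the prior Heckerman used. For one node the score is log ∏_j [(r-1)! / (N_j + r - 1)!] ∏_k N_jk!. Here r is the number of states of the node, N_jk is the count of state k under parent configuration j, and N_j = Σ_k N_jk. The data are hypothetical.

```python
import math
from collections import Counter

def k2_log_score(data, node, parents, n_states):
    """Log K2 (Cooper-Herskovits) marginal likelihood of `node` given `parents`.

    `data` is a list of dicts (one per case); `n_states` is r, the number of
    values the node can take. Unobserved parent configurations contribute 0.
    """
    counts = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    totals = Counter(tuple(row[p] for p in parents) for row in data)
    r = n_states
    score = 0.0
    for n_j in totals.values():
        score += math.lgamma(r) - math.lgamma(n_j + r)   # log (r-1)!/(N_j+r-1)!
    for n_jk in counts.values():
        score += math.lgamma(n_jk + 1)                    # log N_jk!
    return score

# Hypothetical binary cases in which cp usually equals pe.
data = ([{"pe": 1, "cp": 1}] * 40 + [{"pe": 0, "cp": 0}] * 40
        + [{"pe": 1, "cp": 0}] * 5 + [{"pe": 0, "cp": 1}] * 5)
with_edge = k2_log_score(data, "cp", ["pe"], 2)
without_edge = k2_log_score(data, "cp", [], 2)
```

Here the graph with the edge pe → cp scores far higher than the graph without it. Summing such node scores over a graph, and normalising across all candidate graphs, gives posterior graph probabilities under a uniform prior on structures.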

48 Heckerman’s most probable graph with one hidden variable [graph nodes: h (hidden), ses, sex, pe, iq, cp]