Data Mining Cardiovascular Bayesian Networks Charles Twardy †, Ann Nicholson †, Kevin Korb †, John McNeil ‡ (Danny Liew ‡, Sophie Rogers ‡, Lucas Hope.


Data Mining Cardiovascular Bayesian Networks
Charles Twardy †, Ann Nicholson †, Kevin Korb †, John McNeil ‡ (Danny Liew ‡, Sophie Rogers ‡, Lucas Hope †)
† School of Computer Science & Software Engineering
‡ Dept. of Epidemiology & Preventive Medicine
Monash University

Overview
Problem: assessment of risk for coronary heart disease (CHD)
1. Knowledge Engineering: medical experts, 2 epidemiological models
2. Data Mining: Busselton Study data, causal discovery (CaMML) + other learners
3. Evaluation
Bayesian network software (Netica)

Knowledge Engineering BNs from the medical literature
• The Australian Busselton Study
  » every 3 years, , > 8,000 participants
  » mortality followup via WA death register + manually
  » Cox proportional-hazards model, 2,258 from 1978 cohort
  » CHD event base rates: 23% for men, 14% for women
• The German PROCAM Study
  » , followup every 2 years, > 25,000 participants
  » Scoring model (based on Cox), ~5,000 men
  » CHD event base rates: ~6%
General question: are models transferable across populations?

Bayesian networks (BNs)
• Use probability theory to represent uncertainty
• Represent a probability distribution graphically (directed acyclic graphs)
• Nodes: random variables (discrete, continuous)
• Arcs indicate conditional dependencies between variables
  » With arcs X → Y and X → Z, P(X,Y,Z) can be decomposed into P(X)P(Y|X)P(Z|X)
• Conditional Probability Distribution (CPD)
  » Associated with each variable: the probability of each state given the parent states
• BN inference
  » Evidence: observation of a specific state
  » Task: compute the posterior probabilities for query node(s) given evidence
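The factorization and the inference task above can be illustrated with a minimal sketch. It assumes three binary variables with arcs X → Y and X → Z; the CPD numbers and the function names `joint` and `posterior_x` are hypothetical and chosen only for illustration. Inference is done by brute-force enumeration, which is practical only for very small networks.

```python
from itertools import product

# Hypothetical CPDs for binary variables with arcs X -> Y and X -> Z.
P_X = {True: 0.3, False: 0.7}
# P_Y_GIVEN_X[x][y] is P(Y = y | X = x); likewise for Z.
P_Y_GIVEN_X = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
P_Z_GIVEN_X = {True: {True: 0.6, False: 0.4}, False: {True: 0.2, False: 0.8}}


def joint(x, y, z):
    """P(X, Y, Z) = P(X) P(Y | X) P(Z | X)."""
    return P_X[x] * P_Y_GIVEN_X[x][y] * P_Z_GIVEN_X[x][z]


def posterior_x(evidence):
    """Return P(X | evidence); evidence maps "Y" and/or "Z" to an observed bool."""
    scores = {True: 0.0, False: 0.0}
    for x, y, z in product([True, False], repeat=3):
        # Skip joint states that contradict the observed evidence.
        if evidence.get("Y", y) != y or evidence.get("Z", z) != z:
            continue
        scores[x] += joint(x, y, z)
    total = sum(scores.values())
    return {x: s / total for x, s in scores.items()}


if __name__ == "__main__":
    print(posterior_x({"Y": True}))
```

Observing Y = true raises the belief in X = true above its prior of 0.3, because Y is more likely when X holds.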

The Busselton BN: nodes

The Busselton BN: arcs
[Figure annotations: predictor variables; uninformative; 10-year risk of CHD event]
P(S,B,Al,At) = P(S) P(B|S) P(Al|S) P(At|S)
BNs summarize the joint distribution
All nodes have an associated conditional prob. distribution

The Busselton BN: discretization
[Figure annotations: discretization choices; binary nodes]

The Busselton BN: reasoning

Bad cholesterol Heavy smoking Normal

The Busselton BN: reasoning More risk factors !

A risk assessment tool for clinicians
• Previous tool: TAKEHEART
• Combine risk assessment (probability) with costs
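One way to combine a risk probability with costs is an expected-cost comparison. The sketch below is illustrative only: the slides do not describe the tool's decision rule, and the function `should_treat`, its parameters and all the numbers are hypothetical assumptions.

```python
def should_treat(p_event, cost_treatment, cost_event, relative_risk_reduction):
    """Treat when the expected cost with treatment is below the expected cost without.

    All inputs are hypothetical: p_event is the predicted 10-year CHD risk,
    and treatment is assumed to scale that risk by (1 - relative_risk_reduction).
    """
    expected_untreated = p_event * cost_event
    expected_treated = (
        cost_treatment + p_event * (1.0 - relative_risk_reduction) * cost_event
    )
    return expected_treated < expected_untreated


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the study.
    for risk in (0.02, 0.20):
        print(risk, should_treat(risk, 1_000.0, 50_000.0, 0.3))
```

Under these assumptions treatment pays off only when the predicted risk exceeds cost_treatment / (relative_risk_reduction × cost_event), so the same costs can yield "don't treat" for a low-risk patient and "treat" for a high-risk one.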

Risk Assessment Tool: example
• Young, predictor not observed – don’t treat
• Old, predictor not observed – treat
• Not so old, predictor not observed – treat
• Young, predictor observed – don’t treat

PROCAM BN

CaMML: a causal learner
• Developed at Monash University
• Data mines BNs from epidemiological data
• Minimum message length (MML) metric: trades off complexity vs goodness of fit
• MCMC search over model space

CaMML: example BN

Evaluation
• Predicting 10-year risk of CHD using Busselton data
• Split data training/testing
• 10-fold cross-validation
• Metrics:
  » Predictive Accuracy
  » ROC Curves (area under curve): correct classification vs false positives
  » Bayesian Information Reward (BIR)
• Using Weka: Java environment for machine learning tools and techniques
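The cross-validation split and the area under the ROC curve can be sketched in plain Python. This is a minimal illustration, not the Weka implementation used in the study; the function names `k_fold_indices` and `auc` are hypothetical. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n cases."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def auc(scores, labels):
    """Area under the ROC curve; labels are 1 (event) or 0 (no event)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to scores that carry no ranking information, which is why a model that assigns every case the same probability sits at that value.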

Predictive accuracy
• Examine each joint observation in the sample
• Add any available evidence for the other nodes
• Update the network
• Use the value with the highest probability as the predicted value
• Compare the predicted value with the actual value
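The last two steps above amount to an argmax over the posterior followed by a comparison with the recorded outcome. A minimal sketch, assuming the updated network has already produced one posterior distribution per case as a dictionary from state name to probability; the function name and the state labels are hypothetical:

```python
def predictive_accuracy(posteriors, actual):
    """Fraction of cases whose most probable state equals the actual state.

    posteriors: list of dicts mapping state -> posterior probability
    actual:     list of the observed states, in the same order
    """
    correct = 0
    for distribution, truth in zip(posteriors, actual):
        predicted = max(distribution, key=distribution.get)
        correct += predicted == truth
    return correct / len(actual)


if __name__ == "__main__":
    posteriors = [
        {"chd": 0.7, "no_chd": 0.3},
        {"chd": 0.2, "no_chd": 0.8},
        {"chd": 0.6, "no_chd": 0.4},
    ]
    print(predictive_accuracy(posteriors, ["chd", "no_chd", "no_chd"]))
```

Accuracy ignores how confident each prediction was: a posterior of 0.51 and one of 0.99 count the same.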

Information Reward
• Rewards calibration of probabilities
• Zero reward for just reporting priors
• Unbounded below for a bad prediction
• Bounded above by a maximum that depends on priors

IR = 0
Repeat for each state i (p[i] = prior, p'[i] = predicted probability):
  If i == correct state: IR += log( p'[i] / p[i] )
  else: IR += log( (1 - p'[i]) / (1 - p[i]) )
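A reward with the four properties listed above can be sketched as follows. The form used here is an assumption: it compares the predicted probability with the prior through log ratios, uses base-2 logarithms, and averages over the states of the variable. The function names are hypothetical.

```python
import math


def _log_ratio(numerator, denominator):
    """log2(numerator / denominator), with -inf when the numerator is zero."""
    if numerator == 0.0:
        return float("-inf")
    return math.log2(numerator / denominator)


def information_reward(predicted, prior, correct_state):
    """Reward for one case, averaged over states (assumed form, see lead-in).

    predicted, prior: dicts mapping state -> probability, priors strictly in (0, 1)
    correct_state:    the state that was actually observed
    """
    total = 0.0
    for state, p_prior in prior.items():
        p_model = predicted[state]
        if state == correct_state:
            total += _log_ratio(p_model, p_prior)
        else:
            total += _log_ratio(1.0 - p_model, 1.0 - p_prior)
    return total / len(prior)


if __name__ == "__main__":
    prior = {"chd": 0.2, "no_chd": 0.8}
    print(information_reward({"chd": 0.9, "no_chd": 0.1}, prior, "chd"))
```

Reporting the priors makes every ratio equal to 1, so the reward is zero; a certain and correct prediction reaches a maximum set by the priors; and a prediction that assigns zero probability to the observed state is penalised without bound.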

Experimental Evaluation
• Experiment 1:
  » Compare Busselton, PROCAM and CaMML BNs
• Experiment 2:
  » Compare CaMML and other standard machine learners (from Weka)

Evaluation: Weka learners
• Naïve Bayes
• J48 (version of C4.5)
• CaMML: causal BN learner, using MML metric
• AODE
• TAN
• Logistic
Pr=1/3

Experiment 1: ROC Results
[Figure: ROC curves, area under curve (AUC); labelled points: priors, and the extremes "No-one at risk!" and "Everyone at risk!"]

Experiment 1: Bayesian Info Reward

Experiment 2: ROC Results

Experiment 2: Bayesian Info Reward

Summary of Results
Experiment I (Models of whole data)
• PROCAM model does at least as well as Busselton
  » On Busselton data
  » For both "relative" (ROC) and "absolute" (BIR) risk
• CaMML models do as well
  » But much simpler: only 4 nodes matter to CHD10!
Experiment II (Cross-validation of learners)
• Logistic regression does best on both metrics
  » Statistically powerful: only 1 parameter per arc
  » No search required: structure is given
  » No discretization necessary

Conclusions
• Busselton & PROCAM models appear to perform equally well on Busselton data, using an absolute risk measure (BIR) from the literature
• CaMML results suggest the data have high variance and are too weak to support inference to complex models. Combining data would help.

Future directions
• Improve data mining by
  » Adding prior knowledge to search
  » Assessing whether data sources can be combined; if so, do so
• Investigate combination of continuous and discrete variables in data mining and modeling
• Develop new TAKEHEART model using BNs (taking the best from experts, literature, data mining)
  » with intervention modeling (Causal Reckoner)
  » with decision support
  » with GUI, usable by clinicians

References
• G. Assmann, P. Cullen and H. Schulte. Simple scoring scheme for calculating the risk of acute coronary events based on the 10-year follow-up of the Prospective Cardiovascular Munster (PROCAM) study. Circulation, 105(3): ,
• M.W. Knuiman, H.T. Vu and H. C. Bartholomew. Multivariate risk estimation for coronary heart disease: the Busselton Health Study, Australian & New Zealand Journal of Public Health, 22: ,
• C.S. Wallace and K.B. Korb. Learning Linear Causal Models by MML Sampling, In A. Gammerman, editor, Causal Models and Intelligent Data Management, pages Springer-Verlag,
• C.R. Twardy, A.E. Nicholson, K.B. Korb and J. McNeil. Data Mining Cardiovascular Bayesian Networks. Technical report 2004/165. School of Computer Science and Software Engineering, Monash University,
• C.R. Twardy, A.E. Nicholson and K.B. Korb. Knowledge engineering cardiovascular Bayesian networks from the literature, Technical Report 2005/170, School of CSSE, Monash University, 2005.