Predicting Primary Myocardial Infarction from Electronic Health Records - Jitong Lou

Objective
Myocardial infarctions (MIs), commonly known as heart attacks, are common and deadly, causing one in six deaths overall in the U.S. and totaling roughly 400,000 per year. Longitudinal methods such as case-control studies, cross-sectional studies, cohort studies, and randomized controlled trials have limitations:
- They are expensive.
- They provide a risk assessment for just one particular variable.
- They require baseline variables to determine risk.
- They make measurements only at fixed points in time.
The goal is to conduct fewer studies, analyze the effects of many variables, and make reliable predictions without regular checkups and baseline variables.

Electronic Health Records (EHRs) Data
EHRs are an emerging relational domain with large potential to improve clinical outcomes. An EHR database consists of tables of patient information such as diagnoses, drugs, labs, and genetic information.

Electronic Health Records (EHRs) Data
For example, the Framingham Risk Score (FRS) is recalculated only every two years, whereas a model based on the EHR would be updated as each new clinical event occurs.
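As a rough illustration (not taken from the paper; the table names and fields below are hypothetical), EHR data can be pictured as a set of linked tables keyed by patient ID, each contributing a variable number of rows per patient:

```python
# Hypothetical sketch of a relational EHR layout: several tables linked by
# patient_id, with a variable number of rows per patient.
ehr = {
    "patients":  [{"patient_id": 1, "gender": "F", "smoker": False}],
    "diagnoses": [{"patient_id": 1, "code": "I10", "date": "2010-03-02"},
                  {"patient_id": 1, "code": "E78.5", "date": "2011-07-15"}],
    "drugs":     [{"patient_id": 1, "name": "statin", "start": "2011-08-01"}],
    "labs":      [{"patient_id": 1, "test": "LDL", "value": 162}],
}

# Unlike a flat feature vector, new rows simply accrue as clinical events
# occur, which is why an EHR-based model can update continuously.
```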

Relational Data
Relational data violate two assumptions of conventional classification techniques.

First, algorithms designed for propositional data assume the data are independent and identically distributed (i.i.d.). Relational data, on the other hand, have dependencies both as a result of direct relations (e.g., a patient using a certain treatment) and through chaining multiple relations together (e.g., all the records of the same patient).

Second, algorithms for propositional data assume that data instances are recorded in homogeneous structures (a fixed set of fields for each object), but relational data "instances" are usually more varied and complex; for example, some patients may take 2 kinds of medicine and others may take 10. A relational classification technique must contend with dependent, heterogeneous data instances for both learning and inference.

Relational Data
A relational classification model takes a collection of subgraphs as input: a single target object to be classified, together with the other objects and links in its neighborhood. The task is to construct a classifier that predicts the target class label (MI or not) given:
- The attributes of the target object (such as cholesterol levels, gender, smoking status, blood pressure, and history of alcoholism or valve replacement)
- The attributes of other objects in the neighborhood (such as the treatment effect and price of the cardiac drugs the patient is using)
- Degree attributes/summary statistics that count the target object's neighborhood objects
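A minimal sketch of the last idea, with hypothetical field names: a patient's variable-size neighborhood (here, drug records) is collapsed into a fixed set of degree and summary-statistic features:

```python
# Hypothetical fields: collapse a variable-size neighborhood of drug records
# into fixed-length degree/summary-statistic features.
def neighborhood_features(drug_rows):
    prices = [r["price"] for r in drug_rows]
    return {
        "num_drugs": len(drug_rows),                          # degree attribute
        "mean_price": sum(prices) / len(prices) if prices else 0.0,
        "any_cardiac": any(r["cardiac"] for r in drug_rows),  # summary statistic
    }

# One patient may take 2 drugs and another 10; both yield the same feature set.
print(neighborhood_features([{"price": 12.0, "cardiac": True},
                             {"price": 40.0, "cardiac": False}]))
```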

Relational Functional Gradient Boosting (RFGB)
Assume the training examples are of the form $(x_i, y_i)$ for $i = 1, \dots, N$, with $y_i \in \{0, 1\}$, where $y_i = 1$ indicates MI and $x_i$ represents the set of all observations about patient $i$. The goal is to fit a model $P(y \mid x_i) \propto e^{\Psi(y, x)}$. The standard approach to supervised learning is gradient descent directly on the parameters: the learning algorithm starts with initial parameters and computes the gradient of the likelihood function with respect to them. The key difference in RFGB is that the gradients are computed directly on the functions instead of the parameters.
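For the binary MI label, a common convention (an assumption here, not spelled out on the slide) is to fix the potential of the negative class at zero, so that $P(y = 1 \mid x) \propto e^{\Psi(1, x)}$ reduces to a sigmoid of $\Psi$:

```python
import math

def prob_mi(psi):
    # With Psi(y=0, x) fixed at 0, P(y=1 | x) = exp(Psi) / (1 + exp(Psi)).
    return math.exp(psi) / (1.0 + math.exp(psi))

print(prob_mi(0.0))  # 0.5 when the potential is neutral
print(prob_mi(2.0))  # ~0.88 when Psi favors y = 1
```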

Relational Functional Gradient Boosting (RFGB)
RFGB starts with an initial potential $\Psi_0$ and iteratively adds gradients $\Delta_i$. After $m$ iterations the potential is
$$\Psi_m = \Psi_0 + \Delta_1 + \dots + \Delta_m,$$
where $\Delta_m$ is the functional gradient at iteration $m$:
$$\Delta_m = \eta_m \times E_{x,y}\!\left[\frac{\partial \log P(y \mid x; \Psi_{m-1})}{\partial \Psi}\right]_{\Psi = \Psi_{m-1}}.$$
The expectation $E_{x,y}$ cannot be computed because the joint distribution $P(x, y)$ is unknown, so the functional gradient is instead computed for each training example $(x_i, y_i)$. This set of local gradients forms a set of training samples for the gradient at iteration $m$, and fitting a (relational) regression tree $h_m$ to the samples $[(x_i, y_i), \Delta_m(y_i; x_i)]$ approximates the true functional gradient. The functional gradient of the likelihood with respect to $\Psi(y_i = 1; x_i)$ for each example $(x_i, y_i)$ can be shown to be
$$\frac{\partial \log P(y_i; x_i)}{\partial \Psi(y_i = 1; x_i)} = I(y_i = 1; x_i) - P(y_i = 1; x_i).$$
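The paper fits relational regression trees at each step; the sketch below substitutes an ordinary propositional regression tree on a flat feature matrix (an assumption for illustration, using scikit-learn's `DecisionTreeRegressor`) to show the $\Psi_m = \Psi_{m-1} + \eta_m \Delta_m$ recursion and the $I - P$ pointwise gradients:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def rfgb_sketch(X, y, n_iters=10, lr=1.0, max_leaves=8):
    """Functional gradient boosting sketch. The paper fits *relational*
    regression trees to (example, gradient) pairs; here a propositional
    tree on a flat feature matrix X stands in for that step."""
    psi = np.zeros(len(y))                    # Psi_0: uniform initial potential
    trees = []
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-psi))        # P(y=1 | x; Psi_{m-1})
        grad = (y == 1).astype(float) - p     # pointwise gradients: I - P
        h = DecisionTreeRegressor(max_leaf_nodes=max_leaves).fit(X, grad)
        psi += lr * h.predict(X)              # Psi_m = Psi_{m-1} + eta_m * Delta_m
        trees.append(h)
    return trees

def predict_proba(trees, X, lr=1.0):
    psi = sum(lr * h.predict(X) for h in trees)
    return 1.0 / (1.0 + np.exp(-psi))

# Toy usage with random features (illustration only, not EHR data):
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
trees = rfgb_sketch(X, y)
print(predict_proba(trees, X[:3]))
```

Because each tree is fit to the residual-like gradients rather than to the labels themselves, the loop is the functional analog of parameter-space gradient ascent on the log-likelihood.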

Relational Functional Gradient Boosting (RFGB)
Notation for the RFGB algorithm:
- $F_m^k$: model for predicate $k$ at iteration $m$
- $S_k$: training samples for $\Delta(k)$
- $\Delta_m(k)$: components of the functional gradient at iteration $m$
- $L$: number of leaves in each functional gradient $\Delta_m(k)$

Model Comparisons
The paper compared the RFGB model to boosted decision trees (AdaBoostM1 (Ada); default parameters) and the RPT model to decision tree learners (J48; C = 0.25, M = 2). Other common models were also included: naive Bayes (NB; default parameters), tree-augmented naive Bayes (TAN; SimpleEstimator), support vector machines (SVMs; linear kernel, C = 1.0; radial basis function kernel, C = 250007, G = 0.01), and random forests (RF; 10 trees, default parameters).
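For orientation only: the paper's baselines came from other implementations (e.g., Weka's J48 and AdaBoostM1), but rough scikit-learn analogs with approximately matching settings might look like this:

```python
# Rough scikit-learn analogs of the paper's baselines; parameter mappings
# are approximate and the paper's exact implementations differed.
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

models = {
    "Boosted Tree": AdaBoostClassifier(),                        # ~ AdaBoostM1
    "Tree":         DecisionTreeClassifier(min_samples_leaf=2),  # ~ J48, M = 2
    "NB":           GaussianNB(),
    "SVM (linear)": SVC(kernel="linear", C=1.0),
    "SVM (RBF)":    SVC(kernel="rbf", C=250007, gamma=0.01),
    "RF":           RandomForestClassifier(n_estimators=10),
}

# X, y: a flat feature matrix and MI labels (hypothetical placeholders).
# for name, m in models.items():
#     print(name, cross_val_score(m, X, y, scoring="roc_auc", cv=10).mean())
```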

Results
The best cross-validated predictor of primary MI according to AUC-ROC was the RFGB model. The RPT model did not score as well, ranking in the middle of the propositional learners. Still, the RFGB and RPT models significantly outperformed their direct propositional analogs (the Boosted Tree and Tree models, respectively). The Bayesian models' (NB, TAN) scores may be somewhat inflated because only features already known to be CHD risk factors were chosen for this analysis.

Results
A false negative incurs the cost of untreated human morbidity and usually of more expensive, delayed treatment, so models with many false negatives (that is, low recall) cannot be accepted. In the high-recall region, RFGB gives the highest precision.
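The selection criterion the slide applies, precision within a high-recall region, can be read off a precision-recall curve; a small sketch with made-up scores:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def precision_at_recall(y_true, scores, min_recall=0.9):
    """Best precision achievable while keeping recall >= min_recall,
    i.e. within the high-recall region clinical use requires."""
    precision, recall, _ = precision_recall_curve(y_true, scores)
    mask = recall >= min_recall
    return precision[mask].max()

# Toy usage with made-up labels and scores (illustration only):
y_true = np.array([0, 0, 1, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9])
print(precision_at_recall(y_true, scores, min_recall=0.75))
```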

References
Weiss, J. C.; Natarajan, S.; Peissig, P. L.; McCarty, C. A.; and Page, D. 2012. Machine Learning for Personalized Medicine: Predicting Primary Myocardial Infarction from Electronic Health Records. AI Magazine 33(4): 33-45.
Natarajan, S.; Khot, T.; Kersting, K.; Gutmann, B.; and Shavlik, J. 2012. Gradient-Based Boosting for Statistical Relational Learning: The Relational Dependency Network Case. Machine Learning 86(1): 25-56.
Neville, J.; Jensen, D.; Friedland, L.; and Hay, M. 2003. Learning Relational Probability Trees. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery.

Thank you!