Constructing a Predictor to Identify Drug and Adverse Event Pairs

Shah Lab
Rick Huang, Bell Wang, Elsie Gyang MSc, Nigam Shah PhD
Stanford School of Medicine, Stanford, CA

Abstract

The FDA drug approval process aims to ensure that medications are safe for use. Even so, adverse (undesired) events can still result. Given the severity of drug adverse events, it is imperative to develop ways of identifying potential adverse events in order to raise safety concerns. While public databases have already been used to build predictive models that identify drug-adverse event (AE) pairs, we show that clinical notes are also a strong source for predicting drug-AE pairs.

From known usages in the Medi-Span Drug Indications Database, we construct a "gold standard" of known positive and negative drug-AE pairs. Using the National Medi-Span and DrugBank databases, we compute sixteen features, including the cosine and Jaccard similarity indices between related drugs, diseases, pathways, and categories. We compute these from a matrix of drugs with boolean indications of whether or not each drug is associated with a certain disease, pathway, or category. In addition, we extract nine features from clinical notes in the Stanford Translational Research Integrated Database Environment (STRIDE), which contains more than 2 million patients, for a total of twenty-five features.

We train a support vector machine (SVM) model with the radial basis function (RBF) kernel on the gold standard to predict positive or negative drug-AE pairs based on all features, only clinical note features, and only database features. Our predictor on all features achieved an accuracy of 96% in predicting positive and negative drug-AE pairs. Comparing the performance of clinical note features with that of database features, we found that the model improved significantly when the clinical note features were included. Overall, our hypothesis was supported: using clinical note features in addition to public database features builds a stronger model to predict drug-AE pairs.

Introduction

While 21% of drug prescriptions are off-label, only 27% of off-label drug use has evidence of being safe. Usage of off-label drugs can result in an AE. Roughly 30% of hospital stays include a patient suffering from an ADE; around 2 million patients suffer from an AE reaction, and up to a hundred thousand patients succumb to AEs. In addition, over 75 billion dollars are spent treating AEs. [1]

As a result, it becomes essential to create a model that accurately predicts positive or negative drug-AE pairs. To create such a model, we must extract features from reliable databases that allow us to accurately classify drug-AE pairs as positive or negative. Using private clinical notes, we hope to extract features that, in addition to public database features, significantly increase our predictor's performance.

Hypothesis

We hypothesize that it is possible to construct an accurate model to predict whether a certain drug-AE pair is positive or negative based on its features, with features extracted from clinical notes strengthening the prediction.

Methods & Materials

The Medi-Span Drug Indication Database included mappings of known drug-AE pairs, which formed our "gold standard." We construct features for our gold standard using empirical features such as mention count from the STRIDE [5] database. We also include 16 other features, such as similarity factors, from Medi-Span and DrugBank. [2] Using MATLAB, the gold standard features were all normalized using z-scores. We normalize unavailable features to the mean. The gold standard was then randomly split into a training set, a cross-validation set, and a testing set.

An SVM using the RBF kernel from "kernlab" in R was run on the training set to create a model, and the model was run on the cross-validation set to determine initial accuracy. Constants C and sigma for the RBF kernel were varied to maximize this initial accuracy on the cross-validation set.

Figure 1. C (y-axis) and sigma (x-axis) are mapped against the fraction error (z-axis) for the SVM model on the cross-validation set. We optimize based on overall accuracy, so we minimize the error.

The final accuracy was obtained by running the optimal model on the testing set. A two pairs test was performed to measure differences between the cross-validation accuracy and the testing accuracy. Another two pairs test was performed to measure differences between the accuracy of models with and without the clinical features.

Results

Figure 2. Histogram of accuracy of n=30 simulations on each split set. Mean accuracy of model is 96.64%. p>0.01 but p<0.05 for difference between cross-validation and test sets.

Average accuracy by feature set:

Dataset used          Clinical and database features   Clinical features only   Database features only
Training dataset      99.4515%                         99.6157%                 93.3449%
Cross-val. dataset    97.2444%                         98.9058%                 88.6475%
Testing dataset       97.1113%                         98.6901%                 88.6182%

Figure 3. Averages over 30 trials of training data for different sets of features. Testing data is the measure for overall accuracy of the data.

Figure 4. Histograms of accuracy of model trained on only clinical note-based features and only public database-based features. Difference significance p << 0.01.

Figure 5. Histograms of accuracy of model on cross-validation set and testing set for only clinical note-based features. Difference significance p < 0.01.

Conclusions and Future Work

Our hypothesis is supported in that we created an accurate model based on clinical note and public database features to identify positive drug-disease pairs. Features from clinical notes strengthen the prediction more than public database features. However, a model trained on clinical note features alone overfit the cross-validation data more than a model trained on clinical note and public database features combined. Next, we plan to analyze general trends between drugs and predicted AEs to determine potentially threatening AEs, and to create a function dependent on correlation strength that directs more in-depth research toward specific relations.

Selected References

[1] Ahmad SR. Adverse drug event monitoring at the Food and Drug Administration: your report can make a difference. J Gen Intern Med. 2003;18(1):57–60.
[2] Jung K, LePendu P, Chen WS, Iyer SV, Readhead B, et al. (2014) Automated Detection of Off-Label Drug Use. PLoS ONE 9(2): e89324. doi:10.1371/journal.pone.0089324

Acknowledgements

The authors would like to thank the Stanford Institutes of Medical Research Summer Research Program and the members of the Shah lab for continued support and aid in this research. The authors would also like to thank the Stanford Medical Hospital for providing information on patients from clinical notes.
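
As an illustration of the similarity features mentioned in the abstract, the following is a minimal sketch of the Jaccard and cosine indices computed between two rows of a boolean drug-association matrix. The drug names and the toy matrix are hypothetical, and the poster does not say exactly how the authors implemented these measures, so this should be read as one plausible formulation rather than their code.

```python
import math

def jaccard(a, b):
    """Jaccard index of two equal-length boolean vectors:
    |A and B| / |A or B|. Returns 0.0 when both vectors are empty."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def cosine(a, b):
    """Cosine similarity of two equal-length boolean vectors.
    For 0/1 data the dot product is the count of shared associations."""
    dot = sum(1 for x, y in zip(a, b) if x and y)
    norm_a = sum(1 for x in a if x)
    norm_b = sum(1 for y in b if y)
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / math.sqrt(norm_a * norm_b)

# Hypothetical toy matrix: each row marks which of four diseases
# (or pathways, or categories) a drug is associated with.
drug_matrix = {
    "drug_a": [1, 1, 0, 0],
    "drug_b": [1, 0, 1, 0],
    "drug_c": [0, 0, 0, 1],
}

sim_ab_jaccard = jaccard(drug_matrix["drug_a"], drug_matrix["drug_b"])
sim_ab_cosine = cosine(drug_matrix["drug_a"], drug_matrix["drug_b"])
sim_ac_jaccard = jaccard(drug_matrix["drug_a"], drug_matrix["drug_c"])
```

Here drug_a and drug_b share one of three distinct associations, so their Jaccard index is 1/3 and their cosine similarity is 0.5, while drug_a and drug_c share none and score 0 on both measures.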
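
The normalization step in the methods (z-scores, with unavailable features normalized to the mean) can be sketched as follows. This is an assumption-laden illustration: the authors used MATLAB, whereas this sketch is in Python; the function name `zscore_column` is hypothetical; missing values are represented as `None`; and the sample standard deviation is used, which is an assumption about the authors' settings. Setting a missing value to the column mean is equivalent to giving it a z-score of 0.

```python
import statistics

def zscore_column(values):
    """Z-score one feature column. Missing entries (None) are set to the
    column mean, which corresponds to a z-score of 0.0."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        # Not enough data to estimate a spread; treat everything as the mean.
        return [0.0 for _ in values]
    mean = statistics.fmean(present)
    sd = statistics.stdev(present)  # sample standard deviation (assumption)
    if sd == 0:
        return [0.0 for _ in values]
    return [0.0 if v is None else (v - mean) / sd for v in values]

# Hypothetical feature column with one unavailable value.
mention_counts = [2.0, 4.0, None, 6.0]
normalized = zscore_column(mention_counts)
```

For the toy column above, the mean of the available values is 4 and the sample standard deviation is 2, so the normalized values are -1, 0, 0, and 1, with the missing entry sitting at the mean.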