Case Study for Clinical Relevancy: Asthma Scott T. Weiss, M.D., M.S. BRIGHAM AND WOMEN’S HOSPITAL HARVARD MEDICAL SCHOOL Professor of Medicine Harvard.

1 Case Study for Clinical Relevancy: Asthma. Scott T. Weiss, M.D., M.S., BRIGHAM AND WOMEN’S HOSPITAL, HARVARD MEDICAL SCHOOL. Professor of Medicine, Harvard Medical School; Director, Center for Genomic Medicine; Director, Program in Bioinformatics; Associate Director, Channing Laboratory, Brigham and Women’s Hospital, Boston, MA

2 Outline: context (focus on process and data); overview of the asthma DBP; smoking as an example of the data issues; predicting COPD in those with asthma; predicting asthma exacerbations; genetic prediction of asthma exacerbations; current status of DNA collection; lessons learned; conclusions.

3 Context: the Channing Lab has extensive genetics and pharmacogenetics resources focused on airways diseases; faculty with clinical, epidemiology, genetics, and bioinformatics training and experience; a track record of multidisciplinary, collaborative research; a good i2b2 driver, from bench to clinic; strong focus and direction for the Cores.

4 Broad Goals of Channing Program in Predictive Medicine: move from genetic variation to clinical practice for disease risk (asthma diagnosis), natural history (exacerbations), and individual response to medication (pharmacogenetics). Develop predictive tests (genetic and nongenetic) in Channing populations. Validate these tests in the Partners asthma cohort (PAC), at least as a proof of concept.

5 i2b2 Airways DBP: Overview. Data sources: the RPDR and Partners Clinical Services. Steps: extract data from airways disease patients; extract relevant quantitative and coded phenotypes; extract important phenotypes from text (NLP); predict clinical outcomes after adjustment for covariates; develop statistical models. Through the RPDR: recruit, validate, genotype.

6 Before we start: there are numerous important covariates (e.g., age, tobacco, comorbidities, medications), and outcomes must be adjusted for them. Some (e.g., age, gender, diagnosis, encounter) are readily available and are obtained through Core 4. Others (e.g., medications, tobacco use, comorbid conditions) require substantial effort, hence the collaboration with NLP experts in Core 1.

7 Phenotypes from text: extract specific data items (medications, smoking status, diagnoses/comorbidity); extract findings to assist with case selection; extract findings to assist with clinical predictions.

8 Smoking Status: Examples (excerpts from clinical notes)
“HOSPITAL COURSE:... It was recommended that she receive …We also added Lactinax, oral form of Lactobacillus acidophilus to attempt a repopulation of her gut. SH: widow,lives alone,2 children,no tob/alcohol.”
“BRIEF RESUME OF HOSPITAL COURSE: 63 yo woman with COPD, 50 pack-yr tobacco (quit 3 wks ago), spinal stenosis,...”
“SOCIAL HISTORY: Negative for tobacco, alcohol, and IV drug abuse.”
“SOCIAL HISTORY: The patient is a nonsmoker. No alcohol.”
“SOCIAL HISTORY: The patient is married with four grown daughters, uses tobacco, has wine with dinner.”
“SOCIAL HISTORY: The patient lives in rehab, married. Unclear smoking history from the admission note…”
Classes marked on the slide: Smoker, Non-Smoker, Past Smoker, and ??? (hard to pick).

9 Smoking: Text Processing. Manually classified cases: 2,796; number of classes: 5; number of attributes: 50. Cases per class: current smoker 1,010; past smoker 952; never smoked 427; control cases 261; denies smoking 146.

10 Smoking Status (preliminary results): raw sample of ~20,000 reports; >3,000 features extracted; 25 to 1,000 features selected; “gold standard” sample of ~2,800 cases; correct classification rate of 46% to 81% compared with the gold standard.

11 Smoking Status (preliminary results): percentage of cases correctly classified, by data set, classification method, test design, and number of features.
One-gram, Naïve Bayes, 2/3 split, 50 features: 78.02%
Bi-gram, Naïve Bayes, 2/3 split, 25 features: 70.73%
Bi-gram, SVM, 2/3 split, 25 features: 49.57%
Stemmed one-gram, Naïve Bayes, 10x CV, 231 features: 80.46%
Stemmed one-gram, Naïve Bayes, 10x CV, 917 features: 80.92%
Other results: one-gram, 79.70%; tri-gram, 65.05%; tri-gram, 44.63%.
These are baseline performance figures; increasing and combining features should improve performance.
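
As a rough illustration of the best-performing method family in slide 11, the sketch below implements multinomial Naïve Bayes with add-one smoothing over lower-cased one-gram tokens, using only the Python standard library. The training snippets, the class names ("current", "past", "never"), and the `NaiveBayes` class are hypothetical; this is not the pipeline behind the reported numbers, and it omits stemming, feature selection, and cross-validation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-cased one-gram tokens; any non-alphanumeric character acts as a separator."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


class NaiveBayes:
    """Multinomial Naive Bayes over one-gram counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_posterior(self, doc, label):
        """Unnormalized log P(label | doc); tokens never seen in training are ignored."""
        n_docs = sum(self.class_counts.values())
        denom = self.total_words[label] + len(self.vocab)
        score = math.log(self.class_counts[label] / n_docs)
        for tok in tokenize(doc):
            if tok in self.vocab:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, doc):
        """Return the class with the highest (unnormalized) log posterior."""
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Hypothetical toy training snippets (not from the RPDR gold standard).
TRAIN_DOCS = [
    "patient uses tobacco daily",
    "current smoker one pack per day",
    "quit smoking 3 weeks ago 50 pack year history",
    "former smoker quit years ago",
    "nonsmoker no alcohol",
    "negative for tobacco alcohol",
]
TRAIN_LABELS = ["current", "current", "past", "past", "never", "never"]
```

For example, `NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS).predict("she still uses tobacco")` returns `"current"`. Working in log space avoids underflow when a note contains many tokens.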

12 Data Mining Pipeline: “raw” patient data → text processing (word/pattern filters, stemming, lexicon matching, parsing, …) → data extraction into “smart data” (medications, smoking status, co-morbidity) → feature analysis (classification, clustering, statistical analysis, …).
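
The text-processing stage can be illustrated with a toy rule-based extractor: a pattern filter that keeps the social-history section, followed by lexicon matching against smoking phrases. The lexicon, the precedence order, and the function names are assumptions made for this sketch, not the project's actual rules; stemming and parsing are omitted.

```python
import re

# Hypothetical smoking lexicon; the slides do not publish the real one.
# Classes are checked in this order, so "quit" (past) wins over a bare tobacco mention.
LEXICON = [
    ("past", [r"\bquit\b", r"\bformer smoker\b", r"\bex-smoker\b"]),
    ("never", [r"\bnonsmoker\b", r"\bnever smoked\b", r"\bnegative for tobacco\b", r"\bno tob\b"]),
    ("current", [r"\buses tobacco\b", r"\bcurrent smoker\b", r"\bsmokes\b"]),
]


def social_history(note):
    """Pattern filter: text after 'SOCIAL HISTORY:' or 'SH:' on the same line, else the whole note."""
    match = re.search(r"\b(?:SOCIAL HISTORY|SH):([^\n]*)", note)
    return match.group(1) if match else note


def smoking_status(note):
    """Lexicon matching on the filtered, lower-cased text; 'unknown' when nothing matches."""
    text = social_history(note).lower()
    for label, patterns in LEXICON:
        if any(re.search(p, text) for p in patterns):
            return label
    return "unknown"
```

Applied to the slide 8 excerpts, this toy labels the "uses tobacco" note as current, the "quit 3 wks ago" note as past, the "nonsmoker" and "no tob/alcohol" notes as never, and returns "unknown" for the note with an unclear smoking history, the hard-to-pick case that hand-written rules handle poorly.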

13 Asthma Preceding COPD: there is significant overlap between asthma and COPD diagnoses, and the common denominator is smoking. Asthma is known to precede and predict the development of COPD independent of smoking. Could we develop a multivariate clinical predictor of which asthmatics will develop COPD?

14 Study Design. Source: the Partners Healthcare Research Patient Data Repository (RPDR), a clinical repository for researchers covering MGH, BWH, and others. Training set: 9,349 asthmatics (843 COPD, 8,506 controls) with first encounter from 1988 to 1998. Test set: a future set of 992 asthmatics (46 COPD, 946 controls) with first encounter from 1999 to 2002.

15 Data Collection. Criteria: patients observed for at least 5 years, at least 18 years old at the first encounter, and with race, sex, height, weight, and smoking status available. Comorbidities (104): International Classification of Diseases, 9th Revision (ICD-9) codes recorded as an admission diagnosis or an ER primary diagnosis. COPD: ICD-9 code for “Chronic Bronchitis,” “Emphysema,” or “Chronic Airways Obstruction, not otherwise specified.”
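
The inclusion criteria and the temporal training/test split from slides 14 and 15 can be expressed as a simple filter. The `Patient` record layout, its field names, and the whole-year age arithmetic are assumptions for illustration only; they are not the RPDR schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:  # hypothetical record layout, not the RPDR schema
    birth_date: date
    first_encounter: date
    last_encounter: date
    race: Optional[str] = None
    sex: Optional[str] = None
    height_cm: Optional[float] = None
    weight_kg: Optional[float] = None
    smoking: Optional[str] = None


def whole_years(start, end):
    """Completed calendar years between two dates."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def eligible(p):
    """>= 5 years observed, >= 18 at first encounter, all required covariates present."""
    required = (p.race, p.sex, p.height_cm, p.weight_kg, p.smoking)
    return (
        whole_years(p.first_encounter, p.last_encounter) >= 5
        and whole_years(p.birth_date, p.first_encounter) >= 18
        and all(v is not None for v in required)
    )


def assign_set(p):
    """'train' for first encounter 1988-1998, 'test' for 1999-2002, else None."""
    if not eligible(p):
        return None
    year = p.first_encounter.year
    if 1988 <= year <= 1998:
        return "train"
    if 1999 <= year <= 2002:
        return "test"
    return None
```

Splitting by calendar period rather than at random makes the test set a genuinely "future" cohort, which is a stricter check of a predictor than a random hold-out.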

16 Analysis. Model: a Bayesian network was generated from the training set of 9,349 asthmatics (843 COPD, 8,506 controls) first encountered between 1988 and 1998, using 104 comorbidities together with race, gender, age, and smoking. Results: the risk of COPD is modulated by gender, race, smoking history, and 14 comorbidities: viral and chlamydial infections, diabetes mellitus, volume depletion, acute myocardial infarction, intermediate coronary syndrome, cardiac dysrhythmias, heart failure, acute upper respiratory infections, acute bronchitis and bronchiolitis, pneumonia, early or threatened labor, normal delivery, shortness of breath, and respiratory distress.

17 Network Model

18 Validation. Propagation: a Bayesian network can compute the probability distribution of any variable given an instance of some or all of the other variables. Test data: a future set of 992 asthmatics (46 COPD, 946 controls) with first encounter from 1999 to 2002. Prediction: for each patient, predict the probability of COPD given the other elements in the network (comorbidities and demographics). Validation: compare the predicted with the observed COPD status.
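
Propagation in this sense is ordinary probabilistic inference. The sketch below computes P(COPD | evidence) by exact enumeration in a toy four-node network; the structure (smoking and sex as parents of COPD, COPD as a parent of heart failure) and every probability are invented for illustration and are not the network learned from the RPDR data. Enumeration is exponential in the number of unobserved variables, so a network with over 100 nodes would need a dedicated inference algorithm rather than this approach.

```python
from itertools import product

# Hypothetical toy network: structure and probabilities are invented for illustration.
# Each node maps to (parents, CPT), where the CPT gives P(node is True | parent values).
NETWORK = {
    "smoker": ((), {(): 0.3}),
    "male": ((), {(): 0.4}),
    "copd": (("smoker", "male"), {
        (True, True): 0.25, (True, False): 0.20,
        (False, True): 0.06, (False, False): 0.04,
    }),
    "heart_failure": (("copd",), {(True,): 0.15, (False,): 0.03}),
}


def joint(assignment):
    """Probability of one full True/False assignment: product of each node's CPT entry."""
    p = 1.0
    for var, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def posterior(query, evidence):
    """P(query is True | evidence), summing the joint over every unobserved variable."""
    hidden = [v for v in NETWORK if v != query and v not in evidence]
    totals = {True: 0.0, False: 0.0}
    for q_val in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, **dict(zip(hidden, values)), query: q_val}
            totals[q_val] += joint(assignment)
    return totals[True] / (totals[True] + totals[False])
```

In this toy, a female smoker has P(COPD) = 0.20; adding the observation of heart failure raises it to about 0.56, because evidence propagates "upward" from a child node to its parent.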

19 Predictive Validation

20 One variable at a time

21 Asthma Exacerbations: asthma attacks involve a worsening of asthma symptoms, including bronchoconstriction and an inflammatory response. They are a major cause of morbidity and mortality in asthma. Every year, 11.7 million Americans, including 3.9 million children, have an exacerbation. In US children, exacerbations are the third leading cause of hospitalization (198,000 occurrences per year). Cost of asthma exacerbations: $4 billion in the US; $20 million at Partners.


25 RPDR Exacerbation Prediction

26 Genetic Prediction of Asthma Exacerbation. Objective: predict asthma exacerbation from genetic data. Subjects: 290 CAMP participants who were not on steroids, were followed for 10+ years, and have genetic data available. Phenotype: cases reported overnight hospitalization(s) (n = 83); controls had no overnight hospitalizations or ER visits (n = 207). Genotype: 2,443 SNPs from 349 candidate genes, in Hardy-Weinberg equilibrium among controls, with minor allele frequency > 0.05.
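
The two SNP quality filters on slide 26 can be sketched from genotype counts. This uses the standard 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium, whose upper-tail p-value equals erfc(sqrt(x/2)). The MAF cutoff of 0.05 comes from the slide; the HWE significance threshold of 0.001 and computing MAF in controls are assumptions, since the slides give neither.

```python
import math


def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of the less common allele from genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    return min(p, 1 - p)


def hwe_chisq_pvalue(n_AA, n_Aa, n_aa):
    """1-df chi-square HWE test; undefined for monomorphic SNPs (expected count 0)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chisq / 2))  # upper tail of chi-square with 1 df


def keep_snp(control_counts, maf_min=0.05, hwe_alpha=0.001):
    """Keep a SNP if MAF > maf_min and controls are consistent with HWE (hwe_alpha is assumed)."""
    if minor_allele_frequency(*control_counts) <= maf_min:
        return False  # also screens out monomorphic SNPs before the HWE test
    return hwe_chisq_pvalue(*control_counts) >= hwe_alpha
```

For example, control counts of (25, 50, 25) match HWE exactly and pass, (97, 3, 0) fail the MAF filter, and (50, 0, 50), with no heterozygotes at all, fail the HWE filter.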

27 Exacerbation Model: 132 of the 2,443 SNPs, in 55 of the 349 genes, predict exacerbation.

28 Validation. Method: prediction on the fitted values (the same data used to fit the model). Result: the area under the ROC curve (AUROC) is 0.97. AUROC measures accuracy as the trade-off between sensitivity and specificity. AUROC ratings: 0.5 to 0.6, fail; 0.6 to 0.7, poor; 0.7 to 0.8, fair; 0.8 to 0.9, good; 0.9 to 1.0, excellent.
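
AUROC has a convenient equivalent form: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it that way and maps the result onto the rating scale from slide 28; treating each band as lower-inclusive and everything below 0.6 as "Fail" are assumptions where the slide's bands overlap or stop.

```python
def auroc(scores, labels):
    """P(random case scores above random control); ties count 1/2. labels: 1 = case, 0 = control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Rating bands from slide 28, checked from the top down (lower bound inclusive, assumed).
RATINGS = [(0.9, "Excellent"), (0.8, "Good"), (0.7, "Fair"), (0.6, "Poor")]


def rate(auc):
    """Map an AUROC value to the slide's qualitative rating."""
    for cutoff, label in RATINGS:
        if auc >= cutoff:
            return label
    return "Fail"
```

The pairwise loop is quadratic, which is fine for 83 cases and 207 controls; a rank-based version scales better for large cohorts.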

29 Cross-Validation. Method: 20-fold cross-validation to test robustness: (1) the data are split into 20 groups; (2) one group is held out as the independent set and the remaining 19 are used to fit the model; (3) step 2 is repeated until each group has served as the independent set. Result: AUROC is 0.84 (good).
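
The three cross-validation steps can be sketched generically. The `fit` and `predict` callables are placeholders for whatever model is being validated (hypothetical here); the point is that every subject's score comes from a model that never saw that subject, so the pooled out-of-fold scores can then be summarized with a single AUROC.

```python
import random


def k_fold_indices(n, k=20, seed=0):
    """Step 1: shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_scores(X, y, fit, predict, k=20, seed=0):
    """Steps 2-3: score each fold with a model fit on the other k-1 folds."""
    scores = [None] * len(y)
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            scores[i] = predict(model, X[i])
    return scores
```

With the 290 CAMP subjects and k = 20, this gives 10 folds of 15 and 10 folds of 14. The gap between the fitted-value AUROC (0.97) and the cross-validated AUROC (0.84) is the kind of optimism this procedure is designed to expose.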

30 Partners Asthma DNA Collection #1: recruit Partners asthma patients (Partners Asthma Center, NWH, MGH), with high-quality spirometric phenotyping and blood for DNA extraction and storage, in children and adults. High cost (>$1,000/subject) and low intensity: in 6 months, only 100 subjects were recruited. Doctors and patients need education.

31 Partners Asthma DNA Collection #2: recruit Partners asthma cohort patients by leveraging CRIMSON blood samples and the data mart for phenotype data, with blood for DNA extraction and storage, in children and adults (cases and controls). Low cost (<$30/subject) and high intensity: in 9 months, >3,000 subjects were recruited.

32 Figure 1: Data Flow for Asthma DBP. Channing sends the ADMPN# to the RPDR; the RPDR converts the ADMPN# to an MRN and sends it to Pathology (Crimson); Crimson matches the MRN to a sample, assigns a Crimson ID# equivalent to the ADMPN, and sends the sample back to Channing for DNA extraction.
Figure 1 Legend: a de-identified data file is analyzed by Channing, and subjects for DNA collection are selected. The file is sent to the RPDR, converted back to MR#, and sent to Crimson. Samples are identified, given a Crimson ID# ≡ ADMPN, and sent back to Channing.
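
The key property of this data flow is that Channing only ever handles the de-identified ADMPN, while the link to the MRN stays inside the RPDR and Crimson. The sketch below models that with plain dictionaries; all identifiers, table names, and function names are hypothetical, since the slides do not describe the real systems' formats or interfaces.

```python
# Hypothetical lookup tables; real identifier formats are not described in the slides.
RPDR_LINK = {"ADMPN-001": "MRN-9001", "ADMPN-002": "MRN-9002"}   # held only by the RPDR
CRIMSON_SAMPLES = {"MRN-9001": "blood-A", "MRN-9002": "blood-B"}  # held only by Crimson


def rpdr_to_mrn(admpns):
    """RPDR step: convert the de-identified ADMPN list back to MRNs for Crimson."""
    return [(admpn, RPDR_LINK[admpn]) for admpn in admpns]


def crimson_release(pairs):
    """Crimson step: locate samples by MRN, label each with Crimson ID == ADMPN, drop the MRN."""
    return {admpn: CRIMSON_SAMPLES[mrn] for admpn, mrn in pairs if mrn in CRIMSON_SAMPLES}


def channing_request(selected_admpns):
    """Channing sends ADMPNs and receives samples keyed by ADMPN; MRNs never reach Channing."""
    return crimson_release(rpdr_to_mrn(selected_admpns))
```

Here `channing_request(["ADMPN-002"])` returns `{"ADMPN-002": "blood-B"}`: the sample comes back keyed by the same de-identified number Channing selected it under.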

33 Recruitment for DBP from Crimson at BWH: Asthma Cases by Utilization and Race

34 Recruitment for DBP from Crimson at BWH: Asthma Cases and Controls by Race

35 Summary of Samples to 04/07/08 (asthma cases by utilization, and controls, by race): Caucasian high utilization, 59; Caucasian low utilization, 454; Caucasian controls, 1,341; African American high utilization, 111; African American low utilization, 222; African American controls, 880.

36 Lessons learned 1: Get what you ask for. Regular meetings, regular meetings. Negotiate your demands. Tools are not enough. Leverage your peers. Recruiting patients is hard work. IRB is hard work.

37 Lessons learned 2: You can never have enough statistics or bioinformatics. Genotyping and its technologies are secondary. The RPDR data are dirty! Listen to Shawn. Be flexible.

38 Summary: Airways disease as a driver for i2b2. A “typical” complex disease challenge, with a big impact on the health care system and potential for large clinical impact. Core 1: extracting phenotypes from free text; statistical models. Core 2: viewer for CRC. Core 4: data provisioning.

39 Conclusions: The stronger the existing program, the more successful the i2b2 collaboration. Communication is key. Fit the question to the data, not the other way around. Data access will be an issue for the future.

40 Collaborators (and what they did): Scott, Zak, John, and Susanne: money, project management, IRB, and big picture. Ross: Channing bioinformatics, file structures, geek-to-geek translation with the cores, beta testing, 850 collection, IRB, links to other genetic bioinformatics tools and projects. Shawn and Vivian: asthma and control data mart. Anne, LJ, James: nongenetic predictors in CAMP. Marco and Blanca: nongenetic predictors in PAC; genetic predictors in CAMP; genetic predictors in PAC. Lynn: Crimson.

41 Acknowledgments: Ross Lazarus, Susanne Churchill, Blanca E. Himes, Anne Fuhlbrigge, Marco F. Ramoni, LJ Wei, Isaac Kohane, James Sigornivitch, Shawn Murphy, Lynn Bry

