Selecting the Right Predictors


Selecting the Right Predictors Organized by Farrokh Alemi, Ph.D. Narrated by Yara Alemi. In predicting an outcome from data in electronic health records, decisions must be made about which predictors to include in the model. This section helps you think through that selection.

Types of Predictors If you have the right variables, you can predict anything. An electronic health record contains hundreds of thousands of variables, so some of them are the right variables. In this section, we describe how to select among them.

There are many different types of predictors in electronic health records. One can use diagnoses, treatments, or medications to predict outcomes for a patient. One could use patient characteristics, such as address or gender, to improve the predictions. Vital signs can also be used. In all, there are hundreds of thousands of potential predictors in an electronic health record.

In the past, we have relied primarily on diagnoses, in other words the patient's history of illness, to predict various outcomes.

Diagnoses In predicting mortality of patients within 6 months, medical history, i.e. the patient's diagnostic codes, was more predictive than laboratory values or physiological markers.

Diagnoses More Predictive than Heart Ejection Fraction For example, in predicting 6-month mortality from heart failure, patients' diagnoses were more predictive than heart ejection fraction.

Diagnoses More Predictive than Chronological Age Diagnoses are also more predictive than chronological age in predicting mortality within 6 months. What matters is not how old you are but what illnesses you have.

Diagnoses More Predictive than Laboratory Values Diagnoses are also more predictive than laboratory values. Many lab values can be easily controlled through medications; a hypertensive patient may show normal blood pressure if the condition is controlled with medication.

Selective Versus Comprehensive There are thousands of diagnoses, so a careful choice needs to be made. Historically, scientists have relied on clinicians to select a set of variables known to affect the outcome of interest. The approach we prefer is to use all diagnoses. This leads to the difficult situation of having thousands of predictors in the model, but it has the advantage that no relevant piece of information is missing.
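The comprehensive approach can be sketched in a few lines. The snippet below is an illustration, not part of the original presentation: it turns a hypothetical list of (patient, diagnosis-code) pairs into a wide 0/1 indicator matrix with one column per diagnosis code, so that every diagnosis in the record becomes a candidate predictor.

```python
from collections import defaultdict

def diagnosis_matrix(records):
    """Build a wide 0/1 indicator matrix with one column per diagnosis
    code, from (patient_id, dx_code) pairs as might be pulled from an
    EHR diagnosis table. Illustrative sketch; names are hypothetical."""
    codes = sorted({dx for _, dx in records})          # one column per code
    by_patient = defaultdict(set)
    for pid, dx in records:
        by_patient[pid].add(dx)
    matrix = {pid: [1 if c in dxs else 0 for c in codes]
              for pid, dxs in sorted(by_patient.items())}
    return codes, matrix

# Hypothetical example: three patients, three ICD-10 codes.
codes, matrix = diagnosis_matrix([
    (1, "E11.9"), (1, "I10"),
    (2, "I10"),
    (3, "N18.3"),
])
```

Each patient's row has as many columns as there are distinct diagnosis codes in the data, which is exactly the "thousands of predictors" situation the narration describes.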

Including All Is More Accurate In predicting mortality, models that rely on all diagnoses, without grouping them into categories, have proven more accurate than models that select a subset of diagnoses or that collapse diagnoses into homogeneous categories.

Rare Predictors In statistical modeling, for example in regression equations, a common practice is to discard predictors that occur rarely. The logic is that these predictors occur too infrequently to make a difference for the average patient. In electronic health records, this is not advised.

In electronic health records, we have thousands of rare predictors. Ignoring one has a negligible effect, but ignoring thousands of rare predictors will have a large impact on the accuracy of predictions for the average patient. Furthermore, ignoring these predictors will reduce accuracy in the subset of patients who experience these rare diseases. Therefore, we do not recommend excluding rare predictors from the models. This yields a statistical model with thousands of variables, most of which occur only in rare situations. The model will be accurate but difficult to manage because of the number of variables. If the choice is between accuracy and ease, we would rather take the accurate route.
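The cumulative effect of dropping rare predictors can be made concrete with a small count. The sketch below (hypothetical function and data, not from the presentation) computes the share of patients who carry at least one diagnosis occurring below a frequency threshold, i.e. the patients who would lose information if rare predictors were discarded one by one.

```python
from collections import Counter

def rare_dx_coverage(patient_dxs, threshold):
    """Share of patients carrying at least one diagnosis that occurs in
    fewer than `threshold` patients. Illustrative sketch only."""
    # Count, for each code, how many patients carry it.
    counts = Counter(dx for dxs in patient_dxs for dx in set(dxs))
    rare = {dx for dx, n in counts.items() if n < threshold}
    # A patient is affected if any of their diagnoses is rare.
    affected = sum(1 for dxs in patient_dxs if rare & set(dxs))
    return affected / len(patient_dxs)

# Four hypothetical patients: one common code "A", three singletons.
patients = [["A", "R1"], ["A"], ["A", "R2"], ["A", "R3"]]
share = rare_dx_coverage(patients, threshold=2)
```

Here each rare code appears in only one patient, yet three of the four patients carry one, which mirrors the narration's point: each rare predictor is negligible alone but consequential in aggregate.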

Obvious Predictors A related issue is whether we should keep obvious predictors, cases in which the prediction task is trivial: for example, predicting from coma that the patient will die. For another example, a patient with diabetic neuropathy is clearly diabetic; there is no need to predict whether the patient has undiagnosed diabetes or will develop diabetes in the future. After all, the word diabetic is in the name of the disease the patient reports having.

Obvious predictors should be kept in the model for two reasons. First, errors in these cases will lead clinicians to ridicule the model and abandon its use. It is important not to miss the prediction in obvious cases; by deleting these clues, you make it harder for the model to remain accurate in obvious situations.

Second, crucial information may be missing from the electronic health record, and obvious predictors can adjust for these situations. In our example, a patient may be hospitalized with diabetic neuropathy even though no diabetes was recorded for that patient. Diabetes is usually diagnosed in an outpatient setting, and the doctor who sees the patient may not use the same electronic health record as the hospital. As a consequence, this piece of information is not available in our records. Keeping obvious predictors helps the model cope with missing information.

After-the-Fact Tautology Statisticians are concerned with using a variable that occurs after an outcome to predict that outcome. On the surface, such predictions look tautological.

For example, if we want to predict whether a patient will develop diabetes, then all complications or consequences of diabetes are tautological predictors. They should not be part of the analysis.

Undiagnosed Diabetes In contrast, the situation differs if we are trying to detect whether a patient has already developed an illness. In these situations, we detect diabetes by its consequences or even its complications. For example, undiagnosed diabetes, i.e. diabetes not previously reported in the electronic health record, can be detected by checking whether the patient has complications of diabetes such as renal illness.

Prediction Looks Forward, Detection Looks Backward Detection and prediction utilize different sets of predictors. In predictive models, we look forward to establish the risk of future events; only predictors that occur before the outcome can be used, and predictors that occur after the outcome should be removed. In detection, we look backward to see whether a diagnosis was missed; in these models, diagnoses recorded both before and after the outcome of interest can be used.
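The forward-looking rule, that a predictive model may use only diagnoses recorded before the outcome, amounts to a date filter. The sketch below is illustrative; the (code, date) event layout and the function name are assumptions, not from the presentation.

```python
from datetime import date

def prediction_features(dx_events, outcome_date):
    """For a forward-looking predictive model, keep only diagnoses
    recorded strictly before the outcome date. For backward-looking
    detection, all of dx_events could be used instead."""
    return sorted(dx for dx, recorded in dx_events if recorded < outcome_date)

# Hypothetical patient timeline: outcome occurs on 2020-03-01.
events = [("I10",   date(2020, 1, 5)),    # before the outcome: keep
          ("N18.3", date(2020, 6, 1)),    # after the outcome: remove
          ("E11.9", date(2019, 11, 2))]   # before the outcome: keep
kept = prediction_features(events, outcome_date=date(2020, 3, 1))
```

The diagnosis recorded after the outcome is dropped for prediction, exactly the "remove later predictors" rule; a detection model would simply skip the date filter.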

Association or Causal When evaluating predictive models, the practice is to divide the data into two sets: training and validation. The parameters of the predictive model are estimated in the training set, and the model is tested in the validation set. In the training set, all diagnoses are included as predictors of the outcome. This means that diagnoses that occur either before or after the outcome are included in estimating the association between the predictor and the outcome.

Avoid Time Travel In the testing or validation situation, we no longer have the luxury of including variables that occur after the outcome is known. Here we want to rely only on predictors that occur prior to the outcome, so it is important to exclude any diagnosis that occurs after it. That information is available in the electronic health record but not in real life. In real life, we make a prediction about the likelihood of the outcome before the outcome has occurred, and therefore have no access to any diagnosis or other information recorded afterward.

Avoid Diagnoses on the Causal Path Sometimes the available data are reasonable but should be ignored in the context of the planned analysis. Even though the data are correct, and nothing is wrong with them, they should nevertheless be ignored.

If we are studying the impact of a treatment on survival, we must drop complications of the treatment from the multivariate analysis. Including these variables will distort the estimated impact of treatment on survival. In electronic health records, complications are diagnoses that occur after treatment: the same diagnosis recorded before treatment is considered medical history, at the time of treatment it is considered a comorbidity, and after treatment it is considered a complication. The statistical advice therefore requires us to drop some diagnoses and retain others. For example, suppose a patient who was overweight and had an infection was given a large dose of antibiotics, which disrupted the microbes in the patient's gut, and the patient developed diabetes. If we keep diabetes in our multivariate model, the effect of the antibiotic on survival will be distorted. In these situations, we want to keep the comorbidities, i.e. overweight and infection, but not the treatment complication.
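Because the history/comorbidity/complication distinction depends only on timing relative to treatment, it can be sketched as a small labeling function. The function, field layout, and example dates below are hypothetical illustrations of the rule, not part of the presentation.

```python
from datetime import date

def classify_dx(dx_date, treatment_date):
    """Label a diagnosis by its timing relative to treatment: before
    treatment it is medical history, on the treatment date a
    comorbidity, and afterwards a (possible) complication."""
    if dx_date < treatment_date:
        return "history"
    if dx_date == treatment_date:
        return "comorbidity"
    return "complication"

# Hypothetical version of the narration's example: treatment on 2020-03-01.
treated = date(2020, 3, 1)
events = [("overweight", date(2019, 6, 1)),
          ("infection",  date(2020, 3, 1)),
          ("diabetes",   date(2020, 9, 15))]
labels = {dx: classify_dx(d, treated) for dx, d in events}
# Keep history and comorbidities as covariates; drop complications.
covariates = [dx for dx, d in events if classify_dx(d, treated) != "complication"]
```

Overweight and infection survive as covariates while post-treatment diabetes is dropped, matching the advice to exclude diagnoses on the causal path.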

Diagnoses and medical history, with some exceptions, are some of the best predictors to include in the analysis.