Machine Learning Risk Adjustment of the C-section Rate: Impact by Provider Cynthia J. Sims MD, Obstetrics, Gynecology & Reproductive Sciences, Magee Womens.


Machine Learning Risk Adjustment of the C-section Rate: Impact by Provider

Cynthia J. Sims, MD, Obstetrics, Gynecology & Reproductive Sciences, Magee Womens Hospital, Pittsburgh, PA
Rich Caruana, Peng Jia, Radu Stefan Niculescu, Matt Troup, Carnegie Mellon University, Pittsburgh, PA
R. Bharat Rao, Data Mining Group, Siemens Corporate Research, Inc., Princeton, NJ 08540

Objective: We observed significant variation in C-section rates across 17 physician groups, ranging from 13% to 23%. The objective of this study was to determine how much of the observed variation was due to differences in the patient sub-populations and how much was due to differences inherent to the group practices.

Method: We studied a population of 22,176 patients ( ) stratified by provider group. We trained a machine-learning decision-tree model on all 22,176 patients. The model had an accuracy of 90% and an ROC area of ( ). Care was taken to prevent over-fitting. The decision-tree model was then applied to the patients in each group to determine that sub-population's aggregate risk for C-section, that is, the rate predicted under average physician practice as represented by the 17 physician groups.
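The aggregation step described in the Method can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each patient already has a predicted C-section probability from some fitted model, and the function name `group_rates` and the toy records are hypothetical.

```python
from collections import defaultdict

def group_rates(records):
    """Compute observed and model-predicted C-section rates per group.

    records: iterable of (group, had_csection, predicted_risk), where
    had_csection is 0 or 1 and predicted_risk is the model's probability
    for that patient (assumed to come from an already-fitted model).
    """
    totals = defaultdict(lambda: [0, 0, 0.0])  # n, observed count, risk sum
    for group, outcome, risk in records:
        entry = totals[group]
        entry[0] += 1
        entry[1] += outcome
        entry[2] += risk
    return {
        group: {"n": n, "observed": obs / n, "predicted": risk_sum / n}
        for group, (n, obs, risk_sum) in totals.items()
    }

# Hypothetical toy data: two groups of four patients each.
toy = [
    ("A", 1, 0.30), ("A", 0, 0.10), ("A", 0, 0.20), ("A", 0, 0.20),
    ("B", 1, 0.15), ("B", 1, 0.25), ("B", 0, 0.10), ("B", 0, 0.10),
]
rates = group_rates(toy)
```

In this toy data, group B's observed rate is well above its predicted rate, which is the kind of gap the risk adjustment is meant to surface.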

Results:
1. Little of the observed variation in C-section rate was attributable to variation in the patient sub-populations (the correlation between the observed C-section rates and the rates predicted by the machine learning model was only 0.21).
2. After adjusting for patient sub-population risk, we found that several groups had differences between actual and predicted rates that were highly significant.
3. Raw C-section rates are misleading. Some groups with a high rate had a high-risk patient population that justified the high rate. Other groups with a high rate did not have high-risk patient populations.

Conclusions: There was significant variation in the C-section rate of the different sub-populations. (See table to right.) Only a fraction of the observed variation was explained by differences in the populations' predicted risk for C-section. When determining which groups have high C-section rates, it is important to adjust for the relative risk of the different sub-populations, because the raw, unadjusted cesarean section rate can be misleading. We conclude that the substantial differences among the groups were not predicted by patient risk.
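The two quantities behind the first two results, the correlation between observed and predicted group rates and the significance of a group's observed-minus-predicted gap, can be sketched as below. The poster does not say which significance test was used; the one-sample z-test for a proportion here is an assumption chosen for illustration, and all function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def z_score(observed_rate, predicted_rate, n):
    """z statistic for an observed rate against a predicted rate.

    Assumption: the predicted rate is treated as a fixed null proportion
    and the normal approximation to the binomial is used.
    """
    std_err = math.sqrt(predicted_rate * (1.0 - predicted_rate) / n)
    return (observed_rate - predicted_rate) / std_err

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical group: 1,000 patients, 23% observed versus 18% predicted.
z = z_score(0.23, 0.18, 1000)
p = two_sided_p(z)
```

A low correlation across groups together with large z statistics for individual groups matches the pattern reported above: patient risk explains little of the variation, yet some groups differ sharply from what the model predicts for their patients.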

Figure: Machine learning decision-tree model trained on 22,176 cases (resubstitution ROC area).
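For readers unfamiliar with the term, a resubstitution ROC area is the ROC area computed on the same cases the model was trained on. The sketch below computes an ROC area directly from its pairwise definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half). The function name and the toy labels and scores are hypothetical, and this quadratic-time version is meant for clarity rather than for 22,176 cases.

```python
def roc_area(labels, scores):
    """ROC area via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical toy example: 1 = C-section, 0 = vaginal delivery.
area = roc_area([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

Because resubstitution estimates are computed on the training cases, they tend to be optimistic compared with estimates from held-out data.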

Table: Observed and predicted C-section rates for 17 physician groups, sorted by observed C-section rate. Physician groups 7, 8, and 10 are particularly interesting. The last column is the estimated C-section rate that would result if the physician group treated all 22,176 patients.
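The idea behind the last column, estimating what a group's rate would be if it treated the whole population, can be illustrated with direct standardization over risk strata. This is a simpler stand-in chosen for illustration and is not necessarily how the poster's estimate was produced; the stratum labels, function name, and toy numbers are all hypothetical.

```python
from collections import defaultdict

def standardized_rate(group_records, population_strata):
    """Estimate a group's C-section rate on the full population's case mix.

    group_records: (stratum, had_csection) pairs for one group's patients.
    population_strata: the stratum label of every patient in the population.
    Assumption: patients in the same stratum are comparable, so the group's
    own rate within each stratum is applied to the population's stratum sizes.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n, C-sections]
    for stratum, outcome in group_records:
        counts[stratum][0] += 1
        counts[stratum][1] += outcome
    population = defaultdict(int)
    for stratum in population_strata:
        population[stratum] += 1
    missing = [s for s in population if s not in counts]
    if missing:
        raise ValueError(f"group has no patients in strata: {missing}")
    total = sum(population.values())
    expected = sum(
        population[s] * counts[s][1] / counts[s][0] for s in population
    )
    return expected / total

# Hypothetical group: half its patients are high risk, raw rate 6/20 = 30%.
group = [("low", 1)] + [("low", 0)] * 9 + [("high", 1)] * 5 + [("high", 0)] * 5
# Hypothetical population: only 20% high risk.
population = ["low"] * 80 + ["high"] * 20
adjusted = standardized_rate(group, population)
```

In this toy case the group's raw rate of 30% drops to 18% on the population's case mix, showing how a high raw rate can reflect a high-risk patient population rather than the group's practice.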

Figure: Scatter plot comparing the observed C-section rate in the 17 physician groups with the C-section rates predicted for those groups by the decision tree.

Hypothesis: The observed variation in C-section rates for physician groups is inherent to the group practice and not due to differences in the patient sub-population.

The Population: 22,176 patients ( ), stratified by provider group (17 provider groups).

Conclusions: The substantial differences among groups were not predicted by patient risk. There was significant variation in the C-section rate of the different provider group sub-populations.

Future Work: Evaluate methods for machine learning group comparison. Compare the decision-tree model with a neural network model. The best evidence that the C-section rate can be lowered without adversely affecting outcomes comes from countries with lower C-section rates but comparable outcomes. We intend to apply the same techniques to a medical database from one of these countries.