Machine Learning Risk Adjustment of the C-section Rate: Impact by Provider Cynthia J. Sims MD, Obstetrics, Gynecology & Reproductive Sciences, Magee Womens.


Machine Learning Risk Adjustment of the C-section Rate: Impact by Provider

Cynthia J. Sims, MD, Obstetrics, Gynecology & Reproductive Sciences, Magee Womens Hospital, Pittsburgh, PA 15213
Rich Caruana, Peng Jia, Radu Stefan Niculescu, Matt Troup, Carnegie Mellon University, Pittsburgh, PA 15213
R. Bharat Rao, Data Mining Group, Siemens Corporate Research, Inc., Princeton, NJ 08540

ABSTRACT

Objective: We observed significant variation in C-section rates across 17 physician groups, ranging from 13% to 23%. The objective of this study was to determine how much of the observed variation was due to differences in the patient sub-populations and how much was due to differences inherent to the group practices.

Method: We studied a population of 22,176 patients (1995-1997) stratified by provider group. We trained a machine-learning decision-tree model on all 22,176 patients. The model had an accuracy of 90% and an ROC area of 0.92. Care was taken to prevent over-fitting. The decision-tree model was then applied to the patients in each group to determine the aggregate C-section risk for that sub-population, as predicted by average physician practice as represented by the 17 physician groups.

Results: 1) Little of the observed variation in C-section rate was attributable to variation in the patient sub-populations (the correlation between the observed C-section rates and the rates predicted by the machine learning model was only 0.21). 2) After adjusting for patient sub-population risk, we found that several groups had highly significant differences between actual and predicted rates. 3) Raw C-section rates are misleading. Some groups with a high rate had a high-risk patient population that justified the high rate. Other groups with a high rate did not have high-risk patient populations.

Conclusions: There was significant variation in the C-section rate of the different sub-populations. (See table to right.)
Only a fraction of the observed variation was explained by differences in the predicted C-section risk of the populations. When determining which groups have high C-section rates, it is important to adjust for the relative risk of the different sub-populations. The raw, unadjusted C-section rate of different sub-populations can be misleading. We conclude that the substantial differences among the groups were not predicted by patient risk.

RESULTS

[Table: Observed and predicted C-section rates for 17 physician groups, sorted by observed C-section rate. Physician groups 7, 8, and 10 are particularly interesting. The last column is the estimated C-section rate that would result if the physician group treated all 22,156 patients.]

[Figure: Scatter plot comparing the observed C-section rate in the 17 physician groups with the C-section rates predicted for those groups by the decision tree.]

Hypothesis: The observed variation in C-section rates for physician groups is inherent to the group practice and not due to differences in the patient sub-population.

The Population: 22,176 patients (1995-1997), stratified by provider group. 17 provider groups.

Conclusions: Significant variation in the C-section rate of the different provider-group sub-populations. The substantial differences among groups were not predicted by patient risk. Evaluate methods for machine learning group comparison.

MACHINE LEARNING DECISION TREE MODEL TRAINED ON 22,176 CASES RESUBSTITUTION ROC AREA
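
The group comparison described in the poster (observed C-section rate per group versus the rate a patient-risk model predicts for that group, plus the correlation between the two) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the per-patient risks are assumed to come from some already trained model, the toy records are invented, and the z-score (a normal approximation to a sum of independent Bernoulli outcomes) is an assumed stand-in for whatever significance test the authors actually used, which the poster does not name.

```python
from collections import defaultdict
from math import sqrt


def group_rates(records):
    """Summarise (group, outcome, predicted_risk) records per group.

    outcome is 1 for a C-section and 0 otherwise; predicted_risk is the
    model's per-patient probability of a C-section (assumed given).
    """
    # Per group: [patient count, observed C-sections, sum of risks, variance]
    acc = defaultdict(lambda: [0, 0, 0.0, 0.0])
    for group, outcome, risk in records:
        a = acc[group]
        a[0] += 1
        a[1] += outcome
        a[2] += risk
        a[3] += risk * (1.0 - risk)  # variance of one Bernoulli outcome
    summary = {}
    for group, (n, observed, expected, variance) in acc.items():
        # Assumed test: z-score of observed count against the expected count.
        z = (observed - expected) / sqrt(variance) if variance > 0 else 0.0
        summary[group] = {
            "n": n,
            "observed": observed / n,   # raw C-section rate
            "predicted": expected / n,  # risk-adjusted expected rate
            "z": z,
        }
    return summary


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: (group label, had C-section, predicted risk).
records = [
    ("A", 1, 0.5), ("A", 0, 0.5),
    ("B", 1, 0.2), ("B", 1, 0.2), ("B", 0, 0.2), ("B", 0, 0.2),
    ("C", 0, 0.1), ("C", 0, 0.3),
]

summary = group_rates(records)
groups = sorted(summary)
r = pearson([summary[g]["observed"] for g in groups],
            [summary[g]["predicted"] for g in groups])

for g in groups:
    s = summary[g]
    print(f"{g}: observed={s['observed']:.2f} "
          f"predicted={s['predicted']:.2f} z={s['z']:+.2f}")
print(f"correlation between observed and predicted rates: {r:.2f}")
```

In this toy data, groups A and B have the same raw rate, but only B exceeds what its patients' predicted risk would suggest, which is the distinction the poster draws between raw and risk-adjusted rates.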