A SAS Macro to Calculate the C-statistic Bill O’Brien BCBSMA SAS Users Group March 10, 2015.

C-statistic measures discrimination

A key component in the assessment of risk algorithm performance is its ability to distinguish subjects who will develop an event (“cases”) from those who will not (“controls”). This concept, known as discrimination, has been well studied and quantified for binary outcomes using measures such as the estimated area under the Receiver Operating Characteristics (ROC) curve (AUC), which is also referred to as a “C-statistic” (Uno 2011).

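Sample Dataset

data Admissions;
  length patientID $ 3 predicted 8 actual 3;
  /* No seed is set, so each run produces different values */
  do k = 1 to 100;
    patientID = put(k, z3.);               /* zero-padded ID: 001 to 100 */
    actual = (rand("uniform") gt 0.8);     /* 1 = event, with probability 0.2 */
    /* Events score 0.07 higher on average than non-events */
    predicted = rand("normal", 0.2, 0.08) + (actual * 0.07);
    output;
  end;
run;

[Table: first rows of Admissions, with columns patientID, predicted (shown as a percentage), and actual; row values not recoverable]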
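To inspect the first few generated rows in the same three-column layout, a listing along these lines should work. This is a sketch, not the step used in the talk: the choice of five rows and the percent8.1 format are assumptions.

```sas
/* Sketch: list the first five rows of the simulated data.          */
/* Assumes the Admissions data set created by the data step above.  */
proc print data=admissions(obs=5) noobs;
  var patientID predicted actual;
  format predicted percent8.1;   /* display predicted as a percentage */
run;
```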

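Discrimination Slope

proc tabulate data=admissions;
  class actual;
  var predicted;
  tables predicted*(n*f=2.0 mean*f=percent8.1), actual;
run;

                    actual
                    0        1
predicted   N       83       17
            Mean    20.7%    27.0%

The discrimination slope is the difference between the mean predicted probability for events and for non-events: 27.0% - 20.7% = 6.3 percentage points.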
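If the slope is wanted as a single number rather than read off the two PROC TABULATE means, one possible approach is a PROC SQL query like the one below. It is a sketch that assumes the Admissions data set from the sample-data slide; the column name discrimination_slope is hypothetical.

```sas
/* Sketch: mean predicted value for events minus mean for non-events. */
/* MEAN ignores missing values, so each CASE expression keeps only    */
/* the rows for one outcome group.                                    */
proc sql;
  select mean(case when actual = 1 then predicted else . end)
       - mean(case when actual = 0 then predicted else . end)
         as discrimination_slope format=percent8.1
    from admissions;
quit;
```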

Algorithm

1. Cartesian join all events to all non-events
2. Assign each row a value of 1 if Pr(event) > Pr(non-event), 0.5 for a tie, and 0 otherwise
3. Take the average
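Written as a formula, the three steps amount to the estimator below. The notation is introduced here for illustration and does not come from the slides: $\hat{p}_i$ is the predicted probability for event $i$ (of $n_1$ events), $\hat{p}_j$ is the predicted probability for non-event $j$ (of $n_0$ non-events), and $\mathbf{1}[\cdot]$ is the indicator function.

```latex
\hat{c} \;=\; \frac{1}{n_1 n_0} \sum_{i=1}^{n_1} \sum_{j=1}^{n_0}
\left( \mathbf{1}\!\left[\hat{p}_i > \hat{p}_j\right]
     + \tfrac{1}{2}\,\mathbf{1}\!\left[\hat{p}_i = \hat{p}_j\right] \right)
```

As a small made-up example, take two events with predictions 0.30 and 0.20 and three non-events with predictions 0.10, 0.20, and 0.25. The six pairs score 1, 1, 1, 1, 0.5, and 0, so the average is 4.5 / 6 = 0.75.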

Calculating the c-statistic

%macro cstatistic(dsn, predvar, event);
  proc sql;
    /* Average the pair scores over the Cartesian join of events and non-events */
    select mean(cVal) format=8.2 as c
      into :cStat
      from (select case
                     when t1.LL > t2.LL then 1     /* event scored higher */
                     when t1.LL = t2.LL then 0.5   /* tie */
                     else 0                        /* non-event scored higher */
                   end as cVal
              from (select &predvar as LL from &dsn where &event = 1) as t1,
                   (select &predvar as LL from &dsn where &event = 0) as t2);
  quit;
%mend cstatistic;

%cstatistic(admissions,predicted,actual);

Cartesian join: all events to all non-events.
+1 if the model assigned a higher p to the event than to the non-event.
The result is the % of the time the model discriminated correctly.
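One way to sanity-check the macro's result is to compare it with the c value that PROC LOGISTIC reports in its association table. The sketch below is not part of the original talk. It assumes the Admissions data set from the sample-data slide and a SAS/STAT release that supports the BINWIDTH= option on the MODEL statement; the output data set name assoc is hypothetical.

```sas
/* Cross-check sketch: fit a one-predictor logistic model and capture   */
/* the association table, whose "c" entry is the concordance statistic. */
ods output Association=assoc;

proc logistic data=admissions;
  /* binwidth=0 asks for no binning of predicted values when the        */
  /* concordant and discordant pairs are counted                        */
  model actual(event='1') = predicted / binwidth=0;
run;

proc print data=assoc noobs;
run;
```

Because the fitted probabilities are a monotone transformation of the single predictor, the reported c should agree with the macro, up to rounding, whenever the fitted coefficient on predicted is positive. With a negative coefficient the ranking reverses and PROC LOGISTIC would report one minus the macro's value.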

Result

c = 0.72

There is a 0.72 probability that the model assigns a higher predicted probability to a randomly selected event case than to a randomly selected non-event case.

0.50 = no better than chance
0.60 = poor
0.70 = reasonable
0.80 = strong
(Hosmer & Lemeshow 2000)

Further Reading