The Receiver Operating Characteristic (ROC) Curve

EPP 245 Statistical Analysis of Laboratory Data


November 30, 2006, EPP 245 Statistical Analysis of Laboratory Data

Binary Classification

Suppose we have two groups, with each case a member of one or the other, and that we know the correct classification (“truth”). Suppose we also have a prediction method that produces a single numerical value, where small values suggest membership in group 1 and large values suggest membership in group 2.

If we pick a cutpoint t, we can assign any case with a predicted value ≤ t to group 1 and the others to group 2. For that value of t, we can compute the number correctly assigned to group 2 and the number incorrectly assigned to group 2 (true positives and false positives). For t small enough, all cases will be assigned to group 2, and for t large enough, all will be assigned to group 1. The ROC curve is a plot of true positives vs. false positives as t varies, usually with each count expressed as a rate: the fraction of true group 2 cases assigned to group 2 (sensitivity) against the fraction of true group 1 cases assigned to group 2 (1 − specificity).
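The cutpoint rule above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python rather than Stata; the function name `roc_points` and the four example scores are hypothetical and are not part of the course material.

```python
def roc_points(scores, in_group2):
    """Return (cutpoint, false_positives, true_positives) for each cutpoint t.

    A case is assigned to group 2 when its score is > t, matching the rule
    that scores <= t go to group 1.
    """
    # Candidate cutpoints: one below every score (everything goes to group 2),
    # then each distinct observed score in increasing order.
    cutpoints = [float("-inf")] + sorted(set(scores))
    points = []
    for t in cutpoints:
        tp = sum(1 for s, g in zip(scores, in_group2) if s > t and g)
        fp = sum(1 for s, g in zip(scores, in_group2) if s > t and not g)
        points.append((t, fp, tp))
    return points


# Hypothetical scores and true labels (True = group 2), for illustration only.
scores = [0.1, 0.4, 0.35, 0.8]
truth = [False, False, True, True]

n_pos = sum(truth)
n_neg = len(truth) - n_pos
for t, fp, tp in roc_points(scores, truth):
    # Dividing by the group sizes turns the counts into rates.
    print(f"t={t}: FP={fp} ({fp / n_neg:.2f}), TP={tp} ({tp / n_pos:.2f})")
```

Plotting the true positive rate against the false positive rate for these cutpoints traces the ROC curve from (1, 1), where every case is assigned to group 2, down to (0, 0), where none is.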

Juul's IGF data

Description: The 'juul' data frame has 1339 rows and 6 columns. It contains a reference sample of the distribution of insulin-like growth factor (IGF-I), one observation per subject in various ages, with the bulk of the data collected in connection with school physical examinations.

Variables:
age: a numeric vector (years).
menarche: a numeric vector. Has menarche occurred (code 1: no, 2: yes)?
sex: a numeric vector (1: boy, 2: girl).
igf1: a numeric vector. Insulin-like growth factor (µg/l).
tanner: a numeric vector. Codes 1-5: stages of puberty a.m. Tanner.
testvol: a numeric vector. Testicular volume (ml).

Predicting Menarche

Subset the Juul data to females between 8 and 20 years old only.
Predict menarche from age as a quantitative variable and Tanner score as a qualitative variable, using dummy variables.
Menarche is re-coded to be 0/1.
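The data preparation described here can be sketched as follows. This is an illustrative Python version, not the Stata commands used in the course. It assumes that "between 8 and 20" is inclusive and that rows with missing values are dropped; the function name `prepare` and the example rows are hypothetical. The variable names `men1` and `tan2` to `tan5` follow the model fitted in the deck, which leaves Tanner stage 1 as the reference level.

```python
def prepare(rows):
    """Keep girls aged 8 to 20 and build the variables used in the model."""
    prepared = []
    for r in rows:
        # Assumption: rows with any missing value are skipped.
        if r["age"] is None or r["menarche"] is None or r["tanner"] is None:
            continue
        # sex == 2 codes girls; the age bounds are assumed to be inclusive.
        if r["sex"] != 2 or not (8 <= r["age"] <= 20):
            continue
        # Re-code menarche from 1/2 (no/yes) to 0/1.
        out = {"age": r["age"], "men1": r["menarche"] - 1}
        # One dummy per Tanner stage 2..5; stage 1 is the reference level.
        for stage in range(2, 6):
            out[f"tan{stage}"] = 1 if r["tanner"] == stage else 0
        prepared.append(out)
    return prepared


# Hypothetical rows in the layout of the 'juul' data frame.
rows = [
    {"age": 9.5, "sex": 2, "menarche": 1, "tanner": 1},
    {"age": 14.2, "sex": 2, "menarche": 2, "tanner": 4},
    {"age": 12.0, "sex": 1, "menarche": None, "tanner": 2},  # boy: dropped
    {"age": 25.0, "sex": 2, "menarche": 2, "tanner": 5},     # too old: dropped
]
for row in prepare(rows):
    print(row)
```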

. logistic men1 age tan2 tan3 tan4 tan5

Logistic regression                               Number of obs   =        519
                                                  LR chi2(5)      =     568.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.327218                       Pseudo R2       =     0.7906

------------------------------------------------------------------------------
        men1 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   3.944062   .7162327     7.56   0.000     2.762915    5.630151
        tan2 |   .0444044   .0486937    -2.84   0.005     .0051761    .3809341
        tan3 |   .1369598    .095596    -2.85   0.004     .0348712    .5379227
        tan4 |   .6969611   .3898228    -0.65   0.519     .2328715    2.085935
        tan5 |   9.169558   7.638664     2.66   0.008     1.791671     46.9287
------------------------------------------------------------------------------

. predict pmen
(option p assumed; Pr(men1))

. predict pmen1, xb
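As a worked check on how to read this output, the sketch below recomputes the z statistic and the 95% confidence interval for age from its odds ratio and standard error. It assumes the standard relationships: the coefficient is the log of the odds ratio, the standard error of the odds ratio comes from the delta method, and the interval is formed on the log-odds scale and then exponentiated. It also shows the inverse logit that links the linear predictor (`predict pmen1, xb`) to the probability (`predict pmen`). The helper names `check_row` and `inv_logit` are hypothetical.

```python
import math


def inv_logit(xb):
    """Map a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


def check_row(odds_ratio, se_or, z_crit=1.959964):
    """Recover the coefficient, z, and 95% CI from an odds-ratio table row."""
    b = math.log(odds_ratio)      # coefficient on the log-odds scale
    se_b = se_or / odds_ratio     # delta method: SE(OR) = OR * SE(b)
    z = b / se_b
    lo = math.exp(b - z_crit * se_b)
    hi = math.exp(b + z_crit * se_b)
    return b, z, lo, hi


# Values for 'age' taken from the table above.
b, z, lo, hi = check_row(3.944062, 0.7162327)
print(f"b = {b:.4f}, z = {z:.2f}, 95% CI = [{lo:.4f}, {hi:.4f}]")

# A linear predictor of 0 corresponds to a probability of one half.
print(inv_logit(0.0))
```

The table does not list the constant term, so a predicted probability for a given subject cannot be reconstructed from these rows alone.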

. histogram pmen
. graph export pmenhist.wmf
. histogram pmen if men1==0, title("Pre-Menarch")
. graph export pmenhist0.wmf
. histogram pmen if men1==1, title("Post-Menarch")
. graph export pmenhist1.wmf
. histogram pmen1
. graph export pmen1hist.wmf
. hist pmen1 if men1==0, title("Pre-Menarche")
. graph export pmen1hist0.wmf
. hist pmen1 if men1==1, title("Post-Menarche")
. graph export pmen1hist1.wmf
. lroc

Logistic model for men1

number of observations =      519
area under ROC curve   =   0.9867

. graph export pmenroc.wmf
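One way to interpret the area under the ROC curve reported by lroc: it equals the probability that a randomly chosen post-menarche case receives a higher predicted value than a randomly chosen pre-menarche case, with ties counted as one half. The Python sketch below computes the area this way on hypothetical scores. It illustrates the interpretation and is not necessarily how lroc computes the value internally; the function name `auc` is an assumption.

```python
def auc(scores, positive):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    pos = [s for s, y in zip(scores, positive) if y]
    neg = [s for s, y in zip(scores, positive) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted values and true outcomes, for illustration only.
scores = [0.1, 0.4, 0.35, 0.8]
truth = [False, False, True, True]
print(auc(scores, truth))
```

An area of 0.5 corresponds to a predictor no better than chance and 1.0 to perfect separation, so the reported 0.9867 indicates that age and Tanner stage separate the two groups very well.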

[Slides 8-14: no text survives in the transcript; these slides presumably show the histograms and ROC plot exported above.]

