Statistical Consulting Unit, University of Haifa
Prof. Benjamin Reiser, Prof. David Faraggi, Ms. Efrat Yaskil
Usual Regression Model
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \dots, n$, with $\varepsilon_i \sim N(0, \sigma^2)$ independent.
The model can be extended to many x's:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} + \varepsilon_i$
Some of the x's may be categorical (defined by dummies).
Often the response y is binary, for example:
1) yes/no
2) alive/dead
3) success/failure
$E(Y_i) = p_i$
$p_i$ is a function of the x's, approaching 1 from below and 0 from above:
$p_i = \dfrac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$
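A minimal sketch of the logistic form, assuming a single covariate with linear predictor $\beta_0 + \beta_1 x$. The coefficient values below are hypothetical and serve only to show that the resulting probabilities stay strictly between 0 and 1.

```python
import math

def logistic(eta):
    """Map a linear predictor eta = b0 + b1*x to a probability in (0, 1)."""
    # Two branches keep exp() from overflowing for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.0, 0.5
probs = [logistic(b0 + b1 * x) for x in (-20, 0, 2, 20)]
```

With a positive slope the probabilities increase with x, approaching 0 on the left and 1 on the right without reaching either bound.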
Other functions of this form (such as probit) are possible but are not discussed here.
Passengers on the Titanic
Description: The data give the survival status of 679 passengers on the Titanic, together with their names, age, sex and passenger class.
Variable: Description
Name: Recorded name of passenger
Passenger class: 1st, 2nd or 3rd
Age: Age in years
Gender: 0 = male, 1 = female
Survived: 1 = Yes, 0 = No
One binary explanatory variable
An example: GENDER effect on the survival of passengers on the Titanic.
GENDER is defined by x: x=1 for female, and x=0 for male.
Survival is defined by: Yes = survived and No = didn't survive.
Odds
Odds $= p/(1-p)$
Odds Ratio: OR $=$ Odds$(x=1)$ / Odds$(x=0)$
OR=1 indicates no effect of x on y.
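Odds and the Logistic Model
$\log\{\text{Odds}(x)\} = \beta_0 + \beta_1 x$, so $\log\{\text{Odds}(1)\} - \log\{\text{Odds}(0)\} = \beta_1$.
$\beta_1 = \log(\text{OR})$, the log odds ratio, measures the effect of x.
$\beta_1 = 0$ (or equivalently OR=1) implies no effect of x on y.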
Method of Estimation: Maximum Likelihood
Motivation for the MLE (Maximum Likelihood Estimator): the value of the parameters that maximizes the probability of observing the data we in fact observed.
For individual i we have a Bernoulli distribution:
$P(Y_i = y_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}$, $y_i \in \{0, 1\}$
The $p_i$ are functions of x.
The likelihood $L(\beta) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$ is maximized by numerical optimization, which gives the parameter estimates together with estimates of their variances and covariances.
The x's can be continuous or categorical.
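As an illustration of what the numerical optimization does, here is a minimal Newton-Raphson sketch for a model with one covariate, $\text{logit}(p_i) = \beta_0 + \beta_1 x_i$. It is not the algorithm as implemented in SAS; the function name `fit_logistic` and the toy data are hypothetical. The inverse of the information matrix at the optimum serves as the estimated covariance matrix of the estimates.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson for logit(p_i) = b0 + b1 * x_i; returns (b0, b1, cov)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient (score) of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (minus the Hessian)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Inverse information matrix; once the iterations have converged
        # this is the estimated covariance matrix of (b0, b1).
        cov = [[h11 / det, -h01 / det], [-h01 / det, h00 / det]]
        b0 += cov[0][0] * g0 + cov[0][1] * g1
        b1 += cov[1][0] * g0 + cov[1][1] * g1
    return b0, b1, cov

# Hypothetical toy data: 10 subjects with x=0 (2 successes)
# and 10 subjects with x=1 (6 successes).
xs = [0] * 10 + [1] * 10
ys = [1] * 2 + [0] * 8 + [1] * 6 + [0] * 4
b0_hat, b1_hat, cov_hat = fit_logistic(xs, ys)
```

For a single binary x the estimates have a closed form (the log odds in the x=0 group and the log odds ratio), which gives a convenient check on the iterations.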
Logistic Regression using SAS Software
The LOGISTIC Procedure
Response Variable: SURVIVED
Response Levels: 2
Number of Observations: 679
Link Function: Logit
Response Profile (columns): Ordered Value | SURVIVED | Count
Logistic Regression Model: Survived = GENDER
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Columns: Criterion | Intercept Only | Intercept and Covariates | Chi-Square for Covariates
Rows: AIC; SC; -2 LOG L (chi-square with 1 DF, p=0.0001) (*); Score (chi-square with 1 DF, p=0.0001) (*)
Analysis of Maximum Likelihood Estimates
Columns: Variable | DF | Parameter Estimate | Standard Error | Wald Chi-Square | Pr > Chi-Square | Standardized Estimate | Odds Ratio
Rows: INTERCPT ($\hat\beta_0$); GENDER ($\hat\beta_1$) (*)
(*) Tests the GENDER effect.
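GENDER=1: $\hat\beta_0 + \hat\beta_1 \cdot 1 = 1.3 = \log\{\text{Odds(Female)}\}$, with Odds(Female) $= \hat p_1/(1-\hat p_1) = 3.6970$ and $\log(3.6970) = 1.3$
GENDER=0: $\hat\beta_0 + \hat\beta_1 \cdot 0 = -1.3 = \log\{\text{Odds(Male)}\}$, with Odds(Male) $= \hat p_0/(1-\hat p_0) = 0.2721$ and $\log(0.2721) = -1.3$
Odds Ratio = Odds(Female)/Odds(Male) $= 3.6970/0.2721 \approx 13.58$
$\log(\text{OR}) = \log(13.58) \approx 2.61 = \hat\beta_1$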
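The arithmetic on this slide can be checked directly. The sketch below uses only the two odds values quoted on the slide (3.6970 and 0.2721); the variable names are illustrative.

```python
import math

odds_female = 3.6970   # Odds(Female), as quoted on the slide
odds_male = 0.2721     # Odds(Male), as quoted on the slide

log_odds_female = math.log(odds_female)   # equals beta0_hat + beta1_hat
log_odds_male = math.log(odds_male)       # equals beta0_hat
odds_ratio = odds_female / odds_male
log_or = math.log(odds_ratio)             # equals beta1_hat

# beta1_hat is also the difference of the two log odds.
beta1_from_difference = log_odds_female - log_odds_male
```

The two routes to $\hat\beta_1$ (the log of the ratio, and the difference of the logs) agree up to rounding.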
Logistic Regression Model: Survived = GENDER (Continued)
Parameter Estimates and 95% Confidence Intervals
Profile Likelihood Confidence Limits. Columns: Variable | Parameter Estimate | Lower | Upper. Rows: INTERCPT; GENDER
Wald Confidence Limits. Columns: Variable | Parameter Estimate | Lower | Upper. Rows: INTERCPT; GENDER
Logistic Regression Model: Survived = AGE
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Columns: Criterion | Intercept Only | Intercept and Covariates | Chi-Square for Covariates
Rows: AIC; SC; -2 LOG L (chi-square with 1 DF, p=0.0569) (*); Score (chi-square with 1 DF, p=0.0575) (*)
Analysis of Maximum Likelihood Estimates
Columns: Variable | DF | Parameter Estimate | Standard Error | Wald Chi-Square | Pr > Chi-Square | Standardized Estimate | Odds Ratio
Rows: INTERCPT; AGE (*)
(*) Tests the AGE effect.
Odds Ratio for a continuous explanatory variable x
If $p_k$ = probability to survive for x=k and odds(k) $= p_k/(1-p_k)$, then:
$\log\{\text{odds}(k+1)\} - \log\{\text{odds}(k)\} = [\beta_0 + \beta_1 (k+1)] - [\beta_0 + \beta_1 k] = \beta_1$
which means that $e^{\beta_1} = \text{odds}(k+1)/\text{odds}(k)$ is the odds ratio for an increment of one unit in x.
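A short numerical check of this property, assuming the one-covariate logistic model $\text{logit}(p) = \beta_0 + \beta_1 x$. The coefficients are hypothetical (they are not the fitted AGE estimates), and the helper names are illustrative. The same argument shows that an increment of c units multiplies the odds by $e^{c\beta_1}$.

```python
import math

def prob(b0, b1, x):
    """Probability under logit(p) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients, for illustration only.
b0, b1 = 0.5, -0.02

# Odds ratio for a one-unit increment, at an arbitrary starting value k.
k = 30
or_one_unit = odds(prob(b0, b1, k + 1)) / odds(prob(b0, b1, k))

# Odds ratio for a ten-unit increment.
or_ten_units = odds(prob(b0, b1, k + 10)) / odds(prob(b0, b1, k))
```

Neither ratio depends on the starting value k, which is what makes $e^{\beta_1}$ a single summary of the effect of x.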
Logistic Regression Model: Survived = AGE (Continued)
Parameter Estimates and 95% Confidence Intervals
Profile Likelihood Confidence Limits. Columns: Variable | Parameter Estimate | Lower | Upper. Rows: INTERCPT; AGE
Wald Confidence Limits. Columns: Variable | Parameter Estimate | Lower | Upper. Rows: INTERCPT; AGE
Logistic Regression Model: Survived = GENDER and AGE
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Columns: Criterion | Intercept Only | Intercept and Covariates | Chi-Square for Covariates
Rows: AIC; SC; -2 LOG L (chi-square with 2 DF, p=0.0001) (*); Score (chi-square with 2 DF, p=0.0001) (*)
Analysis of Maximum Likelihood Estimates
Columns: Variable | DF | Parameter Estimate | Standard Error | Wald Chi-Square | Pr > Chi-Square | Standardized Estimate | Odds Ratio
Rows: INTERCPT; GENDER; AGE
(*) Tests the GENDER and AGE effects simultaneously.
Logistic Regression Model: Survived = GENDER and AGE (continued)
Parameter Estimates and 95% Confidence Intervals
Profile Likelihood Confidence Limits. Columns: Variable | Parameter Estimate | Lower | Upper. Rows: INTERCPT; GENDER; AGE
Logistic Regression Model: Survived = GENDER and AGE (continued)
Likelihood Ratio Test for the AGE effect in addition to the GENDER effect:
the difference $-2\log L(\text{GENDER}) - \{-2\log L(\text{GENDER+AGE})\}$ is smaller than $\chi^2_{0.95}(df=1) = 3.84$,
meaning there is no statistically significant additional effect of AGE over GENDER at the 5% level.
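A minimal sketch of this comparison for nested models that differ by one parameter. The -2 log L inputs below are hypothetical placeholders, not the values from the SAS output; the function names are illustrative. For 1 degree of freedom the chi-square tail probability can be written with the complementary error function, so only the standard library is needed.

```python
import math

def chi2_sf_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def lr_test_1df(neg2logl_reduced, neg2logl_full, alpha=0.05):
    """Likelihood ratio test for one added parameter."""
    statistic = neg2logl_reduced - neg2logl_full
    p_value = chi2_sf_1df(statistic)
    return statistic, p_value, p_value < alpha

# Hypothetical -2 log L values, for illustration only.
statistic, p_value, significant = lr_test_1df(700.0, 698.5)

# The critical value 3.84 corresponds to a tail probability of about 0.05.
p_at_critical = chi2_sf_1df(3.84)
```

A statistic below 3.84 gives a p-value above 0.05, which is the situation described on the slide.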
Logistic Regression Model: Survived = GENDER and AGE (continued)
$\hat p_1$ = estimated female probability of survival (GENDER=1)
$\hat p_0$ = estimated male probability of survival (GENDER=0)
Logistic Regression Model: Survived = GENDER and AGE (continued)
[Figure; labels: Female, Male]
Logistic Regression Model: Survived = GENDER, AGE and Interaction (AGE*GEN)
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Columns: Criterion | Intercept Only | Intercept and Covariates | Chi-Square for Covariates
Rows: AIC; SC; -2 LOG L (chi-square with 3 DF, p=0.0001); Score (chi-square with 3 DF, p=0.0001)
Analysis of Maximum Likelihood Estimates
Columns: Variable | DF | Parameter Estimate | Standard Error | Wald Chi-Square | Pr > Chi-Square | Standardized Estimate | Odds Ratio
Rows: INTERCPT; GENDER; AGE; AGE*GEN
Survived = GENDER, AGE and Interaction (AGE*GEN) (continued)
Parameter Estimates and 95% Confidence Intervals
Profile Likelihood Confidence Limits. Columns: Variable | Parameter Estimate | Lower | Upper. Rows: INTERCPT; GENDER; AGE; AGE*GEN
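Survived = GENDER, AGE and Interaction (Continued)
Odds Ratio for GENDER
$\text{logit}(\hat p) = \hat\beta_0 + \hat\beta_1\,\text{GENDER} + \hat\beta_2\,\text{AGE} + \hat\beta_3\,\text{AGE*GENDER}$
$\hat p_1$ = female probability of survival (GENDER=1): $\text{logit}(\hat p_1) = (\hat\beta_0 + \hat\beta_1) + (\hat\beta_2 + \hat\beta_3)\,\text{AGE}$
$\hat p_0$ = male probability of survival (GENDER=0): $\text{logit}(\hat p_0) = \hat\beta_0 + \hat\beta_2\,\text{AGE}$
The OR for GENDER is a function of AGE: $\log(\text{OR}) = \text{logit}(\hat p_1) - \text{logit}(\hat p_0) = \hat\beta_1 + \hat\beta_3\,\text{AGE}$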
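To make the age dependence concrete, the sketch below assumes a model of the form logit(p) = b0 + b1*GENDER + b2*AGE + b3*AGE*GENDER, under which the log odds ratio for GENDER at a given age is b1 + b3*AGE. The coefficient values are hypothetical stand-ins, not the fitted SAS estimates.

```python
import math

# Hypothetical coefficients (intercept, GENDER, AGE, AGE*GENDER).
b0, b1, b2, b3 = -0.5, 1.0, -0.03, 0.05

def prob(gender, age):
    """Survival probability under the assumed interaction model."""
    eta = b0 + b1 * gender + b2 * age + b3 * age * gender
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    return p / (1.0 - p)

def or_gender(age):
    """Odds ratio female vs. male at a given age: exp(b1 + b3 * age)."""
    return math.exp(b1 + b3 * age)

# The closed form agrees with the ratio of the two odds at any age.
age = 40
or_direct = odds(prob(1, age)) / odds(prob(0, age))
```

With a nonzero interaction coefficient there is no single odds ratio for GENDER; it has to be reported at specific ages.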
Survived = GENDER, AGE and Interaction (Continued)
[Figure; labels: Male, Female]
Survived = GENDER, AGE and Interaction (Continued)
Prediction:
Actual | Predicted: Survived | Predicted: Didn't survive | Total
Survived | 207 | 89 | 296
Didn't survive | 56 | 327 | 383
Total | 263 | 416 | 679
Correct Prediction = (207 + 327)/679 = 0.786
Survived = GENDER, AGE and Interaction (Continued)
Sensitivity = proportion of survivors who were correctly predicted to have survived = 207/296 = 0.699
Specificity = proportion of non-survivors who were correctly predicted to have not survived = 327/383 = 0.854
False Pos. = proportion of those predicted to survive who in fact did not survive = 56/263 = 0.213
False Neg. = proportion of those predicted not to survive who in fact survived = 89/416 = 0.214
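The four rates can be reproduced from the counts quoted on this slide. Note that "False Pos." and "False Neg." here are defined relative to the predicted groups (as on the slide), not relative to the actual groups. The function name is illustrative.

```python
def classification_rates(tp, fn, fp, tn):
    """Rates as defined on the slide, from a 2x2 prediction table.

    tp: survivors predicted to survive
    fn: survivors predicted not to survive
    fp: non-survivors predicted to survive
    tn: non-survivors predicted not to survive
    """
    return {
        "correct": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_pos": fp / (tp + fp),   # among those predicted to survive
        "false_neg": fn / (tn + fn),   # among those predicted not to survive
    }

# Counts taken from the fractions on the slide:
# 207/296, 327/383, 56/263 and 89/416.
rates = classification_rates(tp=207, fn=89, fp=56, tn=327)
```

The same function can be applied at any probability cutoff, which is how a classification table with one row per cutoff is built.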
Logistic Regression Model: Survived = GENDER, AGE and Interaction (AGE*GEN) (Continued)
Classification Table
Columns: Prob Level | Correct: Event | Correct: Non-Event | Incorrect: Event | Incorrect: Non-Event | Percentages: Correct | Sensitivity | Specificity | False POS | False NEG
C-Statistic = 0.814
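For orientation, the c-statistic is the proportion of (event, non-event) pairs in which the event has the higher predicted probability, with tied pairs counted as one half. The sketch below is a direct pairwise computation on hypothetical scores; it does not reproduce the 0.814 reported here, which requires the fitted probabilities.

```python
def c_statistic(scores, outcomes):
    """Concordance between predicted probabilities and 0/1 outcomes."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0
    ties = 0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1
            elif e == n:
                ties += 1
    return (concordant + 0.5 * ties) / (len(events) * len(nonevents))

# Hypothetical predicted probabilities and observed outcomes.
scores = [0.9, 0.8, 0.4, 0.3, 0.4, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]
c = c_statistic(scores, outcomes)
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect separation of events from non-events.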
Dummy Variables
Passenger Class | CLASS1 | CLASS2
1st Class | 1 | 0
2nd Class | 0 | 1
3rd Class | 0 | 0
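A minimal sketch of this coding, assuming 3rd class is the reference level (both dummies equal to 0) and that passenger class is stored as the integer 1, 2 or 3. The function name `class_dummies` is hypothetical.

```python
def class_dummies(pclass):
    """Return (CLASS1, CLASS2) for a passenger class of 1, 2 or 3."""
    if pclass not in (1, 2, 3):
        raise ValueError("passenger class must be 1, 2 or 3")
    class1 = 1 if pclass == 1 else 0
    class2 = 1 if pclass == 2 else 0
    return class1, class2

# One row of dummies per passenger class.
coding = {pclass: class_dummies(pclass) for pclass in (1, 2, 3)}
```

Three categories need only two dummies; the coefficients of CLASS1 and CLASS2 then compare 1st and 2nd class with the reference class.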
Logistic Regression Model: Survived = CLASS1 and CLASS2, Using LOGISTIC Procedure
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Columns: Criterion | Intercept Only | Intercept and Covariates | Chi-Square for Covariates
Rows: AIC; SC; -2 LOG L (chi-square with 2 DF, p=0.0001); Score (chi-square with 2 DF, p=0.0001)
Analysis of Maximum Likelihood Estimates
Columns: Variable | DF | Parameter Estimate | Standard Error | Wald Chi-Square | Pr > Chi-Square | Standardized Estimate | Odds Ratio
Rows: INTERCPT; CLASS1; CLASS2
Logistic Regression Model: Survived = PCLASS, Using GENMOD Procedure
Analysis Of Parameter Estimates
Columns: Parameter | DF | Estimate | Std Err | ChiSquare | Pr>Chi
Rows: INTERCEPT; PCLASS 1st; PCLASS 2nd; PCLASS 3rd
LR Statistics For Type 3 Analysis
Columns: Source | DF | ChiSquare | Pr>Chi
Rows: PCLASS
Survived = GENDER, AGE, PCLASS and interactions, Using GENMOD Procedure
LR Statistics For Type 3 Analysis
Columns: Source | DF | ChiSquare | Pr>Chi
Rows: GENDER; AGE; GENDER*AGE; PCLASS; GENDER*PCLASS; AGE*PCLASS; GENDER*AGE*PCLASS
Survived = GENDER, AGE, PCLASS and interactions
Excluding the third-order interaction GENDER*AGE*PCLASS
LR Statistics For Type 3 Analysis
Columns: Source | DF | ChiSquare | Pr>Chi
Rows: GENDER; AGE; GENDER*AGE; PCLASS; GENDER*PCLASS; AGE*PCLASS
C-Statistic = 0.879
Survived = GENDER, AGE, PCLASS and 2nd-order interactions (continued)
Analysis Of Parameter Estimates
Columns: Parameter | DF | Estimate | Std Err | ChiSquare | Pr>Chi
Rows: INTERCEPT; GENDER; AGE; GENDER*AGE; PCLASS 1st; PCLASS 2nd; PCLASS 3rd; GENDER*PCLASS 1st; GENDER*PCLASS 2nd; GENDER*PCLASS 3rd; AGE*PCLASS 1st; AGE*PCLASS 2nd; AGE*PCLASS 3rd
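Survived = GENDER, AGE, PCLASS and 2nd-order interactions (continued)
$\text{logit}(\hat p) = b_0 + b_1\,\text{GENDER} + b_2\,\text{AGE} + b_3\,\text{GENDER*AGE} + b_4\,\text{CLASS1} + b_5\,\text{CLASS2} + b_6\,\text{GENDER*CLASS1} + b_7\,\text{GENDER*CLASS2} + b_8\,\text{AGE*CLASS1} + b_9\,\text{AGE*CLASS2}$
where $b_0, \dots, b_9$ denote the estimates from the Analysis Of Parameter Estimates table.
Hence, there are 6 models: one for each combination of GENDER and PCLASS.
For example, GENDER=1 (Female) and PCLASS=1:
$\text{logit}(\hat p) = (b_0 + b_1 + b_4 + b_6) + (b_2 + b_3 + b_8)\,\text{AGE}$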
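The collapse into six straight lines on the logit scale can be sketched as follows, assuming the dummy coding with 3rd class as reference. All coefficient values are hypothetical placeholders (the fitted estimates are not reproduced here), and the names `coef` and `line_for` are illustrative.

```python
# Hypothetical coefficients, NOT the fitted values from the SAS output.
coef = {
    "INTERCEPT": -1.0, "GENDER": 2.0, "AGE": -0.03, "GENDER*AGE": 0.01,
    "CLASS1": 1.5, "CLASS2": 0.5,
    "GENDER*CLASS1": 1.0, "GENDER*CLASS2": 0.8,
    "AGE*CLASS1": -0.01, "AGE*CLASS2": -0.02,
}

def line_for(gender, pclass):
    """Intercept and AGE slope of the logit for one GENDER/PCLASS group."""
    c1 = 1 if pclass == 1 else 0
    c2 = 1 if pclass == 2 else 0
    intercept = (coef["INTERCEPT"] + coef["GENDER"] * gender
                 + coef["CLASS1"] * c1 + coef["CLASS2"] * c2
                 + coef["GENDER*CLASS1"] * gender * c1
                 + coef["GENDER*CLASS2"] * gender * c2)
    slope = (coef["AGE"] + coef["GENDER*AGE"] * gender
             + coef["AGE*CLASS1"] * c1 + coef["AGE*CLASS2"] * c2)
    return intercept, slope

# One (intercept, slope) pair per combination of GENDER and PCLASS.
models = {(g, k): line_for(g, k) for g in (0, 1) for k in (1, 2, 3)}
```

Each group's line collects every term that does not involve AGE into the intercept and every term that multiplies AGE into the slope.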
Survived = GENDER, AGE, PCLASS and 2nd-order interactions (continued)