Published by Willa Arlene McCarthy. Modified over 9 years ago.
1
Logistic regression
2
Recall the simple linear regression model: $y = \beta_0 + \beta_1 x + \varepsilon$, where we are trying to predict a continuous dependent variable y from a continuous independent variable x. This model can be extended to the multiple linear regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$. Here we are trying to predict a continuous dependent variable y from several continuous independent variables $x_1, x_2, \dots, x_p$.
3
Now suppose the dependent variable y is binary. It takes on two values: “Success” (1) or “Failure” (0). This is the situation in which logistic regression is used. We are interested in predicting y from a continuous independent variable x.
4
Example: We are interested in how the success (y) of a new antibiotic cream in curing “acne problems” depends on the amount (x) that is applied daily. The values of y are 1 (Success) or 0 (Failure). The values of x range over a continuum.
5
The Logistic Regression Model: Let p denote P[y = 1] = P[Success]. This quantity will increase with the value of x. The ratio $\frac{p}{1-p}$ is called the odds ratio (strictly, the odds of success). This quantity will also increase with the value of x, ranging from zero to infinity. The quantity $\ln\left(\frac{p}{1-p}\right)$ is called the log odds ratio.
6
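Example: odds ratio, log odds ratio. Suppose a die is rolled: Success = “roll a six”, p = 1/6. The odds ratio: $\frac{p}{1-p} = \frac{1/6}{5/6} = \frac{1}{5} = 0.2$. The log odds ratio: $\ln\left(\frac{p}{1-p}\right) = \ln(0.2) \approx -1.609$.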
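The die example can be checked numerically. The sketch below is only an illustration; the function names `odds` and `log_odds` are ours, not part of the slides.

```python
import math

def odds(p):
    """Odds of success, p / (1 - p), for a probability 0 <= p < 1."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural logarithm of the odds (the logit of p)."""
    return math.log(odds(p))

p_six = 1.0 / 6.0        # probability of rolling a six
print(odds(p_six))       # approximately 0.2
print(log_odds(p_six))   # approximately -1.609
```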
7
The Logistic Regression Model assumes the log odds ratio is linearly related to x, i.e. $\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$. In terms of the odds ratio: $\frac{p}{1-p} = e^{\beta_0 + \beta_1 x}$.
8
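The Logistic Regression Model: $\frac{p}{1-p} = e^{\beta_0 + \beta_1 x}$ or, solving for p in terms of x, $p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$.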
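As a rough illustration, the model p = e^(b0 + b1 x) / (1 + e^(b0 + b1 x)) and its inverse, the log odds, can be written as two small functions. The parameter values b0 = -3 and b1 = 0.5 are hypothetical and chosen only for the demonstration.

```python
import math

def logistic(x, b0, b1):
    """P[y = 1] at x: e^z / (1 + e^z) with z = b0 + b1 * x."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))   # algebraically equal to e^z / (1 + e^z)

def logit(p):
    """Log odds ln(p / (1 - p)); the inverse of the logistic curve."""
    return math.log(p / (1.0 - p))

b0, b1 = -3.0, 0.5                      # hypothetical parameter values
for x in (0.0, 6.0, 12.0):
    p = logistic(x, b0, b1)
    # The log odds of p recover the linear predictor b0 + b1 * x.
    print(x, round(p, 4), round(logit(p), 4))
```

Applying `logit` to the fitted probability returns b0 + b1 x, which is the linearity assumption of the model.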
9
Interpretation of the parameter $\beta_0$ (determines the intercept). [Graph: p plotted against x]
10
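Interpretation of the parameter $\beta_1$ (determines when p is 0.50, along with $\beta_0$). [Graph: p plotted against x] p = 0.50 when $\beta_0 + \beta_1 x = 0$, i.e. when $x = -\beta_0/\beta_1$.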
11
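Also $\frac{dp}{dx} = \beta_1 p(1-p)$, so when $x = -\beta_0/\beta_1$ (where p = 0.50), $\frac{dp}{dx} = \frac{\beta_1}{4}$. That is, $\beta_1/4$ is the rate of increase in p with respect to x when p = 0.50.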
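The slope at p = 0.50 can be checked numerically. For the logistic curve, dp/dx = b1 p (1 - p), which equals b1/4 at x = -b0/b1. The sketch below compares that value with a finite-difference estimate; the parameter values are hypothetical.

```python
import math

def p_of_x(x, b0, b1):
    """Logistic curve p = 1 / (1 + e^-(b0 + b1 x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def slope(x, b0, b1):
    """Analytic derivative dp/dx = b1 * p * (1 - p)."""
    p = p_of_x(x, b0, b1)
    return b1 * p * (1.0 - p)

b0, b1 = -3.0, 0.5                 # hypothetical parameter values
x_half = -b0 / b1                  # the x at which p = 0.50
h = 1e-6                           # step for the central difference
numeric = (p_of_x(x_half + h, b0, b1) - p_of_x(x_half - h, b0, b1)) / (2 * h)

print(p_of_x(x_half, b0, b1))      # 0.5
print(slope(x_half, b0, b1))       # b1 / 4 = 0.125
print(numeric)                     # close to 0.125
```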
12
Interpretation of the parameter $\beta_1$ (determines the slope when p is 0.50). [Graph: p plotted against x]
13
The data
For each case the data will consist of
1. a value for x, the continuous independent variable
2. a value for y (1 or 0) (Success or Failure)
Total of n = 250 cases
15
Estimation of the parameters: The parameters $\beta_0$ and $\beta_1$ are estimated by maximum likelihood. The estimates have no closed form and are computed iteratively, so in practice a statistical package such as SPSS is used.
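To give a feel for what a package does internally, here is a minimal sketch of maximum likelihood fitting for one predictor using Newton-Raphson iterations. It is not SPSS's algorithm or output; the function name `fit_logistic` and the eight data points are hypothetical, and the sketch assumes the data are not perfectly separated (otherwise the estimates do not exist).

```python
import math

def fit_logistic(xs, ys, iterations=25, tol=1e-10):
    """Newton-Raphson for the model p = 1 / (1 + e^-(b0 + b1 x)).

    Returns (b0, b1, se0, se1); the standard errors come from the
    inverse information matrix at the last iteration.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0                  # score vector (gradient of log-likelihood)
        h00 = h01 = h11 = 0.0          # information matrix entries
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    se0 = math.sqrt(h11 / det)
    se1 = math.sqrt(h00 / det)
    return b0, b1, se0, se1

# Hypothetical data: amount applied (x) and success indicator (y).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat, se0, se1 = fit_logistic(xs, ys)
print(b0_hat, b1_hat, se0, se1)
```

At the maximum the score equations hold: the sum of (y - p) and the sum of (y - p) x over all cases are both zero, which is a convenient check on any fitted values.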
16
Using SPSS to perform Logistic regression Open the data file:
17
Choose from the menu: Analyze -> Regression -> Binary Logistic
18
The following dialogue box appears. Select the dependent variable (y) and the independent variable (x) (covariate). Press OK.
19
Here is the output: the estimates and their S.E.
20
The parameter Estimates
21
Interpretation of the parameter $\beta_0$ (determines the intercept). Interpretation of the parameter $\beta_1$ (determines when p is 0.50, along with $\beta_0$).
22
Another interpretation of the parameter $\beta_1$: $\beta_1/4$ is the rate of increase in p with respect to x when p = 0.50.
23
The Multiple Logistic Regression model
24
Here we attempt to predict the outcome of a binary response variable Y from several independent variables $X_1, X_2, \dots$, etc.
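The model equation for this slide did not survive in the transcript. Assuming the usual extension of the one-predictor model, in which the log odds are a linear function of all the predictors, the predicted probability could be sketched as follows. The function name and the coefficient values are hypothetical and are not estimates from any data in these slides.

```python
import math

def predict_probability(intercept, coefficients, values):
    """P[Y = 1] when ln(p / (1 - p)) = intercept + sum of b_k * X_k.

    This linear form in several predictors is an assumption; the
    transcript does not show the slide's own equation.
    """
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two predictors X1 and X2.
intercept = 2.0
coefficients = [-0.5, 0.01]
print(predict_probability(intercept, coefficients, [10.0, 300.0]))  # z = 0, so p = 0.5
```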
25
Multiple Logistic Regression: an example. In this example we are interested in determining the risk of infants (who were born prematurely) developing BPD (bronchopulmonary dysplasia). More specifically, we are interested in developing a predictive model which will determine the probability of developing BPD from $X_1$ = gestational age and $X_2$ = birth weight.
26
For n = 223 infants in the prenatal ward the following measurements were determined:
1. $X_1$ = gestational age (weeks),
2. $X_2$ = birth weight (grams), and
3. Y = presence of BPD
27
The data
28
The results
29
Graph: Showing Risk of BPD vs GA and BrthWt
30
Discrete Multivariate Analysis: Analysis of Multivariate Categorical Data
31
Example 1: In this study we examine n = 1237 individuals, measuring X, Systolic Blood Pressure, and Y, Serum Cholesterol.
32
Example 2: The following data were taken from a study of parole success involving 5587 parolees in Ohio between 1965 and 1972 (a ten percent sample of all parolees during this period).
33
The study involved a dichotomous response Y
– Success (no major parole violation) or
– Failure (returned to prison either as technical violators or with a new conviction)
based on a one-year follow-up. The predictors of parole success included are:
1. Type of committed offence (Person offense or Other offense),
2. Age (25 or Older or Under 25),
3. Prior Record (No prior sentence or Prior Sentence), and
4. Drug or Alcohol Dependency (No drug or Alcohol dependency or Drug and/or Alcohol dependency).
34
The data were randomly split into two parts. The counts for each part are displayed in the table, with those for the second part in parentheses. The second part of the data was set aside for a validation study of the model to be fitted in the first part.
35
Table
36
Analysis of a Two-way Frequency Table:
37
Frequency Distribution (Serum Cholesterol and Systolic Blood Pressure)
38
Joint and Marginal Distributions (Serum Cholesterol and Systolic Blood Pressure) The Marginal distributions allow you to look at the effect of one variable, ignoring the other. The joint distribution allows you to look at the two variables simultaneously.
39
Conditional Distributions ( Systolic Blood Pressure given Serum Cholesterol ) The conditional distribution allows you to look at the effect of one variable, when the other variable is held fixed or known.
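As a small sketch of how the three kinds of distribution are obtained from a two-way frequency table: the joint distribution divides each count by the grand total, the marginals divide the row or column totals by the grand total, and the conditional distribution given a row divides each count by its row total. The counts below are hypothetical, not the blood pressure and cholesterol data.

```python
def joint_distribution(table):
    """Each cell count divided by the grand total."""
    n = sum(sum(row) for row in table)
    return [[x / n for x in row] for row in table]

def marginal_distributions(table):
    """Row and column totals divided by the grand total."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) / n for row in table]
    cols = [sum(col) / n for col in zip(*table)]
    return rows, cols

def conditional_given_row(table):
    """Distribution of the column variable within each row."""
    return [[x / sum(row) for x in row] for row in table]

counts = [[20, 30, 50],     # hypothetical 2 x 3 frequency table
          [10, 40, 50]]
row_marginal, col_marginal = marginal_distributions(counts)
print(joint_distribution(counts))
print(row_marginal, col_marginal)
print(conditional_given_row(counts))
```

Comparing the rows of the conditional table shows whether the column variable behaves differently across the levels of the row variable.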
40
Conditional Distributions (Serum Cholesterol given Systolic Blood Pressure)
41
GRAPH: Conditional distributions of Systolic Blood Pressure given Serum Cholesterol
42
Notation: Let $x_{ij}$ denote the frequency (no. of cases) where X (row variable) is i and Y (column variable) is j.
43
Different Models. The Multinomial Model: Here the total number of cases N is fixed and the $x_{ij}$ follow a multinomial distribution with parameters $\pi_{ij}$.
44
The Product Multinomial Model: Here the row (or column) totals $R_i$ are fixed and, for a given row i, the $x_{ij}$ follow a multinomial distribution with parameters $\pi_{j|i}$.
45
The Poisson Model: In this case we observe over a fixed period of time and all counts in the table (including row, column and overall totals) follow a Poisson distribution. Let $\mu_{ij}$ denote the mean of $x_{ij}$.
46
Independence
47
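Multinomial Model: if X and Y are independent, $\pi_{ij} = \pi_{i+}\pi_{+j}$, where $\pi_{i+} = \sum_j \pi_{ij}$ and $\pi_{+j} = \sum_i \pi_{ij}$ are the marginal probabilities, and these are estimated by $\hat\pi_{i+} = x_{i+}/N$ and $\hat\pi_{+j} = x_{+j}/N$, with $x_{i+} = \sum_j x_{ij}$ and $x_{+j} = \sum_i x_{ij}$ the row and column totals. The estimated expected frequency in cell (i,j) in the case of independence is: $E_{ij} = N\hat\pi_{i+}\hat\pi_{+j} = \frac{x_{i+}\,x_{+j}}{N}$.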
48
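The same can be shown for the other two models, the Product Multinomial model and the Poisson model. Namely, the estimated expected frequency in cell (i,j) in the case of independence is $E_{ij} = \frac{x_{i+}\,x_{+j}}{N}$, where $x_{i+}$ and $x_{+j}$ are the totals of row i and column j. Standardized residuals are defined for each cell: $r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$.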
49
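The Chi-Square Statistic: $\chi^2 = \sum_i \sum_j \frac{(x_{ij} - E_{ij})^2}{E_{ij}} = \sum_i \sum_j r_{ij}^2$, where $E_{ij}$ is the estimated expected frequency and $r_{ij}$ the standardized residual of cell (i,j). The Chi-Square test for independence: Reject $H_0$: independence if $\chi^2 > \chi^2_\alpha$, the upper $\alpha$ critical value of the chi-square distribution with (r − 1)(c − 1) degrees of freedom, where r and c are the numbers of rows and columns.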
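The test can be sketched in a few lines, assuming the usual definitions: expected frequency = (row total × column total) / N, standardized residual = (observed − expected) / sqrt(expected), and chi-square = sum of squared residuals. The 2 × 2 counts are hypothetical, and the critical value 3.841 (5% level, 1 degree of freedom) is taken from standard tables rather than computed.

```python
import math

def expected_frequencies(table):
    """Row total times column total over N, for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def standardized_residuals(table):
    """(observed - expected) / sqrt(expected), cell by cell."""
    expected = expected_frequencies(table)
    return [[(x - e) / math.sqrt(e) for x, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]

def chi_square(table):
    """Pearson chi-square statistic and its degrees of freedom."""
    residuals = standardized_residuals(table)
    statistic = sum(r * r for row in residuals for r in row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

counts = [[10, 20],         # hypothetical 2 x 2 frequency table
          [30, 40]]
statistic, df = chi_square(counts)
print(expected_frequencies(counts))   # [[12.0, 18.0], [28.0, 42.0]]
print(statistic, df)                  # about 0.794 on 1 degree of freedom
print(statistic > 3.841)              # False: independence is not rejected at the 5% level
```

Large standardized residuals (in absolute value) point to the cells that contribute most to a significant chi-square.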
50
Table: Expected frequencies, Observed frequencies, Standardized Residuals. $\chi^2 = 20.85$ (p = 0.0133)
51
Example: In this example N = 57,407 cases in which individuals were victimized twice by crimes were studied. The crime of the first victimization (X) and the crime of the second victimization (Y) were noted. The data are tabulated on the following slide.
52
Table 1: Frequencies
53
Table 2: Standardized residuals
54
Table 3: Conditional distribution of second victimization given the first victimization (%)
55
Log Linear Model
56
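Recall, if the two variables, rows (X) and columns (Y), are independent then $\mu_{ij} = N\pi_{i+}\pi_{+j}$ and $\ln \mu_{ij} = \ln N + \ln \pi_{i+} + \ln \pi_{+j}$, where $\pi_{i+}$ and $\pi_{+j}$ are the marginal probabilities of row i and column j.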
57
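In general, let $\mu_{ij}$ denote the expected frequency in cell (i,j); then

$\ln \mu_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)}$   (1)

where $\sum_i u_{1(i)} = \sum_j u_{2(j)} = 0$ and $\sum_i u_{12(i,j)} = \sum_j u_{12(i,j)} = 0$. Equation (1) is called the log-linear model for the frequencies $x_{ij}$.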
58
Note: X and Y are independent if $u_{12(i,j)} = 0$ for all i and j. In this case the log-linear model becomes $\ln \mu_{ij} = u + u_{1(i)} + u_{2(j)}$.
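The decomposition of ln(mu_ij) into an overall term, row and column main effects, and an interaction can be illustrated numerically. The sketch below assumes the usual sum-to-zero constraints (each set of u-terms sums to zero over any of its indices), which is our assumption about the lost slide equations; the two tables are hypothetical.

```python
import math

def loglinear_terms(mu):
    """Split ln(mu_ij) into u + u1(i) + u2(j) + u12(i,j) with sum-to-zero constraints."""
    r, c = len(mu), len(mu[0])
    logs = [[math.log(m) for m in row] for row in mu]
    u = sum(sum(row) for row in logs) / (r * c)                   # grand mean of the logs
    u1 = [sum(row) / c - u for row in logs]                       # row effects
    u2 = [sum(logs[i][j] for i in range(r)) / r - u for j in range(c)]  # column effects
    u12 = [[logs[i][j] - u - u1[i] - u2[j] for j in range(c)] for i in range(r)]
    return u, u1, u2, u12

# Hypothetical table of expected frequencies with independent rows and columns
# (every cell is a row factor times a column factor).
independent = [[row * col for col in (2.0, 4.0, 8.0)] for row in (1.0, 3.0)]
_, _, _, u12_independent = loglinear_terms(independent)
print(u12_independent)      # all interaction terms are (numerically) zero

# Hypothetical table with association between rows and columns.
dependent = [[10.0, 20.0], [30.0, 5.0]]
u, u1, u2, u12 = loglinear_terms(dependent)
print(u12)                  # non-zero interaction terms
```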
59
Three-way Frequency Tables
60
Example: Data from the Framingham Longitudinal Study of Coronary Heart Disease (Cornfield [1962])
Variables
1. Systolic Blood Pressure (X)
   – < 127, 127-146, 147-166, 167+
2. Serum Cholesterol
   – < 200, 200-219, 220-259, 260+
3. Heart Disease
   – Present, Absent
The data are tabulated on the next slide.
61
Three-way Frequency Table
62
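Log-Linear model for three-way tables: Let $\mu_{ijk}$ denote the expected frequency in cell (i,j,k) of the table; then in general

$\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$

where each set of u-terms sums to zero over any one of its indices.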
63
Hierarchical Log-linear models for categorical data (for three-way tables). The hierarchical principle: If an interaction is in the model, also keep the lower-order interactions and main effects associated with that interaction.
64
1. Model: (All main effects model) $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)}$, i.e. $u_{12(i,j)} = u_{13(i,k)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$. Notation: [1][2][3]. Description: Mutual independence between all three variables.
65
2. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)}$, i.e. $u_{13(i,k)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$. Notation: [12][3]. Description: Independence of Variable 3 with variables 1 and 2.
66
3. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{13(i,k)}$, i.e. $u_{12(i,j)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$. Notation: [13][2]. Description: Independence of Variable 2 with variables 1 and 3.
67
4. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{23(j,k)}$, i.e. $u_{12(i,j)} = u_{13(i,k)} = u_{123(i,j,k)} = 0$. Notation: [23][1]. Description: Independence of Variable 1 with variables 2 and 3.
68
5. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)}$, i.e. $u_{23(j,k)} = u_{123(i,j,k)} = 0$. Notation: [12][13]. Description: Conditional independence between variables 2 and 3 given variable 1.
69
6. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{23(j,k)}$, i.e. $u_{13(i,k)} = u_{123(i,j,k)} = 0$. Notation: [12][23]. Description: Conditional independence between variables 1 and 3 given variable 2.
70
7. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{13(i,k)} + u_{23(j,k)}$, i.e. $u_{12(i,j)} = u_{123(i,j,k)} = 0$. Notation: [13][23]. Description: Conditional independence between variables 1 and 2 given variable 3.
71
8. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)}$, i.e. $u_{123(i,j,k)} = 0$. Notation: [12][13][23]. Description: Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.
72
9. Model: (the saturated model) $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$. Notation: [123]. Description: No simplifying dependence structure.
73
Hierarchical Log-linear models for 3 way table

Model            Description
[1][2][3]        Mutual independence between all three variables.
[1][23]          Independence of Variable 1 with variables 2 and 3.
[2][13]          Independence of Variable 2 with variables 1 and 3.
[3][12]          Independence of Variable 3 with variables 1 and 2.
[12][13]         Conditional independence between variables 2 and 3 given variable 1.
[12][23]         Conditional independence between variables 1 and 3 given variable 2.
[13][23]         Conditional independence between variables 1 and 2 given variable 3.
[12][13][23]     Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.
[123]            The saturated model.
74
Maximum Likelihood Estimation Log-Linear Model
75
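For any model it is possible to determine the maximum likelihood estimators of the parameters. Example: two-way table, independence, multinomial model: $\pi_{ij} = \pi_{i+}\pi_{+j}$ or $\mu_{ij} = N\pi_{i+}\pi_{+j}$, where $\pi_{i+} = \sum_j \pi_{ij}$ and $\pi_{+j} = \sum_i \pi_{ij}$.
76
Log-likelihood: $l = \sum_i \sum_j x_{ij} \ln \pi_{ij} + K$, where $K = \ln\left(N! / \prod_{i,j} x_{ij}!\right)$ does not depend on the parameters. With the model of independence, $l = \sum_i \sum_j x_{ij}\left(\ln \pi_{i+} + \ln \pi_{+j}\right) + K$
77
and $l = \sum_i x_{i+} \ln \pi_{i+} + \sum_j x_{+j} \ln \pi_{+j} + K$, where $x_{i+} = \sum_j x_{ij}$ and $x_{+j} = \sum_i x_{ij}$ are the row and column totals, with $\sum_i \pi_{i+} = 1$, also $\sum_j \pi_{+j} = 1$.
78
Let $g = l + \lambda_1\left(1 - \sum_i \pi_{i+}\right) + \lambda_2\left(1 - \sum_j \pi_{+j}\right)$. Now $\frac{\partial g}{\partial \pi_{i+}} = \frac{x_{i+}}{\pi_{i+}} - \lambda_1$.
79
Since $\frac{\partial g}{\partial \pi_{i+}} = 0$ at the maximum, $x_{i+} = \lambda_1 \pi_{i+}$ for each i.
80
Now $\sum_i x_{i+} = \lambda_1 \sum_i \pi_{i+}$, or $N = \lambda_1$.
81
Hence $\hat\pi_{i+} = \frac{x_{i+}}{N}$ and, similarly, $\hat\pi_{+j} = \frac{x_{+j}}{N}$. Finally $\hat\pi_{ij} = \hat\pi_{i+}\hat\pi_{+j} = \frac{x_{i+}\,x_{+j}}{N^2}$.
82
Hence $\hat\mu_{ij} = N\hat\pi_{ij} = \frac{x_{i+}\,x_{+j}}{N}$. Now $\ln \hat\mu_{ij} = \ln x_{i+} + \ln x_{+j} - \ln N$, a sum of a term depending only on i and a term depending only on j.
83
Hence the fitted values have the additive form $\ln \hat\mu_{ij} = \hat u + \hat u_{1(i)} + \hat u_{2(j)}$ of the independence log-linear model. Note: $\hat\mu_{ij} = \frac{x_{i+}\,x_{+j}}{N}$ is the estimated expected frequency used in the chi-square test for independence.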
84
Comments: Maximum likelihood estimates can be computed for any hierarchical log-linear model (i.e. with more than 2 variables). In certain situations the equations need to be solved numerically. For the saturated model (all interactions and main effects), the estimate of $\mu_{ijk\ldots}$ is $x_{ijk\ldots}$.
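One standard numerical method for the cases without a closed-form solution is iterative proportional fitting, which repeatedly rescales the fitted table so that it matches each fitted margin in turn. The slides do not say which method they use, so the sketch below is only an illustration for the model [12][13][23] on a hypothetical 2 × 2 × 2 table; it assumes all observed two-way margins are positive.

```python
def fit_no_three_factor(x, iterations=100):
    """Iterative proportional fitting for the model [12][13][23].

    x[i][j][k] are observed counts; returns fitted expected frequencies
    whose three two-way margins match the observed ones.
    """
    I, J, K = len(x), len(x[0]), len(x[0][0])
    m = [[[1.0 for _ in range(K)] for _ in range(J)] for _ in range(I)]
    for _ in range(iterations):
        # Match the (1,2) margin: sum over k.
        for i in range(I):
            for j in range(J):
                scale = sum(x[i][j]) / sum(m[i][j])
                for k in range(K):
                    m[i][j][k] *= scale
        # Match the (1,3) margin: sum over j.
        for i in range(I):
            for k in range(K):
                scale = (sum(x[i][j][k] for j in range(J))
                         / sum(m[i][j][k] for j in range(J)))
                for j in range(J):
                    m[i][j][k] *= scale
        # Match the (2,3) margin: sum over i.
        for j in range(J):
            for k in range(K):
                scale = (sum(x[i][j][k] for i in range(I))
                         / sum(m[i][j][k] for i in range(I)))
                for i in range(I):
                    m[i][j][k] *= scale
    return m

counts = [[[10, 20], [30, 40]],     # hypothetical 2 x 2 x 2 counts
          [[15, 25], [35, 20]]]
fitted = fit_no_three_factor(counts)
print(fitted)
```

For the saturated model no iteration is needed, since the fitted values equal the observed counts.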
85
Goodness of Fit Statistics These statistics can be used to check if a log-linear model will fit the observed frequency table
86
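Goodness of Fit Statistics. The Chi-squared statistic: $\chi^2 = \sum \frac{(x - \hat\mu)^2}{\hat\mu}$. The Likelihood Ratio statistic: $G^2 = 2 \sum x \ln\left(\frac{x}{\hat\mu}\right)$. Both sums run over all cells of the table, with x the observed frequency and $\hat\mu$ the fitted expected frequency. d.f. = # cells − # parameters fitted. We reject the model if $\chi^2$ or $G^2$ is greater than $\chi^2_\alpha$, the upper $\alpha$ critical value of the chi-square distribution with these d.f.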
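Both statistics are easy to compute once the fitted frequencies are available. The sketch below assumes the usual definitions, Pearson chi-square = sum of (observed − fitted)² / fitted and G² = 2 × sum of observed × ln(observed / fitted), applied to hypothetical observed and fitted values listed cell by cell.

```python
import math

def pearson_chi_square(observed, fitted):
    """Sum over cells of (observed - fitted)^2 / fitted."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, fitted))

def likelihood_ratio(observed, fitted):
    """G^2 = 2 * sum of x * ln(x / m); cells with x = 0 contribute zero."""
    return 2.0 * sum(x * math.log(x / m) for x, m in zip(observed, fitted) if x > 0)

observed = [10, 20, 30, 40]             # hypothetical cell counts
fitted = [12.0, 18.0, 28.0, 42.0]       # hypothetical fitted frequencies
print(pearson_chi_square(observed, fitted))   # about 0.794
print(likelihood_ratio(observed, fitted))     # about 0.804
```

The two statistics are usually close when the model fits and the fitted frequencies are not small.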
87
Example: Variables
1. Systolic Blood Pressure (B)
2. Serum Cholesterol (C)
3. Coronary Heart Disease (H)
88
Goodness of fit testing of Models

MODEL        DF   LIKELIHOOD-RATIO CHISQ   PROB.    PEARSON CHISQ   PROB.
B,C,H.       24   83.15                    0.0000   102.00          0.0000
B,CH.        21   51.23                    0.0002   56.89           0.0000
C,BH.        21   59.59                    0.0000   60.43           0.0000
H,BC.        15   58.73                    0.0000   64.78           0.0000
BC,BH.       12   35.16                    0.0004   33.76           0.0007
BH,CH.       18   27.67                    0.0673   26.58           0.0872   n.s.
CH,BC.       12   26.80                    0.0082   33.18           0.0009
BC,BH,CH.     9   8.08                     0.5265   6.56            0.6824   n.s.

Possible Models:
1. [BH][CH]: B and C independent given H.
2. [BC][BH][CH]: all two-factor interaction model.
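The DF column can be reproduced from the rule d.f. = # cells − # parameters fitted. The sketch below assumes B and C have 4 categories each and H has 2 (as in the Framingham variable list), and counts one parameter for the overall term plus (levels − 1) products for every main effect and interaction implied by the hierarchical principle. The helper names are ours.

```python
import math
from itertools import combinations

LEVELS = {"B": 4, "C": 4, "H": 2}   # number of categories of each variable

def included_terms(generators):
    """All main effects and interactions implied by the hierarchical principle."""
    terms = set()
    for generator in generators:
        for size in range(1, len(generator) + 1):
            for combo in combinations(sorted(generator), size):
                terms.add(combo)
    return terms

def degrees_of_freedom(generators, levels=LEVELS):
    """Number of cells minus number of independent parameters fitted."""
    cells = math.prod(levels.values())
    parameters = 1 + sum(math.prod(levels[v] - 1 for v in term)
                         for term in included_terms(generators))
    return cells - parameters

models = [["B", "C", "H"], ["B", "CH"], ["C", "BH"], ["H", "BC"],
          ["BC", "BH"], ["BH", "CH"], ["CH", "BC"], ["BC", "BH", "CH"]]
for model in models:
    print(",".join(model), degrees_of_freedom(model))
```

The printed values agree with the DF column of the table: 24, 21, 21, 15, 12, 18, 12 and 9.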
89
Model 1: [BH][CH]. Log-linear parameters: Heart Disease - Blood Pressure interaction
90
Multiplicative effect Log-Linear Model
91
Heart Disease - Cholesterol Interaction
92
Multiplicative effect
93
Model 2: [BC][BH][CH]. Log-linear parameters: Blood Pressure - Cholesterol interaction
94
Multiplicative effect
95
Heart disease -Blood Pressure Interaction
96
Multiplicative effect
97
Heart Disease - Cholesterol Interaction
98
Multiplicative effect