Logistic Regression
Database Marketing
Instructor: N. Kumar
Logistic Regression vs. Two-Group Discriminant Analysis (TGDA)
Two-Group Discriminant Analysis implicitly assumes that the Xs are multivariate normally (MVN) distributed.
This assumption is violated if the Xs are categorical variables.
Logistic regression does not impose any restriction on the distribution of the Xs.
Logistic regression is the recommended approach if at least some of the Xs are categorical variables.
Data
Contingency Table

Type of Stock    Large   Small   Total
Preferred           10       2      12
Not Preferred        1      11      12
Total               11      13      24
Basic Concepts: Probability
Probability of being a preferred stock = 12/24 = 0.5
Probability that a company's stock is preferred given that the company is large = 10/11 = 0.909
Probability that a company's stock is preferred given that the company is small = 2/13 = 0.154
Concepts (contd.): Odds
Odds of a preferred stock = 12/12 = 1
Odds of a preferred stock given that the company is large = 10/1 = 10
Odds of a preferred stock given that the company is small = 2/11 = 0.182
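A minimal sketch in Python that reproduces the probabilities and odds above directly from the contingency-table counts (variable names are illustrative):

```python
# Counts from the contingency table (Preferred / Not Preferred by company size)
preferred = {"large": 10, "small": 2}
not_preferred = {"large": 1, "small": 11}

total = sum(preferred.values()) + sum(not_preferred.values())   # 24

# Probabilities
p_preferred = sum(preferred.values()) / total                                           # 12/24 = 0.5
p_pref_given_large = preferred["large"] / (preferred["large"] + not_preferred["large"])  # 10/11 = 0.909
p_pref_given_small = preferred["small"] / (preferred["small"] + not_preferred["small"])  # 2/13  = 0.154

# Odds = (# events) / (# non-events)
odds_preferred = sum(preferred.values()) / sum(not_preferred.values())   # 12/12 = 1
odds_given_large = preferred["large"] / not_preferred["large"]           # 10/1  = 10
odds_given_small = preferred["small"] / not_preferred["small"]           # 2/11  = 0.182

print(p_pref_given_large, odds_given_large)
print(p_pref_given_small, odds_given_small)
```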
Odds and Probability
Odds(Event) = Prob(Event) / (1 - Prob(Event))
Prob(Event) = Odds(Event) / (1 + Odds(Event))
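These two identities translate directly into code; a small sketch checking them against the values from the previous slides:

```python
def odds_from_prob(p):
    """Odds(Event) = Prob(Event) / (1 - Prob(Event))."""
    return p / (1 - p)

def prob_from_odds(odds):
    """Prob(Event) = Odds(Event) / (1 + Odds(Event))."""
    return odds / (1 + odds)

# Check against the stock example: P(Preferred | Large) = 10/11
print(odds_from_prob(10 / 11))   # 10.0
print(prob_from_odds(10))        # 0.909...
```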
Logistic Regression
Take the natural log of the odds:
ln(odds(Preferred | Large)) = ln(10) = 2.303
ln(odds(Preferred | Small)) = ln(0.182) = -1.704
Combining these relationships (with Size coded 1 for large, 0 for small):
ln(odds(Preferred | Size)) = -1.704 + 4.007 * Size
The log of the odds is a linear function of size.
The coefficient of size can be interpreted like a coefficient in regression analysis.
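The intercept is the log-odds for small companies and the slope is the difference in log-odds between large and small companies; a quick check in Python (names illustrative):

```python
import math

log_odds_large = math.log(10)        #  2.303
log_odds_small = math.log(2 / 11)    # -1.704

# With Size coded 1 = large, 0 = small:
intercept = log_odds_small                 # -1.704
slope = log_odds_large - log_odds_small    #  4.007

print(intercept, slope)
```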
Interpretation
Positive sign: ln(odds) is increasing in the size of the company, i.e. a large company is more likely to have a preferred stock vis-à-vis a small company.
The magnitude of the coefficient gives a measure of how much more likely.
General Model
ln(odds) = β0 + β1*X1 + β2*X2 + … + βk*Xk    (1)
Recall: Odds = p / (1 - p)
ln(p / (1 - p)) = β0 + β1*X1 + β2*X2 + … + βk*Xk    (2)
p = exp(β0 + β1*X1 + … + βk*Xk) / (1 + exp(β0 + β1*X1 + … + βk*Xk))
Logistic Function
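The logistic (inverse-logit) function maps the linear predictor to a probability between 0 and 1; a minimal sketch of the S-shaped curve from this slide:

```python
import numpy as np

def logistic(z):
    """p = exp(z) / (1 + exp(z)), the S-shaped logistic curve."""
    return np.exp(z) / (1.0 + np.exp(z))

z = np.linspace(-6, 6, 13)
print(np.round(logistic(z), 3))   # rises from ~0 to ~1, equals 0.5 at z = 0
```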
Estimation
In linear regression, coefficients are estimated by minimizing the sum of squared errors.
Since p is non-linear in the parameters, we need a non-linear estimation technique:
Maximum-Likelihood Approach
Non-Linear Least Squares
Maximum Likelihood Approach
Conditional on the parameters β, write out the probability of observing the data.
Write this probability out for each observation.
Multiply the probabilities of the observations together to get the joint probability of observing the data conditional on β.
Find the β that maximizes this conditional probability (the likelihood) of realizing the data.
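A minimal sketch of this idea, maximizing the log-likelihood numerically with scipy; the 24 observations are reconstructed from the contingency-table counts, with Size coded 1 for large and 0 for small:

```python
import numpy as np
from scipy.optimize import minimize

# Data reconstructed from the contingency table:
# 11 large companies (10 preferred), 13 small companies (2 preferred).
size = np.array([1] * 11 + [0] * 13)
preferred = np.array([1] * 10 + [0] * 1 + [1] * 2 + [0] * 11)

def neg_log_likelihood(beta):
    """Negative log-likelihood of the logistic model given (intercept, slope)."""
    z = beta[0] + beta[1] * size
    p = 1.0 / (1.0 + np.exp(-z))
    return -np.sum(preferred * np.log(p) + (1 - preferred) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2))
print(result.x)   # approximately [-1.705, 4.007]
```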
Logistic Regression
Logistic regression with one categorical explanatory variable reduces to an analysis of the contingency table.
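The same fit with statsmodels (assuming the package is available): the estimated coefficients equal the log-odds computed directly from the contingency table, which is the sense in which the one-variable model reduces to the table:

```python
import numpy as np
import statsmodels.api as sm

size = np.array([1] * 11 + [0] * 13)                     # 1 = large, 0 = small
preferred = np.array([1] * 10 + [0] * 1 + [1] * 2 + [0] * 11)

X = sm.add_constant(size)                # intercept plus the Size dummy
fit = sm.Logit(preferred, X).fit()
print(fit.params)                        # approximately [-1.705, 4.007]
print(-2 * fit.llnull, -2 * fit.llf)     # -2 Log L: intercept only vs. intercept + Size
```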
Interpretation of Results
Look at the -2 Log L statistic:
Intercept only: 33.271
Intercept and covariates: 17.864
Difference: 15.407 with 1 DF (p = 0.0001)
This means that the size variable explains a lot.
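The difference in -2 Log L is a likelihood-ratio statistic; its p-value comes from a chi-square distribution with 1 degree of freedom (a quick check with scipy):

```python
from scipy.stats import chi2

lr_statistic = 33.271 - 17.864          # 15.407
p_value = chi2.sf(lr_statistic, df=1)   # ~0.0001
print(lr_statistic, p_value)
```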
Do the Variables Have a Significant Impact?
This is like testing whether the coefficients in a regression model are different from zero.
Look at the output from Analysis of Maximum Likelihood Estimates.
Loosely, the Pr > Chi-Square column gives the probability of realizing the value in the Parameter Estimate column if the true coefficient were zero; if this value is < 0.05, the estimate is considered significant.
Other Things to Look For
Akaike's Information Criterion (AIC) and Schwarz's Criterion (SC): like adjusted R², these include a penalty for having additional covariates.
The larger the difference between the intercept-only column and the intercept-and-covariates column, the better the model fit.
Interpretation of the Parameter Estimates
ln(p / (1 - p)) = -1.705 + 4.007 * Size
p / (1 - p) = e^(-1.705) * e^(4.007 * Size)
For a unit increase in Size, the odds of being a preferred stock go up by a factor of e^4.007 = 54.982.
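A quick check of this odds-ratio interpretation using the estimated coefficients (a sketch; names are illustrative):

```python
import math

intercept, slope = -1.705, 4.007

odds_small = math.exp(intercept)           # odds when Size = 0 (small): ~0.182
odds_large = math.exp(intercept + slope)   # odds when Size = 1 (large): ~10.0
print(odds_large / odds_small)             # e^4.007 = 54.98, the odds ratio
```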
Predicted Probabilities and Observed Responses
The response variable (success) classifies an observation into an event or a non-event.
A concordant pair is an (event, non-event) pair in which the event has a higher predicted probability (PHAT) than the non-event.
The higher the concordant pair %, the better.
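A minimal sketch of how the concordant-pair percentage could be computed from observed outcomes and predicted probabilities (the example values are illustrative, not from the slides):

```python
import numpy as np

def percent_concordant(y, phat):
    """Share of (event, non-event) pairs where the event has the higher PHAT."""
    event_phat = phat[y == 1]
    nonevent_phat = phat[y == 0]
    pairs = len(event_phat) * len(nonevent_phat)
    concordant = sum((e > n) for e in event_phat for n in nonevent_phat)
    return 100.0 * concordant / pairs

y = np.array([1, 1, 0, 0, 1, 0])
phat = np.array([0.9, 0.7, 0.4, 0.2, 0.3, 0.6])
print(percent_concordant(y, phat))   # 77.8% of pairs are concordant
```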
Classification
For a set of new observations where you have information on Size alone, you can use the model to predict the probability that success = 1, i.e. the stock is preferred.
If PHAT > 0.5, classify the observation as success = 1; else success = 2.
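A sketch of this rule applied to new observations, keeping the slide's 1/2 coding for the predicted class (variable names are illustrative):

```python
import numpy as np

intercept, slope = -1.705, 4.007

new_size = np.array([1, 0, 1, 0])                      # Size for new companies
phat = 1.0 / (1.0 + np.exp(-(intercept + slope * new_size)))
predicted_class = np.where(phat > 0.5, 1, 2)           # 1 = preferred, 2 = not preferred
print(np.round(phat, 3), predicted_class)
```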
Logistic Regression with Multiple Independent Variables
The independent variables can be a mixture of continuous and categorical variables.
Data
General Model
ln(odds) = β0 + β1*Size + β2*FP
ln(p / (1 - p)) = β0 + β1*Size + β2*FP
p = exp(β0 + β1*Size + β2*FP) / (1 + exp(β0 + β1*Size + β2*FP))
Estimation & Interpretation of the Results
Identical to the case with one categorical variable.
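A sketch of how such a model might be fit in Python with statsmodels; the data frame, column names (preferred, size, fp), and values are hypothetical stand-ins for the data slide:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: size is a 0/1 dummy, fp is a continuous financial measure
df = pd.DataFrame({
    "preferred": [1, 0, 1, 0, 1, 0, 1, 0],
    "size":      [1, 1, 0, 0, 1, 1, 0, 0],
    "fp":        [2.1, 1.8, 2.5, 0.9, 1.2, 2.0, 1.1, 1.5],
})

fit = smf.logit("preferred ~ size + fp", data=df).fit()
print(fit.summary())   # coefficients, -2 Log L, AIC, etc. are read as before
```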
Summary
Logistic Regression or Discriminant Analysis?
The techniques differ in their underlying assumptions about the distribution of the explanatory (independent) variables.
Use logistic regression if you have a mix of categorical and continuous variables.