When and why to use Logistic Regression? The response variable has to be binary or ordinal. Predictors can be continuous, discrete, or combinations of variables. Predictors do not have to be normally distributed Predictors does not have to be linearly related. estimated probabilities lie between 0 and 1. Non-linear relationships between the response and predictors
A non-parametric method that requires no specific distribution of the errors. Offers easy model-building or variable selection procedures. Parameter estimates are obtained by maximum likelihood methods When and why to use Logistic Regression?
The Logistic Model If (p) is the probability of the event, then odds of the event is : The simple logistic model is based on a linear relationship between natural logarithm (ln) of the odds of an event and independent variables logit of y {
Using the laws of exponents and logs to express (p) in terms of L : and : The Logistic Model
L=ln(o) probability (s shaped curve) The Logistic Model
Basic interpretation of . When x 1 = x and x 2 = x +1, then the log odds changes by amount which means that, the odds becomes exp( ) times the original.
Basic interpretation of .
Data Form The data could be collected either : Individually ( binary data ) As a group (if there are more observations on each x value) In this case, it is sufficient to report the total number of ‘1’s at each x value.
Example 1: binary data agemastitisageMastitis
OR P(mastitis)=1 Example 1: binary data
For age 22 month
OR compares the odds of an event for two cows, one with and the one with With X values 1 unit apart :
Odds ratios range from 0 to positive infinity O R < 1 = (P) <.50, OR > 1 = ( p ) >.50. Odds ratios
Deviance Measure of deviation between the estimated and observed values analogous to SS residual for linear model
Example 2 : grouped data Age group Number in group Mastitis in group
95% confidence limits 95% confidence interval around the odds ratio : ?
Summary of main points Logistic regression model is used to analyze the association between a binary outcome and one or many determinants. The determinants can be binary, categorical or continuous measurements The model is logit (p) = log[p / (1-p)] = a + bX where X is a factor, and a and b must be estimated
Thank you for your attention
About logit Logit this is the natural log of an odds ratio; often called a log odds even though it really is a log odds ratio. The logit scale is linear and functions much like a z- score scale. LOGITS ARE CONTINOUS, LIKE Z SCORES p = 0.50 logit = 0 p = 0.70 logit = 0.84 p = 0.30 logit = -0.84
More on odds ratios Gender difference in preference for sport. A group of 57 men and 167 women were asked to make preference for sport. The results are as follows: Question: Is there a gender effect on the preference ? GenderlikeDislikeALL Men Women ALL
O men = / = O women = / = GenderLikeDislikeAllP(like) Men Women All Meaning: the odds of preference in men is 2.55 times higher than in women OR = O men / O women = / = 2.55 More on odds ratios
OR > 1: the odds of preference is higher in men than in women OR < 1: the odds of preference is lower in men than in women OR = 1: the odds of preference in men is the same as in women Note : More on odds ratios