Published by Franklin Day. Modified over 9 years ago.
1
Binary logistic regression
2
Characteristic
Regression model for a categorical target variable
Explanatory variables: continuous and categorical
Estimates the probability that the dependent variable falls into a given category
The solution can be interpreted
Sensitive to multicollinearity
Demanding in terms of data preparation
3
Applications
In general: a response model to predict the probability of response
To predict the probability of losing a certain type of client
To predict fraud
To predict the purchase of certain goods
…
4
Logistic regression model I
Binary dependent variable
1: event occurs
0: event does not occur
How does P(Y=1) depend on the values of the independent variables?
5
Logistic regression model II
Formula
In the classic linear regression model, the predicted value lies within (-∞; ∞)
For a binary variable, the model should instead indicate the probability that Y=1
A probability lies between 0 and 1
A simple linear combination of inputs therefore cannot be used to express the probability directly
Chance P/(1-P): interval (0; ∞)
Logit ln(P/(1-P)): interval (-∞; ∞), the same interval as a linear combination of inputs
6
Logistic regression model III
The logit of the probability P is expressed as a weighted sum of the values of the independent variables.
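The slide's formula did not survive extraction. As a sketch of the relation it describes, the standard form of the model can be written as follows; the symbols \beta_0, \beta_1, ..., \beta_k (intercept and weights) and x_1, ..., x_k (independent variables) are notation assumed here, not taken from the slides.

```latex
% Logit of P = P(Y=1) as a weighted sum of the independent variables
\ln\frac{P}{1-P} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k

% Solving for P gives the logistic function of the weighted sum
P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
```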
7
Regression relation
Probability, chance, logit
Logistic function
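The relations between probability, chance and logit can be illustrated with a minimal Python sketch. The function names and the example intercept and weights are hypothetical, chosen only for illustration; they are not estimates from any data.

```python
import math


def odds(p):
    """Chance (odds) P / (1 - P) for a probability 0 <= p < 1."""
    return p / (1.0 - p)


def logit(p):
    """Logit ln(P / (1 - P)); maps (0, 1) onto (-inf, inf)."""
    return math.log(odds(p))


def logistic(z):
    """Logistic function, the inverse of the logit; maps (-inf, inf) onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(intercept, weights, values):
    """P(Y=1) when the logit is intercept + weighted sum of the input values."""
    z = intercept + sum(w * x for w, x in zip(weights, values))
    return logistic(z)


p = 0.8
print(odds(p))             # chance: roughly 4 to 1
print(logit(p))            # logit of 0.8
print(logistic(logit(p)))  # back to roughly 0.8

# Hypothetical coefficients: the logit is -1.0 + 0.5*2.0 + 2.0*0.0 = 0, so P = 0.5
print(predict_probability(-1.0, [0.5, 2.0], [2.0, 0.0]))
```

A logit of 0 corresponds to a chance of 1 and a probability of 0.5, which is why the last call returns 0.5.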
8
Categorical input variable – contrasts
Contrast type: Indicator
             X1  X2  X3
Category 1    1   0   0
Category 2    0   1   0
Category 3    0   0   1
Category 4    0   0   0   (reference category)
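The indicator coding in this table can be reproduced with a short Python sketch. The helper name `indicator_coding` is hypothetical and not part of any library; it only shows how one 0-1 variable is created per non-reference category.

```python
def indicator_coding(categories, reference):
    """Map each category to a row of 0-1 variables.

    One variable is created per non-reference category;
    the reference category is coded as all zeros.
    """
    columns = [c for c in categories if c != reference]
    return {c: [1 if c == col else 0 for col in columns] for c in categories}


categories = ["Category 1", "Category 2", "Category 3", "Category 4"]
coding = indicator_coding(categories, reference="Category 4")

for name, row in coding.items():
    print(name, row)
```

With four categories this yields three variables (X1, X2, X3), matching the table above.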
9
Contrasts I
Convert categorical variables to several numerical variables (for example 0-1)
Create contrasts with respect to interpretation
Ordinal/nominal variables
Reference categories are not needed in all cases
Contrast specification does not influence prediction
10
Contrasts II
a) Indicator: each category is a 0-1 variable; the last or first category is skipped
b) Simple: each category (except the reference category) is compared with the reference category
c) Repeated: each category (except the first) is compared with the previous category
d) Difference: each category (except the first) is compared with the average effect of the previous categories
c) and d) are suited to ordinal variables
11
Data preparation
LR is sensitive to multicollinearity
It is necessary to reduce the number of variables
Pay special attention to missing values and extremes
In practice, input variables (often all of them) are categorized
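One simple way to screen for multicollinearity before reducing the number of variables is to look for strongly correlated pairs of inputs. The sketch below is an assumption, not the method from the slides: the 0.9 threshold, the helper names and the toy data are all made up for illustration, and pairwise correlation will not detect dependence that involves three or more variables.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return pairs of column names whose absolute correlation reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Toy data (hypothetical): spending closely follows income, age does not
data = {
    "income": [10, 20, 30, 40, 50],
    "spending": [11, 19, 32, 41, 48],
    "age": [25, 60, 31, 48, 39],
}

print(highly_correlated_pairs(data))
```

A flagged pair is a candidate for dropping or merging one of the two variables.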
12
Categorization of variables
A possible way to smooth extremes
Categorization
Based on experts
Based on quantiles
Optimal categorization with respect to the target variable
A categorized variable should not have too many categories, because this causes mutual relationships between variables
Merging of categories
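Quantile-based categorization can be sketched with the Python standard library as follows. The helper name `quantile_bins`, the choice of four bins and the sample values are assumptions for illustration only; the example shows how an extreme value simply lands in the top category instead of distorting the scale.

```python
import statistics
from bisect import bisect_right


def quantile_bins(values, n_bins=4):
    """Categorize values into n_bins groups using quantile cut points.

    Returns the cut points and, for each value, the index of its bin
    (0 for the lowest group, n_bins - 1 for the highest).
    """
    cuts = statistics.quantiles(values, n=n_bins)  # n_bins - 1 cut points
    return cuts, [bisect_right(cuts, v) for v in values]


# Hypothetical input with one extreme value (400)
incomes = [12, 15, 18, 20, 22, 25, 30, 400]
cuts, bins = quantile_bins(incomes, n_bins=4)

print(cuts)
print(bins)
```

Adjacent bins can afterwards be merged if they contain few cases or behave similarly with respect to the target variable.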