Logistic Regression Multivariate Analysis

What is a log and an exponent? A log is the power to which a base of 10 must be raised to produce a given number. The log of 1000 is 3, since 10^3 = 1000. The log of an odds of 1.0 is 0, since 10^0 = 1. An exponent, or a number raised to a certain power, is the antilog of that number; thus exp(β) is the antilog of β. The antilog of a log odds of 0 is e^0 = 1. Exponential increases are curvilinear.
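The log/antilog relationship above can be checked numerically. A minimal sketch (the variable names are mine, not from the slides):

```python
import math

# log10 of 1000 is 3 because 10**3 = 1000
assert abs(math.log10(1000) - 3.0) < 1e-12

# The antilog reverses the log: 10**0 = 1, so an odds of 1.0 has log-odds 0
assert 10 ** 0 == 1

# In logistic regression the base is e, not 10:
# exp(beta) is the antilog of the coefficient beta
beta = 0.0
odds_ratio = math.exp(beta)   # e**0 = 1.0, i.e., no effect on the odds
print(odds_ratio)
```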

Main questions with logistic regression How do the odds of a successful outcome change with each explanatory variable (X)? How does the probability that a successful outcome occurs change with each explanatory variable (X)?

Logistic regression A single binary response variable predicted by categorical and/or interval variables Maximum likelihood model: the coefficients that make the sample observations most likely are reported in the final model Assumes a binomial distribution and a sigmoid (non-linear) curve The probability of success falls between 0 and 1 for all possible values of X (the s-curve bends)

Sigmoid curve for logistic regression
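The sigmoid curve from the slide can be sketched numerically. This is a minimal illustration (the function name is mine): for any real value of the linear predictor, the probability stays strictly between 0 and 1.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities stay between 0 and 1 for any value of the predictor
for z in (-10, -2, 0, 2, 10):
    p = sigmoid(z)
    assert 0.0 < p < 1.0
    print(f"z = {z:>3}: p = {p:.4f}")
```

Note the curve passes through p = .5 at z = 0, the comparison point used later in the deck.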

Response variable Denote Y by 0 and 1 (dummy coding) 0 and 1 are usually termed failure and success of an outcome (by convention, success is category 1) The sample mean of Y is the number of successes divided by the sample size (the proportion of successes)

Odds ratios in logistic regression Can be thought of as the likelihood, or odds, of success based on the impact of the predictors in the model For an interval predictor: the odds of success for those who are one unit apart on X, net of other predictors For a dummy predictor: the odds of success for those in the category coded 1 compared with those in the omitted (reference) category coded 0 Every unit increase in X has a multiplicative effect on the odds of success, so an odds ratio can be > 1

Odds π / (1 − π) is the odds of success When the probability of success π is ½, or 50-50, the odds of success equal .5/(1 − .5) = 1.0. This means that success is equally as likely as failure Thus, a predicted probability of .5 and odds of 1.0 are our points of comparison when making inferences

Logistic transformation of the odds To model dichotomous outcomes, SPSS takes the logistic transformation of the odds: log(π / (1 − π)) = α + β1X1 + β2X2 + … To interpret, we take the exponent of the beta coefficient for each predictor (this can be done for all predictors in the model) The odds of success are: π / (1 − π) = e^(α + βX) = e^α (e^β)^X
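The transformation and its inverse can be written as two small helpers (names are mine, not SPSS's):

```python
import math

def logit(p):
    """Log-odds: log(p / (1 - p)) -- the left side of the model equation."""
    return math.log(p / (1.0 - p))

def inverse_logit(log_odds):
    """Recover the probability from the log-odds."""
    return math.exp(log_odds) / (1.0 + math.exp(log_odds))

# At p = .5 the odds are 1.0 and the log-odds are 0 -- the comparison points
p = 0.5
assert logit(p) == 0.0
assert inverse_logit(0.0) == 0.5
```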

We can also talk about the percentage change in odds for interval and dummy variables The exponentiated beta value in the SPSS output can be converted to a percentage by 100(e^β − 1): the percentage change in odds for each unit increase in the independent variable We don't really talk about the intercept here; the betas for each predictor are our concern
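The 100(e^β − 1) conversion is one line of arithmetic; a quick sketch with a hypothetical coefficient (0.25 is my example value, not from the slides):

```python
import math

def pct_change_in_odds(beta):
    """100 * (exp(beta) - 1): percent change in odds per unit increase in X."""
    return 100.0 * (math.exp(beta) - 1.0)

print(pct_change_in_odds(0.0))    # 0.0: beta of 0 means no change in the odds
print(pct_change_in_odds(0.25))   # about 28.4: odds rise ~28% per unit of X
```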

We can also talk about the probability of success, π We can calculate point estimates by substituting specific X values, so the model is good for forecasting given respondent characteristics The impact of X on π is interactive/non-constant π is the probability of success; it varies as X changes and ranges from 0 to 1 π = e^(α + βX) / (1 + e^(α + βX)), or odds / (1 + odds)

Slope in logistic regression models (FYI) Like the slope of a straight line, β refers to whether the sigmoid curve (π, the probability of success) increases (β > 0) or decreases (β < 0) as an interval X increases or as we move from 0 to 1 on a dummy The steepness of the s-curve increases as the absolute value of β increases The rate at which the curve climbs or descends changes according to the values of the independent variable, thus β(X)

Slope in logistic regression models (FYI) When β = 0, π does not change as X increases (X has no bearing on the probability or odds of success), so the curve is flat: just a straight line For β > 0, π increases as X increases (the probability of success increases, so the curve rises) For β < 0, π decreases as X increases (the probability of success decreases, so the curve falls)

Slope in logistic regression (FYI)

Null hypothesis for predictors Ho: β = 0 for log(π / (1 − π)) = α + βX X has no effect on the likelihood that an outcome will occur [Y = 1] Y is independent of X, so the likelihood of success is the same for all groups of X (e.g., all income groups)

Wald Statistic The null hypothesis test statistic for each predictor in your model The Wald statistic is the significance test for each parameter; the null is that β = 0 Chi-square distribution with df = 1 It is the square of the z statistic, where z = β / s.e.(β)
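A minimal sketch of the Wald computation. The age coefficient comes from the example slides later in the deck; the standard error is hypothetical, since the slides do not report it:

```python
beta = -0.0586   # age coefficient from the deck's worked example
se = 0.02        # hypothetical standard error (not given in the slides)

z = beta / se
wald = z ** 2
# Compare against the chi-square critical value with 1 df (3.84 at alpha = .05)
print(round(wald, 2))   # 8.58 here, so this beta would be significant
```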

-2 log likelihood as a test of the null hypothesis for the entire model A test of significance for the model, akin to the F-ratio; chi-square distribution, with df equal to the number of parameters in the full model minus the number in the intercept-only model Does the observed likelihood or odds of success differ from 1? Compares the model with the intercept alone to the model with the intercept and predictors: do your predictors add to the predictive power of the model? Tests whether the difference is 0 and is referred to as the model chi-square
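A hedged sketch of the model chi-square: the drop in −2 log likelihood from the intercept-only model to the full model. The −2LL values below are hypothetical, not from the slides; only the arithmetic is the point.

```python
# Hypothetical fit statistics for an intercept-only and a 12-predictor model
neg2ll_intercept_only = 290.0
neg2ll_full = 265.0
n_predictors = 12

model_chi_sq = neg2ll_intercept_only - neg2ll_full   # 25.0
df = n_predictors                                    # parameters added

# The chi-square critical value at alpha = .05 with 12 df is about 21.03,
# so a drop of 25.0 would reject the null that all betas are 0
print(model_chi_sq, df)
```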

Goodness of Fit Statistic – null for residuals (FYI) Compares the observed probabilities (what you observed in the sample) to the predicted probabilities of an outcome occurring based on the model parameters in your equation Examines residuals: do the predictor coefficients significantly minimize their squared distances? Chi-square distribution; df = p Should be nonsignificant, as the observed and predicted values are anticipated to be quite similar

Mean of our response variable, attending a self-help group (FYI) The sample mean of Y is the number of successes (yes to attend) divided by the sample size, n The sample mean is the proportion of successful outcomes Thus, 44 said yes and n = 400, so the mean proportion of yes is 44/400 = .11, or 11%
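The slide's arithmetic in one step (variable names are mine):

```python
# The sample mean of a 0/1 response is the proportion of successes
successes = 44   # respondents who said yes to attending
n = 400          # sample size

mean_y = successes / n
print(mean_y)    # 0.11, i.e., 11% of the sample attended
```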

Odds ratio and % change in odds by age Age: β = −.0586 and p < .01 (beta negative). Thus, the log odds of attending a self-help group decrease as a person gets older Exp(β) = .9431 … the odds ratio [exp(β) < 1], thus the odds decrease The % change (in this case a reduction) in the odds of attending for each additional year of age is 100(exp(β) − 1) = 100(.9431 − 1) = −5.7: about 5.7% less likely for each year one ages

Predicted probability of attending by age When π < .5, the probability of attending declines, and we would see a downward dip in the sigmoid curve with increasing values of X (keeping in mind probability ranges from 0 to 1) More meaningful with all predictors; however, a point estimate for age 80 would be: π = e^((−.0586)(80)) / (1 + e^((−.0586)(80))) = .009 The probability that those 80 years of age attend a group is about 1%
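The age slide's arithmetic can be reproduced directly (alpha is omitted here, as on the slide; variable names are mine):

```python
import math

beta_age = -0.0586                      # age coefficient from the slide

odds_ratio = math.exp(beta_age)
print(round(odds_ratio, 4))             # 0.9431: odds fall with each year

pct = 100 * (math.exp(beta_age) - 1)
print(round(pct, 1))                    # -5.7: ~5.7% lower odds per year

z = beta_age * 80                       # point estimate for age 80
pi = math.exp(z) / (1 + math.exp(z))
print(round(pi, 3))                     # 0.009: about a 1% probability
```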

Odds ratio and % change in odds by gender Gender: β = 1.254 and p < .05. Thus, the odds of attending a self-help group are greater among females (female is the category coded 1 and beta is positive) Exp(β) = 3.50 … the odds of attending are 3.5 times as large for females as they are for males [exp(β) > 1] The % change (in this case an increase) in the odds of attending when a person is female is 100(exp(β) − 1) = 100(3.50 − 1) = 250%

Predicted probability of attending by gender π = e^((1.254)(1)) / (1 + e^((1.254)(1))) ≈ .78 Thus, the probability of attending among females is about 78% When π > .5, the probability of attending increases, and we would see an upward trend in the sigmoid curve with increasing values of X on the horizontal axis (keeping in mind probability ranges from 0 to 1)
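The gender slide's arithmetic, in the same style (female coded 1, alpha omitted as on the slide; names are mine):

```python
import math

beta_gender = 1.254                     # gender coefficient from the slide

odds_ratio = math.exp(beta_gender)
print(round(odds_ratio, 2))             # 3.5: female odds 3.5x male odds

pct = 100 * (math.exp(beta_gender) - 1)
print(round(pct))                       # 250: a 250% increase in the odds

pi = math.exp(beta_gender) / (1 + math.exp(beta_gender))
print(round(pi, 2))                     # 0.78: probability for females
```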

Wald statistic Null: the coefficient for each independent variable is 0 Tells us which variables significantly predict the likelihood of attending a self-help group Age: significant at p < .01 (**); Gender: significant at p < .05 (*)

Likelihood statistic for the model Null: the likelihood or odds are 1.0 and the predicted probability is .5; all of our predictor variables have β = 0 Computed as −2LL for the constant alone minus −2LL for the constant plus all predictors With 12 df, the model chi-square has p < .0001 The predictors in the model significantly add to our capacity to predict attendance

Goodness of Fit (FYI) With 12 df, the statistic is nonsignificant Our model parameters minimize the squared distances [residuals] between the actual sample observations of attendance and what the logistic regression equation predicts (odds and probabilities)

Logistic Regression References DeMaris, A. (1995). A tutorial in logistic regression. Journal of Marriage and the Family, 57(10). Agresti, A., & Finlay, B. (1997). Logistic regression: Modeling categorical responses. In Statistical methods for the social sciences (3rd ed.). New Jersey: Prentice Hall. Dwyer, J.H. (1983). Statistical methods for the social and behavioral sciences. New York: Oxford University Press.