Chapter 2: Logistic Regression
2.1 Likelihood Approach
2.2 Binary Logistic Regression
2.3 Nominal and Ordinal Logistic Regression Models



Objectives Explain likelihood and maximum likelihood theory and estimation. Demonstrate likelihood for a categorical response and explanatory variable.

Likelihood The likelihood is a statement about a data set. The likelihood assumes a model for the data. Changing the model, either the function or the parameter values, changes the likelihood. The likelihood is the probability of the data as a whole: the product of the probabilities of the individual cases. This assumes the cases are independent.

Likelihood for Binomial Example The marginal distribution of Survived can be modeled with the binomial distribution.

Maximum Likelihood Theory The objective is to estimate the parameter that maximizes the likelihood of the observed data. The maximum likelihood estimator provides
–a large-sample normal distribution of estimates
–asymptotic consistency (convergence to the true value)
–asymptotic efficiency (smallest standard errors)

Maximum Likelihood Estimation Use the kernel, the part of the likelihood function that depends on the model parameter. Use the logarithm transform.
–The product of probabilities becomes the sum of the logs of the probabilities.
Maximize the log-likelihood by setting its derivative with respect to the parameter to zero and solving, or by an appropriate numerical method.
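The steps above can be sketched in Python (the course itself uses JMP, which performs this internally). The survivor counts below, 711 of 2201, are consistent with the 0.323 marginal used in the Titanic example, but treat them as illustrative:

```python
import math

def binomial_log_likelihood(p, y, n):
    # Kernel only: y*log(p) + (n - y)*log(1 - p).
    # The binomial coefficient does not depend on p, so it can be dropped.
    return y * math.log(p) + (n - y) * math.log(1 - p)

# Illustrative counts: 711 survivors out of 2201 passengers and crew.
y, n = 711, 2201

# Crude numerical maximization over a grid; setting the derivative to
# zero analytically gives the same answer, p_hat = y / n.
grid = (i / 10000 for i in range(1, 10000))
p_hat = max(grid, key=lambda p: binomial_log_likelihood(p, y, n))
# p_hat is close to 711/2201, about 0.323
```

The grid search stands in for the "appropriate numerical method"; for the binomial the closed-form solution is simply the sample proportion.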

Estimation for Binomial Example For y events in n independent trials, the kernel of the likelihood is π^y (1-π)^(n-y). Setting the derivative of the log-likelihood to zero gives the maximum likelihood estimate π̂ = y/n, the sample proportion.


2.01 Multiple Choice Poll
What is the likelihood of the data?
a. The sum of the probabilities of individual cases
b. The product of the log of the probabilities of individual cases
c. The product of the log of the individual cases
d. The sum of the log of the probabilities of individual cases

2.01 Multiple Choice Poll – Correct Answer
What is the likelihood of the data?
a. The sum of the probabilities of individual cases
b. The product of the log of the probabilities of individual cases
c. The product of the log of the individual cases
d. The sum of the log of the probabilities of individual cases (correct, on the log scale used for estimation)

Titanic Example The null hypothesis is that there is no association between Survived and Class. The alternative hypothesis is that there is an association between Survived and Class. Compute the likelihood under both hypotheses. Compare the hypotheses by examining the difference in the negative log-likelihoods.

Titanic Example [Table: counts of Survived and Lost by Class (1st, 2nd, 3rd, Crew), with row and column totals]

Uncertainty The negative log-likelihood measures variation, sometimes called uncertainty, in the sample. The higher the negative log-likelihood, the greater the variability (uncertainty) in the data. Use the negative log-likelihood in much the same way as you use the sum of squares with a continuous response.

Null Hypothesis (using the marginal distribution)
          1st    2nd    3rd    Crew
Survived  0.323  0.323  0.323  0.323
Lost      0.677  0.677  0.677  0.677

Uncertainty: Null Hypothesis (analogous to the corrected total sum of squares)

Alternative Hypothesis (using the conditional distribution) [Table: probabilities of Survived and Lost conditional on each Class (1st, 2nd, 3rd, Crew)]


Uncertainty: Alternative Hypothesis (analogous to the error sum of squares)

Model Uncertainty The difference between the null and alternative negative log-likelihoods (analogous to the model sum of squares)

Hypothesis Test for Association The likelihood ratio test statistic is twice the reduction in uncertainty: G² = 2 × [(-log-likelihood, null) - (-log-likelihood, alternative)]. Compare it to a chi-square distribution to test for association.

Model R² The model uncertainty as a fraction of the total: R² = [(-log-likelihood, null) - (-log-likelihood, alternative)] / (-log-likelihood, null).
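A minimal sketch of these uncertainty calculations, using a hypothetical 2×2 table (not the actual Titanic counts):

```python
import math

def neg_log_lik(cells):
    # cells: (observed count, predicted probability) pairs;
    # -logL = -sum(count * log(probability))
    return -sum(n * math.log(p) for n, p in cells)

# Hypothetical 2x2 table: 20 events / 60 non-events in group A (n=80),
# 10 events / 90 non-events in group B (n=100).
nll_null = neg_log_lik([(30, 30 / 180), (150, 150 / 180)])   # marginal fit
nll_model = neg_log_lik([(20, 20 / 80), (60, 60 / 80),
                         (10, 10 / 100), (90, 90 / 100)])    # conditional fit

model_uncertainty = nll_null - nll_model   # reduction in uncertainty
lrt = 2 * model_uncertainty                # likelihood ratio chi-square statistic
r_squared = model_uncertainty / nll_null   # uncertainty-based R-square
```

The conditional fit always has a negative log-likelihood no larger than the marginal fit, so the model uncertainty and R² are non-negative.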


2.02 Multiple Answer Poll
What does the difference between the -log-likelihood for the full model and the reduced model tell you?
a. It is the probability of the model.
b. It represents the reduction in the uncertainty.
c. It is the numerator of the R² statistic.
d. It is twice the likelihood ratio test statistic.

2.02 Multiple Answer Poll – Correct Answer
What does the difference between the -log-likelihood for the full model and the reduced model tell you?
a. It is the probability of the model.
b. It represents the reduction in the uncertainty. (correct)
c. It is the numerator of the R² statistic. (correct)
d. It is twice the likelihood ratio test statistic.

Model Selection Akaike's Information Criterion (AIC) is widely accepted as a useful metric in model selection. Smaller AIC values indicate a better model. A correction is added for small samples (AICc).

AICc Difference The AICc for any given model cannot be interpreted by itself. The difference in AICc determines how much support the candidate model has compared to the model with the smallest AICc.
Δ(AICc)  Support for the candidate model
0-2      Substantial
4-7      Considerably less
>10      Essentially none

Model Selection Another popular statistic for model selection is Schwarz's Bayesian Information Criterion (BIC). Like AIC, it balances bias and variance in the model. It uses a stronger penalty term than AIC. Select the model with the smallest BIC to minimize over-fitting the data.
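A sketch of the AIC, AICc, and BIC formulas applied to two hypothetical fits (the -log-likelihood values 81.1 and 77.5 are made up for illustration):

```python
import math

def aic(nll, k):
    # AIC = 2k + 2(-logL); smaller is better
    return 2 * k + 2 * nll

def aicc(nll, k, n):
    # Small-sample correction added to AIC
    return aic(nll, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(nll, k, n):
    # BIC replaces AIC's penalty of 2 per parameter with log(n)
    return k * math.log(n) + 2 * nll

# Hypothetical candidate models fit to the same n = 180 observations
n = 180
null_model = {"nll": 81.1, "k": 1}
full_model = {"nll": 77.5, "k": 2}

delta = (aicc(null_model["nll"], null_model["k"], n)
         - aicc(full_model["nll"], full_model["k"], n))
# delta is about 5 here, falling in the 4-7 band of the table above:
# considerably less support for the null model
```

Both criteria pick the fuller model in this example; with a much larger n or a smaller drop in -logL, BIC's log(n) penalty can reverse that choice.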

Hypothesis Tests and Model Selection This demonstration illustrates the concepts discussed previously.


Exercise This exercise reinforces the concepts discussed previously.


2.03 Quiz Is this association significant? Use the LRT to decide.

2.03 Quiz – Correct Answer Is this association significant? Use the LRT to decide. It is not significant at the α = 0.05 level.

Chapter 2: Logistic Regression 2.1 Likelihood Approach 2.2 Binary Logistic Regression 2.3 Nominal and Ordinal Logistic Regression Models

Objectives Explain the concepts of logistic regression. Fit a logistic regression model using JMP software. Examine logistic regression output.

Overview
Response     Explanatory  Method
Continuous   Categorical  ANOVA
Continuous   Continuous   Linear Regression
Categorical  Categorical  Crosstabulation
Categorical  Continuous   Logistic Regression

Types of Logistic Regression Models Binary logistic regression addresses a response with only two levels. Nominal logistic regression addresses a response with more than two levels with no inherent order. Ordinal logistic regression addresses a response with more than two levels with an inherent order.

Purpose of Logistic Regression A logistic regression model predicts the probability of specific outcomes. It is designed to describe probabilities associated with the levels of the response variable. Probability is bounded, [0, 1], but the response in a linear regression model is unbounded, (-∞, ∞).

The Logistic Curve The relationship between the probability of a response and a predictor might not be linear.
–Asymptotes arise from the bounded probability.
Transform the probability to make the relationship linear.
–Logistic regression uses a two-step transformation.
Linear regression cannot model this relationship well, but logistic regression can.

Logistic Curve The asymptotic limits of the probability produce a nonlinear relationship with the explanatory variable.

Transform Probability Step 1: Convert the probability to the odds.
–The range of the odds is 0 to ∞.
Step 2: Convert the odds to the logarithm of the odds.
–The range of log(odds) is -∞ to ∞.
The log(odds) is a function of the probability, and its range is suitable for linear regression.
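The two-step transform can be written directly (a small Python sketch; JMP applies it for you):

```python
import math

def logit(p):
    # Step 1: odds = p / (1 - p), mapping (0, 1) onto (0, infinity)
    odds = p / (1 - p)
    # Step 2: log of the odds, mapping (0, infinity) onto (-infinity, infinity)
    return math.log(odds)

def inverse_logit(x):
    # Maps any real number back into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

even = logit(0.5)                    # 0.0: probability 0.5 means even odds
back = inverse_logit(logit(0.9))     # recovers 0.9: the transform is invertible
```

Because the transform is invertible, predictions made on the unbounded log-odds scale can always be converted back to probabilities.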

What Are the Odds? The odds are a function of the probability of an event. The odds of two events, or of one event under two conditions, can be compared as a ratio.

Probability of Outcome
                            Default on Loan: Yes   No    Total
Late Payments (Group A)             20             60      80
No Late Payments (Group B)          10             90     100
Total                               30            150     180
Probability of defaulting in Group A = 20/80 = 0.25
Probability of not defaulting in Group A = 60/80 = 0.75

Odds of Outcome Odds of defaulting in Group A = (probability of defaulting in the group with a history of late payments) ÷ (probability of not defaulting in that group) = 0.25 ÷ 0.75 ≈ 0.33. Odds are the ratio of P(A) to P(not A).

Odds Ratio of Outcome Odds ratio of Group A to Group B = (odds of defaulting in the group with a history of late payments) ÷ (odds of defaulting in the group with no history of late payments) = 0.33 ÷ 0.11 = 3. The odds ratio is the ratio of odds(A) to odds(B).
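The same arithmetic in code, using the probabilities from the loan table:

```python
# Probabilities of defaulting, taken from the loan table above
p_a = 20 / 80    # Group A: history of late payments
p_b = 10 / 100   # Group B: no late payments

odds_a = p_a / (1 - p_a)      # 0.25 / 0.75, about 0.33
odds_b = p_b / (1 - p_b)      # 0.10 / 0.90, about 0.11
odds_ratio = odds_a / odds_b  # 3: the odds of defaulting are 3 times
                              # higher with a history of late payments
```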

Interpretation of the Odds Ratio The odds ratio ranges from 0 to ∞. A value of 1 indicates no association; a value below 1 means the outcome is more likely in group B; a value above 1 means it is more likely in group A.


2.04 Quiz If the chance of rain is 75%, then what are the odds that it will rain?

2.04 Quiz – Correct Answer If the chance of rain is 75%, then what are the odds that it will rain? The odds are 3 because the odds are the ratio of the probability that it will rain to the probability that it will not: 0.75/0.25 = 3.

Target or Positive Value The binary response takes two possible values that represent two states: an event and the corresponding non-event. The logit transform is based on the odds of the event. This event is known as the target or positive value. The target value is the first of the two response values. The first value is determined by alphanumeric sorting, by the order in a recognized series of values, or by the Value Ordering column property if you add it.

Logit Transformation logit(π_i) = log( π_i / (1 - π_i) ) where
i indexes all cases (observations),
π_i is the probability that the event (survived, for example) occurs in the i-th case,
1 - π_i is the probability that the event does not occur in the i-th case,
log is the natural log (to the base e).

Assumption [Figure: the logit transform of π_i is assumed to be a linear function of the predictor]

Logistic Regression Model The model is linear on the logit scale: logit(π_i) = β0 + β1 X_i.
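A bare-bones sketch of how such a model can be fit by maximum likelihood. JMP does this internally; the Newton-Raphson loop and the tiny data set here are illustrative only:

```python
import math

def fit_simple_logistic(xs, ys, iterations=25):
    """Fit logit(pi_i) = b0 + b1 * x_i by Newton-Raphson, a standard
    numerical method for maximizing the logistic log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score (gradient of the log-likelihood) and observed information
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)        # variance weight for this case
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system H * step = g and update the estimates
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data with an increasing relationship between x and the event
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_simple_logistic(xs, ys)  # b1 > 0: probability rises with x
```

Each iteration re-weights the cases by p(1-p), which is why this procedure is also described as iteratively reweighted least squares.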


2.05 Multiple Answer Poll
Which of the following statements about the logit transform are true?
a. The logit is a function of the odds of an outcome.
b. The logit is a probability of an outcome.
c. The logit linearizes the relationship with the predictor.
d. The logit transformation parameters must be estimated.

2.05 Multiple Answer Poll – Correct Answer
Which of the following statements about the logit transform are true?
a. The logit is a function of the odds of an outcome.
b. The logit is a probability of an outcome.
c. The logit linearizes the relationship with the predictor.
d. The logit transformation parameters must be estimated.

Likelihood Function A likelihood function expresses the probability of the observed data as a function of the unknown model parameters. The goal is to find the parameter values that make the probability of the observed data as large as possible.

Maximum Likelihood Estimate [Figure: the log-likelihood plotted as a function of the parameter; the maximum likelihood estimate is the value at the peak]

Model Inference [Figure: the log-likelihood function, marking LogL0 for the reduced model and LogL1 for the full model]

Logistic Curve [Figure: fitted logistic curves for a weak relationship, a strong relationship, and a very strong relationship]

Central Cutoff The ROC curve presented in Logistic Fit includes a yellow line that is tangent to the curve at the point with the maximum vertical distance from the diagonal line. This point provides the greatest difference between sensitivity and 1 - specificity. It is identified with an asterisk in the ROC Table.

Titanic Passengers Example There is another data set for the 1309 passengers on the final voyage of the Titanic.
–The crew members are not included.
The new data set includes the variables Survived, Passenger Class, Sex, Age, Siblings and Spouses, Parents and Children, and Fare.
–Some variables in this data set are not used in the demonstration.

Binary Logistic Regression This demonstration illustrates the concepts discussed previously.


2.06 Quiz You want to predict the probability of not surviving, given the number of siblings and spouses aboard. What kind of association exists between these two variables? Is it a strong relationship or a weak relationship?

2.06 Quiz – Correct Answer You want to predict the probability of not surviving, given the number of siblings and spouses aboard. What kind of association exists between these two variables? Is it a strong relationship or a weak relationship? Weak: the fitted regression line is nearly flat, indicating a weak association.


Exercise This exercise reinforces the concepts discussed previously.

Multiple Logistic Regression Several explanatory variables exhibit an association with Survived. Any of these associations can predict the outcome of Survived better than the overall proportion. Using more than one explanatory variable in a logistic regression model can further improve predictions of the outcome.

Interaction Effect The linear combination of the effects of the predictors might not account for all of the association. The effect of one predictor might depend on the level of another predictor. This additional effect is known as an interaction. It is modeled by including a crossed term involving both predictors in the linear combination.
–A, B, A*B

Lack of Fit The whole model test is a likelihood ratio test of whether the model predicts the response significantly better than the marginal distribution. The lack of fit test is a likelihood ratio test of whether another model could predict better than the current model. It compares the -log-likelihood of the fitted model to the -log-likelihood of the saturated model. The saturated model has a parameter for every observation and fits the data perfectly.

Multiple Logistic Regression This demonstration illustrates the concepts discussed previously.


Exercise This exercise reinforces the concepts discussed previously.


2.07 Multiple Choice Poll
Suppose process A makes a product, which is evaluated as defective or non-defective. Suppose the probability of a defective is 0.2. Which is true?
a. The odds of a defective from process A are given by 0.8/0.2 = 4.
b. The odds of a defective from process A are given by 0.2/0.8 = 0.25.

2.07 Multiple Choice Poll – Correct Answer
Suppose process A makes a product, which is evaluated as defective or non-defective. Suppose the probability of a defective is 0.2. Which is true?
a. The odds of a defective from process A are given by 0.8/0.2 = 4.
b. The odds of a defective from process A are given by 0.2/0.8 = 0.25. (correct)

2.08 Multiple Choice Poll
The odds ratio for getting a defective product from process A versus getting one from process B is 0.25. What is its interpretation?
a. You expect defectives to occur 25 times more often from process B than from process A.
b. You expect defectives to occur ¼ as often from process B as from process A.
c. You expect defectives to occur 75% less often from process A than from process B.

2.08 Multiple Choice Poll – Correct Answer
The odds ratio for getting a defective product from process A versus getting one from process B is 0.25. What is its interpretation?
a. You expect defectives to occur 25 times more often from process B than from process A.
b. You expect defectives to occur ¼ as often from process B as from process A.
c. You expect defectives to occur 75% less often from process A than from process B. (correct)

Chapter 2: Logistic Regression 2.1 Likelihood Approach 2.2 Binary Logistic Regression 2.3 Nominal and Ordinal Logistic Regression Models

Objectives Explain the generalized logit and the cumulative logit. Fit a nominal logistic and an ordinal logistic regression model. Interpret the parameter estimates and odds ratios.

Nominal Logistic Regression Binary logistic regression can be extended to responses with more than two levels. Three or more levels with no particular order or rank can be modeled with nominal logistic regression. The linear portion of the model remains the same as in the binary logistic model. The logit transform of the response is adapted to the nominal response.
–This adaptation is known as the generalized logit.

Generalized Logits Each generalized logit is the log of the probability of one response level divided by the probability of the reference level: for a three-level response, Logit(1) = log(π1/π3) and Logit(2) = log(π2/π3). Number of generalized logits = number of levels - 1.
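With hypothetical predicted probabilities for a three-level nominal response, the generalized logits are:

```python
import math

# Hypothetical predicted probabilities for a three-level nominal response;
# the last level serves as the reference
p1, p2, p3 = 0.2, 0.3, 0.5

glogit_1 = math.log(p1 / p3)  # log odds of level 1 versus the reference level
glogit_2 = math.log(p2 / p3)  # log odds of level 2 versus the reference level
# Number of generalized logits = number of levels - 1 = 2
```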

Generalized Logit Model [Figure: Logit(1) = a1 + B1X and Logit(2) = a2 + B2X plotted against the predictor X, with different slopes and intercepts]


2.09 Multiple Answer Poll
Suppose a nominal response variable has four levels. Which of the following statements are true?
a. JMP computes three generalized logits.
b. Logit(1) is the log odds for level 1 occurring versus level 4 occurring.
c. JMP computes a separate intercept parameter for each logit.
d. JMP computes a separate slope parameter for each logit.

2.09 Multiple Answer Poll – Correct Answer
Suppose a nominal response variable has four levels. Which of the following statements are true?
a. JMP computes three generalized logits.
b. Logit(1) is the log odds for level 1 occurring versus level 4 occurring.
c. JMP computes a separate intercept parameter for each logit.
d. JMP computes a separate slope parameter for each logit.

Titanic Passengers Example The passengers on this voyage boarded the Titanic at one of three ports.
–Southampton, England (S)
–Cherbourg, France (C)
–Queenstown, Ireland (Q) (known today as Cobh)
Predict the port of departure using the continuous predictors Fare, Siblings and Spouses, and Parents and Children. The explanatory variable Age has many missing values, and it is correlated with the other predictors.

Nominal Logistic Regression Model This demonstration illustrates the concepts discussed previously.


Exercise This exercise reinforces the concepts discussed previously.

Ordinal Logistic Regression The generalized logits used in nominal logistic regression provide the most flexibility, but at the cost of a full set of parameters for each level of the response. Some responses are naturally ordinal. An ordinal response requires a unique intercept for all but the last level, and it uses a common set of parameters for all of the remaining terms.

Cumulative Logits Each cumulative logit is the log odds of the response being at or below a given level versus above it: for a three-level response, Logit(1) = log(π1/(π2 + π3)) and Logit(2) = log((π1 + π2)/π3). Number of cumulative logits = number of levels - 1.
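Using the same hypothetical three-level probabilities as before, the cumulative logits are:

```python
import math

# Hypothetical probabilities for an ordered response with levels 1 < 2 < 3
p1, p2, p3 = 0.2, 0.3, 0.5

# Cumulative logits: log odds of being at or below a level versus above it
cum_logit_1 = math.log(p1 / (p2 + p3))   # level 1 versus levels 2-3
cum_logit_2 = math.log((p1 + p2) / p3)   # levels 1-2 versus level 3
# Number of cumulative logits = number of levels - 1 = 2
```

Note that the cumulative logits are always increasing in the level, because each successive numerator accumulates more probability.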

Proportional Odds Assumption [Figure: Logit(1) = a1 + BX and Logit(2) = a2 + BX plotted against the predictor X, with equal slopes]

Popcorn Example An experiment was conducted to determine whether the appeal of popcorn depends on the amount of salt. The response is ordinal.
–1 (poor) to 5 (excellent).
The factor is continuous, with four levels from 0 to 3.

Ordinal Logistic Regression This demonstration illustrates the concepts discussed previously.


Exercise This exercise reinforces the concepts discussed previously.