Logistic regression for binary response variables.


Space shuttle example
n = 24 space shuttle launches prior to the Challenger disaster on January 28, 1986.
Response y is an indicator variable:
– y = 1 if there were O-ring failures during launch
– y = 0 if there were no O-ring failures during launch
Predictor $x_1$ is the launch temperature, in degrees Fahrenheit.

Space shuttle example

A model

The mean of a binary response
If there are 20% smokers and 80% non-smokers, and $Y_i = 1$ if smoker and $Y_i = 0$ if non-smoker, then:
$E(Y_i) = 1(0.20) + 0(0.80) = 0.20$
If $p_i = P(Y_i = 1)$ and $1 - p_i = P(Y_i = 0)$, then:
$E(Y_i) = 1(p_i) + 0(1 - p_i) = p_i$
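As a quick numerical check that the mean of a 0/1 indicator equals the proportion of ones, the sketch below builds a made-up population of 100 people (the population size and the helper name `indicator_mean` are assumptions for illustration) with 20 smokers coded 1 and 80 non-smokers coded 0.

```python
from fractions import Fraction


def indicator_mean(values):
    """Return the mean of a list of 0/1 indicator values as an exact fraction."""
    return Fraction(sum(values), len(values))


# Hypothetical population: 20 smokers (coded 1) and 80 non-smokers (coded 0).
population = [1] * 20 + [0] * 80
mean_y = indicator_mean(population)

# The mean of the indicator is the proportion of smokers.
print(float(mean_y))
```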

A linear regression model for a binary response
If the simple linear regression model is:
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for $Y_i = 0, 1$
Then, the mean response …
$E(Y_i) = \beta_0 + \beta_1 x_i = p_i$
… is the probability that $Y_i = 1$ when the level of the predictor variable is $x_i$.

Space shuttle example

(Simple) logistic regression function
$E(Y_i) = p_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$
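A minimal sketch of the simple logistic regression function, mapping a predictor value to a probability between 0 and 1. The function name `logistic_probability` is hypothetical, and the intercept 10.8 and slope -0.17 are only the rounded trial values used in the maximum likelihood slides, not the fitted estimates.

```python
import math


def logistic_probability(x, intercept, slope):
    """P(Y = 1 | x) under a simple logistic regression model."""
    eta = intercept + slope * x            # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))    # same as exp(eta) / (1 + exp(eta))


# Illustrative coefficients only (assumed, not the fitted shuttle estimates).
intercept, slope = 10.8, -0.17
for temp in (55, 65, 80):
    print(temp, round(logistic_probability(temp, intercept, slope), 3))
```

With a negative slope the probability falls as the predictor rises, and it stays strictly between 0 and 1 for any finite input, which a straight-line model for the mean cannot guarantee.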

Space shuttle example

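Alternative formulation of (simple) logistic regression function
$p_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$
(algebra)
$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_i$
The left-hand side is the “logit” of $p_i$.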
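The algebra linking the two formulations can be checked numerically: the logit (log odds) and the logistic function are inverses of each other. The sketch below uses hypothetical helper names `logit` and `inverse_logit`.

```python
import math


def logit(p):
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Probability corresponding to a log-odds value eta."""
    return math.exp(eta) / (1.0 + math.exp(eta))


# Round trip: applying the logit to the logistic function recovers eta.
eta = 1.3
print(logit(inverse_logit(eta)))
```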

Space shuttle example

Interpretation of slope coefficients

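Odds
If there are 20% smokers and 80% non-smokers: the odds of being a non-smoker are 0.80/0.20 = 4, and the odds of being a smoker are 0.20/0.80 = 0.25.
“Odds are 4 to 1” … 4 non-smokers to 1 smoker.
If $p_i = P(Y_i = 1)$ and $1 - p_i = P(Y_i = 0)$, then: the odds that $Y_i = 1$ are $p_i/(1 - p_i)$, and the odds that $Y_i = 0$ are $(1 - p_i)/p_i$.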

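Odds ratio
MALE: 20% smokers and 80% non-smokers: odds of being a nonsmoker = 0.80/0.20 = 4
FEMALE: 40% smokers and 60% non-smokers: odds of being a nonsmoker = 0.60/0.40 = 1.5
Odds ratio = 4/1.5 = 2.67
The odds that a male is a nonsmoker are 2.67 times the odds that a female is a nonsmoker.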

Odds ratio
Group 1: $\text{odds}_1 = p_1/(1 - p_1)$
Group 2: $\text{odds}_2 = p_2/(1 - p_2)$
The odds ratio: $\text{odds}_1/\text{odds}_2$
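The smoker example can be reproduced in a few lines. This is a sketch with hypothetical helper names `odds` and `odds_ratio`; the proportions are the ones quoted in the slides.

```python
def odds(p):
    """Odds of an event that has probability p, for 0 <= p < 1."""
    return p / (1.0 - p)


def odds_ratio(p1, p2):
    """Odds of the event in group 1 relative to the odds in group 2."""
    return odds(p1) / odds(p2)


# Proportion of non-smokers among males and females, as in the slides.
male_nonsmoker = 0.80
female_nonsmoker = 0.60

print(round(odds(male_nonsmoker), 2))                          # odds for males
print(round(odds(female_nonsmoker), 2))                        # odds for females
print(round(odds_ratio(male_nonsmoker, female_nonsmoker), 2))  # male vs female
```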

Space shuttle example
Predicted odds: $\hat{p}/(1 - \hat{p}) = \exp(\hat{\beta}_0 + \hat{\beta}_1 x_1)$, evaluated at $x_1 = 55$ degrees and at $x_1 = 80$ degrees.

Space shuttle example
Predicted odds ratio for $x_1 = 55$ relative to $x_1 = 80$: the predicted odds at 55 degrees divided by the predicted odds at 80 degrees.
The odds of O-ring failure at 55 degrees Fahrenheit are 76 times the odds of O-ring failure at 80 degrees Fahrenheit!

Interpretation of slope coefficients
The ratio of the odds at $X_1 = A$ relative to the odds at $X_1 = B$ (for fixed values of other X’s) is:
$\exp(\beta_1 (A - B))$
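The sketch below checks that the ratio of two predicted odds depends only on the slope and the difference A - B, because the intercept cancels. The function names are hypothetical, and the coefficients are the rounded trial values 10.8 and -0.17 from the maximum likelihood slides, so the result will not reproduce the reported factor of 76 exactly.

```python
import math


def predicted_odds(x, intercept, slope):
    """Odds p / (1 - p) implied by a simple logistic regression model."""
    return math.exp(intercept + slope * x)


def odds_ratio_between(a, b, slope):
    """Odds at x = a relative to odds at x = b; the intercept cancels."""
    return math.exp(slope * (a - b))


# Rounded, illustrative coefficients (assumed, not the fitted estimates).
intercept, slope = 10.8, -0.17

direct = predicted_odds(55, intercept, slope) / predicted_odds(80, intercept, slope)
shortcut = odds_ratio_between(55, 80, slope)
print(direct, shortcut)
```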

Estimation of logistic regression coefficients

Maximum likelihood estimation
Choose as estimates of the parameters the values that assign the highest probability to (“maximize likelihood of”) the observed outcome.

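Suppose $p_i = P(Y_i = 1) = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}$.
For the first observation, $Y_1 = 1$ and $x_1 = 53$: $P(Y_1 = 1) = \frac{\exp(\alpha + 53\beta)}{1 + \exp(\alpha + 53\beta)}$
… for the second observation, $Y_2 = 1$ and $x_2 = 56$: $P(Y_2 = 1) = \frac{\exp(\alpha + 56\beta)}{1 + \exp(\alpha + 56\beta)}$
… and for the 24th observation, $Y_{24} = 0$ and $x_{24} = 81$: $P(Y_{24} = 0) = \frac{1}{1 + \exp(\alpha + 81\beta)}$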

If α = 10 and β = -0.15, what is the probability of the observed outcome?
The likelihood of the observed outcome is the product of the 24 probabilities $P(Y_i = y_i)$, treating the launches as independent; the log likelihood is the sum of their logarithms.

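If α = 10.8 and β = -0.17, what is the probability of the observed outcome?
As with any candidate pair (α, β), the likelihood is the product of the probabilities of the 24 observed outcomes under the model, and the log likelihood is the sum of their logarithms.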
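To make the likelihood calculation concrete, the sketch below evaluates the log likelihood and likelihood for both candidate pairs of (α, β). It uses only the three observations quoted in the slides, since the other 21 launches are not listed in the transcript, so the printed values are not the full-data likelihoods and the comparison on this subset need not agree with the comparison on all 24 launches. The function name `log_likelihood` is hypothetical.

```python
import math


def log_likelihood(alpha, beta, observations):
    """Sum of log P(Y_i = y_i) over (x_i, y_i) pairs under a logistic model."""
    total = 0.0
    for x, y in observations:
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))  # P(Y = 1 | x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Only the three (temperature, failure) pairs shown in the slides.
shown = [(53, 1), (56, 1), (81, 0)]

for alpha, beta in [(10.0, -0.15), (10.8, -0.17)]:
    ll = log_likelihood(alpha, beta, shown)
    # The likelihood is the exponential of the log likelihood.
    print(alpha, beta, round(ll, 4), round(math.exp(ll), 4))
```

Maximum likelihood estimation amounts to searching over (α, β) for the pair that makes this quantity, computed on the full data set, as large as possible.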

Space shuttle example
Link Function: Logit

Response Information
Variable   Value   Count
failure    1       7    (Event)
           0       17
           Total   24

Logistic Regression Table
Predictor   Coef   SE Coef   Z   P   Odds Ratio   95% CI Lower   95% CI Upper
Constant
temp

Properties of MLEs
If a model is correct and the sample size is large enough:
– MLEs are essentially unbiased.
– Formulas exist for estimating the standard errors of the estimators.
– The estimators are about as precise as any nearly unbiased estimators.
– MLEs are approximately normally distributed.

Test and confidence intervals for single coefficients

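Inference for $\beta_j$
Test statistic: $z = \hat{\beta}_j / \mathrm{SE}(\hat{\beta}_j)$, which follows an approximate standard normal distribution under $H_0: \beta_j = 0$.
Confidence interval: $\hat{\beta}_j \pm z_{1-\alpha/2}\,\mathrm{SE}(\hat{\beta}_j)$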
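A minimal sketch of this large-sample (Wald) inference for a single coefficient, using only the standard library. The function name `wald_inference` is hypothetical, and the coefficient and standard error passed in are made-up inputs for illustration, not values read from the Minitab output. Exponentiating the interval endpoints gives an interval for the odds ratio.

```python
import math


def wald_inference(coef, se, z_crit=1.96):
    """Return z, two-sided p-value, CI for the coefficient, and CI for exp(coef)."""
    z = coef / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)),
    # which equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    lower, upper = coef - z_crit * se, coef + z_crit * se
    return z, p_value, (lower, upper), (math.exp(lower), math.exp(upper))


# Hypothetical coefficient and standard error (assumed for illustration).
z, p_value, coef_ci, odds_ratio_ci = wald_inference(coef=-0.17, se=0.08)
print(round(z, 3), round(p_value, 4))
print(coef_ci)
print(odds_ratio_ci)
```

The default `z_crit=1.96` corresponds to a 95% interval; a different confidence level would need a different normal quantile.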

Space shuttle example
There is sufficient evidence, at the α = 0.05 level, to conclude that temperature is related to the probability of O-ring failure.
For every 1-degree increase in temperature, the odds of O-ring failure are estimated to be multiplied by 0.84 (95% CI is 0.72 to 0.99).

Survival in the Donner Party
In 1846, the Donner and Reed families traveled from Illinois to California by covered wagon. The group became stranded in the eastern Sierra Nevada mountains when it was hit by heavy snow. 40 of the 87 members died from famine and exposure. Are females better able to withstand harsh conditions than males?

Survival in the Donner Party

Link Function: Logit

Response Information
Variable   Value      Count
STATUS     SURVIVED   20   (Event)
           DIED       25
           Total      45

Logistic Regression Table
Predictor   Coef   SE Coef   Z   P   Odds Ratio   95% CI Lower   95% CI Upper
Constant
AGE
Gender