LOGISTIC REGRESSION A statistical procedure to relate the probability of an event to explanatory variables Used in epidemiology to describe and evaluate.

Slides:



Advertisements
Similar presentations
Chapter 2 Describing Contingency Tables Reported by Liu Qi.
Advertisements

Lesson 10: Linear Regression and Correlation
Brief introduction on Logistic Regression
Logistic Regression Psy 524 Ainsworth.
CHAPTER 21 Inferential Statistical Analysis. Understanding probability The idea of probability is central to inferential statistics. It means the chance.
Regression analysis Linear regression Logistic regression.
Logistic Regression.
Departments of Medicine and Biostatistics
Regression Analysis Once a linear relationship is defined, the independent variable can be used to forecast the dependent variable. Y ^ = bo + bX bo is.
1 BINARY CHOICE MODELS: LOGIT ANALYSIS The linear probability model may make the nonsense predictions that an event will occur with probability greater.
Objectives (BPS chapter 24)
Logistic Regression STA302 F 2014 See last slide for copyright information 1.
April 25 Exam April 27 (bring calculator with exp) Cox-Regression
Logistic Regression Multivariate Analysis. What is a log and an exponent? Log is the power to which a base of 10 must be raised to produce a given number.
Copyright © 2008 by the McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Managerial Economics, 9e Managerial Economics Thomas Maurice.
PH6415 Review Questions. 2 Question 1 A journal article reports a 95%CI for the relative risk (RR) of an event (treatment versus control as (0.55, 0.97).
Introduction to Logistic Regression. Simple linear regression Table 1 Age and systolic blood pressure (SBP) among 33 adult women.
Chapter 11 Survival Analysis Part 2. 2 Survival Analysis and Regression Combine lots of information Combine lots of information Look at several variables.
EPI 809/Spring Multiple Logistic Regression.
Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C.
Correlation 1. Correlation - degree to which variables are associated or covary. (Changes in the value of one tends to be associated with changes in the.
Regression and Correlation
Correlation & Regression
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 12: Multiple and Logistic Regression Marshall University.
Education 795 Class Notes Applied Research Logistic Regression Note set 10.
Dr Laura Bonnett Department of Biostatistics. UNDERSTANDING SURVIVAL ANALYSIS.
April 11 Logistic Regression –Modeling interactions –Analysis of case-control studies –Data presentation.
+ Chapter 12: Inference for Regression Inference for Linear Regression.
Logistic Regression STA2101/442 F 2014 See last slide for copyright information.
April 6 Logistic Regression –Estimating probability based on logistic model –Testing differences among multiple groups –Assumptions for model.
CS 478 – Tools for Machine Learning and Data Mining Linear and Logistic Regression (Adapted from various sources) (e.g., Luiz Pessoa PY 206 class at Brown.
AN INTRODUCTION TO LOGISTIC REGRESSION ENI SUMARMININGSIH, SSI, MM PROGRAM STUDI STATISTIKA JURUSAN MATEMATIKA UNIVERSITAS BRAWIJAYA.
Multiple Regression and Model Building Chapter 15 Copyright © 2014 by The McGraw-Hill Companies, Inc. All rights reserved.McGraw-Hill/Irwin.
When and why to use Logistic Regression?  The response variable has to be binary or ordinal.  Predictors can be continuous, discrete, or combinations.
Linear vs. Logistic Regression Log has a slightly better ability to represent the data Dichotomous Prefer Don’t Prefer Linear vs. Logistic Regression.
APPLIED DATA ANALYSIS IN CRIMINAL JUSTICE CJ 525 MONMOUTH UNIVERSITY Juan P. Rodriguez.
Copyright © 2005 by the McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Managerial Economics Thomas Maurice eighth edition Chapter 4.
April 4 Logistic Regression –Lee Chapter 9 –Cody and Smith 9:F.
Lecture 2 Review Probabilities Probability Distributions Normal probability distributions Sampling distributions and estimation.
© Department of Statistics 2012 STATS 330 Lecture 20: Slide 1 Stats 330: Lecture 20.
CORRELATION: Correlation analysis Correlation analysis is used to measure the strength of association (linear relationship) between two quantitative variables.
Chapter 16 Data Analysis: Testing for Associations.
Statistical planning and Sample size determination.
Logistic Regression. Linear Regression Purchases vs. Income.
1 Multivariable Modeling. 2 nAdjustment by statistical model for the relationships of predictors to the outcome. nRepresents the frequency or magnitude.
Scatter Diagrams scatter plot scatter diagram A scatter plot is a graph that may be used to represent the relationship between two variables. Also referred.
Multiple Logistic Regression STAT E-150 Statistical Methods.
Multiple Regression  Similar to simple regression, but with more than one independent variable R 2 has same interpretation R 2 has same interpretation.
Heart Disease Example Male residents age Two models examined A) independence 1)logit(╥) = α B) linear logit 1)logit(╥) = α + βx¡
Linear Models Binary Logistic Regression. One-Way IVR 2 Hawaiian Bats Examine data.frame on HO Section 1.1 Questions –Is subspecies related to canine.
1 Introduction to Modeling Beyond the Basics (Chapter 7)
Probability and odds Suppose we a frequency distribution for the variable “TB status” The probability of an individual having TB is frequencyRelative.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
Applied Epidemiologic Analysis - P8400 Fall 2002 Labs 6 & 7 Case-Control Analysis ----Logistic Regression Henian Chen, M.D., Ph.D.
Logistic Regression An Introduction. Uses Designed for survival analysis- binary response For predicting a chance, probability, proportion or percentage.
Logistic Regression and Odds Ratios Psych DeShon.
Nonparametric Statistics
1 BINARY CHOICE MODELS: LOGIT ANALYSIS The linear probability model may make the nonsense predictions that an event will occur with probability greater.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 13: Multiple, Logistic and Proportional Hazards Regression.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
Nonparametric Statistics
Basic Estimation Techniques
Logistic Regression Logistic Regression is used to study or model the association between a binary response variable (y) and a set of explanatory variables.
Basic Estimation Techniques
Multiple logistic regression
Nonparametric Statistics
Introduction to Logistic Regression
Simple Linear Regression
Logistic Regression.
Presentation transcript:

LOGISTIC REGRESSION A statistical procedure to relate the probability of an event to explanatory variables Used in epidemiology to describe and evaluate the effect of a risk on the occurrence of a disease event. Example: Framingham Heart Study Coronary heart disease and blood pressure

LOGISTIC REGRESSION: AN EXAMPLE Event: Coronary Heart Disease Occurrence is the dependent variable, which takes 2 values: Yes or No. Risk factor: Blood pressure Systolic blood pressure is the independent variable X, a continuous measurement. The probability of getting coronary heart disease depends on blood pressure.

DATA

SCATTER PLOT

LINEAR REGRESSION FOR Prob.(CHD): NOT A GOOD IDEA!

PROPORTION WITH CHD BY SBP GROUP Systolic BP Range Proportion mmHg 0/ mmHg 2/ mmHg 3/3 1.00

LOGISTIC REGRESSION PROBABILITY MODEL 1 p(X) = exp (-  0 -   X) The probability of the event varies as an S-shaped function of the risk factor X: the logistic curve.

LOGISTIC CURVE MODEL: OCCURRENCE OF CHD AS A FUNCTION OF SBP

LOGISTIC MODEL: LOG ODDS p (X) log =  0 +  1 X 1 - p (X) The log of the odds of the event is a linear function of X. Log(odds of CHD) = (SBP)

ODDS The odds of an event is the chance that the event occurs divided by the chance of its not occurring: Odds = p/(1 - p) = p/q

  : KEY PARAMETER OF THE LOGISTIC MODEL p (X) log =  0 +  1 X 1 - p (X) The parameter   is like the slope of a linear regression model.   = 0 indicates that X has no effect on the probability, e.g., a man’s chance of CHD does not depend on his SBP.

 1 : KEY PARAMETER p (X) log =  0 +  1 X 1 - p (X) The coefficient  1 measures the amount of change in the log of the odds per unit change in X.

 1 : KEY PARAMETER log odds(X+1) =  0 +  1 (X+1) =  0 +  1 X+  1 log odds(X) =  0 +  1 X Difference in log odds =  1 E.g., the log of the odds of getting CHD increases by for an increase of 1 mmHg of systolic blood pressure. (Hard to explain to a patient!)

THE COEFFICIENT  1 AND THE ODDS RATIO Difference in log odds given by  1 translates into the odds ratio (OR). exp(  1 ) = OR = ratio of odds at risk level of X+1 to the odds when risk level is X  1 = 0  OR = 1.

THE COEFFICIENT $ 1 AND THE ODDS RATIO For example, the odds of CHD are multiplied by the factor exp(0.0243) = for every increase of 1 mmHg in SBP. A difference of 10 mmHg multiplies the odds of CHD by (1.025) 10, or

ESTIMATION OF THE PARAMETERS Technique: Maximum likelihood estimation For large sample sizes, the normal distribution is used to put a confidence interval around the estimate of the coefficient  .

HYPOTHESIS TESTING Ho:  1 = 0 No difference in risk at different levels of the risk factor X. No association between risk factor X and probability of occurrence.

HYPOTHESIS TESTING Ha:  1 =/= 0 or  1 > 0 (risk increases with X) or  1 < 0 (risk goes down as X increases)

HYPOTHESIS TESTING Ho: OR = 1 Ha: OR =/= 1 or OR > 1 (risk increases with X) or OR < 1 (X is protective)

RESULTS OF LOGISTIC REGRESSION OR with confidence interval and p value indicate whether there is a significant association between level of the risk factor and chance of occurrence OR = (1.015, 1.034), p < 0.001

RESULTS OF LOGISTIC REGRESSION Can be used to predict an individual’s risk: prob. of CHD when SBP = 180: p/q = exp{ (180)} Solve for p: prob. of CHD = 0.125

MULTIVARIATE LOGISTIC REGRESSION Model with additional risk factors: p (X) log =  0 +  1 X +  2 X 1 - p (X) Log(odds of CHD) =   0 +  1 (SBP) +  2 (CHOL) +  3 (smoker)