Introduction to logistic regression and Generalized Linear Models
Introduction to Statistical Measurement and Modeling
July 14, 2011
Karen Bandeen-Roche, PhD
Department of Biostatistics, Johns Hopkins University

Data motivation
- Osteoporosis data
- Scientific question: Can we detect osteoporosis earlier and more safely?
- Some related statistical questions:
  - How does the risk of osteoporosis vary as a function of measures commonly used to screen for osteoporosis?
  - Does age confound the relationship of screening measures with osteoporosis risk?
  - Do ultrasound and DPA measurements discriminate osteoporosis risk independently of each other?

Outline
- Why we need to generalize linear models
- Generalized Linear Model specification
  - Systematic, random model components
  - Maximum likelihood estimation
- Logistic regression as a special case of GLM
  - Systematic model / interpretation
  - Inference
  - Example

Regression for categorical outcomes
- Why not just apply linear regression to categorical Y's?
  - The linear model (A1) will often be unreasonable.
  - The assumption of equal variances (A3) will nearly always be unreasonable.
  - The assumption of normality will never be reasonable.

Introduction: Regression for binary outcomes
- Y_i = 1{event occurs for sampling unit i} = 1 if the event occurs, 0 otherwise
- p_i = probability that the event occurs for sampling unit i := Pr{Y_i = 1}
- Begin by generalizing the random model (A5):
  - Probability mass function: Bernoulli
    Pr{Y_i = 1} = p_i; Pr{Y_i = 0} = 1 - p_i;
    all other values of y_i occur with probability 0

Binary regression
- By assuming Bernoulli: (A3) is definitely not reasonable
  - Var(Y_i) = p_i(1 - p_i)
  - The variance is not constant; rather, it is a function of the mean
- Systematic model
  - The goal remains to describe E[Y_i | x_i]
  - The expectation of a Bernoulli Y_i is p_i
  - To achieve a reasonable linear model (A1): describe some function of E[Y_i | x_i] as a linear function of covariates, g(E[Y_i | x_i]) = x_i'β
  - Some common choices of g: log, logit g(p) = log{p/(1-p)}, probit (a sketch of these follows below)
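As a small illustration (not part of the original slides), the numpy sketch below evaluates the Bernoulli variance function and the three link functions just listed on a grid of probabilities; the grid and variable names are arbitrary choices for the example.

```python
import numpy as np
from scipy.stats import norm

p = np.linspace(0.01, 0.99, 5)

var_bernoulli = p * (1 - p)      # Var(Y_i) = p_i(1 - p_i): varies with the mean
g_log = np.log(p)                # log link
g_logit = np.log(p / (1 - p))    # logit link: the log odds
g_probit = norm.ppf(p)           # probit link: inverse standard normal CDF

print(np.column_stack([p, var_bernoulli, g_logit, g_probit]))
```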

General framework: Generalized Linear Models
- Random model
  - Y ~ a density or mass function, f_Y, not necessarily normal
  - Technical aside: f_Y within the "exponential family"
- Systematic model
  - g(E[Y_i | x_i]) = x_i'β = η_i
  - "g" = "link function"; "x_i'β" = "linear predictor"
- Reference: Nelder JA, Wedderburn RWM. Generalized linear models. JRSS A 1972; 135: 370-384.

Types of Generalized Linear Models

Model (link function)   Response                Distribution      Regression coef. interpretation
Linear                  Continuous              Gaussian          Change in ave(Y) per unit change in X
Logistic                Binary                  Binomial          Log odds ratio
Log-linear              Times to events/counts  Poisson           Log relative rate
Proportional hazards    Times to events         Semi-parametric   Log hazard
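As a hedged sketch of this table (not from the lecture), the Python code below fits the first three rows with statsmodels on simulated data; the true coefficients, sample size, and variable names are all invented for the example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = sm.add_constant(x)  # design matrix with an intercept column

# Simulate one response per family; true coefficients are arbitrary.
y_gauss = 1.0 + 0.5 * x + rng.normal(size=n)                   # identity link
y_binom = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.8 * x))))  # logit link
y_pois = rng.poisson(np.exp(0.1 + 0.3 * x))                    # log link

for y, family in [(y_gauss, sm.families.Gaussian()),
                  (y_binom, sm.families.Binomial()),
                  (y_pois, sm.families.Poisson())]:
    fit = sm.GLM(y, X, family=family).fit()
    print(type(family).__name__, np.round(fit.params, 2))
# The proportional hazards row is semi-parametric and is not fit by sm.GLM.
```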

Estimation
- General method: Maximum likelihood (Fisher); estimation maximizes the likelihood L(β, a; y, X)
- Given {Y_1,...,Y_n} distributed with joint density or mass function f_Y(y; θ), a likelihood function L(θ; y) is any function (of θ) that is proportional to f_Y(y; θ)
- If sampling is random, {Y_1,...,Y_n} are statistically independent, and L(θ; y) ∝ the product of the individual f_Y(y_i; θ)

Maximum likelihood
- The maximum likelihood estimate (MLE), θ̂, maximizes L(θ; y)
- Under broad assumptions, MLEs are asymptotically
  - Unbiased (consistent)
  - Efficient (most precise / lowest variance)
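To make the MLE concrete, here is a minimal sketch (an illustration, with toy data invented for the example): numerically maximizing the Bernoulli log likelihood recovers the closed-form MLE, the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # toy Bernoulli data (assumed)

def neg_log_lik(p):
    # -log L(p; y) for independent Bernoulli(p) observations
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, y.mean())  # the numeric MLE agrees with the closed form, ybar
```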

Logistic regression
- Y_i binary with p_i = Pr{Y_i = 1}
  - Example: Y_i = 1{person i diagnosed with heart disease}
- Simple logistic regression (1 covariate)
  - Random model: Bernoulli / binomial
  - Systematic model: log{p_i/(1 - p_i)} = β_0 + β_1 x_i, the log odds, logit(p_i)
- Parameter interpretation
  - β_0 = log(heart disease odds) in the subpopulation with x = 0
  - β_1 = log{p_{x+1}/(1 - p_{x+1})} - log{p_x/(1 - p_x)}

Logistic regression: interpretation notes
- β_1 = log{p_{x+1}/(1 - p_{x+1})} - log{p_x/(1 - p_x)} = the log odds ratio comparing subpopulations one unit apart on x
- exp(β_1) = odds ratio for the association of prevalent heart disease with each (say) one-year increment in age
  = factor by which the odds of heart disease increases / decreases with each 1-year cohort of age
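A hedged sketch of this interpretation (the data are simulated and the true coefficients are invented, not taken from the lecture): fit logit(p_i) = β_0 + β_1·age, then exponentiate β̂_1 to get the per-year odds ratio.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
age = rng.uniform(40, 80, size=500)
eta = -6.0 + 0.08 * age                      # true log odds (invented values)
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))  # simulated disease indicator

fit = sm.GLM(y, sm.add_constant(age), family=sm.families.Binomial()).fit()
b1 = fit.params[1]
print("log OR per year:", b1, "| OR per year:", np.exp(b1))
```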

Multiple logistic regression
- Systematic model: log{p_i/(1 - p_i)} = β_0 + β_1 x_i1 + … + β_p x_ip
- Parameter interpretation
  - β_0 = log(heart disease odds) in the subpopulation with all x = 0
  - β_j = difference in log outcome odds comparing subpopulations that differ by 1 on x_j and whose values on all other covariates are the same
    - "Adjusting for," "controlling for" the other covariates
- One can define variables contrasting outcome odds differences between groups, nonlinear relationships, interactions, etc., just as in linear regression

Logistic regression - prediction
- Translation from η_i to p_i
  - log{p_i/(1 - p_i)} = β_0 + β_1 x_i1 + … + β_p x_ip
  - Then p_i = exp(η_i) / (1 + exp(η_i)) = the logistic function of η_i (see the sketch below)
- A graph of p_i versus η_i has a sigmoid shape
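A minimal sketch of that translation (the function name and grid are illustrative choices):

```python
import numpy as np

def inv_logit(eta):
    # p = exp(eta) / (1 + exp(eta)), written in a numerically stable form
    return np.where(eta >= 0,
                    1 / (1 + np.exp(-eta)),
                    np.exp(eta) / (1 + np.exp(eta)))

eta = np.linspace(-6, 6, 7)
print(np.round(inv_logit(eta), 3))  # rises from near 0 to near 1: the sigmoid
```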

GLMs - Inference
- The negative inverse Hessian matrix of the log likelihood function characterizes Var(β̂) (adjunct)
  - SE(β̂_j) is obtained as the square root of the jth diagonal entry
  - Typically, substituting β̂ for β
- "Wald" inference applies the paradigm from Lecture 2
  - Z = (β̂_j - β_0j) / SE(β̂_j) is asymptotically ~ N(0,1) under H_0: β_j = β_0j
  - Z provides a test statistic for H_0: β_j = β_0j versus H_A: β_j ≠ β_0j
  - β̂_j ± z_(1-α/2) SE(β̂_j) = (L, U) is a (1-α)x100% CI for β_j
  - {exp(L), exp(U)} is a (1-α)x100% CI for exp(β_j) (a worked sketch follows below)
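A hedged, self-contained sketch of Wald inference on simulated data (the data and names are invented for the example; statsmodels' `bse` attribute holds the standard errors described above):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.0 * x))))
fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()

beta_hat, se = fit.params[1], fit.bse[1]   # bse = sqrt of diagonal of Var(beta-hat)
z = beta_hat / se                          # Wald Z for H0: beta_j = 0
p_value = 2 * norm.sf(abs(z))
L, U = beta_hat - 1.96 * se, beta_hat + 1.96 * se
print(z, p_value, (np.exp(L), np.exp(U)))  # 95% CI for exp(beta_j)
```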

GLMs: "Global" Inference
- Analog: F-testing in linear regression
- The only difference: log likelihoods replace sums of squares
- The hypothesis to be tested is H_0: β_j1 = ... = β_jk = 0
  - Fit the model excluding x_j1,...,x_jk; save -2 log likelihood = L_S
  - Fit the "full" (or larger) model adding x_j1,...,x_jk to the smaller model; save -2 log likelihood = L_L
  - Test statistic: S = L_S - L_L
  - Distribution under the null hypothesis: χ² with k degrees of freedom
  - Define the rejection region based on this distribution
  - Compute S
  - Reject or not according to whether S falls in the rejection region (a sketch follows below)
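A sketch of this likelihood ratio recipe on simulated data (all data and names are invented; `llf` is statsmodels' log likelihood):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(3)
n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.9 * x1))))  # x2 is truly null

small = sm.GLM(y, sm.add_constant(x1),
               family=sm.families.Binomial()).fit()
large = sm.GLM(y, sm.add_constant(np.column_stack([x1, x2])),
               family=sm.families.Binomial()).fit()

# L_S - L_L; equivalently small.deviance - large.deviance (next slide)
S = -2 * small.llf - (-2 * large.llf)
p_value = chi2.sf(S, df=1)  # one coefficient set to zero under H0
print(S, p_value)
```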

GLMs: "Global" Inference
- Many programs refer to "deviance" rather than -2 log likelihood
  - This quantity equals the difference in -2 log likelihoods between one's fitted model and a "saturated" model
  - Deviance measures "fit"
  - Differences in deviances can be substituted for differences in -2 log likelihood in the method given on the previous slide
- Likelihood ratio tests have appealing optimality properties

Outline: A few more topics
- Model checking: residuals, influence points
- ML can be written as an iteratively reweighted least squares algorithm
- Predictive accuracy
- The framework generalizes easily

Main Points
- Generalized linear modeling provides a flexible regression framework for a variety of response types
  - Continuous, categorical measurement scales
  - Probability distributions tailored to the outcome
  - Systematic model to accommodate measurement range, interpretation
- Logistic regression
  - Binary responses (yes, no)
  - Bernoulli / binomial distribution
  - Regression coefficients as log odds ratios for the association between predictors and outcomes

Main Points
- Generalized linear modeling accommodates description, inference, and adjustment with the same flexibility as linear modeling
- Inference
  - "Wald": statistical tests and confidence intervals via parameter estimator standardization
  - "Likelihood ratio" / "global": via comparison of log likelihoods from nested models