University of Ottawa - Bio 4518 - Applied Biostatistics © Antoine Morin and Scott Findlay 2016-01-08 04:32
Slide 1: Logistic regression



Slide 2: Logistic regression
Member of the GLM family.
Unlike standard linear regression, the dependent variable is binary (0, 1), so each case's value is either 0 or 1.
Normally, 0 is taken to mean the absence of some attribute and 1 its presence.
Logistic regression can be extended to the case where there are more than two possible values for the dependent variable (e.g. low, medium, high: multinomial regression).

Slide 3: Example: incidence of heart attacks in relation to age
Linear regression is inappropriate because:
– Residuals are not normal
– Residuals are heteroscedastic
– Predicted values are nonsense (e.g. what does a predicted value of 0.3 mean?)
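A minimal sketch of the last point, in Python rather than S-Plus. The ages and outcomes below are made-up toy values, not the data behind the slide. Fitting a straight line to a 0/1 response by least squares gives predictions that fall below 0 or above 1 outside the middle of the age range, which cannot be read as probabilities.

```python
# Hypothetical toy data (not from the lecture): age in years and
# whether a heart attack occurred (1) or not (0).
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
attack = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(attack) / n

# Closed-form least-squares slope and intercept for a single predictor.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, attack))
sxx = sum((x - mean_x) ** 2 for x in ages)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def linear_prediction(age):
    """Predicted value from the straight-line fit."""
    return intercept + slope * age


# Predictions at a young, a middle and an old age.
for age in (20, 47.5, 90):
    print(age, round(linear_prediction(age), 3))
```

With these toy values the prediction is negative at age 20 and greater than 1 at age 90.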

Slide 4: Logistic regression: dependent variable
The variable of interest is the probability p of obtaining a one as a function of the predictor variables.
The magnitude of the regression coefficients in the model depends on the distribution of the predictor variables in the two groups, Y = 0 and Y = 1.
[Figure: two panels plotting Y against X]

Slide 5: Dependent variable: logit(p)
$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right)$
[Figure: logit(p) plotted against p]
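As a small illustration, the logit and its inverse can be written in a few lines of Python. The function names `logit` and `inv_logit` are chosen here for the sketch and do not come from the slides. The logit maps a probability in (0, 1) onto the whole real line, and the inverse maps a linear predictor back to a probability.

```python
import math


def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of the logit: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Probabilities below 0.5 give negative logits, above 0.5 positive ones.
for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3), round(inv_logit(logit(p)), 3))
```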

Slide 6: Logistic regression: model coefficients
A negative regression coefficient means the probability of success decreases with increasing value of the predictor.
A positive regression coefficient means the probability of success increases with increasing value of the predictor.
[Figure: two panels plotting Y against X, one for β > 0 and one for β < 0]

Slide 7: Logistic regression: model coefficients
The magnitude of the regression coefficient depends on how abruptly p changes with X, with large values indicating abrupt change.
[Figure: two panels plotting Y (0 to 1) against X, one for β > 0, small and one for β > 0, large]
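The effect of the sign and size of the coefficient can be sketched numerically. The example below assumes a single predictor and the usual form logit(p) = b0 + b1 X. The names `b0`, `b1` and `prob` and the coefficient values are illustrative choices, not values from the lecture.

```python
import math


def prob(x, b0, b1):
    """Probability of a one under logit(p) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


xs = [-2, -1, 0, 1, 2]

# Negative slope, small positive slope, large positive slope.
for b1 in (-1.0, 0.5, 5.0):
    print(b1, [round(prob(x, 0.0, b1), 3) for x in xs])
```

With a negative coefficient the probabilities fall as X increases. With a positive one they rise, and the larger coefficient gives a much sharper transition from near 0 to near 1 around X = 0.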

Slide 8: Least squares estimation (LSE)
An ordinary least squares (OLS) estimate of a model parameter is the value that minimizes the sum of squared differences between observed and predicted values, the residual sum of squares: $SS_R = \sum_i (y_i - \hat{y}_i)^2$
Predicted values are derived from some model whose parameters we wish to estimate.
[Figure: SS_R plotted against the parameter value, with the OLS estimate at the minimum]
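A short sketch of this idea for a straight-line model, using invented data. The closed-form estimates for one predictor are computed, and the residual sum of squares is then checked to be larger when the slope is nudged away from its estimate in either direction. The helper name `ss_residual` is an assumption made for the example.

```python
# Hypothetical toy data for a straight-line model y = b0 + b1 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def ss_residual(b0, b1):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form OLS estimates for a single predictor.
b1_hat = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
b0_hat = mean_y - b1_hat * mean_x

print("estimates:", round(b0_hat, 3), round(b1_hat, 3))
print("SS_R at the estimate:", round(ss_residual(b0_hat, b1_hat), 4))
print("SS_R with slope + 0.1:", round(ss_residual(b0_hat, b1_hat + 0.1), 4))
print("SS_R with slope - 0.1:", round(ss_residual(b0_hat, b1_hat - 0.1), 4))
```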

Slide 9: Maximum likelihood estimation (MLE)
A maximum likelihood estimate (MLE) of a model parameter for a given distribution is the value that maximizes the probability of generating the observed sample data.
MLEs are obtained by maximizing the likelihood function L or, equivalently, by minimizing the negative log-likelihood function, -log L.
[Figure: L and -log L plotted against the parameter value, with the MLE at the maximum of L and the minimum of -log L]
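To make this concrete, here is a minimal sketch for the simplest binary case: a single probability p estimated from a sample of zeros and ones. The sample is invented, and a crude grid search stands in for a proper optimizer. The value of p that minimizes -log L is the sample proportion of ones.

```python
import math

# Hypothetical sample of binary outcomes: 6 ones out of 10.
sample = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]


def neg_log_lik(p):
    """Negative log-likelihood of the sample for success probability p."""
    return -sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in sample)


# Evaluate -log L over a grid of candidate values and keep the minimizer.
grid = [i / 1000 for i in range(1, 1000)]
p_mle = min(grid, key=neg_log_lik)

print("MLE of p:", p_mle)
print("sample proportion:", sum(sample) / len(sample))
```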

Slide 10: How are the model parameters estimated?
Estimated not by least squares, but by maximum likelihood:
– Based on an estimate of the likelihood of obtaining the observed results under different values of the model parameters
– In principle, parameter estimates should converge to those maximizing the log-likelihood, or equivalently minimizing -log L
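One common way to carry out this iterative search is Newton-Raphson, which for the logit link coincides with Fisher scoring. The sketch below is written from scratch for a single predictor and is not the algorithm exactly as implemented in any particular package. The data and the function name `fit_logistic` are assumptions made for the example.

```python
import math

# Hypothetical toy data (not from the lecture): age and heart attack (0/1).
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
attack = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix (negative Hessian), with weights p * (1 - p).
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        # Solve the 2 x 2 system for the Newton step.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


b0_hat, b1_hat = fit_logistic(ages, attack)
fitted = [1.0 / (1.0 + math.exp(-(b0_hat + b1_hat * x))) for x in ages]
print("intercept:", round(b0_hat, 3), "slope:", round(b1_hat, 3))
```

At convergence the gradient is zero, so the fitted probabilities sum to the observed number of ones. The toy outcomes overlap across ages on purpose: if the zeros and ones were perfectly separated by age, the likelihood would have no finite maximum and the iterations would not settle.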

Slide 11: Hypothesis testing
Likelihood
– Deviance = -2 log L
– Is approximately distributed as chi-square
– Measures the variation unexplained by the fitted model, analogous to the residual sum of squares
Model comparison
– The change in deviance when model terms are added (or deleted) is also approximately distributed as chi-square, so hypotheses about individual model terms can be tested
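A worked sketch of a deviance comparison, using invented counts and a two-level predictor so that the maximum likelihood estimates are simply the observed proportions and no iterative fitting is needed. The null model has one common probability and the larger model has one probability per group, so the change in deviance is compared to chi-square with 1 degree of freedom. For 1 degree of freedom the upper tail probability can be written with the complementary error function, which avoids any dependency outside the standard library.

```python
import math

# Hypothetical counts (not from the lecture): events and totals per group.
events = {"young": 5, "old": 20}
totals = {"young": 50, "old": 50}


def log_lik(k, n, p):
    """Log-likelihood of k ones among n independent binary observations."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


# Null model: a single probability for everyone.
k_all = sum(events.values())
n_all = sum(totals.values())
deviance_null = -2.0 * log_lik(k_all, n_all, k_all / n_all)

# Model with the group term: one probability per group.
deviance_group = -2.0 * sum(
    log_lik(events[g], totals[g], events[g] / totals[g]) for g in events
)

# Change in deviance, compared to chi-square with 1 degree of freedom.
delta = deviance_null - deviance_group
p_value = math.erfc(math.sqrt(delta / 2.0))

print("null deviance:", round(deviance_null, 3))
print("residual deviance:", round(deviance_group, 3))
print("change in deviance:", round(delta, 3), "p-value:", p_value)
```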

Slide 12: Model assumptions
– Observations are independent
– Dependent variable has a binomial distribution
– Little error in measurement of dependent variables

Slide 13: Logistic regression in S-Plus
*** Generalized Linear Model ***
Call: glm(formula = cardiaque ~ age, family = binomial(link = logit), data = SDF12, na.action = na.exclude, control = list(epsilon = , maxit = 50, trace = F))
Deviance Residuals: Min 1Q Median 3Q Max
Coefficients: Value Std. Error t value
(Intercept)
age
(Dispersion Parameter for Binomial family taken to be 1)
Null Deviance: on 1999 degrees of freedom
Residual Deviance: on 1998 degrees of freedom
Number of Fisher Scoring Iterations: 4

Slide 14: Incidence of heart attack in relation to age

Slide 15: Presence of post-operative kyphosis using logistic regression
Kyphosis: a binary variable indicating the presence or absence of a postoperative spinal deformity called kyphosis.
Age: the age of the child in months.
Number: the number of vertebrae involved in the spinal operation.
Start: the beginning of the range of vertebrae involved in the operation.

Slide 16: Evidence that the distribution of predictor variables differs among levels of the response variable

Slide 17: The model

Slide 18: Testing hypotheses