Simple Logistic Regression

Simple Logistic Regression An introduction to PROC FREQ and PROC LOGISTIC

Introduction to Logistic Regression Logistic regression is used when the outcome variable of interest is categorical rather than continuous. Examples include: death vs. no death, recovery vs. no recovery, obese vs. not obese, etc. All of the examples you will see in this class have binary outcomes, meaning there are only two possible outcomes. Simple logistic regression has only one predictor variable. Its key result may already be familiar to you under a different name: the odds ratio.

Simple Logistic Regression: An example Imagine you are interested in investigating whether there is a relationship between race and party identification. Race (Black or White) is the independent variable, and Party Identification (Democrat or Republican) is the dependent variable. Consider the following table: Example from Agresti, A. Categorical Data Analysis, 2nd ed. 2002.

Race x Party Identification

          Democrat   Republican
Black          103           11
White          341          405

The odds ratio of being a Democrat for Blacks vs. Whites is: OR (odds ratio) = (103/11)/(341/405) = (103x405)/(341x11) = 11.12. Blacks have an 11.12 times greater odds of being a Democrat than Whites. The odds ratio of being a Republican for Blacks vs. Whites is: (11/103)/(405/341) = (11x341)/(405x103) = 0.09. Blacks have a 91% (1 - 0.09) lower odds of being a Republican than Whites.

Odds Ratios in SAS Copy the following code into SAS:
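The code from the original slide did not survive transcription. A minimal sketch of the data step, assuming a data set named partyid with character variables race and party and a numeric count variable (the single-letter codes B/W and D/R are an assumption):

```sas
* Enter the 2x2 table, one dataline per cell, with a count variable;
data partyid;
   input race $ party $ count;
   datalines;
B D 103
B R 11
W D 341
W R 405
;
run;
```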

Odds Ratios with PROC FREQ There are two ways to get Odds Ratios in SAS when there is one predictor and one outcome variable. The first is with PROC FREQ. Type the following code into SAS:
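The PROC FREQ call is also missing from the transcript; a sketch consistent with the statements and options described on the following slides (weight count, chisq, relrisk), assuming the data set is named partyid:

```sas
* Request the crosstab, Chi-Square tests, and odds ratio / relative risks;
proc freq data=partyid;
   weight count;
   tables race*party / chisq relrisk;
run;
```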

Notes about the SAS code: WEIGHT is a SAS statement that weights each observation by the variable you specify. When you have a table you want to enter into SAS, it is often easier to use a “count” variable rather than list each subject individually. Because the data set has 860 observations, we would have to type out 860 separate datalines if we did not use the “count” variable and the “weight count” statement.

The TABLES statement tells SAS to construct a table from the two specified variables (in this case, race and party). The chisq option requests the Chi-Square statistics. The relrisk option gives you estimates of the odds ratio and relative risks for the two columns.

Output from PROC FREQ

Reading the Table Each cell contains four numbers: count, percent, row %, and column %. There are 103 Black Democrats, which is 11.98% of the total sample. 90.35% of Blacks are Democrats. 23.20% of Democrats are Black. Compare this to the 2.64% of Republicans who are Black.

Interpreting the Chi-Square Statistic The Chi-Square (Χ2) test statistic tests the null hypothesis that the two variables are independent against the alternative that they are not independent (that is, associated). Ho: race and party identification are independent. Ha: race and party identification are associated. Χ2 = 78.9082, p-value < 0.0001. Reject Ho. Conclude that race and party identification are associated.

Output of Odds Ratio

Interpreting the Odds Ratio You can find the OR in the SAS output under “Case-Control (Odds Ratio).” The odds ratio is 11.12 with a 95% Confidence Interval of [5.87, 21.05]. Because this C.I. does not contain 1, we know that the OR is statistically significant. Blacks have an 11.12 times greater odds of being a Democrat than Whites.

A note about the PROC FREQ table: Notice the way the table is set up in SAS:

          Dem   Rep
Black     103    11
White     341   405

When calculating the OR in PROC FREQ, SAS orders the rows and columns alphabetically, and this affects which OR it calculates. Here SAS is calculating the odds of being a Democrat for Blacks versus Whites (equivalently, the odds of being Black for Democrats versus Republicans). If you wanted the odds of being a Democrat for Whites versus Blacks, you would have to either calculate it by hand or use PROC LOGISTIC.

Odds Ratio with PROC LOGISTIC To simplify our data set, we will change our variables to have values of 1 and 0, rather than B/W and D/R. If someone is Black, s/he will have a value of “1” for the variable “race2.” Whites will have a value of “0.” If someone is a Democrat, s/he will have a value of “1” for “party2.” Republicans will have a value of “0.” Type the following code into SAS, which creates a new data set called “partyid2”:
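The recoding step shown on the slide is missing here; one possible sketch, assuming the original character codes are 'B' and 'D' and the first data set is named partyid (a SAS logical comparison evaluates to 1 or 0):

```sas
* Recode race and party into 0/1 indicator variables;
data partyid2;
   set partyid;
   race2  = (race  = 'B');  * 1 = Black,    0 = White;
   party2 = (party = 'D');  * 1 = Democrat, 0 = Republican;
run;
```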

PROC LOGISTIC Once you have created the new data set, run the regression analysis with PROC LOGISTIC (notice the format is similar to that of linear regression, with the model statement y = x). The DESCENDING option tells SAS to model the probability that “party2” = 1 (Democrat). If you did not include the DESCENDING option, SAS would model the probability that “party2” = 0 (Republican). All subsequent interpretations will be in terms of the odds of being a Democrat, not Republican.
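The PROC LOGISTIC call from the slide is missing from the transcript; a sketch assuming the cell counts are supplied with a FREQ statement:

```sas
* Model the probability that party2 = 1 (Democrat);
proc logistic data=partyid2 descending;
   freq count;
   model party2 = race2;
run;
```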

PROC LOGISTIC Output

Interpreting the Output From PROC LOGISTIC, we now have an equation for our log(odds): Log(odds) = β0 + β1x Log(odds) = -0.1720 + 2.4088x where x = 1 if the person is Black and x = 0 if the person is White.

Calculating the Odds Ratio Suppose we wanted to know the odds of being a Democrat for Blacks vs. Whites. The log(odds) of being Democratic for Blacks is: β0 + β1(1) = β0 + β1 The log(odds) of being Democratic for Whites is: β0 + β1(0) = β0. To calculate the OR, take the log(odds) for Blacks minus the log(odds) for Whites: β0 + β1 – (β0) = β1 Then exponentiate this value: exp(β1) = exp(2.4088) = 11.12 This is the same OR calculated earlier using PROC FREQ. In addition, it is given to you in the PROC LOGISTIC output under “Odds Ratio Estimates” with the 95% C.I.

Calculating the OR, cont. Suppose we wanted to know the odds of being a Democrat for Whites vs. Blacks. To calculate the OR, take the log(odds) for Whites minus the log(odds) for Blacks: β0 – (β0 + β1) = -β1 Then exponentiate this value: exp(-β1) = exp(-2.4088) = 0.0899 Whites have a 91% (1 - 0.0899) lower odds of being a Democrat than Blacks.

Significance Testing Testing the significance of a parameter estimate can be done by constructing a confidence interval around that parameter estimate. If the C.I. for an estimate (or log(OR)) contains 0, the variable is not significantly associated with the outcome. If the C.I. for an OR contains 1, the variable is not significantly associated with the outcome.

The Wald Chi-Square statistic tests whether the parameter estimate equals zero, that is, Ho: β1 = 0 vs. Ha: β1 ≠ 0. From the output, we see that the p-value of this test is < 0.0001, so we reject Ho and conclude that race is significantly related to party identification.

Confidence Interval Construction Confidence interval construction is similar to what you have seen for linear regression, except that it is now on the natural log scale: 95% C.I. for β1 = β1 +/- 1.96*se(β1) = 2.4088 +/- 1.96*(0.3256) = [1.77,3.05]. This C.I. does not contain 0. exp [1.77,3.05] = [5.875, 21.052] This C.I. does not contain 1. Notice that [5.875, 21.052] is also the 95% C.I. for the OR given in the SAS output.

Calculating the Probability If you were asked to calculate the probability that someone is a Democrat, given that he is Black, you would use the following formula: π (probability) = exp(log(odds))/[1 + exp(log(odds))] π = exp(-0.1720 + 2.4088)/[1 + exp(-0.1720 + 2.4088)] = 0.9035 A Black person has a 90.35% probability of being a Democrat.
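As a quick check, the same probability can be computed in a SAS data step (the coefficient values are those reported in the output above):

```sas
* Plug the fitted coefficients into the inverse-logit formula;
data _null_;
   logodds = -0.1720 + 2.4088*1;        * beta0 + beta1*x, with x = 1 (Black);
   p = exp(logodds) / (1 + exp(logodds));
   put 'Probability of being a Democrat: ' p 6.4;   * 0.9035;
run;
```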

Summary This has been an introduction to calculating odds ratios in PROC FREQ and PROC LOGISTIC. The next section will introduce you to multiple predictors in logistic regression, including interactions.