Analyzing dichotomous dummy variables


Quantitative Methods: Analyzing dichotomous dummy variables

Logistic Regression Analysis Like ordinary regression and ANOVA, logistic regression belongs to a family of models called generalized linear models. Generalized linear models were developed to unify various statistical models (linear regression, logistic regression, Poisson regression) under one framework. We can think of maximum likelihood as a general algorithm for estimating all of these models.

Logistic Regression—when? Logistic regression models are appropriate for dependent variables coded 0/1. We observe only "0" and "1" for the dependent variable, but conceptually we think of the dependent variable as the probability that "1" will occur.

Logistic Regression--examples Some examples: vote for Obama (yes/no); turned out to vote (yes/no); sought medical assistance in the last year (yes/no).

Logistic Regression—why not OLS? Why can't we use OLS? After all, linear regression is straightforward and, unlike most other models, actually has a closed-form solution for the estimates.

Logistic Regression—why not OLS? There are three problems with using OLS. First, what is our dependent variable, conceptually? It is the probability that y = 1, but we only observe y = 0 and y = 1. If we use OLS, some predicted values will fall between 0 and 1, which is what we want, but we will also get predicted values greater than 1 or less than 0. A probability outside that range makes no sense.
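A quick numerical sketch of the first problem, using toy data (not from the lecture): fitting an ordinary least-squares line to a 0/1 outcome produces fitted values outside [0, 1].

```python
import numpy as np

# Toy binary data: the outcome switches from 0 to 1 as x grows
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)

# Ordinary least squares: fit a straight line y = b1*x + b0
b1, b0 = np.polyfit(x, y, deg=1)
fitted = b1 * x + b0

# The line dips below 0 at the low end and rises above 1 at the high end
print(fitted.min(), fitted.max())
```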

Logistic Regression—Why not OLS? Second problem—there is heteroskedasticity in the model. Think about the meaning of a residual: the difference between the observed and the predicted Y. For a 0/1 outcome, the variance of the residual is p(1 − p), which is largest near the center of the distribution (p near 0.5) and shrinks toward the tails (p near 0 or 1), so the error variance cannot be constant.

Logistic Regression—why not OLS? The third problem is substantive. The reality is that many choice functions are well described by an S-shaped curve. Therefore (much as when we discussed linear transformations of the X variable), it makes sense to model a non-linear relationship.

Logistic Regression—but similar to OLS.... So: we actually could correct for the heteroskedasticity, transform the equation so that it captured the non-linear relationship, and then use linear regression. But what we usually do....

Logistic Regression—but similar to OLS... ...is use logistic regression to predict the probability of the occurrence of an event.

Logistic Regression—S-shaped curve

Logistic Regression—S-shaped curve and Bernoulli variables Note that the observed dependent variable is a Bernoulli (or binary) variable. But what we are really interested in is predicting the probability that an event occurs (i.e., the probability that y = 1).
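The S-shaped curve the slides refer to is the standard logistic (sigmoid) function; a minimal sketch:

```python
import math

def sigmoid(z):
    """Standard logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The curve is steep near the middle and flattens toward 0 and 1 at the extremes
for z in (-4, -2, 0, 2, 4):
    print(z, round(sigmoid(z), 3))
```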

Logistic Regression--advantage Logistic regression is particularly handy because (unlike, say, discriminant analysis) it makes no distributional assumptions about the independent variables. They can be continuous or categorical, and they need not be normally distributed—they can take any form.

Logistic Regression—exponential values and natural logs Note: "exp" is the exponential function and "ln" is the natural log; they are inverses. Taking the exponential of a number means raising e ≈ 2.718 to that power. So exp(3) = 2.718 × 2.718 × 2.718 ≈ 20.09, and ln(20.09) ≈ 3.
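The slide's worked numbers can be checked in any language with a math library; note the base is e ≈ 2.718, not exactly 2.72:

```python
import math

print(math.exp(3))            # about 20.09 (e cubed)
print(math.log(20.0855))      # about 3 -- ln undoes exp
print(math.log(math.exp(3)))  # exactly 3, up to floating point
```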

Logistic Regression--transformation Note that you can think of logistic regression in terms of transforming the dependent variable so that it fits an S-shaped curve. The odds are the probability that a case will be a 1 divided by the probability that it will not be a 1. The natural log of the odds is the "logit", and it is a linear function of the x's (that is, of the right-hand side of the model).
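In code, the odds and logit transformations of a probability look like this (a sketch, not from the slides):

```python
import math

def odds(p):
    """Odds: probability of a 1 divided by probability of not a 1."""
    return p / (1.0 - p)

def logit(p):
    """Logit: natural log of the odds; this is what logistic
    regression models as a linear function of the x's."""
    return math.log(odds(p))

print(odds(0.5))   # 1.0 -- even odds
print(logit(0.5))  # 0.0
print(logit(0.9))  # about 2.197, i.e. ln(9)
```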

Logistic Regression--transformation Equivalently, you can model the probability that y = 1 (call it theta) directly; the following two expressions describe the same model: ln(theta / (1 − theta)) = b0 + b1x1 + … + bkxk, or theta = exp(b0 + b1x1 + … + bkxk) / (1 + exp(b0 + b1x1 + … + bkxk)).

Logistic Regression Note that the independent variables are not linearly related to the probability that y = 1. They are, however, linearly related to the logit of the dependent variable.

Logistic Regression--recap Logistic regression analysis, in other words, is very similar to OLS regression, just with a transformation of the regression formula. Tests, however, are based on the binomial distribution rather than the normal.

Logistic Regression--interpretation Most commonly: with all other variables held constant, there is a constant increase of b1 in logit(p) for every 1-unit increase in x1. But remember that even though the right-hand side of the model is linearly related to the logit (that is, to the natural log of the odds), what does that mean for the actual probability that y = 1?

Logistic Regression It's fairly straightforward: the effect is multiplicative. If b1 takes the value 2.3 (and exp(2.3) ≈ 10), then when x1 increases by 1, the odds that the dependent variable takes the value 1 increase roughly tenfold.
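A sketch of that multiplicative interpretation (the value 2.3 is the slide's example; the baseline odds of 0.5 are a hypothetical number for illustration):

```python
import math

b1 = 2.3                        # slide's example coefficient
odds_multiplier = math.exp(b1)  # effect on the odds of a 1-unit increase in x1

baseline_odds = 0.5             # hypothetical odds before the increase
new_odds = baseline_odds * odds_multiplier

print(round(odds_multiplier, 2))  # about 9.97, i.e. roughly tenfold
print(round(new_odds, 2))
```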

From probabilities and odds to log odds Everything starts with the concept of probability. Suppose the probability of success of some event is 0.8. Then the probability of failure is 1 − 0.8 = 0.2. The odds of success are defined as the ratio of the probability of success to the probability of failure. In our example, the odds of success are 0.8 / 0.2 = 4; that is, the odds of success are 4 to 1. If the probability of success is 0.5, i.e. a 50-50 chance, then the odds of success are 1 to 1.

From probabilities and odds to log odds The transformation from probability to odds is a monotonic transformation, meaning that the odds increase as the probability increases, and vice versa. Probability ranges from 0 to 1; odds range from 0 to positive infinity.

Logistic regression with no predictors In other words, the intercept of a model with no predictors is an estimate of the log odds of being in the honors class for the whole sample. We can also transform the log odds back into a probability.
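Transforming a log odds back into a probability is the inverse-logit; a minimal sketch:

```python
import math

def inv_logit(log_odds):
    """Convert a log odds back into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Round-trip: probability -> odds -> log odds -> probability
p = 0.8
log_odds = math.log(p / (1 - p))  # ln(4)
print(round(inv_logit(log_odds), 3))  # 0.8 recovered
```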

Logistic regression with one dichotomous predictor In our dataset, what are the odds of a male being in the honors class, and of a female? We can compute these odds by hand from the table: for males, the odds of being in the honors class are (17/91) / (74/91) = 17/74 = .23; for females, (32/109) / (77/109) = 32/77 = .42. The ratio of the odds for females to the odds for males is (32/77) / (17/74) = (32 × 74) / (77 × 17) = 1.809. So the odds for males are 17 to 74, the odds for females are 32 to 77, and the odds for females are about 81% higher than the odds for males.

Now relate the odds for males and females to the output of the logistic regression. The intercept of −1.471 is the log odds for males, because male is the reference group (female = 0). Using the odds we computed above for males, we can confirm this: log(.23) = −1.47. The coefficient for female is the log of the odds ratio between the female and male groups: log(1.809) = .593. So we can obtain the odds ratio by exponentiating the coefficient for female. Most statistical packages display both the raw regression coefficients and the exponentiated coefficients for logistic regression models.
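The intercept and coefficient in this slide can be reproduced directly from the table's counts, because with a single binary predictor the maximum-likelihood estimates are just the sample log odds:

```python
import math

# Counts from the slide: honors vs. not honors, by sex
male_honors, male_not = 17, 74
female_honors, female_not = 32, 77

odds_male = male_honors / male_not        # 17/74, about 0.23
odds_female = female_honors / female_not  # 32/77, about 0.42

intercept = math.log(odds_male)                  # about -1.471 (male is the reference group)
coef_female = math.log(odds_female / odds_male)  # about 0.593

print(round(intercept, 3), round(coef_female, 3))
print(round(math.exp(coef_female), 3))  # odds ratio, about 1.809
```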

Sources: http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm