Unit 5c: Adding Predictors to the Discrete Time Hazard Model
© Andrew Ho, Harvard Graduate School of Education



Slide 2: Course Roadmap: Unit 5c
Today's agenda:
1. Reviewing Life Table Analysis, Hazard Functions, Survival Functions
2. Building the Discrete Time Hazard Model
3. Comparing Nested Models
Roadmap, starting from Multiple Regression Analysis (MRA):
Do your residuals meet the required assumptions? Test for residual normality. Use influence statistics to detect atypical datapoints.
If your residuals are not independent, replace OLS by GLS regression analysis: use individual growth modeling, or specify a multi-level model.
If time is a predictor, you need discrete-time survival analysis… (Today's Topic Area)
If your outcome is categorical, you need to use binomial logistic regression analysis (dichotomous outcome) or multinomial logistic regression analysis (polytomous outcome).
If you have more predictors than you can deal with: create taxonomies of fitted models and compare them; form composites of the indicators of any common construct; conduct a Principal Components Analysis; use Cluster Analysis; use Factor Analysis (EFA or CFA?).
If your outcome vs. predictor relationship is non-linear: use non-linear regression analysis, or transform the outcome or predictor.

Slide 3: New Data Example
Dataset: FIRSTSEX.txt (data described in FIRSTSEX_info.html).
Overview: Person-level dataset that records the high-school grade (7th to 12th) in which at-risk adolescent boys reported experiencing heterosexual sex for the first time, with data on:
1. Whether the boy had suffered a parental transition during early childhood (e.g., a parental divorce and/or a parental death prior to 7th grade).
2. The parents' level of antisocial behavior during the boy's early childhood.
Source: Capaldi, D. M., Crosby, L., & Stoolmiller, M. (1996). Predicting The Timing Of First Sexual Intercourse For At-Risk Adolescent Males. Child Development, 67.
Sample size: 822 person-period records.
More info: Singer & Willett, 2003, Chapter 11.
Research Questions:
1. Whether, and if so in which grade, at-risk adolescent boys report first experiencing heterosexual sex?
2. How the risk of reported first heterosexual sex depends on the boy's experiences with parental death and divorce during early childhood?
In DTSA terms, first sex is death, and survival is virginity.

Slide 4: Getting the Data in Shape
Adolescent Boy 1 first reported sexual intercourse in Grade 9, had no early parenting transitions, and had a high level of parent antisocial behavior during his childhood.
Adolescent Boy 2 has a censored record (never had intercourse through Grade 12), had early parenting transitions, and had an average level of parent antisocial behavior.
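The reshaping from one row per boy to one row per boy per grade at risk can be sketched in Python as a language-neutral illustration (the slides themselves work in Stata). The variable names EVENT, GRADE, and PT come from the slides; the function name, the ID field, and the arguments `last_grade` and `censored` are hypothetical, and the parent antisocial behavior measure is left out for brevity.

```python
def to_person_period(boy_id, last_grade, censored, pt, first_grade=7):
    """Expand one person-level record into one row per grade at risk."""
    rows = []
    for grade in range(first_grade, last_grade + 1):
        # EVENT is 1 only in the final observed grade of an uncensored record
        event = int(grade == last_grade and not censored)
        rows.append({"ID": boy_id, "GRADE": grade, "EVENT": event, "PT": pt})
    return rows


# Boy 1: first reported intercourse in Grade 9, no early parental transition
boy1 = to_person_period(1, last_grade=9, censored=False, pt=0)
# Boy 2: censored (no event through Grade 12), early parental transition
boy2 = to_person_period(2, last_grade=12, censored=True, pt=1)

for row in boy1 + boy2:
    print(row)
```

Under these assumptions Boy 1 contributes three rows (Grades 7 to 9) and Boy 2 contributes six rows (Grades 7 to 12), none of which records an event.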

Slide 5: A Histogram of GRADE
This tells us how many adolescent boys report first sex in any given grade. In red, we have the number of boys whose records are censored, who never report having sex through Grade 12. Why doesn't this give us a good sense of how Hazard Probabilities differ over time?

Slide 6: The Hazard Function
The sample probability of reporting the loss of your virginity in Grade 12 (conditional on never doing so before entering the risk set) is 32.5%.
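A sample hazard probability is the number of events in a grade divided by the number of boys still at risk when that grade begins. The Python sketch below shows that arithmetic. The event counts are illustrative assumptions, not values read from FIRSTSEX.txt; they were chosen to be consistent with the 822 person-period records and the 32.5% Grade 12 hazard quoted in the slides. The sketch also assumes censoring happens only after Grade 12.

```python
GRADES = [7, 8, 9, 10, 11, 12]
# Illustrative event counts per grade (an assumption, not read from the data file)
events = [15, 7, 24, 29, 25, 26]
n_boys = 180  # assumed number of boys at risk entering Grade 7


def hazard_table(events, n_start):
    """Return (risk_set, hazard) lists for a discrete-time life table."""
    risk_set, hazard = [], []
    at_risk = n_start
    for d in events:
        risk_set.append(at_risk)
        hazard.append(d / at_risk)
        at_risk -= d  # boys who experience the event leave the risk set
    return risk_set, hazard


risk_set, hazard = hazard_table(events, n_boys)
for g, n, h in zip(GRADES, risk_set, hazard):
    print(f"Grade {g}: at risk = {n:3d}, hazard = {h:.3f}")
print("person-period records:", sum(risk_set))
```

The sum of the risk sets is the number of rows in the person-period dataset, which is why a histogram of raw event counts understates risk in later grades: the denominator shrinks over time.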

Slide 7: The Survival Function
How to find the median survival time…
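The survival function is the running product of (1 - hazard), and the median survival time is where that function crosses 0.5. The Python sketch below uses linear interpolation between adjacent grades to locate the crossing; that is one common convention and is assumed here, since the slide's own method is not shown in the transcript. The hazards are built from illustrative counts (an assumption, not read from FIRSTSEX.txt).

```python
GRADES = [7, 8, 9, 10, 11, 12]
# Illustrative sample hazards (events / risk set); assumed, not read from the data
hazard = [15 / 180, 7 / 165, 24 / 158, 29 / 134, 25 / 105, 26 / 80]


def survival_from_hazard(hazard):
    """Survival probability after each period: cumulative product of (1 - h)."""
    surv, s = [], 1.0
    for h in hazard:
        s *= 1.0 - h
        surv.append(s)
    return surv


def median_lifetime(times, surv):
    """Linearly interpolate the time at which survival crosses 0.5."""
    prev_t, prev_s = times[0] - 1, 1.0
    for t, s in zip(times, surv):
        if s <= 0.5:
            return prev_t + (prev_s - 0.5) / (prev_s - s) * (t - prev_t)
        prev_t, prev_s = t, s
    return None  # survival never drops to 0.5, so the median is not reached


surv = survival_from_hazard(hazard)
median = median_lifetime(GRADES, surv)
for g, s in zip(GRADES, surv):
    print(f"Grade {g}: survival = {s:.3f}")
print("interpolated median survival time:", median)
```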

Slide 8: Conditional Hazard Probabilities and Functions
What are the hazard probabilities conditional on whether the boy had an early parental transition? The sample probability of reporting the loss of your virginity in Grade 12 (conditional on being a reported virgin entering the risk set) is 47% if you had an early parental transition and 19% if you did not.

Slide 9: Conditional Survival Probabilities and Functions
What are the survival probabilities conditional on whether the boy had an early parental transition? The sample probability of maintaining one's reported virginity past Grade 12 is 19% if you had an early parental transition and 47% if you did not. How to find the median survival time conditional on PT!
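Conditional hazard and survival functions come from running the same life-table arithmetic separately within each PT group. In the Python sketch below, the per-group event counts and group sizes are illustrative assumptions (not read from FIRSTSEX.txt), chosen to be consistent with the Grade 12 percentages quoted in the slides. Note that the 47% and 19% figures appear in both slides with the groups swapped because one pair describes hazard and the other describes survival.

```python
GRADES = [7, 8, 9, 10, 11, 12]


def life_table(events, n_start):
    """Return (hazard, survival) lists, assuming censoring only after the last period."""
    hazard, survival = [], []
    at_risk, s = n_start, 1.0
    for d in events:
        h = d / at_risk
        s *= 1.0 - h
        hazard.append(h)
        survival.append(s)
        at_risk -= d
    return hazard, survival


# Illustrative (events per grade, group size); assumptions, not read from the data
groups = {
    "PT=0 (no parental transition)": ([2, 2, 8, 8, 10, 8], 72),
    "PT=1 (parental transition)": ([13, 5, 16, 21, 15, 18], 108),
}

results = {name: life_table(ev, n) for name, (ev, n) in groups.items()}
for name, (hazard, survival) in results.items():
    print(name)
    for g, h, s in zip(GRADES, hazard, survival):
        print(f"  Grade {g}: hazard = {h:.2f}, survival = {s:.2f}")
```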

Slide 10: Conditional Hazard Logits as the Target of Modeling

Slide 11: From Person-Level to Person-Period Hazard Probabilities
From before… Remember how to generate hazard probabilities in a person-period dataset, by tabulate or by egen:
    . tabulate EVENT GRADE, column

Slide 12: Fitting the Discrete Time Hazard Model
These z-tests test the null hypothesis that the logit is 0 (the probability is 50%) in the population. The fitted logits/probabilities reproduce the sample logits/probabilities exactly and incorporate them into a statistical framework.
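When a logistic model has one dummy per grade and no other predictors, each fitted coefficient is simply the logit of that grade's sample hazard, which is why the fitted values reproduce the sample values exactly. The Python sketch below shows the two transformations involved. The sample hazards are illustrative assumptions (not read from FIRSTSEX.txt), and no model is actually fitted here.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Probability implied by a log-odds value."""
    return 1.0 / (1.0 + math.exp(-x))


# Illustrative sample hazards by grade (assumed, not read from the data)
sample_hazard = {7: 15 / 180, 8: 7 / 165, 9: 24 / 158,
                 10: 29 / 134, 11: 25 / 105, 12: 26 / 80}

# In a dummies-only model, the coefficient for each grade equals the sample logit
alpha = {g: logit(h) for g, h in sample_hazard.items()}

for g in sample_hazard:
    print(f"Grade {g}: logit = {alpha[g]:+.3f}, "
          f"back-transformed hazard = {inv_logit(alpha[g]):.3f}")

# A logit of 0 corresponds to a probability of 0.5, the z-test's null value
print("inv_logit(0) =", inv_logit(0.0))
```

Because every sample hazard here is below 0.5, every logit is negative, so rejecting the null hypothesis of a zero logit says only that the hazard differs from 50%.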

Slide 13: Now, Conditional Hazard Probabilities in a Person-Period Dataset
Remember that we don't model probabilities directly. Instead, we model their logits. By adding the categorical predictor variable, PT, to the "by" statement, we create conditional hazard probabilities in a person-period dataset. (It's straightforward for person-level data with the "ltable, by" approach.)
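In a person-period dataset, a conditional hazard probability is the mean of EVENT within each GRADE-by-PT cell. The Python sketch below illustrates that grouping on a handful of made-up rows; the rows are hypothetical toy data, not the FIRSTSEX records, and the function name is an assumption.

```python
from collections import defaultdict


def conditional_hazard(rows, keys=("GRADE", "PT")):
    """Mean of EVENT within each cell defined by `keys`."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [events, rows at risk]
    for r in rows:
        cell = tuple(r[k] for k in keys)
        totals[cell][0] += r["EVENT"]
        totals[cell][1] += 1
    return {cell: ev / n for cell, (ev, n) in sorted(totals.items())}


# Hypothetical toy person-period rows, for illustration only
rows = [
    {"GRADE": 7, "PT": 0, "EVENT": 0},
    {"GRADE": 7, "PT": 0, "EVENT": 0},
    {"GRADE": 7, "PT": 1, "EVENT": 1},
    {"GRADE": 7, "PT": 1, "EVENT": 0},
    {"GRADE": 8, "PT": 0, "EVENT": 1},
    {"GRADE": 8, "PT": 0, "EVENT": 0},
    {"GRADE": 8, "PT": 1, "EVENT": 1},
]

by_grade_pt = conditional_hazard(rows)
by_grade = conditional_hazard(rows, keys=("GRADE",))
print(by_grade_pt)
print(by_grade)
```

Dropping PT from `keys` gives the unconditional hazard by grade, which mirrors removing PT from the grouping in the slide.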

Slide 14: Fitting the Discrete Time Hazard Model with a Predictor
As always, the interpretation of the main effect is easier to see on the logit scale than it is on the probability scale.
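One way to write the model with grade dummies and a PT main effect is sketched below. The notation is an assumption, since the slide's own equation did not survive in the transcript: $h(t_{ij})$ is the hazard for boy $i$ in grade $j$, $D_{7},\dots,D_{12}$ are the grade dummies, and $\beta_1$ is the coefficient on PT.

```latex
\operatorname{logit} h(t_{ij})
  = \ln\!\left(\frac{h(t_{ij})}{1 - h(t_{ij})}\right)
  = \alpha_{7} D_{7ij} + \alpha_{8} D_{8ij} + \cdots + \alpha_{12} D_{12ij}
    + \beta_{1}\,\mathrm{PT}_{i}
```

On the logit scale, $\beta_1$ is a constant vertical shift between the PT = 1 and PT = 0 hazard functions in every grade. On the probability scale, the same shift produces gaps of different sizes in different grades, which is why the main effect is easier to read in logits.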

Slide 15: Building the Discrete Time Hazard Model with esttab
For each model, we store the estimates using eststo:, save the deviance (-2LL), save the predicted probabilities, and label the predicted probability variable. Instead of adding dummy variables for each grade, we fit a model that is linear in the logits. And then add the question predictor… We can then check whether this more parsimonious model sacrifices prediction relative to a model that uses all the grade dummies.

Slide 16: Building the Discrete Time Hazard Model with esttab
Two adolescent boys who differ by one grade level have an estimated difference in log-odds of reporting first sex of 0.4 logits. The fitted difference in log-odds between boys with and without parental transitions is 0.87, holding GRADE constant.
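Exponentiating a logit difference turns it into an odds ratio, which is often easier to communicate. The Python sketch below does this for the two coefficients quoted in the slide (0.4 for GRADE and 0.87 for PT); the variable names are assumptions.

```python
import math

b_grade = 0.40  # fitted log-odds difference per one-grade increase (from the slide)
b_pt = 0.87     # fitted log-odds difference, PT = 1 vs. PT = 0 (from the slide)

or_grade = math.exp(b_grade)
or_pt = math.exp(b_pt)

print(f"odds ratio per grade:     {or_grade:.2f}")
print(f"odds ratio for PT = 1:    {or_pt:.2f}")
```

Under this reading, the fitted odds of reporting first sex are roughly one and a half times as large with each additional grade, and somewhat more than twice as large for boys with an early parental transition, at any fixed grade.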

Slide 17: The Likelihood Ratio Chi-Square for DTSA
Another way to do this, which we can apply to the contrast between Models 3 and 5 (Do dummies give better fit in the population, after accounting for PT?), is directly in Stata:
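The arithmetic behind the likelihood ratio chi-square can be sketched in Python (this is not the Stata command the slide refers to). The statistic is the difference in deviance (-2LL) between the nested models, with degrees of freedom equal to the difference in the number of parameters. The deviance values below are hypothetical placeholders, not results from the slides. The parameter counts assume Model 3 has an intercept, GRADE, and PT (3 parameters) and Model 5 has six grade dummies plus PT (7 parameters). The p-value helper uses a closed form that is valid only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def lr_test(dev_reduced, dev_full, k_reduced, k_full):
    """Likelihood ratio test from two deviances and their parameter counts."""
    stat = dev_reduced - dev_full
    df = k_full - k_reduced
    return stat, df, chi2_sf_even_df(stat, df)


# Hypothetical deviances, for illustration only
stat, df, p = lr_test(dev_reduced=640.0, dev_full=635.0, k_reduced=3, k_full=7)
print(f"LR chi-square = {stat:.2f} on {df} df, p = {p:.4f}")
```

A large p-value would suggest that the grade dummies do not fit detectably better than the linear-in-logits specification once PT is in the model.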

Slide 18: Model 1: Constant Only

Slide 19: Model 2: Linear in Logits on GRADE

Slide 20: Model 3: Linear in Logits on GRADE and PT
Legend: Parental Transition; No Parental Transition.

Slide 21: Model 4: Dummies for GRADE

Slide 22: Model 5: Dummies for GRADE, Controlling for PT
Legend: Parental Transition; No Parental Transition.

Slide 23: Revisiting the Likelihood Ratio Chi-Square, Graphically
Another way to do this, which we can apply to the contrast between Models 3 and 5 (Do dummies give better fit in the population, after accounting for PT?), is directly in Stata:

Slide 24: Fitted Survival Functions for Model 3
Legend: Parental Transition; No Parental Transition.
At each discrete time point, there is a sample hazard probability and a fitted/estimated hazard probability. Each implies its own survival probability. These are simply those curves. The dotted lines are the model-implied or fitted survival functions.
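A fitted survival function follows from fitted hazards in the same way a sample survival function follows from sample hazards: back-transform each fitted logit to a probability, then take the running product of (1 - hazard). The Python sketch below does this for a linear-in-logits model. The slopes 0.40 (GRADE) and 0.87 (PT) are the values quoted earlier in the slides; the intercept of -6.0 is a hypothetical placeholder, so the printed curves are illustrative only.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fitted_survival(b0, b_grade, b_pt, pt, grades=range(7, 13)):
    """Survival implied by fitted hazards from a linear-in-logits model."""
    surv, s = [], 1.0
    for g in grades:
        h = inv_logit(b0 + b_grade * g + b_pt * pt)  # fitted hazard in grade g
        s *= 1.0 - h
        surv.append(s)
    return surv


B0 = -6.0                  # hypothetical intercept, not taken from the slides
B_GRADE, B_PT = 0.40, 0.87  # slopes quoted in the slides

s_no_pt = fitted_survival(B0, B_GRADE, B_PT, pt=0)
s_pt = fitted_survival(B0, B_GRADE, B_PT, pt=1)
for g, a, b in zip(range(7, 13), s_no_pt, s_pt):
    print(f"Grade {g}: fitted survival PT=0 = {a:.3f}, PT=1 = {b:.3f}")
```

With a positive PT coefficient, the fitted survival curve for boys with a parental transition lies below the curve for boys without one in every grade.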