EHA: Terminology and basic non-parametric graphs

EHA: Terminology and basic non-parametric graphs Sociology 229 Advanced Regression Class 4 Copyright © 2010 by Evan Schofer Do not copy or distribute without permission

Announcements
- Assignment 2 due
Agenda:
- Assignment 3 handed out
- Event history analysis: basic issues

Review: Why we need EHA
- Example: Drug dosage and mortality
- Question: What are the limits of using OLS regression to model time-to-mortality?
- Answer:
  - Censoring: some patients don’t die before the study ends
  - Violation of normality assumptions: the outcome variable is not normally distributed
  - This also causes issues for “censored normal regression”
- Question: What about logistic regression?
- Answer: It fails to use information on timing.

Motivation
- Event history analysis is more than just a “fix” for censoring and violations of normality
- EHA concepts and data structures put “dynamic” processes in the foreground
- In short, EHA helps us think about how time matters.

EHA: Overview and Terminology
- EHA is referred to as “dynamic” modeling, i.e., it addresses the timing of outcomes: rates
- The dependent variable is best conceptualized as a rate of some occurrence
  - Not a “level” or “amount” as in OLS regression
  - Think: “How fast?” “How often?”
- The “occurrence” may be something that can occur only once for each case, e.g., mortality
- Or it may be repeatable, e.g., marriages, strategic alliances.

EHA: Types of Questions
Some types of questions EHA can address:
1. Mortality: Does drug dosage reduce rates?
   - Does the “rate” decrease with larger doses?
   - Also: control for race, gender, treatment options, etc.
2. Life stage transitions: timing of marriage
   - Is the rate affected by gender, class, religion?
3. Organizational mortality
   - Is the rate affected by size, historical era, competition?
4. Inter-state war
   - Is the rate affected by economic, political factors?

EHA: Overview
- EHA involves both descriptive and parametric analysis of data
- Just like regression:
  - Scatterplots, partial plots = descriptive
  - OLS model/hypothesis tests = parametric
- Descriptive analyses/plots
  - Allow description of the overall rate of some outcome
  - For all cases, or for various subgroups
- Parametric models
  - Allow hypothesis testing about variables that affect the rate (and can include control variables).

EHA Terminology: States & Events
EHA has evolved its own terminology:
- “State” = the “state of being” of a case
  - Conceptualized in terms of discrete phenomena, e.g., alive vs. dead
- “State space” = the set of all possible states
  - Can be complex: single, married, divorced, widowed
- “Event” = occurrence of the outcome
  - Also called “transition” or “failure”
  - A shift from “alive” to “dead”, or from “single” to “married”
  - Occurs at a specific, known point in time

Terminology: Risk & Spells
- “Risk set” = the set of all cases capable of experiencing the event
  - e.g., those “at risk” of experiencing mortality
  - Note: the risk set often changes over time
  - It shrinks as cases experience events
  - Or grows, if new cases enter the study
- “Spell” = a chunk of time that a case experiences, bounded by events and/or the start or end of the study
  - As in “I’m gonna sit here for a spell…”
  - Sometimes called a “duration”.

States, Spells, & Events: Visually
- If we assign numeric values to states, it is easy to graph cases over time as they experience one or more spells
- Example: drug & mortality study
  - States: Alive = 0, Dead = 1
  - Time = measured in months
  - Starting at zero, when the study begins
  - Ending at 60 months, when the study ends (5 years).

States, Spells, & Events: Visually
Example of mortality at month 33
[Figure: state (0 or 1) plotted against time in months (0 to 60). Spell #1 ends with the event; Spell #2 runs to the end of the study.]
Note: It takes 2 spells to describe this case. But we may only be interested in the first spell (because there is no possibility of change after the transition to state = 1).

States, Spells, & Events: Visually
Example of a patient who is cured: doesn’t experience mortality during the study
[Figure: state (0 or 1) plotted against time in months (0 to 60). Spell #1 runs to the end of the study.]
Note: Only 1 spell is needed. The spell indicates a consistent state (0) for the period of time in which we have information.

More Terminology: Censoring
- Note: In both cases, data runs out after month 60, even if the patient is still alive
- In temporal analysis, we rarely have data for all relevant time for all cases
- “Censored” = indicates the absence of data before or after a certain point in time
  - As in: “data on cases is censored at 60 months”
- “Right censored” = no data after a time point
- “Left censored” = no data before a time point

States, Spells, & Events: Visually
A more complex state space: marital status
0 = single, 1 = married, 2 = divorced, 3 = widowed
Individual history: married at 20, divorced at 27, remarried at 33, right censored at 45
[Figure: state plotted against age in years (16 to 44), showing Spells #1 through #4.]

Measuring States and Times
- EHA, in short, is the analysis of spells
- It takes into account the duration of spells, and whether or not there was a change of state at the end
- States at the start and end of a spell are measured by assigning pre-defined values to a variable
  - Much like logit/probit or multinomial logit
- Times at the start and end of a spell must also be measured
- Time unit = the time metric in the study, e.g., minutes, hours, days, months, years, etc.
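One way to picture "measuring states and times" is as one record per spell. The sketch below is illustrative only: the `Spell` class and its field names are hypothetical, and the starting age of 16 is read off the axis of the marital-status figure rather than stated in the slides.

```python
from dataclasses import dataclass
from typing import Optional

# State codes from the marital-status example.
SINGLE, MARRIED, DIVORCED, WIDOWED = 0, 1, 2, 3

@dataclass
class Spell:
    """One spell for one case (hypothetical record layout)."""
    start: float                 # time the spell begins (here: age in years)
    end: float                   # time the spell ends (event or end of observation)
    origin: int                  # state occupied during the spell
    destination: Optional[int]   # state entered at the end; None if right-censored

    @property
    def duration(self) -> float:
        return self.end - self.start

    @property
    def censored(self) -> bool:
        return self.destination is None

# Married at 20, divorced at 27, remarried at 33, right-censored at 45.
history = [
    Spell(16, 20, SINGLE, MARRIED),
    Spell(20, 27, MARRIED, DIVORCED),
    Spell(27, 33, DIVORCED, MARRIED),
    Spell(33, 45, MARRIED, None),
]

durations = [s.duration for s in history]
n_events = sum(1 for s in history if not s.censored)
print(durations, n_events)
```

Each record carries exactly what the slide lists: the state at the start and end of the spell, and the times at which the spell starts and ends.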

Time Clock
- Time clock = the time reference of the analysis
- Possibilities:
  - Duration since start of study
  - Chronological age of case (person, firm, country)
  - Duration since end of last spell, i.e., the clock is set to zero at the start of each spell
  - Historical time: the actual calendar date
- The choice of time clock can radically change the analysis and the meaning of results
- It is crucial to choose a clock that makes sense for the hypotheses you wish to test

Time Clocks Visually: Age
[Figure: state plotted against age in years (16 to 44), showing Spells #1 through #4 up to the end of the study.]
EHA examines the rate of transitions as a function of a person’s age

Time Clocks Visually: Duration
Single from 16-20 (4 years), married from 20-27 (7 years), divorced from 27-33 (6 years), remarried from 33-45 (12 years)
[Figure: state plotted against duration in years, showing Spells #1 through #4.]
EHA examines the rate of transitions as a function of a person’s duration in their current state
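Switching from the age clock to the duration clock amounts to resetting time to zero at the start of every spell. A minimal sketch using the marital-history numbers above (the function name `to_duration_clock` is made up for illustration):

```python
# (start_age, end_age) for each spell in the marital-history example
age_clock = [(16, 20), (20, 27), (27, 33), (33, 45)]

def to_duration_clock(spells):
    """Re-express each spell with the clock reset to zero at its start."""
    return [(0, end - start) for start, end in spells]

duration_clock = to_duration_clock(age_clock)
for (a0, a1), (d0, d1) in zip(age_clock, duration_clock):
    print(f"age {a0}-{a1}  ->  duration {d0}-{d1}")
```

The same four spells now answer a different question: not "at what age do transitions happen?" but "how long into a state do transitions happen?".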

Time Clocks: General Advice
- Different time clocks have different strengths
  - We’ll discuss this more…
- Chronological age = good for processes clearly linked to age
  - Biological things: fertility, mortality
  - Liability of newness
- Historical time = useful for examining the impact of historical change on ongoing phenomena
  - E.g., effects of changing regulatory regimes on rates of strategic alliances

Moving Toward Analyses: Example
- Example: Employee retention
  - How long after hiring before employees quit?
- Data: Sample of 12 employees at McDonald’s
- Time clock/time unit: duration of employment from time of hiring (measured in days)
- 2 possible states: employed & no longer employed
  - We are uninterested in subsequent hires
  - Therefore, we focus on the initial spell, ending in quitting.

Example: Employee Retention
Visually: red line indicates length of employment spell for each case
[Figure: one line per case plotted against time in days (0 to 120); right-censored cases are marked.]

Simple EHA Descriptives
Question: What simple things can we do to describe this sample of 12 employees?
1. Average duration of employment
   - Only works if all (or nearly all) have quit
   - Many censored cases make the “average” meaningless
   - Otherwise, this is a fairly useful summary statistic
   - Gives a sense of the overall speed of events
   - Especially useful when broken down by sub-groups, e.g., average by gender or compensation plan.

Descriptives: Average Duration
Simply calculate the mean time-to-quitting
Average = 33.4 days
[Figure: employment spells plotted against time in days (0 to 120); right-censored cases are marked.]
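The caveat about censoring can be seen in a small sketch. The actual 12-employee sample is not given in this transcript, so the numbers below are invented. They are chosen so that the mean among quitters happens to equal the 33.4 days on the slide, but whether the slide's figure was computed this way is an assumption.

```python
# Hypothetical data (invented for illustration): days employed, and whether the
# spell ended in quitting (1) or was right-censored at the end of the study (0).
days = [5, 9, 12, 15, 18, 23, 35, 50, 70, 97, 120, 120]
event = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

# Mean time-to-quitting among those observed to quit.
quit_days = [d for d, e in zip(days, event) if e]
mean_quitters = sum(quit_days) / len(quit_days)

# Treating censored spells as if they ended at the censoring time.
mean_all = sum(days) / len(days)

print(f"mean among quitters: {mean_quitters:.1f} days")
print(f"mean over all spells: {mean_all:.1f} days")
```

Neither number is the true average tenure: the first ignores the two longest-serving employees, and the second pretends they quit on the day observation stopped. With only two censored cases the distortion is modest; with many, the "average" stops meaning much.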

Simple EHA Descriptives
Question: What simple things can we do to describe this sample of 12 employees?
2. Compute the “Half Life” of employee tenure
   - i.e., median failure time… a better option than “mean”
   - Determine the time at which attrition equals 50%
   - Also highlights the overall turnover rate
   - Note: the exact value is calculable even if there are censored cases, provided at least half of the sample has had the event
   - Again, computing for sub-groups is useful

Descriptives: Half Life
Determine the time when ½ of the sample has had the event
Half life = 23 days
[Figure: employment spells plotted against time in days (0 to 120); right-censored cases are marked.]
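A minimal sketch of the half-life calculation, again on invented data (the real sample is not in the transcript). The function `half_life` is hypothetical and assumes that no case is censored before the half-life is reached, as when censoring happens only at the end of the study.

```python
import math

# Hypothetical data (invented for illustration): days employed, and whether the
# spell ended in quitting (1) or was right-censored (0).
days = [5, 9, 12, 15, 18, 23, 35, 50, 70, 97, 120, 120]
event = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

def half_life(days, event):
    """Earliest time by which at least half of all cases have had the event.

    Returns None if fewer than half of the cases are observed to have the event.
    Assumes no case is censored before the returned time.
    """
    needed = math.ceil(len(days) / 2)
    event_times = sorted(d for d, e in zip(days, event) if e)
    if len(event_times) < needed:
        return None
    return event_times[needed - 1]

print(half_life(days, event))
```

Unlike the mean, this value does not depend on how long the censored cases would eventually have stayed, only on the fact that they outlasted the median.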

Simple EHA Descriptives
Question: What simple things can we do to describe this sample of 12 employees?
3. Tabulate (or plot) quitters in different time periods, e.g., 1-20 days, 21-40 days, etc.
   - Absolute numbers of “quitters” or “stayers”, or
   - Numbers of quitters as a proportion of “stayers”
   - Or look at the number (or proportion) who have “survived” (i.e., not quit)

Descriptives: Tables
For each period, determine the number or proportion quitting/staying
[Figure: employment spells plotted against time in days (0 to 120), divided into 20-day periods.]

EHA Descriptives: Tables
Period  Time range  Quit  Quit, % of all  Quit, % of remaining  Staying  Staying, % of all
1       Day 1-20    5     42%             42%                   7        58%
2       Day 21-40   2     17%             29%                   5        42%
3       Day 41-60   1     8%              20%                   4        33%
4       Day 61-80   1     8%              25%                   3        25%

EHA Descriptives: Tables
Remarks on EHA tables:
1. Results of tables change depending on the time ranges chosen (like a histogram)
   - E.g., comparing 20-day ranges vs. 10-day ranges
2. % quitters vs. quitters as a proportion of those still employed
   - The absolute % can be misleading, since the number of people left in the risk set tends to decrease
   - A low # of quitters can actually correspond to a very high rate of quitting for those remaining in the firm
   - Typically, these ratios are more socially meaningful than raw percentages.
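The difference between "% of all" and "% of remaining" is easy to see when the table is computed. The sketch below uses invented durations (the real sample is not in the transcript) chosen to have the same number of quitters per 20-day period as the table; `period_table` and its column names are hypothetical.

```python
# Hypothetical data (invented for illustration): days employed, and whether the
# spell ended in quitting (1) or was right-censored at the end of the study (0).
days = [5, 9, 12, 15, 18, 23, 35, 50, 70, 97, 120, 120]
event = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

def period_table(days, event, width=20, n_periods=4):
    """Tabulate quitters and stayers for consecutive periods of `width` days."""
    n_total = len(days)
    rows = []
    for k in range(n_periods):
        lo, hi = k * width + 1, (k + 1) * width
        at_risk = sum(1 for d in days if d >= lo)   # still employed when the period starts
        if at_risk == 0:
            break
        quits = sum(1 for d, e in zip(days, event) if e and lo <= d <= hi)
        staying = sum(1 for d in days if d > hi)    # still employed when the period ends
        rows.append({
            "period": f"Day {lo}-{hi}",
            "quits": quits,
            "pct_of_all": round(100 * quits / n_total),
            "pct_of_remaining": round(100 * quits / at_risk),
            "staying": staying,
            "pct_staying": round(100 * staying / n_total),
        })
    return rows

rows = period_table(days, event)
for row in rows:
    print(row)
```

In the last period a single quitter is only 8% of the original sample but 25% of those still employed, which is the point of remark 2. Changing `width` to 10 shows how much the picture depends on the chosen ranges (remark 1).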

EHA Descriptives: Plots We can also plot tabular information:

The Survivor Function: S(t)
- A more sophisticated version of % remaining
  - Calculated based on continuous time (calculus), rather than on some arbitrary interval (e.g., day 1-20)
- Survivor function S(t): the probability of not having had the event prior to time t
  - Always equal to 1 at time = 0 (when no events can have happened yet)
  - Decreases as more cases experience the event
  - When graphed, it is typically a decreasing curve
  - Looks a lot like % remaining
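The slides do not say which estimator produced their survivor plots. A common non-parametric choice is the Kaplan-Meier (product-limit) estimator, sketched here in plain Python on invented data; it is offered as an illustration, not as the method the course used.

```python
# Hypothetical data (invented for illustration): days employed, and whether the
# spell ended in quitting (1) or was right-censored (0).
days = [5, 9, 12, 15, 18, 23, 35, 50, 70, 97, 120, 120]
event = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

def kaplan_meier(days, event):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    surv = 1.0
    curve = []
    for t in sorted({d for d, e in zip(days, event) if e}):
        at_risk = sum(1 for d in days if d >= t)    # risk set just before t
        failures = sum(1 for d, e in zip(days, event) if e and d == t)
        surv *= 1 - failures / at_risk              # share of the risk set surviving t
        curve.append((t, surv))
    return curve

km = kaplan_meier(days, event)
for t, s in km:
    print(f"day {t:3d}: S(t) = {s:.3f}")
```

The estimate starts at 1, steps down at every observed quit, and never rises. Censored cases never cause a step; they only leave the risk set, which is how the estimator uses the information they do provide.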

Survivor Function: S(t) McDonald’s Example: Steep decreases indicate lots of quitting at around 20 days

Survivor Function: S(t)
- Interpretation: The survivor function reflects the probability of surviving beyond time t
  - A monotone, non-increasing function of time
  - Always starts at 1, decreases as cases experience events
- Let’s try to draw some possible survivor functions
  - For human mortality
  - For the failure of a computer hard-drive
  - For having a (first) baby
  - For large US cities having major protests in the civil rights movement.

Survivor Ex: First Marriage Compare survivor for women, men: Survivor plot for Men (declines later) Survivor plot for Women (declines earlier)

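The Hazard Function: h(t)
- A more sophisticated version of # events divided by # remaining
- Hazard function h(t) = the probability (per unit of time) of an event occurring at a given point in time, given that it hasn’t already occurred
- Formula (standard continuous-time definition, with T the time of the event): $h(t) = \lim_{\Delta t \to 0} P(t \le T < t + \Delta t \mid T \ge t) / \Delta t$
- Think of it as: the rate of events occurring for those at risk of experiencing the event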

The Hazard Function Example: High (and wide) peaks indicate lots of quitting

The Hazard Function: h(t)
- Interpretation: The hazard function reflects the rate of events at a given point in time
  - For cases that made it that far…
  - It reflects the “rate at which risk is accumulating”
- Let’s draw some hazard functions
  - For human mortality
  - For the failure of a computer hard-drive
  - For having a (first) baby
  - For large US cities having major protests in the civil rights movement.
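One rough way to put numbers on "the rate of events for cases that made it that far" is an occurrence/exposure rate: events in an interval divided by the person-days at risk in that interval. This treats the hazard as constant within each interval, which is an approximation and not necessarily how the hazard plots in the slides were produced. The data are invented and `interval_hazard` is a hypothetical helper.

```python
# Hypothetical data (invented for illustration): days employed, and whether the
# spell ended in quitting (1) or was right-censored (0).
days = [5, 9, 12, 15, 18, 23, 35, 50, 70, 97, 120, 120]
event = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

def interval_hazard(days, event, width=20, n_periods=4):
    """Quits per person-day at risk within each interval (lo, hi]."""
    rates = []
    for k in range(n_periods):
        lo, hi = k * width, (k + 1) * width
        # Person-days each still-employed case contributes inside the interval.
        exposure = sum(min(d, hi) - lo for d in days if d > lo)
        events = sum(1 for d, e in zip(days, event) if e and lo < d <= hi)
        rates.append(events / exposure if exposure else float("nan"))
    return rates

rates = interval_hazard(days, event)
for k, r in enumerate(rates):
    print(f"days {k * 20 + 1}-{(k + 1) * 20}: {r:.4f} quits per person-day")
```

The denominator shrinks as people leave, so the rate describes only those still at risk, in contrast to a raw count of quitters per period.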

Hazard Plot: First Marriage Hazard Rate: Full Sample

Cumulative Hazard Function: H(t)
- The “cumulative” or “integrated” hazard
  - Use calculus to “integrate” the hazard function
  - Recall: an integral represents the area under the curve of another function between 0 and t
- Hazard is a rate, like “60 miles per hour”
  - Integrated hazard is the total distance driven… in three hours, it would be 180 miles
- Integrated hazard functions never decrease (the opposite of the survivor function)
  - Big increases indicate that the hazard is high
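A standard non-parametric estimate of the integrated hazard is the Nelson-Aalen estimator, which adds up (events / number at risk) at each event time. The slides do not name the estimator behind their plots, so this is an illustrative sketch on invented data rather than the course's method.

```python
# Hypothetical data (invented for illustration): days employed, and whether the
# spell ended in quitting (1) or was right-censored (0).
days = [5, 9, 12, 15, 18, 23, 35, 50, 70, 97, 120, 120]
event = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

def nelson_aalen(days, event):
    """Return [(t, H(t))] at each distinct event time."""
    cum = 0.0
    curve = []
    for t in sorted({d for d, e in zip(days, event) if e}):
        at_risk = sum(1 for d in days if d >= t)    # risk set just before t
        failures = sum(1 for d, e in zip(days, event) if e and d == t)
        cum += failures / at_risk                   # hazard increment at t
        curve.append((t, cum))
    return curve

na = nelson_aalen(days, event)
for t, h in na:
    print(f"day {t:3d}: H(t) = {h:.3f}")
```

Like the odometer in the driving analogy, the running total only goes up: closely spaced events produce a steep climb, and long gaps between events produce flat stretches.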

Cumulative Hazard Function: H(t) Example: “Flat” areas indicate low hazard rate Steep increases indicate peaks in hazard rate

The Cumulative Hazard: H(t)
- Interpretation: The cumulative hazard function reflects the total amount of risk that has accumulated at a given point in time…
- Let’s draw some integrated hazard functions
  - For human mortality
  - For the failure of a computer hard-drive
  - For having a (first) baby
  - For large US cities having major protests in the civil rights movement.

Integrated Hazard: First Marriage Compare integrated hazard for women, men: Integrated hazard for men increases more slowly (and remains lower) than for women

Cumulative Hazard Example Ex: Edelman et al. 1999: EEOC Grievance procedures

EHA Plots: Remarks
- Plotting EHA data is extremely useful
  - Helps you understand your data
  - Helps you figure out the correct time clock
  - Helps you to develop arguments about dynamics
  - Allows you to compare different groups
- We’ll pick this up in the future.