BIOST 536 Lecture 1 - Introduction

Slide 1: Lecture 1 - Introduction
Overview of course
- Focus is on binary outcomes
- Some ordinal outcomes considered
Simple examples
Definitions
Hypothetical example
Framingham example

Slide 2: Binary outcome data
Outcome variable Y
- Response to treatment: success versus failure
- Outcome of a screening or diagnostic test: positive versus negative
- Disease prevalence at a specific time or age: present versus absent
- Disease incidence in a time interval (0, t), where t may be a predefined single time point at which all outcomes are assessed; otherwise consider survival models
- Dichotomization of a continuous variable:
  - High blood pressure (SBP ≥ 140)
  - Overweight (BMI ≥ 30)
  - Low birthweight (< 2500 grams)
- Case (disease) versus control (no disease) {using "outcome" in an imprecise sense}

Slide 3: Covariates
Vector of variables called "X"
- Randomized treatment
- Exposure (Exposed = 1 versus Unexposed = 0)
- Degree of exposure (continuous; ordinal)
- Demographic variables: age, race, gender
- Baseline characteristics; propensity score
Scientific variable of interest (usually treatment or exposure)
Other "control" variables
- Precision variables
- Confounders or potential confounders
- Effect modifiers

Slide 4: Example 1. Prostate cancer (Hosmer & Lemeshow, 1.6.3)
- Y = 1 if the tumor penetrates the capsule; 0 if no penetration
- X = Gleason score, age, race, rectal exam, PSA, volume
- Could decide whether there is any association of Gleason score with capsule penetration
- Could decide whether there is an ordinal association of Gleason score with the outcome
- May want to adjust for age, race, PSA, etc., or determine whether they modify the association

Slide 5: Example 2. Low birthweight determinants (Hosmer & Lemeshow, 1.6.2)
- Y = low birthweight (< 2500 grams)
- X = mother's age, weight, smoking, number of prenatal visits, etc.
- Not dichotomizing birthweight may be more powerful, but we may have a particular interest in this definition of low birthweight
- Is the goal to discover new, possibly mutable, risk factors for LBW? (scientific hypothesis testing/hypothesis generation)
- The goal may instead be to predict LBW from known risk factors (empirical models with validation)

Slide 6: Example 3. Framingham: prediction of coronary heart disease
- Y = CHD within a prescribed time window
- X = age, gender, blood pressure, serum cholesterol, smoking, etc.
- The Framingham cohort is followed with regular exams and updated risk factors
- If the goal is to predict 10-year CHD risk: How do we handle dropouts and deaths due to other causes before 10 years? Do we use the updated risk factors in the prediction model?
- May prefer true cohort methods (survival analysis) or time-matched case-control studies

Slide 7: Example 4. Leprosy case-control study (Clayton & Hills 18.1)
- Y = 1 if leprosy case; 0 otherwise
- X = age, presence/absence of BCG scar
- BCG is a vaccine against tuberculosis, but it may protect against leprosy as well
- Stratified case-control data: leprosy cases and healthy controls in each age group, classified by BCG scar versus no BCG scar

Slide 8: Definitions
Time or age of disease occurrence: T
Incidence rate at time t (or age t): $\lambda(t) = \lim_{\Delta t \to 0} \Pr(t \le T < t + \Delta t \mid T \ge t) / \Delta t$
- The numerator is the probability that a person without disease at time t gets the disease in the next small interval of time
- Incidence rates theoretically can change continuously in time
- We often assume that incidence rates are piecewise constant over longer durations
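
Under the piecewise-constant assumption, the rate in each interval can be estimated as the number of new cases divided by the disease-free person-time at risk in that interval. Below is a minimal Python sketch. The age bands and counts are hypothetical and do not come from the lecture data.

```python
from dataclasses import dataclass

@dataclass
class Band:
    """A time (or age) band over which the incidence rate is assumed constant."""
    start: float
    end: float
    events: int         # new cases arising in the band
    person_time: float  # disease-free follow-up time accumulated in the band

def piecewise_rates(bands):
    """Estimate the constant rate in each band as events / person-time."""
    return [(b.start, b.end, b.events / b.person_time) for b in bands]

# Hypothetical counts and person-years by age band
bands = [Band(40, 50, events=12, person_time=4000.0),
         Band(50, 60, events=30, person_time=5000.0)]
for start, end, rate in piecewise_rates(bands):
    print(f"ages {start}-{end}: {1000 * rate:.1f} per 1,000 person-years")
```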

Slide 9: Definitions
Incidence rate ratio at time t (or age t) for individuals in Group 1 (exposed) versus Group 0 (unexposed): $\mathrm{IRR}(t) = \lambda_1(t) / \lambda_0(t)$, where $\lambda_1$ and $\lambda_0$ are the group incidence rates
- We often assume that the IRR is constant, i.e., that it does not depend on time even if the incidence rates do
- Incidence rates can be modeled and compared with Poisson regression models
- Poisson and binomial models are similar when the event rate is low, but their variances differ
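
The sketch below estimates an IRR from event counts and person-time. It also shows why the Poisson and binomial variances nearly agree only when the event probability is small. The SE formula for log IRR is the usual Poisson-based one, which is an assumption here because the slide gives no formula. The counts are hypothetical.

```python
import math

def incidence_rate_ratio(events1, pt1, events0, pt0, z=1.96):
    """IRR (group 1 vs group 0) with a Wald CI on the log scale.

    Uses the Poisson-based SE of log IRR: sqrt(1/events1 + 1/events0).
    """
    irr = (events1 / pt1) / (events0 / pt0)
    se_log = math.sqrt(1 / events1 + 1 / events0)
    return irr, irr * math.exp(-z * se_log), irr * math.exp(z * se_log)

def count_variances(n, p):
    """Variance of an event count with mean n*p: binomial n*p*(1-p) vs Poisson n*p."""
    return n * p * (1 - p), n * p

print(incidence_rate_ratio(30, 5000.0, 12, 4000.0))  # hypothetical counts
for p in (0.01, 0.30):
    binom_var, pois_var = count_variances(1000, p)
    print(f"p={p}: binomial var={binom_var:.1f}, Poisson var={pois_var:.1f}")
```

The binomial variance is smaller by the factor (1 - p), which is negligible for rare events but substantial for common ones.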

Slide 10: Definitions - continued
Risk of disease by time t (or age t), starting at time 0: $R(t) = \Pr(T \le t)$
- Cumulative risk over the time interval, assuming no competing risks
- "Disease prevalence" if there are competing risks
Risk of disease within a time interval
Risk ratio at time t (or age t) for individuals in Group 1 (exposed) versus Group 0 (unexposed): $\mathrm{RR}(t) = R_1(t) / R_0(t)$
- As time increases, the probabilities increase for common diseases
- The risk ratio is usually not constant over time
Risk difference at time t (or age t): $\mathrm{RD}(t) = R_1(t) - R_0(t)$
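
The claim that the risk ratio is usually not constant over time can be checked under simple assumptions. Assume constant incidence rates and no competing risks, so that $R(t) = 1 - e^{-\lambda t}$. The hypothetical rates below fix the IRR at 2. The risk ratio then works out to $1 + e^{-\lambda_0 t}$, which drifts from about 2 toward 1 as risks accumulate.

```python
import math

def cumulative_risk(rate, t):
    """Risk of disease by time t under a constant incidence rate and no competing risks."""
    return 1 - math.exp(-rate * t)

rate0, rate1 = 0.02, 0.04  # hypothetical rates per year, so the IRR is fixed at 2
for t in (1, 10, 40):
    r0, r1 = cumulative_risk(rate0, t), cumulative_risk(rate1, t)
    print(f"t={t:>2}: R0={r0:.3f}  R1={r1:.3f}  RR={r1 / r0:.2f}  RD={r1 - r0:.3f}")
```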

Slide 11: Cohort model 2 x 2 table
Cohort study design
- Assemble two disease-free groups: $n_1$ exposed and $n_0$ unexposed individuals
- Follow both groups for exactly the same period of time and measure the occurrence of disease (Y = 1 if disease; Y = 0 if no disease)
- Then let $r_1$ = exposed with disease and $r_0$ = unexposed with disease

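Slide 12: Measures of association for 2 x 2 tables ($p_1$, $p_0$ = risk of disease in the exposed and unexposed groups)
Risk difference: $\mathrm{RD} = p_1 - p_0$
- Excess risk model
- Measures the absolute effect of exposure
Risk ratio: $\mathrm{RR} = p_1 / p_0$
- Measures the relative effect of exposure
- The ratio is constrained by $p_0$: because $p_1 \le 1$, $\mathrm{RR} \le 1/p_0$
  - If $p_0 = 0.50$, then the maximum RR is 2.0
  - If $p_0$ is small, there is little range restriction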
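
The sketch below computes both measures and the upper bound on the RR implied by $p_1 \le 1$. The first call uses risks of 0.38 and 0.20 in groups of 50 (19 and 10 cases), matching the risks in the Stata output for the hypothetical example. The second call is hypothetical.

```python
def risk_measures(r1, n1, r0, n0):
    """Risk difference, risk ratio, and the largest RR possible given p0 (since p1 <= 1)."""
    p1, p0 = r1 / n1, r0 / n0
    return {"RD": p1 - p0, "RR": p1 / p0, "max_RR": 1 / p0}

m = risk_measures(19, 50, 10, 50)
print(f"RD={m['RD']:.2f} RR={m['RR']:.2f} max RR={m['max_RR']:.1f}")  # RD=0.18 RR=1.90 max RR=5.0
print(risk_measures(40, 50, 25, 50)["max_RR"])  # p0 = 0.50, so the maximum RR is 2.0
```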

Slide 13: Measures of association for 2 x 2 tables - continued
Odds ratio: $\mathrm{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}$
- Measures the relative effect of exposure
- Not constrained
- Transposing the table (exchanging the roles of exposure and outcome) gives the same OR; swapping the two rows (or the two columns) gives 1/OR
- Natural parameter for statistical inference
  - Logistic regression models log odds ratios
  - Asymptotic p-values and CIs work well even in small to moderate sample sizes
- The OR approximates the RR for outcomes with low probability
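
The invariance properties and the rare-outcome approximation can be checked numerically. This sketch uses the cell counts a, b, c, d of a table with rows Y = 1, Y = 0 and columns X = 1, X = 0. The first table reuses the 19/50 versus 10/50 example. The rare-outcome table is hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = ad/bc for the table [[a, b], [c, d]]: rows Y=1, Y=0; columns X=1, X=0."""
    return (a * d) / (b * c)

a, b, c, d = 19, 10, 31, 40
print(odds_ratio(a, b, c, d))  # original table
print(odds_ratio(a, c, b, d))  # transposed (X and Y exchanged): same OR
print(odds_ratio(c, d, a, b))  # rows swapped (Y = 0 modeled): 1/OR

# Rare outcome (hypothetical): risks 0.002 vs 0.001, so RR = 2 and the OR is close to 2
ra, rb, rc, rd = 20, 10, 9980, 9990
rr = (ra / (ra + rc)) / (rb / (rb + rd))
print(odds_ratio(ra, rb, rc, rd), rr)
```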

Slide 14: Statistical inference
Null hypothesis: $H_0: p_1 = p_0$, equivalently RD = 0, RR = 1, or OR = 1
- The hypotheses are equivalent, but tests based on the different measures are not mathematically equivalent
Estimation:

          X=1               X=0               Total
  Y=1     r_1 = a           r_0 = b           a + b
  Y=0     n_1 - r_1 = c     n_0 - r_0 = d     c + d
  Total   n_1               n_0               N

Slide 15: Estimation of risk ratio
$\widehat{\mathrm{RR}} = \hat{p}_1 / \hat{p}_0 = \frac{a/n_1}{b/n_0}$
Note: the RR for modeling Y = 1 is not equal to 1/RR for modeling Y = 0
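
The slide's formulas for the SE and CI are not preserved. The sketch below assumes the standard delta-method SE of log RR, $\sqrt{1/a - 1/n_1 + 1/b - 1/n_0}$, with a Wald interval on the log scale. It uses the 19/50 versus 10/50 example. It also shows the note above: the RR for Y = 0 is 0.62/0.80 = 0.775, not 1/1.9.

```python
import math

def risk_ratio_ci(a, n1, b, n0, z=1.96):
    """RR = (a/n1)/(b/n0) with a Wald CI computed on the log scale."""
    rr = (a / n1) / (b / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

rr, lo, hi = risk_ratio_ci(19, 50, 10, 50)
rr_y0 = ((50 - 19) / 50) / ((50 - 10) / 50)  # RR for the outcome Y = 0
print(f"RR = {rr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
print(rr_y0, 1 / rr)  # the two are not reciprocals
```

With this log-scale interval the CI for the example just includes 1.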

Slide 16: Odds ratio estimation
$\widehat{\mathrm{OR}} = \frac{a/c}{b/d} = \frac{ad}{bc}$
Note: the OR for modeling Y = 1 equals 1/OR for modeling Y = 0
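
A common interval for the OR uses Woolf's SE of the log OR, $\sqrt{1/a + 1/b + 1/c + 1/d}$. This is the same Woolf estimator named later in the Stata output. The minimal sketch below applies it to the 19/10/31/40 table of the hypothetical example and confirms that modeling Y = 0 gives 1/OR.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """OR = ad/bc with Woolf's SE for log OR: sqrt(1/a + 1/b + 1/c + 1/d)."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, or_ * math.exp(-z * se_log), or_ * math.exp(z * se_log)

est, lo, hi = odds_ratio_woolf(19, 10, 31, 40)
est_y0, _, _ = odds_ratio_woolf(31, 40, 19, 10)  # rows swapped: model Y = 0
print(f"OR = {est:.2f}, Woolf 95% CI [{lo:.3f}, {hi:.2f}]")  # the interval just includes 1
print(est_y0 * est)  # 1 up to rounding: the OR for Y = 0 is 1/OR
```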

Slide 17: Simple hypothetical example: prospective study
Randomized study with n = 50 in each group

          Exposed   Unexposed   Total
  Y=1     19        10          29
  Y=0     31        40          71
  Total   50        50          100

Estimation: $\hat{p}_1 = 0.38$, $\hat{p}_0 = 0.20$, $\widehat{\mathrm{RD}} = 0.18$
Technically this would be a statistically significant difference
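
The uncorrected Pearson chi-square for this table reproduces the chi2(1) = 3.93 shown in the Stata csi output. The sketch below computes the 1-df p-value with the standard library, using the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and 1-df p-value for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p

chi2, p = pearson_chi2_2x2(19, 10, 31, 40)
print(f"chi2(1) = {chi2:.2f}, p = {p:.3f}")
```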

Slide 18: Simple hypothetical example - continued
Technically this would not be statistically significant

Slide 19: Simple hypothetical example - continued
There are alternative CI calculations for the OR
Technically this would not be statistically significant

Slide 20: Stata epitab commands
. csi , or
Among 100 subjects with 29 cases, the risk is .38 in the exposed, .2 in the unexposed, and .29 overall
Point estimates: risk difference .18, risk ratio 1.9; the attributable fractions (among the exposed and in the population) and the odds ratio with a Cornfield CI are also reported
chi2(1) = 3.93

Slide 21
. csi , or woolf
Same point estimates (risk difference .18, risk ratio 1.9); the OR CI uses Woolf's method; chi2(1) = 3.93
. csi , or exact
Reports 1-sided and 2-sided Fisher's exact P
- Using Woolf's estimator for the SE gives a non-significant OR, as does Fisher's exact test
- The standard chi-square test ("score test") and the Cornfield CI give significant results
- Use logistic regression
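
Fisher's exact test conditions on all margins, so the exposed-case count follows a hypergeometric distribution. Below is a stdlib-only sketch using exact fractions. For the two-sided P it assumes the common convention of summing all tables no more probable than the observed one; other software may use a different rule. With equal group sizes the null distribution is symmetric, so here the two-sided P is exactly twice the one-sided P. Consistent with the slide, the two-sided P exceeds 0.05.

```python
from fractions import Fraction
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for [[a, b], [c, d]] (rows Y=1, Y=0; columns X=1, X=0).

    Returns (one-sided P(A >= a), two-sided P summing tables no more likely than observed).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def prob(x):
        # Exact hypergeometric probability that the Y=1, X=1 cell equals x
        return Fraction(comb(col1, x) * comb(n - col1, row1 - x), comb(n, row1))

    p_obs = prob(a)
    one_sided = sum(prob(x) for x in range(a, hi + 1))
    two_sided = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs)
    return float(one_sided), float(two_sided)

one_sided, two_sided = fisher_exact_2x2(19, 10, 31, 40)
print(f"1-sided P = {one_sided:.4f}, 2-sided P = {two_sided:.4f}")
```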

Slide 22
. list r n x
(lists the two grouped records: r subjects with Y = 1 out of n at each value of x)
. blogit r n x, or
Logistic regression for grouped data: Number of obs = 100, LR chi2(1) = 3.98
- blogit fits logistic regression to grouped data; we will usually use a standard logistic regression program, but the data here were grouped
- Gives the same CI as Woolf's estimator
- An SE is reported for the OR, but the z statistic, Wald test, and CI are based on the log OR and its SE
- The Wald p-value is 0.05; the LR test (chi2(1) = 3.98) gives p < 0.05
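
The sketch below shows what a grouped-data logistic fit does. It maximizes the binomial likelihood by Newton-Raphson, then reports the Wald SE (from the inverse information) and the LR statistic. The grouped records are an assumption: r = 19, 10 out of n = 50, 50 at x = 1, 0, consistent with the risks of .38 and .2. The model has one parameter per group, so it is saturated. The fitted OR therefore equals ad/bc and the SE of log OR equals Woolf's, which matches the slide's remark.

```python
import math

def grouped_logit(r, n, x, iters=25):
    """Fit logit P(Y=1 | x) = b0 + b1*x to grouped binomial data by Newton-Raphson.

    r[i] events out of n[i] subjects share covariate value x[i].
    Returns b0, b1, the SE of b1 (inverse information), and the LR chi-square for b1 = 0.
    """
    def loglik(b0, b1):
        total = 0.0
        for ri, ni, xi in zip(r, n, x):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            total += ri * math.log(p) + (ni - ri) * math.log(1 - p)
        return total

    b0 = b1 = 0.0
    for _ in range(iters):
        u0 = u1 = i00 = i01 = i11 = 0.0  # score vector and information matrix
        for ri, ni, xi in zip(r, n, x):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1 - p)
            u0 += ri - ni * p
            u1 += (ri - ni * p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        b0 += (i11 * u0 - i01 * u1) / det  # Newton step: inverse information times score
        b1 += (i00 * u1 - i01 * u0) / det
    se_b1 = math.sqrt(i00 / det)  # information from the final (converged) iteration
    p_bar = sum(r) / sum(n)
    lr = 2 * (loglik(b0, b1) - loglik(math.log(p_bar / (1 - p_bar)), 0.0))
    return b0, b1, se_b1, lr

# Assumed grouped data: 19/50 exposed (x=1) and 10/50 unexposed (x=0) with Y = 1
b0, b1, se_b1, lr = grouped_logit(r=[19, 10], n=[50, 50], x=[1, 0])
z = b1 / se_b1
print(f"OR = {math.exp(b1):.3f}, SE(log OR) = {se_b1:.3f}, z = {z:.2f}, LR chi2(1) = {lr:.2f}")
```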

Slide 23: Framingham study
5209 individuals identified in 1948 in Framingham, MA
Biennial exams for blood pressure, serum cholesterol, weight
Endpoints include occurrence of coronary heart disease (CHD) and deaths due to:
- CHD, including sudden death (MI)
- Cerebrovascular accident (CVA)
- Cancer (CA)
- Other causes

Slide 24: Framingham study
Stimulus for developing the method of logistic regression
- Too many potential covariates to have tables of all combinations
- Simultaneous consideration of several variables
- Could have continuous variables; not necessary to categorize continuous covariates
Linear discriminant analysis was used first; logistic regression came later
- Linear discriminant analysis and logistic regression can be mathematically equivalent: normally distributed covariates with equal variances in cases and non-cases imply a logistic model
- LDA requires normality of covariates, but logistic regression does not
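
The LDA-to-logistic link can be checked directly. Suppose X given Y = k is Normal($\mu_k$, $\sigma^2$) with a common $\sigma$. Bayes' rule then gives posterior log odds that are exactly linear in x, with slope $(\mu_1 - \mu_0)/\sigma^2$ and intercept $\log(\pi/(1-\pi)) - (\mu_1^2 - \mu_0^2)/(2\sigma^2)$. The sketch below compares the two at a few points. All parameter values are hypothetical.

```python
import math

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def posterior_log_odds(x, mu1, mu0, sigma, prior1):
    """log[P(Y=1|x) / P(Y=0|x)] by Bayes' rule when X | Y=k ~ Normal(mu_k, sigma^2)."""
    return (math.log(prior1) + normal_logpdf(x, mu1, sigma)
            - math.log(1 - prior1) - normal_logpdf(x, mu0, sigma))

# Hypothetical values: covariate means in cases and non-cases, common SD, prevalence
mu1, mu0, sigma, prior1 = 250.0, 230.0, 40.0, 0.10
beta = (mu1 - mu0) / sigma ** 2
alpha = math.log(prior1 / (1 - prior1)) - (mu1 ** 2 - mu0 ** 2) / (2 * sigma ** 2)
for x in (180.0, 230.0, 300.0):
    print(x, posterior_log_odds(x, mu1, mu0, sigma, prior1), alpha + beta * x)
```

The converse does not hold: a logistic model can be correct when the covariates are far from normal, which is why logistic regression needs fewer assumptions.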

Slide 25: Framingham data analysis
. use "H:\Biostat\Biost536\Fall2007\data\Datasets\framfull.dta", clear
. summ
(Obs, Mean, Std. Dev., Min, and Max for: lexam, surv, cause, cexam, chd, cva, ca, oth, sex, age, ht, wt, sc1, sc2, dbp, sbp, mrw, smok)

Slide 26: Framingham data analysis
Restrict attention to males age 40+ with known values of serum cholesterol, smoking, and relative weight, and with no evidence of CHD at the first exam
. drop if sex>1 | age < 40 | sc1 > 1000 | smok > 1000 | mrw > 1000 | cexam==1
(4299 observations deleted)
Note that Stata treats missing values as large numbers, so this command eliminates them along with any observations above the cutoff value of 1000; it is better to test for missing values explicitly:
. drop if sex>1 | age < 40 | sc1==. | smok==. | mrw ==. | cexam==1
(4299 observations deleted)
This leaves 910 observations for analysis; create grouped variables for SBP, cholesterol, and age
. gen bpg=sbp
. recode bpg min/126=1 127/146=2 147/166=3 167/max=4
. gen scg=sc1
. recode scg min/199=1 200/219=2 220/259=3 260/max=4
. gen agp=age
. recode agp 40/44=1 45/49=2 50/54=3 55/59=4 60/max=5
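
For readers working outside Stata, here is a minimal Python sketch of the same grouping rules. It treats missing values explicitly, in the spirit of the slide's advice. One difference from Stata is deliberate and should be noted: Stata's recode leaves values that match no range unchanged, whereas this hypothetical group_code function returns None for them.

```python
import math

INF = math.inf
# (low, high, code) ranges, inclusive at both ends like Stata's a/b ranges
SBP_GROUPS = [(-INF, 126, 1), (127, 146, 2), (147, 166, 3), (167, INF, 4)]
CHOL_GROUPS = [(-INF, 199, 1), (200, 219, 2), (220, 259, 3), (260, INF, 4)]
AGE_GROUPS = [(40, 44, 1), (45, 49, 2), (50, 54, 3), (55, 59, 4), (60, INF, 5)]

def group_code(value, rules):
    """Return the group code for value, or None if it is missing or falls in no range."""
    if value is None or math.isnan(value):
        return None  # handle missing explicitly rather than as a very large number
    for low, high, code in rules:
        if low <= value <= high:
            return code
    return None

print([group_code(v, SBP_GROUPS) for v in (118, 140, 150, 190, None)])
print([group_code(v, AGE_GROUPS) for v in (42, 58, 67, float("nan"))])
```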

Slide 27: Compare levels of CHD risk by grouped variables
. tab chd agp
(CHD by age group; 910 observations in total)
. tabodds chd agp, or
(odds ratio, chi2, P>chi2, and 95% CI for each age group)
Test of homogeneity (equal odds): chi2(4) = 9.51
Score test for trend of odds: chi2(1) = 7.55
There is an overall significant association of age with CHD risk, which appears to increase with age
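
The odds-ratio column of tabodds compares each level with the first. The sketch below computes level-specific odds ratios against the reference level. It uses Woolf intervals, which may not match the CIs tabodds actually reports. The counts are hypothetical because the Framingham table values are not shown in this transcript.

```python
import math

def odds_ratios_vs_baseline(cases, noncases, z=1.96):
    """Odds ratio of each level versus level 1, with Woolf 95% CIs (reference level has none)."""
    a0, c0 = cases[0], noncases[0]
    out = [(1.0, None, None)]
    for a, c in zip(cases[1:], noncases[1:]):
        or_ = (a * c0) / (c * a0)
        se = math.sqrt(1 / a + 1 / c + 1 / a0 + 1 / c0)
        out.append((or_, or_ * math.exp(-z * se), or_ * math.exp(z * se)))
    return out

# Hypothetical CHD cases and non-cases in five age groups (not the Framingham counts)
cases = [10, 15, 20, 25, 20]
noncases = [190, 185, 180, 175, 70]
for level, (or_, lo, hi) in enumerate(odds_ratios_vs_baseline(cases, noncases), start=1):
    ci = "reference" if lo is None else f"[{lo:.2f}, {hi:.2f}]"
    print(f"group {level}: OR = {or_:.2f} {ci}")
```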

Slide 28
. tab chd bpg
(CHD by SBP group; 910 observations in total)
. tabodds chd bpg, or
(odds ratio for each SBP group, with a test of homogeneity (equal odds) on 3 df and a score test for trend of odds on 1 df)
There is an overall significant association of SBP with CHD risk, which appears to increase with SBP; most of the heterogeneity is explained by the trend
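
To make "most of the heterogeneity is explained by the trend" concrete, the K - 1 df homogeneity chi-square can be split into a 1-df trend component and a remainder for departure from linear trend. The sketch below uses the Pearson statistic and a Cochran-Armitage trend statistic with equally spaced scores. These are assumptions: tabodds' own formulas may differ slightly. The counts are hypothetical.

```python
def homogeneity_and_trend(cases, totals, scores=None):
    """2 x K table: Pearson chi-square for equal risks (K-1 df), Cochran-Armitage trend
    chi-square (1 df), and the remainder measuring departure from a linear trend."""
    if scores is None:
        scores = list(range(1, len(cases) + 1))
    n_all = sum(totals)
    p_bar = sum(cases) / n_all
    homog = sum((r - n * p_bar) ** 2 / (n * p_bar * (1 - p_bar))
                for r, n in zip(cases, totals))
    x_bar = sum(n * x for n, x in zip(totals, scores)) / n_all
    t = sum(r * (x - x_bar) for r, x in zip(cases, scores))
    var_t = p_bar * (1 - p_bar) * sum(n * (x - x_bar) ** 2 for n, x in zip(totals, scores))
    trend = t ** 2 / var_t
    return homog, trend, homog - trend

# Hypothetical CHD cases and group sizes for four ordered SBP groups
homog, trend, departure = homogeneity_and_trend([12, 18, 27, 30], [300, 280, 200, 130])
print(f"homogeneity chi2(3) = {homog:.2f}, trend chi2(1) = {trend:.2f}, "
      f"departure chi2(2) = {departure:.2f}")
```

When the group risks are exactly linear in the scores, the departure component is zero and the trend test captures all of the heterogeneity.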

Slide 29
. tab chd scg
(CHD by cholesterol group; 910 observations in total)
. tabodds chd scg, or
(odds ratio for each cholesterol group, with a test of homogeneity (equal odds) on 3 df and a score test for trend of odds on 1 df)
There is an overall significant association of cholesterol with CHD risk; the trend test is significant, but the odds ratios do not increase consistently as cholesterol increases

Slide 30: Course overview summary
Primarily interested in binary outcomes
- 2 x 2 table (case status x exposed/unexposed)
- 2 x K table (case status x level of exposure)
- Stratified 2 x 2 tables (case status x exposure, controlling for a confounder)
- Case status x scientific variable(s) of interest, controlling for other covariates
Statistical methods
- Hypothesis testing (Wald, score, and likelihood ratio tests; permutation tests for small samples)
- Parameter estimation and CIs (usually odds ratios)
- Stratification (implicitly controlling for confounders)
- Explicitly modeling confounders