Overview of Linear Models
Webinar: Tuesday, May 22, 2012
Deborah Rosenberg, PhD
Research Associate Professor, Division of Epidemiology and Biostatistics
University of IL School of Public Health
Training Course in MCH Epidemiology

[Slide template figure: density of Student's t with 10 d.f.; chi-square densities with 1, 2, 3, 5, and 8 d.f.]

Training Course in MCH EPI, 2012
Course Topics Focusing on Multivariable Regression
- Model Building Approaches
- Modeling Ordinal and Nominal Outcomes
- Multilevel Modeling
- Trend Analysis
- Population Attributable Fraction
- Propensity Scores
- Modeling Risk Differences
We need to have some perspective...

Introduction
So, let's keep this in mind:
"...technical expertise and methodology are not substitutes for conceptual coherence. Or, as one student remarked a few years ago, public health spends too much time on the 'p' values of biostatistics and not enough time on values."
Jonathan M. Mann, "Medicine and Public Health, Ethics and Human Rights." The Hastings Center Report, Vol. 27, No. 3 (May-Jun. 1997), pp. 6-13. Published by The Hastings Center.

Introduction
- Multivariable analysis means acknowledging and accounting for the intricacies of the real world, as reflected in the relationships among a set of variables.
- Multivariable analysis is complex, particularly with observational as opposed to experimental data.
- The accuracy of estimates from multivariable analysis depends on the application of appropriate analytic methods. So, therefore, does the accuracy of the conclusions drawn and of any public health action taken.

Introduction
The challenge for an MCH epidemiologist goes beyond carrying out complex multivariable analysis to include:
- advocating for and facilitating the routine incorporation of complex multivariable methods into the work of public health agencies
- guiding interpretation of findings
- working to design reporting templates
- working to build dissemination strategies
- working to link findings with action plans or policy recommendations

Review of the Basics: Basic Components of Any Statistical Analysis
1. Sample statistic(s) (observed values), e.g. p, r
2. Population parameter(s) (expected values)
3. Sample size, n
4. Sample variance(s) / standard error(s)
5. Critical values from the appropriate probability distribution: z, t, chi-square, F

Review of the Basics
The study design and sampling strategy (cohort, case-control, cross-sectional, longitudinal, etc.) affect the statistical analysis that can be carried out:
- Which measures of occurrence can be reported?
- Which measures of association can be reported?
- How will standard errors for confidence intervals and statistical tests be calculated?

Review of the Basics: Measures of Occurrence
- Means summarize continuous variables and are assumed to follow a normal distribution.
- Proportions summarize discrete variables and are assumed to follow the binomial distribution. Some proportions are also said to be Poisson distributed if the numerator is very small compared to the denominator.
- Rates, also based on discrete variables, are typically said to be Poisson distributed.

Review of the Basics: Measures of Association
Difference measures:
- Between two or more means
- Between two or more proportions (attributable risk)
- Between a mean and a standard
- Between a proportion and a standard
Ratio measures:
- Relative Risk / Relative Prevalence
- Odds Ratio
- Rate Ratio / Hazard Ratio

Review of the Basics
The 2x2 table: the framework for constructing the ratio measures
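The following sketch shows how the ratio and difference measures come out of a 2x2 table. It uses hypothetical counts, not course data, and the usual cell labels a, b, c, d. The course itself works in SAS, where proc freq with the RELRISK and RISKDIFF options reports these quantities. This Python version only reproduces the arithmetic, and the function name is illustrative.

```python
# Cell layout assumed for the 2x2 table (counts below are hypothetical):
#                 outcome   no outcome
#   exposed          a          b
#   unexposed        c          d


def two_by_two_measures(a, b, c, d):
    """Return the risk ratio, odds ratio, and risk difference."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_ratio = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    risk_difference = risk_exposed - risk_unexposed
    return risk_ratio, odds_ratio, risk_difference


rr, or_, rd = two_by_two_measures(a=30, b=70, c=15, d=85)
print(f"RR = {rr:.3f}, OR = {or_:.3f}, RD = {rd:.3f}")
```

With these counts, the risk ratio is 2.0 and the odds ratio is about 2.43. The two diverge because the outcome is not rare (30% vs. 15%).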

Review of the Basics: Assessing the Accuracy of Statistics
We use probability distributions to evaluate how close to or far from the "truth" our statistics are. We do this by calculating a range of values constructed to include the "true" population value with a given probability; for example, 95% of such intervals would contain it. This range is a confidence interval. It can be calculated around both measures of occurrence (e.g. incidence or prevalence) and measures of association (e.g. odds ratios or relative risks).

Review of the Basics: Tests of Statistical Significance
Confidence intervals around measures of association provide evidence for or against equality, for example by showing whether a ratio of 1 or a difference of 0 falls within the interval. Statistical tests go beyond this by generating a specific probability. This is the probability of observing a difference at least as large as the one seen in our sample if there were truly no difference and only chance, imposed by the sampling process, were operating. This probability is the p-value.

Review of the Basics
We again use probability distributions, this time to formally test hypotheses about population parameters using sample statistics.

Review of the Basics
Multivariable modeling should be the culmination of an analytic strategy that includes articulating a conceptual framework and carrying out preliminary analysis.
BEFORE any multivariable modeling:
- Select variables of interest
- Define levels of measurement, sometimes more than once, for a given variable
- Examine univariate distributions
- Examine bivariate distributions

Review of the Basics
BEFORE any multivariable modeling:
- Perform single-factor stratified analysis to assess confounding and effect modification
- Rethink variables and levels of measurement
- Perform multiple-factor stratified analysis with different combinations of potential confounders / effect modifiers
These steps should never be skipped!

Review of the Basics
With confounding, the association between a risk factor and a health outcome is the same (or close to the same) in each stratum, but the adjusted association differs from the crude. With effect modification, the association between a risk factor and a health outcome varies from stratum to stratum.

Review of the Basics: Assessing Effect Modification
- Stratified analysis: Are the stratum-specific measures of association different (heterogeneous)?
- Regression analysis: Is the beta coefficient for the interaction term (the product of two variables) large?
Regardless of the method, if the stratum-specific estimates differ, then reporting a weighted average will mask the important stratum-specific differences. Stratum-specific differences can be statistically tested.
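The following worked form shows how an interaction term yields stratum-specific measures in a logistic model. It assumes an exposure E and a potential effect modifier Z, both coded 0/1; these symbols are illustrative notation, not variables from the course data. If the interaction coefficient β3 is 0, the two stratum-specific odds ratios are equal.

```latex
\[ \ln(\text{odds}) = \beta_0 + \beta_1 E + \beta_2 Z + \beta_3 (E \times Z) \]
\[ \text{OR}_{E \mid Z=0} = e^{\beta_1}, \qquad \text{OR}_{E \mid Z=1} = e^{\beta_1 + \beta_3} \]
```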

Review of the Basics: Assessing Confounding
- Standardization: Does the standardized measure differ from the unstandardized measure?
- Stratified analysis: Does the adjusted measure of association differ from the crude measure of association?
- Regression analysis: Does the beta coefficient for a variable in a model that includes a potential confounder differ from the beta coefficient for that same variable in a model that does not include the potential confounder?
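For the stratified-analysis comparison, a common adjusted measure is the Mantel-Haenszel summary odds ratio. Below is a minimal Python sketch that computes it alongside the crude odds ratio from the collapsed table. The strata are hypothetical, constructed so that both stratum-specific odds ratios equal 2.0, and the function names are illustrative.

```python
# Each stratum is a hypothetical 2x2 table (a, b, c, d):
#   a = exposed with outcome,   b = exposed without outcome
#   c = unexposed with outcome, d = unexposed without outcome


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary (adjusted) odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def crude_or(strata):
    """Odds ratio after collapsing all strata into one 2x2 table."""
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return (a * d) / (b * c)


# Hypothetical strata of a confounder: both stratum-specific ORs equal 2.0
strata = [(50, 50, 10, 20), (6, 24, 10, 80)]
print(f"crude OR = {crude_or(strata):.2f}, MH OR = {mantel_haenszel_or(strata):.2f}")
```

The stratum-specific odds ratios agree, so there is no effect modification. The crude OR (about 3.78), however, differs from the adjusted OR (2.00), so these made-up data illustrate confounding.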

Review of the Basics: Assessing Confounding
Regardless of the method, if the adjusted estimate differs from the crude estimate of association, then confounding is present. Whether a difference between the crude and adjusted measures is meaningful is a matter of judgment, since there is no formal statistical test for the presence of confounding. By convention, epidemiologists consider confounding to be present if the adjusted measure of association differs from the crude measure by 10% or more.
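Below is a small Python sketch of the 10% rule using hypothetical odds ratios. It assumes the percent difference is taken relative to the adjusted estimate. Some analysts divide by the crude estimate instead, or compare on the log scale, so treat the denominator as a convention to state rather than a fixed rule.

```python
def percent_change(crude, adjusted):
    """Absolute percent difference between crude and adjusted estimates,
    expressed relative to the adjusted estimate (one common convention)."""
    return abs(crude - adjusted) / adjusted * 100


crude_est, adjusted_est = 2.20, 1.90   # hypothetical odds ratios
change = percent_change(crude_est, adjusted_est)
confounding_flag = change >= 10
print(f"{change:.1f}% change; meets 10% rule: {confounding_flag}")
```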

Review of the Basics: Moving toward Multivariable Modeling: Jointly Assessing a Set (but which set?) of Variables
"A sufficient confounder group is a minimal set of one or more risk factors whose simultaneous control in the analysis will correct for joint confounding in the estimation of the effect of interest. Here, 'minimal' refers to the property that, for any such set of variables, no variable can be removed from the set without sacrificing validity."
Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic Research: Principles and Quantitative Methods. New York: Van Nostrand Reinhold Company, 1982, p. 276.

Linear Models: General Considerations
The most common regression models used to analyze health data express the hypothesized association between risk or other factors and an outcome as a linear (straight line) relationship:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$, where Y is the dependent variable and $X_1, \ldots, X_k$ are the independent variables.
This equation is relevant to any linear model. What differentiates one modeling approach from another is the structure of the outcome variable and the corresponding structure of the errors.

Linear Models: General Considerations
The straight-line relationship includes an intercept and one or more slope parameters. The differences between the actual data points and the regression line are the errors.
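The sketch below fits a least-squares line to a hypothetical four-point dataset in Python. It makes the intercept, the slope, and the errors (residuals) concrete. `fit_simple_ols` is an illustrative helper, not a SAS or library routine; in SAS the same fit would come from proc reg.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


x = [0, 1, 2, 3]          # hypothetical predictor values
y = [1, 3, 2, 5]          # hypothetical outcome values
b0, b1 = fit_simple_ols(x, y)

# Residuals: observed value minus the point on the fitted line
errors = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print("residuals:", [round(e, 2) for e in errors])
```

With an intercept in the model, the least-squares residuals sum to zero.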

Linear Models: General Considerations
Regression analysis is an alternative to, and an extension of, simpler methods used to test hypotheses about associations:
- For means, regression analysis is an extension of t-tests and analysis of variance.
- For proportions or rates, regression analysis is an extension of chi-square tests from contingency tables (crude and stratified analysis).

Linear Models: General Considerations
Why not just do stratified analysis? Why use regression modeling approaches?
Unlike stratified analysis, regression approaches:
1. more efficiently handle many variables and the sparse data that stratification by many factors may imply
2. can accommodate both continuous and discrete variables, both as outcomes and as independent variables.

Linear Models: General Considerations
Unlike stratified analysis, regression approaches:
3. allow for examination of multiple factors (independent variables) simultaneously in relation to an outcome (dependent variable); all variables can be considered "exposures" or "covariates" depending on the hypotheses
4. provide more flexibility in assessing effect modification and controlling confounding.

Linear Models: General Considerations: The Purpose of Modeling
Sometimes regression modeling is carried out to assess one association; other variables are included to adjust for confounding or account for effect modification. In this scenario, the focus is on obtaining the 'best' estimate of the single association.
Sometimes regression modeling is carried out to assess multiple, competing exposures, or to identify a set of variables that together predict the outcome.

Linear Models: General Considerations
The utility of regression models lies in their ability to handle many independent variables simultaneously. Models may be quite complex. They can include both continuous and discrete measures, and measures at the individual level and/or at an aggregate level such as census tract, zip code, or county. Interpretation of the slopes, or "beta coefficients", can be equally complex. Used singly or in combination, they reflect measures of occurrence (means, proportions, rates) or measures of association (odds ratios, relative risks, rate ratios).

Linear Models: General Considerations: The Traditional, 'Normal' Regression Model
This model has the following properties:
- The outcome Y is continuous and normally distributed.
- The Y values are independent.
- The errors are independent and normally distributed, with mean 0 and constant variance across levels of X.
- The expected value (mean) of Y is linearly related to X (a straight-line relationship exists).

Linear Models: General Considerations
When the outcome variable is not continuous and normally distributed, a linear model cannot be written in the same way, and the properties listed above no longer hold. For example, if the outcome variable is a proportion or rate:
- The errors are not normally distributed.
- The variance is not constant across levels of X. By definition, the binomial variance p(1-p) changes with p, and the Poisson variance, which equals the mean, changes with r.
- The expected value (proportion or rate) is not linearly related to X (a straight-line relationship does not exist).
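To see why the constant-variance assumption fails, the Python sketch below tabulates the variance of a single 0/1 observation, p(1-p), at a few hypothetical values of p. It also restates that a Poisson count's variance equals its mean. The function names are illustrative.

```python
def binomial_variance(p):
    """Variance of a single Bernoulli (0/1) observation with probability p."""
    return p * (1 - p)


def poisson_variance(mean):
    """For a Poisson count the variance equals the mean."""
    return mean


for p in (0.05, 0.2, 0.5, 0.8):
    print(f"p = {p:.2f}: binomial variance = {binomial_variance(p):.4f}")

for mean in (2, 5, 10):
    print(f"mean = {mean}: Poisson variance = {poisson_variance(mean)}")
```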

Linear Models: General Considerations
When an outcome is a proportion or rate, its relationship with a risk factor is not linear.
[Figure: proportion with the outcome plotted against x]
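The Python sketch below illustrates the non-linearity. It assumes hypothetical logit-scale coefficients (beta0 = -3, beta1 = 1). Each one-unit step in x adds the same amount to the log odds, but the resulting change in the proportion is not constant.

```python
import math


def expit(eta):
    """Inverse logit: convert a value on the log-odds scale to a proportion."""
    return 1 / (1 + math.exp(-eta))


beta0, beta1 = -3.0, 1.0          # hypothetical logit-scale coefficients
xs = range(7)                     # x = 0, 1, ..., 6
props = [expit(beta0 + beta1 * x) for x in xs]

# Each one-unit step in x adds beta1 = 1.0 to the log odds,
# but the change in the proportion itself is not constant.
steps = [round(b - a, 3) for a, b in zip(props, props[1:])]
print(steps)
```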

Linear Models: General Considerations: Generalized Linear Models
How can a linear modeling approach be applied to the many health outcomes that are proportions or rates? The normal, binomial, Poisson, exponential, chi-square, and multinomial distributions are all in the exponential family. It is therefore possible to define a "link function" that transforms the expected value of an outcome variable from any of these distributions so that it is linearly related to a set of independent variables. The error terms can also be defined to correspond to the form of the outcome variable.
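The general form below writes the link function as g applied to the expected value of Y, with three of the links used in this course shown explicitly. The notation g and μ = E[Y] is an assumed convention, not taken from the slides.

```latex
\[ g\big(E[Y]\big) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \]
\[ \text{identity: } g(\mu) = \mu \qquad \text{log: } g(\mu) = \ln(\mu) \qquad \text{logit: } g(p) = \ln\!\left(\frac{p}{1-p}\right) \]
```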

Linear Models: General Considerations: Generalized Linear Models
Some common link functions:
- identity (untransformed)
- natural log
- logit
- cumulative logit
- generalized logit
The interpretation of the parameter estimates (the beta coefficients) changes depending on whether and how the outcome variable has been transformed, that is, on which link function has been used.

Linear Models: General Considerations
The logit link function (logistic regression): the model is a linear equation on the logit scale and a non-linear equation on the scale of the proportion.
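Written out, with p the proportion with the outcome, the two forms of the logistic model are as follows (a standard formulation, using the same betas as the general linear model above):

```latex
\[ \ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \quad \text{(linear)} \]
\[ p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}} \quad \text{(non-linear)} \]
```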

Linear Models: General Considerations
The natural log link function (log-binomial regression, or Poisson regression with count data): the model is linear on the log scale and non-linear on the scale of the proportion or count.
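Correspondingly, the two forms of a log-link model for a proportion p are as follows (for counts or rates, p is replaced by the expected count or rate):

```latex
\[ \ln(p) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \quad \text{(linear)} \]
\[ p = e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k} \quad \text{(non-linear)} \]
```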

Linear Models: General Considerations
- 'Normal' regression: Link=Identity, Dist=Normal
- Logistic regression: Link=Logit, Dist=Binomial
- Log-binomial or Poisson regression with count data: Link=Log, Dist=Binomial or Dist=Poisson

Linear Models: General Considerations: Ordinal and Nominal Models
For an outcome with k+1 categories:
- Ordinal outcome (cumulative logit): both the numerator and the denominator change from one logit to the next.
- Nominal outcome (generalized logit): the denominator is a fixed (reference) category.
Source: http://www.indiana.edu/%7Estatmath/stat/all/cat/2b1.html
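The two sets of logits can be written as shown below. This assumes the categories are numbered 1 to k+1, with category k+1 as the reference for the nominal model. The cumulative form follows the P(Y ≤ j) parameterization used by proc logistic; other software may model P(Y > j) and flip the sign. The ordinal model shares one β across cut-points (the proportional odds assumption), while the nominal model has a separate β_j for each comparison.

```latex
\[ \text{Ordinal (cumulative logit):}\quad \ln\!\left(\frac{P(Y \le j)}{P(Y > j)}\right) = \alpha_j + \beta X, \qquad j = 1, \ldots, k \]
\[ \text{Nominal (generalized logit):}\quad \ln\!\left(\frac{P(Y = j)}{P(Y = k+1)}\right) = \alpha_j + \beta_j X, \qquad j = 1, \ldots, k \]
```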

Linear Models: General Considerations: Some Models with Correlated Errors
Mixed models:
♦ Multilevel/clustered data
♦ Repeated measures/longitudinal data
♦ Matched data
♦ Time series analysis
♦ Spatial analysis

Linear Models: General Considerations: Some Other Multivariable Statistical Approaches
● Survival analysis (censored data): parametric; semi-parametric / proportional hazards
● Structural equation modeling / mediation analysis: exploring causal pathways
● Bayesian modeling

Linear Models: General Considerations: Regression Modeling Results, Measures of Occurrence
Predicted values: crude, adjusted, or stratum-specific. The predicted values are points on the regression line given particular values of the set of independent variables:
- 'Normal' model yields means
- Logistic model yields ln(odds)
- Binomial / Poisson models with a log link yield ln(proportions / rates)
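Predicted values from the logistic and log-link models are on the link scale, so they are usually back-transformed before being reported. Below is a minimal Python sketch with hypothetical predicted values. The helper names are illustrative and are not SAS output.

```python
import math


def log_odds_to_proportion(log_odds):
    """Inverse of the logit link."""
    return 1 / (1 + math.exp(-log_odds))


def log_to_proportion(log_value):
    """Inverse of the natural log link."""
    return math.exp(log_value)


# Hypothetical predicted values on the link (linear predictor) scale
pred_logistic = -2.0    # ln(odds) from a logistic model
pred_log_link = -2.5    # ln(proportion) from a log-binomial or Poisson model

print(round(log_odds_to_proportion(pred_logistic), 4))
print(round(log_to_proportion(pred_log_link), 4))
```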

Linear Models: General Considerations: Regression Modeling Results, Measures of Association
Beta coefficients: crude, adjusted, or stratum-specific. The measures of association are comparisons of points on the regression line at differing values of the independent variables.

Linear Models: General Considerations: Regression Modeling Approaches and Measures of Association
- 'Normal' regression: differences between means
- Log-binomial or Poisson regression: differences between log proportions; exponentiated, these are Relative Risks / Relative Prevalences
- Logistic regression (binary, cumulative, generalized logit): differences between log odds; exponentiated, these are Odds Ratio(s) for a single binary outcome, a set of binary outcomes, or an ordinal outcome
- Binomial regression (identity link): differences between proportions, i.e. Risk Differences / Attributable Risks
- Poisson regression (person-time data): differences between log rates; exponentiated, these are Rate Ratios

Linear Models: General Considerations: Regression Modeling Results, Measures of Association
General form of confidence intervals and hypothesis testing for a simple comparison: a single beta coefficient
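A standard Wald form for a single coefficient is given below. It uses the normal critical value; OLS output from proc reg uses the t distribution instead, which is nearly identical in large samples. For ratio measures (odds ratios, relative risks, rate ratios), the point estimate and both limits are exponentiated.

```latex
\[ \hat\beta \pm z_{1-\alpha/2}\,\widehat{SE}(\hat\beta), \qquad z_{0.975} = 1.96 \text{ for a 95\% CI} \]
\[ z = \frac{\hat\beta - 0}{\widehat{SE}(\hat\beta)}, \qquad \chi^2_{\text{Wald}} = \left(\frac{\hat\beta}{\widehat{SE}(\hat\beta)}\right)^2 \text{ with 1 d.f.} \]
```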

Common Linear Regression Models: Examples with Smoking and Birthweight

'Normal' Regression
Predicted values (means): predicted values use the entire regression equation, including the intercept.
Measures of association (differences between means): when two predicted values are compared, which is what a measure of association does, the intercept terms cancel out.
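For a single dichotomous X coded 1 and 0, the cancellation works out as follows:

```latex
\[ \hat{Y} = \beta_0 + \beta_1 X \]
\[ \hat{Y}_{X=1} - \hat{Y}_{X=0} = (\beta_0 + \beta_1) - \beta_0 = \beta_1 \]
```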

'Normal' Regression in SAS
/* Continuous birthweight, OLS regression */
proc reg data=one;
  model dbirwt = smoking;                  /* crude */
run;

proc reg data=one;
  model dbirwt = smoking late_no_pnc;      /* adjusted for late_no_pnc */
run;

/* Continuous birthweight, regression using maximum likelihood */
proc genmod data=one;
  model dbirwt = smoking / link=identity dist=normal;
run;

proc genmod data=one;
  model dbirwt = smoking late_no_pnc / link=identity dist=normal;
run;

'Normal' Regression
Descriptive Statistics and Simple t-test for Smoking and Birthweight

'Normal' Regression
"dbirwt" = birthweight (grams) from vital records

'Normal' Regression
model dbirwt = smoking;
Predicted value for smokers: mean birthweight = 3352.74 - 196.89(1) = 3155.85 g
Predicted value for non-smokers: mean birthweight = 3352.74 - 196.89(0) = 3352.74 g
Measure of association / comparison of predicted values:
Difference between means = 3155.85 - 3352.74 = -196.89 g
95% CI = -196.89 ± 1.96 × 6.29 = (-209.2, -184.6)
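The arithmetic on this slide can be reproduced in a few lines of Python, using the intercept, coefficient, and standard error printed on the slide. As on the slide, this uses the normal critical value 1.96. The variable names are illustrative.

```python
intercept = 3352.74       # mean birthweight (g) for non-smokers
beta_smoking = -196.89    # difference in means, smokers minus non-smokers
se_smoking = 6.29         # standard error of beta_smoking
z = 1.96                  # normal critical value for a 95% CI

mean_smokers = intercept + beta_smoking * 1
mean_nonsmokers = intercept + beta_smoking * 0
lower = beta_smoking - z * se_smoking
upper = beta_smoking + z * se_smoking

print(f"smokers: {mean_smokers:.2f} g, non-smokers: {mean_nonsmokers:.2f} g")
print(f"difference = {beta_smoking:.2f} g, 95% CI ({lower:.1f}, {upper:.1f})")
```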

'Normal' Regression with OLS in SAS

Logistic Regression: Predicted Values
When the outcome is a proportion with a logistic transformation, the predicted values are log odds.
Dichotomous independent variable coded 1 and 0:
In general:

Logistic Regression
Measures of Association (Beta Coefficients): Differences Between Log Odds, and the Odds Ratio
Dichotomous independent variable coded 1 and 0
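For a dichotomous X coded 1 and 0, the predicted log odds and their difference work out as follows. This is the standard derivation of the odds ratio from a logistic model:

```latex
\[ \ln(\text{odds}_{X=1}) = \beta_0 + \beta_1, \qquad \ln(\text{odds}_{X=0}) = \beta_0 \]
\[ \ln(\text{odds}_{X=1}) - \ln(\text{odds}_{X=0}) = \beta_1 \;\Rightarrow\; \text{OR} = \frac{\text{odds}_{X=1}}{\text{odds}_{X=0}} = e^{\beta_1} \]
\[ \text{In general: } \ln(\text{odds}) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \]
```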

Logistic Regression
Measures of Association (Beta Coefficients): Differences Between Log Odds, and the Odds Ratio
In general, the beta coefficient is the change in the logit for every unit change in X. For an ordinal or continuous variable, the test of the beta coefficient is a test of linear trend on the logit scale.

Logistic Regression
Confidence Intervals for Estimated Odds Ratios from a Logistic Regression Model
For dichotomous variables coded 1 and 0: 95% CI = $\exp\big(\hat\beta \pm 1.96\,\widehat{SE}(\hat\beta)\big)$
In general, for a single beta coefficient: 95% CI = $\exp\big(\text{diff}\cdot\hat\beta \pm 1.96\cdot\text{diff}\cdot\widehat{SE}(\hat\beta)\big)$,
where "diff" is the difference of interest in the values of the independent variable being analyzed.
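Below is a Python sketch of the general odds ratio confidence interval for a contrast of "diff" units. It uses a hypothetical coefficient and standard error for a continuous X, and the function name is illustrative. With diff = 1 it reduces to the dichotomous case.

```python
import math


def odds_ratio_ci(beta, se, diff=1.0, z=1.96):
    """Odds ratio and Wald 95% CI for a change of `diff` units in X."""
    center = diff * beta
    half_width = z * abs(diff) * se
    return math.exp(center), math.exp(center - half_width), math.exp(center + half_width)


# Hypothetical per-unit coefficient for a continuous X, and a 5-unit contrast
or_5, lo_5, hi_5 = odds_ratio_ci(beta=0.05, se=0.01, diff=5)
print(f"OR for a 5-unit increase = {or_5:.3f} (95% CI {lo_5:.3f}-{hi_5:.3f})")
```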

Logistic Regression in SAS
/* Dichotomous birthweight, logistic regression */
/* order=formatted: the first ordered (formatted) level of lbw is the one modeled; check the output to confirm */
proc logistic order=formatted data=one;
  model lbw = smoking;
run;

proc logistic order=formatted data=one;
  model lbw = smoking late_no_pnc;
run;

/* Same models fitted with proc genmod; confirm in the output which level of lbw is being modeled */
proc genmod data=one;
  model lbw = smoking / link=logit dist=bin;
  estimate 'Crude OR smoking' smoking 1 / exp;
run;

proc genmod data=one;
  model lbw = smoking late_no_pnc / link=logit dist=bin;
  estimate 'AOR smoking' smoking 1 / exp;
  estimate 'AOR Late_no_pnc' late_no_pnc 1 / exp;
run;

Logistic Regression
First, looking at a contingency table using proc freq in SAS: Crude Association between Smoking and Low Birthweight

Logistic Regression
Output from proc logistic

Logistic Regression
Risk differences (95% CI):
  Risk Diff   0.0582   (0.0438-0.0727)
  Risk Diff   0.0387   (0.0316-0.0457)
Mantel-Haenszel summary estimates (95% CI):
  Case-Control (OR)   1.8355   (1.7028-1.9784)
  Cohort (RP)         1.7499   (1.6349-1.8731)
Is there any evidence of confounding or effect modification?
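As a rough check on the effect-modification question, the Python sketch below compares the two risk differences with a z test. It assumes the two "Risk Diff" rows are stratum-specific estimates from independent strata with Wald intervals; their midpoints match the point estimates, which is consistent with this. Standard errors are backed out of the printed CIs. This is only a check on the risk-difference scale, not a substitute for the homogeneity tests proc freq reports (e.g. Breslow-Day for odds ratios).

```python
import math


def se_from_wald_ci(lower, upper, z=1.959964):
    """Back-calculate a standard error from a symmetric (Wald) 95% CI."""
    return (upper - lower) / (2 * z)


# Risk differences and CIs as printed on the slide
rd1, se1 = 0.0582, se_from_wald_ci(0.0438, 0.0727)
rd2, se2 = 0.0387, se_from_wald_ci(0.0316, 0.0457)

# z test for the difference between two independent estimates
z_stat = (rd1 - rd2) / math.sqrt(se1 ** 2 + se2 ** 2)
p_value = math.erfc(abs(z_stat) / math.sqrt(2))   # two-sided normal p-value
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3f}")
```

Here z is roughly 2.4, which suggests the risk difference is not homogeneous across the two strata. Homogeneity on one scale (e.g. the odds ratio) does not imply homogeneity on another (e.g. the risk difference).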

Logistic Regression
Output from proc logistic:

Binomial and Poisson Regression: Predicted Values
When the outcome is a proportion with a natural log transformation, the predicted values are log proportions.
In general:

Binomial and Poisson Regression
Measures of Association (Beta Coefficients): Differences Between Log Proportions / Rates, and the Relative Prevalence / Relative Risk
Dichotomous independent variable coded 1 and 0
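With a log link and a dichotomous X coded 1 and 0, the predicted log proportions and their difference are as follows (a standard derivation parallel to the logistic case):

```latex
\[ \text{In general: } \ln(p) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \]
\[ \ln(p_{X=1}) = \beta_0 + \beta_1, \qquad \ln(p_{X=0}) = \beta_0 \]
\[ \ln(p_{X=1}) - \ln(p_{X=0}) = \beta_1 \;\Rightarrow\; \text{RP (RR)} = \frac{p_{X=1}}{p_{X=0}} = e^{\beta_1} \]
```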

Binomial and Poisson Regression
In general, the beta coefficient is the change in the log proportion / rate for every unit change in X.

Binomial and Poisson Regression
The more common the outcome, the greater the difference between the binomial and Poisson standard errors. When the outcome is rare (e.g. per 10,000 or per 100,000), the binomial and Poisson standard errors will be almost identical.
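One way to see this, assuming a proportion p estimated from n observations and a Poisson standard error computed as the square root of p/n, is to take the ratio of the two standard errors. The ratio depends only on p. It is close to 1 when p is small and about 0.71 when p = 0.5.

```latex
\[ SE_{\text{binomial}} = \sqrt{\frac{p(1-p)}{n}}, \qquad SE_{\text{Poisson}} = \sqrt{\frac{p}{n}}, \qquad \frac{SE_{\text{binomial}}}{SE_{\text{Poisson}}} = \sqrt{1-p} \]
```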

Binomial and Poisson Regression
For infant mortality, calculated per 1,000 live births, what difference will using the binomial or Poisson distribution make? Suppose the IMR is 7 per 1,000, or 0.007:
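The Python sketch below works through the IMR example. The number of live births is a hypothetical 1,000; the ratio of the two standard errors does not depend on it.

```python
import math

imr = 0.007     # infant mortality rate: 7 deaths per 1,000 live births
n = 1000        # hypothetical number of live births

se_binomial = math.sqrt(imr * (1 - imr) / n)
se_poisson = math.sqrt(imr / n)
ratio = se_binomial / se_poisson    # equals sqrt(1 - imr), whatever n is

print(f"binomial SE = {se_binomial:.6f}, Poisson SE = {se_poisson:.6f}")
print(f"binomial SE is {100 * (1 - ratio):.2f}% smaller")
```

For an IMR of 7 per 1,000, the choice of distribution changes the standard error by only about a third of one percent.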

Binomial and Poisson Regression in SAS
/* Dichotomous birthweight, log-binomial regression */
proc genmod data=one;
  model lbw = smoking / link=log dist=bin;
  estimate 'Crude RP smoking' smoking 1 / exp;
run;

proc genmod data=one;
  model lbw = smoking late_no_pnc / link=log dist=bin;
  estimate 'ARP smoking' smoking 1 / exp;
  estimate 'ARP Late_no_pnc' late_no_pnc 1 / exp;
run;

/* Dichotomous birthweight, Poisson regression */
proc genmod data=one;
  model lbw = smoking / link=log dist=poisson;
  estimate 'Crude RP smoking' smoking 1 / exp;
run;

proc genmod data=one;
  model lbw = smoking late_no_pnc / link=log dist=poisson;
  estimate 'ARP smoking' smoking 1 / exp;
  estimate 'ARP Late_no_pnc' late_no_pnc 1 / exp;
run;

Binomial and Poisson Regression
Output from proc genmod

Binomial and Poisson Regression
Risk differences (95% CI):
  Risk Diff   0.0582   (0.0438-0.0727)
  Risk Diff   0.0387   (0.0316-0.0457)
Mantel-Haenszel summary estimates (95% CI):
  Case-Control (OR)   1.8355   (1.7028-1.9784)
  Cohort (RP)         1.7499   (1.6349-1.8731)

Binomial and Poisson Regression
Output from proc genmod

Binomial and Poisson Regression
Comparison between Binomial and Poisson Results (columns: Binomial, Poisson)

Cumulative and Generalized Logit
/* vlbw, mlbw, and normal bw as an ordinal variable (cumulative logit, the default for a response with more than two levels) */
proc logistic order=formatted data=one;
  model bwcat = smoking;
run;

/* vlbw, mlbw, and normal bw as a nominal variable (generalized logit) */
proc logistic order=formatted data=one;
  model bwcat (ref='normal bw') = smoking / link=glogit;
run;

Since this is logistic regression, the predicted values are log odds. The measures of association (the beta coefficients) are differences between log odds, i.e. log odds ratios, which when exponentiated are odds ratios.

Cumulative and Generalized Logit
Output from proc logistic: Ordinal Birthweight

Cumulative and Generalized Logit
Output from proc logistic: Nominal Birthweight

Risk Differences
/* Dichotomous birthweight, modeling risk differences */
proc genmod data=one;
  model lbw = smoking / link=identity dist=bin;
run;

proc genmod data=one;
  model lbw = smoking late_no_pnc / link=identity dist=bin;
run;

Since the outcome variable is a proportion but is not transformed in any way, the predicted values are the proportions themselves. The measures of association (the beta coefficients) are the differences in the proportions, or "risk" differences.
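With the identity link and a dichotomous X coded 1 and 0, the coefficient is the risk difference directly:

```latex
\[ p = \beta_0 + \beta_1 X \;\Rightarrow\; p_{X=1} - p_{X=0} = (\beta_0 + \beta_1) - \beta_0 = \beta_1 \]
```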

Risk Differences
Risk differences (95% CI):
  Risk Diff   0.0582   (0.0438-0.0727)
  Risk Diff   0.0387   (0.0316-0.0457)
Mantel-Haenszel summary estimates (95% CI):
  Case-Control (OR)   1.8355   (1.7028-1.9784)
  Cohort (RP)         1.7499   (1.6349-1.8731)

Risk Differences
Output from proc genmod: Crude and Adjusted Risk Differences

Linear Models: General Considerations
- Conceptual framework
- Level of measurement of the outcome variable: continuous; dichotomous; polytomous-nominal; polytomous-ordinal
- Unit of analysis: individual; aggregate; individual and aggregate
- Error structure / distribution: uncorrelated; correlated, imposed by the study design or by the 'natural' structure of the data
- Hypothesis formulation

Until next week...
Again, let's keep this in mind:
"...technical expertise and methodology are not substitutes for conceptual coherence. Or, as one student remarked a few years ago, public health spends too much time on the 'p' values of biostatistics and not enough time on values."
Jonathan M. Mann, "Medicine and Public Health, Ethics and Human Rights." The Hastings Center Report, Vol. 27, No. 3 (May-Jun. 1997), pp. 6-13. Published by The Hastings Center.
