Comparing methods for addressing limits of detection in environmental epidemiology Roni Kobrosly, PhD, MPH Department of Preventive Medicine Icahn School.

Presentation transcript:

Comparing methods for addressing limits of detection in environmental epidemiology Roni Kobrosly, PhD, MPH Department of Preventive Medicine Icahn School of Medicine at Mount Sinai

A familiar diagram… [Diagram: Environmental Exposure → Internal Dose → Biologically Effective Dose → Altered Structure/Function → Clinical Disease; label: Biomarker of Exposure] DeCaprio, 1997

Biomarkers and Limits of Detection (LOD)

Below the LOD, the concentration is so low that it is difficult to quantify. [Figure: concentration scale with the LOD marked; labels: LOD, Higher concentration]

Handling LODs in analysis
Easiest approach: simply delete the observations below the LOD
Problems with this:
o Values < LOD are informative: the analyte may have a concentration between 0 and the LOD
o Studies are expensive, and you lose covariate data!
o Excluding observations from analyses *may* substantially bias results
Chen et al. 2011

Handling LODs in analysis
o Hornung & Reed describe an approach that substitutes a single value for each observation < LOD
o Three suggested substitutions: LOD/2, LOD/√2, or just the LOD
o Problem: replacing a sizable portion of the data with a single value increases the likelihood of bias and reduces power!
Helsel, 2005; Hughes, 2000; Hornung & Reed, 1990

[Figure: citations of Hornung & Reed, 1990 in Google Scholar]

Comparing LOD methods
o Many studies test individual methods, but relatively little work compares the performance of several methods
o Even fewer studies have compared methods in the context of multivariable data
o The comparative studies that do exist provide contradictory recommendations. No consensus!

Simulation Study Objectives
o Compare the performance of LOD methods when an independent variable is subject to a limit of detection in multiple regression
o Compare performance across a range of “experimental” conditions
o Create a flowchart to aid researchers in their analysis decision making

Statistical Bias
Nat’l Library of Med definition: “Any deviation of results or inferences from the truth”
[Figure panels: Unbiased, Biased]

Variable Definitions
Four continuous variables:
o Y: Dependent variable (outcome)
o X: Independent variable (exposure, subject to LOD)
o C1, C2: Independent variables (covariates)

6 “Experimental Conditions”
1) Dataset sample size: n = {100, 500}
2) % of exposure variable with values in LOD region: LOD % = {0.05, 0.25} (i.e. 5% or 25%)
3) Distribution of exposure variable: Normal versus Skewed
4) R² of full model: R² = {0.10, 0.20}
5) Strength & direction of exposure-outcome association: Beta = {-10, 0, 10}
6) Direction of confounding: Strong Positive (+) versus Strong Negative (-) versus None

LOD methods considered
1. Deletion of subjects with LOD values
2. Substitution with LOD/√2
3. Substitution with LOD/2
4. Substitution with just the LOD value
5. Multiple imputation (King’s Amelia II)
6. MLE-imputation method (Helsel & Krishnamoorthy)

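Method 1: Deletion [Table: columns Y, X, C1, C2; rows with X < LOD are removed]

Method 2: Sub with LOD/√2 [Table: columns Y, X, C1, C2; with the LOD for X = 9.0, each X < LOD is replaced by 9.0/√2 = 6.4]

Method 3: Sub with LOD/2 [Table: columns Y, X, C1, C2; with the LOD for X = 9.0, each X < LOD is replaced by 9.0/2 = 4.5]

Method 4: Sub with just LOD [Table: columns Y, X, C1, C2; with the LOD for X = 9.0, each X < LOD is replaced by 9.0]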
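Methods 1 to 4 can be sketched in a few lines of Python. This is only an illustration: the four rows of toy data are hypothetical (the values in the slide tables did not survive), `None` is an assumed way of marking an exposure reported as "<LOD", and only the LOD of 9.0 comes from the slides.

```python
import math

LOD = 9.0  # limit of detection for X, as on the slides

def delete_below_lod(rows):
    """Method 1: drop every row whose exposure is flagged as below the LOD (None)."""
    return [r for r in rows if r["X"] is not None]

def substitute_below_lod(rows, fill):
    """Methods 2-4: replace every below-LOD exposure with one fill value."""
    return [dict(r, X=fill if r["X"] is None else r["X"]) for r in rows]

# Hypothetical toy data; None marks an exposure reported as "<LOD".
rows = [
    {"Y": 120.0, "X": 14.2, "C1": 3.1, "C2": 0.8},
    {"Y": 95.0, "X": None, "C1": 2.4, "C2": 1.1},
    {"Y": 133.0, "X": 21.7, "C1": 3.9, "C2": 0.5},
    {"Y": 88.0, "X": None, "C1": 1.8, "C2": 1.6},
]

fills = {"LOD/sqrt(2)": LOD / math.sqrt(2), "LOD/2": LOD / 2, "LOD": LOD}
deleted = delete_below_lod(rows)
substituted = {name: substitute_below_lod(rows, v) for name, v in fills.items()}

print(len(deleted))  # rows left after deletion
for name, value in fills.items():
    print(name, round(value, 1))  # 6.4, 4.5 and 9.0, matching the slides
```

Deletion shrinks the dataset and discards the covariates of the dropped subjects, whereas substitution keeps every row but gives all censored subjects an identical exposure.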

Method 5: Multiple Imputation
o “Amelia II” by Dr. Gary King
o Assumes the pattern of observations below the LOD depends only on observed data (not unobserved data)
o Lets you constrain imputed values (very helpful when working with LODs!)

Method 5: Multiple Imputation [Figure: the dataset (columns Y, X, C1, C2) with X values < LOD is imputed into M = 5 completed datasets]

Method 5: Multiple Imputation [Figure: a regression is run on each of the 5 imputed datasets, giving β1 = 10.1, β2 = 9.5, β3 = 8.3, β4 = 12.1, β5 = 10.4]
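The five coefficients are then combined into a single estimate. Under Rubin's rules the pooled point estimate is their mean, and the total variance is W + (1 + 1/M)B, where B is the between-imputation variance and W is the average within-imputation variance. The slides show no standard errors, so this sketch stops at the mean and B.

```python
from statistics import mean, variance

# Exposure coefficients from the M = 5 imputed datasets on the slide.
betas = [10.1, 9.5, 8.3, 12.1, 10.4]

pooled_beta = mean(betas)      # pooled point estimate (Rubin's rules)
between_var = variance(betas)  # between-imputation variance B (n - 1 denominator)

print(round(pooled_beta, 2), round(between_var, 3))  # 10.08 1.922
```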

Method 6: MLE-Imputation [Table: columns Y, X, C1, C2, with two X values shown as <LOD]

Method 6: MLE-Imputation [Table: columns Y, X, C1, C2; no <LOD entries remain]

Two-step Data Generation Process
1st Step: Select “true” regression parameters for the following two models:
2nd Step: Use the “true” parameters to guide the drawing of random numbers

“TRUTH”
Y = (X) + 4.5(C1) + 6(C2)
X = (C1) + 1.5(C2)
SIMULATED DATASETS: Dataset 1.1, Dataset 1.2, Dataset 1.3
[Table: columns Obs #, Y, X, C1, C2; row i holds y_i, x_i, c1_i, c2_i]
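A minimal Python sketch of the two-step generation, under loud assumptions. Only the coefficients 4.5, 6 and 1.5 appear on the slide. The intercepts `b0` and `a0`, the C1 coefficient `a1`, the residual SD, the standard-normal covariates, and the log-normal noise for the skewed case are all hypothetical choices. The sketch makes no attempt to tune R², and X is not constrained to be positive. Setting the LOD at an empirical quantile of X is likewise just one plausible way to hit a target LOD %.

```python
import numpy as np

def simulate_dataset(n, beta_x, lod_pct, skewed, rng,
                     b0=0.0, a0=0.0, a1=1.0, resid_sd=5.0):
    """Draw one dataset and flag the lowest lod_pct share of X as below the LOD.

    b0, a0, a1 and resid_sd are hypothetical; the slide does not show them.
    """
    c1 = rng.normal(size=n)
    c2 = rng.normal(size=n)
    noise = rng.normal(size=n)
    if skewed:
        noise = np.exp(noise)  # log-normal noise gives a right-skewed exposure
    x = a0 + a1 * c1 + 1.5 * c2 + noise                      # exposure model
    y = (b0 + beta_x * x + 4.5 * c1 + 6.0 * c2
         + rng.normal(scale=resid_sd, size=n))               # outcome model
    lod = np.quantile(x, lod_pct)  # threshold with lod_pct of X below it
    below = x < lod                # True where X would be reported as "<LOD"
    return {"y": y, "x": x, "c1": c1, "c2": c2, "lod": lod, "below": below}

rng = np.random.default_rng(0)
data = simulate_dataset(n=500, beta_x=10.0, lod_pct=0.25, skewed=True, rng=rng)
print(data["below"].mean())  # share of X below the LOD
```

Because C1 and C2 drive both X and Y with same-signed coefficients here, this parameter set corresponds to positive confounding; flipping the sign of the covariate coefficients in one of the two models would give negative confounding.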

Simulation workflow
1. Create a set of “true” parameters: Y = (X) + 4.5(C1) + 6(C2)
2. Create 1500 simulated datasets (Dataset 1.1, Dataset 1.2, Dataset 1.3, and so on) for the set of “true” parameters, using a specific set of experimental conditions
3. Apply an LOD correction method and run the regression for each dataset
4. Take the difference between the estimated coefficient and the “true” parameter, e.g. Bias = 2.2 - 2 = 0.2
5. Produce 1000 bias estimates with 95% CIs
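The loop above can be sketched as follows for two of the six methods (deletion and substitution with the LOD itself). This is not the study's code. The C1 coefficient of 1 in the exposure model, the zero intercepts, the residual SD of 5, the normal-approximation confidence interval, and the reduced count of 200 datasets (chosen only to keep the run short) are all assumptions.

```python
import numpy as np

def fit_beta_x(y, x, c1, c2):
    """OLS of y on an intercept, x, c1 and c2; returns the coefficient on x."""
    design = np.column_stack([np.ones_like(x), x, c1, c2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

def one_bias(rng, method, n=500, beta_x=10.0, lod_pct=0.25):
    """Simulate one dataset, apply an LOD method, return estimate minus truth."""
    c1, c2 = rng.normal(size=n), rng.normal(size=n)
    x = c1 + 1.5 * c2 + rng.normal(size=n)  # hypothetical C1 coefficient of 1
    y = beta_x * x + 4.5 * c1 + 6.0 * c2 + rng.normal(scale=5.0, size=n)
    lod = np.quantile(x, lod_pct)
    below = x < lod
    if method == "deletion":
        keep = ~below
        est = fit_beta_x(y[keep], x[keep], c1[keep], c2[keep])
    else:  # "lod": substitute the LOD itself for every censored value
        est = fit_beta_x(y, np.where(below, lod, x), c1, c2)
    return est - beta_x

def mean_bias_ci(method, n_sims=200, seed=0):
    """Mean bias over n_sims datasets with a normal-approximation 95% CI."""
    rng = np.random.default_rng(seed)
    biases = np.array([one_bias(rng, method) for _ in range(n_sims)])
    mean = biases.mean()
    half = 1.96 * biases.std(ddof=1) / np.sqrt(n_sims)
    return mean, mean - half, mean + half

for method in ("deletion", "lod"):
    print(method, mean_bias_ci(method))
```

With a linear true model, deleting on X leaves the regression correctly specified, so the deletion bias in this sketch should sit close to zero. The substitution bias depends on the chosen conditions.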

Help from Minerva Minerva runtime ~ 5 minutes

[Figure: Mean bias (with 95% CI) by method (Deletion, LOD/sqrt(2), LOD/2, LOD, Multi Impu, MLE Impu). Conditions: n = 100, 25% LOD, Skewed Dist, R² = 0.20, Negative X-Y Association, Negative confounding]

[Figure: Mean bias (with 95% CI) by method (Deletion, LOD/sqrt(2), LOD/2, LOD, Multi Impu, MLE Impu). Conditions: n = 100, 25% LOD, Skewed Dist, R² = 0.20, Positive X-Y Association, Negative confounding]

[Figure: Mean bias (with 95% CI) by method (Deletion, LOD/sqrt(2), LOD/2, LOD, Multi Impu, MLE Impu). Conditions: n = 100, 25% LOD, Skewed Dist, R² = 0.20, Negative X-Y Association, No confounding]

[Figure: Mean bias (with 95% CI) by method (Deletion, LOD/sqrt(2), LOD/2, LOD, Multi Impu, MLE Impu). Conditions: n = 100, 25% LOD, Skewed Dist, R² = 0.20, Positive X-Y Association, No confounding]

[Figure: Mean bias (with 95% CI) by method (Deletion, LOD/sqrt(2), LOD/2, LOD, Multi Impu, MLE Impu). Conditions: n = 100, 25% LOD, Skewed Dist, R² = 0.20, Negative X-Y Association, Positive confounding]

[Figure: Mean bias (with 95% CI) by method (Deletion, LOD/sqrt(2), LOD/2, LOD, Multi Impu, MLE Impu). Conditions: n = 100, 25% LOD, Skewed Dist, R² = 0.20, Positive X-Y Association, Positive confounding]

An overview of results
o Relative bias of the methods is highly dependent on experimental conditions (i.e. no simple answers)
o Covariates and confounding matter!
o Simulations that only consider bivariate X-Y relationships with LODs are limited

Deletion method results
o Surprisingly… provides unbiased estimates across all conditions!
o If sample size is large and LOD % is small, this may be a good option
o As LOD % becomes larger, deletion is more costly
o Important caveat: the deletion method works well if true associations are linear

[Figure: Deletion method with linear effects; bottom 8% of X variable deleted]

Substitution method results
o Not surprisingly… these methods are generally terrible!
o Substituting just the LOD is the worst type
o In most scenarios, these will bias associations towards the null
o … but they work reasonably well when the distribution is highly skewed, there is no confounding, and LOD % is low

Multiple Imputation results
o Amelia II performs relatively well!
o Particularly when R² is higher
o Does well even when LOD % is high
o Problematic when there is no confounding (reason: this indicates there are no/weak associations between variables)

MLE Imputation results
o Associated with severe bias in most cases
o Highly reliant on parametric assumptions, and the code is daunting: recommend avoiding this method
o However, it performed reasonably well when the exposure is normally distributed, there is no confounding, and LOD % is low

A Case Study…

Sarah’s SFF Analysis
o Study for Future Families (SFF): a multicenter pregnancy cohort study that recruited mothers from
o Sarah Evans’ analysis: prenatal exposure to Bisphenol A (BPA) and neurobehavioral scores in 153 children at ages
o (18%) children have BPA levels below the LOD

Sarah’s SFF Analysis
o Maternal urinary BPA collected during late pregnancy
o Neurobehavioral scores obtained through the School-age Child Behavior Checklist (CBCL)
o Used multiple regression adjusting for child age at CBCL assessment, mother’s education level, family stress, and urinary creatinine

[Figure: LOD/sqrt(2) versus Deletion results across CBCL scales: Anxiety/Dep, Withdrawn/Dep, Somatic, Social, Thought, Attention, Rule-Break, Aggressive, Internalizing, Externalizing, Total Problems]