Michael R. Elliott2, Xiaobi (Shelby) Huang1, Sioban Harlow3

Presentation transcript:

Bayesian Change Point Models for Analysis of Menstrual Diary Data at the Approach of Menopause. Michael R. Elliott (2), Xiaobi (Shelby) Huang (1), Sioban Harlow (3). (1) Genzyme, a Sanofi Company; (2) Department of Biostatistics, University of Michigan; (3) Department of Epidemiology, University of Michigan. Modern Model Methods 2016

Introduction Goal for women’s menstrual studies: identify associations between women’s menstrual characteristics and women’s health. Why? Menstrual cycles are the most easily observed markers of ovarian function. Alterations in bleeding are a significant source of gynecologic morbidity, especially in late reproductive life. The menopausal transition is a period of critical change in women's biology and health status.

How do we define and identify ONSET of the transition? 1981: Metcalf and Livesey. Pituitary–ovarian function in normal women during the menopausal transition. 1994: Brambilla et al. Defining the Perimenopause for Application in Epidemiologic Investigations. 2000: Mitchell et al. Three stages of the menopausal transition from the Seattle Midlife Women's Health Study: Toward a more precise definition. 2001: Soules et al. Executive summary: Stages of reproductive aging workshop (STRAW). 2002: Taffe and Dennerstein. Menstrual patterns leading to the final menstrual period. 2007: The ReSTAGE Collaboration. Recommendations from a multi-study evaluation of proposed criteria for Staging Reproductive Aging.

Previous Approaches Visual browsing of menstruation patterns. Summary statistics of sliding windows over age. Linear mixed model. Major problem: lack of precision. Traditional longitudinal models tend to underutilize subject-level information in clinical and epidemiological research settings, at least in part because of the lack of methods for such analyses.

Goals Initial goal: compare menstrual pattern changes between two generations of women. Subsequent goals: Model how menstrual cycle length and variability change as women approach menopause. Develop a method to impute various types of missingness. Find potential biomarkers for women’s menopausal transition. Define subgroups of menstruation patterns. Our initial goal is to… However, we only have one generation’s data… so we are more focused on the subsequent goals at this time, and the model will be compatible once we have the second generation’s data.

Outline TREMIN Trust Data Bayesian Changepoint Model Missing Data Imputation Menstruation Patterns

TREMIN Trust Data TREMIN: ongoing 70-year longitudinal menstrual calendar study, initiated by Dr. Alan Treloar of the University of Minnesota in 1934. Cohort I: 1936-1939, 2350 U. Minnesota undergraduates. Cohort II: 1961-1964, 1367 U. Minnesota undergraduates. One of only two data sets worldwide with individual women’s menstrual diary data across their reproductive life span.

Data in analysis 2350 women in TREMIN Cohort I. Restricting to women <25 years old at enrollment who participated >5 yrs and have at least 15 consecutive segments after 35 yrs old: 735 women. Restricting further to hormone use < 4 yrs and no gynecological surgery before 40 yrs old: 617 women in analysis, each with 15-321 observations. 105 (17.0%) have complete data; 313 (50.7%) have an observed final menstrual period (FMP).

Missing Data Missing due to hormone use. Hysterectomy or bilateral oophorectomy surgeries. Non-reporting or withdrawal from the study. Non-menstrual intervals are not treated as missing: pregnancy intervals; first two cycles after a birth; first cycle after a spontaneous abortion.

Four Typical Women in TREMIN Cohort -Blue line: cycle lengths (on log scale). -Black dot (●): Observed FMP. -Red dot (●): Truncated by surgery. -Green bars (||): Pregnancy interval. -Red bars (||): Missing due to hormone use. -Black bars (||)/circle (○): Intermittent missingness due to nonreporting.

Outline TREMIN Trust Data Bayesian Changepoint Model Missing Data Imputation Menstruation Patterns Subgroups of Menstruation Patterns

Patterns of Menstruation Cycle Lengths Regular cycling. Premenopausal irregularity. (Plot from Lisabeth et al. 2004)

Thoughts of Modeling Common pattern: how menstrual cycle length changes over age Variability has the same pattern Despite the overall pattern, individual women have their unique change points, intercepts and slopes

Bayesian Changepoint Model for Mean and Variance Subject Level: Population Level: Some notations: - ith subject’s tth cycle length. - age of ith subject’s tth menstruation cycle. - covariates of ith subject.
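The subject-level equations did not survive in this transcript. As a rough illustration only, the following sketch assumes a broken-stick form in which both the mean and the log-variance of log cycle length are piecewise linear in age, each with its own intercept (at age 35), slope before the changepoint, slope after the changepoint, and changepoint. That parameter list matches the eight individual-level quantities named later in the deck, but the function names, the normal error assumption, and all numeric values here are hypothetical and are not taken from the slides.

```python
import numpy as np

def broken_stick(age, intercept, slope_before, slope_after, changepoint, origin=35.0):
    """Piecewise-linear curve in age, continuous at the changepoint.

    Equals `intercept` at age `origin` (assumes changepoint >= origin).
    """
    age = np.asarray(age, dtype=float)
    before = np.minimum(age, changepoint) - origin   # years accrued before the changepoint
    after = np.maximum(age - changepoint, 0.0)       # years accrued after the changepoint
    return intercept + slope_before * before + slope_after * after

def simulate_log_cycle_lengths(age, mean_pars, logvar_pars, rng):
    """Draw log cycle lengths with broken-stick mean and broken-stick log-variance."""
    mu = broken_stick(age, *mean_pars)
    sd = np.exp(0.5 * broken_stick(age, *logvar_pars))
    return rng.normal(mu, sd)

# Hypothetical values for one woman: the variance changepoint (43) precedes
# the mean changepoint (46).
rng = np.random.default_rng(0)
ages = np.linspace(35.0, 52.0, 200)
y = simulate_log_cycle_lengths(
    ages,
    mean_pars=(np.log(28.0), -0.005, 0.15, 46.0),
    logvar_pars=(-4.0, 0.0, 0.6, 43.0),
    rng=rng,
)
```

In a hierarchical version, the eight parameters per woman would themselves be drawn from a population-level distribution, which is what allows correlations among them to be estimated.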

Inference Joint posterior distribution:

Outline TREMIN Trust Data Bayesian Changepoint Model Missing Data Imputation Menstruation Patterns Subgroups of Menstruation Patterns

Imputation of Missing Data - Complexities Large amount of missingness. Various reasons for missingness: hormone use, surgery, loss to follow-up. Cycle lengths and ages should match. When to stop if FMP was not observed? How to impute FMP?

Imputation Procedure Step 1: Obtain initial parameters from complete data analysis: subjects with complete cases, assign subjects with missing data, draw Step 2: Impute the missing data using : Imputation draws are from the model prediction: Update ages and cycle lengths together:

Imputation: How to fill the missing gaps
Original data: gap of length L from age 42.0 (start) to age 42.35 (end).
Imputation (length L’): imputed cycle lengths (years) 0.07, 0.09, 0.12, 0.12; imputed ages 42.0, 42.07, 42.16, 42.28, 42.40.
Adjusting: cut the last segment length to fit the gap length. Adjusted imputed cycle lengths (years) of one set: 0.07, 0.09, 0.12, 0.08; adjusted imputed ages 42.0, 42.07, 42.16, 42.28, 42.35.
Find 50 sets of imputations and perform importance sampling.
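The adjustment step can be sketched as follows. This is a minimal illustration of "cut the last segment to fit the gap", not the authors' code; the function name is hypothetical, and what to do when the imputed cycles fall short of the gap is an assumption (here an error is raised so the caller can draw more cycles).

```python
def fit_cycles_to_gap(start_age, end_age, imputed_lengths):
    """Keep imputed cycles until they cover the gap, then shorten the last one.

    Returns (ages, lengths): the cycle boundary ages and the adjusted cycle
    lengths, which sum to end_age - start_age.
    """
    ages, lengths = [start_age], []
    for length in imputed_lengths:
        next_age = ages[-1] + length
        if next_age >= end_age:
            # This cycle overshoots the gap: truncate it so it ends at end_age.
            lengths.append(end_age - ages[-1])
            ages.append(end_age)
            return ages, lengths
        ages.append(next_age)
        lengths.append(length)
    raise ValueError("imputed cycles do not cover the gap; draw more cycles")

# Numbers from the slide: a gap from age 42.0 to 42.35.
ages, lengths = fit_cycles_to_gap(42.0, 42.35, [0.07, 0.09, 0.12, 0.12])
```

With the rounded ages printed on the slide, the truncated last length computes to about 0.07 rather than the 0.08 shown, presumably because the slide's ages are rounded.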

Imputation: Final Menstruation Periods If FMPs are not observed: impute and update the data until imputed FMP or when , whichever happens first. Model the age at FMP as a piecewise exponential distribution with hazard , for Knots are set at one year or 0.5 year gaps between age 40 and 60, assuming the risk of having FMP before age 40 is zero. Find the probability of FMP occurring between time interval , given the event has not occurred before
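For a piecewise-constant hazard, the conditional probability described above has a closed form: one minus the exponential of the negative cumulative hazard accrued over the interval. The sketch below assumes knots and rates are supplied as arrays; the names are hypothetical, and the hazard is taken to be zero before the first knot (matching the slide's assumption of no FMP risk before age 40) and, as a simplification, also after the last knot.

```python
import numpy as np

def cumulative_hazard(t, knots, rates):
    """Cumulative hazard H(t) for a piecewise-constant hazard.

    rates[k] applies on [knots[k], knots[k+1]); the hazard is zero outside
    the knot range.
    """
    knots = np.asarray(knots, dtype=float)
    rates = np.asarray(rates, dtype=float)
    # Time spent inside each interval up to age t, between 0 and the interval width.
    exposure = np.clip(t - knots[:-1], 0.0, np.diff(knots))
    return float(np.sum(rates * exposure))

def prob_fmp_in_interval(t0, t1, knots, rates):
    """P(t0 < FMP <= t1 | FMP > t0) = 1 - exp(-(H(t1) - H(t0)))."""
    delta = cumulative_hazard(t1, knots, rates) - cumulative_hazard(t0, knots, rates)
    return 1.0 - float(np.exp(-delta))

# Yearly knots between ages 40 and 60 with made-up, increasing hazard rates.
knots = np.arange(40.0, 61.0)
rates = np.linspace(0.01, 0.8, len(knots) - 1)
p = prob_fmp_in_interval(48.0, 48.1, knots, rates)
```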

Imputation: Gaps till FMPs Each time a segment is imputed, draw a Bernoulli variable W to decide whether it is the final menstruation period. If any imputed cycle is longer than 365 days or an imputed age is larger than 60, stop imputing and treat the corresponding age as FMP. Example timeline (age): censoring at 48.0, then imputed ages 48.07, 48.16, 48.28, 52.3, with indicators W=0, W=0, W=0, W=1.
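The stopping logic can be sketched as a loop. Several details are assumptions rather than anything stated on the slide: when the Bernoulli draw is 1, the end of the just-imputed segment is taken as the FMP; when a stopping rule fires, the start of the offending segment (the last imputed bleed) is taken as the FMP; and `draw_cycle_length` and `prob_fmp` are hypothetical callables standing in for the model's predictive draw and the survival-model probability.

```python
import numpy as np

def impute_until_fmp(censor_age, draw_cycle_length, prob_fmp, rng,
                     max_cycle_years=1.0, max_age=60.0):
    """Impute cycles after censoring until an FMP is declared.

    draw_cycle_length(age, rng) -> imputed cycle length in years.
    prob_fmp(t0, t1) -> P(t0 < FMP <= t1 | FMP > t0).
    Returns (ages, fmp_age).
    """
    ages = [censor_age]
    while True:
        start = ages[-1]
        length = draw_cycle_length(start, rng)
        end = start + length
        # Stopping rule: cycle longer than 365 days, or age beyond 60.
        if length > max_cycle_years or end > max_age:
            return ages, start   # assumption: the last imputed bleed is the FMP
        ages.append(end)
        # Bernoulli draw W: is the bleed ending this segment the FMP?
        if rng.random() < prob_fmp(start, end):
            return ages, end

rng = np.random.default_rng(1)
ages, fmp_age = impute_until_fmp(
    48.0,
    draw_cycle_length=lambda age, rng: float(np.exp(rng.normal(np.log(0.09), 0.3))),
    prob_fmp=lambda t0, t1: 0.05,
    rng=rng,
)
```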

Imputation Procedure Step 3: Update the parameters using Gibbs steps based on the imputed data set obtained in step 2. Step 4: Use the updated parameters from step 3 to impute another data set using the method stated in step 2. Step 5: Repeat steps 3 and 4 until the MCMC chains converge.

Posterior Model Check Convergence: Two MCMC chains with different starting values; 10,000 iterations each after burn-in. Gelman and Rubin statistic: 99.2% of individual-level parameters and all population-level parameters achieved convergence. Model fit: Posterior predictive chi-square test for cycle lengths. Compare observed FMPs with replicated FMPs.
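For reference, the Gelman and Rubin statistic for a single scalar parameter can be computed from the chains as sketched below. This is the basic textbook form (without the degrees-of-freedom correction or chain splitting); the function name is hypothetical, and the slide does not state which variant or threshold was used. Values close to 1 are conventionally read as evidence of convergence.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar parameter.

    chains: array of shape (m, n), m chains with n post-burn-in draws each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)            # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n     # pooled variance estimate
    return float(np.sqrt(var_hat / within))

# Two chains of 10,000 draws each, as on the slide (simulated here).
rng = np.random.default_rng(0)
r_hat = gelman_rubin(rng.normal(0.0, 1.0, size=(2, 10_000)))
```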

Outline TREMIN Trust Data Bayesian Changepoint Model Missing Data Imputation Menstruation Patterns Subgroups of Menstruation Patterns

Results: Individual Level Parameters Histogram of

Individual Level Parameters Posterior mean and associated 95% posterior predictive interval of the cycle length mean and the upper and lower 2.5 percentiles for the cycle distribution:

Population Level Parameters Posterior mean and 95% predictive intervals for mean population level parameters :

Menstruation Pattern Characteristics Mean cycle length declines slightly until the changepoint, then increases rapidly. Cycle-length variability is stable on average until the changepoint, then explodes. Variability begins increasing well in advance (3 years) of longer cycle lengths.

Population Level Parameters Posterior mean for correlations (upper triangle). Rows and columns share the same parameter order: mean intercept; mean slope before changepoint; mean slope after changepoint; changepoint for mean; log-variance intercept; log-variance slope before changepoint; log-variance slope after changepoint; variance changepoint.
Mean intercept: 1 -0.13 -0.01 0.29 0.17 -0.14 0.27
Mean slope before changepoint: -0.02 0.00 -0.07 0.03 -0.00 0.01
Mean slope after changepoint: 0.25 0.08 -0.08 0.33 0.24
Changepoint for mean: 0.15 -0.25 0.43 0.79
Log-variance intercept: -0.69 0.44 0.02
Log-variance slope before changepoint: -0.74 0.09
Log-variance slope after changepoint: 0.34
Variance changepoint

Correlations Among Characteristics Later change points for variance are highly associated with later change points for mean. Later change points for both mean and variance are also correlated with longer and more variable segment lengths, and more rapid increases in mean and variance after the change point; consequently mean and variance slopes after change points are positively correlated. Greater mean length at age 35 is associated with greater declines in variability before the variance change point and greater increases in variability after. Larger segment variability is associated with longer mean segment length. Larger segment variability is highly associated with more rapid declines in variability before but larger increases in variability after the variance change point: thus change in variability before and after the variance change point is negatively correlated.

Menstruation Patterns and Menopause Accelerated failure time model with Gaussian link: Age at FMP ~ pattern parameters. Women with late menopause have: later changepoints; smaller variance of cycle lengths at age 35; less rapid decrease in variance of cycle lengths before changepoints; less rapid increase in mean and variance of cycle lengths after changepoints; less abrupt changes of variance slopes before and after changepoints.

Publication and Related Work Publication of this work: "Modeling Menstrual Cycle Length and Variability at the Approach of Menopause Using Bayesian Changepoint Model," X. Huang, S. D. Harlow, M. R. Elliott, 2014, Journal of the Royal Statistical Society C: Applied Statistics, 63(3): 445-466 Comparing changepoints to previously defined transition markers. Publication: "Distinguishing 6 Population Subgroups by Timing and Characteristics of the Menopausal Transition," X. Huang, S. D. Harlow, M. R. Elliott, 2012. American Journal of Epidemiology, 175(1): 74-83 Include data from cohort II and study the difference of women’s menstruation patterns between cohort I and cohort II.

Acknowledgement Grant R01HD055524 from the National Institute of Child Health and Human Development. Data from TREMIN Trust.

Thank You!

Additional Literature 1987: Davidian and Carroll. Variance function estimation. 2000: Harlow et al. Analysis of menstrual diary data across the reproductive life span: Application of the bipartite model approach and the importance of within-woman variance. 2001: Thum and Bhattacharya. Detecting a change in school performance: a Bayesian analysis for a multilevel join point problem. 2003: Hall et al. Bayesian and profile likelihood changepoint methods for modeling cognitive function over time. 2004: Lisabeth et al. A new statistical approach demonstrated menstrual patterns during the menopausal transition did not vary by age at menopause. 2007: Crainiceanu et al. Spatially adaptive Bayesian penalized splines with heteroscedastic errors.

Appendix: Gibbs Sampling are the corresponding part of prior multivariate normal mean and covariance matrix conditional on other parameters

Gibbs Sampling - Continued

Gibbs Sampling - Continued

Gibbs Sampling - Continued

Appendix – Survival Model of FMPs Assume that last observed ages of all subjects are from piecewise exponential distribution Use prior: The posterior distribution is

Appendix – Predict FMPs The cumulative hazard and survival function: Conditional and unconditional distribution of FMP occurrence by time
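The formulas for this slide are missing from the transcript. For orientation, the standard expressions for a piecewise exponential model are given below, using hypothetical notation: knots \tau_0 < \tau_1 < \dots < \tau_K, hazard rates \lambda_k on each interval, and T for the age at FMP. These are textbook identities and may not match the authors' notation.

```latex
\begin{aligned}
h(t) &= \lambda_k, \qquad \tau_{k-1} \le t < \tau_k,\quad k = 1,\dots,K, \\
H(t) &= \int_{\tau_0}^{t} h(u)\,du
      = \sum_{k=1}^{K} \lambda_k \,\max\{0,\ \min(t,\tau_k) - \tau_{k-1}\}, \\
S(t) &= \Pr(T > t) = \exp\{-H(t)\}, \\
\Pr(t < T \le t+\Delta \mid T > t) &= 1 - \frac{S(t+\Delta)}{S(t)}
      = 1 - \exp\{-[H(t+\Delta) - H(t)]\}, \\
\Pr(t < T \le t+\Delta) &= S(t) - S(t+\Delta).
\end{aligned}
```

The fourth line is the conditional probability used when deciding whether an imputed segment ends in the FMP; the last line is the corresponding unconditional probability.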

Posterior Predictive Model Check - Cycle Length Posterior predictive chi-square test: created a histogram of p-values of chi-square tests for all subjects, each test based on 200 replications.
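A posterior predictive chi-square check for one subject can be sketched as below. It assumes normally distributed (log) cycle lengths with a posterior draw of the mean and standard deviation for each cycle, and uses the sum of squared standardized residuals as the discrepancy; the function and argument names are hypothetical, and the authors' exact discrepancy measure may differ.

```python
import numpy as np

def ppc_chisq_pvalue(y, mu_draws, sd_draws, rng):
    """Posterior predictive p-value with a chi-square discrepancy.

    y: observed values for one subject, shape (T,).
    mu_draws, sd_draws: posterior draws of each cycle's mean and sd, shape (R, T).
    For each draw, a replicate data set is simulated and its discrepancy is
    compared with the discrepancy of the observed data under the same draw.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu_draws, dtype=float)
    sd = np.asarray(sd_draws, dtype=float)
    y_rep = rng.normal(mu, sd)                              # one replicate per draw
    t_obs = np.sum(((y - mu) / sd) ** 2, axis=1)            # discrepancy, observed data
    t_rep = np.sum(((y_rep - mu) / sd) ** 2, axis=1)        # discrepancy, replicated data
    return float(np.mean(t_rep >= t_obs))

# 200 replications for one subject with 50 cycles (simulated placeholders).
rng = np.random.default_rng(0)
mu_draws = np.full((200, 50), 3.3)
sd_draws = np.full((200, 50), 0.15)
y_obs = rng.normal(3.3, 0.15, size=50)
p_value = ppc_chisq_pvalue(y_obs, mu_draws, sd_draws, rng)
```

Repeating this for every subject and plotting the p-values gives the histogram the slide describes; p-values piled up near 0 or 1 would point to misfit.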

Observed and Predicted FMPs

Observed and Imputed FMPs - observed FMP - imputed FMP and 95% predictive interval x - age at censoring To assess the appropriateness of the final menstrual period modeling, we plot the observed and predicted FMPs together with the censoring ages for 100 randomly selected women in Figure 6. The method for estimating FMP when it is not observed appears to have worked well, with the distribution of the predicted FMPs corresponding closely to the observed FMPs, even when the censoring age is relatively early and little information is usually available to predict FMP.

Posterior Model Check – FMPs Replicate imputations of FMPs for subjects with observed FMPs. Compare each observed FMP with the corresponding 200 draws of predicted FMPs. Histogram of the proportion of mean(FMPrep) > observed FMP. For subjects with observed FMPs, we imputed their cycle lengths from the beginning to get their predicted FMPs using the method described previously. We then compared each subject's observed FMP to the corresponding 200 replicated FMPs and summarized the proportion of mean replicated FMPs larger than the observed FMP in Figure 5.

Changepoints

Principal Component Analysis of Pattern Measures

Sensitivity Analysis