Wavelets and excess disease models for analysis of time series data
Dan Weinberger
Fogarty International Center, National Institutes of Health



Analyzing time series data
– Wavelets: evaluate timing of peaks and dominant frequency
– Regression models: estimate seasonal baseline and calculate excess incidence

Part 1: Wavelets

Motivating example: Measles
Does the frequency of the measles epidemics change after vaccination?
Figures from Grenfell et al., Nature 2001

Shift in dominant frequency (figure; x-axis: time)

Wavelets: a powerful solution
– Identify the dominant frequencies in a series (i.e. annual, monthly…) at each specific time
– Determine the "phase" of these cycles and compare series
– Filter to smooth a series (remove high-frequency noise)
– And many other applications…

Basic concepts
Sample time series: a wave with a 10-unit cycle, $y = \sin(2\pi t/10)$ (figure; panels ordered toward higher frequency)
– Wavelets: little waves of a specific shape
– "Slide" the wavelet along the time series to determine the strength of correlation
– Repeat, while shrinking and expanding the wavelet
– Different wavelet shapes can be used for different situations
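The sliding-and-stretching idea can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the code used for the slides: it assumes a Morlet wavelet with `omega0 = 6`, and the names `morlet` and `wavelet_power` are made up for this example.

```python
import numpy as np

def morlet(t, scale, omega0=6.0):
    """Complex Morlet wavelet stretched to `scale` (illustrative normalisation)."""
    x = t / scale
    return np.pi ** -0.25 * np.exp(1j * omega0 * x - 0.5 * x ** 2) / np.sqrt(scale)

def wavelet_power(y, periods, dt=1.0, omega0=6.0):
    """Return power[i, j]: strength of period periods[i] around time index j."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    m = (len(y) - 1) // 2                 # half-width of an odd-length kernel
    t = np.arange(-m, m + 1) * dt
    # Conversion factor between Morlet scale and the equivalent Fourier period
    factor = 4 * np.pi / (omega0 + np.sqrt(2 + omega0 ** 2))
    power = np.empty((len(periods), len(y)))
    for i, period in enumerate(periods):
        w = morlet(t, period / factor, omega0)
        # "Slide" the wavelet along the series: a correlation is a convolution
        # with the reversed complex conjugate of the wavelet.
        coef = np.convolve(y, np.conj(w)[::-1], mode="same") * dt
        power[i] = np.abs(coef) ** 2
    return power

# Sample series from the slide: a sine wave with a 10-unit cycle
t = np.arange(200)
y = np.sin(2 * np.pi / 10 * t)
periods = np.arange(2, 41)                # shrink and expand the wavelet
power = wavelet_power(y, periods)
global_spectrum = power.mean(axis=1)      # average across the whole time period
dominant_period = periods[np.argmax(global_spectrum)]
print(dominant_period)
```

For this input the global spectrum should peak at a period close to 10 time units. Values near the start and end of the series are less reliable because the wavelet runs off the data there.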

Wavelet spectrum of a sine wave: the 10-year cycle dominates

Wavelet spectrum of a sine wave
– "Global wavelet" = average across the entire time period
– Grey area = cone of influence: less confidence in this region
– Color = power of the spectrum: red = higher amplitude at that frequency and time
– Significance is tested by a permutation test

Multiple frequencies

Wavelet with changing frequencies
Interpretation: the wavelength (period) increases from ~0.25 to ~0.5 years, i.e. the frequency falls from 4 cycles/year to 2 cycles/year

Example: epidemic timing

Example: Using wavelets to extract phase (timing) information (figure label: year component)
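One way to read timing off a wavelet transform is to take the angle of the complex coefficient at the period of interest and compare it between series. The sketch below assumes a Morlet wavelet and two synthetic monthly series, one peaking two months after the other. The helper `morlet_coef` and the 12-month period are choices made for this illustration, not details taken from the slides.

```python
import numpy as np

def morlet_coef(y, period, dt=1.0, omega0=6.0):
    """Complex Morlet coefficients of y at a single Fourier period."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    m = (len(y) - 1) // 2
    t = np.arange(-m, m + 1) * dt
    scale = period * (omega0 + np.sqrt(2 + omega0 ** 2)) / (4 * np.pi)
    x = t / scale
    w = np.pi ** -0.25 * np.exp(1j * omega0 * x - 0.5 * x ** 2) / np.sqrt(scale)
    return np.convolve(y, np.conj(w)[::-1], mode="same") * dt

months = np.arange(240)                              # 20 years of monthly data
series_a = np.cos(2 * np.pi * months / 12)
series_b = np.cos(2 * np.pi * (months - 2) / 12)     # peaks 2 months later

phase_a = np.angle(morlet_coef(series_a, period=12))
phase_b = np.angle(morlet_coef(series_b, period=12))

# Wrap the phase difference to (-pi, pi], then convert radians to months.
diff = np.angle(np.exp(1j * (phase_a - phase_b)))
lag_months = diff * 12 / (2 * np.pi)

core = slice(60, 180)                                # skip the unreliable edges
mean_lag = lag_months[core].mean()
print(round(float(mean_lag), 2))
```

A positive lag here means the second series peaks later than the first. With real data the lag can be tracked year by year instead of averaged.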

Applying wavelets to "real" data
Step 1: Remove any long-term trends from the data (calculate a baseline using a spline of the summer months, then divide by the baseline)
Step 2: Square-root or log-transform the data
Step 3: Use the transformed data in the wavelet transform, evaluate spectra, extract phase data
Notes:
– It is important to have a complete time series without missing data for the wavelets.
– The time series needs to be relatively long, since the accuracy of wavelets is poor at the beginning and end of the series.
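Steps 1 and 2 can be sketched as below. Two assumptions are made for the sake of a dependency-free example: June to August are treated as the summer months, and a low-order polynomial fitted to those months stands in for the spline named on the slide. The function name `prepare_for_wavelet` is hypothetical.

```python
import numpy as np

def prepare_for_wavelet(counts, month, summer=(6, 7, 8), degree=3):
    """Detrend by a summer baseline, then square-root transform."""
    counts = np.asarray(counts, dtype=float)
    month = np.asarray(month)
    if np.isnan(counts).any():
        raise ValueError("wavelets need a complete series: fill gaps first")
    t = np.arange(len(counts))
    tc = (t - t.mean()) / t.std()            # rescale time for a stable fit
    is_summer = np.isin(month, summer)
    # Step 1: long-term baseline from summer months only. A polynomial is
    # used here as a stand-in for the spline mentioned on the slide.
    coefs = np.polyfit(tc[is_summer], counts[is_summer], degree)
    baseline = np.polyval(coefs, tc)
    relative = counts / baseline
    # Step 2: variance-stabilising transform
    return np.sqrt(relative)

# Synthetic monthly series: rising trend times a winter-peaked seasonal factor
t = np.arange(120)
month = t % 12 + 1
trend = 100 + 0.5 * t
seasonal = 1 + 0.8 * np.clip(np.cos(2 * np.pi * (month - 1) / 12), 0, None)
counts = trend * seasonal
transformed = prepare_for_wavelet(counts, month)
```

After this step the summer values sit near 1 and the long-term rise is gone, so the wavelet transform in Step 3 sees only the seasonal signal.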

Part 2: Excess Disease models

Part 2: Calculation of excess mortality for seasonal diseases
Motivating question: How can we quantify the annual disease burden attributable to specific pathogens for diseases that show a strong seasonal pattern?
Answer: Count the excess above a "typical" seasonal baseline
Thanks to Cécile Viboud and Vivek Charu for contributing slides to this section

Serfling Regression
Step 1: Define the influenza and non-influenza periods
Step 2: Set a baseline and a threshold (upper 95% confidence limit) for pneumonia mortality during the non-influenza period
Step 3: Calculate excess mortality for each year
– Sum of observed mortality minus the model baseline during "epidemic months" (when flu deaths cross the threshold)

USA P&I Deaths per 100,000 ( ), All Ages

USA May-Nov P&I Deaths per 100,000 ( ), All Ages
Obtain a baseline model from these data

USA May-Nov P&I Deaths per 100,000 ( ), with Model Baseline, All Ages
Full model:
$$Y_i = \alpha + \beta_1 \cos(2\pi t_i/12) + \beta_2 \sin(2\pi t_i/12) + \beta_3 t_i + \beta_4 t_i^2 + \beta_5 t_i^3 + \beta_6 t_i^4 + \beta_7 t_i^5 + \varepsilon_i$$
Best-fit model:
$$Y_i = \alpha + \beta_1 \cos(2\pi t_i/12) + \beta_2 \sin(2\pi t_i/12) + \beta_3 t_i + \beta_4 t_i^2 + \beta_5 t_i^3 + \beta_6 t_i^4 + \varepsilon_i$$
– $\beta_1, \beta_2$: 12-month sine and cosine wave to account for baseline seasonal variations
– $\beta_3$: linear time trend
– $\beta_4$ and higher: polynomial time trend, to account for long-term fluctuations
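A baseline of this form can be fitted by ordinary least squares using only the May-Nov observations and then evaluated for every month. The sketch below uses synthetic data. The function names are hypothetical, and taking the threshold as baseline plus 1.96 residual standard deviations is an assumption about how the upper 95% band might be computed.

```python
import numpy as np

def serfling_design(t, degree=4):
    """Columns: intercept, 12-month cosine and sine, then t, t^2, ..., t^degree."""
    t = np.asarray(t, dtype=float)
    ts = t / t.max()   # rescaling changes the coefficients, not the fitted values
    cols = [np.ones_like(t), np.cos(2 * np.pi * t / 12), np.sin(2 * np.pi * t / 12)]
    cols += [ts ** k for k in range(1, degree + 1)]
    return np.column_stack(cols)

def fit_serfling_baseline(y, month, baseline_months=(5, 6, 7, 8, 9, 10, 11), degree=4):
    """Fit on non-influenza months only, predict for all months."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1)
    X = serfling_design(t, degree)
    use = np.isin(month, baseline_months)
    beta, *_ = np.linalg.lstsq(X[use], y[use], rcond=None)
    baseline = X @ beta
    resid_sd = np.std(y[use] - baseline[use], ddof=X.shape[1])
    upper = baseline + 1.96 * resid_sd       # assumed form of the upper 95% band
    return baseline, upper

# Synthetic monthly mortality: trend + seasonality + noise + winter epidemics
rng = np.random.default_rng(1)
t = np.arange(1, 121)
month = (t - 1) % 12 + 1
y = 5 + 0.01 * t + np.cos(2 * np.pi * t / 12) + rng.normal(0, 0.1, t.size)
y[np.isin(month, (1, 2))] += 3.0             # synthetic influenza excess

baseline, upper = fit_serfling_baseline(y, month)
winter = np.isin(month, (1, 2))
mean_winter_excess = float(np.mean(y[winter] - baseline[winter]))
```

Because the epidemic months are left out of the fit, the baseline describes what winter mortality would look like without influenza, and the gap between observed and baseline recovers the added excess.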

USA P&I Deaths per 100,000 ( ), Model Baseline and Upper 95% Confidence Band, All Ages

Epidemic Months Highlighted in Grey

USA P&I Deaths per 100,000 ( ) and Model Baseline Year Olds Epidemic Months (Grey) Defined by All-Ages P&I Model Excess

Calculating Excess Mortality
Monthly excess mortality:
– For epidemic months (months in which the observed P&I mortality exceeds the upper 95% CI of the model baseline for all ages): observed P&I mortality – model baseline predicted P&I mortality
Seasonal excess mortality:
– For each influenza season (defined as Nov.-May in the US): Σ monthly excess mortality
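As a small worked illustration of these two definitions, the sketch below flags epidemic months and sums the excess by season. All numbers and the season labels are invented for the example, and `seasonal_excess` is a hypothetical helper name.

```python
import numpy as np

def seasonal_excess(observed, baseline, upper, season):
    """Sum (observed - baseline) over epidemic months within each season."""
    observed = np.asarray(observed, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    upper = np.asarray(upper, dtype=float)
    season = np.asarray(season)
    epidemic = observed > upper                       # month crosses the threshold
    monthly_excess = np.where(epidemic, observed - baseline, 0.0)
    totals = {str(s): float(monthly_excess[season == s].sum())
              for s in np.unique(season)}
    return totals, epidemic

# Invented rates per 100,000 for two seasons of three months each
observed = [2.0, 3.5, 4.0, 2.1, 2.0, 3.0]
baseline = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
upper    = [2.5, 2.5, 2.5, 2.5, 2.5, 2.5]
season   = ["season 1"] * 3 + ["season 2"] * 3

totals, epidemic = seasonal_excess(observed, baseline, upper, season)
print(totals)
print(int(epidemic.sum()), "epidemic months")
```

Months that stay below the threshold contribute nothing, even if they sit slightly above the baseline, which is why the month at 2.1 is not counted.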

Seasonal US Excess Mortality Table
Columns: Season | Age Group | No. of Epidemic Months | Excess P&I Deaths per 100,000 | Excess A-C Deaths per 100,000
(season rows: / / / / / / / /)
Averages: no. of epidemic months = 3 (median = 3, range = 1-5)

Pros and Cons of the Serfling Approach
Very flexible: can be used without virological data, which is especially useful for data on past pandemics
– However, it needs at least three years of data
Only works if the disease is seasonal
– Needs clear periods with no viral activity that can be used to create the baseline
– Cannot be used as is for tropical countries that have year-round influenza circulation
There are techniques to adapt Serfling models for these purposes

An alternative/complementary approach
What proportion of "pneumonia and influenza" hospitalizations can be attributed to influenza?
Use regression models with terms for seasonal variation, influenza, and RSV (these can be viral surveillance data, virus-specific hospitalization codes…)

A quick review of regression
Linear regression:
– $Y = \beta_1 x_1 + \beta_2 x_2 + a$
– $\beta_1$: a 1-unit increase in $x_1$ results in a $\beta_1$ increase in $Y$, with $x_2$ held fixed
Poisson regression:
– Used when $Y$ is a count variable or incidence rate rather than continuous
– $Y$ usually has a skewed distribution
– Multiplicative Poisson model: $Y = e^{\beta_1 x_1 + \beta_2 x_2 + a}$
– If the data are not Poisson distributed, use an alternative model, such as negative binomial

Estimation of influenza hospitalization burden
– Outcome = weekly pneumonia and influenza hospitalization rate
– Explanatory variables = influenza-specific and RSV-specific hospitalizations (proxies of viral activity)
– Seasonal estimates of influenza-related hospitalization rates are obtained as the sum of predicted rates minus baseline rates (influenza covariate set to 0)
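The "predicted minus baseline" calculation can be sketched with a log-link Poisson model fitted by iteratively reweighted least squares. Everything below is an assumption made for illustration: the data are synthetic, the outcome is a weekly count rather than a rate (a log-population offset would be needed for rates), the seasons are cut at mid-year, and `bumps` and `fit_poisson` are hypothetical helpers.

```python
import numpy as np

def bumps(week, peaks, amps, width=3.0):
    """Sum of Gaussian-shaped epidemic curves (synthetic viral-activity proxy)."""
    return sum(a * np.exp(-0.5 * ((week - p) / width) ** 2)
               for p, a in zip(peaks, amps))

def fit_poisson(X, y, n_iter=100, tol=1e-10):
    """Log-link Poisson regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())             # assumes column 0 is the intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu            # working response
        WX = X * mu[:, None]               # rows weighted by the Poisson variance
        new_beta = np.linalg.solve(X.T @ WX, WX.T @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

rng = np.random.default_rng(0)
week = np.arange(260)                      # five years of weekly data
flu = bumps(week, peaks=[2, 59, 108, 165, 208], amps=[1.0, 0.4, 0.8, 0.2, 0.6])
rsv = bumps(week, peaks=[46, 98, 150, 202, 254], amps=[0.7, 1.0, 0.5, 0.9, 0.6])
X = np.column_stack([np.ones(week.size),
                     np.cos(2 * np.pi * week / 52),
                     np.sin(2 * np.pi * week / 52),
                     flu, rsv])
true_beta = np.array([np.log(200.0), 0.15, 0.05, 0.5, 0.3])
y = rng.poisson(np.exp(X @ true_beta)).astype(float)

beta = fit_poisson(X, y)

# Baseline: same model with the influenza covariate set to 0
X_no_flu = X.copy()
X_no_flu[:, 3] = 0.0
attributable = np.exp(X @ beta) - np.exp(X_no_flu @ beta)

season = (week + 26) // 52                 # seasons cut at mid-year
seasonal_burden = np.bincount(season, weights=attributable)
print(np.round(seasonal_burden, 1))
```

Because the model is multiplicative, the influenza contribution in a given week depends on the level of the other terms, so the burden is read from the difference in predictions rather than from the coefficient alone.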

Example: Estimation of influenza hospitalization burden in California in seniors over 65 yrs

Comparison between Serfling and Poisson regression
Newall, Viboud, and Wood, 2009, Epidemiol Infect.

Alternative Models for Estimating Influenza Burden
– Peri-season models: use the months surrounding influenza epidemics as the baseline
– ARIMA models: estimate the seasonal baseline by adjusting for serial autocorrelation
– Serfling-Poisson combined models: Serfling seasonal excess mortality estimates are regressed against seasonal virus prevalence. This takes care of random variations in virus prevalence at small time scales
– Iterative Serfling models (for non-seasonal data)

Validity tests for influenza disease burden models
Regression diagnostics
Checks based on the epidemiology of influenza
– A/H3N2 vs A/H1N1/B dominant seasons (2-3 ratio)
– RSV vs influenza (age!)
– Higher rates in those 65 yrs and over
– Multiple years: seasons with little influenza circulation are very precious!
Difficult to estimate disease burden with precision
– Mild seasons
– Young children
– Middle age groups

Acknowledgement
– Cécile Viboud for providing some R program samples
– Cécile Viboud and Vivek Charu for slides on Serfling regression
– Jacques Lewalle for some ideas on presentation