Combining prevalence estimates from multiple sources
Julian Flowers


The problem (1)...
– No systematic way of monitoring health behaviours at small-area level in England
– We have smoking targets but don't know smoking prevalence for PCTs/districts
– But there are multiple potential sources of data:
  – Surveys
  – Commercial datasets
  – GP data
  – Synthetic estimates

The problem (2)...
– We tend to use "favourite" data sources
– Different datasets give different answers
– But all may have useful information about smoking
Question: what is the best estimate of smoking prevalence given the data we have?

7 datasets about districts...
– Synthetic estimates (from DH) for districts based on the Health Survey for England (HSE)
– Estimates based on commercial data about tobacco expenditure by households at small-area level (actually a synthetic estimate)
– 3 years of commercial data based on market research responses
– Separate analysis of the HSE by ASH

7 datasets...
– All biased in some way: some estimates looked too low; some were not well correlated. Which one(s) to believe?
– Could/should they be combined? If so, how? ("heptangulation"...)

08/01/2008
The situation in the East of England: different estimates from different sources
Motivation for combining estimates
[Figure: Basildon: pooled smoking prevalence estimates]

Bayesian modelling
– Work with the MRC Biostatistics Unit
– Based on work on bias-adjusted meta-analysis
– The idea is that a meta-analysis should include all studies that contain relevant information, but weight them according to their bias
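The weighting idea can be illustrated with a simplified, non-Bayesian sketch: inverse-variance pooling in which each source's estimate is first corrected by an assumed mean bias and its variance is inflated by an assumed bias variance. This is not the MRC model itself; the function name `bias_adjusted_pool` and the bias means and variances it takes are hypothetical inputs that would, in practice, have to be elicited or estimated.

```python
def bias_adjusted_pool(estimates, sampling_vars, bias_means, bias_vars):
    """Pool one area's prevalence estimates from several sources.

    Assumption (illustrative only): source j measures true prevalence + bias_j,
    where bias_j has a known mean bias_means[j] and variance bias_vars[j].
    Sources with more uncertain bias get less weight but are not discarded.
    """
    # Inverse of (sampling variance + bias variance) for each source.
    weights = [1.0 / (s2 + t2) for s2, t2 in zip(sampling_vars, bias_vars)]
    # Remove each source's assumed mean bias before pooling.
    adjusted = [y - m for y, m in zip(estimates, bias_means)]
    total_w = sum(weights)
    pooled = sum(w * a for w, a in zip(weights, adjusted)) / total_w
    pooled_var = 1.0 / total_w
    return pooled, pooled_var


# Hypothetical example: two sources, the second with an uncertain bias.
estimate, variance = bias_adjusted_pool(
    [0.20, 0.30], [0.01, 0.01], [0.0, 0.0], [0.0, 0.03]
)
```

With all bias means and variances set to zero this reduces to ordinary inverse-variance pooling; raising a source's bias variance shrinks its weight rather than excluding it.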

The basic model
– Bayesian hierarchical model structure, developed in WinBUGS.
– Allows for additive bias (Turner et al. 2007; Spiegelhalter and Best 2003).
– The model assumes that the biases affecting the smoking prevalence (SP) estimates vary between data sources.
– Let $y_{ij}$ be the SP estimate obtained from data source $j$ ($j = 1, \dots, 7$) for local authority (LA) $i$ ($i = 1, \dots, 48$ for the East of England), $s_{ij}^2$ the corresponding sampling variance (obtained from the 95% confidence limits and assumed known), and $\delta_{ij}$ the corresponding bias, assumed exchangeable within data sources. The SP estimates are then assumed to be generated by a normal distribution, $y_{ij} \sim N(\theta_i + \delta_{ij},\ s_{ij}^2)$, where $\theta_i$ is the true SP for the $i$-th LA.
– A constraint is needed to identify the model, since adding a constant to every $\theta_i$ and subtracting it from every $\delta_{ij}$ leaves the fit unchanged: our choice is an overall smoking prevalence of 23% for the East of England.
– Several variants of this model (including a multivariate model aimed at detecting correlation among data sources) were fitted, with no significant differences.
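The additive-bias structure and the identifiability constraint can be explored with the Python sketch below. It is not the Bayesian WinBUGS model: it is a simplified weighted least-squares analogue that assumes a single fixed bias per data source (instead of area-specific biases exchangeable within each source) and known sampling variances. It also assumes the 23% constraint applies to the unweighted mean across LAs unless optional population weights are supplied. The function names and the normal-based conversion from 95% confidence limits to a variance are illustrative assumptions.

```python
import numpy as np

Z_975 = 1.959963984540054  # 97.5% quantile of the standard normal


def variance_from_ci(lower, upper):
    """Sampling variance implied by a symmetric, normal-based 95% CI."""
    se = (np.asarray(upper) - np.asarray(lower)) / (2 * Z_975)
    return se ** 2


def fit_additive_bias(y, s2, overall=0.23, pop=None, max_iter=10_000, tol=1e-12):
    """Weighted least-squares fit of y[i, j] ~ theta[i] + b[j].

    y, s2: arrays of shape (n_areas, n_sources) with each source's prevalence
    estimate and its (known) sampling variance.
    theta: underlying prevalence per area; b: one additive bias per source.
    The fit is only identified up to a constant (theta + c, b - c fit equally
    well), so theta is shifted to have mean `overall` (population-weighted
    if `pop` is given).
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(s2, dtype=float)  # inverse-variance weights
    theta = np.zeros(y.shape[0])
    b = np.zeros(y.shape[1])
    for _ in range(max_iter):
        # Alternate: best theta given the biases, then best biases given theta.
        theta_new = (w * (y - b)).sum(axis=1) / w.sum(axis=1)
        b_new = (w * (y - theta_new[:, None])).sum(axis=0) / w.sum(axis=0)
        change = max(np.abs(theta_new - theta).max(), np.abs(b_new - b).max())
        theta, b = theta_new, b_new
        if change < tol:
            break
    # Apply the identifiability constraint.
    current = theta.mean() if pop is None else np.average(theta, weights=pop)
    shift = overall - current
    return theta + shift, b - shift
```

Because only differences between sources are identified from the data, the constraint decides how much of the common level is attributed to "true" prevalence versus bias; a different overall figure would move every area estimate by the same amount.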

Synthetic, classical and recent approaches in the statistical literature
– Multilevel synthetic estimation (Twigg et al. 2000): using a multilevel modelling approach that nests individuals within postcode sectors within health authorities, synthetic estimates are derived from ecological and individual-level variables associated with the phenomenon of interest. Prevalence estimates can be combined directly from surveys.
– Multiple-frame estimation (Lohr and Rao 2000; 2006): several sampling frames (which may overlap) whose union covers the whole population are considered, and probability samples are drawn independently from each frame. The samples are then combined to obtain optimal linear estimators of population quantities. The survey database is needed.
– Statistical matching (Rodgers 1984; Moriarity and Scheuren 2001): records of subjects with "similar profiles" in different data sources are linked, and the information from them is put together. The survey database is needed.
– Scoring method (Elliot and Davis 2005): the survey weights are adjusted so that the complementary strengths of each survey, in terms of sample size or unbiasedness, are exchanged; the surveys are scored accordingly. The survey database is needed.
– Bayesian hierarchical methods: recent work by Raghunathan et al. (2007) addresses the problem of combining prevalence rates from two surveys with a hierarchical Bayesian approach. One survey is believed to be less biased in terms of coverage and records whether respondents have a telephone line at home, so its respondents can be divided into two groups (with and without a home telephone). The other survey is based on telephone interviews only and is therefore believed to be more biased, but its sample is larger. The hierarchical Bayesian model combines the larger survey with the telephone information provided by the less biased survey. Prevalence estimates can be combined directly from surveys.
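As an illustration of multiple-frame estimation, the sketch below implements a Hartley-type dual-frame composite estimator of a population total (for example, the number of smokers; dividing by the population size gives a prevalence). It is a simplification under stated assumptions: the compositing factor `p` is chosen by inverse-variance weighting of the two overlap estimates only, ignoring their covariance with the other domain estimates, and the variance formula assumes independent samples and ignores within-sample covariances. It is not the optimal estimator of Lohr and Rao; the names `DomainEstimate` and `dual_frame_total` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DomainEstimate:
    total: float     # estimated population total in the domain (e.g. smokers)
    variance: float  # estimated sampling variance of that total


def dual_frame_total(only_a, overlap_a, overlap_b, only_b):
    """Hartley-type dual-frame estimate of a population total.

    only_a:    domain 'a' (units only in frame A), estimated from sample A
    overlap_a: domain 'ab' (units in both frames), estimated from sample A
    overlap_b: domain 'ab', estimated from sample B
    only_b:    domain 'b' (units only in frame B), estimated from sample B
    """
    # Simplified compositing factor: weight each overlap estimate by the
    # inverse of its variance (covariances with other domains ignored).
    p = overlap_b.variance / (overlap_a.variance + overlap_b.variance)
    total = (only_a.total + p * overlap_a.total
             + (1 - p) * overlap_b.total + only_b.total)
    # Approximate variance: independent samples, within-sample
    # covariances between domain estimates ignored.
    variance = (only_a.variance + p ** 2 * overlap_a.variance
                + (1 - p) ** 2 * overlap_b.variance + only_b.variance)
    return total, p, variance
```

The overlap domain is counted once, as a weighted mix of the two samples' estimates, which is what prevents double counting when the frames overlap.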

Modelled estimates with CIs

Comparison with 2008 survey

Conclusions
– Bayesian hierarchical models can be used to pool prevalence estimates from different sources, adjusting for measured bias in each source. This is a form of formal triangulation of data.
– The method can be used when direct estimates are not available.
– It could be applied to any lifestyle or prevalence data for which multiple sources are available.
– Further work is needed to compare modelled estimates with direct estimates, and to apply the method to other lifestyle behaviours.
– Further work is needed to implement the modelling in conventional statistical packages.
– Local surveys can help to recalibrate the models on a regular basis.

Modelling bias in combining small-area prevalence estimates from multiple surveys
Giancarlo Manzi (1)*, David J Spiegelhalter (1), Rebecca M Turner (1), Julian Flowers (2), Simon G Thompson (1)
(1) MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK
(2) Eastern Region Public Health Observatory, Institute of Public Health, Cambridge, UK
* Current address: Department of Economics, Business and Statistics, University of Milan, Italy.