A Bayesian mixture model for detecting unusual time trends: Modelling burglary counts in Cambridge. Guangquan (Philip) Li, 4th ESRC Research Methods Festival.

Presentation transcript:

A Bayesian mixture model for detecting unusual time trends: Modelling burglary counts in Cambridge. Guangquan (Philip) Li. 4th ESRC Research Methods Festival, July 5-8, 2010. Joint work with Nicky Best, Sylvia Richardson and Robert Haining.

Outline
1. Motivations
2. A Bayesian mixture model for detecting unusual time trends
3. Preliminary results from analysis of burglary data in Cambridge

Reasons for detecting unusual time trends
What lies behind an unusual trend: emergence of local risk factors? Change of population composition? Impact of a new policing scheme?
Modelling serves to: highlight areas deserving of further scrutiny; identify possible risk factors; inform policy making; assess the effect of policy.

Preventing residential burglary in Cambridge (Police Research Series Paper 108)
The report describes the work of the Domestic Burglary Task Force (DBTF) in Cambridge, which was established to examine the nature of residential burglary in Cambridge and to design and implement initiatives to prevent it. From an analysis of the burglary counts ( ), the DBTF identified the largest ‘hot spot’ in the north of the city and designated the two wards containing it as the targeted area. After a series of seminars, a number of burglary prevention strategies were identified and implemented. Question: did the strategies help to reduce residential burglary rates?

Trend comparisons
Figure panels: Cambridge city as a whole; the two targeted wards.
Need modelling.

A Bayesian detection model
We have proposed a detection method that, for each area, estimates the common trend component and the area-specific trend component independently and then selects between the two to describe the observed data. For each area, the posterior probability of selecting the common trend component is used to classify the area's trend as “unusual” or not.
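The selection step can be pictured as a binary switch per area. The sketch below is only an illustration under stated assumptions: the indicator name `z_i` follows the classification slide, while the function name `selected_trend` and the convention that the trends are lists of values over time are hypothetical and not taken from the slides.

```python
def selected_trend(z_i, common_trend, area_trend):
    """Return the trend used to describe one area.

    z_i = 1 selects the common trend; z_i = 0 selects the
    area-specific trend. Both trends are lists over time points.
    (Names and data layout are illustrative assumptions.)
    """
    if z_i not in (0, 1):
        raise ValueError("z_i must be 0 or 1")
    if len(common_trend) != len(area_trend):
        raise ValueError("trends must cover the same time points")
    return [z_i * c + (1 - z_i) * a
            for c, a in zip(common_trend, area_trend)]


common = [0.1, 0.2, 0.3]
specific = [0.1, 0.6, 0.9]
print(selected_trend(1, common, specific))  # the common trend
print(selected_trend(0, common, specific))  # the area-specific trend
```

In a fitted model the indicator is unknown, so it is its posterior distribution, rather than a single value, that carries the evidence for each area.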

A schematic diagram of the detection model
Space-time variations are described in two ways. Space-time separable: common spatial pattern plus common time trend. Space-time inseparable: common spatial pattern plus area-specific time trends.

Specific model components (1): spatial smoothing
A conditional autoregressive (CAR) model is used to impose the spatial correlation.
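To make the idea of spatial smoothing concrete, here is a minimal sketch of the intrinsic CAR variant, in which each area's conditional mean is the average of its neighbours. The slide does not say which CAR specification was used, so the intrinsic form, the function names and the toy three-area map are all assumptions.

```python
def icar_precision(neighbours):
    """Structure matrix Q = D - W of an intrinsic CAR prior.

    neighbours maps each area to the set of areas adjacent to it
    (assumed symmetric). D holds neighbour counts, W is the 0/1
    adjacency matrix.
    """
    areas = sorted(neighbours)
    index = {area: k for k, area in enumerate(areas)}
    q = [[0.0] * len(areas) for _ in areas]
    for area in areas:
        i = index[area]
        q[i][i] = float(len(neighbours[area]))
        for other in neighbours[area]:
            q[i][index[other]] = -1.0
    return q


def conditional_mean(values, neighbours, area):
    """Mean of the neighbouring values: the centre of the
    conditional distribution for `area` under the intrinsic CAR."""
    adjacent = neighbours[area]
    return sum(values[b] for b in adjacent) / len(adjacent)


# Toy map: three areas in a row, A - B - C.
toy_map = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(icar_precision(toy_map))
print(conditional_mean({"A": 1.0, "B": 5.0, "C": 3.0}, toy_map, "B"))  # 2.0
```

Each row of the structure matrix sums to zero, which reflects that this prior penalises differences between neighbouring areas rather than their overall level.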

Specific model components (2): temporal smoothing
A random walk of order 1 is used to define the temporal structure (diagram: time points t-1, t, t+1). Non-informative priors are assigned to the other parameters in the model. Two settings: 1. S varies; 2. S fixed (>1).
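A first-order random walk says that each time point equals the previous one plus noise, so neighbouring time points are pulled towards each other. The sketch below builds the corresponding structure matrix and simulates one path; the Gaussian increments, the function names and the choice of 8 time points are illustrative assumptions, not details from the slides.

```python
import random


def rw1_structure(n_times):
    """Structure matrix of an RW1 prior: the quadratic form x'Kx
    equals the sum of squared first differences of x."""
    k = [[0] * n_times for _ in range(n_times)]
    for t in range(n_times - 1):
        k[t][t] += 1
        k[t + 1][t + 1] += 1
        k[t][t + 1] -= 1
        k[t + 1][t] -= 1
    return k


def simulate_rw1(n_times, sd, seed=0):
    """Simulate one path starting at 0 with Gaussian increments
    of standard deviation sd (seeded for reproducibility)."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n_times - 1):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path


for row in rw1_structure(4):
    print(row)
print(simulate_rw1(8, sd=0.2))
```

The interior rows of the matrix have the pattern -1, 2, -1, which is the time-series analogue of averaging over the two adjacent time points.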

Classification
For each area, the posterior mean of z_i (denoted by p_i) presents evidence for area i to follow the common trend pattern: a small value of p_i suggests that the area is unlikely to follow the common trend. The area is classified as unusual if this probability is less than some threshold.
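Given posterior draws of the indicators, the classification rule reduces to averaging and thresholding. The following sketch assumes the draws of z_i are available as lists of 0/1 values per area; the function names, the area labels and the cutoff of 0.5 are hypothetical and chosen only for illustration.

```python
def posterior_prob_common(z_draws):
    """Estimate p_i as the mean of the sampled 0/1 indicators z_i."""
    if not z_draws:
        raise ValueError("need at least one posterior draw")
    return sum(z_draws) / len(z_draws)


def classify(z_draws_by_area, cutoff):
    """Map each area to (p_i, is_unusual), where an area is
    flagged as unusual when p_i falls below the cutoff."""
    result = {}
    for area, draws in z_draws_by_area.items():
        p_i = posterior_prob_common(draws)
        result[area] = (p_i, p_i < cutoff)
    return result


draws = {"area_1": [1, 1, 1, 0], "area_2": [0, 0, 0, 1]}
print(classify(draws, cutoff=0.5))
# {'area_1': (0.75, False), 'area_2': (0.25, True)}
```

The cutoff here is an arbitrary placeholder; how it should be chosen is the subject of the next slide.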

The idea of classification
Diagram: areas are split into “unusual” and “usual” by a cutoff on p_i = Prob(an area follows the common trend pattern).
Choose the cutoff to achieve a pre-specified false discovery rate (FDR). Cutoff values cannot be obtained using conventional approaches such as Storey (2002), since the null hypothesis is specific to each area. We have proposed a novel simulation approach to obtain area-specific cutoffs so that we can maximize the sensitivity while controlling the FDR.
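For contrast with the area-specific cutoffs described above, the sketch below shows a conventional single-cutoff rule based on posterior probabilities, in the spirit of the mixture approach of Newton et al. listed in the references. It is not the authors' simulation method, which the slides do not spell out. It assumes that p_i can be read as the probability that a flagged area is a false discovery, and the function name is hypothetical.

```python
def flag_with_global_cutoff(p, target_fdr):
    """Flag the largest set of areas whose average p_i stays at or
    below target_fdr.

    p[i] is the posterior probability that area i follows the
    common trend, so the mean of p over the flagged set estimates
    the proportion of false discoveries in that set.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    flagged = []
    running_total = 0.0
    for i in order:
        # p is visited in ascending order, so the running mean never
        # decreases and we can stop at the first violation.
        if (running_total + p[i]) / (len(flagged) + 1) > target_fdr:
            break
        running_total += p[i]
        flagged.append(i)
    return sorted(flagged)


probs = [0.01, 0.9, 0.03, 0.5, 0.08]
print(flag_with_global_cutoff(probs, target_fdr=0.05))  # [0, 2, 4]
```

A rule of this kind applies one cutoff to every area, which is exactly what the slide argues is inadequate when the null hypothesis differs from area to area.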

A simulation study

Simulation results
15 (out of 354) areas were selected according to their population sizes and spatial risks and assigned the unusual trend. The gain/loss of sensitivity is compared amongst the following 4 models:
1. S-vary
2. S=2 (the optimal setting, the reference)
3. S=5
4. SaTScan (space-time permutation test)
Figure labels: Scenario 1, Scenario 2, Scenario 3; small departures, large departures. Panels shown for Scenario 1: S-vary, S=5, SaTScan.

Simulation results
Figure panels: S-vary, S=5, SaTScan. Reference: S = 2.

Summary: Key features of the model
The comprehensive simulation study has shown some key features of our model:
1. Our model can detect various realistic departure patterns;
2. The performance is robust over different model settings;
3. Our model outperforms the popular SaTScan;
4. Our detection model works relatively well on sparse data.

Burglary data in Cambridge
Geo-referenced offence records in Cambridgeshire ( ) were made available by the Cambridgeshire Constabulary. In this analysis, we focus on the burglary counts in Cambridge at the Lower Super Output Area (LSOA) level for each quarter from 2001 to 2002 (2584 reported burglary cases). Numbers of houses were taken from the 2001 Census and then aggregated to LSOA level (≈600 houses per LSOA).

Overall spatial/temporal pattern (figure)

Detected LSOAs (FDR=0.01) (figure)

High-risk and unusual LSOAs (figure)

Future work
We are currently working closely with the Cambridgeshire Police to assess the effectiveness of possible policing schemes. The framework can be extended to a prospective surveillance system by applying the detection model sequentially to observed data. Incorporating time-varying covariates (e.g., unemployment from surveys) can enrich the detection analysis.

Summary
We have proposed a Bayesian mixture model for detecting unusual time trends. The extensive simulation study has shown the superior performance of the model in detecting various “real” departures. Applying the model to the offence data can inform policy making (by identifying abrupt changes) and help to assess policy.

Acknowledgement
Funded by ESRC. The BIAS project (PI Nicky Best), based at Imperial College London, is a node of the Economic and Social Research Council’s National Centre for Research Methods (NCRM). The offences data were kindly provided by the Cambridgeshire Constabulary.

References
SaTScan: Kulldorff M, Heffernan R, Hartman J, Assunção RM, Mostashari F. A space-time permutation scan statistic for the early detection of disease outbreaks. PLoS Medicine, 2.
False discovery rate (FDR): Storey J. A direct approach to false discovery rates. JRSS(B), 64, 2002.
Newton M, Noueiry A, Sarkar D, Ahlquist P. Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, 5.
Crime: Bennett T, Durie L. Preventing residential burglary in Cambridge: From crime audits to targeted strategies. Police Research Series Paper 108.