Crime survey Neuchatel, 7-8 July 2011

CBS case study: SAE of crime statistics

Introduction
Crime and victimization survey.
Planned domains: police districts.
Sample size approx. 750 per district.
2005-2008: NSM; 2008 onwards: ISM.

From NSM to ISM
Local oversampling.
Data collection: sequential mixed-mode.
Different questionnaire.
Discontinuities expected.

Quantifying discontinuities
Survey transition from NSM to ISM.
Small-scale NSM run in parallel to the new ISM (full scale: approx. 18,000; small scale: one third of that).
Discontinuities at national level.
Now: discontinuities at police-district level required.
But: NSM sample too small, hence SAE.

Example of discontinuity
Bicycle thefts, NSM and ISM, 2009.
Total: 541,000 (NSM); 897,000 (ISM).

Coefficient of variation, bicycle theft, 2009
NSM: 0.41; ISM: 0.24.

SAE to increase precision of NSM
Fay-Herriot model: linear mixed model, area level.
EBLUP and HB estimators.
Bayesian estimation of the model variance, also in EBLUP (avoiding zero estimates of the model variance).
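The Fay-Herriot BLUP combines each area's direct estimate y_i with a regression-synthetic estimate x_i'beta, weighting the direct estimate by gamma_i = sigma2_u / (sigma2_u + psi_i). A minimal numpy sketch, assuming known sampling variances psi_i and a model variance sigma2_u supplied from outside (in an EBLUP it is replaced by an estimate); the function name `fh_blup` and the simulated data are hypothetical, not the CBS implementation.

```python
import numpy as np


def fh_blup(y, X, psi, sigma2_u):
    """Fay-Herriot BLUP for m areas with the model variance sigma2_u given.

    y: (m,) direct estimates; X: (m, p) area-level covariates;
    psi: (m,) sampling variances of the direct estimates, assumed known.
    """
    v = sigma2_u + psi                                  # var(y_i) = sigma2_u + psi_i
    Vinv_X = X / v[:, None]                             # V^{-1} X, V is diagonal
    beta = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)  # GLS estimate of beta
    gamma = sigma2_u / v                                # weight of the direct estimate
    synthetic = X @ beta                                # regression-synthetic estimate
    return gamma * y + (1.0 - gamma) * synthetic, gamma, beta


# Hypothetical example: 25 areas, intercept plus one covariate.
rng = np.random.default_rng(1)
m = 25
X = np.column_stack([np.ones(m), rng.normal(size=m)])
psi = np.full(m, 1.0)
theta = X @ np.array([2.0, 1.0]) + rng.normal(scale=0.5, size=m)
y = theta + rng.normal(scale=np.sqrt(psi))
est, gamma, beta = fh_blup(y, X, psi, sigma2_u=0.25)
```

With sigma2_u = 0 every gamma_i is zero and the estimator collapses to the synthetic estimate, which is the situation the Bayesian estimation of the model variance is meant to avoid.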

Bayesian estimation of model variance
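The slide gives no formulas. One way to estimate the model variance in a Bayesian way, sketched here under assumed priors (flat on beta, flat on sigma2_u over a finite grid), is to integrate beta out of the Gaussian likelihood, which yields the restricted (REML) likelihood, and take the posterior mean over the grid. Because every positive grid point gets positive posterior mass, the posterior mean is strictly positive even when the data suggest no area effect at all. Function names, grid and data are hypothetical.

```python
import numpy as np


def restricted_loglik(sigma2_u, y, X, psi):
    """log p(y | sigma2_u) with beta integrated out under a flat prior (REML, up to a constant)."""
    v = sigma2_u + psi
    Vinv_X = X / v[:, None]
    XtVinvX = X.T @ Vinv_X
    beta = np.linalg.solve(XtVinvX, Vinv_X.T @ y)
    resid = y - X @ beta                         # (y - X beta)' V^{-1} (y - X beta) = y' P y
    _, logdet = np.linalg.slogdet(XtVinvX)
    return -0.5 * (np.sum(np.log(v)) + logdet + np.sum(resid**2 / v))


def posterior_mean_sigma2_u(y, X, psi, grid):
    """Posterior mean of sigma2_u under an assumed flat prior on the grid points."""
    loglik = np.array([restricted_loglik(s, y, X, psi) for s in grid])
    w = np.exp(loglik - loglik.max())            # subtract max for numerical stability
    return float(np.sum(w * grid) / np.sum(w))


# Hypothetical data in which the true model variance is zero.
rng = np.random.default_rng(2)
m = 20
X = np.column_stack([np.ones(m), rng.normal(size=m)])
psi = np.full(m, 1.0)
y = X @ np.array([1.0, 0.5]) + rng.normal(size=m)
grid = np.linspace(0.0, 5.0, 501)
s2_post = posterior_mean_sigma2_u(y, X, psi, grid)
gamma_plugin = s2_post / (s2_post + psi)         # weights when s2_post is plugged into the BLUP
```

Plugging the posterior mean into the BLUP formula gives an EBLUP whose weights for the direct estimates can never be exactly zero.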

Covariates from registers
Police: reported offences (property crimes, violence, assaults, threats, illicit drugs, weapons, vandalism, traffic offences).
Administration: age, ethnicity, urbanisation, house prices, welfare claimants.

Covariates from ISM survey
Design-based GREG estimates as auxiliary information (Ybarra & Lohr 2008).
Consequences for small area estimates: the model estimate is weighted lower in the BLUP due to error in the covariate.
Achieved through a higher estimate of the model variance in EBLUP (not the Y & L adjustment).
Variance of GREGs approx. equal for all areas.
(Other idea: multivariate FH model.)
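Why can a higher model variance stand in for the Ybarra & Lohr adjustment? In their estimator the weight of the direct estimate is (sigma2_u + beta'C_i beta) / (sigma2_u + beta'C_i beta + psi_i), where C_i is the covariance matrix of the estimated covariates in area i. When the GREG variances are roughly equal across areas, beta'C_i beta is a constant that can be absorbed into sigma2_u. The sketch below checks this numerically on made-up inputs; the function names and values are hypothetical.

```python
import numpy as np


def fh_weight(sigma2_u, psi):
    """Weight of the direct estimate in the standard Fay-Herriot BLUP."""
    return sigma2_u / (sigma2_u + psi)


def ybarra_lohr_weight(sigma2_u, psi, beta, C):
    """Weight of the direct estimate when covariates are estimated with error.

    C: (m, p, p) covariance matrices of the estimated covariates, one per area.
    """
    extra = np.einsum("j,ijk,k->i", beta, C, beta)   # beta' C_i beta for each area
    return (sigma2_u + extra) / (sigma2_u + extra + psi)


# Hypothetical example: every area has the same covariate error covariance C0.
m = 10
psi = np.linspace(0.5, 2.0, m)
beta = np.array([1.0, 0.8])
C0 = np.array([[0.0, 0.0], [0.0, 0.3]])             # intercept carries no error
C = np.repeat(C0[None, :, :], m, axis=0)
sigma2_u = 0.4
w_yl = ybarra_lohr_weight(sigma2_u, psi, beta, C)
w_fh_inflated = fh_weight(sigma2_u + beta @ C0 @ beta, psi)
```

The two weight vectors coincide, and both exceed the unadjusted FH weights, i.e. the model (synthetic) part gets less weight when the covariate is measured with error.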

Simulating errors in covariates
Bicycle thefts: NSM survey ~ police-reported.
No error: posterior mean of model variance = 1.22.
Adding error (mean 0, sd 2), iterated 1,000 times: posterior mean of model variance = 1.32.
To add detail, e.g. estimated beta.
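The covariate-error experiment can be mimicked by adding N(0, 2^2) noise to the covariate, re-estimating the posterior mean of the model variance, and averaging over iterations. This is a sketch on simulated data, assuming an intercept-plus-one-covariate FH model and the flat-prior grid posterior described earlier; it does not reproduce the 1.22 and 1.32 above, and all names are hypothetical.

```python
import numpy as np


def restricted_loglik(sigma2_u, y, X, psi):
    """log p(y | sigma2_u) with beta integrated out under a flat prior (REML, up to a constant)."""
    v = sigma2_u + psi
    Vinv_X = X / v[:, None]
    XtVinvX = X.T @ Vinv_X
    beta = np.linalg.solve(XtVinvX, Vinv_X.T @ y)
    resid = y - X @ beta
    _, logdet = np.linalg.slogdet(XtVinvX)
    return -0.5 * (np.sum(np.log(v)) + logdet + np.sum(resid**2 / v))


def posterior_mean_sigma2_u(y, X, psi, grid):
    """Posterior mean of sigma2_u under an assumed flat prior on the grid points."""
    loglik = np.array([restricted_loglik(s, y, X, psi) for s in grid])
    w = np.exp(loglik - loglik.max())
    return float(np.sum(w * grid) / np.sum(w))


def mean_model_var_with_noise(y, x, psi, grid, sd=2.0, n_iter=1000, seed=0):
    """Average posterior mean of sigma2_u after adding N(0, sd^2) noise to covariate x."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_iter):
        x_noisy = x + rng.normal(0.0, sd, size=x.shape)        # perturbed covariate
        X = np.column_stack([np.ones_like(x_noisy), x_noisy])  # intercept + covariate
        results.append(posterior_mean_sigma2_u(y, X, psi, grid))
    return float(np.mean(results))


# Hypothetical data: y plays the role of the NSM estimate, x of police-reported thefts.
rng = np.random.default_rng(4)
m = 25
x = rng.normal(scale=3.0, size=m)
psi = np.full(m, 1.0)
y = 1.0 + x + rng.normal(scale=np.sqrt(0.5), size=m) + rng.normal(scale=np.sqrt(psi))
grid = np.linspace(0.0, 10.0, 201)
base = posterior_mean_sigma2_u(y, np.column_stack([np.ones(m), x]), psi, grid)
noisy = mean_model_var_with_noise(y, x, psi, grid, n_iter=200)  # fewer iterations, for speed
```

The noise weakens the covariate, so the lost explanatory power shows up as a larger estimated model variance.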

Dimension reduction: PCA
Rather than using a small subset of covariates, use a low-dimensional PC subspace.
Not guaranteed to work, as correlation with the survey variables is not used in PCA.
Use as a separate set of potential covariates in model selection.
PC: 1, 2, 3, 4, 5, 6, …, 12
Cumulative variance explained: .39, .55, .67, .75, .81, .87, …, .99
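A minimal sketch of the PCA step, assuming the register covariates are standardised before the decomposition (the slide does not say) and using an SVD of the standardised matrix. The covariate matrix below is random, not the CBS data, and `pca_scores` is a hypothetical name. The cumulative shares correspond to the "cumulative variance explained" row; the score columns are candidate covariates pc1, pc2, ….

```python
import numpy as np


def pca_scores(Z, k):
    """First k principal component scores of standardised covariates.

    Z: (m, q) matrix of area-level register covariates (rows = areas).
    Returns the (m, k) scores and the cumulative share of variance explained.
    """
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)   # standardise each covariate
    _, s, Vt = np.linalg.svd(Zs, full_matrices=False)   # Zs = U diag(s) Vt
    explained = s**2 / np.sum(s**2)                      # variance share per component
    scores = Zs @ Vt[:k].T                               # projections on the first k PCs
    return scores, np.cumsum(explained)


# Hypothetical example: 25 areas, 12 correlated covariates.
rng = np.random.default_rng(3)
Z = rng.normal(size=(25, 12)) @ rng.normal(size=(12, 12))
scores, cum = pca_scores(Z, k=2)
```

Since the PCs are built without looking at the survey variable, a high cumulative share says nothing about predictive value; that is why they enter model selection as candidates rather than being used automatically.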

PC space of dimension 2 (figure)

Model selection
Conditional AIC (Vaida & Blanchard 2005): cAIC = -2 cond_llh + 2 eff_d (cf. AIC = -2 llh + 2 d).
Cross validation (CV): LOO (leave-one-out), predictive accuracy.
Start from a minimal model and add terms, maximizing the improvement w.r.t. cAIC or CV, until there is no further improvement.
llh = log likelihood evaluated at the ML estimates of the fixed-effect coefficients beta and the model variance.
cond_llh = conditional llh: conditional on the random effects.
d = model complexity = number of fixed effects.
eff_d = effective degrees of freedom, between the model with no random effect and the model with the random effect treated as fixed (i.e. between d and the number of areas); it is the trace of the hat matrix.
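For an FH model with a given model variance the EBLUP is linear in y: theta_hat = H y with H = diag(gamma) + diag(1 - gamma) X (X'V^{-1}X)^{-1} X'V^{-1}. So eff_d = trace(H), which runs from the number of fixed effects (sigma2_u = 0) to the number of areas (sigma2_u very large), and cond_llh is the Gaussian log-likelihood of y_i given theta_hat_i with variance psi_i. A sketch assuming known psi_i and treating sigma2_u as fixed (no correction for estimating it); `fh_caic` and the data are hypothetical.

```python
import numpy as np


def fh_caic(y, X, psi, sigma2_u):
    """Conditional AIC of a Fay-Herriot model for a given model variance sigma2_u.

    cAIC = -2 * cond_llh + 2 * eff_d, with eff_d the trace of the hat matrix.
    """
    v = sigma2_u + psi
    Vinv_X = X / v[:, None]
    A = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T)           # (X'V^-1X)^-1 X'V^-1, beta = A y
    gamma = sigma2_u / v
    H = np.diag(gamma) + (1.0 - gamma)[:, None] * (X @ A)  # theta_hat = H y
    theta_hat = H @ y
    cond_llh = -0.5 * np.sum(np.log(2 * np.pi * psi) + (y - theta_hat) ** 2 / psi)
    eff_d = np.trace(H)
    return -2.0 * cond_llh + 2.0 * eff_d, eff_d


# Hypothetical comparison of two candidate models on simulated data.
rng = np.random.default_rng(5)
m = 25
x1, x2 = rng.normal(size=m), rng.normal(size=m)
psi = np.full(m, 1.0)
y = 1.0 + 2.0 * x1 + rng.normal(scale=np.sqrt(0.5), size=m) + rng.normal(size=m)
X_small = np.column_stack([np.ones(m), x1])
X_large = np.column_stack([np.ones(m), x1, x2])
caic_small, d_small = fh_caic(y, X_small, psi, sigma2_u=0.5)
caic_large, d_large = fh_caic(y, X_large, psi, sigma2_u=0.5)
```

In a forward search, the term whose addition lowers cAIC the most would be added next, stopping when no candidate lowers it.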

Model selection results
For each NSM survey variable: 2 years, 2 criteria.
CV models are larger; cAIC models are nested within CV models.
Hence: use cAIC models.
Models differ between years.
Alternative: choose a single model for both years.

Selected models
violent crimes: 2008: ISM-bicycle-theft, REG-property, REG-weapons, ISM-property; 2009: pc21, pc10, pc4
satisf. police: 2008: ISM-satisf; 2009: age, ISM-satisf, urbanisation
victimization: 2008: ISM-property, REG-property; 2009: pc1, pc21, pc5, pc6
property crimes: 2008: ISM-victim, elderly; 2009: pc1, pc21, pc2, pc5, pc6
nuisance: 2008: ISM-nuisance, elderly; 2009: ISM-victim, REG-traffic, ISM-property
feeling unsafe: 2008: ISM-nuisance, house val, ISM-satisf; 2009: ISM-unsafe, ISM-satisf
degradation: 2008: pc1, pc4, pc10, pc22; 2009: ISM-degrad
bicycle theft: 2008: ISM-bicycle; 2009: ISM-bicycle, ISM-satisf

Selected models excl. ISM (2008, 2009)
violent crimes: PC
satisf. police: -
victimization: REG-property, elderly
property crimes: REG-property, age / REG-property, REG-traffic, REG-weapons
nuisance: -
feeling unsafe: -
degradation: urban, house val, REG-vandalism
bicycle theft: -

SAE results (hybrid EBLUP): reduction in coefficient of variation (incl. ISM / excl. ISM)
violent crimes: -40 %
satisf. police: -47 % / -46 %
victimization: -43 % / -41 %
property crimes: -44 % / -42 %
nuisance: -33 %
feeling unsafe: -25 %
degradation: -35 % / -16 %
bicycle theft: -39 %

Bicycle theft, CV, 2009
NSM: 0.41; EBLUP: 0.23; ISM: 0.24.

SAE results: weight of direct estimate in BLUP (incl. ISM / excl. ISM)
violent crimes: 0.21 / 0.27
satisf. police: 0.24 / 0.35
victimization: 0.20 / 0.22
property crimes: 0.19
nuisance: -
feeling unsafe: 0.39 / 0.41
degradation: 0.31 / 0.64
bicycle theft: 0.32

EBLUP vs. hierarchical Bayes (difference in point estimate / difference in variance estimate)
violent crimes: -0.1 % / -4.7 %
satisf. police: +0.0 % / -3.8 %
victimization: -0.0 %
property crimes: -4.5 %
nuisance: -4.6 %
feeling unsafe: -3.1 %
degradation: -4.0 %
bicycle theft: -0.2 % / -2.6 %
HB accounts for uncertainty in estimating the model variance.
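How HB accounts for uncertainty in the model variance can be seen in a grid-based sketch, assuming flat priors on beta and on sigma2_u over a grid (the priors actually used are not stated). Given sigma2_u, theta_i is normal with mean equal to the BLUP and variance g1_i + g2_i, where g1_i = gamma_i psi_i and g2_i = (1 - gamma_i)^2 x_i'(X'V^{-1}X)^{-1}x_i. Averaging over the posterior of sigma2_u adds a "between" term, the variance of the BLUP across plausible sigma2_u values, which a plug-in g1 + g2 at a single estimate leaves out. `hb_fay_herriot` and the data are hypothetical.

```python
import numpy as np


def hb_fay_herriot(y, X, psi, grid):
    """Grid-based hierarchical Bayes for the Fay-Herriot model.

    Returns posterior means and variances of theta_i, and the 'within' part
    of the variance (conditional variance averaged over sigma2_u).
    """
    logpost, cmeans, cvars = [], [], []
    for s in grid:
        v = s + psi
        Vinv_X = X / v[:, None]
        XtVinvX = X.T @ Vinv_X
        beta = np.linalg.solve(XtVinvX, Vinv_X.T @ y)          # GLS beta given s
        resid = y - X @ beta
        _, logdet = np.linalg.slogdet(XtVinvX)
        logpost.append(-0.5 * (np.sum(np.log(v)) + logdet + np.sum(resid**2 / v)))
        gamma = s / v
        # theta | sigma2_u, y is normal with mean = BLUP and variance = g1 + g2
        g1 = gamma * psi
        g2 = (1 - gamma) ** 2 * np.einsum("ij,jk,ik->i", X, np.linalg.inv(XtVinvX), X)
        cmeans.append(gamma * y + (1 - gamma) * (X @ beta))
        cvars.append(g1 + g2)
    logpost = np.array(logpost)
    w = np.exp(logpost - logpost.max())
    w /= w.sum()                                        # posterior weights of grid points
    cmeans, cvars = np.array(cmeans), np.array(cvars)   # shape (len(grid), m)
    post_mean = w @ cmeans
    within = w @ cvars                                  # E[var(theta | sigma2_u, y)]
    between = w @ (cmeans - post_mean) ** 2             # var(E[theta | sigma2_u, y])
    return post_mean, within + between, within


# Hypothetical data: 25 areas, intercept plus one covariate.
rng = np.random.default_rng(6)
m = 25
X = np.column_stack([np.ones(m), rng.normal(size=m)])
psi = np.full(m, 1.0)
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=np.sqrt(0.3), size=m) + rng.normal(size=m)
post_mean, post_var, within = hb_fay_herriot(y, X, psi, np.linspace(0.0, 5.0, 501))
```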

Conclusions
Considerable increase in precision with SAE.
Gain in precision depends on the variable.
PCA is important for some variables.
Using ISM outcomes is important for some variables.
MSE estimates from HB are higher (preferable).

To do (maybe)
Sort out errors in input data! And re-run everything.
Calibration to direct estimate of totals (is a model diagnostic).
Study residuals.
Elaborate on errors in covariates.
Use past survey outcomes as covariates.
More detailed comparison of HB-NSM estimates with ISM.

Future work (post-ESSnet)
Multivariate modelling of NSM and ISM variables.
Consider model averaging.
Using more detailed areas, with smaller sample sizes: beneficial?