Statistical disclosure control on visualising geocoded population data using a structure in quadtrees
Eduard Suñé, Cristina Rovira, Daniel Ibáñez, Mireia Farré
www.idescat.cat
NTTS 2017 9A-005

Disclosure control by spatial aggregation using quadtrees

[Figure: European Standard Grid at resolutions of 125 m, 250 m, 500 m and 1 km. Axis labels: errors in population calculation, risk of disclosure, resolution (+ / -).]

A quadtree is defined by {maximum resolution, minimum resolution, georeferenced data, threshold}.
Decision: QT{125m,250m,PR2014,17}

Of the two methods of preserving statistical confidentiality, coordinate perturbation and spatial aggregation, we have employed the latter. On the basis of the European standard grid, the aggregation process uses the well-known quadtree data structure. In this slide we can see an example of the aggregation mechanism: it starts from a given level of resolution, and every cell whose population is below the threshold is merged with its siblings in the hierarchy (four cells per parent), and so on, recursively. In any case, it is necessary to decide the maximum and minimum resolutions that bound this aggregation process.

We decided that the quadtree parameters for the Population Register 2014 case would be: 17 inhabitants for the threshold, 125 m for the maximum resolution and 250 m for the minimum resolution. We believe these parameters give a suitable compromise between the risk of disclosure and errors in population calculation.

However, the quadtree building algorithm may cause undesirable aggregations when there is a high degree of population variance between siblings in the hierarchy. The solution adopted consists of translating (moving) population between these siblings until the threshold is reached. This prevents aggregation, provided that the absolute error of the translation is smaller than that of the aggregation when both results are compared with the initial situation.

[Figure: Border effect, avoided by translations when the absolute error of the translation is less than that of the aggregation. QT{125m,250m,PR2014,17}. Values shown in the figure: 4, 111, 552, 621; 322; 1,288; 17, 110, 546, 615; 4.]
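The choice between translating and aggregating for one block of four siblings can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several points are assumptions: all function names are hypothetical; the aggregation error is taken as the sum of absolute differences between each cell and the block mean; the translation error is taken as the sum of absolute changes per cell; donors give population in proportion to their size with largest-remainder rounding; and all four siblings are assumed to be populated. The input values 4, 111, 552, 621 are the ones visible in the border effect figure, and reading them as sibling cell populations is also an assumption.

```python
from typing import List, Optional, Tuple


def aggregation_error(siblings: List[int]) -> float:
    """Sum of |cell - block mean|: the error if the block total is spread evenly (assumed metric)."""
    mean = sum(siblings) / len(siblings)
    return sum(abs(mean - p) for p in siblings)


def translate(siblings: List[int], threshold: int) -> Optional[List[int]]:
    """Raise every cell below the threshold to the threshold, taking the
    population from the other siblings in proportion to their size
    (assumed rule). Returns None when this simple rule is not feasible."""
    need = sum(threshold - p for p in siblings if p < threshold)
    if need == 0:
        return list(siblings)
    donors = [i for i, p in enumerate(siblings) if p >= threshold]
    surplus = sum(siblings[i] - threshold for i in donors)
    if need > surplus:
        return None
    donor_total = sum(siblings[i] for i in donors)
    shares = [need * siblings[i] / donor_total for i in donors]
    give = [int(s) for s in shares]
    # Largest-remainder rounding so that the donated total equals `need`.
    leftover = need - sum(give)
    order = sorted(range(len(donors)), key=lambda k: shares[k] - give[k], reverse=True)
    for k in order[:leftover]:
        give[k] += 1
    result = [max(p, threshold) for p in siblings]
    for k, i in enumerate(donors):
        result[i] -= give[k]
    # A donor must not fall below the threshold itself.
    return result if min(result) >= threshold else None


def resolve_block(siblings: List[int], threshold: int) -> Tuple[str, List[int]]:
    """Decide whether a sibling block is kept, translated or aggregated."""
    if all(p >= threshold for p in siblings):
        return ("keep", list(siblings))
    moved = translate(siblings, threshold)
    if moved is not None:
        translation_error = sum(abs(a - b) for a, b in zip(moved, siblings))
        if translation_error < aggregation_error(siblings):
            return ("translate", moved)
    return ("aggregate", [sum(siblings)])


if __name__ == "__main__":
    print(resolve_block([4, 111, 552, 621], threshold=17))
    print(resolve_block([4, 5, 3, 6], threshold=17))
```

Under these assumptions the first block is translated rather than merged, because moving 13 inhabitants changes the cells far less than replacing all four by a single 250 m cell. The second block has no sibling able to donate, so it is aggregated. The sketch handles a single block; applying it level by level between the maximum and minimum resolution would give the recursive behaviour described above.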

Estimations of errors. Monte Carlo experiment

Relative error by quadtree (≈ 50,000 random polygons):

Quadtree                     Quartile 1   Median   Quartile 3   Mean
QT{125m,250m,PR2014,17,t}    0.02         0.05     0.19         0.28
QT{125m,250m,PR2014,17}      only three values survive in the source: 0.07, 0.22, 0.33
QT{125m,125m,PR2014,17}      0.01         0.04     0.14         0.23

For each polygon S_i, the relative error is

\varepsilon_i = \frac{|n'_i - n_i|}{n_i}    [1]

where

n_i = population within the geometry S_i
n'_i = \sum_r n_r \cdot \frac{\mathrm{AREA}(Q_r \cap S_i)}{\mathrm{AREA}(Q_r)}

In order to estimate the relative error in population calculation for a given quadtree, compared with the disaggregated layer of points, we have designed Monte Carlo experiments. The results show that the median relative error for the quadtree at 125 m, 250 m and a threshold of 17, with translations (coloured green on the slide), is about 5%. The quadtree with translations produces lower relative errors than the quadtree without translations.
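The error measure in [1] can be illustrated with a short sketch. It is not the authors' code and makes several assumptions: Q_r is read as a quadtree cell and n_r as its published population; cells and query geometries are axis-aligned rectangles rather than general polygons; the random rectangles are drawn uniformly inside a given extent; and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


def area(r: Rect) -> float:
    return max(r[2] - r[0], 0.0) * max(r[3] - r[1], 0.0)


def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def estimated_population(cells: List[Tuple[Rect, int]], s: Rect) -> float:
    """n'_i: each cell contributes its population weighted by the share of its area inside s."""
    return sum(n * intersection_area(q, s) / area(q) for q, n in cells)


def true_population(points: List[Point], s: Rect) -> int:
    """n_i: number of georeferenced points (inhabitants) that fall inside s."""
    return sum(1 for x, y in points if s[0] <= x < s[2] and s[1] <= y < s[3])


def relative_error(n_est: float, n_true: int) -> float:
    return abs(n_est - n_true) / n_true


def monte_carlo(cells: List[Tuple[Rect, int]], points: List[Point],
                extent: Rect, trials: int, seed: int = 0) -> List[float]:
    """Relative errors over random rectangles (a stand-in for random polygons)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        x0, x1 = sorted(rng.uniform(extent[0], extent[2]) for _ in range(2))
        y0, y1 = sorted(rng.uniform(extent[1], extent[3]) for _ in range(2))
        s = (x0, y0, x1, y1)
        n = true_population(points, s)
        if n == 0:
            continue  # the relative error is undefined for empty geometries
        errors.append(relative_error(estimated_population(cells, s), n))
    return errors


if __name__ == "__main__":
    # One 250 m cell holding four inhabitants, queried with its left half.
    cells = [((0.0, 0.0, 250.0, 250.0), 4)]
    points = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (200.0, 200.0)]
    left_half = (0.0, 0.0, 125.0, 250.0)
    n_est = estimated_population(cells, left_half)   # area-weighted estimate
    n_true = true_population(points, left_half)      # count from the point layer
    print(n_est, n_true, relative_error(n_est, n_true))
```

The list returned by `monte_carlo` can be summarised with `statistics.quantiles(errors, n=4)` and `statistics.mean(errors)` to obtain the quartiles and mean of the error distribution, given at least two non-empty geometries.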

6. Conclusions

The use of quadtrees for the dissemination of georeferenced data is a good method for preserving statistical confidentiality, as it achieves a balance between security and accuracy.

This preservation method may lead to undesirable aggregations in areas that correspond to siblings in the hierarchy, because of high population variance between them (border effect).

A solution to the border effect consists of translating microdata, on condition that the absolute error of the aggregation is greater than that of the translation.

Monte Carlo techniques allow the estimation of the relative error distribution for the population calculated within the quadtree structure QT{125m,250m,PR2014,17,t}. We have obtained a value of 5.3% for the median of these errors.

Finally, in this last slide we show the main conclusions of this work. Firstly, quadtrees can be used to preserve statistical confidentiality in spatial information. Secondly, in order to avoid the border effect, the proposed solution is to translate microdata, as this improves accuracy. Finally, we have estimated the relative error in population calculations for different quadtrees using Monte Carlo methods.