Statistical disclosure control on visualising geocoded population data using a structure in quadtrees Eduard Suñé, Cristina Rovira, Daniel Ibáñez, Mireia.


1 Statistical disclosure control on visualising geocoded population data using a structure in quadtrees Eduard Suñé, Cristina Rovira, Daniel Ibáñez, Mireia Farré NTTS A-005

2 Disclosure control by spatial aggregation using quadtrees
A quadtree is defined by {maximum resolution, minimum resolution, georeferenced data, threshold}. Decision: QT{125m, 250m, PR2014, 17}.

[Figure: cells of the European Standard Grid at 125 m, 250 m, 500 m and 1 km, with a resolution axis (+/-) set against errors in population calculation and risk of disclosure.]

Border effect: avoided by translations when the absolute error of the translation is less than that of the aggregation.

[Figure: border effect example for QT{125m, 250m, PR2014, 17}. Cell values shown: 4, 111, 552 and 621 (sum 1,288; mean 322), and 17, 110, 546 and 615.]

Of the two methods of preserving statistical confidentiality, perturbation of coordinates and spatial aggregation, we have employed the latter. On the basis of the European standard grid, the aggregation process uses the well-known quadtree data structure.

In this slide we can see an example of the aggregation mechanism: it starts from a given level of resolution, and every cell with a population below the threshold is merged spatially with its three siblings in the hierarchy into their parent cell, and so on, recursively. In any case, the maximum and minimum resolutions that bound this aggregation process have to be decided. For the Population Register 2014 case we chose the following quadtree parameters: a threshold of 17 inhabitants, a maximum resolution of 125 m and a minimum resolution of 250 m. We believe these parameters give a suitable compromise between the risk of disclosure and errors in population calculation.

However, the quadtree building algorithm may cause undesirable aggregations when there is a high degree of population variance between siblings in the hierarchy. The solution adopted consists of translating (moving) population between these siblings until the threshold is reached. This prevents the aggregation, and it is applied only when the absolute error of the translation is smaller than that of the aggregation, both measured against the initial situation.
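The aggregation mechanism described on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_quadtree`, the `(col, row)` cell indexing and the treatment of empty cells (ignored, as they hold no one to disclose) are hypothetical. The sketch is written top-down (a cell is split only if none of its populated children falls below the threshold), which for the two-level case QT{125m, 250m, PR2014, 17} gives the same result as merging siblings bottom-up.

```python
from collections import defaultdict


def build_quadtree(counts, max_res=125, min_res=250, threshold=17):
    """Return quadtree leaves as {(resolution, col, row): population}.

    counts maps (col, row) at the finest resolution (max_res, in metres)
    to the population of that cell. min_res must be max_res * 2**k.
    """
    # Build the pyramid of population totals, from finest to coarsest level.
    levels = {max_res: dict(counts)}
    res = max_res
    while res < min_res:
        coarser = defaultdict(int)
        for (col, row), pop in levels[res].items():
            coarser[(col // 2, row // 2)] += pop
        res *= 2
        levels[res] = dict(coarser)
    if res != min_res:
        raise ValueError("min_res must be max_res times a power of two")

    leaves = {}

    def emit(res, col, row):
        pop = levels[res].get((col, row), 0)
        if pop == 0:
            return  # assumption: empty cells are simply not published
        if res > max_res:
            half = res // 2
            kids = [(2 * col + dx, 2 * row + dy) for dx in (0, 1) for dy in (0, 1)]
            kid_pops = [levels[half].get(k, 0) for k in kids]
            # Split only if no populated child falls below the threshold.
            if all(p == 0 or p >= threshold for p in kid_pops):
                for k in kids:
                    emit(half, *k)
                return
        # Otherwise the four siblings stay aggregated in this cell.
        leaves[(res, col, row)] = pop

    for col, row in levels[min_res]:
        emit(min_res, col, row)
    return leaves


# Two 250 m cells: the first contains a 125 m cell with only 4 inhabitants.
counts = {(0, 0): 4, (1, 0): 111, (0, 1): 552, (1, 1): 621,
          (2, 0): 20, (3, 0): 30, (2, 1): 40, (3, 1): 50}
leaves = build_quadtree(counts, max_res=125, min_res=250, threshold=17)
```

In this example the first group of siblings is published as a single 250 m cell, while the second group keeps its four 125 m cells. A cell that is still below the threshold at the minimum resolution is emitted as it is here; the slide does not say how such cells are handled.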

3 Estimations of errors. Monte Carlo experiment
Relative error                   Quartile 1   Median   Quartile 3   Mean
QT{125m, 250m, PR2014, 17, t}    0.02         0.05     0.19         0.28
QT{125m, 250m, PR2014, 17}       0.07, 0.22, 0.33 (only three values survive in the transcript)
QT{125m, 125m, PR2014, 17}       0.01         0.04     0.14         0.23

≈ 50,000 random polygons

[Figure: relative error distributions for QT{125m,250m,PR2014,17,t}, QT{125m,250m,PR2014,17} and QT{125m,125m,PR2014,17}; axis: relative error.]

For each polygon $S_i$, the relative error is

$$\varepsilon_i = \frac{|n'_i - n_i|}{n_i} \qquad [1]$$

where $n_i$ is the population within the geometry $S_i$ and

$$n'_i = \sum_r n_r \cdot \frac{\mathrm{AREA}(Q_r \cap S_i)}{\mathrm{AREA}(Q_r)}$$

is the population calculated from the quadtree, with $Q_r$ the quadtree cells and $n_r$ their populations.

In order to estimate the relative error in population calculation for each quadtree, in comparison with the disaggregated layer of points, we have designed Monte Carlo experiments. The result is that the median relative error for the quadtree at 125 m and 250 m with a threshold of 17 and translations, coloured green on the slide, is about 5%. The quadtree with translations produces lower relative errors than the same quadtree without translations.
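Equation [1] and the Monte Carlo experiment can be illustrated with a short Python sketch. It rests on several assumptions that are not in the presentation: the random polygons are simplified to axis-aligned rectangles so that intersection areas can be computed without a GIS library, and all function names (`estimated_population`, `monte_carlo`, `summarise`, and so on) are hypothetical. How the authors generated their roughly 50,000 polygons is not stated.

```python
import random
import statistics


def overlap_area(a, b):
    """Intersection area of two rectangles given as (xmin, ymin, xmax, ymax)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def estimated_population(cells, s):
    """n'_i: sum over cells of n_r * AREA(Q_r ∩ S_i) / AREA(Q_r)."""
    total = 0.0
    for rect, pop in cells:
        cell_area = (rect[2] - rect[0]) * (rect[3] - rect[1])
        total += pop * overlap_area(rect, s) / cell_area
    return total


def true_population(points, s):
    """n_i: number of points of the disaggregated layer inside S_i."""
    return sum(1 for x, y in points if s[0] <= x < s[2] and s[1] <= y < s[3])


def relative_error(points, cells, s):
    """Equation [1]; returns None when n_i = 0 (error undefined)."""
    n = true_population(points, s)
    if n == 0:
        return None
    return abs(estimated_population(cells, s) - n) / n


def monte_carlo(points, cells, extent, trials, seed=0):
    """Relative errors for random rectangles drawn inside extent."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        x0, x1 = sorted(rng.uniform(extent[0], extent[2]) for _ in range(2))
        y0, y1 = sorted(rng.uniform(extent[1], extent[3]) for _ in range(2))
        err = relative_error(points, cells, (x0, y0, x1, y1))
        if err is not None:
            errors.append(err)
    return errors


def summarise(errors):
    """Quartile 1, median, quartile 3 and mean, as in the table."""
    q1, median, q3 = statistics.quantiles(errors, n=4)
    return q1, median, q3, statistics.fmean(errors)


# One 250 m cell holding 4 people, queried with its lower-left quarter.
cells = [((0, 0, 250, 250), 4)]
points = [(10, 10), (20, 20), (200, 200), (240, 240)]
quarter = (0, 0, 125, 125)
```

For the quarter-cell query, areal weighting assigns one quarter of the cell's 4 inhabitants, while 2 points actually lie inside, so `relative_error(points, cells, quarter)` is 0.5.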

4 6. Conclusions

The use of quadtrees for the dissemination of georeferenced data is a good method for the preservation of statistical confidentiality, as a certain balance between security and accuracy is achieved.

This preservation method may lead to undesirable aggregations in areas which correspond to siblings in the hierarchy, due to high values of population variance (border effect).

A solution to the border effect consists of translating microdata under the condition that the absolute error of the aggregation is greater than that of the translation.

Monte Carlo techniques allow the estimation of the relative error distribution for the population calculated within the quadtree structure QT{125m,250m,PR2014,17,t}. We have obtained a value of 5.3% for the median of these errors.

Finally, in this last slide we show the main conclusions of this work. Firstly, quadtrees can be used to preserve statistical confidentiality in spatial information. Secondly, in order to avoid the border effect, the proposed solution is to make translations of microdata, as this improves accuracy. Finally, we have estimated the relative error in population calculations in different quadtrees using Monte Carlo methods.
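The condition for translating rather than aggregating can be made concrete with a small Python sketch. Much of it is assumption: the presentation does not define how population is taken from the siblings or how the absolute error of the aggregation is measured. Here the deficit is taken from the donor siblings in proportion to their populations (with largest-remainder rounding), and the aggregation error is measured against an equal share of the parent total in each cell. The names `translate`, `translation_error`, `aggregation_error` and `resolve` are hypothetical. With the sibling values 4, 111, 552 and 621 that appear on the border effect slide, these assumptions happen to reproduce the values 17, 110, 546 and 615 also shown there.

```python
def translate(pops, threshold=17):
    """Top up populated cells below the threshold from their siblings.

    Returns the adjusted populations, or None when the donors cannot cover
    the deficit without falling below the threshold themselves.
    """
    deficit = sum(threshold - p for p in pops if 0 < p < threshold)
    if deficit == 0:
        return list(pops)  # nothing to fix
    donors = [i for i, p in enumerate(pops) if p > threshold]
    donor_total = sum(pops[i] for i in donors)
    if donor_total == 0:
        return None
    # Assumption: each donor gives in proportion to its population.
    shares = {i: deficit * pops[i] // donor_total for i in donors}
    leftover = deficit - sum(shares.values())
    by_remainder = sorted(donors, key=lambda i: deficit * pops[i] % donor_total,
                          reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1  # largest-remainder rounding
    result = [threshold if 0 < p < threshold else p for p in pops]
    for i in donors:
        result[i] -= shares[i]
        if result[i] < threshold:
            return None
    return result


def translation_error(before, after):
    """Absolute error of the translation: total change over the siblings."""
    return sum(abs(a - b) for a, b in zip(before, after))


def aggregation_error(pops):
    """Assumption: aggregation spreads the parent total evenly over the cells."""
    share = sum(pops) / len(pops)
    return sum(abs(p - share) for p in pops)


def resolve(pops, threshold=17):
    """Choose translation when it is feasible and has the smaller absolute error."""
    moved = translate(pops, threshold)
    if moved is not None and translation_error(pops, moved) < aggregation_error(pops):
        return "translate", moved
    return "aggregate", [sum(pops)]


siblings = [4, 111, 552, 621]
decision, cells = resolve(siblings, threshold=17)
```

Under these assumptions the translation moves 13 inhabitants into the smallest cell, for an absolute error of 26, against 1,058 for the aggregation, so the four 125 m cells are kept.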

