
© Federal Statistical Office, Research Data Centre, Maurice Brandt Folie 1 Analytical validity and confidentiality protection of anonymised longitudinal.


1 © Federal Statistical Office, Research Data Centre, Maurice Brandt, Slide 1
Analytical validity and confidentiality protection of anonymised longitudinal enterprise microdata – Survey of a German Project
Maurice Brandt (1), Michael Konold (2), Rainer Lenz (3) and Martin Rosemann (4)
Research Data Centres of the Federal Statistical Office (1) and the Statistical Offices of the Länder (2), University of Applied Sciences Mainz (3), Institute for Applied Economic Research (4)
Work session on statistical data confidentiality, Manchester, 17-19 December 2007

2 Overview
1. Introduction
2. The data sets of the project
3. Anonymisation methods and analytical validity
4. Approaches to assessing anonymity
5. Conclusions

3 1. Introduction
"Business Panel Data and de facto anonymisation": a new project running since the beginning of 2006
- improve the data infrastructure in Germany for longitudinal data on local units and enterprises
- guarantee the scientific community access to the panel data of economic statistics
- the former project "De facto anonymisation of business microdata" showed that de facto anonymisation can be achieved on a cross-sectional basis

4 1. Introduction
- in this project, different business statistics are linked to form longitudinal data sets
- the data are to be complemented with information from the official business register
- the data sets can already be used for scientific work
- the final aim is to produce a scientific use file

5 2.1 The data sets of the project
Units of analysis are the local units in manufacturing and mining
Complete enumeration of local units with 20 or more employees
Monthly reports
- years 1995 to 2005
- information on employees, wages, salaries, turnover
Survey of investments
- years 1995 to 2005
- information on widely differing types of investment
Survey of small units
- years 1995 to 2002
- local units with 19 or fewer employees

6 2.2 The data sets of the project
Cost Structure Survey
Stratified sample of enterprises with 20 or more employees in the manufacturing and mining sector
- years 1995 to 2005
- more than 43,000 enterprises altogether
- information on output, production factors, employees
- 13,300 enterprises are available for the whole period from 1999 to 2002
- studies on investments in research and development are possible

7 2.3 The data sets of the project
Turnover Tax Statistics
- very large data set with a total of 4.3 million enterprises, years 2000 to 2004 (1.8 million present in the whole period)
- information on all taxable turnovers, turnover tax, prior tax and tax liability
IAB Panel of local units
- information on employment trend, staff structure, hours worked, turnover, export share, investments and innovation
- various waves since 1993, each covering about 4,300 up to a maximum of 16,000 local units

8 3. Anonymisation methods and analytical validity
Anonymisation methods
- methods that reduce information (suppressing variables or presenting key variables in broader categories)
- methods that modify the values of numerical data (data-perturbing methods)
Data-perturbing methods for panel data
- Microaggregation: (a) separately for all variables and all periods (individual ranking), (b) separately for all variables but jointly for all periods, (c) separately for all periods but jointly for all variables, and (d) jointly for all periods and all variables
- Multiplicative stochastic noise: mixture distribution (approach of Höhne)
- Multiple imputation
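To make two of the perturbation methods above concrete, here is a minimal sketch in Python. It is not the project's implementation. The group size `k`, the parameters `delta` and `sigma`, and both function names are assumptions chosen for illustration. The noise function assumes a two-component normal mixture centred at 1 - delta and 1 + delta, which is one common reading of a "mixture distribution" for multiplicative noise.

```python
import random
from statistics import mean


def microaggregate_individual_ranking(values, k=3):
    """Microaggregate one variable in one period (variant (a)).

    Values are sorted, split into groups of at least k rank-neighbours,
    and each value is replaced by the mean of its group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    n = len(order)
    start = 0
    while start < n:
        end = start + k
        # Merge a short tail into the last group so no group has fewer than k members.
        if n - end < k:
            end = n
        group = order[start:end]
        group_mean = mean(values[i] for i in group)
        for i in group:
            result[i] = group_mean
        start = end
    return result


def mixture_noise_factors(n, delta=0.1, sigma=0.02, rng=None):
    """Draw n multiplicative noise factors from an assumed two-component
    normal mixture: N(1 + delta, sigma^2) or N(1 - delta, sigma^2),
    each chosen with probability 0.5."""
    rng = rng or random.Random()
    factors = []
    for _ in range(n):
        centre = 1 + delta if rng.random() < 0.5 else 1 - delta
        factors.append(rng.gauss(centre, sigma))
    return factors


# Hypothetical turnover values for seven local units in one period.
turnover = [120.0, 95.0, 400.0, 110.0, 380.0, 90.0, 410.0]
aggregated = microaggregate_individual_ranking(turnover, k=3)
noisy = [v * f for v, f in zip(turnover, mixture_noise_factors(len(turnover)))]
```

Microaggregation leaves the total of the variable unchanged, because each group mean is assigned to every member of its group. The noise factors never sit near 1, so every value is shifted by roughly delta in one direction or the other.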

9 3. Anonymisation methods and analytical validity
In focus: impacts of data-perturbing methods on
- descriptive distribution measures
- the estimation of econometric panel models, particularly the within estimator, which controls for unobservable individual heterogeneity
First results
- the within estimator is consistent under anonymisation by individual ranking
- the project team derived consistent within estimators for anonymisation by multiplicative stochastic noise (including the method of Höhne) when there is no autocorrelation
- case of autocorrelation: work in progress
- multiple imputation: covered in a separate talk at this conference
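For readers unfamiliar with the within estimator, the following is a minimal sketch for a single regressor. It is an illustration of the textbook estimator, not the corrected estimators derived by the project team. The function name and the toy data are assumptions. Each observation is centred on its unit's own mean over time, which removes any time-constant unit effect before the slope is estimated.

```python
from collections import defaultdict


def within_estimator(units, x, y):
    """Within (fixed-effects) slope of y on x for one regressor.

    units, x and y are equally long sequences; units[i] identifies the
    local unit or enterprise that observation i belongs to.
    """
    # Accumulate per-unit sums of x and y and the observation count.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for u, xi, yi in zip(units, x, y):
        s = sums[u]
        s[0] += xi
        s[1] += yi
        s[2] += 1

    # Regress unit-demeaned y on unit-demeaned x.
    sxy = 0.0
    sxx = 0.0
    for u, xi, yi in zip(units, x, y):
        sum_x, sum_y, n = sums[u]
        dx = xi - sum_x / n
        dy = yi - sum_y / n
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx


# Toy panel: y = alpha_unit + 2 * x, with very different unit effects.
units = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 14.0, 16.0, -42.0, -40.0, -38.0]
beta = within_estimator(units, x, y)
```

In this toy panel the unit effects (10 for A, -50 for B) are correlated with x, so a pooled regression would be misleading, while the within slope recovers the common coefficient of 2.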

10 4. Approaches to assessing anonymity
External data: {a_1, ..., a_n}
Target data: {b_1, ..., b_n}
We calculate coefficients [formula not preserved in the transcript] and obtain the linear program (AP): Minimize [objective not preserved] s.t. [two constraints not preserved]
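The formulas of this slide are not reproduced in the transcript. As a hedged illustration only, the sketch below assumes that (AP) is the standard linear assignment problem: given a coefficient c[i][j] for each pair of external record a_i and target record b_j, find a one-to-one matching that minimises the total of the chosen coefficients. The function name and the example matrix are hypothetical, and the exhaustive search is only practical for very small n.

```python
from itertools import permutations


def solve_assignment(cost):
    """Solve a small assignment problem by exhaustive search.

    cost[i][j] is the coefficient for matching external record i with
    target record j. Returns (matching, total), where matching[i] is the
    target index assigned to external record i.
    """
    n = len(cost)
    best_perm = None
    best_total = float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm = perm
            best_total = total
    return list(best_perm), best_total


# Hypothetical 3 x 3 coefficient matrix.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
matching, total = solve_assignment(cost)
# matching == [1, 0, 2], total == 5
```

In a matching experiment of this kind, the share of external records assigned to their true counterpart in the anonymised target data can serve as an indicator of re-identification risk. For realistic file sizes a polynomial-time method such as the Hungarian algorithm would replace the exhaustive search.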

11 4. Approaches to assessing anonymity
Four approaches are used to estimate the coefficients of the linear program (AP):
- conventional distance-based approach
- correlation-based approach
- distribution-based approach
- collinearity-based approach
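The slide does not define the four approaches. As an illustration of the first one only, the sketch below assumes that a conventional distance-based coefficient is the squared Euclidean distance between an external record and a target record, with each variable scaled by its standard deviation in the pooled data. The function name, the scaling choice and the example records are assumptions.

```python
from statistics import pstdev


def distance_coefficients(external, target):
    """Build a coefficient matrix from standardised squared distances.

    external and target are lists of records; each record is a list of
    numeric values for the same variables in the same order.
    """
    n_vars = len(external[0])
    pooled = external + target
    scales = []
    for v in range(n_vars):
        s = pstdev([row[v] for row in pooled])
        # Guard against constant variables to avoid division by zero.
        scales.append(s if s > 0 else 1.0)
    return [
        [
            sum(((a[v] - b[v]) / scales[v]) ** 2 for v in range(n_vars))
            for b in target
        ]
        for a in external
    ]


# Hypothetical records (turnover, employees); the target file is a
# perturbed and reordered copy of the external file.
external = [[100.0, 10.0], [200.0, 20.0]]
target = [[198.0, 21.0], [101.0, 9.0]]
coefficients = distance_coefficients(external, target)
```

Here the smallest coefficient in each row points at the perturbed version of the same unit, which is the situation in which a distance-based matching would succeed.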

12 5. Conclusions
Within the scope of the project, the panel data sets can be used via
- remote data processing
- safe scientific workstations in the statistical offices
They are already being used in some research projects
The first scientific use files for use on one's own workstation are expected to be available at the beginning of 2009

13 Thank you for your attention

