Repeated anonymised samples of administrative records: an application to social security data in Brazil Rigan A. C. Gonzalez (DATAPREV-Brazil) Pedro L. N. Silva (University of Southampton-UK)

2 Outline
Introduction and motivation
Sample design and selection
Some results from the selected anonymised samples
Conclusions and discussion

3 Social Security Databases
The Brazilian Social Security Administration (SSA) maintains huge databases of contributors and beneficiaries enrolled in the social security system
The records held provide a rich source of information about participation in the formal labour market and the distribution of social security benefits
In particular, they provide a longitudinal perspective that is unavailable from other sources
– There are no major longitudinal surveys covering the working age population in Brazil

4 SSA databases – main issues
Confidentiality and security requirements mean that the databases are inaccessible for research purposes
Currently used only to produce aggregate-level summaries, published on a regular basis
– Pre-defined cross-classified tables, at a high level of aggregation
– Broad indicators only
Not available for user-specific analysis
One idea: anonymised samples of records

5 Anonymised Samples of Records
Enable dissemination of anonymised individual-level microdata while protecting the confidentiality of individual records
Popularised by applications in population censuses
More recently, also applied to administrative records
– Drazga (2008) describes the US experience
– Examples exist in other countries, such as the UK

6 Anonymised Samples of Jobs Database
Goal: to design samples of SSA database records to be extracted and made available for analysis on a regular basis
Proposed sample design: stratified simple random sampling at each time point
Rotation strategy: use Permanent Random Numbers (PRNs; e.g. Ohlsson 1995) to control sample overlap across time
– Enables longitudinal analysis
– Enables each sample to represent the updated survey population
– Simple but effective rotation control

7 Sample Design & Selection
Target population = all jobs held by workers affiliated to the General Social Security Regime (GSSR) in the reference period
Reference period = July 2001 to June 2002
Key domains of analysis defined as the cross-classification of states (27 levels) x SIC of employer (four ‘sectors’)
– 1 = Manufacturing
– 2 = Trade and distribution services
– 3 = Other services
– 4 = Agriculture, construction and other productive activities
Main targets of inference: job status distribution
– 1 = Active
– 2 = New admission
– 3 = Terminated in current month
– 4 = Terminated in previous periods
– 5 = Not reported

8 Stratification & Sample Size
57 explicit strata
– 40 strata = 10 states by 4 SIC groups
– 17 states with no further stratification (state-only strata)
Sample size in each stratum chosen to estimate proportions of at least 1.5% with a CV no larger than approximately 10%
– n_h = 6,300 records in each of the 40 state-by-SIC strata
– n_h = 12,600 records in each of the 17 state-only strata
Larger size in the 17 state-only strata to enable domain estimation by SIC with some confidence
Total sample size n = 40 x 6,300 + 17 x 12,600 = 466,200 job records (< 1.5% of total)
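The arithmetic on this slide can be checked with a short sketch. It assumes the usual simple random sampling formula for the CV of a proportion, CV = sqrt((1 - p) / (n p)), with no finite population correction; the function names are illustrative and not taken from the presentation.

```python
import math


def cv_of_proportion(p: float, n: int) -> float:
    """CV of a sample proportion under SRS (assumption: no finite population correction)."""
    return math.sqrt((1.0 - p) / (n * p))


def n_for_cv(p: float, cv: float) -> int:
    """Smallest n giving a CV of at most `cv` for proportion `p` under the same assumption."""
    return math.ceil((1.0 - p) / (p * cv ** 2))


# Stratum sizes quoted on the slide
n_state_by_sic = 6_300   # each of the 40 state-by-SIC strata
n_state_only = 12_600    # each of the 17 state-only strata

total_sample = 40 * n_state_by_sic + 17 * n_state_only
cv_at_design_point = cv_of_proportion(0.015, n_state_by_sic)
n_exact_10_percent = n_for_cv(0.015, 0.10)

print(total_sample)                    # total records across all 57 strata
print(round(cv_at_design_point, 4))    # CV for p = 1.5% with n = 6,300
print(n_exact_10_percent)              # n needed for a CV of exactly 10% at p = 1.5%
```

Under these assumptions, n = 6,300 gives a CV of roughly 10.2% at p = 1.5%, so the slide's "10%" target appears to be approximate rather than a strict bound.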

9 Maximum Relative Error for Sample Proportion under SRS with n=6,307 at 95% confidence
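The chart for this slide is not preserved in the transcript. As a rough stand-in, the sketch below computes the maximum relative error of a sample proportion under SRS at 95% confidence, assuming the normal approximation (z = 1.96) and no finite population correction. Both assumptions and the function names are illustrative, not taken from the presentation.

```python
import math

Z_95 = 1.96  # normal quantile for 95% confidence (assumed approximation)


def max_relative_error(p: float, n: int, z: float = Z_95) -> float:
    """Half-width of the confidence interval divided by p, under SRS without fpc."""
    return z * math.sqrt((1.0 - p) / (n * p))


def n_for_relative_error(p: float, rel_err: float, z: float = Z_95) -> int:
    """Smallest n giving a relative error of at most `rel_err` for proportion `p`."""
    return math.ceil(z ** 2 * (1.0 - p) / (p * rel_err ** 2))


n = 6_307
for p in (0.015, 0.05, 0.10, 0.25, 0.50):
    print(f"p = {p:.3f}  max relative error = {max_relative_error(p, n):.3f}")

err_at_design_point = max_relative_error(0.015, n)
n_for_20_percent = n_for_relative_error(0.015, 0.20)
```

Under these assumptions, n = 6,307 is the smallest sample size that keeps the relative error within 20% for a proportion of 1.5%, which is consistent with a CV of about 10%.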

10 Rotation Scheme
Designed to rotate out approximately 1/12 of the sample at each new selection period
We used monthly samples, but this can easily be changed to other periods, such as quarters, semesters or years
Time in sample for each record is approximately 12 months (or periods)
Time in sample is not fixed, because PRN sampling makes the rotation control stochastic
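A back-of-the-envelope sketch of what this rotation rate implies for sample overlap is given below. It assumes a stable population and treats the 1/12 rotation as exact, whereas the actual scheme is stochastic; the function name is hypothetical.

```python
def expected_overlap(n_total: int, k: int, periods_in_sample: int = 12) -> int:
    """Expected number of records common to two samples taken k periods apart.

    Assumes exactly 1/periods_in_sample of the sample rotates out each period
    and ignores births and deaths in the frame.
    """
    retained_fraction = max(0.0, 1.0 - k / periods_in_sample)
    return round(n_total * retained_fraction)


n_total = 466_200  # total monthly sample size from the design
for k in (1, 3, 6, 12):
    print(k, expected_overlap(n_total, k))
```

For samples six months apart this gives 233,100 matched records, half of the monthly sample.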

11 Sample Sizes for Alternative Analysis

12 Selected estimates of total and proportions of jobs by status – April 2002

13 Selected estimates of counts and proportions of new jobs by activity sector – April 2002

14 Scatter plot of estimated proportions of new admissions and their CVs – April 2002

15 Proportions of jobs terminated in month t+k, for jobs existing (Active) or started (New admissions) in January 2002 (k=0)

16 Conclusions and discussion (1)
The Brazilian SSA could improve its approach to releasing statistical information about the formal labour market by providing access to anonymised samples of jobs
This would help meet the analytical needs of many specialised users, while still protecting the confidentiality of individual records
This would substantially enhance the capacity to study and evaluate the impact of public policies regarding the Social Security system in Brazil

17 Conclusions and discussion (2)
The proposed sample design worked well in our application
All the sample selection, estimation and analysis activities were carried out using a standard desktop microcomputer
Once the samples are made available, analysts should have no difficulty in exploring the data for their own estimation and analysis activities
The various analyses carried out with the selected samples illustrate the potential of such samples for analytical use

18 Conclusions and discussion (3)
For cross-sectional estimates in any given month, the sample of approximately 466,200 records delivers precise estimates for some fine domains of interest
For longitudinal analyses with samples six months apart, the sample would still have approximately 233,100 matched records available

19 Future Work
Improved weighting methods for longitudinal analyses (e.g. following Lavallée 1995)
Detailed analysis of the disclosure risks associated with the proposed sampling strategy
Assess the impact of late reporting of new jobs (births) and terminated jobs (deaths), and introduce control measures to reduce the resulting bias

20 Thanks for your attention.

21 References
DRAZGA, L. (2008). Uses of Administrative Data at the U.S. Social Security Administration.
GONZALEZ, R. A. C. (2005). Amostragem longitudinal em registros administrativos: uma aplicação à previdência social. MSc dissertation. Rio de Janeiro: Escola Nacional de Ciências Estatísticas.
LAVALLÉE, P. (1995). Cross-sectional weighting of longitudinal surveys of individuals and households using the weight share method. Survey Methodology, v. 21, no. 1.
OHLSSON, E. (1995). Coordination of Samples using Permanent Random Numbers. In: Cox, Binder, Chinnappa, Christianson, Colledge & Kott (eds.), Business Survey Methods. New York: Wiley.

22 Synchronised sampling algorithm
Apply the steps below within each selection stratum h
Step 1 – Sort the records in the updated sampling frame in ascending order of their permanent random numbers (X_hi)
Step 2 – Calculate the rank P_hi of each record i in stratum h according to its permanent random number
– The smallest position in the stratum shall be 1 and the largest shall equal N_th, the total number of records in stratum h at time t

23 Synchronised sampling algorithm
Step 3 – Determine the start and end points for sample inclusion in stratum h at time t using expressions (1) and (2), where:
– t = 1 for July 2001
– n_th is the sample size in stratum h at time t
– T is the maximum number of rounds for which a record is expected to be included in the sample
– mod{a ; b} is the remainder of the division of a by b

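24 Synchronised sampling algorithm
Step 4 – If the start point does not exceed the end point, include in the sample for time t the records with positions P_hi between the start and end points
Otherwise (the selection interval wraps around the end of the ordered frame), include in the sample for time t the records with positions at or above the start point, or at or below the end point
Repeat for each new survey round as needed (increase t)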
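The four steps can be sketched in Python for a single stratum. The transcript does not preserve expressions (1) and (2), so the start and end formulas below are assumptions chosen to reproduce the behaviour described: the selection window advances by about n/T ranks per round and wraps around the ordered frame. The function and variable names are illustrative only.

```python
import random


def synchronised_sample(prns: dict, n: int, t: int, T: int = 12) -> list:
    """Select n records from one stratum at round t using permanent random numbers.

    `prns` maps record id -> permanent random number in [0, 1).
    """
    ordered = sorted(prns, key=prns.get)      # Step 1: sort by PRN
    N = len(ordered)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the stratum size")

    # Step 3 (ASSUMED formulas, not the slide's expressions (1) and (2)):
    shift = round(n / T)                      # ranks rotated out per round
    start = ((t - 1) * shift) % N + 1
    end = (start + n - 2) % N + 1

    selected = []
    for rank, record in enumerate(ordered, start=1):   # Step 2: rank P_hi
        if start <= end:                      # Step 4: ordinary window
            keep = start <= rank <= end
        else:                                 # Step 4: window wraps around
            keep = rank >= start or rank <= end
        if keep:
            selected.append(record)
    return selected


# Toy stratum: 1,200 jobs, monthly samples of 120 records
rng = random.Random(2002)
frame = {f"job{i}": rng.random() for i in range(1200)}
s1 = synchronised_sample(frame, n=120, t=1)
s2 = synchronised_sample(frame, n=120, t=2)
s7 = synchronised_sample(frame, n=120, t=7)
s120 = synchronised_sample(frame, n=120, t=120)   # window wraps around here
print(len(set(s1) & set(s2)), len(set(s1) & set(s7)))
```

Because the PRNs are permanent, new records entering the frame simply take their place in the ordering, so each round's sample reflects the updated population while overlapping in a controlled way with earlier rounds. In this toy example the frame is static, so the overlap is exact; with births and deaths it varies, which is why time in sample is not fixed.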