An R vs SAS Experiment
Megan Pope and Gareth Clews
Office for National Statistics

R at ONS
- Open source software in ONS
- Supporting the government IT strategy
- Development of training for GSS
- R Development Group
  i. Support use of R within ONS
  ii. Increase user base
  iii. Aim for incorporation in production systems
- Teaching R to a SAS audience
- Increasing usage

SAS at ONS
- Designated standard software
- Statistics Canada Generalised Estimation System (GES)
- Suite of SAS macros
- Calibration weights, domain estimates, variance estimates

ReGenesees
- Free R package
- R evolved Generalised software for sampling estimates and errors in surveys
- Developed by the Italian Statistics Office (Istat)

R vs SAS
- Comparative study of complex survey estimation software
- Quality Improvement Fund (QIF)
- SAS (GES) v R (ReGenesees)
- Investigating open source in line with GSS strategy

Calibration
- Used if there is a relationship between auxiliary data and the response variable
- An estimation procedure which constrains sample-based estimates of auxiliary variables to known totals (or accurate estimates)
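
The constraint described on this slide can be sketched in symbols. The notation is an assumption and does not come from the slides: $s$ is the sample, $d_k$ the design weight, $w_k$ the calibrated weight, $\mathbf{x}_k$ the vector of auxiliary variables with known population total $\mathbf{t}_x$, and $q_k$ an optional scaling constant. The chi-square distance used here is only one common choice of distance function. The slides do not say which one the study used.

```latex
% Calibrated weights stay close to the design weights while
% reproducing the known auxiliary totals exactly.
\[
\min_{w}\ \sum_{k \in s} \frac{(w_k - d_k)^2}{2\, d_k q_k}
\quad \text{subject to} \quad
\sum_{k \in s} w_k \mathbf{x}_k = \mathbf{t}_x
\]

% Closed-form solution under the chi-square distance.
\[
w_k = d_k \left( 1 + q_k\, \mathbf{x}_k^{\top} \boldsymbol{\lambda} \right),
\qquad
\boldsymbol{\lambda} =
\Big( \sum_{k \in s} d_k q_k\, \mathbf{x}_k \mathbf{x}_k^{\top} \Big)^{-1}
\Big( \mathbf{t}_x - \sum_{k \in s} d_k \mathbf{x}_k \Big)
\]

% Calibration estimator of the total of the response variable y.
\[
\hat{Y}_{\mathrm{cal}} = \sum_{k \in s} w_k y_k
\]
```

Substituting the expression for $w_k$ into the constraint returns $\mathbf{t}_x$, which is the sense in which sample-based estimates of the auxiliary variables are constrained to the known totals.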

Surveys chosen and why...
Business surveys
- QSI – Cut-off sample
- BRES – Separate calibration totals
  Set thresholds for Winsorisation
- ABS – Biggest survey with 4,000 strata
  Externally calibrated weights

Surveys chosen and why...
Social surveys
- LFS – biggest survey, resource intensive
- LOS – longitudinal
- IPS – 2-stage calibration

Quarterly Stock Inquiry
- Cut-off sampling
- Combined ratio estimation
- Calibration to one auxiliary
- Estimates and variance estimates
- GES – Seven separate input files
- ReGenesees – Six simple commands
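
As a rough sketch of what "combined ratio estimation" with one auxiliary usually means, the textbook form is given below. The notation is an assumption and not from the slides: $h$ indexes strata, $s_h$ is the sample in stratum $h$, $d_k$ is the design weight, $x_k$ is the auxiliary variable with known population total $X$, and $y_k$ is the response. The slides do not confirm that the QSI estimator has exactly this form.

```latex
% Design-weighted estimates of the totals of y and x, pooled over strata.
\[
\hat{Y} = \sum_{h} \sum_{k \in s_h} d_k y_k,
\qquad
\hat{X} = \sum_{h} \sum_{k \in s_h} d_k x_k
\]

% Combined ratio estimator: a single ratio applied to the known total X.
\[
\hat{Y}_{R} = X\, \frac{\hat{Y}}{\hat{X}}
\]

% Equivalent weight form: every design weight is scaled by the same factor,
% so the weighted sample total of x reproduces X exactly.
\[
w_k = d_k\, \frac{X}{\hat{X}},
\qquad
\sum_{h} \sum_{k \in s_h} w_k x_k = X,
\qquad
\hat{Y}_{R} = \sum_{h} \sum_{k \in s_h} w_k y_k
\]
```

The weight form shows why the slide lists this next to "calibration to one auxiliary": the ratio estimator is a calibration estimator whose single constraint is the known total of $x$.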

Quarterly Stock Inquiry - GES

Quarterly Stock Inquiry - ReGenesees
design <- e.svydesign(data = , ids = , strata = , weights = , fpc = )
template <- pop.template(data = , calmodel = , partition = )
pop <- fill.template(universe = , template = )
population.check(df.population = , data = , calmodel = , partition = )
cal <- e.calibrate(design = , df.population = , sigma2 = )
est <- svystatTM(design = , y = , by = )

What we found
- Software comparison
- Time
- Missing values
- Programming

Conclusions/Recommendations
- ReGenesees successfully used in place of GES
- ReGenesees easier – less risk!
- GES more capable for some aspects and vice versa
- Recommend to explore further!

Questions