GENEralised software for Sampling Estimates and Errors in Surveys (GENESEES V. 3.0) Piero Demetrio Falorsi - Salvatore Filiberti Istat Structural Business Statistics
Summary of the presentation
§ Objectives and evolution of the software
§ Software installation pre-requisites
§ Data needed for Genesees
§ Input data sets (characteristics, controls)
§ Output: tables, file formats, data sets
§ Structural Business Statistics (SBS) surveys using Genesees
§ Population of interest - Business Register
§ Domains of interest
§ SME Sampling strategy (current)
§ Variables of interest
§ Case study

Objectives and evolution of the software (1/2)
§ Need to estimate variables of interest for social and economic statistics
§ Guarantee coherence among estimates in time and space
§ Improve the quality of the data produced (for example, in accordance with the SBS Council Regulation)
§ Methodology: calibration estimators (Deville and Särndal, 1992)
§ Implemented by Falorsi P.D. and Falorsi S.

Objectives and evolution of the software (2/2)
§ Genesees prototype for social statistics
§ Genesees prototype for enterprise statistics (1992 as first reference year)
§ Other Istat researchers have since contributed to the development of the software
§ New releases are delivered regularly
§ Genesees is currently used for estimation in almost all Istat surveys

Software installation pre-requisites
§ SAS for Windows
§ SAS Language, Macro, IML, Stat, Graph
§ HD ≥ 4 MB; RAM ≥ 64 MB
How to download Genesees:
§ […]
§ […] then select: “Metodi e Software per le indagini statistiche”
§ download and then unzip the file “Genesees3.zip” into the directory c:\Genesees
§ […] to […] for the starting password
§ […] will inform you about the new releases of the software

Data needed for Genesees
§ Frame (example: Business Register) → to get the known totals of the auxiliary variables as a reference structure
§ Survey respondent units → to compute the correction factor of the initial sampling weight and then to assign the final sampling weight to each unit

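The two inputs above feed the calibration step: the frame supplies the known totals, and each respondent's initial weight is multiplied by a correction factor so that the weighted sample totals reproduce them. The following is a minimal sketch of that idea for the linear distance function of Deville and Särndal (1992), for which the correction factors have a closed form. It assumes uniform distance weights and unbounded correction factors; the function names and the toy data are illustrative only, and this is not Genesees code (Genesees itself is a SAS application).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def linear_calibration(d, x, totals):
    """Return (final weights, correction factors) for the linear distance.

    d      : initial weights, one per respondent
    x      : auxiliary variable values, one list per respondent
    totals : known population totals of the auxiliary variables
    """
    p = len(totals)
    # Direct estimates of the auxiliary totals with the initial weights.
    direct = [sum(dk * xk[i] for dk, xk in zip(d, x)) for i in range(p)]
    # Matrix sum_k d_k x_k x_k'.
    a = [[sum(dk * xk[i] * xk[j] for dk, xk in zip(d, x)) for j in range(p)]
         for i in range(p)]
    lam = solve(a, [t - e for t, e in zip(totals, direct)])
    # Correction factor g_k = 1 + x_k' lambda; final weight w_k = d_k * g_k.
    g = [1.0 + sum(l * v for l, v in zip(lam, xk)) for xk in x]
    return [dk * gk for dk, gk in zip(d, g)], g


# Toy data (hypothetical): auxiliary variables are a unit count and employment.
d = [10.0, 10.0, 20.0, 20.0]
x = [[1.0, 2.0], [1.0, 5.0], [1.0, 3.0], [1.0, 8.0]]
totals = [65.0, 300.0]
w, g = linear_calibration(d, x, totals)
```

With the final weights `w`, the weighted sums of both auxiliary variables equal the known totals. Other distance functions generally have no closed form and are solved iteratively.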
Input data sets (characteristics)
§ Input SAS data sets: “Noti” and “Inp”
§ “Noti” (variable names ≤ 8 characters):
  - Planned population = domain of interest (alphanumeric variable; ≤ 15 characters)
  - Totals of auxiliary variables (numeric variables; at least 1 variable)
§ “Inp” (variable names ≤ 8 characters):
  - Id. code (numeric variable)
  - Planned population (as in “Noti”)
  - Auxiliary variables (numeric variables; must be entered in the same order as in “Noti”)
  - Coef = initial weight, adjusted for unit non-response (numeric variable)
  - Ck = “distance weight” (numeric variable; optional)

Input data sets (controls)
§ “Noti”:
  - Planned population missing (.) → procedure stops → data set “Noti-miss”
  - Totals of auxiliary variables missing (.) → set to 0
§ “Inp”:
  - Id. code missing (.) → procedure stops → data set “Missing”
  - Id. code duplicated → data set “Codici-doppi”
  - Auxiliary variables missing (.) → set to 0
  - Coef missing (.) → set to 1 (no controls)
  - Ck missing (.) → set to 1

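The controls listed above can be restated as a short sketch. It is an illustration of the rules on this slide, not the Genesees SAS code: records are plain dictionaries, `None` stands for a SAS missing value (.), and the field names (`id`, `domain`, `coef`, `ck`) and function names are hypothetical.

```python
from collections import Counter


def check_noti(rows, total_names):
    """Controls on "Noti": a missing domain stops the run, missing totals become 0."""
    noti_miss = [r for r in rows if r.get("domain") is None]
    cleaned = []
    for r in rows:
        c = dict(r)
        for t in total_names:
            if c.get(t) is None:
                c[t] = 0
        cleaned.append(c)
    return cleaned, {"noti_miss": noti_miss, "stop": bool(noti_miss)}


def check_inp(rows, aux_names):
    """Controls on "Inp": missing ids stop the run, duplicated ids are listed,
    missing auxiliary values become 0, missing Coef and Ck become 1."""
    missing = [r for r in rows if r.get("id") is None]
    counts = Counter(r["id"] for r in rows if r.get("id") is not None)
    doubles = [r for r in rows
               if r.get("id") is not None and counts[r["id"]] > 1]
    cleaned = []
    for r in rows:
        c = dict(r)
        for a in aux_names:
            if c.get(a) is None:
                c[a] = 0
        if c.get("coef") is None:
            c["coef"] = 1
        if c.get("ck") is None:
            c["ck"] = 1
        cleaned.append(c)
    diagnostics = {"missing": missing, "codici_doppi": doubles,
                   "stop": bool(missing)}
    return cleaned, diagnostics
```

Returning the diagnostic lists alongside the cleaned records mirrors the slide, where each failed control writes the offending records to its own data set.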
Output tables
§ Output tables (summary descriptive statistics on the calibration estimation process):
  - Table 1: Statistics on estimates and final weights for the planned population
  - Table 2: Statistics on the correction factors of the initial weights
  - Table 3: Statistics on estimates and initial weights
  - Table 4: Prefixed parameters for the iterative estimation procedure
  - Table 5: Known totals, direct and final estimates, and differences
  - Tabulate 1: Controls on the domains: known totals, direct estimates, ratios between known totals and direct estimates, sample totals
  - Tabulate 2: Sample size (respondents) and population estimate with direct weights
  - Tabulate 3: Controls on domains without sample units

Output file formats
§ Output file formats:
  - “genesees.log” (SAS log)
  - “stampa1.txt” to “stampa6.txt” (Tables)
  - “stampe stime.htm” (Tables)
  - SAS data sets (“*.sas7bdat”)

Output data sets (1/2)
§ Diagnostics (errors detected in the input step, if any):
  - “missing”: Id. code missing (.)
  - “noti-miss”: planned population missing (.)
  - “vuoti”: domain is present in “Noti” but not in “Inp”
  - “codici-doppi”: Id. code duplicated
  - “csenzat”: domain is present in “Inp” but not in “Noti”
  - “savestime”: shows the parameters entered

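The “vuoti” and “csenzat” diagnostics are the two one-sided differences between the domain lists of “Noti” and “Inp”. A minimal sketch of that comparison, with a hypothetical function name and domain codes made up for the example:

```python
def domain_mismatches(noti_domains, inp_domains):
    """Compare the domains listed in "Noti" with those found in "Inp"."""
    noti, inp = set(noti_domains), set(inp_domains)
    return {
        "vuoti": sorted(noti - inp),    # known totals exist, but no sample units
        "csenzat": sorted(inp - noti),  # sample units exist, but no known totals
    }


result = domain_mismatches(["D1", "D2", "D3"], ["D2", "D3", "D3", "D4"])
# result["vuoti"] is ["D1"], result["csenzat"] is ["D4"]
```

A domain in “vuoti” cannot be calibrated because no respondent carries a weight in it; a domain in “csenzat” has respondents but no totals to calibrate against.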
Output data sets (2/2)
§ Statistics and final weights:
  - “Pesifin”: initial weight; correction factor; final weight; id.; conta; domain
  - “stat”: conta; max; min; sum; mean; var; cv (for the initial weights, the correction factors and the final weights); iterations; maxiter; converge; constraints (c2); sample units in the domain (r2); distance function
  - “stimedir”: domain; totals of auxiliary variables; conta
  - “stimefin”: known total; direct estimate; final estimate; conta; difference between final estimate and known total

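The summary statistics stored in “stat” can be illustrated with a short sketch applied to one series of weights. The exact definitions used by Genesees are not given here, so the sketch makes three assumptions: “conta” is a count of units, “var” is the sample variance (divisor n - 1), and “cv” is the standard deviation as a percentage of the mean. The function name is hypothetical.

```python
def weight_stats(values):
    """Summary statistics for one series (initial weights, correction factors
    or final weights). Assumes a non-empty series with a non-zero mean."""
    n = len(values)
    total = sum(values)
    mean = total / n
    # Assumption: sample variance with divisor n - 1 (0.0 for a single unit).
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    return {
        "conta": n,
        "max": max(values),
        "min": min(values),
        "sum": total,
        "mean": mean,
        "var": var,
        "cv": 100.0 * var ** 0.5 / mean,  # assumption: expressed in percent
    }


stats = weight_stats([10.0, 10.0, 20.0, 20.0])
```

Comparing these statistics before and after calibration shows how far the correction factors spread the weights.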
Structural Business Statistics (SBS) Surveys using Genesees (1/2)
§ Small and Medium Enterprises (SME) Survey
§ Information and Communication Technologies (ICT) Survey
§ Structure of Earnings Survey (SES)
§ Labor Cost Survey (LCS)
§ Prodcom
§ SBS Preliminary Estimates
§ …

Structural Business Statistics (SBS) Surveys using Genesees (2/2)
Estimation of economic variables on enterprises according to:
§ Istat traditional data production on enterprises
§ Structural Business Statistics (SBS) EU Council Regulation No 58/97
  - Preliminary estimates (1 estimation domain; t + 10 months)
  - Final estimates (3 estimation domains; t + 18 months)
  - Quality indicators and specific reports (3 estimation domains; t + 24 months): Coefficient of Variation, CV (3 domains); item and unit non-response rate (1 domain); specific reports on survey strategy and principal economic activity
t = year of reference

Population of interest (1/2)

Number of Italian enterprises (SBS 2002)

Economic activity sector     Number of persons employed (size classes)               Total
(NACE Rev.1 Division)
Manufacturing (10-41)          463,052    54,359    26,229    10,506     1,553      555,699
Constructions (45)             513,156    18,403     5,124      1,…         …        …,900
Services (50-74)             2,555,631    49,268    17,405     6,597     1,261    2,630,162
Total                        3,531,839   122,030    48,758    18,242     2,892    3,723,761

Number of persons employed of the Italian enterprises (SBS 2002)

Economic activity sector     Number of persons employed (size classes)                         Total
(NACE Rev.1 Division)
Manufacturing (10-41)        1,230,838     730,888     774,903     951,764   1,176,772    4,865,165
Constructions (45)           1,045,386     237,915     145,778      98,091      47,850    1,575,020
Services (50-74)             4,464,548     643,597     517,363     634,725   1,459,231    7,719,464
Total                        6,740,772   1,612,400   1,438,044   1,684,580   2,683,853   14,159,649

Population of interest (2/2)

Business Register ASIA
- Data sources: Tax Register, Chambers of Commerce, Social Security, Work Accident Insurance, Electric Power Board, SEAT telephone directory
- Statistical and probabilistic procedure for detecting each enterprise's main economic activity
- Variables in the register are the result of standardization, normalization and integration of information provided by administrative sources

Domains of study (SBS final estimates)

Code    Type of domain (partition of population of interest)    Number of domains (in the partition)
DOM1    NACE Rev.1.1 Class (4-digit)                              461
DOM2    NACE Rev.1.1 Group (3-digit) by size-class              1,047
DOM3    NACE Rev.1.1 Division (2-digit) by region                 984

SME Sampling strategy (current)
§ N ≈ 3,723,000 enterprises (Business Register)
  - enterprises with fewer than 10 persons employed cover 94.8% of the total enterprises and 47.8% of the total employment
§ Stratified simple random sample
§ H ≈ 26,000 strata (NACE Rev.1.1, size class, region)
§ n ≈ 120,000 (negative coordination with other SBS surveys; multivariable and multidomain sample allocation)
§ Survey technique: postal questionnaire; 2 call-backs
§ Calibration estimators methodology (Deville and Särndal, 1992)

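Under a stratified simple random sample, every unit in stratum h has design weight N_h / n_h. The sketch below shows one common way to build an initial weight adjusted for unit non-response, by rescaling to the respondents within each stratum. The within-stratum adjustment is an assumption made for illustration, not a description of the adjustment actually used for the SME survey, and the function name and stratum labels are hypothetical.

```python
def initial_weights(frame_counts, sample_counts, respondent_counts):
    """Initial weight per stratum: design weight times non-response adjustment.

    frame_counts      : N_h, units in the frame per stratum
    sample_counts     : n_h, units sampled per stratum
    respondent_counts : r_h, responding units per stratum
    """
    weights = {}
    for h, big_n in frame_counts.items():
        n = sample_counts[h]
        r = respondent_counts.get(h, 0)
        if r == 0:
            continue  # no respondents: no weight can be assigned in this stratum
        design = big_n / n   # N_h / n_h under simple random sampling
        adjustment = n / r   # assumes respondents are a random subset of the sample
        weights[h] = design * adjustment
    return weights


weights = initial_weights(
    {"A": 1000, "B": 50, "C": 20},
    {"A": 100, "B": 50, "C": 5},
    {"A": 80, "B": 40, "C": 0},
)
```

In this toy example stratum A gets weight 12.5 and stratum B gets 1.25, while stratum C has no respondents and is left out.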
Variables of interest
- Turnover
- Value added at factor cost
- Employment
- Total purchases of goods and services
- Personnel costs
- Wages and salaries
- Production value
- …
Totals of the variables of study are estimated for subpopulations of interest (domains), as required by the SBS EU Regulation

Case study

Starting picture

Thank you!