Dissemination of Business Demography data - Balancing data needs with confidentiality
Roundtable on Business Survey Frames, 17-21/10/2005
Hartmut Schrör, Eurostat
Contents
  Current state of the data collection
  Dissemination of business demography data
  Demand for data
  Confidentiality and its implementation
Current state of the data collection
  Voluntary data collection, established in 2002 (old Member States); new Member States joined in 2004.
  Basic elements: enterprise births, deaths, survival up to 5 years, and related employment.
  Some Member States missing
  Latest available data
    Enterprise births 2002
    Survivals into 2002
    Deaths 2001
  Common methodology
    Business demography recommendations manual
Current state of the data collection
  Legal basis
    New Annex IX to Council Regulation on Structural Business Statistics
    Scope and definition of the data collection remain largely the same
Dissemination of demography data
  Eurostat website http://epp.eurostat.cec.eu.int
    All publishable data can be downloaded free of charge
    Multidimensional datasets in proprietary format
    Extraction to CSV, Excel
    Two datasets on business demography
      Breakdown by legal form (without survival data)
      Breakdown by employee size class (including survival data)
    No access to confidential data for researchers
Dissemination of demography data
  Publications
    Statistics in Focus
      12 pages in English, French, German
      Demographic events and related employment
      Activity focus on ICT
      Detailed tables
      Last one published in 2004
    PDF versions available free of charge on Eurostat website
Demand for data
  Data extractions from Eurostat website, first half of 2005
    Size class table (including survivals): 891 (among the top 10%)
    Legal form table: 143
  Specific requests
    Structural Indicators
    Commission reports, policy documents
    Various external requests
Demand for data
  Unfulfilled desires
    More detailed activity breakdown (NACE 4-digit level)
    More detailed size class breakdown
    Regional breakdown
Confidentiality requirements
  Data on individual enterprises may be declared confidential by Member States.
  Different rules among Member States concerning…
    Indicators
      Number of enterprises usually publishable
      Employment data often confidential
    Thresholds
      Aggregates consisting of fewer than 3…6 enterprises
    Dominance
      1 or 2 enterprises dominate an aggregate value
      Percentage of the dominance (75 % to 90 %)
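The threshold and dominance rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `primary_flag`, the default minimum of 3 enterprises, the default dominance share of 85 % and the "at least" comparison are all hypothetical choices inside the ranges quoted on the slide (3…6 enterprises, 75 % to 90 %); each Member State applies its own values. The returned letters follow the flags used in the data collection (A, B, C).

```python
from typing import Optional, Sequence


def primary_flag(contributions: Sequence[float],
                 min_units: int = 3,
                 dominance_share: float = 0.85) -> Optional[str]:
    """Return a primary confidentiality flag for one aggregate, or None.

    contributions: one value per enterprise contributing to the aggregate.
    min_units and dominance_share are assumed defaults, not official values.
    """
    n = len(contributions)
    if 0 < n < min_units:
        return "A"  # too few enterprises in the aggregate
    total = sum(contributions)
    if total <= 0:
        return None  # empty or zero aggregate: nothing to protect here
    ranked = sorted(contributions, reverse=True)
    if ranked[0] / total >= dominance_share:
        return "B"  # one enterprise dominates the aggregate
    if (ranked[0] + ranked[1]) / total >= dominance_share:
        return "C"  # two enterprises together dominate the aggregate
    return None  # publishable under these two rules


print(primary_flag([10, 12]))          # two enterprises only
print(primary_flag([90, 5, 5]))        # largest holds 90 %
print(primary_flag([50, 45, 3, 2]))    # two largest hold 95 %
print(primary_flag([30, 30, 20, 20]))  # no dominance
```

The threshold check runs first, so an aggregate with too few enterprises is flagged A even if one of them also dominates the value.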
Confidentiality – dataset design
  2 multidimensional datasets
    Country, year, NACE, legal form, indicator
    Country, year, NACE, employee size class, indicator
  Example: Spain, 2002, NACE C, 5-9 employees, persons employed in births
    Value = 549
  Confidentiality concerns reflected in the level of detail of the aggregated data
    NACE activity: mostly 2 or 3 digit level
    3 legal forms
    5 size classes
Confidentiality - implementation
  Confidentiality flags used in the data collection
    Rule                                                  Flag
    Too few enterprises                                   A
    One enterprise dominates the data                     B
    Two enterprises dominate the data                     C
    Confidential data due to secondary confidentiality    D
    Data is non-confidential                              Blank
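As a rough illustration of how these flags could drive the step from the internal to the dissemination database, the sketch below parses a cell written in the `value~flag` notation of the example slides (such as `2~A`) and masks every flagged cell. The helper names `parse_cell` and `disseminate` are hypothetical, and the `:~c` output marker is simply copied from the dissemination example; this is not Eurostat's actual implementation.

```python
from typing import Tuple

# Flags A-D mark confidential cells; a blank flag means non-confidential.
CONFIDENTIAL_FLAGS = {"A", "B", "C", "D"}


def parse_cell(raw: str) -> Tuple[int, str]:
    """Split an internal cell such as '120~D' into (value, flag)."""
    value, _, flag = raw.partition("~")
    return int(value), flag.strip()


def disseminate(raw: str) -> str:
    """Return the cell as it would appear in the dissemination database."""
    value, flag = parse_cell(raw)
    if flag in CONFIDENTIAL_FLAGS:
        return ":~c"  # value withheld, marked confidential
    return str(value)


internal_row = ["632", "2~A", "50", "120~D", "260", "200"]
print([disseminate(cell) for cell in internal_row])
```

The sketch treats primary flags (A, B, C) and the secondary flag (D) alike on output, so the published table does not reveal why a cell was suppressed.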
Confidentiality - example
  Number of newly born enterprises
  Columns (employee size class): size class total, 1 to 4, 5 to 9, 10 to 19, 20 or more
  Rows (NACE):
    K72 total: 344, 27, 155, 42, 84, 39
    721: 66, 2~A, 23, 17, 16, 8
    722: 53, 3~A, 12, 10, 24, 7
    723: 72, 4, 43, 6, 13
    724: 49, 5
    725: 58, 31
    726: 46, 19, 9
Confidentiality - example
  Number of persons employed in newly born enterprises (internal database)
  Columns (employee size class): size class total, 1 to 4, 5 to 9, 10 to 19, 20 or more
  Rows (NACE):
    K72 total: 3039, 27, 352, 98, 1310, 1140
    721: 632, 2~A, 50, 120~D, 260, 200
    722: 537, 3~A, 37, 70~D, 320, 180
    723: 558, 4, 100, 44, 250, 160
    724: 405, 5~D, 65, 15~A, 190, 150
    725: 403, 6, 45, 32, 170
    726: 504, 7, 55, 22, 140, 280
Confidentiality - example
  Number of persons employed in newly born enterprises (dissemination database)
  Columns (employee size class): size class total, 1 to 4, 5 to 9, 10 to 19, 20 or more
  Rows (NACE):
    K72 total: 3039, 27, 352, 98, 1310, 1140
    721: 632, :~c, 50, 260, 200
    722: 537, 37, 320, 180
    723: 558, 4, 100, 44, 250, 160
    724: 405, 65, 190, 150
    725: 403, 6, 45, 32, 170
    726: 504, 7, 55, 22, 140, 280
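The example also shows why secondary confidentiality (flag D) is needed: if only the primary cell of a row were withheld, it could be recomputed from the published row total. The sketch below replays this for row 721 of the internal table (total 632; cells 2~A, 50, 120~D, 260, 200). The function name `recover_single_gap` and the positional keys `c1` to `c5` are hypothetical placeholders, not column labels from the slides.

```python
from typing import Dict, Optional


def recover_single_gap(total: int,
                       cells: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Recover a suppressed cell (None) by subtraction from the row total.

    Works only if exactly one cell is suppressed; otherwise returns {}.
    """
    hidden = [key for key, value in cells.items() if value is None]
    if len(hidden) != 1:
        return {}
    known = sum(value for value in cells.values() if value is not None)
    return {hidden[0]: total - known}


# Row 721 with only the primary cell (flag A) suppressed.
only_primary = {"c1": None, "c2": 50, "c3": 120, "c4": 260, "c5": 200}
# Row 721 with the secondary cell (flag D) suppressed as well.
with_secondary = {"c1": None, "c2": 50, "c3": None, "c4": 260, "c5": 200}

print(recover_single_gap(632, only_primary))    # the flagged value leaks
print(recover_single_gap(632, with_secondary))  # nothing can be recovered
```

With both cells withheld, a reader can still derive their sum (632 - 510 = 122) but not the individual values. The same check would have to be repeated along the columns and across every other aggregate a cell belongs to.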
Confidentiality - challenges
  Two tables with partial overlap of dimensions
    Country, year, NACE, legal form, indicator
    Country, year, NACE, employee size class, indicator
  Values may belong to multiple aggregates
    NACE hierarchy
    Special activity aggregates (ICT, services)
  Keeping higher aggregates publishable (“top down”)
Confidentiality – software tools
  Manual work
    Good case-by-case judgement
    Time consuming
    Risk of error
  Software tools tested at Eurostat
    CIF (Confidentiality InterFace)
      GHMITER
      Over-protective
Confidentiality – software tools
  Τ-Argus
    GHMITER
    Can partially automate confidentiality treatment
    Limitations: special aggregates, setting flags, dominance rules
Conclusions
  Business Demography project at European level has been successful
  Demand for data and confidentiality requirements can be balanced
  Software tools: T-Argus most promising
  Suggestion to Member States to carry out confidentiality treatment themselves.
    Compliance with confidentiality rules
    Harmonization with results published at national level