Enhancing data quality by using harmonised structural metadata within the European Statistical System
A. Götzfried, Head of Unit B6, Eurostat
The starting point
Commission Communication 404/2009 asks:
– that the production methods of European statistics be improved in order to enhance efficiency within the European Statistical System (ESS);
– that the ‘stove-pipe’ statistical production systems be replaced by more integrated production processes across statistical domains and statistical organisations (i.e. horizontal and vertical integration).
Metadata accompany the statistical production process from end to end; we therefore see harmonised metadata as one of the main enablers for progressing towards the aim of this Communication.
Harmonisation of structural metadata
At the centre of interest: the harmonisation of structural metadata.
“Structural metadata are needed to identify, use and process data matrices and data cubes, e.g. names of columns or dimensions of statistical cubes. Structural metadata must be associated with the statistical data, otherwise it becomes impossible to identify, retrieve and navigate the data.”
We will present our work on the harmonisation of different types of structural metadata: code lists, statistical variables and structural metadata linked to data tables. This work should lead to a considerable improvement of the data and metadata produced and disseminated by Eurostat and the ESS, in line with the principles of the European Statistics Code of Practice.
Harmonising code lists
Code lists are used as dimensions and attributes in data structures (data messages) throughout the statistical business process. An example: population statistics with the dimensions age, geographical area, sex and time (including the respective codes used).
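The role of code lists as dimensions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the codes and labels below are invented for the example and are not the official Eurostat lists, and the names `CODE_LISTS` and `invalid_dimensions` are hypothetical. The time dimension is left out because reference periods are usually not enumerated in a fixed list.

```python
from typing import Dict, List

# Illustrative code lists, one per dimension of the population example.
# The codes and labels are assumptions made for this sketch only.
CODE_LISTS: Dict[str, Dict[str, str]] = {
    "AGE": {
        "TOTAL": "Total",
        "Y_LT15": "Less than 15 years",
        "Y15-64": "From 15 to 64 years",
        "Y_GE65": "65 years or over",
    },
    "GEO": {"DE": "Germany", "FR": "France"},
    "SEX": {"T": "Total", "M": "Males", "F": "Females"},
}


def invalid_dimensions(key: Dict[str, str]) -> List[str]:
    """Return the dimensions whose code is missing or not in its code list."""
    return [dim for dim, codes in CODE_LISTS.items() if key.get(dim) not in codes]
```

An observation key such as `{"AGE": "TOTAL", "GEO": "DE", "SEX": "F"}` passes the check, while a key with an unknown geographical code is reported under `GEO`. If every domain validates against the same shared lists, a code can only have one meaning wherever it appears.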
Harmonising code lists
Harmonising code lists means, in more detail:
– producing lists of codes, including the underlying statistical concepts, which can be used broadly across statistical domains;
– harmonising these lists by applying a number of basic principles, such as using official classifications as far as possible, inserting aggregates, etc.
This harmonisation will improve the quality of the data produced, because codes and concepts are defined uniquely and the exchange of data and metadata is facilitated.
Harmonising code lists (continued)
Harmonising code lists means, in more detail:
– an example of such a code list (related to age classes) is shown in the submitted paper;
– based on existing codes, Eurostat is producing and releasing more and more of these lists, with around 400 lists to be disseminated in the end;
– these lists also need to be maintained over time;
– the lists will be used in newly created data structure definitions (based on the SDMX technical and statistical standards) and included in IT production systems;
– some of these lists should also be upgraded into the SDMX statistical standards.
Harmonising statistical variables
Harmonising statistical variables means, in more detail:
– drawing up an inventory of the statistical variables used within the ESS in dissemination (around 1300 such variables were compiled);
– adding standard characteristics to these statistical variables, such as definitions, statistical domains, units of measure, etc.;
– using this inventory to harmonise the statistical variables where unnecessary and unjustified differences exist between them;
– improving the statistical variables themselves in order to make them more accessible and understandable for producers and users.
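How such an inventory could support harmonisation can be sketched as follows. This is a minimal illustration, not the tool used at Eurostat: the class `StatisticalVariable`, its fields and the function `find_divergent` are hypothetical names, and matching variables on a normalised name is a simplifying assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StatisticalVariable:
    """One inventory entry with a few standard characteristics (assumed fields)."""
    name: str
    definition: str
    domain: str
    unit: str


def find_divergent(
    inventory: List[StatisticalVariable],
) -> Dict[str, List[StatisticalVariable]]:
    """Group entries by normalised name and keep the groups whose
    definition or unit of measure differs between statistical domains."""
    groups: Dict[str, List[StatisticalVariable]] = defaultdict(list)
    for var in inventory:
        groups[var.name.strip().lower()].append(var)
    return {
        name: entries
        for name, entries in groups.items()
        if len({(v.definition, v.unit) for v in entries}) > 1
    }
```

The result is a list of candidates only. Whether a difference is unnecessary and unjustified, or reflects a genuine conceptual distinction between domains, remains a judgement for the subject-matter experts.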
Harmonising statistical variables
This is one more activity that provides a precondition for better integration of statistical business processes, as promoted by Commission Communication 404/2009.
However, aligning the statistical variables requires changes to the respective data sets; this is better seen as a medium-term task.
Harmonising structural metadata linked to data tables
Structural metadata linked to data tables are table headings, titles, subtitles, short descriptions and similar metadata. Harmonising these structural metadata means, in more detail:
– writing guidelines for the structural metadata linked to the data tables (around 4000 multi-dimensional tables and around 1000 so-called pre-defined tables are in dissemination);
– improving and harmonising the headings of the disseminated data tables by applying those guidelines;
– improving and harmonising the short descriptions that explain the main contents of the pre-defined tables.
Harmonising structural metadata linked to data tables
Example of an old and a new table heading:
Old title of a multi-dimensional table: ‘Relative incidence rate of accidental injuries at work by severity, permanency of the job, length of service in the enterprise and economic activity of the employer (EU mean rate = 100 for each severity)’
Revised title of a multi-dimensional table:
– Title: ‘Incidence rate of accidental injuries at work by severity, job status and NACE’
– Subtitle: ‘Index EU=100’
Harmonising structural metadata linked to data tables
Improving the structural metadata linked to the data tables should considerably increase the accessibility and clarity of our disseminated data.
In a second step, we could also envisage improving these guidelines further and developing them into general guidelines for this type of metadata across the ESS.
Overall
Many harmonisation and improvement efforts related to structural metadata are ongoing at Eurostat and within the ESS.
This work should improve the quality of the data and metadata we disseminate, mainly in terms of the consistency, accessibility and clarity of the data sets produced.
The harmonisation work also contributes considerably to the improvement and integration of the statistical business processes called for in Commission Communication 404/2009.