Metadata use in the Statistical Value Chain
UNECE-Eurostat-OECD Meeting on Management of Statistical Information Systems (MSIS 2008)
Luxembourg, 7-9 April 2008
Georges Pongas, Adam Wroński
07-Apr-08
Content
- Introduction
- Operational Characteristics of Metadata
- Technical Characteristics of Metadata
- Metadata types needed in the various steps of the SVC (statistical value chain)
- Conclusion
Seven SVC steps
1. Expression of the need
2. Data collection design
3. Specification and development of the tools needed for the data collection
4. Data collection
5. Data editing and imputation
6. Data processing
7. Data dissemination
Basics
- Keep the statistical notions out of the technical (implementation-oriented) characteristics of the metadata.
- Design the technical characteristics so that the same metadata structures can cover both statistical and non-statistical requirements.
Operational Characteristics of Metadata
- Static nature
- Long production process
- Located in various places (resources)
- Critical link with statistical data: depends on statistical data changes
- Strong coupling of structural metadata with the statistical data
- Large number of metadata entities needed in the SVC
Technical Characteristics of Metadata
- Terminology is often complex
- Technical characteristics and statistical notions are frequently mixed
Statistical Notions and Metadata
Examples:
- A classification, a keyword list and a set of information related to the SDDS standard
- A correspondence table between two classifications, and a table containing the links (access rights) between the user names and the statistical datasets of a database
In each example the only difference is the context, i.e. the user interface.
Thus develop separately:
- a common set of functionalities
- the interface layer for an application
Metadata Technical Structure Categories
Three categories proposed:
- Simple Metadata Entities (SME)
- Binary Relationships (BR)
- Clustered Metadata Entities (CME)
Simple Metadata Entities (SME)
- simple key
- variable number of attributes
- appropriate for vertical type storage

                   Example 1        Example 2
Entity             NACE             user name
Entity element     2122             gpongas
Attribute name     English label    phone no
Attribute value    “Mining”         430139
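The vertical storage mentioned for SMEs can be illustrated with a minimal Python sketch. It assumes one row per (entity, element, attribute) triple; the names `SmeRow` and `attributes_of`, and the extra "office" attribute, are hypothetical and not part of the presentation.

```python
from typing import NamedTuple


class SmeRow(NamedTuple):
    """One row of a vertical (entity-attribute-value) table."""
    entity: str     # e.g. "NACE" or "user name"
    element: str    # simple key within the entity, e.g. "2122"
    attribute: str  # attribute name, e.g. "English label"
    value: str      # attribute value, stored as text


# The two examples from the slide, held in the same structure.
rows = [
    SmeRow("NACE", "2122", "English label", "Mining"),
    SmeRow("user name", "gpongas", "phone no", "430139"),
]

# A further attribute is one more row, not a new column.
# "office" and its value are invented for illustration only.
rows.append(SmeRow("user name", "gpongas", "office", "hypothetical-office"))


def attributes_of(rows, entity, element):
    """Collect every attribute of one entity element into a dict."""
    return {r.attribute: r.value
            for r in rows
            if r.entity == entity and r.element == element}


nace_2122 = attributes_of(rows, "NACE", "2122")
user = attributes_of(rows, "user name", "gpongas")
```

Because every attribute is a row, a statistical entity (a classification item) and a non-statistical one (a user account) share one table layout, which is the point made under Basics.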
Examples of SMEs
- SDDS documents
- Dublin Core
- Classifications
- Keywords
- Administrative entities
- Programs
- Publications
Binary Relationships (BR)
Two types:
- Between two different entities: correspondence tables, access rights definitions
- Inside the same entity: thesauri, classification hierarchies, links between regulations, statistical documents

Example
Relationship id       UN thesaurus
First entity id       EUROPE
First entity role     Parent
Second entity id      FR
Second entity role    Child
Reason of link        Broader term
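As a rough sketch of how one BR structure might serve both relationship types, the Python below stores the thesaurus link from the slide next to an access-rights link. The class and function names, the dataset name `dataset_A`, and the role and reason values of the second record are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryRelationship:
    """One link between two entity elements, with a role on each side."""
    relationship_id: str
    first_entity_id: str
    first_entity_role: str
    second_entity_id: str
    second_entity_role: str
    reason: str


links = [
    # From the slide: a link inside the same entity (a thesaurus).
    BinaryRelationship("UN thesaurus", "EUROPE", "Parent",
                       "FR", "Child", "Broader term"),
    # Hypothetical: a link between two different entities (access rights).
    BinaryRelationship("access rights", "gpongas", "User",
                       "dataset_A", "Dataset", "Read access"),
]


def related(links, relationship_id, first_entity_id):
    """Return the second-entity ids linked to a given first entity."""
    return [link.second_entity_id
            for link in links
            if link.relationship_id == relationship_id
            and link.first_entity_id == first_entity_id]


children_of_europe = related(links, "UN thesaurus", "EUROPE")
datasets_of_user = related(links, "access rights", "gpongas")
```

The same lookup function answers "which terms are narrower than EUROPE" and "which datasets may this user read"; only the context around it would differ.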
Clustered Metadata Entities (CME)
- Complex entities characterised by keys of variable cardinality and by references to other entities of type CME, SME and BR
- Description technique: XML schema is appropriate
Examples of CMEs
- SDMX, Gesmes definitions
- Dataset definitions
- Annotations to dataset cells
- Confidentiality definitions linked to datasets
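To make the idea of a clustered entity more concrete, here is a small Python sketch that reads a dataset definition written as XML. The element and attribute names, the dataset id, the GEO and TIME codelists and the annotation text are all invented for illustration; this is not an SDMX or Gesmes structure, and a real system would presumably validate against an XML schema first.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset definition: a key with a variable number of
# dimensions, each referring to an SME (a codelist), plus a reference
# to a BR and an annotation attached to one cell.
DATASET_DEFINITION = """
<dataset id="hypothetical_dataset">
  <key>
    <dimension codelist="NACE"/>
    <dimension codelist="GEO"/>
    <dimension codelist="TIME"/>
  </key>
  <relationship ref="access rights"/>
  <annotation cell="2122,FR,2007">hypothetical note</annotation>
</dataset>
"""


def describe(xml_text):
    """Parse a dataset definition into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "key": [d.get("codelist") for d in root.findall("./key/dimension")],
        "relationships": [r.get("ref") for r in root.findall("relationship")],
        "annotations": {a.get("cell"): a.text
                        for a in root.findall("annotation")},
    }


definition = describe(DATASET_DEFINITION)
```

The key here has three dimensions, but another dataset could declare two or five without changing the parsing code, which is what variable key cardinality amounts to in this sketch.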
Metadata in the various steps of the SVC
Collection Metadata
Mostly of type BR and SME
Among others they contain:
- source agencies
- data file descriptions
- codelists
- validation rules linked to initial data checks
Editing, Imputation and Processing Metadata
More complex than the collection metadata (more CME entities needed)
Among others they contain:
- Dataset definitions
- Formulas, programs, scripts
- Conditional and ordinary annotations
- Dissemination feeding information
Dissemination Metadata
The most complex metadata types are located here. They contain almost all the previously described metadata plus their own.
Reasons for this complexity:
- Dissemination covers all the statistical domains
- It must serve all user types
- It has tight delivery deadlines
- It must offer very user-friendly navigation, presentation and extraction facilities
Among others, dissemination metadata contain:
- Sitemap description
- Release calendars
- Dataset links to publication tables
- Questionnaire definitions linked to datasets
- Units of measurement
- Ready-made queries
Conclusion
Separating the statistical notions (context) from the structure (functionality) of metadata:
- minimises the number of structural metadata types
- consequently makes it easier to build and implement a complex statistical (metadata and data) system