Metadata for a Statistical Data Warehouse
    Lars-Göran Lundell
    Statistics Sweden
    Luxembourg, 22 September 2011

Metadata and Data
    Metadata are data about data
    Data are qualitative or quantitative information collected through observation
    Sources: (derived from) Wikipedia, ISO, METIS

Statistical Metadata and Data
    Statistical metadata are data about statistical data
    Statistical data are data from a survey or administrative source used to produce statistics
    Source: METIS

Metadata for a Data Warehouse
    Technical metadata
        Structural information
            How to physically find and use logical data
        Process descriptions
            How data flows in the DW
        Authentication rules
            Who may do what?
    Business metadata
        Definitions and descriptions
            Help the end-user interpret and evaluate the data
    Sources: Kimball, Inmon, others

Metadata for statistics production
    Structural metadata
        Act as identifiers and descriptors of the data
        Identify, use, and process data matrixes and data cubes
        Names of databases, columns, dimensions
    Reference metadata
        Describe the contents and the quality of the data
        Include conceptual, methodological and quality metadata
    Source: METIS

Metadata categories
    A metadata item is either
        Structural (technical) or reference (business)
    Other mutually exclusive categories include
        Active / passive
        Structured / free-form
        Standardised / non-standard
        Centralised / local

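
The category pairs on this slide can be read as independent classification axes, each with exactly two mutually exclusive values. The following is a minimal sketch of that reading, not a model taken from the presentation: the class names, the field names and the example item `region_code` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


# One enum per axis; each axis has two mutually exclusive values.
class Role(Enum):
    STRUCTURAL = "structural"  # "technical" in data warehouse terms
    REFERENCE = "reference"    # "business" in data warehouse terms


class Activity(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


class Form(Enum):
    STRUCTURED = "structured"
    FREE_FORM = "free-form"


class Standardisation(Enum):
    STANDARDISED = "standardised"
    NON_STANDARD = "non-standard"


class Scope(Enum):
    CENTRALISED = "centralised"
    LOCAL = "local"


@dataclass(frozen=True)
class MetadataItem:
    """A metadata item classified along every axis (hypothetical model)."""
    name: str
    role: Role
    activity: Activity
    form: Form
    standardisation: Standardisation
    scope: Scope


# Hypothetical example: a column name used to locate data in a cube.
region_code = MetadataItem(
    name="region_code",
    role=Role.STRUCTURAL,
    activity=Activity.ACTIVE,
    form=Form.STRUCTURED,
    standardisation=Standardisation.STANDARDISED,
    scope=Scope.CENTRALISED,
)
```

Because each field accepts exactly one enum member, an item cannot be both active and passive, which is the "mutually exclusive" property the slide names.
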
The Statistical Data Warehouse
    A central “statistical data” store for managing all available data of interest, regardless of source, enabling the NSI to:
        perform reporting;
        execute analysis;
        produce the necessary information;
        (re)use data to create new data/new outputs.
    [Diagram labels: Data Warehouse, Statistics production, Statistical Data Warehouse]

Metadata for a Statistical Data Warehouse
    Emphasis on
        Active metadata
        Structured metadata
        Structural metadata
    And
        Process metadata
            Describe expected or actual outcome of one or more processes using evaluable and operational metrics
        Quality metadata
            Source quality, methods used, usability/restrictions
        Tracing information
            Which surveys/registers contributed to a specific output?
    Plus metadata common to all statistics production
        Reference metadata

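
The tracing question on this slide ("Which surveys/registers contributed to a specific output?") can be answered if every process records which datasets it read and which dataset it produced. The sketch below assumes such records exist. The `LineageRegistry` class and all dataset names are hypothetical illustrations, not part of the presentation or of any particular warehouse product.

```python
from collections import defaultdict


class LineageRegistry:
    """Records which datasets each derived dataset was built from."""

    def __init__(self):
        # Maps a dataset name to the set of its direct inputs.
        self._inputs = defaultdict(set)

    def record(self, output, inputs):
        """Register that `output` was produced from `inputs`."""
        self._inputs[output].update(inputs)

    def sources_of(self, output):
        """Return the original sources behind `output`.

        A source is any contributing dataset with no recorded inputs,
        for example a survey or an administrative register.
        """
        seen = set()
        sources = set()
        stack = [output]
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # already visited; also guards against cycles
            seen.add(node)
            parents = self._inputs.get(node)
            if parents:
                stack.extend(parents)
            elif node != output:
                sources.add(node)
        return sorted(sources)


# Hypothetical processing chain in the warehouse.
registry = LineageRegistry()
registry.record("labour_costs_clean", ["labour_cost_survey"])
registry.record("enterprise_extract", ["business_register"])
registry.record(
    "wages_by_industry_table",
    ["labour_costs_clean", "enterprise_extract"],
)

contributors = registry.sources_of("wages_by_industry_table")
```

Here `contributors` holds the survey and the register at the start of the chain, while the intermediate datasets are skipped. Storing lineage in this structured form also makes it active metadata, since the same records could drive reprocessing when a source changes.
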
Metadata standards in a Statistical Data Warehouse
    What should be standardised?
        Contents, formats, repository, software
    Which level of standards should be used?
        International/Eurostat, National/NSI, DW internal
    How should a standard be interpreted?
        Complete adherence, compatible
    How strictly should adherence be required?
        Mandatory, recommended
    Should some components be prioritised?
        Big bang, evolution

Metadata Quality
    The more data, the more need for metadata
    The Statistical Data Warehouse contains lots of data, making it dependent on its metadata
    Correct, high-quality metadata are vital for its use and governance
        No metadata → useless data
        Bad metadata → misused data
        Good metadata → useful data

Metadata for a Statistical Data Warehouse – what’s next?
    More detailed descriptions
        Standards
        Collection and usage
        Storage
        ... more