1
WHAT IS METADATA?
Petr Elias, Czech Statistical Office
2
The main goal of statistics
??? ...to produce statistical data, interpret them and make them available to users.
3
Example of data to be published
Average household size: 2,7
Netherlands
Survey: Census 2011
Preliminary results
Published: January 2012
4
Statistical metadata
Data about statistical data (OECD definition)
Information about data, processes (of producing and using data) and tools involved (UNECE definition)
Average household size: 2,7
Netherlands
Survey: Census 2011
Preliminary results
Published: January 2012
5
Users of metadata
Data providers and interviewers: questionnaires
Statisticians: data processing and analyses
End-users: search for and understanding of data
6
Metadata coverage (GSBPM*)
* GSBPM = Generic Statistical Business Process Model Source:
7
Categories of metadata
Structural metadata = identification and description of data
Reference metadata = description of the content and the quality of data
8
Structural metadata
Must be associated with data
= Names of columns / dimensions
Necessary for:
identification, retrieval and navigation through the data
understanding the data from matrixes and data cubes
9
Structural metadata – example
Nights spent by non-EU residents inside EU – per population
Years: 2005 2006 2007 2008 2009 2010
EU - 27: 427 459 458 441 419 473
Austria: 1109 1175 1162 1186 1121 1223
Estonia: 287 359 356 374 483
Finland: 319 379 417 446 401 422
Germany: 220 249 246 251 234 268
Hungary: 266 270 253 248 213 243
Italy: 726 777 772 734 693 755
Luxembourg: 472 497 533 519 447 :
Netherlands: 310 327 329 289 316
Slovakia: 134 152 151 135 103 117
Slovenia: 481 529 585 653 579 632
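The role of structural metadata in the table above can be sketched in a few lines of Python: a bare figure such as 1109 only becomes retrievable and understandable once the dimension names and values are attached to it. This is a minimal sketch, not an implementation of any standard. The dimension identifiers `geo` and `time` and the helpers `lookup` and `describe` are assumptions made for illustration; the cell values are copied from rows of the table that list all six years.

```python
# Structural metadata: the name of the measure and of its dimensions.
# "geo" and "time" are illustrative identifiers, not taken from the slides.
STRUCTURE = {
    "measure": "Nights spent by non-EU residents inside EU, per population",
    "dimensions": ("geo", "time"),
}

# A few cells copied from the table (rows with all six years present).
CELLS = {
    ("Austria", 2005): 1109,
    ("Austria", 2010): 1223,
    ("Germany", 2005): 220,
    ("Germany", 2010): 268,
}


def lookup(**coords):
    """Return the cell addressed by one value per dimension."""
    key = tuple(coords[name] for name in STRUCTURE["dimensions"])
    return CELLS[key]


def describe(**coords):
    """Render a cell together with the metadata needed to read it."""
    value = lookup(**coords)
    where = ", ".join(f"{d}={coords[d]}" for d in STRUCTURE["dimensions"])
    return f"{STRUCTURE['measure']} ({where}): {value}"
```

Calling `lookup(geo="Austria", time=2005)` navigates the cube by dimension name instead of by row and column position, which is the identification and retrieval function the slides attribute to structural metadata.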
10
Reference metadata
"Documentation" covering:
Concepts, e.g. definitions, practical implementation of concepts
Methodology, e.g. sampling, collection methods, editing processes
Quality, e.g. timeliness, accuracy
11
Reference metadata – example
ESMS reference metadata structure used by Eurostat (for Census)
Contact: organisation, organisation unit, name, person function, mail address, address, phone number, fax number
Metadata update: last certified, last posted, last update
Statistical presentation: data description, classification system, coverage – sector, statistical concepts and definitions, statistical unit, statistical population, reference area, coverage – time, base period
Unit of measure
Reference period
Institutional mandate: legal acts and other agreements, data sharing
Confidentiality: policy, data treatment
Release policy: release calendar, release calendar access, release policy – user access
Frequency of dissemination
Dissemination format: news release, publications, online database, microdata access, other
Accessibility of documentation: documentation on methodology, documentation on quality management
Quality management: quality assurance, assessment
Relevance: user needs, user satisfaction, completeness
Accuracy: overall accuracy, sampling error, non-sampling error
Timeliness and punctuality: timeliness, punctuality
Comparability: geographical, over time
Coherence: cross domain, internal
Cost and burden
Data revision: practice
Statistical processing: source data, frequency of data collection, data collection, data validation, data compilation, adjustment
Comment
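Because a reference metadata structure such as the one above is a fixed list of concepts and sub-concepts, a report written against it can be checked for gaps mechanically. The following Python sketch assumes a small subset of the listed concepts; the record content, the dictionary layout and the helper `missing_items` are invented for illustration and do not reflect how Eurostat stores or validates ESMS files.

```python
# A subset of the concepts listed above, with their sub-concepts.
ESMS_SUBSET = {
    "Contact": ["organisation", "name"],
    "Accuracy": ["overall accuracy", "sampling error", "non-sampling error"],
    "Comparability": ["geographical", "over time"],
}


def missing_items(record):
    """List the (concept, sub-concept) pairs that the record leaves empty."""
    gaps = []
    for concept, subs in ESMS_SUBSET.items():
        filled = record.get(concept, {})
        for sub in subs:
            if not filled.get(sub):
                gaps.append((concept, sub))
    return gaps


# Invented example record: only part of the structure is documented.
record = {
    "Contact": {"organisation": "Example Statistical Office", "name": "J. Doe"},
    "Accuracy": {"overall accuracy": "placeholder text"},
}

gaps = missing_items(record)
```

Here `gaps` lists the two undocumented accuracy items and both comparability items, which is the kind of completeness check that a shared structure makes possible across producers.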
12
Standardisation of metadata
Exercise 1
Team A: Why standardisation?
Team B: Who develops and implements standards?
Team C: What can be standardised?
13
Standardisation of metadata – WHY?
Common vocabulary
Comparability
Data exchange – compatibility
Reduction of costs
14
Standardisation of metadata – WHO?
Major players
International organisations
World: UNECE, OECD, IMF, WCO, WHO, World Bank, BIS...
EU: European Commission (Eurostat), ECB...
National statistical institutes: participation in international standardisation projects
15
Standardisation of metadata – WHAT?
Content
code lists & classifications (Neuchâtel model, SDMX)
variables
Technology
file structure & format (XML)
editing applications (Metadata handler)
transmission standards (SDMX)
16
Common vocabulary
Population: Survey sample, Statistical unit / person ...
Statistical measure: Total, Index, Median ...
Variables: Economic activity, Marital status ...
Measurement unit: Piece, %, Euro ...
Classifications: NACE Rev.2 ...
Code lists: Marital status value
17
Common vocabulary Terminology (1/2)
Population: The total membership or "universe" of a defined class of people, objects or events.
Target population (= scope of the survey): The population outlined in the survey, i.e. the objects about which information is to be sought.
Survey population (= coverage of the survey): The population from which information can be obtained in the survey.
Survey sample: A subset of a population whose elements are selected by a randomised process with a known probability of selection.
Statistical unit: The object of a statistical survey and the bearer of statistical characteristics. The statistical unit is the basic unit of statistical observation within a statistical survey.
Observation unit: Observation units are those entities on which information is received and statistics are compiled (e.g. establishment, person).
Reporting unit: Reporting units are units that supply the data for a given survey instance (e.g. enterprise, person).
Analytical unit: Analytical units represent real or artificially constructed units for which statistics are compiled (e.g. corporation, person).
18
Common vocabulary Terminology (2/2)
Variable: A variable is a characteristic of a unit being observed that may assume more than one of a set of values to which a numerical measure or a category from a classification can be assigned (e.g. income, age, weight, and "occupation", "industry", "disease").
Measurement unit: A measurement unit has a type (e.g. currency: Euro, …) and provides the level of detail (e.g. Euro, 1000 Euro) for the value of the variable.
Classification: A classification is a set of discrete, exhaustive and mutually exclusive observations, which can be assigned to one or more variables to be measured in the collation and/or presentation of data.
Code list: A code list is a predefined list from which some statistical concepts (coded concepts) take their values.
Statistical measure: A summary (mean, mode, total, index etc.) of the individual quantitative variable values for the statistical units in a specific group (study domain).
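How these terms fit together can be illustrated with a short Python sketch: statistical units carry values of variables, a categorical variable takes its values from a code list, a quantitative variable has a measurement unit, and a statistical measure summarises the values over a group. All codes, labels and observations below are invented for illustration, and the helper names are assumptions rather than part of any standard.

```python
from collections import Counter
from statistics import mean

# Hypothetical code list for the variable "marital status".
MARITAL_STATUS = {"S": "single", "M": "married", "D": "divorced", "W": "widowed"}

# Measurement unit of the quantitative variable "income".
INCOME_UNIT = "Euro"

# Invented observations: one record per statistical unit (person).
persons = [
    {"marital_status": "M", "income": 1200},
    {"marital_status": "S", "income": 900},
    {"marital_status": "M", "income": 1500},
]


def invalid_codes(units, variable, code_list):
    """Return the observed values that are not in the code list."""
    return [u[variable] for u in units if u[variable] not in code_list]


def mode_label(units, variable, code_list):
    """Statistical measure 'mode' for a coded variable, as a readable label."""
    code, _count = Counter(u[variable] for u in units).most_common(1)[0]
    return code_list[code]


# Statistical measure 'mean' for the quantitative variable.
mean_income = mean(p["income"] for p in persons)
summary = f"mean income: {mean_income} {INCOME_UNIT}"
```

The code list does two jobs here: it validates incoming values and it translates codes into labels for presentation, which is why shared code lists matter for data exchange.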
19
Common vocabulary – Neuchâtel model
Developed by Neuchâtel Group RuN Software Werkstatt
20
Common vocabulary – Neuchâtel model
Purpose: to define a common language and a common perception of the structure of classifications and the links among them
21
Common vocabulary – Neuchâtel model
Classification Family
Classification
Classification Version
Classification Variant
Classification Level
Classification Item
Correspondence Table
Correspondence Item
Classification Index
Classification Index Entry
Case Law
22
Common vocabulary – Neuchâtel model
Terminology
Classification family: A classification family comprises a number of classifications which are related from a certain point of view.
Classification: A classification describes the ensemble of one or several consecutive classification versions. It is a "name" which serves as an umbrella for the classification version(s).
Classification version: A classification version is a list of mutually exclusive categories representing the version-specific values of the classification variable. A classification version has a certain normative status and is valid for a given period of time.
Classification level: A classification structure (classification version or classification variant) is composed of one or several levels. In a hierarchical classification, the items of each level but the highest are aggregated to the nearest higher level. A linear classification has only one level.
Classification variant: A classification variant is based on a classification version. In a variant, the categories of the classification version are split, aggregated or regrouped to provide additions or alternatives to the standard order and structure of the base version.
Correspondence table: A correspondence table expresses the relationship between different versions or variants of the same classification, or between versions or variants of different classifications.
Classification index: A classification index is an ordered list (alphabetical, in code order etc.) of classification index entries. A classification index relates to one particular classification version or variant.
Case law: Case law is an agreed assignment of a classification item to a phenomenon that is not easy for users to classify. The aim is to have a standardised explanation of classifications.
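Two of the definitions above, classification level and correspondence table, describe operations that are easy to show in code: aggregating items to the nearest higher level, and mapping items from one version to another. The Python sketch below is only an illustration under stated assumptions. The two-level version, its codes and labels, the second version's codes and the helper names are all invented and do not come from the Neuchâtel model documentation or from any real classification.

```python
# Hypothetical two-level classification version; codes and labels are invented.
VERSION_1 = {
    "levels": ["Section", "Division"],
    "items": {
        "A": {"level": "Section", "parent": None, "label": "Agriculture"},
        "A1": {"level": "Division", "parent": "A", "label": "Crops"},
        "A2": {"level": "Division", "parent": "A", "label": "Livestock"},
    },
}

# Correspondence table between version 1 and a hypothetical version 2.
# Each pair is one correspondence item; "A2" is split into two new items.
V1_TO_V2 = [("A1", "01"), ("A2", "02"), ("A2", "03")]


def aggregate(values, version):
    """Sum values of lower-level items up to the nearest higher level."""
    totals = {}
    for code, value in values.items():
        parent = version["items"][code]["parent"]
        totals[parent] = totals.get(parent, 0) + value
    return totals


def targets(code, table):
    """Return the items of the other version that correspond to `code`."""
    return [new for old, new in table if old == code]


section_totals = aggregate({"A1": 10, "A2": 5}, VERSION_1)
```

A one-to-many mapping such as the one for `A2` is the case where a correspondence table alone cannot redistribute data between versions, so extra information (weights or case-by-case decisions) is needed.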
23
Common vocabulary – SDMX
24
Common vocabulary – SDMX
Standardisation of metadata
Common vocabulary
Cross-domain concepts
Cross-domain code lists
Data structure definitions (structural metadata)
Metadata structure definitions (reference metadata)
File format (XML)
Tools
More information –
25
Any questions?