Metadata models Max Booleman, Statistics Netherlands
Content Introduction Functions of metadata Kinds of metadata Why do we need a metadata model? Choosing a model Brief overview of different models Communities/platforms Lessons learned
Introduction
The ‘old’ way | The ‘new’ way
Special dedicated surveys | Combined complex designs of registrations and samples (minimize administrative burden)
Stove-pipe statistics | Common input and output bases (sharing data)
Knowledge in the heads of employees | Knowledge in documents (metadata)
Tailor-made tables | Common structure
Functions of metadata (1) Input data + transformation = output data Describing data Describing process Describing quality (data and process)
Functions of metadata (2) Information for users, producers inside and outside the office What does it mean? (Automatic) Rules for producers inside the office Ex ante vs ex post: What should you do? What did you do?
Kinds of metadata Related to the functions: Conceptual: describing text, relating elements Process: methods, programs, sequence Quality: norms and indicators (data and process) Technical: the hardware
Why a model? We want: Re-use of definitions, classifications, … Re-use of processes, rules, methods Re-use of data A model facilitates the conceptual level: Structure (coherence) Relations between (data consistency) Meaning of (textual consistency) Processes: metadata driven (machine readable)
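The last point above, metadata-driven processes, can be illustrated with a minimal sketch in Python. It assumes a hypothetical metadata record with a `type_of_aggregation` entry and a hypothetical `AGGREGATORS` lookup table; neither name comes from an existing standard. The process reads the rule from the metadata instead of hard-coding it.

```python
# Hypothetical lookup from an aggregation rule named in the metadata
# to the function that implements it.
AGGREGATORS = {"Sum": sum, "Max": max}


def aggregate(values, variable_metadata):
    """Aggregate values using the rule stored in the variable's metadata."""
    rule = variable_metadata["type_of_aggregation"]
    return AGGREGATORS[rule](values)


# Hypothetical machine-readable metadata for one variable.
turnover_metadata = {"name": "Turnover", "type_of_aggregation": "Sum"}

# Illustrative values for two size classes.
total = aggregate([9, 12], turnover_metadata)
```

Changing the rule in the metadata changes the behaviour of the process without touching the code, which is the point of making the metadata machine readable.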
Properties of a model A good model should: Meet the user needs Be compact Have a coherent set of metadata object types Model: metadata of the metadata There is no universal model, like there is no universal car.
Example
What do I need to understand ‘21’?

Trade, Enterprises, 2001 (*1000 euro)
               Turnover   Costs   Profit
Size class 1   9
Size class 2   12
total          21

What should be the metadata of ‘21’?
Example (cont. 2)
What do I need to understand ‘Turnover’?

Trade, Enterprises, 2001 (*1000 euro)
               Turnover   Costs   Profit
Size class 1   9
Size class 2   12
total          21

What should be the properties of the variable ‘Turnover’ (the metadata of ‘Turnover’)?
Example (cont. 3)
Model property | Example
name | Turnover
description | Earnings of an enterprise
statistical unit | Enterprise
period | Year
relation | Turnover = costs + profit
measurement unit | Euro
type of aggregation | Sum
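The properties listed above could be held in a small record type. The following is a minimal sketch in Python; the class name `VariableMetadata` and its field names are hypothetical and simply mirror the slide, they are not taken from any standard model.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VariableMetadata:
    """Hypothetical record mirroring the model properties on the slide."""
    name: str
    description: str
    statistical_unit: str
    period: str
    relation: str
    measurement_unit: str
    type_of_aggregation: str


turnover = VariableMetadata(
    name="Turnover",
    description="Earnings of an enterprise",
    statistical_unit="Enterprise",
    period="Year",
    relation="Turnover = costs + profit",
    measurement_unit="Euro",
    type_of_aggregation="Sum",
)

# Plain dictionary form, e.g. for storage or exchange.
turnover_as_dict = asdict(turnover)
```

Freezing the record is a design choice in this sketch: a change to a definition would then create a new version rather than silently altering the old one.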
Remarks (1) Part of the properties? Period ‘Year’/ Name ‘Turnover’ Measurement unit ‘euro’ Versioning (lifecycle) Homonyms/Synonyms
Remarks (2) A model is like the decomposition of sentences: The total turnover of enterprises in The Netherlands in 2008 was equal to … billion euro. The total turnover of the enterprise Shell in The Netherlands in 2008 was equal to … euro. A population of statistical units, at or during a ‘time’, is described by variables.
Remarks (3) Definition of Age, Turnover etc.: in principle unit-independent, but formulated in a user-friendly way. The concept of ‘age’ is the same for electrons, cars, buildings and human beings.
Remarks (4) Relation between statistical units: A student is a kind of person: it inherits the properties of person and adds (useful) properties of its own. A household contains persons. An enterprise contains establishments.
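The two kinds of relation above, "is a kind of" and "contains", map naturally onto inheritance and composition. The following is a minimal sketch in Python; all class names, fields and example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str
    birth_year: int

    def age(self, reference_year: int) -> int:
        # Simplified: age in whole calendar years.
        return reference_year - self.birth_year


@dataclass
class Student(Person):
    # A student inherits name, birth_year and age() from Person
    # and adds a property of its own.
    school: str = ""


@dataclass
class Household:
    # A household contains persons (students included).
    members: List[Person] = field(default_factory=list)


anna = Student(name="Anna", birth_year=1990, school="Example University")
bob = Person(name="Bob", birth_year=1960)
household = Household(members=[anna, bob])
```

Because `Student` is a subclass of `Person`, anything defined for persons, such as `age`, is available for students without being redefined.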
Remarks (5) Relation between populations: Income of all persons of one household = Income of the household? Income of all persons = Income of all households? Turnover of all establishments = Turnover of all enterprises? Consolidation?
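The consolidation question can be made concrete with a small calculation. The following Python sketch uses invented figures and hypothetical field names (`external`, `internal`); it assumes that consolidation at enterprise level leaves out deliveries between establishments of the same enterprise.

```python
# Invented figures, for illustration only: turnover per establishment,
# split into sales to outside customers and deliveries to sibling
# establishments of the same enterprise.
establishments = [
    {"enterprise": "A", "external": 70, "internal": 30},
    {"enterprise": "A", "external": 50, "internal": 0},
]

# Adding up establishment turnover counts the internal deliveries.
sum_of_establishments = sum(e["external"] + e["internal"] for e in establishments)

# Consolidated enterprise turnover leaves the internal deliveries out.
consolidated_enterprise = sum(e["external"] for e in establishments)

difference = sum_of_establishments - consolidated_enterprise
```

Under this assumption the two totals differ by exactly the internal deliveries, so the answer to "Turnover of all establishments = Turnover of all enterprises?" depends on how the relation between the populations is defined in the model.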
[Figure, built up over four slides: a timeline from BC to AC with Julius Caesar and Columbus as markers, followed by present, statistics and forecast. The period 1-1-2006 to 31-1-2006 is marked on the timeline, and the populations ‘Dutch nationality’ and ‘Inhabitant of The Netherlands’ are shown against it.]
Remarks A population is a collection of statistical units limited in time, area, … Could ‘student’ be a statistical unit? ‘Student’ is a kind of ‘person’, so ‘students’ is formally a subpopulation of ‘persons’. Should we distinguish 5 or 1000 kinds of statistical units?
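The idea of a population as a collection of units limited in time and area, with ‘students’ as a subpopulation of ‘persons’, can be sketched as a pair of filters. This is a minimal Python sketch; the record type `PersonRecord`, its fields and the example units are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PersonRecord:
    name: str
    kind: str                               # e.g. "person" or "student"
    country: str
    resident_from: date
    resident_until: Optional[date] = None   # None means still resident


def in_population(unit: PersonRecord, country: str, reference_date: date) -> bool:
    """True if the unit falls within the area and time limits of the population."""
    started = unit.resident_from <= reference_date
    not_ended = unit.resident_until is None or reference_date <= unit.resident_until
    return unit.country == country and started and not_ended


# Invented example units.
units = [
    PersonRecord("P1", "student", "NL", date(2000, 1, 1)),
    PersonRecord("P2", "person", "NL", date(2005, 6, 1), date(2005, 12, 31)),
    PersonRecord("P3", "person", "BE", date(1990, 1, 1)),
]

reference = date(2006, 1, 1)
inhabitants = [u for u in units if in_population(u, "NL", reference)]

# 'Students' is a subpopulation: a further restriction on the same units.
students = [u for u in inhabitants if u.kind == "student"]
```

In this sketch ‘student’ is a property used to select a subpopulation rather than a separate kind of statistical unit, which keeps the number of unit types small.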
‘Choosing’ a model Own wishes Checking existing models Logical, coherent description of input, output (files) Checking existing models Compile own model (compact!) Map to/from existing models Plan-Do-Check-Act
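Mapping to and from an existing model can be sketched as a renaming table that is applied in both directions. The following Python sketch is illustrative only: the external property names (`title`, `definition`, `unit_of_measure`) are hypothetical and do not refer to any particular standard.

```python
# Hypothetical mapping from our own property names to those of an
# external model. Properties without a counterpart are left out.
OWN_TO_EXTERNAL = {
    "name": "title",
    "description": "definition",
    "measurement_unit": "unit_of_measure",
}
EXTERNAL_TO_OWN = {ext: own for own, ext in OWN_TO_EXTERNAL.items()}


def map_record(record, mapping):
    """Rename the mapped keys and report the keys that have no counterpart."""
    mapped = {mapping[key]: value for key, value in record.items() if key in mapping}
    unmapped = sorted(key for key in record if key not in mapping)
    return mapped, unmapped


own_record = {
    "name": "Turnover",
    "description": "Earnings of an enterprise",
    "measurement_unit": "Euro",
    "type_of_aggregation": "Sum",
}

external_record, lost = map_record(own_record, OWN_TO_EXTERNAL)
round_trip, _ = map_record(external_record, EXTERNAL_TO_OWN)
```

Reporting the unmapped properties makes the information lost in a mapping visible, which is one input for the check step of a plan-do-check-act cycle.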
Overview (1)
XBRL: exchange of micro data (http://www.xbrl.org/Home/)
IMF (GDDS, SDDS) (http://dsbb.imf.org/Applications/web/gdds/gddshome/)
SDMX: exchange of statistical data, push and pull (http://www.sdmx.org/)
Neuchâtel group (classifications, variables) (http://www1.unece.org/stat/platform/display/metis/Part+B+-+Metadata+Concepts%2C+Standards%2C+Models+and+Registries)
Overview (2) ISO 11179 (http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=35348) DDI 3.0 (http://www.ddialliance.org/ddi3/index.html#ddi1) Dublin Core (http://dublincore.org/)
Communities/platforms/conferences
Metanet (http://www.epros.ed.ac.uk/metanet/index.html)
Metis (http://www.unece.org/stats/archive/docs.date.e.htm and http://unece.org/stats/cmf/introduction.html)
Working group Eurostat (http://circa.europa.eu/Public/irc/dsis/Home/main)
Q2008/Q2006/Q2004/Q2001 (http://www.statistics.gov.uk/q2006 and http://q2004.destatis.de/)
SDMX
XBRL (http://www.xbrl.org/Home/)
CODACMOS (http://www.codacmos.eu.org/)
Lessons Learned (1) The ultimate model does not exist (yet?) Mapping from and to models Start with your own wishes Start with a standard model and adjust 80%-20% rule: don’t try to do everything at once Store only what is in use
Lessons Learned (2) Think broad, start small Homonyms and synonyms Survival of the fitting: using standards should be efficient Adjusting standards is often (very) expensive Formal description is difficult and takes time and effort
Questions?