Meta-data Resources in Europe: within NSIs and from DOSIS Projects
Wilfried Grossmann
Department of Statistics and Decision Support Systems, University of Vienna
Contents
– Introduction
– Contents of Meta-data
– IT Structures for Meta-data
– Processing Meta-data
– Conclusions
Introduction: Continuing hot topics in the meta-data discussion
– Content-orientation versus IT-orientation
– There is a lack of communication between these two groups
Introduction
– Meta-data providers versus meta-data users
– Who provides which type of information for whom?
Contents of Meta-data: What kind of objects should be documented?
– Basic statistical structures
– Variables
– Values
– Data sets
____________________
– Statistical output
– Statistical Systems
– Statistical Processing
Contents of Meta-data: Approaches towards meta-data content
– The template oriented approach
– The data warehouse approach
– The process oriented approach
Contents of Meta-data: The template oriented approach
– Templates defined by a number of working groups
– For micro data and data sets: DDI, Dublin Core
– For (economic) macrodata: OECD, IMF, ECE (Internet)
Contents of Meta-data: The template oriented approach
The OECD Template:
– Concepts and sources
– Data Collection
– Data manipulation by national source
– Data quality
– Data Transmission
– International Standards
– Data Storage and Manipulation by OECD
– Output preparation and delivery by OECD
Contents of Meta-data: The template oriented approach
The IMF Template:
– Coverage, periodicity, timeliness
– Quality of disseminated data
– Integrity of disseminated data
– Access by the public
Contents of Meta-data: The template oriented approach
– Although the OECD approach seems more reliable from a statistical point of view, the IMF template is currently favoured by international organisations (EUROSTAT)
Contents of Meta-data: The warehouse approach
– Integration of the data inside the NSIs in a data warehouse
– Output and dissemination as first step
– Meta-data are oriented towards the needs of the data warehouse
Contents of Meta-data: The warehouse approach
– Projects in this direction in many NSIs
– Best documentation: Australian Office
  – Definitional meta-data
  – Procedural meta-data
  – Operational meta-data
  – Systems meta-data
  – Datasets meta-data
Contents of Meta-data: The process oriented approach
– Combines statistical and IT considerations
– Statistical data are considered not as final products but as the result of a process chain
– More detailed consideration of statistical terminology
Contents of Meta-data: The process oriented approach
– Starting point was the SCB-DOC model (Rosen and Sundgren, 1991)
– A sequence of templates accompanying the statistical production process
– Ongoing activities at Statistics Sweden
– A number of NSIs want to adopt the model
Contents of Meta-data: The process oriented approach
– The IDARESA model
– Object oriented representation based on SCB-DOC, with emphasis on possible semi-automatic processing
Contents of Meta-data: The process oriented approach
– The US Bureau of the Census model (Gillman, Appel et al., running project)
– Statistical system defined as an identifiable process.... to produce one or more deliverables
Contents of Meta-data: Summary
– The process oriented approach seems favourable for a number of reasons
– Two examples: classification servers and data quality
Contents of Meta-data: Summary: Classification server
A classification server should
– Support unified use of terminology inside NSIs or international organisations
– Support harmonisation between (international) standard classifications and locally defined (adapted) classifications
Contents of Meta-data: Summary: Classification server
Requirements for a classification server
– A database supporting easy and user-friendly manipulation of hierarchy trees
– A mapping tool supporting the definition of correspondence tables between classifications
– A management strategy for implementation
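The first two requirements can be illustrated with a minimal sketch. It assumes a classification is stored as a tree of codes and a correspondence table as a set of links between two such trees. The class names, method names and the codes (`A1`, `X10`, ...) are hypothetical and are not taken from any of the servers mentioned in these slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Classification:
    """A classification held as a hierarchy tree: each code points to its parent."""
    name: str
    parent_of: Dict[str, Optional[str]] = field(default_factory=dict)

    def add(self, code: str, parent: Optional[str] = None) -> None:
        # A child may only be attached to a code that already exists.
        if parent is not None and parent not in self.parent_of:
            raise KeyError(f"unknown parent code: {parent}")
        self.parent_of[code] = parent

    def ancestors(self, code: str) -> List[str]:
        """Return the parent codes from `code` up to the root of the tree."""
        chain = []
        current = self.parent_of[code]
        while current is not None:
            chain.append(current)
            current = self.parent_of[current]
        return chain


@dataclass
class CorrespondenceTable:
    """Links codes of a source classification to codes of a target classification."""
    source: Classification
    target: Classification
    links: Dict[str, Set[str]] = field(default_factory=dict)

    def link(self, src_code: str, tgt_code: str) -> None:
        # Both ends of a link must exist in their classification.
        if src_code not in self.source.parent_of:
            raise KeyError(f"unknown source code: {src_code}")
        if tgt_code not in self.target.parent_of:
            raise KeyError(f"unknown target code: {tgt_code}")
        self.links.setdefault(src_code, set()).add(tgt_code)

    def translate(self, src_code: str) -> Set[str]:
        return self.links.get(src_code, set())

    def unmapped(self) -> List[str]:
        """Source codes that have no correspondence yet."""
        return sorted(c for c in self.source.parent_of if c not in self.links)


# Hypothetical local classification and hypothetical standard classification.
local = Classification("LOCAL")
local.add("A")
local.add("A1", parent="A")
local.add("A2", parent="A")

standard = Classification("STANDARD")
standard.add("X")
standard.add("X10", parent="X")

table = CorrespondenceTable(local, standard)
table.link("A1", "X10")
```

A method such as `unmapped()` is one way a mapping tool could show which locally defined codes still lack a counterpart in the standard classification.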
Contents of Meta-data: Summary: Classification server
– Up to now only a few successful implementations of partial solutions
– EUROSTAT (SIMONE-Server)
– New Zealand
Contents of Meta-data: Summary: Data Quality
– Criteria for quality of statistics are well known (relevance, accuracy, timeliness, accessibility, comparability, coherence, completeness)
– The problem
  – Achieve quality in the production process
  – Document quality by appropriate meta-data
Contents of Meta-data: Summary: Data Quality
– Experience shows that documentation quality is rather poor as soon as it is separated from the production process
– Example of an integration project: the SIDI approach by ISTAT
IT Structures for Meta-data
Internet and data warehouse offer new opportunities for
– Meta-data and data repositories
– Meta-data access and exchange
They lead towards a more open policy in data dissemination
IT Structures for Meta-data: Meta-data repositories
Approaches towards repositories
– The thesaurus approach
– The template oriented approach
– The data warehouse oriented approach
IT Structures for Meta-data: Meta-data repositories
Example of a thesaurus oriented approach: EUROSTAT servers for concepts and definitions
– Advantage: available on the Internet
– Problem: navigation is not easy
IT Structures for Meta-data: Meta-data repositories
Contents
– Descriptions (dictionaries)
– Semantic (coverage, standard classifications, coherence of information)
– Administration (responsible persons)
– Selection (keywords, search facilities)
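As an illustration only, the four content groups could be held in one repository record per documented item, with keyword selection as a simple lookup. The field names, the `search` helper and the sample entries below are assumptions made for this sketch, not the structure of any existing repository.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RepositoryEntry:
    name: str
    description: str                  # Descriptions: dictionary text
    coverage: str                     # Semantic: coverage
    classifications: List[str]        # Semantic: standard classifications used
    responsible: str                  # Administration: responsible person
    keywords: Set[str] = field(default_factory=set)  # Selection: keywords


def search(entries: List[RepositoryEntry], keyword: str) -> List[str]:
    """Return the names of all entries tagged with `keyword` (case-insensitive)."""
    wanted = keyword.lower()
    return [e.name for e in entries if wanted in {k.lower() for k in e.keywords}]


# Hypothetical sample entries.
entries = [
    RepositoryEntry(
        name="turnover",
        description="Total invoiced sales of an enterprise",
        coverage="All enterprises in the hypothetical business register",
        classifications=["hypothetical activity classification"],
        responsible="unit A",
        keywords={"Business", "Economy"},
    ),
    RepositoryEntry(
        name="household_size",
        description="Number of persons living in a household",
        coverage="Private households",
        classifications=[],
        responsible="unit B",
        keywords={"Population"},
    ),
]

matches = search(entries, "economy")
```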
IT Structures for Meta-data: Meta-data repositories
Example of the template oriented approach: StatBase
– Supports access to meta-data as well as data and reports
– Meets the requirements of the OECD data template quite well
– No direct connection between data and meta-data
IT Structures for Meta-data: Meta-data repositories
Example of the warehouse oriented approach: StatLine (CBS)
– Based on data access from multidimensional tables (cubes)
– Accompanying meta-information is only in Dutch
– Extraction of special meta-data items is not as easy as in StatBase
IT Structures for Meta-data: Meta-data access and exchange
Ongoing work in access and exchange
– New standards for access and exchange
– Accessing distributed sources
– Combination of information
IT Structures for Meta-data: Meta-data access and exchange
Current trends in standardization
– Traditional standards for data and meta-data exchange like GESMES or CLASET will probably switch to an XML platform
– New standards from the Object Management Group (OMG)
IT Structures for Meta-data: Meta-data access and exchange
Example: MOF (Meta Object Facility)
– Extensible framework for meta-data model definition
– Programming interface for storage and access of meta-data
– Integration facilities across domains
But note: This is a general approach for warehouses, not necessarily tied to statistics
IT Structures for Meta-data: Meta-data access and exchange
Example of accessing and processing distributed sources
ADDSIA: Accessing and processing distributed sources for analysis purposes
– Minimum requirements for standardisation in advance
– Orientation towards statistical problems
Processing Meta-data
Goal: Data and meta-data are processed together
Processing Meta-data
Advantages
– Reduction of documentation effort
– More consistency in meta-data
Requirements
– Software tools supporting this view
– Operational models for meta-data
Processing Meta-data
Up to now there are only prototypes, with emphasis on different aspects of processing
– The planning approach
– The throughput approach
– The transformation approach
Processing Meta-data: The planning approach
– Develop software tools (workbench) for setting up meta-data documentation
– BRIDGE/IMIM: a desktop for planning surveys and statistical production
  – Meta-data generated in the planning phase are managed by the system
  – No data are processed
Processing Meta-data: The planning approach
– Improvement and adaptation of meta-data models for new tasks like quality and use of administrative sources
– SIDI (Statistics Italy)
  – Integration of quality in the statistical production process
  – Standardization of the production process
Processing Meta-data: The throughput approach
– Use as much meta-data as possible from OldMeta-data to obtain NewMeta-data
– CBS (ongoing work)
  – Use BLAISE meta-data as input
  – Produce StatLine meta-data as output
Processing Meta-data: The transformation approach
Define meta-data algorithms for all types of data algorithms
– Throughput meta-data
– Modified meta-data
– New meta-data
– Meta-data summarization
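One possible reading of this idea is that every data operation has a companion rule that derives the output meta-data from the input meta-data. The sketch below pairs a simple aggregation with such a rule. The function name `aggregate`, the meta-data keys and the sample figures are hypothetical, and the mapping of the four categories onto dictionary entries is an interpretation, not the definition used in the projects cited in these slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate(rows: List[dict], meta: Dict, by: str, value: str) -> Tuple[List[dict], Dict]:
    """Sum `value` within groups of `by` and derive the output meta-data alongside."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[by]] += row[value]
    out_rows = [{by: key, value: total} for key, total in sorted(totals.items())]

    out_meta = {
        # Throughput meta-data: copied unchanged from the input.
        "source": meta["source"],
        "unit": meta["unit"],
        # Modified meta-data: only the variables that survive the aggregation.
        "variables": [v for v in meta["variables"] if v in (by, value)],
        # New meta-data: a description of the operation that was applied.
        "derivation": f"sum of {value} by {by}",
        # Summarised meta-data: the input records condensed to a count.
        "input_records": len(rows),
    }
    return out_rows, out_meta


# Hypothetical micro data and its meta-data.
rows = [
    {"region": "North", "sector": "A", "turnover": 10.0},
    {"region": "North", "sector": "B", "turnover": 5.0},
    {"region": "South", "sector": "A", "turnover": 7.0},
]
meta = {
    "source": "hypothetical business survey",
    "unit": "thousand EUR",
    "variables": ["region", "sector", "turnover"],
}

out_rows, out_meta = aggregate(rows, meta, by="region", value="turnover")
```

Because the meta-data are produced by the same call that produces the data, the two cannot drift apart, which is the consistency advantage claimed for processing data and meta-data together.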
Processing Meta-data: The transformation approach
– IDARESA project
  – Meta-data algorithms for elementary data base operations
– ISMIS
  – Identification of added value in meta-data (new meta-data)
  – Pursuit of the production process inside EUROSTAT
Processing Meta-data: The transformation approach
Conclusions
Is there progress in meta-data research and development?
Yes, but it is rather slow, because
– There is a lack of co-ordination in research (probably improved by a forthcoming meta-data working group)
– There is an information gap between meta-data research groups and NSIs
– NSIs seem to prefer their own solutions