ESTP course on Statistical Metadata – Introductory course
Statistics Netherlands, The Hague, 18-19 February 2013
Implementation of the Data Service Centre at Statistics Netherlands
Harry Goossens, Programme Manager DSC
Agenda
- Why, What, How?
- The CBS Metadata model
- The CBS Business Architecture
- Steady States
- Implementation
- Daily practice
- Demo
Data Service Centre: What is it?
- Fundamental cornerstone of the CBS Business Architecture
- Central ‘vault’ of steady states, linking:
  - statistical data (facts & figures)
  - conceptual metadata (description)
  - technical metadata (user’s guide)
  - documentation
- Implementation of the Dutch metadata model
DSC: The concept
- No data without metadata
- Based on a dedicated metadata model
- Strict distinction between the data that are actually processed and the metadata that describe their definitions, their quality and the process activities
- Steady states are explicitly designed for re-use
- The metadata of steady states are generally accessible and are standardised as much as possible
DSC: What does it offer?
Generic services:
- Catalogue: searching & finding
- Metadata management
- Centralised data distribution
- Authorisation management
- Automatic process interfacing
- Archiving of statistical datasets
- Version management
DSC: Conceptual metadata
Metadata that describe the data in a generic, non-specific way, in all phases of the statistical process:
Input
- description of received data
- terminology of the supplier (internal and external)
Processing
- description of data produced in the various statistical processes
- internal (and international) standards / guidelines
Output
- description of publishable output data
- definition of (sub)populations, output variables, object types, content description
DSC Metamodel (simplified)
[Diagram. Objects: Variable, Context Variable, Data Design, Technical Metadata File (XML), Datasets (ASCII CSV, fixed format), Codelists (XML), Documentation (Word, PDF, …). Relationships are annotated 1 : 1 and 1 : n.]
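The object types of the simplified metamodel can be sketched as Python dataclasses. This is a minimal illustration, not the Documentum implementation: all field names and the `designs_using` helper are hypothetical, and so is the reading of a context variable as "a variable as used within one data design". Only the object types themselves and the 1 : n link between a data design and its datasets come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Codelist:
    name: str
    codes: Dict[str, str]  # code -> label (stored as XML in the DSC)


@dataclass
class Variable:
    # Generic, reusable definition (conceptual metadata).
    name: str
    definition: str
    codelist: Optional[Codelist] = None


@dataclass
class ContextVariable:
    # Assumption: a variable as it is used within one specific data design.
    variable: Variable
    column_name: str


@dataclass
class Dataset:
    period: str
    path: str  # hypothetical location of an ASCII CSV or fixed-format file


@dataclass
class DataDesign:
    name: str
    technical_metadata_file: str  # hypothetical path to the TMF (XML)
    context_variables: List[ContextVariable] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)  # 1 design : n datasets
    documentation: List[str] = field(default_factory=list)  # Word, PDF, ...


def designs_using(variable: Variable, designs: List[DataDesign]) -> List[str]:
    """Return the names of all data designs that re-use the given variable."""
    return [
        d.name
        for d in designs
        if any(cv.variable is variable for cv in d.context_variables)
    ]
```

Because `Variable` objects are shared between designs rather than copied, one definition can serve many designs, which is the kind of re-use the DSC concept aims for.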
CBS Business Architecture (simplified)
[Diagram. Elements: Mission, Policy and Strategic Objectives; Design Process; Design Data; External users with information needs; External suppliers of data; Collect Data, Process Data, Disseminate Data; DSC: metadata catalogue, metadata management; DSC data storage: data (steady states).]
What are steady states?
A steady state is a dataset together with the information needed for its correct interpretation.
- Rectangular:
  - rows represent units (micro) or classes of units (macro)
  - columns represent variables
- Heading: population, time
- Dataset design (time varies per dataset): defined in the design phase
- A dataset design is like the template of a table: only borders and heading
- 1 dataset design, n datasets
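The relation "1 dataset design, n datasets" and the rectangularity rule can be illustrated with a small sketch. The class names, the `is_rectangular` check and the example values are hypothetical; the sketch only assumes, as the slide says, that a design fixes the heading (population) and the columns, while each dataset fills that template for one period.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetDesign:
    # The "template of a table": heading and borders only, no data.
    population: str
    columns: List[str]  # one column per variable


@dataclass
class SteadyState:
    design: DatasetDesign
    period: str  # time varies from dataset to dataset
    rows: List[Dict[str, object]] = field(default_factory=list)

    def is_rectangular(self) -> bool:
        """True if every row (a unit or class of units) has exactly the design's columns."""
        expected = set(self.design.columns)
        return all(set(row) == expected for row in self.rows)


# Hypothetical example: one design, filled for two periods.
persons = DatasetDesign(population="Persons (example)", columns=["id", "sex", "age"])
ss_2011 = SteadyState(persons, "2011", [{"id": 1, "sex": "F", "age": 34}])
ss_2012 = SteadyState(persons, "2012", [{"id": 1, "sex": "F", "age": 35},
                                        {"id": 2, "sex": "M"}])  # 'age' missing
```

Here `ss_2011.is_rectangular()` holds, while `ss_2012` fails the check because its second row lacks the `age` column.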
Why steady states?
- Reduce storage: store once, re-use many times
- Secure the statistical process: each steady state is a guaranteed fall-back point
- Improve consistency: every following process uses the same dataset (‘single point of truth’ principle)
- Improve flexibility: enables independent, generic process design
Implementation
- Micro model simplified for practical use:
  - skip / combine objects
  - reduce attributes
- Why Documentum?
  - completely object-oriented, which makes it possible to implement the DSC metamodel
  - largely configurable (user interface, authorisation, etc.)
  - high flexibility
  - proven technology
  - TaskSpace
DSC system - Functionalities
- Tailor-made user interface (TaskSpace)
- Maintenance of meta objects
  - constraints mostly in the interface, not in the DB/repository
- Specific, flexible search engine
  - various entry points, easy to extend
- Import & export of datasets
  - modified according to the model
- Interface for bulk import of metadata
  - based on a standard XML schema
  - conceptual and technical metadata
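The slides do not show the XML schema used for bulk import, so the following is only a sketch of the idea with a hypothetical layout (`<variables>`, `<variable name=…>`, `<definition>`, `<codelist>`). It uses Python's standard `xml.etree.ElementTree` to turn such a file into one record per variable and rejects variables without a name or definition before they would reach the repository.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical layout; the actual DSC XML schema is not shown in the slides.
SAMPLE = """
<variables>
  <variable name="Sex">
    <definition>Sex of the person</definition>
    <codelist>SexCodes</codelist>
  </variable>
  <variable name="Age">
    <definition>Age in completed years</definition>
  </variable>
</variables>
"""


def parse_variables(xml_text: str) -> List[Dict[str, str]]:
    """Parse a (hypothetical) bulk-import file into one dict per variable.

    Raises ValueError when a variable lacks a name or a definition.
    """
    root = ET.fromstring(xml_text.strip())
    result = []
    for var in root.findall("variable"):
        name = var.get("name")
        definition = var.findtext("definition")  # None if the element is absent
        if not name or not definition:
            raise ValueError("variable element without name or definition")
        result.append({
            "name": name,
            "definition": definition.strip(),
            "codelist": (var.findtext("codelist") or "").strip(),
        })
    return result


variables = parse_variables(SAMPLE)
```

Validating while parsing keeps incomplete metadata out of the repository, in line with the principle that constraints are enforced mostly in the interface rather than in the repository.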
Daily practice - Challenges
- Quality of the available metadata often poor
  - great variety: each statistic has its own way of describing
  - often tool-based (SPSS), more technical than logical
  - definition = question from the survey
  - minimal correspondence with the DSC model
- No real urgency
  - although rated IMPORTANT, low priority
  - no clear ownership, responsibility not felt
  - extra work without direct gain (burden)
Daily practice - Road map
- Explain the concept & metadata model
  - requirements, guidelines
- Stocktaking
  - what metadata is available?
  - how extensive or poor?
  - what quality, how up to date?
  - re-usability
- Mapping onto the model
  - (re)design data designs
  - matching attributes
Daily practice - Opportunities
- Look for added value
  - problems? wishes?
  - (long-)wanted improvements?
  - re-usability
- Define a pilot
  - quick hands-on experience, short cycle
  - good estimate of time & resources
  - ‘proof of the design pudding’
Daily practice – At work (1)
- Excel template, Nesstar Publisher
- Porch / Ballot
  - visual check against the guidelines
  - automated check on completeness, inconsistencies, relations VARIABLE – CONTEXT VARIABLE, etc.
  - advice on corrections/improvements, to be made by the owner (statistics)!
- Define and set authorisations
  - groups for import, export, metadata maintenance
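An automated check of the kind listed above might look like the sketch below. The three rules (non-empty definitions, no duplicate columns in a design, every context variable referring to an existing variable) and the input shapes are assumptions chosen for illustration; the slides do not specify the actual checks. The function only reports findings, since corrections are left to the owner of the statistic.

```python
from typing import Dict, List, Tuple


def check_metadata(
    variables: Dict[str, str],
    context_variables: List[Tuple[str, str]],
) -> List[str]:
    """Return findings for the owner of the statistic (hypothetical rules).

    variables:          variable name -> definition
    context_variables:  (column name in the data design, referenced variable name)
    """
    findings = []

    # Completeness: every variable needs a non-empty definition.
    for name, definition in variables.items():
        if not definition.strip():
            findings.append(f"VARIABLE {name}: definition missing")

    seen = set()
    for column, var_name in context_variables:
        # Inconsistency: a column may appear only once in a data design.
        if column in seen:
            findings.append(f"CONTEXT VARIABLE {column}: duplicate column")
        seen.add(column)
        # Relation VARIABLE - CONTEXT VARIABLE: the referenced variable must exist.
        if var_name not in variables:
            findings.append(
                f"CONTEXT VARIABLE {column}: refers to unknown VARIABLE {var_name}"
            )

    return findings
```

For example, a design with a repeated column and a column pointing at an undefined variable yields one finding for each problem, in the order they are encountered.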
Daily practice – At work (2)
Metadata in the DSC system:
- Define the data design
- Import the TMF (technical metadata file, XML)
- Bulk-import variables (XML)
- Import dataset(s)
- Search
- Export data & metadata
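The order of these steps matters: a dataset can only be imported once its data design exists. The sketch below illustrates that dependency with a hypothetical in-memory class (`DSCRepository` and its methods are invented for illustration and are not the DSC or Documentum API). TMF and bulk variable import are left out for brevity, and the export simply writes a CSV whose header row is taken from the design.

```python
import csv
import io
from typing import Dict, List, Tuple


class DSCRepository:
    """Hypothetical in-memory stand-in for the DSC system."""

    def __init__(self) -> None:
        self.designs: Dict[str, List[str]] = {}                  # design -> columns
        self.datasets: Dict[Tuple[str, str], List[dict]] = {}    # (design, period) -> rows

    def define_design(self, name: str, columns: List[str]) -> None:
        self.designs[name] = list(columns)

    def import_dataset(self, design: str, period: str, rows: List[dict]) -> None:
        # A dataset can only be imported into a design that already exists.
        if design not in self.designs:
            raise KeyError(f"unknown data design: {design}")
        self.datasets[(design, period)] = rows

    def search(self, term: str) -> List[str]:
        # Designs whose name or column names contain the term (case-insensitive).
        t = term.lower()
        return [n for n, cols in self.designs.items()
                if t in n.lower() or any(t in c.lower() for c in cols)]

    def export_csv(self, design: str, period: str) -> str:
        # Data plus a header row taken from the design.
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=self.designs[design], lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.datasets[(design, period)])
        return buf.getvalue()


repo = DSCRepository()
repo.define_design("Persons", ["id", "sex", "age"])
repo.import_dataset("Persons", "2012", [{"id": 1, "sex": "F", "age": 35}])
```

With these calls, `repo.search("age")` finds the `Persons` design and `repo.export_csv("Persons", "2012")` returns the header line followed by one data row.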