Data Service Centrum: Implementation of the Data Service Centre at Statistics Netherlands. Harry Goossens, Programme Manager DSC
Agenda: Why, What, How? The CBS Metadata model; the CBS Business Architecture; Steady States; Implementation; Daily practice; Demo
Data Service Centre: What is it? Fundamental cornerstone of the CBS Business Architecture. Central ‘vault’ with Steady States, linking: statistical data (facts & figures); conceptual metadata (description); technical metadata (user’s guide); documentation. Implementation of the Dutch metadata model.
DSC: The concept. No data without metadata. Based upon a dedicated metadata model. Strict distinction between the data that are actually processed and the metadata that describe the definitions, the quality and the process activities. Steady states are explicitly designed for re-use. The metadata (of steady states) are generally accessible and are standardised as much as possible.
DSC: What does it offer? Generic services: catalogue (searching & finding); metadata management; centralised data distribution; authorisation management; automatic process interfacing; archiving of statistical datasets; version management.
DSC: Conceptual metadata. Metadata that describe the data in a generic, non-specific way, in all the various phases of the statistical process. Input: description of received data; terminology of the supplier (internal and external). Processing: description of data produced from various statistical processes; internal (and international) standards/guidelines. Output: description of publishable output data; definition of (sub)populations, output variables, object types; content description.
CBS Metadata model - micro
CBS Metadata model - macro
DSC Metamodel - simplified (diagram). Objects: Variable; Context variable; Dataset design; Technical Metafile (XML); Documentation (Word, PDF, …); Datasets (ASCII: CSV, Fixed). Cardinalities shown on the relations: 1 : n, 1 : 1, 1 : n.
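The simplified metamodel can be pictured as a handful of linked object types. The sketch below is only an illustration, not the DSC implementation. The class and field names are hypothetical, the sample values are invented, and the assignment of the 1 : n and 1 : 1 cardinalities to particular relations is an assumption, because the slide's diagram does not survive in this text. Only the relation of one dataset design to n datasets is stated explicitly in the Steady States slides.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Variabele: a generic concept, described once."""
    name: str
    definition: str


@dataclass
class ContextVariable:
    """Context Variabele: a Variable as used in one dataset design (assumed reading)."""
    variable: Variable
    column_name: str


@dataclass
class Dataset:
    """A stored data file (the slide lists ASCII: CSV, Fixed)."""
    period: str
    path: str


@dataclass
class DatasetDesign:
    """Dataontwerp: the template shared by all datasets of one kind."""
    name: str
    technical_metafile: str  # assumed 1 : 1 link to the XML metafile
    context_variables: list[ContextVariable] = field(default_factory=list)
    documentation: list[str] = field(default_factory=list)  # Word, PDF files
    datasets: list[Dataset] = field(default_factory=list)   # 1 design, n datasets


# Invented sample data, for illustration only.
turnover = Variable("turnover", "Net sales excluding VAT")
design = DatasetDesign("retail_monthly", "retail_monthly.xml")
design.context_variables.append(ContextVariable(turnover, "TURNOVER"))
design.datasets.append(Dataset("2010-01", "retail_2010_01.csv"))
design.datasets.append(Dataset("2010-02", "retail_2010_02.csv"))
print(len(design.datasets), design.context_variables[0].variable.name)
```

Keeping Variable separate from ContextVariable mirrors the deck's point that a definition is described once and then re-used wherever it appears.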
CBS Business Architecture (diagram): Strategy; Design; DSC – Metadata Catalogue; Chain management; Statistics Production; Steady States; DSC - Data Storage
Steady States
What are steady states? A steady state is a dataset together with information for its correct interpretation. Rectangular: rows represent units (micro) or classes of units (macro); columns represent variables. Heading: population, time. Dataset design (vary time): in design phase. A dataset design is like a template of a table: only borders and heading. 1 Dataset design, n Datasets.
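To make the template idea concrete, here is a minimal sketch of checking that a delivered dataset is rectangular and matches its dataset design. It assumes, purely for illustration, that a design can be reduced to an ordered list of column names and that the dataset arrives as CSV text with a header row. The function name and the sample data are hypothetical.

```python
import csv
import io


def check_against_design(design_columns, csv_text):
    """Return a list of problems; an empty list means the dataset fits the design."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["dataset is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    # The heading must match the template exactly.
    if header != design_columns:
        problems.append(f"header {header} does not match design {design_columns}")
    # Every row (unit or class of units) must have one value per variable.
    for number, row in enumerate(body, start=1):
        if len(row) != len(design_columns):
            problems.append(
                f"row {number} has {len(row)} values, expected {len(design_columns)}"
            )
    return problems


# Invented sample data, for illustration only.
design_columns = ["unit_id", "turnover"]
good = "unit_id,turnover\n1,100\n2,250\n"
bad = "unit_id,turnover\n1,100\n2\n"
print(check_against_design(design_columns, good))
print(check_against_design(design_columns, bad))
```

One design can validate any number of datasets this way, which is the "1 Dataset design, n Datasets" relation in practice.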
Why steady states? Reduce storage: store once, re-use many times. Secure the statistical process: each steady state is a guaranteed fall-back point. Improve consistency: every following process uses the same dataset. Improve flexibility: enables independent, generic process design.
Implementation. Micro model simplified for practical use: skip / combine objects; reduce attributes. Why Documentum? Completely object-oriented, which makes it possible to implement the DSC metamodel; largely configurable (user interface, authorisation, etc.); great flexibility; proven technology; TaskSpace.
DSC system - Functionalities. Tailor-made user interface (TaskSpace): maintenance of meta objects; constraints mostly in interface, not in DB/repository. Specific flexible search engine: various entrances, easy to extend. Import & export of datasets: modified according to model. Interface for bulk import of metadata: based on standard XML schema; conceptual and technical meta.
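As a rough illustration of what a bulk import of metadata from XML involves, the sketch below reads variable descriptions from an XML document using Python's standard library. The element and attribute names (`metadata`, `variable`, `name`, `definition`) are invented for this example and the sample values are made up. The actual DSC XML schema is not shown in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; NOT the real DSC schema.
SAMPLE = """<metadata>
  <variable name="turnover">
    <definition>Net sales excluding VAT</definition>
  </variable>
  <variable name="employees">
    <definition>Persons employed at end of period</definition>
  </variable>
</metadata>"""


def parse_variables(xml_text):
    """Turn each <variable> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    result = []
    for element in root.findall("variable"):
        result.append({
            "name": element.get("name"),
            "definition": (element.findtext("definition") or "").strip(),
        })
    return result


imported = parse_variables(SAMPLE)
print(imported)
```

A real import would normally also validate the file against the agreed schema before loading anything into the repository.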
Daily practice - Challenges. Available metadata: quality often poor; great variety, each statistic has its own way of describing; often tool-based (SPSS), more technical than logical; definition = question from survey; minimal mapping with DSC model. No real urge: although rated IMPORTANT, low priority; no clear ownership, responsibility not felt; extra work without direct gain (burden).
Daily practice - Road map. Explaining the concept & metadata model: requirements, guidelines. Stocktaking: What meta is available? How extensive or poor? What quality, how current? Re-usability. Mapping on the model: (re)design data designs; matching attributes.
Daily practice - Opportunities. Look for added value: Problems? Wishes? (Long-)wanted improvements? Re-usability. Define pilot: quick hands-on experience, short cycle; good estimation of time & resources; ‘Proof of the design pudding’.
Daily practice – At work. Excel template. Porch / Ballot: visual check on guidelines; automated check on completeness, inconsistencies, relations VARIABLE – CONTEXT VARIABLE. Corrections. Bulk import: TMF, Meta XML.
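The automated check mentioned above could look roughly like the following sketch. It assumes the Excel template has already been read into lists of dictionaries. The attribute names, the list of required attributes and the sample rows are hypothetical, chosen only to show a completeness check, a duplicate check and a check of the VARIABLE – CONTEXT VARIABLE relation.

```python
def check_metadata(variables, context_variables, required=("name", "definition")):
    """Return a list of problems found in the delivered metadata."""
    problems = []
    names = set()
    for var in variables:
        name = var.get("name")
        # Completeness: every required attribute must be filled in.
        for attr in required:
            if not var.get(attr):
                problems.append(f"variable {name!r}: missing {attr}")
        # Inconsistency: the same variable must not be defined twice.
        if name and name in names:
            problems.append(f"variable {name!r}: defined more than once")
        if name:
            names.add(name)
    # Relation: each context variable must point to a known variable.
    for ctx in context_variables:
        if ctx.get("variable") not in names:
            problems.append(
                f"context variable {ctx.get('name')!r}: "
                f"unknown variable {ctx.get('variable')!r}"
            )
    return problems


# Invented sample rows, for illustration only.
variables = [
    {"name": "turnover", "definition": "Net sales excluding VAT"},
    {"name": "employees", "definition": ""},
]
context_variables = [
    {"name": "TURNOVER_2010", "variable": "turnover"},
    {"name": "PROFIT_2010", "variable": "profit"},
]
for problem in check_metadata(variables, context_variables):
    print(problem)
```

Returning a list of problems rather than stopping at the first one suits the Corrections step, since a supplier can fix everything in one round before the bulk import.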
Screenshot
Screenshot: Metadata attributes of a Data design
Screenshot: and this is how we store the statistical datasets