SDMX in the Vietnam Ministry of Planning and Investment - A Data Model to Manage Metadata and Data
ETV2 Component 5 – Facilitating better decision-making in the Ministry of Planning and Investment
EUROPEAN TECHNICAL ASSISTANCE PROGRAMME FOR VIETNAM (ETV2)
Ministry of Planning and Investment, Ministry of Finance and Ministry of Science and Technology in partnership with the European Commission
VNM/AIDCO/2002/0589
ETV2 Component 5
Goals
- Assist the Ministry of Planning and Investment (MPI) to improve monitoring, analysis and decision-making capability
- Focus on access to and quality of MPI information
Major data-related issues
- Diverse data from multiple sources
  - from provinces, other ministries, businesses, etc.
  - inconsistent formats, definitions
  - overlapping data in different areas of MPI
- No facilities to share and reuse data
  - poor data and metadata management
  - no central storage or registration of data
We are evaluating an SDMX-based solution
- To hold and manage the metadata
  - mostly structural metadata in SDMX terms
  - but reference metadata will be added
- To link the data to the metadata
- To provide an environment where the data can be managed in the context of the metadata
  - for capture and storage
  - for searching and browsing
  - for retrieval and access
Why SDMX?
- This does not seem to be its home territory
  - data and metadata exchange
  - the data does come from lots of other organisations
  - but this is not the focus of the current work
- MPI has a pool of unmanaged, mission-critical data
  - with very little management of structural metadata
  - and with virtually no reference metadata
  - although plenty of need for it
Why SDMX?
- The SDMX Conceptual Model is essentially about linking metadata to data
- This model provides
  - a framework for sharing and re-using structural metadata
  - a context within which data can be managed and used, via the structural metadata
  - a context within which reference metadata (e.g. quality information) can be introduced and managed
- SDMX also provides basic tools to support use of this Conceptual Model
- It is a considerable challenge to build these features into most other approaches
  - you have to devise and support your own SDMX-like model
The MPI Data
- Regular tables (and other material) are supposed to come regularly from the various source organisations
  - mostly do not come from all in a timely fashion
  - little control over how they come
    - usually , often paper!
    - may come as different versions at different times!
  - quality and version status is an issue
- MPI works in Excel
  - data that comes electronically generally comes in Excel
  - where metadata exists, it exists as Excel templates
  - mostly stored on local machines
  - hard to share within MPI departments, impossible to share across them
- Much of the data has security management requirements
SDMX from the Conceptual Model Perspective
- SDMX is usually described from a data exchange perspective
  - the terminology is a bit abstruse
  - the UML makes developing a detailed understanding a challenge
- I like to look at SDMX as a Conceptual Model
  - and to relate all the SDMX jargon to important concepts in the conceptual model
- I have a few slides to explain the model
  - and to show how we might use it at MPI
  - still work in progress
The major SDMX artefacts
[Diagram: the SDMX artefacts, grouped as Structural Metadata, Data Provisioning Metadata, Reference Metadata, and Data]
- Structural Metadata: contains information used to structure data and metadata. This covers Concept and Category schemes, Code Lists and Hierarchical Classifications, and Structure Definitions for Data and Metadata Sets. The basic structural metadata components are the Item Scheme and the Structure Definition.
- Reference Metadata: non-structural metadata that gives more information about an object to make its interpretation more meaningful. Quality, methodological, and conceptual information are examples of Reference Metadata. SDMX allows Reference Metadata to be published, shared, and reused.
- Concept Scheme, Concept: Concepts are organised into Concept Schemes.
- Code List: codes and their names and descriptions.
- Hierarchical Code Set (Classifications): organises simple code lists into hierarchies.
- Data Structure Definition, Metadata Structure Definition: Structures are built from Concepts and associated Code Lists that define the valid content.
- Data Flow, Metadata Flow: Data and Metadata Flows are linked to a structure that defines the format of the corresponding Data Sets and Metadata Sets. Flows are the heart of SDMX.
- Category Scheme, Category: possibly multiple category schemes to categorise flows and allow searching and indexing of data and metadata flows.
- Data Provider, Provision Agreement: Provision Agreements are agreements by providers to deliver data to a schedule according to a flow structure.
- Data Set, Metadata Set: potentially many data sets from many providers conforming to the structure of a data flow.
- Data: the observed phenomena at specific points, as identified by the values of the concepts comprising the key of the observations.
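To make the relationship between Concepts, Code Lists, and a Data Structure Definition concrete, here is a minimal Python sketch. It is not the SDMX information model or any SDMX library API: the class names, the `validate_key` helper, and the identifiers (`CL_PROVINCE`, `DSD_INVESTMENT`, the province codes) are all assumptions made up for illustration. It only shows the idea that a structure built from concepts and code lists defines the valid content of an observation key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeList:
    """Codes and their names (illustrative, not an SDMX library class)."""
    id: str
    codes: dict  # code -> name


@dataclass
class Dimension:
    """A concept used in the observation key, optionally coded."""
    concept_id: str
    code_list: Optional[CodeList] = None


@dataclass
class DataStructureDefinition:
    """A structure built from concepts and their associated code lists."""
    id: str
    dimensions: list

    def validate_key(self, key: dict) -> list:
        """Return the problems found in an observation key (empty if valid)."""
        problems = []
        known = {dim.concept_id for dim in self.dimensions}
        for dim in self.dimensions:
            if dim.concept_id not in key:
                problems.append(f"missing dimension {dim.concept_id}")
                continue
            value = key[dim.concept_id]
            if dim.code_list is not None and value not in dim.code_list.codes:
                problems.append(f"{value!r} is not in code list {dim.code_list.id}")
        for name in key:
            if name not in known:
                problems.append(f"unexpected dimension {name}")
        return problems


# Hypothetical code list and structure; the codes are invented for the example.
CL_PROVINCE = CodeList("CL_PROVINCE", {"NB": "Ninh Binh", "HN": "Ha Noi"})
DSD_INVESTMENT = DataStructureDefinition(
    "DSD_INVESTMENT",
    [Dimension("PROVINCE", CL_PROVINCE), Dimension("TIME_PERIOD")],
)

ok = DSD_INVESTMENT.validate_key({"PROVINCE": "NB", "TIME_PERIOD": "2008-Q1"})
bad = DSD_INVESTMENT.validate_key({"PROVINCE": "XX"})
print(ok)   # no problems for a key that matches the structure
print(bad)  # one problem per violated constraint
```

The same check could run at data capture time, so that a table arriving from a provider is rejected or flagged before it is stored.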
The SDMX top-level model
This is a description of a data or metadata flow: an abstracted data or metadata set that will potentially occur for many periods and from many providers (e.g. a regular table received by MPI from various sources)
This is an instance data or metadata set from a particular provider at a particular time, e.g. a particular table from Ninh Binh province, for a particular period
Provision Agreements indicate which Providers will provide which subset, when, how often, and how
This identifies the Data Providers, giving indicative and contact information and linking to Provision Agreements and actual data and metadata sets
This describes the structure of the data or metadata flow: all the metadata needed to request and understand an instance of the flow (an actual data or metadata set). Links to all other structural metadata.
This categorises all the defined data and metadata flows, providing a structuring framework and a basis for searching. Links to other structural metadata.
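The top-level model can be sketched as a handful of linked records. The Python below is only an illustration under stated assumptions: the class names follow the SDMX terms used on these slides, but the fields, the `Registry` class, and all identifiers (`NB`, `DF_INVESTMENT`, the periods) are hypothetical. It shows a flow as the abstract description, a data set as one instance from one provider for one period, and a provision agreement as the link that says a provider is expected to deliver to a flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataflow:
    """Abstract description of a data set that recurs over periods and providers."""
    id: str
    structure_id: str       # the structure definition that gives its format
    categories: tuple = ()  # categories used for searching and indexing


@dataclass(frozen=True)
class DataProvider:
    id: str
    name: str
    contact: str = ""


@dataclass(frozen=True)
class ProvisionAgreement:
    """A provider agrees to deliver data for a flow at some frequency."""
    provider_id: str
    dataflow_id: str
    frequency: str


@dataclass(frozen=True)
class DataSet:
    """One instance of a flow: a provider, a period, and a version."""
    provider_id: str
    dataflow_id: str
    period: str
    version: int = 1


class Registry:
    """Hypothetical in-memory registry linking data sets to agreements."""

    def __init__(self):
        self.agreements = set()
        self.datasets = []

    def add_agreement(self, agreement):
        self.agreements.add((agreement.provider_id, agreement.dataflow_id))

    def register(self, dataset):
        # Only accept data sets that a provision agreement covers.
        if (dataset.provider_id, dataset.dataflow_id) not in self.agreements:
            raise ValueError("no provision agreement covers this data set")
        self.datasets.append(dataset)

    def latest(self, provider_id, dataflow_id, period):
        """Return the highest registered version, or None if nothing is registered."""
        matches = [
            d for d in self.datasets
            if (d.provider_id, d.dataflow_id, d.period) == (provider_id, dataflow_id, period)
        ]
        return max(matches, key=lambda d: d.version, default=None)


registry = Registry()
registry.add_agreement(ProvisionAgreement("NB", "DF_INVESTMENT", "quarterly"))
registry.register(DataSet("NB", "DF_INVESTMENT", "2008-Q1", version=1))
registry.register(DataSet("NB", "DF_INVESTMENT", "2008-Q1", version=2))
current = registry.latest("NB", "DF_INVESTMENT", "2008-Q1")
```

Tracking a version on each registered data set is one possible way to handle tables that arrive as different versions at different times.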
SDMX at MPI
What we envisage is an SDMX Registry/Repository
- code sets and classifications
  - environment for standardising and harmonising
- Data Structure Definitions for all the regular data sets received by MPI
  - the Data Flows
- Categorisation schemes to index the Data Flows
- a data storage environment to hold the Data Sets
  - initially probably a simple file store
  - possibly a database store
  - possibly a star-schema store
    - with star schema design generated automatically from structural metadata
    - provides options for different cuts through data flows
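As a rough illustration of generating a star schema from structural metadata, the sketch below derives one dimension table per coded dimension and one fact table per flow. It is an assumption about how such a generator might look, not the design being evaluated: the function name `star_schema_ddl`, the table naming convention, and the example dimensions are all hypothetical, and SQLite is used only so the generated DDL can be checked by running it.

```python
import sqlite3


def star_schema_ddl(flow_id, coded_dimensions, measure="obs_value"):
    """Build CREATE TABLE statements for a simple star schema.

    Identifiers are interpolated directly into the SQL, so this assumes
    they come from trusted structural metadata, not from user input.
    """
    statements = []
    # One dimension table per coded dimension, holding the code list.
    for dim in coded_dimensions:
        statements.append(
            f"CREATE TABLE dim_{dim} (code TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )
    # One fact table keyed by all dimensions, holding the observation value.
    key_columns = ", ".join(
        f"{dim} TEXT NOT NULL REFERENCES dim_{dim}(code)" for dim in coded_dimensions
    )
    primary_key = ", ".join(coded_dimensions)
    statements.append(
        f"CREATE TABLE fact_{flow_id} ({key_columns}, {measure} REAL, "
        f"PRIMARY KEY ({primary_key}))"
    )
    return statements


# Hypothetical flow with three dimensions.
ddl = star_schema_ddl("investment", ["province", "sector", "period"])

conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)

tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
print(tables)
```

With the facts keyed by every dimension, a different cut through a data flow becomes a query that groups by a subset of the dimension columns.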
SDMX at MPI
What we envisage (cont)
- intelligent interfaces to Excel
  - using the structural metadata to support browsing and retrieval of data
  - automatic generation of Excel templates
- data capture, registration, and management
- structural metadata browsing and management
- reference metadata definition and management
- reference metadata attachment
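Template generation from structural metadata could look roughly like the sketch below. It lays out an empty capture grid with one code list down the rows and another across the columns. This is an illustration only: it writes CSV (which Excel can open) with the Python standard library as a stand-in for a real Excel workbook, and the function names and code lists are hypothetical.

```python
import csv
import io


def template_rows(row_codes, column_codes, corner_label=""):
    """Build an empty capture grid: row names down, column names across."""
    header = [corner_label] + list(column_codes.values())
    rows = [header]
    for name in row_codes.values():
        # One blank cell per column, to be filled in by the data provider.
        rows.append([name] + [""] * len(column_codes))
    return rows


def write_template(rows):
    """Serialise the grid as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical code lists (code -> name).
provinces = {"NB": "Ninh Binh", "HN": "Ha Noi"}
periods = {"2008-Q1": "2008 Q1", "2008-Q2": "2008 Q2"}

rows = template_rows(provinces, periods, corner_label="Province")
text = write_template(rows)
print(text)
```

Because the grid is derived from the code lists, a change to a code list changes every template generated afterwards, which is the point of driving templates from shared structural metadata rather than from files on local machines.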
More Information In the workshop sessions