SDMX training Francesco Rizzo June 2018

1 SDMX training Francesco Rizzo rizzo@istat.it 19-22 June 2018
SDMX – GSBPM mapping
SDMX – GSIM mapping

2 Mapping GSBPM - SDMX
Overarching processes: Quality management / Metadata management / Data management / Process data management / etc.

3 (1) Specify needs
All these sub-processes are covered by SDMX:
(1.1) Identify needs
(1.2) Consult and confirm needs
(1.3) Establish output objectives
(1.4) Identify concepts
(1.5) Check data availability
(1.6) Prepare business case

4 (2) Design
This phase describes the development and design activities, and any associated practical research work needed to define the statistical outputs, concepts, methodologies, collection instruments and operational processes. It specifies all relevant metadata, ready for use later in the statistical business process, as well as quality assurance procedures.

(2.1) Design output
This sub-process contains the detailed design of the statistical outputs, products and services to be produced, including the related development work and preparation of the systems and tools used in the "Disseminate" phase. With SDMX this covers:
- SDMX web services (APIs) for machine-to-machine (M2M) dissemination
- Graphical user interfaces acting as clients of the SDMX APIs, for both data and metadata

(2.3) Design collection
This sub-process determines the most appropriate collection method(s) and instrument(s). With SDMX this covers:
- Defining how to collect data (push vs pull)
- Creating Excel matrices
- Developing SDMX DSDs and related artefacts
- Developing guidelines and examples (e.g. SDMX messages)
- Identifying all security issues and designing the related solutions
Tools such as the Eurostat Data Structure Wizard or the Istat Meta Manager can be used effectively here.

5 (4) Collect
This phase collects or gathers all necessary information (data and metadata), using different collection modes (including extractions from statistical, administrative and other non-statistical registers and databases), and loads it into the appropriate environment for further processing. SDMX is used mainly to collect aggregated data. For example, within the European Statistical System, Eurostat has already started many SDMX-based data collections (e.g. National Accounts (NA), Short-Term Statistics (STS)).

(4.2) Set up collection
This sub-process ensures that the people, processes and technology are ready to collect data and metadata, in all modes as designed. On the basis of sub-process (2.3), the following activities should take place:
- Configuring collection systems to request (pull) and receive (push) the data
- Ensuring the security of the data to be collected
- Defining provision agreements

(4.3) Run collection
This sub-process is where the collection is implemented, with the different instruments being used to collect or gather the information, which may include raw micro-data or aggregates produced at the source, as well as any associated metadata. Typical activities:
- Pulling data from data providers' databases or websites (e.g. by querying SDMX web services)
- Accepting data pushed by the data provider
- Performing basic validation of the structure and integrity of the information received

(4.4) Finalize collection
This sub-process includes loading the collected data and metadata into a suitable electronic environment for further processing. It may include manual or automatic data take-on, for example:
- Extracting data from Excel questionnaires (see the Istat Excel2CSV tool)
- Converting the formats of files received from other organisations (see the Eurostat Convert tool)
- Loading data into databases
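In pull mode the collector queries the data provider's SDMX web service. The sketch below only assembles a query URL following the pattern of the SDMX 2.1 RESTful API (`/data/{flowRef}/{key}/{providerRef}`, with dot-separated keys, `+` for alternative codes and an empty position as a wildcard). The endpoint, agency and dataflow identifiers are hypothetical, and actually sending the request is left out.

```python
from urllib.parse import urlencode


def build_data_query(base_url: str, flow_ref: str, key_parts: list[list[str]],
                     provider_ref: str = "all", **params: str) -> str:
    """Build an SDMX 2.1-style REST data query URL for pull-mode collection.

    key_parts holds one list of codes per dimension, in DSD order: several
    codes for one dimension are OR-ed with '+', and an empty list leaves the
    dimension as a wildcard.
    """
    key = ".".join("+".join(codes) for codes in key_parts)
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}/{provider_ref}"
    if params:
        # Optional query parameters such as startPeriod / endPeriod.
        url += "?" + urlencode(params)
    return url


# Hypothetical endpoint and dataflow, used for illustration only.
url = build_data_query(
    "https://sdmx.example.org/rest",
    "EXAMPLE_AGENCY,DF_EXAMPLE,1.0",
    [["A"], ["IT", "FR"], []],  # FREQ=A; REF_AREA=IT or FR; 3rd dimension: any
    startPeriod="2015",
)
print(url)
# https://sdmx.example.org/rest/data/EXAMPLE_AGENCY,DF_EXAMPLE,1.0/A.IT+FR./all?startPeriod=2015
```

The returned string could then be fetched with any HTTP client; keeping URL construction separate makes the query easy to log and to reuse across periods.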

6 (5) Process
(5.2) Classify and code
This sub-process classifies and codes the input data. In data reporting exercises, a mapping between "local" and SDMX concepts and codes is very often required. The Eurostat Mapping Assistant tool can be used effectively in this context.
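A minimal sketch of the local-to-SDMX mapping step, assuming the mapping tables are plain dictionaries (one per column). It does not reproduce the Eurostat Mapping Assistant; all local and target codes are made up for illustration.

```python
def map_codes(records, mappings):
    """Translate local codes into SDMX codes, column by column.

    Returns the successfully mapped records and the set of (column, local code)
    pairs that have no entry in the mapping table, so they can be reviewed.
    """
    mapped, unmapped = [], set()
    for record in records:
        out, complete = dict(record), True
        for column, table in mappings.items():
            local = record[column]
            if local in table:
                out[column] = table[local]
            else:
                unmapped.add((column, local))
                complete = False
        if complete:
            mapped.append(out)
    return mapped, unmapped


# Hypothetical local codes (keys) mapped to illustrative SDMX codes (values).
MAPPINGS = {
    "sex": {"1": "M", "2": "F"},
    "area": {"001": "IT"},
}
records = [
    {"sex": "1", "area": "001", "value": 10},
    {"sex": "2", "area": "001", "value": 12},
    {"sex": "9", "area": "001", "value": 1},
]
mapped, unmapped = map_codes(records, MAPPINGS)
print(mapped)    # [{'sex': 'M', 'area': 'IT', 'value': 10}, {'sex': 'F', 'area': 'IT', 'value': 12}]
print(unmapped)  # {('sex', '9')}
```

Records containing an unmapped code are held back and reported instead of being transmitted with a local code that the receiving DSD would reject.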

7 (6) Analyse
In this phase, statistical outputs are produced, examined in detail and made ready for dissemination. It may not seem obvious that SDMX is relevant to the analysis of aggregates, but it can sometimes be very useful, depending on which tools an NSO uses to perform these steps. Because most systems work well with XML in general, and SDMX-ML is one flavour of XML, SDMX can provide some useful functions as the aggregates are analysed and further processed.

(6.1) Prepare draft outputs
This sub-process is where the data are transformed into statistical outputs. When preparing draft outputs, it may be helpful to look at the data with any of the various SDMX-based visualization tools. Especially if files are passed between several individuals while the draft outputs are prepared, it may be useful to exchange the SDMX-ML file, so that different individuals can use different visualizations of the same data while performing this work.

(6.2) Validate outputs
This sub-process is where statisticians validate the quality of the outputs produced, in accordance with a general quality framework and with expectations. Validating outputs requires more than data visualization, and this is where SDMX-ML can provide solid benefits. Some validation rules exist within the DSD, and these can be checked automatically using free SDMX data and metadata set tools. Others exist within an SDMX Registry, where cross-references, versioning and requests for deletion are validated to ensure the integrity of the structural metadata. What SDMX cannot validate is whether the reported numbers are correct in terms of other values in the data set, that is, whether they are plausible given the numbers reported in preceding periods or in relation to other reported data.
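To illustrate the difference between DSD-based checks and plausibility checks, here is a minimal sketch of structural validation, assuming a simplified DSD represented as a dictionary from dimension id to the codes allowed by its code list. The dimension ids and codes are hypothetical, and real SDMX validation tools check considerably more (attributes, constraints, data types).

```python
# Hypothetical, simplified DSD: each dimension id -> codes allowed by its code list.
DSD = {
    "FREQ": {"A", "Q", "M"},
    "REF_AREA": {"IT", "FR", "DE"},
}
MEASURE = "OBS_VALUE"


def validate_structure(observations, dsd=DSD, measure=MEASURE):
    """Return structural errors: missing or unknown codes, non-numeric measures.

    Plausibility (e.g. comparison with previous periods) is deliberately not
    checked, because it cannot be derived from the DSD.
    """
    errors = []
    for i, obs in enumerate(observations):
        for dim, codes in dsd.items():
            value = obs.get(dim)
            if value is None:
                errors.append(f"obs {i}: missing dimension {dim}")
            elif value not in codes:
                errors.append(f"obs {i}: code {value!r} not in code list for {dim}")
        if not isinstance(obs.get(measure), (int, float)):
            errors.append(f"obs {i}: {measure} is missing or not numeric")
    return errors


observations = [
    {"FREQ": "A", "REF_AREA": "IT", "OBS_VALUE": 1.5},
    {"FREQ": "A", "REF_AREA": "XX", "OBS_VALUE": 2.0},
    {"FREQ": "Q", "REF_AREA": "FR"},
]
for error in validate_structure(observations):
    print(error)
```

An observation of 1.5 followed by 150.0 in the next period would pass this check untouched: detecting that kind of error needs domain-specific plausibility rules outside the DSD.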

8 (7) Dissemination
(7.1) Update output systems
This sub-process manages the update of systems where data and metadata are stored ready for dissemination purposes, including:
- formatting data and metadata ready to be put into output databases
- loading data and metadata into output databases
- ensuring data are linked to the relevant metadata
SDMX can be used as a format for exchanging data between systems, whether internal or external to an organization, which makes it a good format for loading the databases used in all types of dissemination. An SDMX Registry can further automate the reporting of such data through its data registration mechanism: once new data have been "registered", the data collector can simply query the service for them. This eases the burden of data reporting.

(7.2) Produce dissemination products
This sub-process produces the products, as previously designed (in sub-process 2.1), to meet user needs. SDMX can be used as the single XML format from which all other dissemination products are created, at least for providing tabular views of the data. SDMX is also directly useful in two ways: as a format for reporting to data collectors and as a direct download format. The "advanced" use of SDMX, where an SDMX-capable database creates many dissemination products by transforming SDMX into other formats (even on demand, for Web dissemination), can greatly simplify the preparation of dissemination outputs.

(7.4) Promote dissemination products
SDMX is extremely useful here, although perhaps not in an obvious way. This sub-process of the GSBPM is typically seen as the "advertising" of statistical products, and in that sense SDMX is of little use, except that adopting leading-edge standards may offer some opportunities for promotion (presentations at conferences, etc.).
Far more interesting for increasing the visibility and use of data are the SDMX Registry Services, which provide a platform for the automatic discovery of data products. Users have become used to the idea that resources can be "Googled". The SDMX Registry Services are not part of Google itself, but they provide a focused way of searching for all the data produced within a domain, regardless of the site on which the data are published. In essence, the SDMX Registry Services provide an online catalogue listing all the data available within a community.
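As a hedged illustration of producing another format from SDMX, the sketch below flattens a trimmed-down SDMX-ML 2.1 generic data fragment into CSV using only the standard library. It assumes the observation-level dimension is the time period (exported as `TIME_PERIOD`), omits the message header and the message namespace of the `DataSet` element, and uses made-up dimension values.

```python
import csv
import io
import xml.etree.ElementTree as ET

GENERIC_NS = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic"
NS = {"g": GENERIC_NS}

# Simplified fragment: header and message namespace omitted on purpose.
SAMPLE = f"""<DataSet xmlns:g="{GENERIC_NS}">
  <g:Series>
    <g:SeriesKey>
      <g:Value id="FREQ" value="A"/>
      <g:Value id="REF_AREA" value="IT"/>
    </g:SeriesKey>
    <g:Obs><g:ObsDimension value="2016"/><g:ObsValue value="1.2"/></g:Obs>
    <g:Obs><g:ObsDimension value="2017"/><g:ObsValue value="1.5"/></g:Obs>
  </g:Series>
</DataSet>"""


def generic_to_csv(xml_text: str) -> str:
    """Flatten generic-format series into one CSV row per observation."""
    root = ET.fromstring(xml_text)
    rows = []
    for series in root.findall("g:Series", NS):
        # The series key gives the values of all non-time dimensions.
        key = {v.get("id"): v.get("value")
               for v in series.findall("g:SeriesKey/g:Value", NS)}
        for obs in series.findall("g:Obs", NS):
            rows.append({
                **key,
                "TIME_PERIOD": obs.find("g:ObsDimension", NS).get("value"),
                "OBS_VALUE": obs.find("g:ObsValue", NS).get("value"),
            })
    out = io.StringIO()
    if rows:
        writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return out.getvalue()


csv_text = generic_to_csv(SAMPLE)
print(csv_text, end="")
```

The same loop could emit JSON or spreadsheet rows instead; the point is that a single SDMX source can feed several dissemination formats.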

9 Metadata management / Data management
Metadata management
This overarching process deals with the creation, use and archiving of statistical metadata. SDMX provides:
- a central repository (the Registry) full of standardized structural metadata
- artefacts in the SDMX Information Model for reference and quality metadata

Data management
This overarching process deals with general data security, custodianship and ownership, data quality, archiving rules, preservation, retention and disposal. SDMX can serve as a model for the structure of a data warehouse or metadata repository.

10 Mapping GSIM - SDMX

11 Mapping GSIM - SDMX
SDMX: Item
GSIM: Node. A combination of a Category and related attributes. A Node is created as a Category, Code or Classification Item for the purpose of defining the situation in which the Category is being used.

12 Mapping GSIM - SDMX
SDMX: Item Scheme. A scheme of Items, each being a combination of a Category and related attributes.
GSIM: Node Set. A Node Set is a kind of Concept System. Two examples:
1) Sex Categories (Male, Female, Other)
2) Sex Codes (<m, male>, <f, female>, <o, other>)

13 Mapping GSIM - SDMX
SDMX: Concept. Unit of thought created by a unique combination of characteristics.
GSIM: Concept. Unit of thought differentiated by characteristics.

14 Mapping GSIM - SDMX
SDMX: Concept Scheme. Set of concepts that are used in a data structure definition or metadata structure definition. Concept schemes group a set of concepts and provide their definitions and names.
GSIM: Node Set. A Node Set is a kind of Concept System. Two examples:
1) Sex Categories (Male, Female, Other)
2) Sex Codes (<m, male>, <f, female>, <o, other>)
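A minimal sketch of a concept scheme as a maintained, versioned container of concepts with names and definitions. The agency and scheme ids are hypothetical; the URN string follows the SDMX URN pattern for concepts (`...conceptscheme.Concept=AGENCY:SCHEME(VERSION).CONCEPT`), shown here only to illustrate how a concept is identified through its scheme.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    id: str
    name: str
    description: str = ""


@dataclass
class ConceptScheme:
    """A maintained, versioned group of concepts (names and definitions)."""
    agency: str
    id: str
    version: str
    concepts: dict[str, Concept] = field(default_factory=dict)

    def add(self, concept: Concept) -> None:
        # Concept ids must be unique within a scheme.
        if concept.id in self.concepts:
            raise ValueError(f"duplicate concept id: {concept.id}")
        self.concepts[concept.id] = concept

    def urn(self, concept_id: str) -> str:
        # Raises KeyError for concepts that are not in the scheme.
        concept = self.concepts[concept_id]
        return ("urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept="
                f"{self.agency}:{self.id}({self.version}).{concept.id}")


# Hypothetical agency and scheme ids, for illustration only.
scheme = ConceptScheme("EXAMPLE_AGENCY", "CS_EXAMPLE", "1.0")
scheme.add(Concept("FREQ", "Frequency"))
scheme.add(Concept("REF_AREA", "Reference area"))
scheme.add(Concept("OBS_VALUE", "Observation value"))
print(scheme.urn("FREQ"))
# urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=EXAMPLE_AGENCY:CS_EXAMPLE(1.0).FREQ
```

Because the identifier includes agency and version, the same concept id can safely appear in schemes maintained by different organisations.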

15 Mapping GSIM - SDMX
SDMX: Code. Language-independent set of letters, numbers or symbols that represent a concept whose meaning is described in a natural language.
GSIM: Code Item. An element of a Code List. A type of Node particular to a Code List type of Node Set. A Code Item combines the meaning of the included Category with a Code representation.

16 Mapping GSIM - SDMX
SDMX: Code list. Predefined set of terms from which some statistical coded concepts take their values.
GSIM: Code List. A list of Categories where each Category has a predefined Code assigned to it. A kind of Node Set for which the Category contained in each Node has a Code assigned as a Designation. For example:
1 - Male
2 - Female
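The GSIM example above (1 - Male, 2 - Female) can be sketched as a small code list in which each code designates a category. This is an illustrative model only; the code list id is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CodeList:
    """A list of categories, each with a predefined code assigned to it."""
    id: str
    codes: dict[str, str]  # code -> name of the category it designates

    def name_of(self, code: str) -> str:
        return self.codes[code]

    def code_of(self, name: str) -> str:
        for code, label in self.codes.items():
            if label == name:
                return code
        raise KeyError(name)

    def is_valid(self, code: str) -> bool:
        return code in self.codes


# The "1 - Male, 2 - Female" example; the code list id is hypothetical.
cl_sex = CodeList("CL_SEX_EXAMPLE", {"1": "Male", "2": "Female"})
print(cl_sex.name_of("1"), cl_sex.code_of("Female"), cl_sex.is_valid("3"))
# Male 2 False
```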

17 Mapping GSIM - SDMX
SDMX: Data Structure Definition. Set of structural metadata associated with a data set, which includes information about how concepts are associated with the measures, dimensions and attributes of a data cube, along with information about the representation of data and related descriptive metadata.
GSIM: Data Structure. Defines the structure of an organized collection of data (Data Set). The structure is described using Data Structure Components, which can be Attribute Components, Identifier Components or Measure Components.
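A minimal sketch of how a data structure associates concepts with roles (dimension, attribute, measure) and with a representation (a code list or a plain type). The DSD, concept and code list ids are hypothetical and the model is deliberately much simpler than the SDMX Information Model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    concept: str         # id of the concept the component is built on
    role: str            # "dimension", "attribute" or "measure"
    representation: str  # code list id, or a plain type such as "number"


@dataclass(frozen=True)
class DataStructure:
    id: str
    components: tuple[Component, ...]

    def by_role(self, role: str) -> list[str]:
        # Components keep their declaration order (dimension order matters for keys).
        return [c.concept for c in self.components if c.role == role]

    def representation_of(self, concept: str) -> str:
        for c in self.components:
            if c.concept == concept:
                return c.representation
        raise KeyError(concept)


# Hypothetical DSD; ids are illustrative, not taken from a real SDMX registry.
dsd = DataStructure("DSD_EXAMPLE", (
    Component("FREQ", "dimension", "CL_FREQ"),
    Component("REF_AREA", "dimension", "CL_AREA"),
    Component("TIME_PERIOD", "dimension", "time period"),
    Component("OBS_STATUS", "attribute", "CL_OBS_STATUS"),
    Component("OBS_VALUE", "measure", "number"),
))
print(dsd.by_role("dimension"))             # ['FREQ', 'REF_AREA', 'TIME_PERIOD']
print(dsd.representation_of("OBS_STATUS"))  # CL_OBS_STATUS
```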

18 Mapping GSIM - SDMX
SDMX: Metadata Structure Definition. Specification of the allowed content of a metadata set, in terms of the attributes for which content is to be provided and the type of object to which the metadata pertain.
GSIM: Referential Metadata Structure. Defines the structure of an organized collection of referential metadata (Referential Metadata Set). A Referential Metadata Structure defines a structured list of Referential Metadata Attributes for a given Referential Metadata Subject.

19 Mapping GSIM - SDMX
SDMX: Data Set. Organised collection of data defined by a Data Structure Definition (DSD).
GSIM: Data Set. An organized collection of data. Examples of Data Sets could be observation registers, time series, longitudinal data, survey data, rectangular data sets, event-history data, tables, data tables, cubes, registers, hypercubes and matrices. A broader term for Data Set could be data. A narrower term for Data Set could be data element, data record, cell or field.
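Since a data set can be organised, for example, as time series, here is a minimal sketch that groups flat observations into series identified by a dot-separated series key (the non-time dimension values in DSD order, as in SDMX series keys). The dimension ids and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical series dimensions (time excluded), in DSD order.
SERIES_DIMENSIONS = ("FREQ", "REF_AREA")


def to_time_series(observations):
    """Group flat observations into time series keyed by their series key."""
    series = defaultdict(dict)
    for obs in observations:
        key = ".".join(obs[d] for d in SERIES_DIMENSIONS)
        series[key][obs["TIME_PERIOD"]] = obs["OBS_VALUE"]
    return dict(series)


observations = [
    {"FREQ": "A", "REF_AREA": "IT", "TIME_PERIOD": "2016", "OBS_VALUE": 1.2},
    {"FREQ": "A", "REF_AREA": "IT", "TIME_PERIOD": "2017", "OBS_VALUE": 1.5},
    {"FREQ": "A", "REF_AREA": "FR", "TIME_PERIOD": "2017", "OBS_VALUE": 2.0},
]
print(to_time_series(observations))
# {'A.IT': {'2016': 1.2, '2017': 1.5}, 'A.FR': {'2017': 2.0}}
```

The same observations could just as well be arranged as a table or cube; the DSD, not the layout, defines what each value means.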

20 Mapping GSIM - SDMX
SDMX: Metadata Set. Organised collection of reference metadata. It contains one or more reports, each comprising the metadata content (a set of attributes and corresponding content) and the identification of the precise object to which the metadata are to be attached. The metadata can be attached to any SDMX artefact that can be identified (e.g. a structural artefact such as a code, concept or dimension, or a part of a data set such as a partial series key or an observation).
GSIM: Referential Metadata Set. An organized collection of referential metadata for a given Referential Metadata Subject. Referential Metadata Sets organize referential metadata. Each Referential Metadata Set uses a Referential Metadata Structure to define a structured list of Referential Metadata Attributes for a given Referential Metadata Subject.
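A minimal sketch of a metadata set: reports attached to identified targets, with the allowed attributes fixed by a metadata structure. The MSD id, attribute ids and the target notation (`DF_EXAMPLE:A.IT` standing for a partial series key) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataReport:
    target: str               # identifies the object the metadata are attached to
    content: dict[str, str]   # metadata attribute id -> reported content


@dataclass
class MetadataSet:
    structure: str            # id of the metadata structure (MSD) used
    allowed: frozenset[str]   # attributes that the MSD permits
    reports: list[MetadataReport] = field(default_factory=list)

    def add_report(self, report: MetadataReport) -> None:
        # Reject content for attributes the metadata structure does not define.
        unknown = set(report.content) - self.allowed
        if unknown:
            raise ValueError(f"attributes not in {self.structure}: {sorted(unknown)}")
        self.reports.append(report)

    def metadata_for(self, target: str) -> list[dict[str, str]]:
        return [r.content for r in self.reports if r.target == target]


# Hypothetical MSD, attribute ids and targets, for illustration only.
mset = MetadataSet("MSD_EXAMPLE", frozenset({"CONTACT", "METHODOLOGY"}))
mset.add_report(MetadataReport("DF_EXAMPLE", {"CONTACT": "stats@example.org"}))
mset.add_report(MetadataReport("DF_EXAMPLE:A.IT", {"METHODOLOGY": "Break in 2015"}))
print(mset.metadata_for("DF_EXAMPLE:A.IT"))
# [{'METHODOLOGY': 'Break in 2015'}]
```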

21 Mapping GSIM - SDMX
SDMX: Data Provider. Organisation or individual that reports or disseminates data or reference metadata.
GSIM: Information Provider. An Individual or Organization that provides collected information. An Information Provider possesses sets of information (that it has generated, collected, produced, bought or otherwise acquired) and is willing to supply that information (data or referential metadata) to the statistical office. The two parties use a Provision Agreement to agree the Data Structure and Referential Metadata Structure of the data to be exchanged via an Exchange Channel.

22 Mapping GSIM - SDMX
SDMX: Provision Agreement. Arrangement within which the information provider supplies data or metadata.
GSIM: Provision Agreement. The legal or other basis by which two parties agree to exchange data. A Provision Agreement between the statistical organization and the Information Provider (collection) or the Information Consumer (dissemination) governs the use of Exchange Channels. The Provision Agreement, which may be explicitly or implicitly agreed, provides the legal or other basis by which the two parties agree to exchange data. The parties also use the Provision Agreement to agree the Data Structure and Referential Metadata Structure of the information to be exchanged.
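A minimal sketch of how a collector might use provision agreements to accept or reject incoming submissions: a provider may report to a dataflow only if an agreement links the two. Provider, dataflow and agreement ids are hypothetical, and a real registry tracks more (e.g. the structure version and constraints).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvisionAgreement:
    id: str
    provider: str   # data / information provider
    dataflow: str   # dataflow (and hence data structure) covered by the agreement


def is_submission_allowed(provider: str, dataflow: str,
                          agreements: set[ProvisionAgreement]) -> bool:
    """A provider may report to a dataflow only under a matching agreement."""
    return any(pa.provider == provider and pa.dataflow == dataflow
               for pa in agreements)


# Hypothetical providers, dataflows and agreement ids.
AGREEMENTS = {
    ProvisionAgreement("PA_1", "PROVIDER_A", "DF_EXAMPLE"),
    ProvisionAgreement("PA_2", "PROVIDER_B", "DF_OTHER"),
}
print(is_submission_allowed("PROVIDER_A", "DF_EXAMPLE", AGREEMENTS))  # True
print(is_submission_allowed("PROVIDER_B", "DF_EXAMPLE", AGREEMENTS))  # False
```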

23 Mapping GSIM - SDMX
SDMX: Attribute. Statistical concept providing qualitative information about a specific statistical object.
GSIM: Attribute Component. The role given to a Represented Variable in the context of a Data Structure, which supplies information other than identification or measures. For example, the publication status of an observation (e.g. provisional, final, revised).

24 Mapping GSIM - SDMX
SDMX: Measure. Statistical concept for which data are provided in a data set.
GSIM: Measure Component. The role given to a Represented Variable in the context of a Data Structure to hold the observed/derived values for a particular Unit in an organized collection of data. A Measure Component is a sub-type of Data Structure Component. For example, age and height of a person in a Unit Data Set, or number of citizens and number of households in a country in a Data Set for multiple countries (Dimensional Data Set).

25 Mapping GSIM - SDMX
SDMX: Dimension. Statistical concept used, in combination with other statistical concepts, to identify a statistical series or an individual observation in a data set.
GSIM: Identifier Component. The role given to a Represented Variable in the context of a Data Structure to identify the unit in an organized collection of data. An Identifier Component is a sub-type of Data Structure Component. Examples: the personal identification number of a Swedish citizen for unit data, or the name of a country in the European Union for dimensional data.
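Because dimensions (identifier components) are meant to identify each observation, a simple consequence is that no two observations in a data set should share the same combination of dimension values. A minimal sketch of that check, with hypothetical dimension ids:

```python
from collections import Counter

# Hypothetical identifier components (dimensions, including time).
IDENTIFIERS = ("FREQ", "REF_AREA", "TIME_PERIOD")


def duplicate_keys(observations, identifiers=IDENTIFIERS):
    """Return the identifier combinations that occur more than once."""
    counts = Counter(tuple(obs[i] for i in identifiers) for obs in observations)
    return sorted(key for key, n in counts.items() if n > 1)


observations = [
    {"FREQ": "A", "REF_AREA": "IT", "TIME_PERIOD": "2017", "OBS_VALUE": 1.5},
    {"FREQ": "A", "REF_AREA": "FR", "TIME_PERIOD": "2017", "OBS_VALUE": 2.0},
    {"FREQ": "A", "REF_AREA": "IT", "TIME_PERIOD": "2017", "OBS_VALUE": 1.6},
]
print(duplicate_keys(observations))  # [('A', 'IT', '2017')]
```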

