1
Czech Statistical Office
Metadata standards Petr Elias Czech Statistical Office
2
OUTLINE What is a metadata standard? Examples of standards
3
Standardisation in statistics
Reasons: demand for comparable data; statistics survey increasingly complex phenomena; a trend towards reducing the burden on respondents; standardisation makes processes more efficient; increased use of alternative data sources (administrative data). Standardisation needs to be managed internationally => a common strategy and a programme for its implementation (ESS.VIP).
4
What is a metadata standard?
ISO definitions of a standard: A standard is a document established by consensus and adopted by a defined body. The document is commonly and repeatedly reusable; it sets rules and procedures (in the form of guidelines) or parameters for activities or their outputs, aiming at an optimal level of order in a defined sphere. A standard is a document that provides requirements, specifications, guidelines or characteristics that can be used consistently to ensure that materials, products, processes and services are fit for their purpose.
5
The sponsorship on standardisation
Activities focus on preparing for the comprehensive management of the international statistical standardisation process: define a statistical standard; make an inventory of all standards referring to statistics; develop a registry of statistical standards.
6
The Common Metadata Framework
The Common Metadata Framework (CMF) is an initiative of UNECE that aims to assist statistical organizations in the adoption, modelling, usage, and implementation of statistical metadata systems and practices across all phases of their statistical business process. Since the process for statistical surveys is generally the same everywhere, it is possible to build a common business process model for survey work.
7
The Common Metadata Framework
CMF distinguishes 4 types of standards: statistical concepts; technical standards; models and statistical practices; methodological guidelines and recommendations.
9
Closer look at: Neuchâtel model; GSBPM, GSIM, CSPA; INSPIRE; Dublin Core;
RDF (Open & Linked Metadata); DDI; SDMX; SIMS
10
Neuchâtel model
Object types shown in the model diagram: Classification Family, Classification, Classification Version, Classification Variant, Classification Level, Classification Item, Correspondence Table, Correspondence Item, Classification Index, Classification Index Entry, Case Law
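The object types of the Neuchâtel model can be illustrated as a small data structure. The sketch below is only one possible reading: it assumes that a family groups classifications, a classification has versions, a version is structured into levels, and a level holds items that point to their parent item. The class fields, the helper `path_to_root` and the sample codes and titles are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClassificationItem:
    code: str
    title: str
    parent: Optional["ClassificationItem"] = None  # item one level up, if any


@dataclass
class ClassificationLevel:
    number: int
    name: str
    items: List[ClassificationItem] = field(default_factory=list)


@dataclass
class ClassificationVersion:
    name: str
    levels: List[ClassificationLevel] = field(default_factory=list)

    def find(self, code: str) -> Optional[ClassificationItem]:
        """Return the first item with the given code, searching all levels."""
        for level in self.levels:
            for item in level.items:
                if item.code == code:
                    return item
        return None


@dataclass
class Classification:
    name: str
    versions: List[ClassificationVersion] = field(default_factory=list)


@dataclass
class ClassificationFamily:
    name: str
    classifications: List[Classification] = field(default_factory=list)


def path_to_root(item: Optional[ClassificationItem]) -> List[str]:
    """Codes from the given item up to its top-level ancestor."""
    codes = []
    while item is not None:
        codes.append(item.code)
        item = item.parent
    return codes


# Hypothetical sample data, for illustration only.
section = ClassificationItem("A", "Agriculture")
division = ClassificationItem("01", "Crop production", parent=section)
version = ClassificationVersion(
    "Example classification, version 1",
    levels=[
        ClassificationLevel(1, "Section", [section]),
        ClassificationLevel(2, "Division", [division]),
    ],
)
family = ClassificationFamily(
    "Economic activities",
    [Classification("Example classification", [version])],
)

print(path_to_root(version.find("01")))
```

Variants, correspondence tables, indexes and case law are left out to keep the sketch short; they would attach to a version in the same way.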
11
GSBPM, GSIM, CSPA
United Nations Economic Commission for Europe (UNECE) standards: Generic Statistical Business Process Model (GSBPM), which covers processes; Generic Statistical Information Model (GSIM), which covers objects; Common Statistical Production Architecture (CSPA), which covers tools.
12
INSPIRE What is the INSPIRE Directive?
The INSPIRE directive 2007/2/EC came into force on 15 May 2007 and will be implemented in various stages, with full implementation required by 2019. The INSPIRE directive aims to create a European Union (EU) spatial data infrastructure. This will enable the sharing of environmental spatial information among public sector organisations and better facilitate public access to spatial information across Europe. A European Spatial Data Infrastructure will assist in policy-making across boundaries. Therefore the spatial information considered under the directive is extensive and includes a great variety of topical and technical themes.
13
INSPIRE INSPIRE is based on a number of common principles:
Data should be collected only once and kept where it can be maintained most effectively. It should be possible to combine seamless spatial information from different sources across Europe and share it with many users and applications. It should be possible for information collected at one level/scale to be shared with all levels/scales; detailed for thorough investigations, general for strategic purposes. Geographic information needed for good governance at all levels should be readily and transparently available. Easy to find what geographic information is available, how it can be used to meet a particular need, and under which conditions it can be acquired and used.
14
INSPIRE Implementing Rules
To ensure that the spatial data infrastructures of the Member States are compatible and usable in a Community and transboundary context, the Directive requires that common Implementing Rules (IR) are adopted in a number of specific areas (Metadata, Data Specifications, Network Services, Data and Service Sharing, and Monitoring and Reporting). These IRs are adopted as Commission Regulations/Decisions. The Commission is assisted in the process of adopting such rules by a regulatory committee composed of representatives of the Member States and chaired by a representative of the Commission (this is known as the Comitology procedure).
15
INSPIRE Implementing Rules Inspire Metadata Regulation
Commission Regulation (EC) No 1205/2008 of 3 December 2008 + corrigendum The metadata describing a spatial data set, a spatial data set series or a spatial data service shall comprise the metadata elements or groups of metadata elements set out in Part B of the Annex and shall be created and maintained in accordance with the rules set out in Parts C and D thereof.
16
INSPIRE
17
INSPIRE – Code-list registry (on-line)
18
INSPIRE – Geoportal (on-line)
19
DUBLIN CORE The Dublin Core Schema
Is a small set of vocabulary terms that can be used to describe web resources (video, images, web pages, etc.), as well as physical resources such as books or CDs, and objects like artworks. The full set of Dublin Core metadata terms can be found on the Dublin Core Metadata Initiative (DCMI) website. The original set of 15 classic metadata terms, known as the Dublin Core Metadata Element Set, is endorsed in the following standards documents: IETF RFC 5013; ISO Standard; NISO Standard Z39.85.
20
DUBLIN CORE Dublin Core Metadata Element Set – 15 elements:
Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights. Each Dublin Core element is optional and may be repeated. The DCMI has established standard ways to refine elements and encourages the use of encoding and vocabulary schemes. There is no prescribed order in Dublin Core for presenting or using the elements. The Dublin Core became an ISO standard in 2006 and is used as a base-level data element set for the description of learning resources in the ISO/IEC Metadata for learning resources (MLR) – Part 2: Dublin Core elements, prepared by the ISO/IEC JTC1 SC36. Full information on element definitions and term relationships can be found in the Dublin Core Metadata Registry.
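The rules above (15 elements, each optional and repeatable, no prescribed order) can be sketched with the Python standard library. The namespace `http://purl.org/dc/elements/1.1/` is the one used for the 15-element set; the wrapper element `metadata`, the function name `dublin_core_record` and the language and subject values are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set (lower-case names).
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Build an XML record from (element, value) pairs.

    Every element is optional and may occur several times; the order
    of the pairs is simply preserved.
    """
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Title, creator and publisher come from the title slide;
# language and subjects are assumed for the example.
record = dublin_core_record([
    ("title", "Metadata standards"),
    ("creator", "Petr Elias"),
    ("publisher", "Czech Statistical Office"),
    ("language", "en"),
    ("subject", "metadata"),
    ("subject", "statistics"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The repeated `subject` element shows repeatability; leaving out, say, `date` or `rights` is equally valid because no element is mandatory.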
21
DUBLIN CORE The Dublin Core Metadata Initiative (DCMI) Metadata Terms is the current set of the Dublin Core vocabulary. This set includes the fifteen terms of the Dublin Core Metadata Element Set, as well as the qualified terms. Each term has a unique URI in the namespace and all are defined as RDF properties.
22
DUBLIN CORE The Dublin Core Metadata Initiative (DCMI) Metadata Terms is the current set of the Dublin Core vocabulary
abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, available, bibliographicCitation, conformsTo, contributor, coverage, created, creator, date, dateAccepted, dateCopyrighted, dateSubmitted, description, educationLevel, extent, format, hasFormat, hasPart, hasVersion, identifier, instructionalMethod, isFormatOf, isPartOf, isReferencedBy, isReplacedBy, isRequiredBy, issued, isVersionOf, language, license, mediator, medium, modified, provenance, publisher, references, relation, replaces, requires, rights, rightsHolder, source, spatial, subject, tableOfContents, temporal, title, type, valid
23
DUBLIN CORE The Dublin Core Schema
Dublin Core Metadata may be used for multiple purposes, from simple resource description, to combining metadata vocabularies of different metadata standards, to providing interoperability for metadata vocabularies in the Linked Data cloud and Semantic Web implementations. "Dublin" refers to Dublin, Ohio, USA, where the schema originated in 1995.
24
DUBLIN CORE Dublin Core syntax
Dublin Core concepts and semantics are designed to be syntax independent and are equally applicable in a variety of contexts, as long as the metadata is in a form suitable for interpretation both by machines and by human beings. The Dublin Core Abstract Model provides a reference model against which particular Dublin Core encoding guidelines can be compared, independent of any particular encoding syntax. Such a reference model allows implementers to gain a better understanding of the kinds of descriptions they are trying to encode and facilitates the development of better mappings and translations between different syntaxes.
25
RDF A standard developed by the World Wide Web Consortium (W3C)
RDF (Resource Description Framework) is a standard for publishing interlinked data sets. Other W3C standards: (X)HTML – standard for publishing interlinked documents; CSS – standard for cascading style sheets; XML – standard for data exchange; PNG – standard for storing graphics (pictures); etc.
26
RDF Used as a standard format for OPEN DATA
Open Data are data published on the Internet that are: complete; easily accessible; machine-readable; based on standards with a freely accessible specification; published with clearly defined terms of use that impose minimal restrictions; accessible to users at minimal cost.
27
RDF Requirements for the use of OPEN DATA
Users are not restricted in how they use the data. Users may publish the data further. The data must carry a record of their author (even when they are published further). When the data are published further, other users must have the same rights: the use of the data must not be restricted during dissemination.
28
RDF Used as a standard format for LINKED DATA
The main principles of Linked Data make it possible to create an ecosystem of web applications that publish, enrich and use data about entities in one globally shared data environment ("data web").
29
RDF Contemporary Web of documents Built upon simple principles
HTML as the format for publishing documents. URLs as unique global identifiers of documents. HTTP for locating documents and accessing them via their URLs. Hypertext links among documents. HTTP is used by web browsers (IE, Firefox, Chrome...) and search engines (Google, Seznam...).
30
RDF Contemporary Web of documents - drawbacks
Provides lots of information about Prague. However, the information is not suitable for machine processing. Data are published as documents in many places (Municipal office, Czech Statistical Office, Business register, Treasury, Regional information service...). Documents are designed for people, not for machine reading. Documents about Prague and related entities are not interlinked.
31
RDF Linked data – basic principles
Identify data with URIs. Use HTTP URIs so that others may refer to them and so that people and applications can find them. Make the entities available in the standard RDF format, for download or through an RDF data API (SPARQL) on the Web. Link the entities using RDF links to related entities in order to provide their context and enable further navigation.
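The principles above can be illustrated with a few RDF statements written in the N-Triples syntax. This is a minimal sketch that builds the text by hand instead of using an RDF library; the `example.org` URI for Prague and the helper names are hypothetical, and the DBpedia URI only serves as an illustrative link target.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, lang=None):
    """Quote a plain string literal, optionally with a language tag."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def to_ntriples(triples):
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Principles 1 and 2: the entity is identified by an HTTP URI (hypothetical).
prague = "http://example.org/id/municipality/prague"

triples = [
    # Principle 3: a description of the entity published as RDF.
    (uri(prague), uri(RDFS_LABEL), literal("Prague", "en")),
    # Principle 4: an RDF link to a related entity in another data set.
    (uri(prague), uri(OWL_SAME_AS), uri("http://dbpedia.org/resource/Prague")),
]

ntriples_text = to_ntriples(triples)
print(ntriples_text)
```

Because both the subject and the link target are HTTP URIs, an application that retrieves one description can follow the link to the other.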
32
RDF Comparison of formats as to their suitability
Criteria: independence of application; structured records; description of data structure; description of data semantics; creation of links. Each format is rated Yes, Partially or No.
Formats compared: PDF; DOC(X), RTF; TXT; HTML; XLS(X); CSV; JSON; XML; OData; RDF.
33
RDF Use RDF data cubes conforming to the Data Cube Vocabulary (DCV) standard
ETL tools (= Extract, Transform & Load): read data from the internal relational database; transform the data into the form of open data; save the open data in internal storage; make the open data available via an open data interface.
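The extract, transform and load steps can be sketched as follows. This is an illustration under stated assumptions, not the tooling used by any office: the in-memory SQLite table `population`, its made-up rows, and all `example.org` URIs for the data set, dimensions and measure are hypothetical. Only `qb:Observation` and `qb:dataSet` come from the Data Cube Vocabulary.

```python
import sqlite3

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def extract(conn):
    """Read rows from the internal relational database."""
    query = "SELECT region, year, value FROM population ORDER BY region, year"
    return conn.execute(query).fetchall()


def transform(rows):
    """Turn each row into one observation, written as N-Triples lines."""
    lines = []
    for region, year, value in rows:
        obs = f"<{EX}obs/{region}-{year}>"
        lines.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        lines.append(f"{obs} <{QB}dataSet> <{EX}dataset/population> .")
        lines.append(f"{obs} <{EX}dimension/region> <{EX}region/{region}> .")
        lines.append(f'{obs} <{EX}dimension/year> "{year}"^^<{XSD}gYear> .')
        lines.append(f'{obs} <{EX}measure/population> "{value}"^^<{XSD}integer> .')
    return lines


def load(lines, store):
    """Save the open data in internal storage (here: a plain list)."""
    store.extend(lines)


# Made-up source data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (region TEXT, year INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?)",
    [("R1", 2013, 1000), ("R2", 2013, 2000)],
)

store = []
load(transform(extract(conn)), store)
print("\n".join(store))
```

The last step on the slide, making the data available through an open data interface, would serve the content of `store` over HTTP and is left out here.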
34
RDF Open Data Node (ODN)
Open source software tool developed under the EU project COMSODE. Designed for: organisations that want to publish open data; organisations that want to combine existing open data from different sources, clean them, enrich them and make them further available in open form; software developers using open data.
35
DDI The Data Documentation Initiative (DDI) is an effort to create an international standard for describing data from the social, behavioral, and economic sciences. The DDI metadata specification supports the entire research data life cycle. DDI metadata accompanies and enables data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving.
36
DDI
37
DDI DDI Alliance (http://www.ddialliance.org/)
a self-sustaining membership organization that develops and promotes the DDI specification and associated tools, education, and outreach programs.
Examples of member organisations: Australian Bureau of Statistics (ABS); Eurostat; Food and Agriculture Organization (FAO); National Institute of Statistics and Economic Studies (INSEE); Open Data Foundation (Associate Member); Statistics New Zealand; World Bank, Development Data Group (DECDG); and many others such as universities, research institutes and data archives.
DDI-RDF Discovery Vocabulary (Disco): This specification is designed to support the discovery of microdata sets and related metadata using RDF technologies in the Web of Linked Data. The vocabulary leverages the DDI specification to create a simplified version of this model for the discovery of data files. It is based on a subset of the DDI XML formats of DDI Codebook and DDI Lifecycle. It supports identifying programmatically the relevant datasets for a specific research purpose. Existing DDI XML instances can be transformed into this RDF format and therefore exposed in the Web of Linked Data. The reverse process is not intended, as the developers of the RDF discovery vocabulary have defined DDI-RDF components and reused components of other RDF vocabularies which make sense only in the Linked Data field.
PHDD - Physical Data Description: Description of the physical properties of existing or published data (tables) in a rectangular format. The data could be either represented in records with character-separated values (CSV) or in records with fixed length. PHDD could be used standalone or together with related vocabularies like Data Catalog Vocabulary (DCAT) or DDI-RDF Discovery (Disco). Descriptions in PHDD could be added to Web pages which provide tables in rectangular format. This would enable processing of this data by programs. The combined usage of PHDD, DDI-RDF Discovery, and DCAT would support the creation of data repositories which provide metadata for the description of collections, for data discovery, and for processing of the data.
XKOS - Extended Knowledge Organization System: XKOS leverages the Simple Knowledge Organization System (SKOS) for managing statistical classifications and concept management systems, since SKOS is widely used. Linked Open Data (LOD) is used to create Web artifacts that machines can interpret, so publishing machine-readable statistical classifications and other concept management systems as SKOS instances is desired. The XKOS developers found that SKOS was insufficient for the problem. No aspect of SKOS was found to be wrong, just incomplete. Therefore, an extension to SKOS, called XKOS, is proposed. XKOS extends SKOS for the needs of statistical classifications. It does so in two main directions. First, it defines a number of terms that enable the representation of statistical classifications with their structure and textual properties, as well as the relations between classifications. Second, it refines SKOS semantic properties to allow the use of more specific relations between concepts. Those specific relations can be used for the representation of classifications or for any other case where SKOS is employed. XKOS adds the extensions that are desirable to meet the requirements of the statistical community.
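To make the SKOS starting point concrete, the sketch below expresses a tiny classification as SKOS statements, using only terms from the SKOS core vocabulary (`Concept`, `inScheme`, `notation`, `prefLabel`, `broader`, `hasTopConcept`). The XKOS refinements themselves are not shown. The scheme URI, the function name `classification_to_skos` and the two sample items are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def classification_to_skos(scheme_uri, items):
    """Map (code, label, parent_code) rows to SKOS triples.

    Items without a parent become top concepts of the scheme; all other
    items point to their parent with skos:broader.
    """
    triples = []
    for code, label, parent in items:
        concept = f"{scheme_uri}/{code}"
        triples.append((concept, RDF_TYPE, SKOS + "Concept"))
        triples.append((concept, SKOS + "inScheme", scheme_uri))
        triples.append((concept, SKOS + "notation", code))
        triples.append((concept, SKOS + "prefLabel", label))
        if parent is None:
            triples.append((scheme_uri, SKOS + "hasTopConcept", concept))
        else:
            triples.append((concept, SKOS + "broader", f"{scheme_uri}/{parent}"))
    return triples


# Hypothetical classification with one section and one division.
scheme = "http://example.org/classification/activities"
skos_triples = classification_to_skos(scheme, [
    ("A", "Agriculture", None),
    ("01", "Crop production", "A"),
])

for triple in skos_triples:
    print(triple)
```

Plain SKOS has no notion of classification levels or of correspondences between classification versions, which is the kind of gap the XKOS text above describes.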
38
DDI Consists of 2 separate development lines DDI-Codebook 2.1
It is a light-weight version of the standard, intended primarily to document simple survey data. DDI Codebook was the first version of the DDI to be published (Version 1 was released in 2000). DDI 2.0 was released in 2003, with Version 2.1 following two years later. Version 2.* added coverage of aggregate data and geography. DDI versions 1.0 to 2.1 were published as Document Type Definitions (DTDs). A new version in the DDI Codebook line was published on January 29. This version is an XML Schema. It incorporates new substantive elements requested by the community and is designed to make it easier to migrate documents to DDI Lifecycle for those interested in doing so.
39
DDI Consists of 2 separate development lines DDI-Lifecycle 3.1
Encompassing all of the DDI-Codebook specification and extending it, DDI-Lifecycle is designed to document and manage data across the entire life cycle, from conceptualization to data publication and analysis and beyond. Based on XML Schemas, DDI-Lifecycle is modular and extensible. The current version of the DDI-L Specification is Version 3.2, published in March 2014.
40
DDI Consists of 2 separate development lines DDI-Lifecycle 3.1
Users new to DDI are encouraged to use the DDI-Lifecycle development line as it incorporates added functionality. DDI-Lifecycle supports: metadata reuse across the data life cycle; metadata-driven survey design; question banks; complex data, e.g. longitudinal data; detailed geographic information; multiple languages; compliance with other metadata standards like ISO 11179; process management and automation.
41
DDI Mapped to other standards:
Dublin Core (Basic Bibliographic Information); MARC (Bibliographic Information); GSIM (Generic Statistical Information Model); ISO/IEC Data Registry; ISO (Geography); SDMX (Aggregate data); METS (Content Wrapper); PREMIS (Preservation)
42
DDI RDF Vocabularies (drafted)
DDI-RDF Discovery Vocabulary ("Disco") – for publishing metadata about datasets into the Web of Linked Data; Physical Data Description (PHDD) – for describing existing data in rectangular format; Extended Knowledge Organization System (XKOS) – an RDF vocabulary for describing statistical classifications, which is an extension of the popular SKOS vocabulary. The public review of all vocabularies is planned for 2014.
43
DDI Controlled Vocabularies
They play a critical role in metadata standards in terms of (1) semantics – definition of the meaning of metadata elements; (2) content – declaration of instructions for what and how values should be assigned to elements. => Control of synonyms; control of lexical anomalies (removing leading articles, prepositions, conjunctions); clearly defined terminology; support for machine-actionability. A first set of controlled vocabularies is now being developed for the DDI standard, to be used to describe specific aspects of research data across the data life cycle. The vocabularies are published independently of the DDI schemas in an XML format called Genericode, an OASIS specification. The Genericode format provides a tabular model for code lists.
44
SDMX For current information see presentation of Eurostat
45
SIMS – Single Integrated Metadata Structure
Dynamic and unique inventory of ESS quality and metadata statistical concepts. Developed in order to: streamline and harmonise metadata and quality reporting in the ESS; decrease the reporting burden on the statistical authorities by creating the framework for "once for all purposes" reporting, where each concept is only reported upon once and is re-usable for other reporting; create an integrated and consistent quality and metadata reporting framework where the reports are stored in the same database; create a flexible and up-to-date system where future extensions are possible by adding new concepts.
46
SIMS – Single Integrated Metadata Structure
In this structure, all statistical concepts of the two existing ESS report structures (ESMS and ESQRS) have been included and streamlined, by assuring that all concepts appear and are therefore reported upon only once (direct re-usability of existing information). It is a dynamic structure in the sense that additional statistical metadata and quality concepts can be included if necessary in the future.
47
SIMS – Single Integrated Metadata Structure
Principles of harmonisation of ESMS & ESQRS All concepts in the existing metadata and quality report structures are included The statistical concepts appear only once The same concept names and the same quality indicators are always used in the different ESS metadata and quality report structures The descriptions and the guidelines for the compilation of the concepts and sub-concepts have been reviewed and harmonised The concepts are consistent with the SDMX statistical standards as listed in the SDMX Content-oriented Guidelines. DDI-RDF Discovery Vocabulary (Disco) This specification is designed to support the discovery of microdata sets and related metadata using RDF technologies in the Web of Linked Data. The vocabulary leverages the DDI specification to create a simplified version of this model for the discovery of data files. It is based on a subset of the DDI XML formats of DDI Codebook and DDI Lifecycle. It supports identifying programmatically the relevant datasets for a specific research purpose. Existing DDI XML instances can be transformed into this RDF format and therefore exposed in the Web of Linked Data. The reverse process is not intended, as the developers of the RDF discovery vocabulary have defined DDI-RDF components and reused components of other RDF vocabularies which make sense only in the Linked Data field. Resources Specification PHDD - Physical Data Description Description of the physical properties of existing or published data (tables) in a rectangular format. The data could be either represented in records with character-separated values (CSV) or in records with fixed length. PHDD could be used standalone or together with related vocabularies like Data Catalog Vocabulary (DCAT) or DDI-RDF Discovery (Disco). Descriptions in PHDD could be added to Web pages which provide tables in rectangular format. This would enable processing of this data by programs. 
The combined usage of PHDD, DDI-RDF Discovery, and DCAT would support the creation of data repositories which provide metadata for the description of collections, for data discovery, and for processing of the data.
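As an illustration of why a physical description such as PHDD enables processing by programs, here is a small Python sketch in which one reader function handles both a character-separated file and a fixed-length file, driven only by a description of the layout. The dictionary keys (`format`, `delimiter`, `has_header`, `columns`), the function `read_table` and the sample records are all invented for this example; they are not PHDD terms and not real statistical data.

```python
import csv
import io
from typing import Dict, List

# Hypothetical physical descriptions; the keys are illustrative, not PHDD terms.
csv_description = {
    "format": "csv",
    "delimiter": ";",
    "has_header": True,
    "columns": ["region", "year", "value"],
}
fixed_description = {
    "format": "fixed",
    # (column name, start position, width in characters)
    "columns": [("region", 0, 5), ("year", 5, 4), ("value", 9, 6)],
}


def read_table(text: str, description: dict) -> List[Dict[str, str]]:
    """Parse rectangular data according to its physical description."""
    if description["format"] == "csv":
        rows = list(csv.reader(io.StringIO(text), delimiter=description["delimiter"]))
        if description["has_header"]:
            rows = rows[1:]  # skip the header record
        return [dict(zip(description["columns"], row)) for row in rows]
    if description["format"] == "fixed":
        records = []
        for line in text.splitlines():
            # Slice each field out of the record and drop the padding.
            records.append({
                name: line[start:start + width].strip()
                for name, start, width in description["columns"]
            })
        return records
    raise ValueError(f"unknown format: {description['format']}")


# Invented sample records, the same table in two physical layouts.
csv_text = "region;year;value\nR01;2014;1243\nR02;2014;1302\n"
fixed_text = "R01  2014  1243\nR02  2014  1302\n"

csv_rows = read_table(csv_text, csv_description)
fixed_rows = read_table(fixed_text, fixed_description)
print(csv_rows == fixed_rows)
```

Because the layout lives in the description rather than in the code, the same program can read a new file once its description is published alongside it.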
48
SIMS – Single Integrated Metadata Structure
Enables the derivation of different subsets of information in the form of pre-defined report structures:
The short user-oriented or user quality report (U) is implemented through the improved visibility and readability of the quality related concepts that are included in ESMS
The detailed producer-oriented or producer quality report (P) is implemented via the ESQRS report structure
All quality related concepts and indicators of both the user and producer oriented quality reports (together with all other metadata concepts) form an integrated part of the SIMS inventory
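The SIMS idea of one inventory from which the ESMS and ESQRS report structures are derived can be sketched as follows. This is only an illustration of the principle that each concept appears once and is tagged with the structures that use it; the concept names ("Concept A" to "Concept D") are placeholders, and `build_inventory` and `derive_report` are hypothetical helpers, not part of any official SIMS tooling.

```python
from typing import Dict, List


def build_inventory(structures: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge report structures so that every concept is stored only once."""
    inventory: Dict[str, List[str]] = {}
    for structure, concepts in structures.items():
        for concept in concepts:
            used_in = inventory.setdefault(concept, [])
            if structure not in used_in:
                used_in.append(structure)
    return inventory


def derive_report(inventory: Dict[str, List[str]], structure: str) -> List[str]:
    """Return the subset of the inventory that forms one report structure."""
    return [concept for concept, used_in in inventory.items() if structure in used_in]


# Placeholder concept lists; real ESMS and ESQRS concept lists are much longer.
structures = {
    "ESMS": ["Concept A", "Concept B", "Concept C"],
    "ESQRS": ["Concept A", "Concept C", "Concept D"],
}

inventory = build_inventory(structures)
user_report = derive_report(inventory, "ESMS")       # user-oriented report (U)
producer_report = derive_report(inventory, "ESQRS")  # producer-oriented report (P)
print(sorted(inventory))
print(user_report)
print(producer_report)
```

Shared concepts are kept once in the inventory and simply show up in both derived reports.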
49
SIMS – Single Integrated Metadata Structure
50
SIMS – Single Integrated Metadata Structure
51
Any questions?