Published by Pauliina Karvonen. Modified over 5 years ago.
1
Semantic Statistics
DDI Lifecycle: Moving Forward
Outcome of the Recent Workshops in Dagstuhl
Joachim Wackerow
2
Workshop on Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web
3
The XKOS vocabulary
4
Aims
Harmonized representation of concept systems (especially classifications) for DDI and SDMX/Data Cube
Using the modelling semantics and languages of the Semantic Web (RDFS, OWL)
Leveraging existing LOD standards such as SKOS
Leveraging ISO standards such as ISO 704 and ISO 1087-1
5
SKOS I
Simple Knowledge Organization System
Models organized sets of concepts (thesauri, code lists, ...)
Concepts
  With labels (preferred, alternative, hidden, ...) and notes (editorial, historical, ...)
  Grouped in Concept Schemes
  Organized in Collections
  Linked by semantic properties
    Associative: related, closeMatch, exactMatch, ...
    Hierarchical: narrower, broader, narrowMatch, ...
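As a rough illustration of the structure listed above, the following sketch holds a tiny concept scheme as plain (subject, predicate, object) triples in Python. The `skos:` property names come from the SKOS vocabulary; every `ex:` identifier, every label, and the helper functions are invented for illustration only.

```python
# Minimal sketch: a SKOS-style concept scheme as plain triples.
# All "ex:" identifiers and labels are hypothetical examples.
triples = {
    ("ex:activities", "rdf:type", "skos:ConceptScheme"),
    ("ex:A", "rdf:type", "skos:Concept"),
    ("ex:A", "skos:inScheme", "ex:activities"),
    ("ex:A", "skos:prefLabel", "Agriculture"),
    ("ex:A", "skos:altLabel", "Farming"),
    ("ex:A01", "rdf:type", "skos:Concept"),
    ("ex:A01", "skos:inScheme", "ex:activities"),
    ("ex:A01", "skos:prefLabel", "Crop production"),
    ("ex:A01", "skos:broader", "ex:A"),
}


def objects(subject, predicate):
    """Return all objects recorded for a subject/predicate pair, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def narrower(concept):
    """Derive the narrower concepts by inverting skos:broader."""
    return sorted(s for s, p, o in triples if p == "skos:broader" and o == concept)
```

A real implementation would more likely use an RDF library and full IRIs; the tuple form is only meant to make the concept, label, and hierarchy relations visible.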
6
SKOS II
Extensible
  Example: SKOS-XL
Fit for our purpose, with a few additions
  Richer semantic relations
  Elements from the Neuchâtel model
    Classification levels
    Correspondence tables
7
XKOS
Organization
  Classifications
  ClassificationLevel, ConceptAssociation and ConceptAssociationCollection classes
  2 datatype properties, 7 object properties
Semantic properties (from ISO 1087)
  7 associative properties
  4 hierarchical properties
  disjoint
8
DDI-RDF
9
Why DDI as Linked Data?
Currently no such ontology is available
To increase the visibility of data holdings using mainstream Web technologies
To open DDI to the Linked Data community
To process DDI-RDF with RDF tools
To link DDI-RDF to other RDF data
To better identify opportunities for merging datasets
To enable inferencing
To research microdata within the LOD cloud
10
How was the DDI Ontology developed?
Subset of the most important DDI elements
Use cases
  Experts in the statistics domain formulated the use cases they consider most significant for solving frequent problems
  Most important use case: discovering microdata connected with multiple studies
11
Discovery Use Case
Which studies are connected with a specific coverage consisting of the 3 dimensions time, country, and subject?
What questions with a specific question text are contained in the study questionnaire?
What questions are connected with a concept with a specific label?
What questions are combined with a variable with an associated coverage consisting of the 3 dimensions time, country, and subject?
What concepts are linked to particular variables or questions?
What representation does a specific variable have?
What codes and what categories are part of this representation?
What variable label does a variable with a particular variable name have?
What is the maximum value of a certain variable?
What are the absolute and relative frequencies of a specific code?
What data files contain the entire dataset?
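To make a few of these discovery questions concrete, here is a minimal Python sketch that answers three of them over in-memory records. The record layout, field names, and sample values are assumptions made up for this example; they are not DDI-RDF terms, and a real deployment would more likely query an RDF store.

```python
from collections import Counter

# Hypothetical metadata records; field names are illustrative only.
questions = [
    {"id": "q1", "text": "How old are you?", "concept": "Age"},
    {"id": "q2", "text": "In what year were you born?", "concept": "Age"},
    {"id": "q3", "text": "What is your highest degree?", "concept": "Education"},
]
variables = [
    {"name": "v_age", "label": "Age in years", "question": "q1"},
    {"name": "v_edu", "label": "Highest degree", "question": "q3"},
]
# Hypothetical coded responses for the variable v_edu.
edu_codes = [1, 2, 2, 3, 2, 1, 3, 2]


def questions_for_concept(label):
    """What questions are connected with a concept with a specific label?"""
    return [q["id"] for q in questions if q["concept"] == label]


def variable_label(name):
    """What variable label does a variable with a particular name have?"""
    for v in variables:
        if v["name"] == name:
            return v["label"]
    return None


def code_frequencies(codes, code):
    """Absolute and relative frequency of one code in a list of codes."""
    if not codes:
        return 0, 0.0
    absolute = Counter(codes)[code]
    return absolute, absolute / len(codes)
```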
18
Two RDF Vocabularies
New specifications of the DDI Alliance:
XKOS: harmonized representation of concept systems (especially classifications) for DDI and SDMX/Data Cube
DDI-RDF: for discovery purposes
19
Workshop on DDI Lifecycle: Moving Forward
20
Areas for Possible Improvement and Enhancement
Complete data life cycle coverage
Broadened focus for new research domains
Robust and persistent data model (for the metadata), extension possibilities, implementation for different technical domains
Simpler specification that is easier to understand and use, including better documentation
21
Why Now?
After the experience of developing and using the initial DDI-L structure, we find that continued improvement of the development line is limited by the lack of a data model.
Further, we are experiencing pressure for changes from several directions at once, including new content from substantive working groups, which requires new approaches to the design of the specification.
22
Specific Changes Envisioned I
Abstraction of data capture/collection/source to handle different types of data
  The current data collection module is questionnaire-centric.
  Register data and data in the natural and health sciences (i.e., from technical devices or from laboratory analyses).
New content on sampling, survey implementation, weighting, and paradata coming out of the Survey Design and Implementation Group
New content developed by the Qualitative Working Group
Framework for data and metadata quality
Framework for access to data and metadata
23
Specific Changes Envisioned II
Process (workflow) description across the data life cycle, including support for automation and replication
Integration with existing standards such as GSBPM/GSIM, SDMX, CDISC, Triple-S
Disclosure review and remediation
Data management planning
Development of standard queries and/or interface specifications (such as REST), which are needed to allow for interoperable services based on the DDI standard information model
24
Current Thinking
25
Scope Statement
The Data Documentation Initiative (DDI) is an international standard for describing data related to the observation and measurement of human activity. With origins in the quantitative social sciences, DDI is increasingly being used by researchers in other disciplines. The DDI specification is also being used to document other data types, such as social media, biomarkers, administrative data, and transaction data. DDI is a model-based metadata specification that can be implemented in a variety of technologies. The specification itself is modular and can document and manage different stages of data lifecycles, such as conceptualization, collection, processing, analysis, distribution, discovery, repurposing, and archiving.
26
Design Goals for Next-Generation DDI
Suggested approach as foundational version
A canonical model expressed in English, with a UML model expressing as much of that canonical model as possible
This model can then be expressed in
  XML Schema
  RDF/OWL Ontology
  Relational database schema (only as recommendation)
  Other languages, ideally via some degree of automation
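The idea of deriving several bindings from one model "via some degree of automation" can be sketched in a few lines of Python. The `Variable` class, its fields, the `ex:` prefix, and the three generator functions are hypothetical and not taken from any DDI specification; the sketch only shows one model definition driving an XML, an RDF-like, and a relational rendering.

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET


@dataclass
class Variable:
    """Hypothetical model class standing in for one element of the model."""
    identifier: str
    name: str
    label: str


def to_xml(obj):
    """Render an instance as an XML element with one child per field."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")


def to_triples(obj, prefix="ex:"):
    """Render an instance as RDF-style (subject, predicate, object) triples."""
    subject = prefix + obj.identifier
    result = [(subject, "rdf:type", prefix + type(obj).__name__)]
    for f in fields(obj):
        if f.name != "identifier":
            result.append((subject, prefix + f.name, getattr(obj, f.name)))
    return result


def to_sql_ddl(cls):
    """Render the class itself as a simple relational table definition."""
    columns = ", ".join(f"{f.name} TEXT" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__} ({columns});"
```

For example, `to_sql_ddl(Variable)` yields a one-table schema, while `to_xml` and `to_triples` serialize a single instance. A real binding generator would start from the UML model rather than from Python classes.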
27
High-level Design Goals I
Interoperability and Standards: The model is optimized to facilitate interoperability with other relevant standards.
Simplicity: The model is as simple as possible and easily understandable by different stakeholders.
User Driven: User perspectives inform the model to ensure that it meets the needs of the international DDI user community.
Terminology: The model uses clear terminology and, when possible, uses existing terms and definitions.
Iterative Development: The model is developed iteratively, bringing in a range of views from the user community.
Documentation: The model includes and is supplemented by robust and accessible documentation.
Lifecycle Orientation: The model supports the full research data lifecycle and the statistical production process, facilitating replication and the scientific method.
28
High-level Design Goals II
Reuse and Exchange: The model supports the reuse, exchange, and sharing of data and metadata within and among institutions.
Modularity: The model is modular and these modules can be used independently.
Stability: The model is stable and new versions are developed in a controlled manner.
Extensibility: The model has a common core and is extensible.
Tool Independence: The model is not dependent on any specific IT setting or tool.
Innovation: The model supports both current and new ways of documenting, producing, and using data, and leverages modern technologies.
Actionable Metadata: The model provides actionable metadata that can be used to drive production and data collection processes.
29
Modularized DDI Model
30
Alignment with GSIM
Simplified view of GSIM Objects
31
DDI Development Lines
DDI Codebook
  DDI 2.1 (DTD)
  DDI 2.5 (XML Schema)
DDI Lifecycle
  DDI 3.1 (XML Schema)
DDI X (model-based)
  Bindings as
    XML Schema
    RDF
  Data as a Service, Service Oriented Architecture (SOA), Web services, REST
  Recommendation for DBMS Schema
32
Data as a Service
Calls to support reading from a repository
  GetByIdentifier: Retrieve an object based on its identifier.
  GetListOfVersions: Retrieve a list of all of the versions of an object.
  GetListOfItemsByListOfIdentifiers: Given a list of identifiers, return a list of items.
  GetListOfIdentifiersThatItemDependsOn: Given the identifier of a single item, return a list of all the items it depends on (recursively), i.e., get the item graph.
  Search: A faceted search by item type, field, language, organization, etc. Some items need to be searchable by content, for instance the text in a label or description.
Additional functions
  GetListOfItemForType: List all of the items of a given type.
  GetForFunctionalGroup: E.g., which "simple surveys" are in the repository.
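The read calls listed above can be sketched as an in-memory repository in Python. Only the call names come from the slide; the `Item` fields, the method signatures, and the choice that `get_by_identifier` returns the latest version are assumptions made for this example, not part of any DDI interface specification.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical repository item; the field names are assumptions."""
    identifier: str
    version: int
    item_type: str
    label: str
    depends_on: list = field(default_factory=list)


class Repository:
    def __init__(self):
        self._items = {}  # identifier -> {version -> Item}

    def add(self, item):
        self._items.setdefault(item.identifier, {})[item.version] = item

    def get_by_identifier(self, identifier):
        """GetByIdentifier: assumed here to return the latest version."""
        versions = self._items.get(identifier)
        if not versions:
            return None
        return versions[max(versions)]

    def get_list_of_versions(self, identifier):
        """GetListOfVersions: all known version numbers, ascending."""
        return sorted(self._items.get(identifier, {}))

    def get_list_of_items_by_list_of_identifiers(self, identifiers):
        """GetListOfItemsByListOfIdentifiers: skips unknown identifiers."""
        found = (self.get_by_identifier(i) for i in identifiers)
        return [item for item in found if item is not None]

    def get_list_of_identifiers_that_item_depends_on(self, identifier):
        """GetListOfIdentifiersThatItemDependsOn: walk the item graph."""
        seen = []
        stack = [identifier]
        while stack:
            current = self.get_by_identifier(stack.pop())
            if current is None:
                continue
            for dep in current.depends_on:
                if dep not in seen:  # also guards against cycles
                    seen.append(dep)
                    stack.append(dep)
        return sorted(seen)

    def get_list_of_items_for_type(self, item_type):
        """GetListOfItemForType: latest version of each item of a type."""
        latest = (self.get_by_identifier(i) for i in sorted(self._items))
        return [item for item in latest if item.item_type == item_type]
```

The dependency walk uses an explicit stack rather than recursion so that cyclic references between items cannot cause unbounded recursion.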
33
Community-Driven Development Process
Domain experts will do their modeling using a simplified English UML
  Definition of scope, structure, and documentation, with no data types at this level
Technical review
  Normative aspects of the model
  Output will be a UML representation plus a text document
Bindings, such as an XML Schema representation
Implementation
  Reference implementation
Feedback from community
34
Next Steps
Finalizing results from the workshop
  Resolution of open issues (especially regarding modeling approaches)
Proposal to DDI Alliance
Formal decision
Possible further workshop(s)
Timeline not yet known
  Iterative development is a key to success
  Basic work to publish the core and two test cases is expected to take about a year and a half