Tomasz Miksa, SBA Research, Wien, Austria

Presentation transcript:

Domain Specific Extensions for Machine-actionable Data Management Plans. Tomasz Miksa, SBA Research, Wien, Austria; João Cardoso, INESC-ID, Lisboa, Portugal; Paul Walk, Antleaf Ltd, United Kingdom. 29/12/2018 www.rd-alliance.org - @resdatall

Agenda: You learn about us. We learn about you. Exercise 1, processes for maDMPs: demo of new tools and processes; work in groups and discussion. Coffee break. Exercise 2, models for maDMPs: demo of new tools and models. Wrap-up.

You learn about us

Data Management Plans (DMPs): manually created text documents; considered as bureaucracy; created too late; vague; depend on the human factor (scrupulousness, awareness).

Data Management Plans: How to discover these tools? Which one do I need to use? Why do I have to provide the same information again? Why haven’t they consulted us before? Who is going to pay for this? We don’t have enough people for that!

Research data lifecycle: Stakeholders involved in research data management require information at certain stages and can provide information if requested at the proper stage. Many problems can be avoided when the timing is right and the information flow is ensured.

Automated Data Management Workflow

Machine-actionable DMPs: living documents; automate data management (collect information from systems, trigger actions in systems); facilitate validation. This requires: well-defined RDM workflows; data management infrastructure; a common data model. Example:
"dc:creator":[ { "foaf:name":"John Smith", "@id":"orcid.org/0000-1111-2222-3333", "foaf:mbox":"mailto:jsmith@tuwien.ac.at", "madmp:institution":"AT-TU-Wien" } ],

Example. Currently available (not very useful): current DMPs model questionnaires.
<administrative_data> <question>Who will be the Principal Investigator?</question> <answer>The PI will be John Smith from our university.</answer> </administrative_data>
Machine-actionable DMPs model information.
"dc:creator":[ { "foaf:name":"John Smith", "@id":"orcid.org/0000-1111-2222-3333", "foaf:mbox":"mailto:jsmith@tuwien.ac.at", "madmp:institution":"AT-Vienna-University-of-Technology" } ],
Principles shown by the example: reuse existing standards, e.g. Dublin Core, PREMIS (here: dc, foaf); use PIDs whenever possible, e.g. ORCID; use controlled vocabularies when possible, or PIDs for the institution (here: AT-Vienna-University-of-Technology); develop own concepts and vocabularies only when needed (here: madmp).
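
The principles on this slide lend themselves to a mechanical check. The following is a minimal Python sketch, not part of the workshop material: it rebuilds the creator record from the slide and reports values that are not shaped like identifiers or prefixed vocabulary terms. The helper name `check_creator` and the three rules it applies are assumptions made for illustration; the ORCID pattern tests only the shape of the identifier, not its checksum.

```python
import json
import re

# Shape of an ORCID iD as written on the slide: four groups of four characters,
# where the last character may be a digit or X. The checksum is NOT verified.
ORCID_SHAPE = re.compile(r"^(https?://)?orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")

# Creator record copied from the slide.
creator = {
    "foaf:name": "John Smith",
    "@id": "orcid.org/0000-1111-2222-3333",
    "foaf:mbox": "mailto:jsmith@tuwien.ac.at",
    "madmp:institution": "AT-Vienna-University-of-Technology",
}


def check_creator(record):
    """Return a list of problems found in a creator record (hypothetical helper)."""
    problems = []
    if not ORCID_SHAPE.match(record.get("@id", "")):
        problems.append("@id is not shaped like an ORCID iD")
    if not record.get("foaf:mbox", "").startswith("mailto:"):
        problems.append("foaf:mbox is not a mailto: URI")
    for key in record:
        # Every field except @id should come from a named vocabulary (prefix:term).
        if key != "@id" and ":" not in key:
            problems.append(f"{key} has no vocabulary prefix")
    return problems


dmp_fragment = {"dc:creator": [creator]}
print(json.dumps(dmp_fragment, indent=2))
print(check_creator(creator))
```

A free-text answer such as the questionnaire version cannot be checked this way, which is the contrast the slide draws.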

Machine-actionable DMPs

What is RDA: Research Data Alliance; community-driven organization; 6,000 members from 130 countries; different stakeholders; Plenary meetings; Interest Groups (IGs): Active DMPs; Working Groups (WGs): DMP Common Standards.

DMP Common Standards - Outputs. Common data model for machine-actionable DMPs: to model information from standard DMPs; NOT a template; NOT a questionnaire; modular design (core set of elements, domain specific extensions). Reference implementations: ready to use models; JSON, XML, RDF, etc. Guidelines for adoption of the common data model: requirements for supporting systems; pilot studies.
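
One way to picture the modular design (a core set of elements plus domain specific extensions) is to separate the fields of a record by vocabulary prefix. This is a rough Python sketch under stated assumptions: the choice of which prefixes count as core, the function name `split_by_namespace`, and the `bio:` extension prefix are all hypothetical and are not taken from the working group's model.

```python
# Prefixes treated as "core" here are an assumption for illustration only.
ASSUMED_CORE_PREFIXES = ("dc", "dcterms", "foaf", "madmp")


def split_by_namespace(record, core_prefixes=ASSUMED_CORE_PREFIXES):
    """Split a flat record into core fields and per-prefix extension fields."""
    core = {}
    extensions = {}
    for key, value in record.items():
        prefix = key.split(":", 1)[0] if ":" in key else ""
        if prefix in core_prefixes:
            core[key] = value
        else:
            # Group every non-core field under its own prefix.
            extensions.setdefault(prefix, {})[key] = value
    return core, extensions


record = {
    "dc:title": "Dockerfile",
    "dcterms:license": "https://opensource.org/licenses/MIT",
    "bio:organism": "hypothetical domain-specific value",
}
core, extensions = split_by_namespace(record)
print(core)
print(extensions)
```

A system that only understands the core can ignore the extension groups, while a domain tool can read the group for its own prefix.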

https://www.rd-alliance.org/groups/dmp-common-standards-wg

Workshop objectives. Evaluate processes that read from/write to maDMPs: which systems can be integrated? how do maDMPs connect to the activities of various stakeholders? Identify how information must be modelled in maDMPs: which specific fields are needed? what models/dictionaries/standards can be reused? Identify stakeholders at each lifecycle stage, e.g. who can provide cost estimations when planning? Identify how the available information changes over the lifetime of a DMP, e.g. when is the exact size of data important and when not? Identify how the need for information changes over the lifetime of a DMP, e.g. is a persistent identifier important to funders in the planning phase or when a project ends?

We learn about you

Introduction round: Your name. Your location. Your role. What’s the most important thing you want to get out of this meeting?

Processes for maDMPs: Exercise 1

Horizon 2020 DMP survey report. Horizon 2020 template for Data Management Plans (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1120245

Planning phase. Goal: get estimations and recommendations (which are feasible to implement later). Diagram labels: BASIC INFO, ADMINISTRATIVE DATA, ANALYSE DATA, FIND REPOSITORY, SELECT LICENSE, GENERATE DMP, John Smith.

Project and Post-project phases. Goal: update the DMP with real information by re-using (linking) information provided elsewhere. Diagram labels: OAI-PMH, BASIC INFO, ADMINISTRATIVE DATA, GET METADATA, PRESERVATION, GENERATE DMP, John Smith, 10 YEARS.
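
The slide names OAI-PMH as the channel for the GET METADATA step. As a hedged illustration of what re-using information provided elsewhere could look like, the Python sketch below builds an OAI-PMH `GetRecord` request URL. The verb and the `identifier` and `metadataPrefix` arguments come from the OAI-PMH protocol; the repository base URL, the record identifier and the function name `oai_get_record_url` are placeholders, and nothing here is taken from the demo tools.

```python
from urllib.parse import urlencode


def oai_get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL (no network access is performed)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Both values below are placeholders, not a real repository.
url = oai_get_record_url(
    "https://repository.example.org/oai",
    "oai:repository.example.org:1234",
)
print(url)
```

The response to such a request is an XML document whose Dublin Core fields could then be linked from the DMP instead of being typed in again.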

Planning phase - demo https://github.com/IrinaAvram/DMPGenerator

Project and post-project phase - demo 1 https://github.com/mdietrichstein/digitalpreservation-dmp

Project and post-project phase - demo 2 https://github.com/alexschwarzresearch/DMPlanner

maDMPs use cases

BPMN process - overview: Business Process Modelling Notation (BPMN); defined 10 workflows.

Start DMP

Get Cost / Storage

Exercise. Work in groups: analyse BPMN processes; play with tools (http://tool2.rda-ws-tpdl2018.sysresearch.org, http://tool3.rda-ws-tpdl2018.sysresearch.org). Discuss: what would you change in the workflows? do the workflows fit into the context of your organisation/domain? which other stakeholder needs should be addressed? which other systems can be used? what else could be automated? Report back.

Coffee break

https://goo.gl/LPqYag Add your notes. Add your registration data (name, tpdl-id).

Models for maDMPs: Exercise 2

Generated landing page of a DMP https://oblassers.github.io/fair-data-science/

Metadata for specific data objects

Metadata - details (excerpt from https://github.com/oblassers/fair-data-science/blob/master/dmp.json):
{ "@type": "dmp:Container", "github": "https://github.com/oblassers/fair-data-science/blob/master/Dockerfile", "dc:title": "Dockerfile", "dmp:hasIntelectualPropertyRights": { "dcterms:license": "https://opensource.org/licenses/MIT" }, "dmp:hasMetadata": { "dcterms:description": "Dockerfile", "premis:hasObjectCharacteristics": { "premis:fixity": { "premis:hasMessageDigestAlgorithm": "premis:Fixity:SHA-256", "premis:messageDigest": "a16c7c70cccd3b706d0e64038675a0b302c6250a159fd27b4f069565e1464797" } "dmp:hasDataVolume": "103 bytes"
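
The fixity and volume fields in the excerpt can be derived from the object itself rather than typed by hand. The Python sketch below shows one way to do that with the standard `hashlib` module. The field names are copied from the excerpt; the function name `describe_object`, the sample file content, and the placement of `dmp:hasDataVolume` directly under `dmp:hasMetadata` are assumptions, since the excerpt on the slide is truncated and does not show where that field sits.

```python
import hashlib


def describe_object(title, content):
    """Build fixity and volume metadata for a byte string (hypothetical helper)."""
    return {
        "dc:title": title,
        "dmp:hasMetadata": {
            "premis:hasObjectCharacteristics": {
                "premis:fixity": {
                    "premis:hasMessageDigestAlgorithm": "premis:Fixity:SHA-256",
                    "premis:messageDigest": hashlib.sha256(content).hexdigest(),
                },
            },
            # Placement of this field is an assumption (see lead-in).
            "dmp:hasDataVolume": f"{len(content)} bytes",
        },
    }


# Sample content is made up; it is not the Dockerfile from the repository.
record = describe_object("Dockerfile", b"FROM python:3\n")
print(record["dmp:hasMetadata"]["dmp:hasDataVolume"])
```

Recomputing the digest later and comparing it with the stored value is what lets a system detect that an object has changed.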

User story consultation https://github.com/RDA-DMP-Common/user-stories/ Goals: identify stakeholders at each lifecycle stage; define which information they provide; define which information they expect. Template: As a <stakeholder>, I want <goal> so that <reason>. Example: As a researcher, I want to inform repository operator on the amount of data in the planning phase, so that they provide information on costs.
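
Because the user stories follow a fixed template, the stakeholder, goal and reason can be pulled out automatically. The Python sketch below is an illustration only: the regular expression and the function name `parse_user_story` are assumptions, and the sketch says nothing about how the working group actually processed the GitHub issues.

```python
import re

# Matches "As a/an <stakeholder>, I want <goal>[,] so that <reason>[.]".
STORY = re.compile(
    r"^As an? (?P<stakeholder>.+?), I want (?P<goal>.+?),? so that (?P<reason>.+?)\.?$",
    re.IGNORECASE,
)


def parse_user_story(text):
    """Return the three parts of a user story as a dict, or None if it does not fit."""
    match = STORY.match(text.strip())
    return match.groupdict() if match else None


# Example story quoted from the slide.
story = (
    "As a researcher, I want to inform repository operator on the amount of data "
    "in the planning phase, so that they provide information on costs."
)
parsed = parse_user_story(story)
print(parsed)
```

Stories that do not match the template return `None`, which makes them easy to set aside for manual review.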

User story consultation https://github.com/RDA-DMP-Common/user-stories/ 100+ issues defined; inputs from Europe and Australia; inputs from individuals and workshops.

User story labelling https://github.com/RDA-DMP-Common/user-stories/projects/2 Reviewed by chairs and authors. Classified as: in scope (useful for model definition); out of scope (often referring to the ecosystem or practices; important, but not directly for the common data model). Labelled.

User story labelling https://github.com/RDA-DMP-Common/user-stories/wiki 3 major categories (colours): stakeholders involved; project phase; subject of information conveyed (access control, volume, financial, licensing, metadata, repository, security, storage, etc.).

User story visualisation https://goo.gl/znBL3F Interactive visualisation: changes on GitHub are visible immediately; shows relations between stakeholders, phases and information.

From user stories to requirements https://docs.google.com/document/d/1sWVy0Rqj9fGsjs6GyFnBd3fH6XF2088zjK8U-1wLq4c/edit?usp=sharing Refactoring of user stories. Goal: finding overlaps, gaps, duplicates. Example: Metadata taxonomy/classification [14, 11]; Links to metadata of the real data [89, 39]; Funder information [7]; Link publications to data [55]; Authorship [88]; Multilingual metadata [65]; Include raw metadata directly in the model [91, 85]. Annotations on the example: the ‘yellow’ label is used to classify user stories; each entry is a short summary of what the user stories are about, i.e. a more specific requirement; the numbers in brackets are IDs of user stories (to keep the connection to the GitHub consultation).

Requirements grouping. Similar requirements exist under different labels. Example: information on the author of the DMP is relevant for administrative activities and for reuse. We split the requirements and grouped them using five categories: Administrative, Roles and Responsibilities; Data; Infrastructure; Security, Privacy and Access Control; Policies, legal and ethical aspects.

Requirements grouping example (Data). Format: Format [80, 12, 99, 62, 67, 54, 80]. Volume: Data size estimate [5, 77, 80, 100]; For specific type of data [62]; Data size real [54]. Provenance [54]. Metadata taxonomy/classification [14, 11]; Links to metadata of the real data [89, 39]; Link publications to data [55]; Authorship [88]; Multilingual metadata [65]; Include raw metadata directly in the model [91, 85]. Reuse: Links to (meta-)data location [89, 90, 56, 39, 60]; Repository [42]; Persistent identifier for data [92]; Link publications to data [55, 88]; Link to License/Contract allowing data usage/storing [56]. Note: we did not move all requirements falling under a specific label, but only the subset that is relevant in this context; in the given example, the subset relevant for data description. Other requirements for Reuse were put into other categories.
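
Keeping the user story IDs next to each requirement makes overlaps easy to find. The Python sketch below inverts a small subset of the mapping from the slide to show which user stories feed more than one requirement. The requirement names and IDs are copied from the slide; the subset chosen, the function name `stories_to_requirements` and the variable names are assumptions for illustration.

```python
from collections import defaultdict

# A subset of the requirement -> user story IDs mapping shown on the slide.
requirements = {
    "Format": [80, 12, 99, 62, 67, 54, 80],
    "Data size estimate": [5, 77, 80, 100],
    "Data size real": [54],
    "Provenance": [54],
    "Persistent identifier for data": [92],
}


def stories_to_requirements(reqs):
    """Invert the mapping: user story ID -> set of requirements it supports."""
    index = defaultdict(set)
    for requirement, story_ids in reqs.items():
        for story_id in story_ids:
            # A set absorbs repeated IDs such as the double 80 under Format.
            index[story_id].add(requirement)
    return index


index = stories_to_requirements(requirements)
shared = {sid: sorted(names) for sid, names in index.items() if len(names) > 1}
print(shared)
```

In this subset, user stories 80 and 54 each contribute to several requirements, which is the kind of overlap the refactoring step was meant to surface.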

Next steps. The 1st consultation (user stories) went broad: it helped us define the scope of maDMPs (what information should a maDMP contain? who provides and uses this information?). The 2nd consultation will go deep: how do we model specific requirements? which specific fields are needed? which models exist?

Consultation 2: ‘going deep’ https://goo.gl/DRieP4 5 documents to collect requirements, models, specific fields, etc.: Administrative, Roles and Responsibilities; Data; Infrastructure; Security, Privacy and Access Control; Policies, legal and ethical aspects.

Exercise. Work in groups. Discuss: which specific fields are needed? what models/dictionaries/standards can be reused? Add your ideas to the documents: https://goo.gl/DRieP4 Report back.

Wrap-up

Wrap-up: What can we automate? Which services will always be performed manually? How broadly can we integrate: university-wide, country-wide, Europe-wide? Which services can we share?

10 principles for maDMPs http://doi.org/10.5281/zenodo.1172673

Defining requirements for machine-actionable Data Management Plans http://ifs.tuwien.ac.at/~miksa/papers/2018-iPres-maDMPs.pdf

RDA Plenary meeting

Thank you! https://www.rd-alliance.org/groups/dmp-common-standards-wg

Data model: top level vocabulary; based on DMP themes; extended by domain specific standards. OWL ontology: https://purl.org/madpms

Example