Domain-Specific Extensions for Machine-actionable Data Management Plans. Tomasz Miksa (SBA Research, Wien, Austria), João Cardoso (INESC-ID, Lisboa, Portugal), Paul Walk (Antleaf Ltd, United Kingdom). 29/12/2018 www.rd-alliance.org - @resdatall
Agenda: You learn about us; We learn about you; Exercise 1 – processes for maDMPs (demo of new tools and processes, work in groups + discussion); Coffee break; Exercise 2 – models for maDMPs (demo of new tools and models); Wrap-up.
You learn about us
Data Management Plans (DMPs): manually created text documents, considered as bureaucracy, created too late, vague, and dependent on the human factor (scrupulousness, awareness).
Data Management Plans: How do I discover these tools? Which one do I need to use? Why do I have to provide the same information again? Why haven’t they consulted us before? Who is going to pay for this? We don’t have enough people for that!
Research data lifecycle: many stakeholders are involved in research data management. They require information at certain stages and can provide information if asked at the proper stage. Many problems can be avoided when the timing is right and the information flow is ensured.
Automated Data Management Workflow
Machine-actionable DMPs: living documents that automate data management by collecting information from systems, triggering actions in systems and facilitating validation. This requires well-defined RDM workflows, a data management infrastructure and a common data model, e.g.:
"dc:creator":[ {
  "foaf:name":"John Smith",
  "@id":"orcid.org/0000-1111-2222-3333",
  "foaf:mbox":"mailto:jsmith@tuwien.ac.at",
  "madmp:institution":"AT-TU-Wien"
} ],
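The "facilitate validation" point can be illustrated with a minimal sketch. It assumes the `dc:creator` fragment is wrapped into a complete JSON object and applies a hypothetical rule that every creator must carry `foaf:name`, `@id` and a `mailto:` address in `foaf:mbox`; this rule is for illustration only and is not part of any published maDMP schema.

```python
import json

# Hypothetical rule for illustration: keys every creator entry must provide.
REQUIRED_CREATOR_KEYS = ("foaf:name", "@id", "foaf:mbox")

def validate_creators(dmp: dict) -> list:
    """Return human-readable problems found in dmp["dc:creator"]; empty means valid."""
    creators = dmp.get("dc:creator")
    if not isinstance(creators, list) or not creators:
        return ["dc:creator must be a non-empty list"]
    problems = []
    for index, creator in enumerate(creators):
        for key in REQUIRED_CREATOR_KEYS:
            if key not in creator:
                problems.append(f"creator {index}: missing {key}")
        mbox = creator.get("foaf:mbox", "")
        if mbox and not mbox.startswith("mailto:"):
            problems.append(f"creator {index}: foaf:mbox is not a mailto: URI")
    return problems

# The fragment from the slide, wrapped in braces so that it parses as JSON.
fragment = """{"dc:creator":[ {
  "foaf:name":"John Smith",
  "@id":"orcid.org/0000-1111-2222-3333",
  "foaf:mbox":"mailto:jsmith@tuwien.ac.at",
  "madmp:institution":"AT-TU-Wien"
} ]}"""

print(validate_creators(json.loads(fragment)))
print(validate_creators({"dc:creator": [{"foaf:name": "Jane Doe"}]}))
```

A free-text questionnaire answer offers nothing comparable to check; once the information is modelled as fields, a tool can report exactly which field is missing.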
Example: current DMPs model questionnaires, which is not very useful:
<administrative_data> <question>Who will be the Principal Investigator?</question> <answer>The PI will be John Smith from our university.</answer> </administrative_data>
Machine-actionable DMPs model information:
"dc:creator":[ { "foaf:name":"John Smith", "@id":"orcid.org/0000-1111-2222-3333", "foaf:mbox":"mailto:jsmith@tuwien.ac.at", "madmp:institution":"AT-Vienna-University-of-Technology" } ],
Guidelines illustrated by this example: reuse existing standards, e.g. Dublin Core, PREMIS, etc. (dc, foaf); use PIDs whenever possible, e.g. ORCID; use controlled vocabularies when possible, or PIDs for institutions (AT-Vienna-University-of-Technology); develop own concepts and vocabularies only when needed (madmp).
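One benefit of "use PIDs whenever possible" is that a machine can check a PID, whereas the answer "John Smith from our university" cannot be checked at all. A minimal sketch, assuming ORCID's documented ISO 7064 MOD 11-2 check digit; the function name is hypothetical. Note that the slide's `0000-1111-2222-3333` is a placeholder and fails the check, while `0000-0002-1825-0097` is the example iD used in ORCID's own documentation.

```python
def orcid_checksum_ok(orcid: str) -> bool:
    """Check an ORCID iD (with or without an orcid.org prefix) via its
    ISO 7064 MOD 11-2 check digit."""
    # Keep only the 16-character identifier, e.g. "0000-0002-1825-0097".
    identifier = orcid.rstrip("/").rsplit("/", 1)[-1].replace("-", "").upper()
    base = identifier[:15]
    if len(identifier) != 16 or not all(c in "0123456789" for c in base):
        return False
    total = 0
    for digit in base:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return identifier[15] == expected

print(orcid_checksum_ok("https://orcid.org/0000-0002-1825-0097"))
print(orcid_checksum_ok("orcid.org/0000-1111-2222-3333"))
```

A checksum only catches typos; whether the iD really belongs to the named person still requires a lookup in the ORCID registry.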
Machine-actionable DMPs
What is RDA: the Research Data Alliance is a community-driven organization with 6,000 members from 130 countries, representing different stakeholders. Activities: Plenary meetings; Interest Groups (IGs), e.g. Active DMPs; Working Groups (WGs), e.g. DMP Common Standards.
DMP Common Standards - Outputs: a common data model for machine-actionable DMPs that models the information from standard DMPs (NOT a template, NOT a questionnaire), with a modular design: a core set of elements plus domain-specific extensions; reference implementations: ready-to-use models in JSON, XML, RDF, etc.; guidelines for adoption of the common data model: requirements for supporting systems, pilot studies.
https://www.rd-alliance.org/groups/dmp-common-standards-wg
Workshop objectives: Evaluate processes that read from/write to maDMPs: which systems can be integrated? How do maDMPs connect to the activities of various stakeholders? Identify how information must be modelled in maDMPs: which specific fields are needed? What models/dictionaries/standards can be reused? Identify stakeholders at each lifecycle stage, e.g. who can provide cost estimations when planning? Identify how the available information changes over the lifetime of a DMP, e.g. when is the exact size of data important and when not? Identify how the need for information changes over the lifetime of a DMP, e.g. is a persistent identifier important to funders in the planning phase or when a project ends?
We learn about you
Introduction round: your name, your location, your role. What’s the most important thing you want to get out of this meeting?
Processes for maDMPs – Exercise 1
Horizon 2020 DMP survey report. Horizon 2020 template for Data Management Plans (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1120245
Planning phase. Goal: get estimations and recommendations (which are feasible to implement later). Steps: BASIC INFO, ADMINISTRATIVE DATA, ANALYSE DATA, FIND REPOSITORY, SELECT LICENSE, GENERATE DMP.
Project and post-project phases. Goal: update the DMP with real information by re-using (linking) information provided elsewhere, e.g. harvested via OAI-PMH. Steps: BASIC INFO, ADMINISTRATIVE DATA, GET METADATA, PRESERVATION (10 YEARS in the example), GENERATE DMP.
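The GET METADATA step can be sketched as an OAI-PMH harvest. This is a minimal sketch, assuming a repository that offers the standard `ListRecords` verb with the `oai_dc` metadata format; the base URL, the sample response and the DOI in it are hypothetical, and the function names are illustrative rather than taken from the demo tools.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url: str, set_spec: str = "") -> str:
    """Build an OAI-PMH ListRecords request asking for Dublin Core (oai_dc) records."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

def extract_datasets(response_xml: str) -> list:
    """Collect the dc:identifier and dc:title of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    datasets = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        datasets.append({
            "identifier": dc.findtext("dc:identifier", default="", namespaces=NS),
            "title": dc.findtext("dc:title", default="", namespaces=NS),
        })
    return datasets

# Hand-written, hypothetical response used instead of a live repository call.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:identifier>https://doi.org/10.1234/example</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai"))
print(extract_datasets(SAMPLE))
```

A real harvester would also follow the `resumptionToken` that OAI-PMH uses to page through large result sets; it is omitted here to keep the sketch short.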
Planning phase - demo: https://github.com/IrinaAvram/DMPGenerator
Project and post-project phase - demo 1: https://github.com/mdietrichstein/digitalpreservation-dmp
Project and post-project phase - demo 2: https://github.com/alexschwarzresearch/DMPlanner
maDMPs use cases
BPMN process - overview: the use cases are modelled in Business Process Model and Notation (BPMN); we defined 10 workflows.
Start DMP
Get Cost / Storage
Exercise: work in groups, analyse the BPMN processes and play with the tools (http://tool2.rda-ws-tpdl2018.sysresearch.org, http://tool3.rda-ws-tpdl2018.sysresearch.org). Discuss: What would you change in the workflows? Do the workflows fit into the context of your organisation/domain? Which other stakeholder needs should be addressed? Which other systems can be used? What else could be automated? Report back.
Coffee break
https://goo.gl/LPqYag: add your notes and your registration data (name, tpdl-id).
Models for maDMPs – Exercise 2
Generated landing page of a DMP https://oblassers.github.io/fair-data-science/
Metadata for specific data objects
Metadata - details (excerpt from https://github.com/oblassers/fair-data-science/blob/master/dmp.json):
{
  "@type": "dmp:Container",
  "github": "https://github.com/oblassers/fair-data-science/blob/master/Dockerfile",
  "dc:title": "Dockerfile",
  "dmp:hasIntelectualPropertyRights": {
    "dcterms:license": "https://opensource.org/licenses/MIT"
  },
  "dmp:hasMetadata": {
    "dcterms:description": "Dockerfile",
    "premis:hasObjectCharacteristics": {
      "premis:fixity": {
        "premis:hasMessageDigestAlgorithm": "premis:Fixity:SHA-256",
        "premis:messageDigest": "a16c7c70cccd3b706d0e64038675a0b302c6250a159fd27b4f069565e1464797"
      },
      "dmp:hasDataVolume": "103 bytes"
    }
  }
}
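Fixity and volume entries like those in the dmp.json excerpt can be generated rather than typed by hand. A minimal sketch, assuming `dmp:hasDataVolume` sits next to `premis:fixity` as the excerpt suggests; `describe_file` and the file content are hypothetical, and the key names are copied from the excerpt (including the spelling of `dmp:hasIntelectualPropertyRights`).

```python
import hashlib
import json

def describe_file(name: str, content: bytes, license_url: str) -> dict:
    """Build a metadata entry shaped like the dmp.json excerpt for one file."""
    return {
        "@type": "dmp:Container",
        "dc:title": name,
        # Key spelled exactly as in the example file.
        "dmp:hasIntelectualPropertyRights": {"dcterms:license": license_url},
        "dmp:hasMetadata": {
            "dcterms:description": name,
            "premis:hasObjectCharacteristics": {
                "premis:fixity": {
                    "premis:hasMessageDigestAlgorithm": "premis:Fixity:SHA-256",
                    # Hex-encoded SHA-256 digest of the file content.
                    "premis:messageDigest": hashlib.sha256(content).hexdigest(),
                },
                "dmp:hasDataVolume": f"{len(content)} bytes",
            },
        },
    }

# Hypothetical file content; a real tool would read the bytes from the repository.
entry = describe_file("Dockerfile", b"FROM python:3.11-slim\n",
                      "https://opensource.org/licenses/MIT")
print(json.dumps(entry, indent=2))
```

Computing the digest from the stored bytes means the maDMP can later be used to verify that the preserved file has not changed.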
User story consultation. Goals: identify stakeholders at each lifecycle stage, define which information they provide, and define which information they expect. Template: As a <stakeholder>, I want <goal> so that <reason>. Example: As a researcher, I want to inform repository operator on the amount of data in the planning phase, so that they provide information on costs. https://github.com/RDA-DMP-Common/user-stories/
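Because the stories follow a fixed template, they can be split into stakeholder, goal and reason automatically, e.g. before labelling them. A minimal sketch, assuming stories follow the template closely; the regular expression and `parse_user_story` are hypothetical and are not the consultation's own tooling.

```python
import re
from typing import Optional

# Template: "As a <stakeholder>, I want <goal> so that <reason>."
# The comma before "so that" is optional because the example story uses one.
USER_STORY = re.compile(
    r"^As an? (?P<stakeholder>[^,]+), I want (?P<goal>.+?),? so that (?P<reason>.+?)\.?$"
)

def parse_user_story(text: str) -> Optional[dict]:
    """Split a user story into stakeholder, goal and reason; None if it does not fit."""
    match = USER_STORY.match(text.strip())
    return match.groupdict() if match else None

story = ("As a researcher, I want to inform repository operator on the amount of data "
         "in the planning phase, so that they provide information on costs.")
print(parse_user_story(story))
```

Stories that return `None` are exactly the ones a human reviewer would need to rephrase before they can be grouped.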
User story consultation (https://github.com/RDA-DMP-Common/user-stories/): 100+ issues defined, with inputs from Europe and Australia, from individuals and workshops.
User story labelling (https://github.com/RDA-DMP-Common/user-stories/projects/2): user stories were reviewed by the chairs and authors, classified as in scope (useful for the model definition) or out of scope (often referring to the ecosystem or practices; important, but not directly for the common data model), and labelled.
User story labelling (https://github.com/RDA-DMP-Common/user-stories/wiki): 3 major categories (colours): stakeholders involved; project phase; subject of information conveyed (access control, volume, financial, licensing, metadata, repository, security, storage, etc.).
User story visualisation (https://goo.gl/znBL3F): an interactive visualisation (changes on GitHub are visible immediately) that shows relations between stakeholders, phases and information.
From user stories to requirements (https://docs.google.com/document/d/1sWVy0Rqj9fGsjs6GyFnBd3fH6XF2088zjK8U-1wLq4c/edit?usp=sharing): refactoring of user stories. Goal: finding overlaps, gaps and duplicates. Example: Metadata taxonomy/classification [14, 11]; Links to metadata of the real data [89, 39]; Funder information [7]; Link publications to data [55]; Authorship [88]; Multilingual metadata [65]; Include raw metadata directly in the model [91, 85]. The ‘yellow’ label is the one used to classify the user stories; each entry is a short summary of what the user stories are about (a more specific requirement); the numbers in brackets are the IDs of the user stories (to keep the connection to the GitHub consultation).
Requirements grouping: similar requirements exist under different labels; for example, information on the author of the DMP is relevant for both administrative activities and reuse. We split the requirements and grouped them using five categories: Administrative, Roles and Responsibilities; Data; Infrastructure; Security, Privacy and Access Control; Policies, legal and ethical aspects.
Requirements grouping example (Data)
Format: Format [80, 12, 99, 62, 67, 54, 80]
Volume: Data size estimate [5, 77, 80, 100]; For specific type of data [62]; Data size real [54]
Provenance [54]
Metadata taxonomy/classification [14, 11]; Links to metadata of the real data [89, 39]; Link publications to data [55]; Authorship [88]; Multilingual metadata [65]; Include raw metadata directly in the model [91, 85]
Reuse: Links to (meta-)data location [89, 90, 56, 39, 60]; Repository [42]; Persistent identifier for data [92]; Link publications to data [55, 88]; Link to License/Contract allowing data usage/storing [56]
Note: we did not move all requirements falling under a specific label, only the subset relevant in this context (in the given example, relevant for data description). Other requirements for Reuse were put into other categories.
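Since every requirement keeps the IDs of the user stories it came from, part of the search for overlaps and duplicates can be mechanised. A minimal sketch over an excerpt of the IDs in the grouping example (the dictionary layout and both function names are hypothetical): `repeated_ids` flags an ID listed twice under one requirement, such as 80 under Format, and `shared_ids` lists stories that feed several requirements.

```python
from collections import defaultdict

# Excerpt of the "Requirements grouping example (Data)": requirement -> user-story IDs.
REQUIREMENTS = {
    "Format": [80, 12, 99, 62, 67, 54, 80],
    "Data size estimate": [5, 77, 80, 100],
    "For specific type of data": [62],
    "Data size real": [54],
    "Provenance": [54],
    "Authorship": [88],
    "Links to metadata of the real data": [89, 39],
    "Links to (meta-)data location": [89, 90, 56, 39, 60],
    "Link publications to data": [55, 88],
    "Link to License/Contract allowing data usage/storing": [56],
}

def repeated_ids(requirements: dict) -> dict:
    """IDs listed more than once under the same requirement (candidate duplicates)."""
    result = {}
    for name, ids in requirements.items():
        dupes = sorted({story_id for story_id in ids if ids.count(story_id) > 1})
        if dupes:
            result[name] = dupes
    return result

def shared_ids(requirements: dict) -> dict:
    """User-story IDs that feed more than one requirement (candidate overlaps)."""
    owners = defaultdict(set)
    for name, ids in requirements.items():
        for story_id in ids:
            owners[story_id].add(name)
    return {story_id: sorted(names)
            for story_id, names in sorted(owners.items()) if len(names) > 1}

print(repeated_ids(REQUIREMENTS))
for story_id, names in shared_ids(REQUIREMENTS).items():
    print(story_id, names)
```

Shared IDs are not necessarily errors: a single story (e.g. on data size) can legitimately inform several requirements, so the output is a list of candidates for human review.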
Next steps: the 1st consultation (user stories) went broad and helped us define the scope of maDMPs: what information should a maDMP contain? Who provides and uses this information? The 2nd consultation will go deep: how do we model specific requirements? Which specific fields are needed? Which models exist?
Consultation 2 – ‘going deep’ (https://goo.gl/DRieP4): 5 documents to collect requirements, models, specific fields, etc., one per category: Administrative, Roles and Responsibilities; Data; Infrastructure; Security, Privacy and Access Control; Policies, legal and ethical aspects.
Exercise: work in groups. Discuss: Which specific fields are needed? What models/dictionaries/standards can be reused? Add your ideas to the documents (https://goo.gl/DRieP4). Report back.
Wrap-up
Wrap-up: What can we automate? Which services will always be performed manually? How broadly can we integrate: university-wide? Country-wide? Europe-wide? Which services can we share?
10 principles for maDMPs: http://doi.org/10.5281/zenodo.1172673
Defining requirements for machine-actionable Data Management Plans: http://ifs.tuwien.ac.at/~miksa/papers/2018-iPres-maDMPs.pdf
RDA Plenary meeting
Thank you! https://www.rd-alliance.org/groups/dmp-common-standards-wg
Data model: a top-level vocabulary based on DMP themes, extended by domain-specific standards. OWL ontology: https://purl.org/madpms
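Extending a top-level vocabulary with existing standards works through prefixes such as `dc:`, `foaf:` and `madmp:`, each standing for a namespace IRI. A minimal sketch of how prefixed keys expand to full IRIs, similar in spirit to what a JSON-LD `@context` does for compact IRIs; it is a simplified stand-in, not a JSON-LD processor. The `dc`, `dcterms` and `foaf` namespace IRIs are the published ones; the `madmp` IRI is a hypothetical placeholder derived from the ontology URL.

```python
# Namespace IRIs for the reused vocabularies. The dc, dcterms and foaf IRIs are the
# published ones; the madmp IRI is a hypothetical placeholder based on the ontology URL.
CONTEXT = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "madmp": "https://purl.org/madpms#",
}

def expand(term: str, context: dict = CONTEXT) -> str:
    """Expand a compact 'prefix:name' term into a full IRI; other terms are returned unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term

def expand_keys(node):
    """Recursively expand the prefixed keys of a JSON-like structure (values are left alone)."""
    if isinstance(node, dict):
        return {expand(key): expand_keys(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_keys(item) for item in node]
    return node

creator = {"foaf:name": "John Smith", "@id": "orcid.org/0000-1111-2222-3333",
           "madmp:institution": "AT-TU-Wien"}
print(expand_keys({"dc:creator": [creator]}))
```

Because the reused terms expand to the IRIs of their original standards, other tools that understand Dublin Core or FOAF can interpret those fields without knowing the maDMP vocabulary.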
Example