Domain Specific Extensions for Machine-actionable Data Management Plans
Tomasz Miksa, SBA Research, Wien, Austria
João Cardoso, INESC-ID, Lisboa, Portugal
Paul Walk, Antleaf Ltd, United Kingdom
29/12/2018 - @resdatall
Agenda
- You learn about us
- We learn about you
- Exercise 1 – processes for maDMPs
  - demo of new tools and processes
  - work in groups + discussion
- Coffee break
- Exercise 2 – models for maDMPs
  - demo of new tools and models
- Wrap-up

You learn about us
Data Management Plans (DMPs)
- manually created text documents
- considered as bureaucracy
- created too late
- vague
- depend on the human factor
  - scrupulousness
  - awareness
Data Management Plans
- How to discover these tools?
- Which one do I need to use?
- Why do I have to provide the same information again?
- Why haven’t they consulted us before?
- Who is going to pay for this?
- We don’t have enough people for that!
Research data lifecycle
Stakeholders involved in research data management
- require information at certain stages
- can provide information if requested at the proper stage
Many problems can be avoided when
- timing is right
- information flow is ensured
=> many stakeholders are involved; they need information and can provide it if asked at the proper stage
Automated Data Management Workflow
Machine-actionable DMPs
- living documents
- automate data management
- collect information from systems
- trigger actions in systems
- facilitate validation
This requires
- well-defined RDM workflows
- data management infrastructure
- common data model
"dc:creator":[ { "foaf:name":"John Smith", "madmp:institution":"AT-TU-Wien" } ],
Example
Current DMPs – model questionnaires (currently available, not very useful)
    <administrative_data>
      <question>Who will be the Principal Investigator?</question>
      <answer>The PI will be John Smith from our university.</answer>
    </administrative_data>
Machine-actionable DMPs – model information
    "dc:creator":[ { "foaf:name":"John Smith", "madmp:institution":"AT-Vienna-University-of-Technology" } ],
- Reuse existing standards, e.g. Dublin Core, PREMIS, etc. (dc, foaf)
- Use PIDs whenever possible, e.g. ORCID
- Use controlled vocabularies when possible, or PIDs for the institution (AT-Vienna-University-of-Technology)
- Develop own concepts and vocabularies only when needed (madmp)
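To make the contrast concrete, here is a minimal Python sketch of what a program can do with the structured record that it cannot reliably do with the free-text answer. The field names are the ones shown in the example; the helper function `creators_by_institution` is hypothetical and only illustrates the idea.

```python
import json

# Questionnaire-style answer: the facts are buried in prose and would
# need text mining to recover.
free_text_answer = "The PI will be John Smith from our university."

# Machine-actionable fragment, using the field names from the example.
madmp_fragment = json.loads("""
{
  "dc:creator": [
    {
      "foaf:name": "John Smith",
      "madmp:institution": "AT-Vienna-University-of-Technology"
    }
  ]
}
""")


def creators_by_institution(dmp: dict) -> dict:
    """Group creator names by their institution identifier."""
    grouped = {}
    for creator in dmp.get("dc:creator", []):
        institution = creator.get("madmp:institution", "unknown")
        grouped.setdefault(institution, []).append(creator["foaf:name"])
    return grouped


print(creators_by_institution(madmp_fragment))
```

Because the institution is an identifier rather than the phrase "our university", a system could look it up in a controlled vocabulary without guessing.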
Machine-actionable DMPs
What is RDA
Research Data Alliance
- community-driven organization
- 6,000 members from 130 countries
- different stakeholders
Plenary meetings
Interest Groups (IGs)
- Active DMPs
Working Groups (WGs)
- DMP Common Standards
DMP Common Standards - Outputs
Common data model for machine-actionable DMPs
- to model information from standard DMPs
- NOT a template
- NOT a questionnaire
- modular design
  - core set of elements
  - domain specific extensions
Reference implementations
- ready to use models
- JSON, XML, RDF, etc.
Guidelines for adoption of the common data model
- requirements for supporting systems
- pilot studies
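One way to picture the modular design is a core record that domain extensions can add to but never overwrite. The Python sketch below is only an illustration under assumptions: the set of core prefixes, the function `with_extension`, and the `lab:instrument` field are all made up for this example and are not part of the working group's outputs.

```python
from typing import Any, Dict

# Prefixes treated as "core" in this sketch (an assumption for illustration).
CORE_PREFIXES = {"dc", "dcterms", "foaf", "madmp"}


def with_extension(core: Dict[str, Any], extension: Dict[str, Any],
                   prefix: str) -> Dict[str, Any]:
    """Return a new record: the core elements plus one domain extension.

    Every extension key must carry the extension's own prefix, so an
    extension can add fields but cannot replace a core element.
    """
    if prefix in CORE_PREFIXES:
        raise ValueError(f"prefix {prefix!r} is reserved for core elements")
    for key in extension:
        if not key.startswith(prefix + ":"):
            raise ValueError(f"{key!r} must use the {prefix!r} prefix")
    return {**core, **extension}


core_record = {
    "dc:title": "Example DMP",
    "dc:creator": [{"foaf:name": "John Smith"}],
}

# "lab:instrument" is a hypothetical domain-specific field.
extended = with_extension(core_record, {"lab:instrument": "microscope-01"}, "lab")
print(sorted(extended))
```

Keeping extensions in their own namespace means a system that only understands the core set can still read the record and ignore the rest.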
Workshop objectives
Evaluate processes that read from/write to maDMPs
- which systems can be integrated?
- how do maDMPs connect to the activities of various stakeholders?
Identify how information must be modelled in maDMPs
- which specific fields are needed?
- what models/dictionaries/standards can be reused?
Further questions
- identify stakeholders at each lifecycle stage, e.g. who can provide cost estimations when planning?
- identify how available information changes over the lifetime of a DMP, e.g. when is the exact size of data important and when not?
- identify how the need for information changes over the lifetime of a DMP, e.g. is a persistent identifier important to funders in the planning phase or when a project ends?
We learn about you

Introduction round
- Your name
- Your location
- Your role
- What’s the most important thing you want to get out of this meeting?

Processes for maDMPs – Exercise 1
Horizon 2020 DMP survey report
Horizon 2020 template for Data Management Plans (Version 1.0.0) [Data set]. Zenodo.
Planning phase
Goal: get estimations and recommendations (which are feasible to implement later)
[Figure: workflow diagram with the labels BASIC INFO, ADMINISTRATIVE DATA, ANALYSE DATA, FIND REPOSITORY, SELECT LICENSE, GENERATE DMP; example user John Smith]
Project and Post-project phases
Goal: update DMP with real information by re-using (linking) information provided elsewhere
[Figure: workflow diagram with the labels OAI-PMH, BASIC INFO, ADMINISTRATIVE DATA, GET METADATA, PRESERVATION, GENERATE DMP; example values John Smith, 10 YEARS]
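The GET METADATA step can be sketched with the standard OAI-PMH `GetRecord` request and the mandatory `oai_dc` metadata format. The Python below is a minimal illustration, not the tooling shown in the demos: the repository URL, the record identifier and the sample response are invented, and the parsing runs on the inline sample so that no network access is needed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def get_record_url(base_url: str, identifier: str) -> str:
    """Build an OAI-PMH GetRecord request for Dublin Core metadata."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": "oai_dc",
    })
    return f"{base_url}?{query}"


def extract_dc(response_xml: str) -> dict:
    """Collect the Dublin Core elements of a GetRecord response."""
    root = ET.fromstring(response_xml)
    fields = {}
    for element in root.iterfind(".//oai_dc:dc/*", NS):
        name = element.tag.split("}", 1)[1]  # drop the "{namespace}" part
        fields.setdefault("dc:" + name, []).append((element.text or "").strip())
    return fields


# Invented sample response, trimmed to the parts used here.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:creator>John Smith</dc:creator>
          <dc:rights>CC-BY-4.0</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""

# Hypothetical repository endpoint and identifier.
print(get_record_url("https://repository.example.org/oai", "oai:example.org:1234"))
print(extract_dc(SAMPLE_RESPONSE))
```

The extracted fields could then be linked from the DMP instead of being typed in again, which is the point of this phase.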
Planning phase - demo
Project and post-project phase - demo 1
Project and post-project phase - demo 2
maDMPs use cases
BPMN process - overview
Business Process Model and Notation (BPMN)
- 10 workflows defined
Start DMP
Get Cost / Storage
Exercise
Work in groups
- analyse BPMN processes
- play with tools
Discuss
- what would you change in the workflows?
- do the workflows fit into the context of your organisation/domain?
- which other stakeholder needs should be addressed?
- which other systems can be used?
- what else could be automated?
Report back
Coffee break

Add your notes
Add your registration data (name, tpdl-id)

Models for maDMPs – Exercise 2
Generated landing page of a DMP
Metadata for specific data objects
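Object-level metadata such as fixity and data volume can be derived from the file itself rather than typed in by hand. The Python sketch below computes a SHA-256 digest and a byte count and arranges them under the `premis:` and `dmp:` field names used in this deck. The function name `describe_object` and the exact nesting of `dmp:hasDataVolume` are assumptions made for illustration.

```python
import hashlib


def describe_object(title: str, content: bytes) -> dict:
    """Build a small metadata record with fixity and volume for one object."""
    digest = hashlib.sha256(content).hexdigest()
    return {
        "dc:title": title,
        "dmp:hasMetadata": {
            "premis:hasObjectCharacteristics": {
                "premis:fixity": {
                    "premis:hasMessageDigestAlgorithm": "premis:Fixity:SHA-256",
                    "premis:messageDigest": digest,
                }
            },
            # Placement of the volume field is an assumption of this sketch.
            "dmp:hasDataVolume": f"{len(content)} bytes",
        },
    }


# Invented file content standing in for a real Dockerfile.
record = describe_object("Dockerfile", b"FROM python:3\n")
print(record["dmp:hasMetadata"]["dmp:hasDataVolume"])
```

A repository or preservation system can later recompute the digest and compare it with the stored value to check that the object is unchanged.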
Metadata - details
{
  "@type": "dmp:Container",
  "github": "[value not preserved]",
  "dc:title": "Dockerfile",
  "dmp:hasIntelectualPropertyRights": {
    "dcterms:license": "[value not preserved]"
  },
  "dmp:hasMetadata": {
    "dcterms:description": "Dockerfile",
    "premis:hasObjectCharacteristics": {
      "premis:fixity": {
        "premis:hasMessageDigestAlgorithm": "premis:Fixity:SHA-256",
        "premis:messageDigest": "a16c7c70cccd3b706d0e a0b302c6250a159fd27b4f069565e "
      }
      "dmp:hasDataVolume": "103 bytes"
[excerpt ends here in the source]
User story consultation
Goals
- identify stakeholders at each lifecycle stage
- define which information they provide
- define which information they expect
Template: As a <stakeholder>, I want <goal> so that <reason>.
Example: As a researcher, I want to inform the repository operator about the amount of data in the planning phase, so that they provide information on costs.
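Because the user stories follow a fixed template, they can be split into stakeholder, goal and reason automatically, which helps when many stories need to be labelled. The Python sketch below is one possible way to do this with a regular expression; the pattern and the function name `parse_user_story` are illustrative and were not part of the consultation tooling.

```python
import re
from typing import Optional

# "As a <stakeholder>, I want <goal> so that <reason>."
STORY_PATTERN = re.compile(
    r"^As an? (?P<stakeholder>.+?), I want (?P<goal>.+?),? so that (?P<reason>.+?)\.?$",
    re.IGNORECASE,
)


def parse_user_story(text: str) -> Optional[dict]:
    """Split a user story into its three parts, or return None if it does not fit."""
    match = STORY_PATTERN.match(text.strip())
    return match.groupdict() if match else None


story = (
    "As a researcher, I want to inform the repository operator about the amount "
    "of data in the planning phase, so that they provide information on costs."
)
print(parse_user_story(story))
```

Stories that do not match the template return `None`, so they can be set aside for manual review instead of being misclassified.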
User story consultation
- 100+ issues defined
- inputs from Europe and Australia
- inputs from individuals and workshops

User story labelling
Reviewed by chairs and authors
- classified
  - in scope – useful for model definition
  - out of scope – often referring to the ecosystem, practices – important but not directly for the common data model
- labelled

User story labelling
3 major categories (colours)
- stakeholders involved
- project phase
- subject of information conveyed
  - access control
  - volume
  - financial
  - licensing
  - metadata
  - repository
  - security
  - storage
  - etc.
User story visualisation
- interactive visualisation - changes on GitHub are visible immediately
- shows relations between stakeholders, phases and information
From user stories to requirements
Refactoring of user stories
Goal: finding overlaps, gaps, duplicates
Example below
Metadata
- taxonomy/classification [14, 11]
- Links to metadata of the real data [89, 39]
- Funder information [7]
- Link publications to data [55]
- Authorship [88]
- Multilingual metadata [65]
- Include raw metadata directly in the model [91, 85]
How to read the example
- "Metadata": the ‘yellow’ label used to classify user stories
- list entries: short summary of what user stories are about – more specific requirements
- numbers in brackets: IDs of user stories (to keep connection to the GitHub consultation)
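Keeping the user story IDs next to each requirement makes it possible to trace in both directions. As a minimal Python sketch, the requirements from the example above can be stored as a mapping and inverted to see which requirements each user story contributed to. The variable and function names are illustrative only.

```python
# Requirements from the example, each with the IDs of its user stories.
requirements = {
    "Metadata taxonomy/classification": [14, 11],
    "Links to metadata of the real data": [89, 39],
    "Funder information": [7],
    "Link publications to data": [55],
    "Authorship": [88],
    "Multilingual metadata": [65],
    "Include raw metadata directly in the model": [91, 85],
}


def stories_to_requirements(reqs: dict) -> dict:
    """Invert the mapping: user story ID -> list of requirements it supports."""
    index = {}
    for requirement, story_ids in reqs.items():
        for story_id in story_ids:
            index.setdefault(story_id, []).append(requirement)
    return index


index = stories_to_requirements(requirements)
print(index[7])
print(sorted(index))
```

With the full set of requirements loaded, the same index would show user stories that feed several requirements, which is one way to spot the overlaps and duplicates the refactoring is looking for.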
Requirements grouping
Similar requirements exist under different labels
Example: information on the author of the DMP is relevant for
- administrative activities
- reuse
We split requirements and grouped them using five categories
- Administrative, Roles and Responsibilities
- Data
- Infrastructure
- Security, Privacy and Access Control
- Policies, legal and ethical aspects
Requirements grouping example (Data)
Format
- Format [80, 12, 99, 62, 67, 54, 80]
Volume
- Data size estimate [5, 77, 80, 100]
- For specific type of data [62]
- Data size real [54]
Provenance [54]
Metadata
- taxonomy/classification [14, 11]
- Links to metadata of the real data [89, 39]
- Link publications to data [55]
- Authorship [88]
- Multilingual metadata [65]
- Include raw metadata directly in the model [91, 85]
Reuse
- Links to (meta-)data location [89, 90, 56, 39, 60]
- Repository [42]
- Persistent identifier for data [92]
- Link publications to data [55, 88]
- Link to License/Contract allowing data usage/storing [56]
Note: we did not move all requirements falling under a specific label, but only the subset that is relevant in this context. In the given example, that is the subset relevant for data description. Other requirements for Reuse were put into other categories.
Next steps
1st consultation (user stories) went broad
- helped us define the scope of the maDMPs
- what information should a maDMP contain?
- who provides and uses this information?
2nd consultation will go deep
- how do we model specific requirements?
- which specific fields are needed?
- which models exist?
Consultation 2 – ‘going deep’
5 documents to collect requirements, models, specific fields, etc.
- Administrative, Roles and Responsibilities
- Data
- Infrastructure
- Security, Privacy and Access Control
- Policies, legal and ethical aspects

Exercise
Work in groups
Discuss
- which specific fields are needed?
- what models/dictionaries/standards can be reused?
Add your ideas to the documents
Report back
Wrap-up
Wrap-up
- What can we automate?
- Which services will always be performed manually?
- How broadly can we integrate?
  - university-wide?
  - country-wide?
  - Europe-wide?
- Which services can we share?
10 principles for maDMPs
Defining requirements for machine-actionable Data Management Plans
maDMPs.pdf
RDA Plenary meeting
standards-wg
Thank you!
standards-wg
Data model
- Top level vocabulary
- Based on DMP themes
- Extended by domain specific standards
- OWL ontology :