Tomasz Miksa, SBA Research, Wien, Austria


1 Domain Specific Extensions for Machine-actionable Data Management Plans
Tomasz Miksa, SBA Research, Wien, Austria
João Cardoso, INESC-ID, Lisboa, Portugal
Paul Walk, Antleaf Ltd, United Kingdom
29/12/2018 - @resdatall

2 www.rd-alliance.org - @resdatall
Agenda
You learn about us
We learn about you
Exercise 1 – processes for maDMPs
  demo of new tools and processes
  work in groups + discussion
Coffee break
Exercise 2 – models for maDMPs
  demo of new tools and models
Wrap-up

3 You learn about us

4 Data Management Plans (DMPs)
manually created text documents
considered as bureaucracy
created too late
vague
depend on human factor
  scrupulousness
  awareness

5 Data Management Plans
How to discover these tools?
Which one do I need to use?
Why do I have to provide the same information again?
Why haven’t they consulted us before?
Who is going to pay for this?
We don’t have enough people for that!

6 Research data lifecycle
Stakeholders involved in research data management
  require information at certain stages
  can provide information if requested at the proper stage
Many problems can be avoided when
  the timing is right
  information flow is ensured
=> the many stakeholders who need information can also provide it, if asked at the proper stage

7 Automated Data Management Workflow

8 Machine-actionable DMPs
living documents
automate data management
  collect information from systems
  trigger actions in systems
  facilitate validation
This requires
  well-defined RDM workflows
  data management infrastructure
  common data model
"dc:creator":[
  {
    "foaf:name":"John Smith",
    "madmp:institution":"AT-TU-Wien"
  }
],
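The "facilitate validation" point can be illustrated with a minimal sketch. It assumes the fragment from the slide is wrapped in braces so that it parses as JSON; the rule set `REQUIRED_CREATOR_KEYS` and the function `validate_creators` are hypothetical and not part of any published maDMP tooling.

```python
import json

# Fragment from the slide, wrapped in braces so that it is valid JSON.
MADMP_FRAGMENT = """
{
  "dc:creator": [
    {
      "foaf:name": "John Smith",
      "madmp:institution": "AT-TU-Wien"
    }
  ]
}
"""

# Hypothetical rule: the keys every creator entry is expected to carry.
REQUIRED_CREATOR_KEYS = ("foaf:name", "madmp:institution")


def validate_creators(dmp: dict) -> list[str]:
    """Return a list of readable problems; an empty list means no problems."""
    problems = []
    creators = dmp.get("dc:creator", [])
    if not creators:
        problems.append("dc:creator is missing or empty")
    for index, creator in enumerate(creators):
        for key in REQUIRED_CREATOR_KEYS:
            if not creator.get(key):
                problems.append(f"dc:creator[{index}] lacks {key}")
    return problems


dmp = json.loads(MADMP_FRAGMENT)
problems = validate_creators(dmp)
print(problems or "fragment passes the checks")
```

Because the information is modelled as named fields rather than free text, a check like this can run automatically whenever the DMP changes.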

9 Example
Current DMPs – model questionnaires
<administrative_data>
  <question>Who will be the Principal Investigator?</question>
  <answer>The PI will be John Smith from our university.</answer>
</administrative_data>
Machine-actionable DMPs – model information
"dc:creator":[
  {
    "foaf:name":"John Smith",
    "madmp:institution":"AT-Vienna-University-of-Technology"
  }
],
dc, foaf – reuse existing standards
madmp – create own when needed
AT Vienna – use controlled vocabularies when possible, or PIDs for the institution
ORCID – use PIDs when possible
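The difference between the two representations can be sketched in a few lines of Python. Both snippets are taken from the slide (the JSON fragment is wrapped in braces so that it parses); the variable names are illustrative. The questionnaire yields a sentence that still has to be interpreted, whereas the machine-actionable form yields the name and institution directly.

```python
import json
import xml.etree.ElementTree as ET

QUESTIONNAIRE_XML = """
<administrative_data>
  <question>Who will be the Principal Investigator?</question>
  <answer>The PI will be John Smith from our university.</answer>
</administrative_data>
"""

MADMP_JSON = """
{
  "dc:creator": [
    {
      "foaf:name": "John Smith",
      "madmp:institution": "AT-Vienna-University-of-Technology"
    }
  ]
}
"""

# Questionnaire model: the answer is free text; extracting the name or the
# institution from it would need text mining or a human reader.
answer = ET.fromstring(QUESTIONNAIRE_XML.strip()).findtext("answer")

# Information model: each fact sits in its own field.
creator = json.loads(MADMP_JSON)["dc:creator"][0]

print(answer)
print(creator["foaf:name"], "|", creator["madmp:institution"])
```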

10 Example
Currently available – not very useful
<administrative_data>
  <question>Who will be the Principal Investigator?</question>
  <answer>The PI will be John Smith from our university.</answer>
</administrative_data>
Machine-actionable DMP
"dc:creator":[
  {
    "foaf:name":"John Smith",
    "madmp:institution":"AT-Vienna-University-of-Technology"
  }
],
Reuse existing standards, e.g. Dublin Core, PREMIS, etc.
dc, foaf – reuse existing standards
madmp – create own when needed
AT Vienna – use controlled vocabularies when possible, or PIDs for the institution
ORCID – use PIDs when possible

11 Example
Same example as on slide 10.
Callout: Use PIDs whenever possible, e.g. ORCID

12 Example
Same example as on slide 10.
Callout: Use controlled vocabularies

13 Example
Same example as on slide 10.
Callout: Develop own concepts and vocabularies only when needed

14 Machine-actionable DMPs

15 What is RDA
Research Data Alliance
  community-driven organization
  6,000 members from 130 countries
  different stakeholders
Plenary meetings
Interest Groups (IGs)
  Active DMPs
Working Groups (WGs)
  DMP Common Standards

16 DMP Common Standards - Outputs
Common data model for machine-actionable DMPs
  to model information from standard DMPs
  NOT a template
  NOT a questionnaire
  modular design
    core set of elements
    domain specific extensions
Reference implementations
  ready to use models
  JSON, XML, RDF, etc.
Guidelines for adoption of the common data model
  requirements for supporting systems
  pilot studies
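One way to picture the modular design is a core set of elements plus an extension that lives under its own prefix. The sketch below is only an illustration: the `bio:` prefix, its fields, the `CORE_PREFIXES` set and both helper functions are hypothetical, and the working group's actual model may organise extensions differently.

```python
import json

# Hypothetical core elements; the values are illustrative.
core = {
    "dc:title": "Example DMP",
    "dc:creator": [{"foaf:name": "John Smith"}],
}

# Hypothetical domain specific extension with its own prefix ("bio:").
bio_extension = {
    "bio:organism": "Arabidopsis thaliana",
    "bio:sampleCount": 48,
}

# Assumed set of prefixes that belong to the core model.
CORE_PREFIXES = {"dc", "foaf", "madmp"}


def merge(core_part: dict, *extensions: dict) -> dict:
    """Combine core and extensions; an extension may not redefine a key."""
    merged = dict(core_part)
    for extension in extensions:
        clash = merged.keys() & extension.keys()
        if clash:
            raise ValueError(f"extension redefines: {sorted(clash)}")
        merged.update(extension)
    return merged


def core_view(dmp: dict) -> dict:
    """Keep only the top-level keys whose prefix belongs to the core."""
    return {
        key: value
        for key, value in dmp.items()
        if key.split(":", 1)[0] in CORE_PREFIXES
    }


dmp = merge(core, bio_extension)
print(json.dumps(dmp, indent=2))        # full DMP, serialised as JSON
print(json.dumps(core_view(dmp)))       # what a core-only consumer would read
```

A system that understands only the core can ignore unknown prefixes, while a domain tool can read the extension fields as well.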


18 Workshop objectives
Evaluate processes that read from/write to maDMPs
  which systems can be integrated?
  how do maDMPs connect to the activities of various stakeholders?
Identify how information must be modelled in maDMPs
  which specific fields are needed?
  what models/dictionaries/standards can be reused?
Identify stakeholders at each lifecycle stage
  e.g. who can provide cost estimations when planning?
Identify how available information changes over the lifetime of a DMP
  e.g. when is the exact size of data important and when not?
Identify how the need for information changes over the lifetime of a DMP
  e.g. is a persistent identifier important to funders in the planning phase or when a project ends?

19 We learn about you

20 Introduction round
Your name
Your location
Your role
What’s the most important thing you want to get out of this meeting?

21 Processes for maDMPs
Exercise 1

22 Horizon 2020 DMP survey report
Horizon 2020 template for Data Management Plans (Version 1.0.0) [Data set]. Zenodo.

23 Planning phase
Goal: get estimations and recommendations (which are feasible to implement later)
Figure labels: BASIC INFO, ADMINISTRATIVE DATA, ANALYSE DATA, FIND REPOSITORY, SELECT LICENSE, GENERATE DMP; example entry: John Smith

24 Project and Post-project phases
Goal: update DMP with real information by re-using (linking) information provided elsewhere
Figure labels: OAI-PMH, BASIC INFO, ADMINISTRATIVE DATA, GET METADATA, PRESERVATION, GENERATE DMP; example entries: John Smith, 10 YEARS
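The slide names OAI-PMH next to a GET METADATA step. As a rough sketch of what such a step could look like, the code below builds an OAI-PMH `GetRecord` request URL and extracts the Dublin Core fields from a response. The endpoint `https://repository.example.org/oai`, the record identifier and the sample response are made up for illustration; no network call is made, and the demo shown in the workshop may work differently.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint
DC = "{http://purl.org/dc/elements/1.1/}"        # Dublin Core namespace


def get_record_url(identifier: str) -> str:
    """Build an OAI-PMH GetRecord request for unqualified Dublin Core."""
    query = urlencode(
        {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": "oai_dc"}
    )
    return f"{BASE_URL}?{query}"


# Hand-written sample of what a repository could return (illustrative only).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header><identifier>oai:repository.example.org:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:creator>John Smith</dc:creator>
          <dc:rights>CC-BY-4.0</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""


def dc_fields(response_xml: str) -> dict:
    """Collect all Dublin Core elements of a response as {"dc:name": [values]}."""
    root = ET.fromstring(response_xml.strip())
    fields = {}
    for element in root.iter():
        if element.tag.startswith(DC):
            fields.setdefault("dc:" + element.tag[len(DC):], []).append(element.text)
    return fields


print(get_record_url("oai:repository.example.org:1234"))
print(dc_fields(SAMPLE_RESPONSE))
```

The extracted fields could then be linked into the DMP instead of being typed in again by the researcher.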

25 Planning phase - demo

26 Project and post-project phase - demo 1

27 Project and post-project phase - demo 2

28 maDMPs use cases

29 BPMN process - overview
Business Process Modelling Notation (BPMN)
Defined 10 workflows

30 Start DMP

31 Get Cost / Storage

32 Exercise
Work in groups
  analyse BPMN processes
  play with tools
Discuss
  what would you change in the workflows?
  do the workflows fit into the context of your organisation/domain?
  which other stakeholder needs should be addressed?
  which other systems can be used?
  what else could be automated?
Report back

33 Coffee break

34 Add your notes
Add your registration data (name, tpdl-id)

35 Models for maDMPs
Exercise 2

36 Generated landing page of a DMP

37

38 Metadata for specific data objects

39 Metadata - details
{
  "@type": "dmp:Container",
  "github": "
  "dc:title": "Dockerfile",
  "dmp:hasIntelectualPropertyRights": {
    "dcterms:license": "
  },
  "dmp:hasMetadata": {
    "dcterms:description": "Dockerfile",
    "premis:hasObjectCharacteristics": {
      "premis:fixity": {
        "premis:hasMessageDigestAlgorithm": "premis:Fixity:SHA-256",
        "premis:messageDigest": "a16c7c70cccd3b706d0e a0b302c6250a159fd27b4f069565e "
      }
      "dmp:hasDataVolume": "103 bytes"

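Fixity and volume values like those on the "Metadata - details" slide can be derived from the file itself. The following is a minimal sketch using Python's standard library; the key names are copied from the slide, while the nesting of `dmp:hasDataVolume`, the function name `object_characteristics` and the temporary demo file are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def object_characteristics(path: Path) -> dict:
    """Compute a SHA-256 digest and the size in bytes for one file."""
    data = path.read_bytes()
    return {
        "premis:fixity": {
            "premis:hasMessageDigestAlgorithm": "premis:Fixity:SHA-256",
            "premis:messageDigest": hashlib.sha256(data).hexdigest(),
        },
        # Placement of this key next to the fixity block is an assumption.
        "dmp:hasDataVolume": f"{len(data)} bytes",
    }


# Demo on a throwaway file; the content is made up and is not the
# Dockerfile described on the slide.
with tempfile.TemporaryDirectory() as tmp:
    dockerfile = Path(tmp) / "Dockerfile"
    dockerfile.write_bytes(b"FROM python:3.12-slim\n")
    characteristics = object_characteristics(dockerfile)

print(characteristics)
```

Generating these values automatically removes one more item that a researcher would otherwise have to enter by hand.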
40 User story consultation
Goals
  identify stakeholders at each lifecycle stage
  define which information they provide
  define which information they expect
Template: As a <stakeholder>, I want <goal> so that <reason>.
Example: As a researcher, I want to inform repository operator on the amount of data in the planning phase, so that they provide information on costs.
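Because the user stories follow a fixed template, they can be split into their three parts mechanically. The sketch below uses a regular expression of our own making (`STORY_PATTERN` and `parse_user_story` are illustrative names, not part of the consultation tooling) and applies it to the example story from the slide.

```python
import re
from typing import Optional

# "As a <stakeholder>, I want <goal> so that <reason>."
# The comma before "so that" and the final full stop are optional.
STORY_PATTERN = re.compile(
    r"^As an? (?P<stakeholder>.+?), I want (?P<goal>.+?),? so that (?P<reason>.+?)\.?$"
)


def parse_user_story(text: str) -> Optional[dict]:
    """Return stakeholder, goal and reason, or None if the text does not fit."""
    match = STORY_PATTERN.match(text.strip())
    return match.groupdict() if match else None


story = parse_user_story(
    "As a researcher, I want to inform repository operator on the amount "
    "of data in the planning phase, so that they provide information on costs."
)
print(story)
```

Stories that do not match the template return `None`, which is a simple way to spot submissions that need manual review.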

41 User story consultation
100+ issues defined
inputs from Europe and Australia
inputs from individuals and workshops

42 User story labelling
Reviewed by chairs and authors
  classified
    in scope – useful for model definition
    out of scope – often referring to the ecosystem or practices; important, but not directly for the common data model
  labelled

43 User story labelling
3 major categories (colours)
  stakeholders involved
  project phase
  subject of information conveyed
    access control, volume, financial, licensing, metadata, repository, security, storage, etc.

44 User story visualisation
interactive visualisation - changes on GitHub are visible immediately
shows relations between stakeholders, phases and information

45 From user stories to requirements
Refactoring of user stories
Goal: finding overlaps, gaps, duplicates
Example below
Metadata taxonomy/classification [14, 11]
Links to metadata of the real data [89, 39]
Funder information [7]
Link publications to data [55]
Authorship [88]
Multilingual metadata [65]
Include raw metadata directly in the model [91, 85]

46 From user stories to requirements
Same example as on slide 45.
Callout: ‘yellow’ label used to classify user stories

47 From user stories to requirements
Same example as on slide 45.
Callout: short summary of what user stories are about – more specific requirements

48 From user stories to requirements
Same example as on slide 45.
Callout: IDs of user stories (to keep connection to the GitHub consultation)

49 Requirements grouping
Similar requirements exist under different labels
Example: information on the author of the DMP is relevant for
  administrative activities
  reuse
We split the requirements and grouped them into five categories
  Administrative, Roles and Responsibilities
  Data
  Infrastructure
  Security, Privacy and Access Control
  Policies, legal and ethical aspects

50 Requirements grouping example (Data)
Format
  Format [80, 12, 99, 62, 67, 54, 80]
Volume
  Data size estimate [5, 77, 80, 100]
  For specific type of data [62]
  Data size real [54]
Provenance [54]
Metadata taxonomy/classification [14, 11]
Links to metadata of the real data [89, 39]
Link publications to data [55]
Authorship [88]
Multilingual metadata [65]
Include raw metadata directly in the model [91, 85]
Reuse
  Links to (meta-)data location [89, 90, 56, 39, 60]
  Repository [42]
  Persistent identifier for data [92]
  Link publications to data [55, 88]
  Link to License/Contract allowing data usage/storing [56]
Note: we did not move all requirements falling under a specific label, but only a subset that is relevant in this context – in the given example, relevant for data description. Other requirements for Reuse were put into other categories.
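Finding overlaps and duplicates, as described for the refactoring step, amounts to inverting the mapping from requirements to user story IDs. The sketch below uses a subset of the requirements and IDs listed on this slide; the function names are illustrative, and treating each listed line as one requirement is an assumption about how the flattened slide was structured.

```python
from collections import defaultdict

# Requirement -> user story IDs, copied from the slide (subset).
requirements = {
    "Format": [80, 12, 99, 62, 67, 54, 80],
    "Data size estimate": [5, 77, 80, 100],
    "For specific type of data": [62],
    "Data size real": [54],
    "Provenance": [54],
}


def stories_to_requirements(reqs: dict) -> dict:
    """Invert the mapping: user story ID -> set of requirement names."""
    index = defaultdict(set)
    for name, story_ids in reqs.items():
        for story_id in story_ids:
            index[story_id].add(name)
    return index


def overlaps(reqs: dict) -> dict:
    """User stories that feed more than one requirement."""
    return {
        story_id: sorted(names)
        for story_id, names in stories_to_requirements(reqs).items()
        if len(names) > 1
    }


def duplicate_ids(reqs: dict) -> dict:
    """IDs that are listed more than once under the same requirement."""
    return {
        name: sorted({i for i in story_ids if story_ids.count(i) > 1})
        for name, story_ids in reqs.items()
        if len(set(story_ids)) < len(story_ids)
    }


print(overlaps(requirements))
print(duplicate_ids(requirements))
```

On this subset, stories 80, 62 and 54 each support several requirements, and ID 80 is listed twice under Format.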

51 Next steps
1st consultation (user stories) went broad
  helped us define the scope of the maDMPs
  what information should a maDMP contain?
  who provides and uses this information?
2nd consultation will go deep
  how do we model specific requirements?
  which specific fields are needed?
  which models exist?

52 Consultation 2 – ‘going deep’
5 documents to collect requirements, models, specific fields, etc.
  Administrative, Roles and Responsibilities
  Data
  Infrastructure
  Security, Privacy and Access Control
  Policies, legal and ethical aspects

53 Exercise
Work in groups
Discuss
  which specific fields are needed?
  what models/dictionaries/standards can be reused?
Add your ideas to the documents
Report back

54 Wrap-up

55 Wrap-up
What can we automate?
Which services will always be performed manually?
How broadly can we integrate?
  university-wide? country-wide? Europe-wide?
Which services can we share?

56 10 principles for maDMPs

57 Defining requirements for machine-actionable Data Management Plans
maDMPs.pdf

58 RDA Plenary meeting

59 Thank you!
https://www.rd-alliance.org/groups/dmp-common-standards-wg


61 Data model
Top level vocabulary
Based on DMP themes
Extended by domain specific standards
OWL ontology:

62 Example

