PaN-data WP7 - Integration Brian Matthews STFC-e-Science.


1 PaN-data WP7 - Integration Brian Matthews STFC-e-Science

2 Integration Workpackage
Last work-package to start
– M8 (January)
– Goes on to the end of project
Dependencies on outcomes of other WPs
– Users, Data, Software
Deliverables
– D7.1: Report on survey of publication repositories, cross-linking and long-term preservation (M12).
– D7.2: Proposal for integration of practices (M16).
– D7.3: Final report on standards for publication repositories, cross-linking and long-term preservation (M18)
STFC (4 SM), DLS (2 SM), others @ 0.5 SM
Early, so now general ideas on the work in the area.
– Get the right people together in advance
– Quite an open-ended work-package
– Start thinking

3 WP7 Development of standards for integration and cross-linking of outputs
Objectives
To foster the integration of the whole science lifecycle, focussing on linking of publications and data, interaction between institutional repositories of publications, packaging for long-term preservation, and services for search and reuse.
Methodology
Publications repositories complete the lifecycle of innovation. Linking to Users, Data and Software enables traceability of published results through the scientific process. Sharing of the final results provides a foundation for the next cycle of science, and packaging enables long-term preservation of the outputs of research.
Association of data with the publications resulting from it is a basis for preservation through Representation Information, a term from the OAIS standard (Open Archival Information System) meaning the information necessary to ensure continued understandability and usability of a digital resource. It is also a basis for reuse of data across diverse communities, since the supplementary information needed for continued understandability is also valuable for transfer across communities.
The European Support Action PARSE.Insight (of which STFC is WPL) is producing a roadmap for digital preservation in Europe, informed by a large-scale survey of attitudes and practices in a wide range of scientific disciplines. The roadmap includes components such as tools for creation of Representation Information, and will be taken into account in the project work.
Task 7.1: Review existing provision for publication repositories, citation recording and long-term preservation in use across the facilities and in the user community, including facility libraries. (M8-M12)
Task 7.2: Propose strategy on integration of practices across the community (M12-M16).
Task 7.3: Develop final proposal on integration of practices across the community (M17-M18).
(Note: the final workshop to disseminate the results of the work package takes place in WP3)
Deliverables
D7.1: Report on survey of publication repositories, cross-linking and long-term preservation (M12).
D7.2: Proposal for integration of practices (M16).
D7.3: Final report on standards for publication repositories, cross-linking and long-term preservation (M18)

4 Objective 7 – Integration and cross-linking of outputs
To foster the integration of the whole science lifecycle, focussing on linking of publications and data, interaction between institutional repositories of publications, packaging for long-term preservation, and services for search and reuse.

5 Desired Information Flow
[Figure: I2S2: An Idealised Scientific Research Activity Lifecycle Model. Key: Research Activity, Research Admin Activity, Publication Activity, Archive Activity, Information Flow.]
Activities labelled in the figure: Research Concept and/or Experiment Design; Start Project; Write Proposal (include DMP); Peer-review Proposal; Acquire Sample; Conduct Experiment; Generate, Create & Collect Raw Data; Process Raw Data into Derived Data; Analyse Derived Data; Interpret & Analyse Results; Prepare Manuscript; Prepare Supplementary Data; Peer Review; Publish Research Results; Write Usage Reports; Data Archive, Preservation & Curation; IPR, Embargo & Access Control; Appraisal & Quality Control; Discover & Access; Validate, Reuse & Repurpose Data; Reference Linking; Documentation, Metadata & Storage (Reference, Provenance, Context, Calibration etc.).
Information labelled in the figure: User registration data; Instrument allocation data etc.; Risk assessment data; other sample data; Raw, Correction & Calibration Data; Processed Data; Derived Data; Programs (generate customised software); Comments, annotations, ratings etc.; Papers, articles, presentations, reports; Research Outputs; Scholarly Knowledge; Publication Database.
Integration and linking via
– Common information exchange model
– Common tools, services and protocols

6 Facilities Lifecycle
Proposal: Scientist submits application for beamtime
Approval: Facility committee approves application
Scheduling: Facility registers, trains, and schedules scientist's visit
Experiment: Scientist visits, facility runs experiment
Data storage: Raw data filtered, cleansed and stored
Data analysis: Tools for processing made available
Record Publication: Subsequent publication registered with facility
Link
Why Link?
– Discovery of results
– Auditing of usage of facility
– Allowing greater reuse of data
– Validation of results

7 [Figure: Facility 1, Facility 2 and Facility 3 each show the same chain: Raw Data, Data Analysis, Analysed Data, Publication Data, Publications. Shared components shown: Capacity Storage, Data Repositories, Software Repositories, Publications Repositories, Raw Data Catalogue, Data Analysis, Analysed Data Catalogue, Publication Data Catalogue, Publications Catalogue.]
Single Infrastructure → Single User Experience

8 Objective 7 – Integration and cross-linking of outputs
To foster the integration of the whole science lifecycle, focussing on linking of publications and data, interaction between institutional repositories of publications, packaging for long-term preservation, and services for search and reuse.
Outcomes
1. promote the linking of publications,... to the data on which they are based,
2. foster the development of interaction between repositories of publications,...
3. work towards packaging the full scientific results of particular experiments for archival purposes,... aimed at the long-term preservation of the data and other results,
4. define search services... which will enable single searches..., and importantly will open up the possibility of reuse of data across different disciplines through the same mechanism of packaging for archival with the needed supplementary information for understanding and reuse.

9 Issues
Existing repositories
Data citation
Constructing and maintaining links
– Identifying users, data resources, software
– Federating and accessing linked infrastructure
– Linked Web of Data
Digital preservation
Packaging and access

10 Existing publication management systems
What existing methods do facilities use to track publications arising from work at their facilities?
– In house
– Libraries
– Public services
– Entry points

11 Citation of data
– Persistent Identifiers (e.g. DOIs)
– Standard ways of citing data
– Who do you cite?
– What do you cite? Raw data, derived data, data delivered to publishers
– Data policy

12 Linking publications and data
Find datasets in repositories that were used to derive publications. Find papers that were written from datasets.
– Can validate the results of the paper
– Can perform new secondary analyses
– Can judge the value of a data set from its use
– Can give credit to data providers, tracing usage
– Can also add forward links to papers, to evaluate their use.
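The two lookups on this slide (datasets behind a paper, papers written from a dataset) can be sketched as a small bidirectional index. This is only an illustration: the `LinkIndex` class and the identifier strings are hypothetical and do not come from any PaN-data system.

```python
from collections import defaultdict


class LinkIndex:
    """Hypothetical two-way index between paper and dataset identifiers."""

    def __init__(self):
        self._datasets_by_paper = defaultdict(set)
        self._papers_by_dataset = defaultdict(set)

    def link(self, paper_id, dataset_id):
        # Record the link in both directions so either side can be queried.
        self._datasets_by_paper[paper_id].add(dataset_id)
        self._papers_by_dataset[dataset_id].add(paper_id)

    def datasets_for(self, paper_id):
        # Datasets a paper was derived from (supports validation, reanalysis).
        return sorted(self._datasets_by_paper.get(paper_id, ()))

    def papers_for(self, dataset_id):
        # Papers written from a dataset (supports credit and usage tracing).
        return sorted(self._papers_by_dataset.get(dataset_id, ()))

    def usage_count(self, dataset_id):
        # A crude proxy for "the value of a data set from its use".
        return len(self._papers_by_dataset.get(dataset_id, ()))


# Illustrative identifiers only.
index = LinkIndex()
index.link("paper:A", "dataset:raw-1")
index.link("paper:A", "dataset:derived-1")
index.link("paper:B", "dataset:raw-1")
```

Storing each link twice keeps both lookups cheap; a real deployment would more likely hold the links in the repositories' own metadata.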

13 Constructing Links
Ideally the archives holding the data would be notified that a paper citing them had been submitted.
– Metadata associated with those records would be updated to reflect the citations.
– The metadata in the publication repository should also link to the metadata in the data archives and vice versa.
– It would be great if this notification could be done automatically.
Tedious to enter citations
"Forward citations" ("cited-by") are hard to track
Builds a citation graph
– Fits well with the notion of "Linked Web of Data"
– Could easily be extended to other components: derived data, software
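A minimal sketch of the automatic notification described on this slide, assuming a single in-memory graph stands in for both the publication repository and the data archives. The `Record` and `CitationGraph` names, the `kind` values and the identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str
    kind: str  # e.g. "paper", "raw_data", "derived_data", "software"
    cites: set = field(default_factory=set)     # backward links
    cited_by: set = field(default_factory=set)  # forward ("cited-by") links


class CitationGraph:
    """Hypothetical citation graph covering papers, data and software."""

    def __init__(self):
        self.records = {}

    def register(self, identifier, kind):
        return self.records.setdefault(identifier, Record(identifier, kind))

    def notify_submission(self, citing_id, cited_ids):
        # One notification updates both ends of every link, so the
        # forward citation never has to be entered by hand.
        citing = self.records[citing_id]
        for cited_id in cited_ids:
            citing.cites.add(cited_id)
            self.records[cited_id].cited_by.add(citing_id)

    def provenance(self, identifier):
        # Everything reachable by following "cites" links from a record.
        seen, stack = set(), [identifier]
        while stack:
            for target in self.records[stack.pop()].cites:
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


graph = CitationGraph()
graph.register("paper:1", "paper")
graph.register("derived:1", "derived_data")
graph.register("raw:1", "raw_data")
graph.register("software:1", "software")
graph.notify_submission("derived:1", ["raw:1", "software:1"])
graph.notify_submission("paper:1", ["derived:1"])
```

Because derived data and software are ordinary nodes, `graph.provenance("paper:1")` walks from the paper back through the derived data to the raw data and the software used.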

14 Preservation
Preservation policies and planning
– What data to preserve, for how long?
Procedures for managing preservation
– Persistent IDs
– Maintaining media
– Maintaining links
– Maintaining context
Representation information
Packaging preserved data for access to users

15 Access
Cross-searching
– Common metadata models
– Common services, e.g. TopCat front end on ICat
– Cross-searching
Complex data objects
– OAI-ORE
– SPARQL end-points
OAIS packages
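As a rough illustration of cross-searching over a common metadata model, the sketch below maps two differently shaped catalogue records onto one shared record type and searches them together. The field names, facilities "A" and "B", and sample records are invented for the example; this is not the ICat schema or the TopCat API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical common metadata model shared by all catalogues."""
    facility: str
    identifier: str
    title: str
    keywords: tuple


def from_facility_a(raw):
    # Facility A (invented) stores keywords as a list under "tags".
    return CommonRecord("A", raw["id"], raw["name"],
                        tuple(k.lower() for k in raw["tags"]))


def from_facility_b(raw):
    # Facility B (invented) stores keywords as one ";"-separated string.
    keywords = tuple(k.strip().lower() for k in raw["subject"].split(";"))
    return CommonRecord("B", raw["pid"], raw["title"], keywords)


def cross_search(records, term):
    # A single search over every catalogue once records share one model.
    term = term.lower()
    return [r for r in records
            if term in r.title.lower() or term in r.keywords]


catalogue_a = [
    {"id": "A-001", "name": "Neutron diffraction of ice", "tags": ["Diffraction", "Ice"]},
    {"id": "A-002", "name": "Muon study", "tags": ["Muon"]},
]
catalogue_b = [
    {"pid": "B-17", "title": "Protein crystal scan", "subject": "diffraction; protein"},
]

records = ([from_facility_a(r) for r in catalogue_a]
           + [from_facility_b(r) for r in catalogue_b])
hits = cross_search(records, "diffraction")
```

The per-facility adapter functions are the design point: each catalogue keeps its native format, and only the mapping into the common model has to be agreed.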

16 Tasks
Task 7.1: Review existing provision for publication repositories, citation recording and long-term preservation in use across the facilities and in the user community, including facility libraries. (M8-M12)
– D7.1: Report on survey of publication repositories, cross-linking and long-term preservation (M12).
Task 7.2: Propose strategy on integration of practices across the community (M12-M16).
– D7.2: Proposal for integration of practices (M16).
Task 7.3: Develop final proposal on integration of practices across the community (M17-M18)
– D7.3: Final report on standards for publication repositories, cross-linking and long-term preservation (M18)

17 Who should be involved?
All partners involved
– Representation from managers of records of publications (libraries)
Set up a wiki group to start thinking of issues and approaches
Evaluate user, data, software outputs for integration
Collect information on suitable publication repositories
Collect information on suitable initiatives and standards
– Data integration and linking
– Data preservation
– Persistent identifiers
– Data citation
Begin to evaluate for best practice
Ready to participate with outlines at M9 workshops

18 eCrystals

19 eCrystals citation management screen

20 Publishes a Trackback URI ePubs publication

21 Invoking the trackback Enters the trackback URI
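Slides 20 and 21 show a trackback URI being published and then invoked. Assuming the repositories accept the widely used TrackBack ping format (an HTTP POST of form-encoded `url`, `title`, `excerpt` and `blog_name` fields), a ping could be prepared as sketched below. Whether ePubs or eCrystals use exactly this format is an assumption, the URIs are placeholders, and the request is only built here, not sent.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request


def build_trackback_ping(trackback_uri, citing_url, title="", excerpt="", source_name=""):
    """Build (but do not send) a TrackBack-style ping request.

    `url` is the only required field in the TrackBack format; the other
    fields are optional and omitted when empty.
    """
    fields = {"url": citing_url}
    if title:
        fields["title"] = title
    if excerpt:
        fields["excerpt"] = excerpt
    if source_name:
        fields["blog_name"] = source_name
    body = urlencode(fields).encode("utf-8")
    return Request(
        trackback_uri,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


# Placeholder URIs: a data record's trackback URI and the citing paper's URL.
ping = build_trackback_ping(
    "https://data.example.org/trackback/123",
    "https://pubs.example.org/paper/456",
    title="A paper citing the dataset",
)
sent_fields = parse_qs(ping.data.decode("utf-8"))
```

Passing `ping` to `urllib.request.urlopen` would send it; the receiving repository could then add the citing paper to the data record's citation list.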

22 A citation of the paper

23 A Citation of the data

