Published by Easter Davis. Modified over 9 years ago.
1
#DPHEP: Status and Outlook
Sustainable Strategies for Long-Term DP at the Exa-scale
Jamie.Shiers@cern.ch
LHCC Referees Meeting
International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics
2
Overview
– Sustainable Strategy
– Collaboration Agreement
– Research Data Alliance
– H2020 (NSF?) Prospects
3
2020 Vision for LT DP in HEP
Long-term – e.g. LC timescales: disruptive change
– By 2020, all archived data (e.g. that described in the Blueprint, including LHC data) easily findable and fully usable by designated communities, with clear (Open) access policies and possibilities to annotate further
– Best practices, tools and services well run-in, fully documented and sustainable; built in common with other disciplines, based on standards
Vision achievable, but we are far from this today
4
Data Preservation Maturity Model
Level | Metric | Implications
4 | Reproducible results by “citizen scientists” | Desired(?) by funding agencies: people able to reproduce an analysis should be awarded “a degree” – beyond what can realistically be afforded?
3 | Reproducible results where consumer ≠ producer and outside immediate community | Stronger demonstration of long-term preservation. Knowledge stored is sufficient for a physicist outside the immediate community to reproduce results
2 | Reproducible results where consumer ≠ producer but within same “larger community”, e.g. LHC (ATLAS / CMS; CDF / D0, …) | Highly desirable for “minimal” long-term preservation. “Knowledge” stored is sufficient for a physicist from a different collaboration (but within same overall programme) to reproduce results
1 | Reproducible results where consumer = producer | Required during lifetime of collaboration
0 | N/A | Data is lost: logically or physically. This is probably the reality for the bulk of pre-DPHEP experiments (and even some of those??)
Scale (complexity) is probably “exponential”
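The maturity levels can be read as a simple rule: who the consumer is determines the minimum level an archive must reach. The following is only a minimal sketch of that reading, not part of the DPHEP model itself. The names `Consumer`, `required_level` and `can_reproduce` are hypothetical, and the sketch assumes that levels are cumulative (an archive at level 3 also serves levels 1 and 2), which the slide implies but does not state.

```python
from enum import IntEnum


class Consumer(IntEnum):
    """Who tries to reproduce a result; each value is the level that case needs."""
    PRODUCER = 1            # consumer = producer
    SAME_PROGRAMME = 2      # different collaboration, same "larger community"
    OUTSIDE_COMMUNITY = 3   # physicist outside the immediate community
    CITIZEN_SCIENTIST = 4   # "citizen scientists"


def required_level(consumer: Consumer) -> int:
    """Minimum maturity level needed for this consumer to reproduce results."""
    return int(consumer)


def can_reproduce(archive_level: int, consumer: Consumer) -> bool:
    """True if an archive at `archive_level` is sufficient for `consumer`.

    Assumption (not stated on the slide): levels are cumulative.
    Level 0 means the data is lost, so nobody can reproduce anything.
    """
    if not 0 <= archive_level <= 4:
        raise ValueError("maturity level must be between 0 and 4")
    return archive_level >= required_level(consumer)


if __name__ == "__main__":
    for consumer in Consumer:
        # Show which consumers a hypothetical level-2 archive would serve.
        print(consumer.name, can_reproduce(2, consumer))
```

Under these assumptions, a level-2 archive serves the producing collaboration and others in the same programme, but not physicists outside the community or citizen scientists.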
5
Software Preservation Maturity Model
Level | Metric | Implications
4 | Reproducible results by “citizen scientists” | Desired(?) by funding agencies: people able to reproduce an analysis should be awarded “a degree” – beyond what can realistically be afforded?
3 | Reproducible results where consumer ≠ producer and outside immediate community | Stronger demonstration of long-term preservation. Knowledge stored is sufficient for a physicist outside the immediate community to reproduce results
2 | Reproducible results where consumer ≠ producer but within same “larger community”, e.g. LHC (ATLAS / CMS; CDF / D0, …) | Highly desirable for “minimal” long-term preservation. “Knowledge” stored is sufficient for a physicist from a different collaboration (but within same overall programme) to reproduce results
1 | Reproducible results where consumer = producer | Required during lifetime of collaboration
0 | N/A | Data is lost: logically or physically. This is probably the reality for the bulk of pre-DPHEP experiments (and even some of those??)
REPRODUCIBLE RESULTS AFTER “PORTING” TO NEW ENVIRONMENT!
6
Sustainable Strategy
A document on a sustainable strategy for LTDP is available – discussed at DPHEP IB today
This version focuses on CERN (IT) – presented yesterday (attached to agenda: doc, ppt)
Some comments received (DESY, INFN)
– DESY comments included in current draft
– INFN: stress need for standards, e.g. for outreach activities based on data from multiple experiments
Intent is to update document to reflect activities of other “Collaboration Members”
7
Summary of Recommendations
8
ICFA Statement on LTDP
The International Committee for Future Accelerators (ICFA) supports the efforts of the Data Preservation in High Energy Physics (DPHEP) study group on long-term data preservation and welcomes its transition to an active international collaboration with a full-time project manager. It encourages laboratories, institutes and experiments to review the draft DPHEP Collaboration Agreement with a view to joining by mid- to late-2013.
ICFA notes the lack of effort available to pursue these activities in the short-term and the possible consequences on data preservation in the medium to long-term. We further note the opportunities in this area for international collaboration with other disciplines and encourage the DPHEP Collaboration to vigorously pursue its activities. In particular, the effort required to prepare project proposals must be prioritized, in addition to supporting on-going data preservation activities.
ICFA notes the important benefits of long-term data preservation to exploit the full scientific potential of the, often unique, datasets. This potential includes not only future scientific publications but also educational outreach purposes, and the Open Access policies emerging from the funding agencies.
15 March 2013
9
DPHEP Collaboration Agreement
A draft has been prepared by the CERN legal service, has been sent to ICFA and has been available to DPHEP since early 2013
Some comments have been received and integrated
AFAIK CERN, DESY, FNAL and SLAC are “ready” to sign
Target: prior to CHEP 2013 (RDA-2 might be better!)
Next steps: get legal services in touch with each other and complete process
CERN & DESY: defining activities as part of Collaboration
10
RDA Preservation WG
The RDA – strongly supported by EU, NSF, AU – is seen as an element of implementing the HLEG 2030 vision
A WG on DP was approved in May
– Chair: David Giaretta (APA, SCIDIP-ES, author of “Advanced DP”, ex-DCC, ex-STFC)
– Co-chair: JDS
The intent is to show progress by each RDA plenary (March, September) and co-ordinate international activities, identify candidate services for standardization, lobby for funding…
11
Component Breakdown
Can break this down into three distinct areas (OAIS reference model is somewhat more complex: this is a zeroth iteration)
– “Archive issues”
– Digital Libraries & “Adding Value” to data
– “Knowledge retention” – the Crux of the Matter
12
Archive Issues
We (HEP) have significant experience of 100PB+ distributed data stores
Plan is to coordinate long-term “bit preservation” issues via HEPiX
And with other disciplines, e.g. via IEEE MSST
× Sustainable models for long-term multi-disciplinary data archives still to be solved
H2020 funding targeted for this
13
Digital Libraries
Significant investment in this space, including multiple EU (and other) funded projects
No reason to believe that the issues will not be solved, nor that funding models will not exist, e.g. adapted from “traditional” libraries
Related topics: “linked data”, “adding value to data” – again with projects / communities
Should work closely with these projects / communities, not start new initiatives
14
Where to Invest – Summary
Tools and Services, e.g. Invenio: could be solved. (2-3 years?)
Archival Storage Functionality: should be solved. (i.e. “now”)
Support to the Experiments for DPHEP Levels 3-4: must be solved – but how?
15
Who Can Help?
Mobilize resources through existing structures:
– Research Data Alliance:
    Funding / strong interest from EU, US, AU, others
    Part of roadmap to “Riding the Wave” 2030 Vision
    STFC and DCC personnel strongly involved in setup
– WLCG:
    Efforts on “software re-design” for new architectures
    Experiment efforts on Software Validation (to be coordinated via DPHEP), building on DESY & others
– DPHEP:
    Coordination within HEP and with other projects / disciplines
National & International Projects
– H2020 / NSF funding lines
– National projects also play an important role
16
[Figure: Collaborative Data Infrastructure, from the “Riding the Wave” HLEG report. Labels: Trust; Data Curation; Data Generators; Users; Community Support Services; Common Data Services.]
User functionalities, data capture & transfer, virtual research environments
Data discovery & navigation, workflow generation, annotation, interoperability
Persistent storage, identification, authenticity, workflow execution, mining
17
H2020 Prospects
According to Kostas Glinos (e-IRG meeting, Dublin), first calls: December 11, 2013
“Framework for action” (part of open consultation) has a “fiche” targeting DP
DPHEP ICFA report (2020 vision) sent to Carlos MP
“References to RDA are appreciated and I really hope that you take a leading role in bringing people and key players together around a global initiative to tackle the issue of ‘highly reliable and highly trusted infrastructures for research data preservation’.”
IMHO: need to prepare now (collaboration, WP, tasks) – likely discuss this at RDA Plenary, CHEP 2013, PV …
18
A Strategy for H2020?
Front-end: collaborate with on-going efforts in Digital Libraries, Linked Data, PV etc.
– Significant effort (also HEP expertise): very high probability of further funding in H2020 (+RDA)
– DP(HEP) is already part of these projects: feed in requirements & collaborate (PRELIDA WS??)
Back-end: collaborate through HEPiX & IEEE MSST
– Seek specific H2020 funding for CDIs, including TCO, long-term, sustainable inter-disciplinary archives
Middle:
– Collaborative effort on Validation Frameworks, Virtualization, Training, Outreach etc. Includes institute / national funding
– Work for “Concurrency Framework” and other efforts so that future migrations are less painful and more repeatable
– [ CERNLIB consortium ]
– Seek further funds (H2020, RDA) to further develop and generalize
Several (all?) relevant “fiches” in “Call for Action” document
– fiche 01: community support data services
– fiche 02: infrastructure for Open Access
– fiche 03: storing, managing and preserving research data
– fiche 04: discovery and provenance of research data
– fiche 05: towards global data e-infrastructures
– fiche 06: global A&A e-infrastructures
– fiche 07: skills and new professions for research data
19
Other Activities
Various project proposals in preparation / review
On-going activities in the experiments: “DPHEP classic” as well as LHC
Discussions with CMS on validation system – other LHC experiments expected to join
DPHEP session at CHEP 2013 – outlook for CHEP 2015? (tighter integration into programme)
Presentations accepted at numerous conferences / workshops – building more links with other disciplines
DPHEP IB (modeled on WLCG) monthly call
20
What | When
Collaboration Agreement | Q3-Q4 2013
Preparation for H2020 | Now – Q3/Q4 2013
HEPiX WG in place | <Q4 2014
First H2020 calls open | Dec 2014
ICFA report (work plan, including sustainability plan) | DESY, Feb 20-21 2014
H2020 Proposal | End Q1 2014
DPHEP Portal Available | mid 2014
H2020 news | July 2014
LEP Data “recovery” (CERNLIB???) | End 2014?
Validation framework(s) | 2014 / 2015?
Long-term CDI #1 | 2015 – 2017
Full(?) understanding of costs | 2016/17?
Sustainable, repeatable LTDP | 201?
21
Summary
Making good progress on multiple fronts
“Sustainable strategy” being discussed (and then put in place)
Good inter-disciplinary collaboration
Optimistic regarding H2020 and also NSF(+) – but needs work!
#DPHEP for news!