#DPHEP: Status and Outlook. Sustainable Strategies for Long-Term DP at the Exa-scale. LHCC Referees Meeting, International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics

#DPHEP: Status and Outlook
Sustainable Strategies for Long-Term DP at the Exa-scale
LHCC Referees Meeting
International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics

Overview
– Sustainable Strategy
– Collaboration Agreement
– Research Data Alliance
– H2020 (NSF?) Prospects

2020 Vision for LT DP in HEP
Long-term, e.g. on LC timescales: expect disruptive change.
– By 2020, all archived data (e.g. that described in the Blueprint, including LHC data) should be easily findable and fully usable by designated communities, with clear (Open) access policies and possibilities to annotate further.
– Best practices, tools and services should be well run-in, fully documented and sustainable; built in common with other disciplines, based on standards.
→ The vision is achievable, but we are far from it today.

Data Preservation Maturity Model
Level | Metric | Implications
4 | Reproducible results by “citizen scientists” | Desired(?) by funding agencies: people able to reproduce an analysis should be awarded “a degree”. Beyond what can realistically be afforded?
3 | Reproducible results where consumer ≠ producer and outside the immediate community | Stronger demonstration of long-term preservation. The knowledge stored is sufficient for a physicist outside the immediate community to reproduce results.
2 | Reproducible results where consumer ≠ producer but within the same “larger community”, e.g. LHC (ATLAS / CMS; CDF / D0, …) | Highly desirable for “minimal” long-term preservation. The “knowledge” stored is sufficient for a physicist from a different collaboration (but within the same overall programme) to reproduce results.
1 | Reproducible results where consumer = producer | Required during the lifetime of the collaboration.
0 | N/A | Data is lost: logically or physically. This is probably the reality for the bulk of pre-DPHEP experiments (and even some of those??)
Scale (complexity) is probably “exponential”.
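To make the level definitions concrete, the Python sketch below encodes the maturity levels as an enum and picks the highest level demonstrated, given which kinds of consumer have reproduced a result. The function `maturity_level` and the consumer category strings are hypothetical, chosen only for illustration, and the sketch assumes the levels are cumulative (success at a higher level implies the lower ones); it is not a DPHEP tool.

```python
from enum import IntEnum
from typing import Optional


class PreservationLevel(IntEnum):
    """Levels of the Data Preservation Maturity Model in the table above."""
    LOST = 0               # data is lost, logically or physically
    SAME_PRODUCER = 1      # consumer = producer
    SAME_PROGRAMME = 2     # consumer != producer, same "larger community"
    OUTSIDE_COMMUNITY = 3  # consumer outside the immediate community
    CITIZEN_SCIENTIST = 4  # reproducible by "citizen scientists"


# Hypothetical consumer categories, ordered from most to least demanding.
_LADDER = [
    ("citizen", PreservationLevel.CITIZEN_SCIENTIST),
    ("outside_community", PreservationLevel.OUTSIDE_COMMUNITY),
    ("same_programme", PreservationLevel.SAME_PROGRAMME),
    ("producer", PreservationLevel.SAME_PRODUCER),
]


def maturity_level(data_available: bool,
                   reproduced_by: set[str]) -> Optional[PreservationLevel]:
    """Return the highest level demonstrated, or None if none is demonstrated yet.

    `reproduced_by` lists the (hypothetical) consumer categories that have
    successfully reproduced a result: "producer", "same_programme",
    "outside_community" or "citizen". Assumes the levels are cumulative.
    """
    if not data_available:
        return PreservationLevel.LOST
    # Walk from the most demanding level down and stop at the first match.
    for consumer, level in _LADDER:
        if consumer in reproduced_by:
            return level
    # Data still exists, but no reproduction has been demonstrated yet.
    return None
```

For example, `maturity_level(True, {"producer", "same_programme"})` returns `PreservationLevel.SAME_PROGRAMME` (level 2). Returning `None` rather than level 0 when nobody has reproduced a result keeps “data lost” distinct from “not yet demonstrated”.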

Software Preservation Maturity Model
Level | Metric | Implications
4 | Reproducible results by “citizen scientists” | Desired(?) by funding agencies: people able to reproduce an analysis should be awarded “a degree”. Beyond what can realistically be afforded?
3 | Reproducible results where consumer ≠ producer and outside the immediate community | Stronger demonstration of long-term preservation. The knowledge stored is sufficient for a physicist outside the immediate community to reproduce results.
2 | Reproducible results where consumer ≠ producer but within the same “larger community”, e.g. LHC (ATLAS / CMS; CDF / D0, …) | Highly desirable for “minimal” long-term preservation. The “knowledge” stored is sufficient for a physicist from a different collaboration (but within the same overall programme) to reproduce results.
1 | Reproducible results where consumer = producer | Required during the lifetime of the collaboration.
0 | N/A | Data is lost: logically or physically. This is probably the reality for the bulk of pre-DPHEP experiments (and even some of those??)
REPRODUCIBLE RESULTS AFTER “PORTING” TO A NEW ENVIRONMENT!
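“Reproducible after porting” becomes testable if an analysis is rerun in the new environment and its output compared with a reference produced in the original one. The Python sketch below assumes, hypothetically, that results are available as lists of histogram bin contents; the function names and the default tolerance are illustrative and are not taken from any experiment's validation framework.

```python
import math
from typing import List, Sequence


def mismatched_bins(reference: Sequence[float], ported: Sequence[float],
                    rel_tol: float = 1e-6, abs_tol: float = 0.0) -> List[int]:
    """Return the indices of bins where the ported result disagrees with the reference.

    Raises ValueError if the two results have different numbers of bins.
    """
    if len(reference) != len(ported):
        raise ValueError("reference and ported results have different numbers of bins")
    # Compare bin by bin with a tolerance, not exact equality.
    return [i for i, (r, p) in enumerate(zip(reference, ported))
            if not math.isclose(r, p, rel_tol=rel_tol, abs_tol=abs_tol)]


def reproduced_after_porting(reference: Sequence[float], ported: Sequence[float],
                             rel_tol: float = 1e-6, abs_tol: float = 0.0) -> bool:
    """True if the binning is identical and every bin agrees within tolerance."""
    if len(reference) != len(ported):
        return False
    return not mismatched_bins(reference, ported, rel_tol, abs_tol)
```

For instance, `mismatched_bins([1.0, 2.0, 3.0], [1.0, 2.5, 3.0])` returns `[1]`. A tolerance is used because floating-point results can legitimately differ in the last digits across compilers and platforms; the appropriate tolerance is an analysis-specific choice.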

Sustainable Strategy
A document on a sustainable strategy for LTDP is available; it was discussed at the DPHEP IB today.
– This version focuses on CERN (IT); it was presented yesterday (attached to agenda: doc, ppt).
– Some comments have been received (DESY, INFN):
– DESY comments are included in the current draft;
– INFN stresses the need for standards, e.g. for outreach activities based on data from multiple experiments.
The intent is to update the document to reflect the activities of other “Collaboration Members”.

Summary of Recommendations

ICFA Statement on LTDP
The International Committee for Future Accelerators (ICFA) supports the efforts of the Data Preservation in High Energy Physics (DPHEP) study group on long-term data preservation and welcomes its transition to an active international collaboration with a full-time project manager. It encourages laboratories, institutes and experiments to review the draft DPHEP Collaboration Agreement with a view to joining by mid- to late […]. ICFA notes the lack of effort available to pursue these activities in the short-term and the possible consequences on data preservation in the medium to long-term. We further note the opportunities in this area for international collaboration with other disciplines and encourage the DPHEP Collaboration to vigorously pursue its activities. In particular, the effort required to prepare project proposals must be prioritized, in addition to supporting on-going data preservation activities. ICFA notes the important benefits of long-term data preservation to exploit the full scientific potential of the, often unique, datasets. This potential includes not only future scientific publications but also educational outreach purposes, and the Open Access policies emerging from the funding agencies.
15 March 2013

DPHEP Collaboration Agreement
– A draft has been prepared by the CERN legal service, sent to ICFA, and has been available to DPHEP since early 2013.
– Some comments have been received and integrated.
– AFAIK CERN, DESY, FNAL and SLAC are “ready” to sign.
– Target: prior to CHEP 2013 (RDA-2 might be better!)
– Next steps: get the legal services in touch with each other and complete the process.
– CERN & DESY: defining their activities as part of the Collaboration.

RDA Preservation WG
– The RDA, strongly supported by the EU, NSF and AU, is seen as an element of implementing the HLEG 2030 vision.
– A WG on DP was approved in May:
– Chair: David Giaretta (APA, SCIDIP-ES, author of “Advanced DP”, ex-DCC, ex-STFC)
– Co-chair: JDS
– The intent is to show progress by each RDA plenary (March, September), co-ordinate international activities, identify candidate services for standardization, lobby for funding…

Component Breakdown
We can break this down into three distinct areas (the OAIS reference model is somewhat more complex; this is a zeroth iteration):
– “Archive issues”
– Digital Libraries & “Adding Value” to data
– “Knowledge retention”: the Crux of the Matter

Archive Issues
– We (HEP) have significant experience of 100PB+ distributed data stores.
– The plan is to coordinate long-term “bit preservation” issues via HEPiX,
– and with other disciplines, e.g. via IEEE MSST.
× Sustainable models for long-term multi-disciplinary data archives are still to be solved.
→ H2020 funding targeted for this.

Digital Libraries
– Significant investment in this space, including multiple EU (and other) funded projects.
– No reason to believe that the issues will not be solved, nor that funding models will not exist, e.g. adapted from “traditional” libraries.
– Related topics: “linked data”, “adding value to data”, again with projects / communities.
→ We should work closely with these projects / communities, not start new initiatives.

Where to Invest – Summary
– Tools and Services, e.g. Invenio: could be solved (2-3 years?).
– Archival Storage Functionality: should be solved (i.e. “now”).
– Support to the Experiments for DPHEP Levels 3-4: must be solved, but how?

Who Can Help?
Mobilize resources through existing structures:
– Research Data Alliance: funding / strong interest from EU, US, AU, others; part of the roadmap to the “Riding the Wave” 2030 Vision; STFC and DCC personnel strongly involved in setup.
– WLCG: efforts on “software re-design” for new architectures; experiment efforts on Software Validation (to be coordinated via DPHEP), building on DESY & others.
– DPHEP: coordination within HEP and with other projects / disciplines.
National & International Projects:
– H2020 / NSF funding lines
– National projects also play an important role.

[Figure: Collaborative Data Infrastructure, from the “Riding The Wave” HLEG Report. Labels: Users; Data Generators; Community Support Services; Common Data Services; Trust; Data Curation. Functions shown: user functionalities, data capture & transfer, virtual research environments; data discovery & navigation, workflow generation, annotation, interoperability; persistent storage, identification, authenticity, workflow execution, mining.]

H2020 Prospects
– According to Kostas Glinos (e-IRG meeting, Dublin), first calls: December.
– The “Framework for action” (part of an open consultation) has a “fiche” targeting DP.
– The DPHEP ICFA report (2020 vision) was sent to Carlos MP:
“References to RDA are appreciated and I really hope that you take a leading role in bringing people and key players together around a global initiative to tackle the issue of ‘highly reliable and highly trusted infrastructures for research data preservation’.”
– IMHO: we need to prepare now (collaboration, WP, tasks); likely discuss this at the RDA Plenary, CHEP 2013, PV …

A Strategy for H2020?
Front-end: collaborate with on-going efforts in Digital Libraries, Linked Data, PV etc.
– Significant effort (also HEP expertise): very high probability of further funding in H2020 (+RDA)
– DP(HEP) is already part of these projects: feed in requirements & collaborate (PRELIDA WS??)
Back-end: collaborate through HEPiX & IEEE MSST
– Seek specific H2020 funding for CDIs, including TCO, long-term, sustainable inter-disciplinary archives
Middle:
– Collaborative effort on Validation Frameworks, Virtualization, Training, Outreach etc. Includes institute / national funding
– Work for the “Concurrency Framework” and other efforts so that future migrations are less painful and more repeatable
– [ CERNLIB consortium ]
– Seek further funds (H2020, RDA) to further develop and generalize
Several (all?) relevant “fiches” in the “Call for Action” document:
– fiche 01: community support data services
– fiche 02: infrastructure for Open Access
– fiche 03: storing, managing and preserving research data
– fiche 04: discovery and provenance of research data
– fiche 05: towards global data e-infrastructures
– fiche 06: global A&A e-infrastructures
– fiche 07: skills and new professions for research data

Other Activities
– Various project proposals in preparation / review
– On-going activities in the experiments: “DPHEP classic” as well as LHC
– Discussions with CMS on a validation system; other LHC experiments expected to join
– DPHEP session at CHEP 2013; outlook for CHEP 2015? (tighter integration into programme)
– Presentations accepted at numerous conferences / workshops, building more links with other disciplines
– DPHEP IB (modeled on WLCG) monthly call

What | When
Collaboration Agreement | Q3-Q
Preparation for H2020 | Now – Q3/Q
HEPiX WG in place | <Q
First H2020 calls open | Dec
2014 ICFA report (work plan, including sustainability plan) | DESY, Feb
H2020 Proposal | End Q
DPHEP Portal Available | mid 2014
H2020 news | July 2014
LEP Data “recovery” (CERNLIB???) | End 2014?
Validation framework(s) | 2014 / 2015?
Long-term CDI #1 | 2015 – 2017
Full(?) understanding of costs | 2016/17?
Sustainable, repeatable LTDP | 201?

Summary
– Making good progress on multiple fronts
– “Sustainable strategy” being discussed (and then put in place)
– Good inter-disciplinary collaboration
– Optimistic regarding H2020 and also NSF(+), but this needs work!
#DPHEP for news!