School on Grid & Cloud Computing International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics.


Training Overview
1. Background to Long-Term Data Preservation in HEP – the DPHEP Study Group
2. “2020 vision” for DP in HEP
3. DP in other disciplines – how we can benefit significantly from work (models, standards, procedures, wisdom, tools, services etc.) of others – the bulk of the material comes from other projects / disciplines
4. A strategy for DP in HEP

2020 Vision for LT DP in HEP
Long-term – e.g. FCC timescales: disruptive change
– By 2020, all archived data – e.g. that described in DPHEP Blueprint, including LHC data – easily findable, fully usable by designated communities with clear (Open) access policies and possibilities to annotate further
– Best practices, tools and services well run-in, fully documented and sustainable; built in common with other disciplines, based on standards
– DPHEP portal, through which data / tools accessed
→ Agree with Funding Agencies clear targets & metrics

Volume: 100 PB + ~50 PB/year (+400 PB/year from 2020)
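The growth figures above can be turned into a rough projection. The sketch below is illustrative only. It assumes that the 100 PB baseline refers to 2014 (the year of the talk), that the archive grows by about 50 PB per year until 2020, and that the rate is 400 PB per year from 2020 onward (reading "+400PB/year" as the new rate, not an addition to the old one). The function name and its parameters are hypothetical.

```python
def projected_volume_pb(year, base_year=2014, base_pb=100.0,
                        early_rate_pb=50.0, late_rate_pb=400.0,
                        switch_year=2020):
    """Rough archive volume in PB at the start of `year` (assumed model)."""
    if year < base_year:
        raise ValueError("year precedes the assumed baseline year")
    # Years spent at the early growth rate (baseline up to the switch year).
    early_years = max(0, min(year, switch_year) - base_year)
    # Years spent at the later, higher growth rate.
    late_years = max(0, year - switch_year)
    return base_pb + early_rate_pb * early_years + late_rate_pb * late_years


if __name__ == "__main__":
    for y in (2014, 2017, 2020, 2023):
        print(y, projected_volume_pb(y), "PB")
```

Under these assumptions the archive passes the exabyte mark within three years of the 2020 rate change, which is why the later slides talk about the exa-scale.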

Collaboration – Benefits
In terms of the 2020 vision, collaboration with other projects has arguably advanced us (in terms of implementation of the vision) by several years.
I typically quote 3-5 years and don’t think that I am exaggerating.
Concrete examples include “Full Costs of Curation”, as well as the proposed “Data Seal of Approval+”.
With or without project funding, we should continue – and even strengthen – this collaboration
– APA events, iDCC, iPRES etc. + joint workshops around RDA
The HEP “gene pool” is closed and actually quite small – we tend to recycle the same ideas, and “new ones” are sometimes needed.

APARSEN Training & Knowledge Base

Requirements from Funding Agencies
To integrate data management planning into the overall research plan, all proposals submitted to the Office of Science for research funding are required to include a Data Management Plan (DMP) of no more than two pages that describes how data generated through the course of the proposed research will be shared and preserved or explains why data sharing and/or preservation are not possible or scientifically appropriate. At a minimum, DMPs must describe how data sharing and preservation will enable validation of results, or how results could be validated if data are not shared or preserved.
Similar requirements from European FAs and EU (H2020)

How to respond?
a) Each project / experiment responds to individual FA policies – n x m
b) We agree together – service providers, experiments, funding agencies – on a common approach – DPHEP can (should?) help coordinate
Option b) is almost certainly (much) cheaper / more efficient, but what does it mean in detail?
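The "n x m" in option a) is plain counting: if n experiments each respond separately to m funding-agency policies, the number of distinct responses is the product. The sketch below compares that with a common approach. It assumes that under a common approach each experiment and each agency signs up once to a shared framework (n + m); that figure is an assumption made for illustration and does not come from the slides. The function names and the example counts are hypothetical.

```python
def individual_responses(n_experiments, n_agencies):
    """Option a): every experiment answers every agency policy separately."""
    return n_experiments * n_agencies


def common_approach_signups(n_experiments, n_agencies):
    """Option b), assumed model: each party signs up once to a shared framework."""
    return n_experiments + n_agencies


if __name__ == "__main__":
    n, m = 4, 6  # made-up counts, for illustration only
    print("individual:", individual_responses(n, m))
    print("common:", common_approach_signups(n, m))
```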

Certification
David Giaretta, APA Webinar, December 2013 (aparsen.eu, #APARSEN; co-funded by the European Union under FP7-ICT)
OAIS (Open Archival Information System) reference model provides:
- fundamental concepts for preservation
- fundamental definitions so people can speak without confusion
- “now adopted as the de facto standard for building digital archives” → in Cyberinfrastructure Vision for 21st Century Discovery

Data Seal of Approval: Guidelines
Guidelines Relating to Data Producers:
1. The data producer deposits the data in a data repository with sufficient information for others to assess the quality of the data and compliance with disciplinary and ethical norms.
2. The data producer provides the data in formats recommended by the data repository.
3. The data producer provides the data together with the metadata requested by the data repository.

2. Digital library tools (Invenio) & services (CDS, INSPIRE, ZENODO) + domain tools (HepData, RIVET, RECAST…)
3. Sustainable software, coupled with advanced virtualization techniques, “snap-shotting” and validation frameworks
4. Proven bit preservation at the 100 PB scale, together with a sustainable funding model with an outlook to 2040/50
5. Open Data
(and several EB of data)
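The slides do not say how bit preservation is verified. A common generic technique is a periodic fixity check: recompute each file's checksum and compare it with a stored value. The sketch below shows that idea only and is not a description of the CERN service. The function names, the manifest layout (a dict from file path to expected SHA-256 digest) and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest):
    """Return the paths that are missing or whose digest no longer matches.

    `manifest` maps a file path to the digest recorded at ingest time
    (hypothetical layout, chosen for this example).
    """
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            failures.append(path)
    return failures
```

A typical use is to build the manifest when data are archived, rerun `verify_manifest` on a schedule, and restore any failing file from another copy.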

DPHEP Portal – Zenodo like?

David South | Data Preservation and Long Term Analysis in HEP | CHEP 2012, May
Documentation projects with INSPIREHEP.net
> Internal notes from all HERA experiments now available on INSPIRE
→ A collaborative effort to provide “consistent” documentation across all HEP experiments – starting with those at CERN – as from 2015
→ (Often done in an inconsistent and/or ad-hoc way, particularly for older experiments)

Summary
It would be misleading to present DP in HEP as a “solved problem” – it is not.
However, many of the building blocks are understood, with corresponding services, tools and support units.
A strategy, building on certified repositories and generic tools, complemented by additional metrics, is being elaborated.
It’s still only 2014 – good progress is expected in the coming 2-3 years – well ahead of “2020”!