The DPHEP Collaboration & Project(s) Services, Common Projects, Business Model(s) EGI “towards H2020” Workshop December 2013 International.

The DPHEP Collaboration & Project(s) Services, Common Projects, Business Model(s) EGI “towards H2020” Workshop December 2013 International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics

Introduction
This presentation is based on the poster produced for the RDA plenary in Washington. Further information can be found in the other documents attached to the agenda. Caveat: the role of project money in addressing very long-term problems – “eyes wide open”.

The Problem
The data from the world’s particle accelerators and colliders is both costly and time-consuming to produce. It contains a wealth of scientific potential, plus high value for educational outreach. Given that much of the data is unique, it is essential to preserve not only the data but also the full capability to reproduce past analyses and perform new ones. There are numerous cases where data from a past experiment was re-analyzed: we must retain this ability in the future.

The Approach
Whilst retaining a holistic view, the problem is broken down into a number of key areas. Each is addressed using state-of-the-art techniques, which include:
1. Digital library tools (Invenio [2]) & services (CDS [3], INSPIRE [4], ZENODO [5])
2. Sustainable software, coupled with advanced virtualization techniques [6] and validation frameworks [7]
3. Proven bit preservation at the 100PB scale, together with a sustainable funding model with an outlook to 2040/50
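The slide names bit preservation without saying how integrity is checked. One common building block is a fixity check: record a checksum for every archived file, then periodically recompute and compare. The following is a minimal sketch, assuming SHA-256 checksums and an in-memory manifest; the function names and the choice of hash are illustrative assumptions and are not taken from the DPHEP services.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or no longer match."""
    root = Path(root)
    damaged = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel)
    return damaged
```

A real archive at the 100PB scale would store the manifest separately from the data and spread verification over time; this sketch only shows the comparison step.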

INSPIRE Example

Results
Several tens of PB of data – from the BaBar experiment at SLAC, the CDF and D0 experiments at Fermilab, as well as the H1, HERMES and ZEUS experiments at DESY – are preserved. Data from the LEP and LHC experiments at CERN, as well as that from many others, is also addressed with a strong focus on sustainable solutions. These preservation activities build on the tools and services listed above.
-> LHC data will grow from 100PB to ~1EB: all must be preserved

DPHEP Preservation Levels

Partners
DPHEP involves all major laboratories and experiments worldwide, plus a number of key funding agencies. Working through the RDA, the APA and others, the intent is to share experience, services and tools as widely as possible. Bit preservation experience and services at the 100PB scale were presented recently at the RDA Europe event. Detailed curation costs were shown at the 4C workshop held after iPRES and will be followed up at a dedicated workshop in January.

Partners

Conclusions
DPHEP can bring valuable experience, tools and services to help tackle the Long Term Data Preservation issue for all disciplines. Peer-to-peer collaboration with other communities – and targeted funding – are the keys to building a sustainable long-term solution. Significant experience with designing software for long-term sustainability, as well as with migrations across many generations of computing infrastructures, is also expected to be important.

Summary (Abstract)
The goal of the DPHEP collaboration is to enable the long-term preservation of scientific data and knowledge from the world’s High Energy Physics laboratories. Ideally, the full potential of the data should be retained, allowing for new scientific output to be achieved with the archived data. This requires not only “bit preservation” ranging from tens to hundreds of Petabytes but also significant amounts of documentation, metadata, software and “knowledge”. The current objectives include close collaboration with other disciplines and efforts in this domain, the establishment of sustainable solutions with associated cost and funding models and the identification of common solutions, services and standards.

BACKUP

Brief Background (<2013)
DPHEP started as a Study Group in 2008/9
– In the “grid world”, this was between CCRC’08 / STEP’09 & the time of EGEE III / EGI_DS
It delivered a Blueprint in May 2012, a summary of which was input to the ESPP in KRK
– “as well as infrastructures for data analysis, data preservation and distributed data-intensive computing should be maintained and further developed.”
-> The main recommendations of the Blueprint – including the appointment of a full-time project manager – are now being implemented
This includes moving to a “Collaboration” (difficult)

DPHEP – 1st Workshop
“The problem is substantial and past experience shows that early preparation is needed and sufficient resources should be allocated.”
“The “raison d’être” of data preservation should be clearly and convincingly formulated, including a viable economic model.”

2013+
During this year, we have built / strengthened links with other communities & projects
– This (IMHO) has helped us a great deal!
We have converged on a small set of services
– Instantiated at multiple sites / collaborations
And a similar number of (potential) joint projects
-> But big questions still remain: how to support long-term (multi-decade) data preservation
– M+P; budget lines (APT), resource review (RRB) etc.
– Interaction with projects / collaborations such as APA(RSEN), 4C, etc.

ICFA Statement on LTDP
The International Committee for Future Accelerators (ICFA) supports the efforts of the Data Preservation in High Energy Physics (DPHEP) study group on long-term data preservation and welcomes its transition to an active international collaboration with a full-time project manager. It encourages laboratories, institutes and experiments to review the draft DPHEP Collaboration Agreement with a view to joining by mid- to late
ICFA notes the lack of effort available to pursue these activities in the short-term and the possible consequences on data preservation in the medium to long-term. We further note the opportunities in this area for international collaboration with other disciplines and encourage the DPHEP Collaboration to vigorously pursue its activities. In particular, the effort required to prepare project proposals must be prioritized, in addition to supporting on-going data preservation activities.
ICFA notes the important benefits of long-term data preservation to exploit the full scientific potential of the, often unique, datasets. This potential includes not only future scientific publications but also educational outreach purposes, and the Open Access policies emerging from the funding agencies.
15 March

2020 Vision for LT DP in HEP
Long-term – e.g. LC timescales: disruptive change
– By 2020, all archived data – e.g. that described in Blueprint, including LHC data – easily findable, fully usable by designated communities with clear (Open) access policies and possibilities to annotate further
– Best practices, tools and services well run-in, fully documented and sustainable; built in common with other disciplines, based on standards
– DPHEP portal, through which data / tools accessed
-> Vision achievable, but we are far from this today

“Summary” of CHEP Workshop
Services: sustainable bit-level preservation for multiple decades; INSPIRE, CDS, HEPData, …
Projects: Rivet, Recast, “CERNLIB consortium”, DPHEP Portal, Validation Tools, Virtualisation Tools etc.
Business Plan: based on clear Use + Business Cases and Costs -> explicit funding in MTPs

Use Cases
Three Use Cases have been identified, based on the “Problem Statement(s)” in the DPHEP Blueprint. They are simple enough for discussions with non-experts. They may be over-simplified, but IMHO this does not dramatically alter the bottom line.

1 – Long Tail of Papers

2 – New Theoretical Insights

3 – “Discovery” to “Precision”

4 – (whatever)
There is a general feeling that “we” should preserve data “forever” “just in case”
No clear business case
An understanding of the costs can help clarify the strategy (e.g. “best effort” – bit preservation + ?)
Preservation of data + software + knowledge beyond human lifetimes not obvious… (Cost benefit analysis)
– See PV2013 “South Atlantic Anomaly”

Use Case Summary
1. Keep data usable for ~1 decade
2. Keep data usable for ~2 decades
3. Keep data usable for ~3 decades
Re-visit after we have understood costs & cost models, plus potential “solutions”

COSTS AND COST MODELS

Costs – Introduction
We do not know exactly what the costs will be in the future
But we can make estimates, based on our “knowledge” and experience
In some areas these estimates will be relatively accurate
In others, much less so
“Acceptable” costs compared to what?
– Cost of LHC? WLCG? A specific service, such as DB?

A DB Service
Costs include:
– Hardware;
– Licenses & maintenance;
– People.
There is also value = business case
-> 10 = EUR1M/year

Costs of Curation Workshop
Within DPHEP, and in collaboration with external projects (e.g. 4C), we are planning a “no stone left unturned” workshop
Look at the many migrations we have performed in the (recent) past – plus those foreseen
-> Estimate / calculate costs
Come up with scenarios for the future:
– 10 year preservation = 3 media migrations + n build systems + p s/w repositories + q O/S versions + …
– 20 year preservation: more disruptive changes
– 30 year preservation: more still
-> Manpower almost certainly the dominant cost
What can we do to optimize it?
Coordinate validation activities -> service
Streamline emulation activities -> tool-kit(s)
Best practices & support for migration activities -> support activity
Can we do things in a way that costs less in the future – and make our data more “preservational”?
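The 10-year scenario on this slide (3 media migrations + n build systems + p s/w repositories + q O/S versions) is an effort sum, and can be sketched as one. The following is a minimal sketch only: the values chosen for n, p and q and every per-event effort figure are hypothetical placeholders, not DPHEP estimates, and the class and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Counts of disruptive events over one preservation period."""
    years: int
    media_migrations: int
    build_systems: int      # "n" on the slide
    sw_repositories: int    # "p" on the slide
    os_versions: int        # "q" on the slide


@dataclass
class UnitEffort:
    """Person-months per event; all values used below are hypothetical."""
    media_migration: float
    build_system: float
    sw_repository: float
    os_version: float


def total_effort(s: Scenario, u: UnitEffort) -> float:
    """Sum the terms of the slide's formula, in person-months."""
    return (
        s.media_migrations * u.media_migration
        + s.build_systems * u.build_system
        + s.sw_repositories * u.sw_repository
        + s.os_versions * u.os_version
    )


# Only the 3 media migrations come from the slide; n, p, q are made up here.
ten_years = Scenario(years=10, media_migrations=3, build_systems=2,
                     sw_repositories=2, os_versions=4)
effort = UnitEffort(media_migration=6.0, build_system=3.0,
                    sw_repository=2.0, os_version=4.0)
print(total_effort(ten_years, effort))
```

Keeping counts and unit efforts separate means the 20- and 30-year scenarios can reuse the same unit figures with larger counts, once the workshop has produced real numbers.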