Long-Term Data Preservation WLCG Overview Board, March 2013 Twitter: #DPHEP International Collaboration for Data Preservation and.

Presentation transcript:

Long-Term Data Preservation WLCG Overview Board, March 2013 Twitter: #DPHEP International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics

Overview
- Summary of DPHEP Blueprint recommendations
- Opportunities: collaboration with other disciplines & funding
- A “2020 vision” and its implementation

DPHEP BLUEPRINT

DPHEP Entities

| Organisational Body | Description | Input and positioning | DPHEP Output |
| DPHEP | Organisation for Data Preservation in High-Energy Physics | Projects in data preservation at experiment and laboratory level | Working groups on common projects, status report documents |
| DPHEP Chair | Overall coordination of DPHEP | Appointed by ICFA, represents DPHEP in relationship with other bodies | Yearly reports to ICFA, representation to other related scientific bodies |
| DPHEP Project Manager | Project management, administrative, technical, funding | Main operational coordinator, maintain contacts, organises meetings, lead proposals for funding | Reports to the steering committee |
| Advisory committee | Group of external personalities | Synergy with the wider HEP community, input from other fields and initiatives | Project proposals, documents for scrutiny |
| Steering committee | Internal executive body, chaired by the DPHEP Chair | Contributions from the participation members | Strategic and operational decisions |
| Funding bodies | Funding agencies are invited to take note on the progress reports and periodically analyse the relevance of the funding | Direct funding to the DPHEP organisation, under the supervision of the Project Manager | Quarterly progress reports |

Notes added to the table on successive builds of the slide:
- DPHEP: implemented via multi-lateral Collaboration Agreement (draft circulated)
- DPHEP Chair: Chair of Study Group was Cristinel Diaconu / CPPM & DESY, who continues in this role
- DPHEP Project Manager: CERN provides Project Manager 2013 – 2015, after which may rotate
- Advisory committee: broadened to include “influential” names, e.g. from APA, SCIDIP-ES
- Steering committee: representatives of parties to Collaboration Agreement
- Funding bodies: e.g. EU, NSF, STFC, INFN, …

DPHEP Blueprint Deliverables

| Objective | Deliverable (Measurable) |
| Positioning as forum | Catalogue of technical knowledge and practical solutions. Description of possible alternatives for governance. |
| Co-ordination of projects | Common R&D projects meet the expectations of the stakeholders. |
| Harmonisation and liaison | Synchronisation of preservation projects in the field. Identification of areas where external knowledge needs to be transferred to HEP. |
| Design sustainable future | Characterisation of discipline-wide toolkit for preservation. Business plan for long-term preservation in HEP. |
| Outreach and advocacy | Understanding of needs/opportunities for medium- and small-sized collaborations. Concrete discussions with funding bodies/laboratories. |

Proposed activities of the DPHEP Organization – p85, Blueprint document. These deliverables are to be met within 2 years of becoming fully operational.

DPHEP Preservation Levels

| Preservation Model | Use case |
| 1. Provide additional documentation | Publication-related information search |
| 2. Preserve the data in a simplified format | Outreach, simple training analyses |
| 3. Preserve the analysis level software and data format | Full scientific analysis based on existing reconstruction |
| 4. Preserve the reconstruction and simulation software and basic level data | Full potential of the experimental data |

Note added on the final build of the slide: HepMC / Rivet toolkit may play a useful – and sustainable – role here. See DPHEP7.

DPHEP Summary
- There is a lot of knowledge and experience in the existing DPHEP community that can be leveraged for other efforts, e.g. LHC & LEP
- LHC is clearly of key interest to WLCG OB but we should not forget LEP before it is too late!
- On-going (small) effort to document current situation and options for moving forward
→ CERNLIB felt to be (a) critical factor but there are many external distributions

OPPORTUNITIES & FUNDING

Collaboration with others
- Many other disciplines, ranging from science to arts & humanities, already (very) active
- Numerous conferences and workshops have been up and running for years
- We have been accepted – partly due to halo effect of the Higgs discovery – with open arms
- Concrete discussions on further collaboration and funding are advancing well
→ Not limited to Data Preservation – e.g. SKA!

Funding
- DASPOS is up and running with NSF funding
- Research Data Alliance – with indirect EU, NSF, AUS and other funding – will play a role
  - Co-chair of RDA WG on DP
- Clear signs that EU Horizon 2020 will include Data Preservation
  - e-IRG meeting, EIROforum w/s, RDA, …
- Now is the time to firm up partnerships & prepare for up-coming projects
→ STFC and other UK bodies particularly active in above activities: how can we profit from this?

A 2020 VISION

2020 Vision for LT DP in HEP
- Long-term: disruptive change(s), e.g. LC era
  - All archived data – e.g. that described in Blueprint, including LHC data – easily findable, fully usable by designated communities with clear (Open) access policies and possibilities to annotate further
  - Best practices, tools and services well run-in, fully documented and sustainable; built in common with other disciplines, based on standards
→ Vision achievable, but we are far from this today

Long-Term Commitment
- To achieve long-term data preservation, we need long-term commitment(s)
- By 2035, there will have been:
  - 3-4 updates to the ESPP;
  - 4-5 new DGs;
  - X re-organizations of CERN-IT.
→ We need commitments that outlive all of these!

2020 Vision – The OAIS Model

OAIS Components
- In the OAIS model, there are the concepts of producer and consumer
- DASPOS aims to take data produced by e.g. CMS and show that e.g. ATLAS can reproduce a full analysis, using the software, meta-data, documentation etc.
- This exercise will be started at DPHEP7 (March 21-22) and hopefully repeated regularly – e.g. annually – so that by 2020 the entire process is well understood, documented and repeatable
→ It is proposed that the (Archive) Information Packages are simply XML documents stored in Invenio
- The exact tool-set and feature requirements are still TBD
- Some tools used on a daily basis – e.g. Twiki! – are not suitable for long-term archives
→ Good opportunity for sharing experiences and best practices with other disciplines / projects, e.g. SCIDIP-ES, APA
- APA: “Too Big an Issue for any single organisation – we must work together”
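The proposal that information packages be "simply XML documents" can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the element names (`aip`, `producer`, `content`, `file`), the `role` attribute and the example identifiers are hypothetical, they are not taken from the OAIS standard or from any Invenio schema, and no Invenio API is used. The sketch only shows a producer writing a package description and a consumer reading it back.

```python
import xml.etree.ElementTree as ET


def build_aip(identifier, producer, title, files):
    """Serialise a hypothetical information package description as XML.

    `files` is a list of (name, role) pairs; the roles used here
    ("data", "software", "documentation") are illustrative only.
    """
    root = ET.Element("aip", attrib={"id": identifier})
    ET.SubElement(root, "producer").text = producer
    ET.SubElement(root, "title").text = title
    content = ET.SubElement(root, "content")
    for name, role in files:
        ET.SubElement(content, "file", attrib={"name": name, "role": role})
    return ET.tostring(root, encoding="unicode")


def list_files(aip_xml, role):
    """Consumer side: return the names of all files with the given role."""
    root = ET.fromstring(aip_xml)
    return [f.get("name") for f in root.iter("file") if f.get("role") == role]


if __name__ == "__main__":
    package = build_aip(
        "demo-0001",
        "CMS",
        "Example analysis package",
        [
            ("ntuple.root", "data"),
            ("analysis.py", "software"),
            ("README.txt", "documentation"),
        ],
    )
    print(package)
    print(list_files(package, "software"))
```

A plain XML description like this is readable without the tool that produced it, which is the property that matters when producer and consumer are separated by years.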

Archival Storage
- Experience from WLCG and beyond tells us that data loss and corruption will (and does) occur!
  - See WLCG SIRs, Tim Bell’s presentation to DPHEP3
- But there are things that we can do to mitigate risks and recover (often), e.g. rule-based systems: apply checksum and other “tests” upon schedule and/or actions
- What is the current situation at WLCG sites? Can we coordinate / agree suitable actions?
- Coordinate via HEPiX, IEEE MSST, APA, EUDAT, RDA etc.
- Collaboration with industry, e.g. IBM-led FP7 project
→ Recovery often performed by experiments by re-replicating data: how will this be done in the long-term?
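The checksum "tests" mentioned above can be sketched as a simple fixity check: record a checksum for every file at ingest, then re-compute and compare on a schedule. This is a minimal illustration and not a description of what any WLCG site runs; the function names, the use of SHA-256 and the in-memory manifest are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (done once, at ingest)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Re-check every file in the manifest; intended to run on a schedule."""
    report = {"ok": [], "corrupt": [], "missing": []}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            report["missing"].append(name)
        elif sha256_of(target) != expected:
            report["corrupt"].append(name)
        else:
            report["ok"].append(name)
    return report


def demo() -> dict[str, list[str]]:
    """Build a tiny archive, damage it, and return the verification report."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for i in (1, 2, 3):
            (root / f"run{i}.dat").write_bytes(f"event data {i}".encode())
        manifest = build_manifest(root)
        (root / "run2.dat").write_bytes(b"event data X")  # simulate silent corruption
        (root / "run3.dat").unlink()  # simulate data loss
        return verify(root, manifest)


if __name__ == "__main__":
    print(demo())
```

Detection is only half of the rule: files reported as corrupt or missing still have to be restored from another replica, which is the recovery question the slide leaves open.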

DPHEP Level 4
- Retaining the full potential of the data is the only really interesting option – but it is by far the most difficult!
- Difficult does not mean impossible – and we can profit from a period of “meta-stability” while we concentrate on this
- Past experiments typically ported / re-wrote major parts of their offline environment several times over a period of decades
- This is inevitable for LHC too – we could make this easier, but it will require an initial investment!
→ Collaboration with others who face similar problems could help but much of this we have to solve ourselves

Where to Invest?
- Tools and Services, e.g. Invenio
- Archival Storage Functionality
- Support to the Experiments for DPHEP Level 4

Suggested Topics for DPHEP7
Outline for site / experiment talks at DPHEP7, March 21-22, CERN
- “Ingest issues” (10’)
  - How did you (the experiment) decide what data to save, how to make it discoverable / available, how is it documented, where is the data / meta-data etc. What are the access policies and target communities?
  - What tools do you use?
- “Archive issues” (10’)
  - How is the archive managed? How are errors detected and handled? What is the experience?
  - What storage system / services are used?
- “Offline environment issues” (20’)
  - What have been the key challenges in keeping the offline environment alive? What are the key lessons learned / pitfalls to be avoided? What would you have done differently if long-term preservation had been a goal from the early days of the experiment?
→ DPHEP8: around or during CHEP? TBD in coming weeks… (Doodle)

S.W.O.T.
- Strengths: DPHEP is well established within the community and recent contacts to other disciplines are very encouraging
- Weaknesses: Effort is very scarce within the project at a time when manpower is already stretched to the limit elsewhere
- Opportunities: Through a convergence of events there are clear possibilities for significant funding and collaboration in the EU’s Horizon 2020 programme and most likely corresponding programmes in other areas of the world, e.g. NSF-funded projects
- Threats: Failure to invest now would jeopardise attempts to “rescue” LEP data as well as to take other preservation events (BaBar, Tevatron, Hera etc.) to a stable and sustainable state. It could also limit our ability to prepare for – and hence participate in – future projects

Summary
- We have outlined the current status of Long-Term Data Preservation in HEP and areas for fruitful collaboration with others
- Funding, e.g. through EU Horizon 2020, is looking good – we need to invest now to secure this!
→ Much work needs to be done to turn a dream into reality – particularly and critically in the area of future-proof offline environments
- However, this is expected to result in a cost-saving in the long-term by reducing effort in inevitable migrations

Where to Invest – Summary
- Tools and Services, e.g. Invenio: could be solved. (2-3 years?)
- Archival Storage Functionality: should be solved. (i.e. “now”)
- Support to the Experiments for DPHEP Level 4: must be solved – but how?
