Certification of CERN as a Trusted Digital Repository

Presentation on theme: "Certification of CERN as a Trusted Digital Repository"— Presentation transcript:

1 Jamie.Shiers@cern.ch ITMM July 2016
Certification of CERN as a Trusted Digital Repository: ISO 16363, based on OAIS (ISO 14721)
DPHEP and Cross-Group Opportunities (and Cross-Department, Cross-Organisation…)
International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics

2 Where to start?

3

4 4 years since the Higgs discovery!

5 Background to DPHEP
DPHEP started as a Study Group initiated by DESY in 2008/9
See and The Road to DPHEP (June 2015)
It was later adopted by ICFA as a panel (1 of 7)
A Blueprint Document was published in May 2012 and input to the ESPP update
Data preservation included in the approved strategy document
The Study Group migrated to a Collaboration from 2013
ICFA statement – Collaboration Agreement – 2015 Status Report
The CERN Services (cross-department) for DP are described in this iPRES paper / IT note (IT, EP, SIS)
(A (p)repeat of that PPT may make sense)
The general direction for Certification is described in the June 2016 CERN Courier

6 Things I am not going to cover

7

8 DPHEP: An International study group on data preservation

9

10 Cross-group opportunities

11 Opportunities Exist…
As part of the technical work providing services for DP according to the Use Cases of the experiments
  These closely match requirements from Funders for “Data Management Plans”
  Main opportunities here are:
    To situate on-going work as part of “something bigger” (part of ESPP)
    To get recognition for “background work”, e.g. for LEP
As part of the Certification Process for CERN as a Trusted Digital Repository
  Expertise is spread over many people (not just in IT)
  Learn more about CERN procedures and again situate work as part of a CERN strategic activity
  Goal is to complete prior to next ESPP update and provide input to it
In “new” activities, where past experience and knowledge may be relevant
  E.g. OPERA data
  Helping to prepare a Data Management Plan (DMP) for OPERA (& other experiments)
  Helping with the implementation (conversion of 70TB of Oracle data to non-proprietary format(s))
Bottom line: “See and be seen”

12 Backup slides

13 Requirements from Funding Agencies
To integrate data management planning into the overall research plan, all proposals submitted to the Office of Science for research funding are required to include a Data Management Plan (DMP) of no more than two pages that describes how data generated through the course of the proposed research will be shared and preserved or explains why data sharing and/or preservation are not possible or scientifically appropriate. At a minimum, DMPs must describe how data sharing and preservation will enable validation of results, or how results could be validated if data are not shared or preserved.
Similar requirements from European FAs and EU (H2020)

14 H2020: Annex 1 (DMP Template)
The DMP should address the points below…
Data set reference and name: Identifier for the DS to be produced
Data set description: Description; origin; nature & scale; to whom useful; underpins publication? similar data?
Standards and metadata: Reference to standards of the discipline
Data sharing: How will it be shared? Embargo periods? Mechanisms for dissemination, s/w and other tools for re-use, access open or restricted to groups, where is repository? Type of repository?
Archiving and preservation: Description of procedures, how long will it be preserved? End volume? Costs? How will these be covered?
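As a rough illustration only, the five headings of the template above could be captured in a small record so that a draft DMP can be checked for empty sections. The class name, field names, and the sample identifier are assumptions made for this sketch; they are not part of the H2020 template or of any CERN tool.

```python
from dataclasses import dataclass, fields


@dataclass
class DataSetPlan:
    """One data set entry, with one field per heading of the template."""
    reference_and_name: str = ""
    description: str = ""
    standards_and_metadata: str = ""
    data_sharing: str = ""
    archiving_and_preservation: str = ""


def missing_sections(plan: DataSetPlan) -> list[str]:
    """Return the names of template sections that are still empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Hypothetical draft with only the first two headings filled in.
draft = DataSetPlan(
    reference_and_name="example-dataset-001",  # made-up identifier
    description="Illustrative placeholder text",
)
print(missing_sections(draft))
```

A check like this says nothing about the quality of a plan; it only flags headings that nobody has written yet.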

15 HEP LTDP Use Cases
Bit preservation as a basic “service” on which higher level components can build;
  “Maybe CERN does bit preservation better than anyone else in the world”
Preserve data, software, and know-how in the collaborations;
  Basis for reproducibility;
Share data and associated software with (wider) scientific community, such as theorists or physicists not part of the original collaboration;
Open access to reduced data sets to general public.
Basically, a reflection of DMP requirements

16 LHC Experiments’ Data Policies
These are essentially “extended DMPs” that capture the small variations between each experiment
Variations in duration of embargo periods, designated communities, fraction of data released
A generic “WLCG DMP” exists – just like a generic WLCG TDR (complemented by experiment-specific reports)
More detail in talk about CMS experience with data releases at ADMP workshop

17 3.5. Will there be need for an adjustment of the general CERN data policy?
CERN will establish a data policy that is in line with funding agency requirements, including in terms of Open Access (Science). This can be expected to be largely similar to that adopted by the 4 main LHC experiments, with a significant fraction of the data released after a reasonable embargo period. The duration of the embargo period and the fraction of the data to be released would be determined based on experience, resource requirements and scientific, educational and cultural benefits. Given that the total dataset of the (HL-)LHC will be in the Exabyte range, the volume of data to be released will eventually become significant and the appropriate resources must be factored into any planning. 5 November 2015 IT 2016

18 Which Certification Strategy?
“Trusted” or “certified” digital repositories (Also cost recovery for repositories)
Several such standards exist:
CERN (WLCG) following ISO route
Some sites start with DSA, then DIN, then ISO
Even DANS! (The originators of DSA)
This would not work at CERN…
At CERN, the closest thing to a “mission statement” is an Operational Circular
This, and other steps required for “certification” could not realistically be repeated as we moved up the ladder…

19 Certification – Current Status
Original idea was to perform Certification in the context of WLCG
However:
Quite a few of the metrics concern the (CERN) site;
Interest also in an OAIS archive for “CERN’s Digital Memory”;
The two are linked: policies, strategies, mission statements for the former are part of the latter
Some things will be easier in the latter which will in turn help the former
→ Current thinking: (self-)certify site-wise; “project-specific details” via “Project DMPs”

20 Organisational Infrastructure
ISO 16363 metrics: Organisational Infrastructure
3.1 Governance & Organisational Viability: Mission Statement, Preservation Policy, Implementation plan(s) etc. [ CERN, CERN, project(s) ]
3.2 Organisational Structure & Staffing: Duties, staffing, professional development etc. [ APT etc. ]
3.3 Procedural accountability & preservation policy framework: Designated communities, knowledge bases, policies & reviews, change management, transparency & accountability etc. [ At least partially projects ]
3.4 Financial sustainability: Business planning processes, financial practices and procedures etc.
3.5 Contracts, licenses & liabilities: For the digital materials preserved… [ CERN? Projects? ]
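One possible way to keep track of who answers which metric is a small table keyed by section number. The sketch below is purely illustrative: the section numbers, titles, and bracketed owner notes are copied from the slide, while the dictionary layout and the function name are assumptions.

```python
# Section numbers and titles follow the slide; the owner strings are the
# bracketed notes from the slide, and an empty string means none was given.
METRICS = {
    "3.1": ("Governance & Organisational Viability", "CERN, CERN, project(s)"),
    "3.2": ("Organisational Structure & Staffing", "APT etc."),
    "3.3": ("Procedural accountability & preservation policy framework",
            "At least partially projects"),
    "3.4": ("Financial sustainability", ""),
    "3.5": ("Contracts, licenses & liabilities", "CERN? Projects?"),
}


def needs_follow_up(metrics):
    """Return section numbers with no owner note or an owner marked with '?'."""
    return [section for section, (_, owner) in metrics.items()
            if not owner or "?" in owner]


print(needs_follow_up(METRICS))
```

With the notes as transcribed here, the sections flagged are 3.4 (no owner noted) and 3.5 (owner still an open question).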

21 Logical to have an Operational Circular for “Data”
Obviously should include “meta-data” (as per DPHEP SR)
Software + environment, documentation etc.
Symmetry with OC3 and OC6
  Archival material and archiving at CERN
  CERN scientific documents
  [ CERN scientific data, s/w, doc + meta-data ]
This could address “Mission Statement” and “DP Policy” in ISO 16363 (as OC3 does)
Complemented by:
  Data Preservation Plan (inter-departmental) with ~3 year outlook
    Include also experiment plans or as part of their DMPs?
  Experiment / Project Data Management Plan
  Data Policy (extended DMP – à la LHC)

22 Work together on this “PoW” for DP/DM
Logical to have an Operational Circular for “Data”
Obviously should include “meta-data” (as per DPHEP SR)
Software + environment, documentation etc.
Symmetry with OC3 and OC6
  Archival material and archiving at CERN
  CERN scientific documents
  [ CERN scientific data, s/w, doc + meta-data ]
This could address “Mission Statement” and “DP Policy” in ISO 16363
Complemented by:
  Data Preservation Plan (inter-departmental) with ~3 year outlook
    Include also experiment plans or as part of their DMPs?
  Experiment / Project Data Management Plan
  Data Policy (extended DMP – à la LHC)

23 Infrastructure & Security Risk Management
5.1 Technical Infrastructure Risk Management [ We do all of this, but is it documented? ]
Technology watches, h/w & s/w changes, detection of bit corruption or loss, reporting, security updates, storage media refreshing, change management, critical processes, handling of multiple data copies etc.
OC5, …
5.2 Security Risk Management [ Do we do all of this, and is it documented? ]
Security risks (data, systems, personnel, physical plant), disaster preparedness and recovery plans …
OC2, …
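To make “detection of bit corruption or loss” concrete, here is a minimal fixity-check sketch: it recomputes a SHA-256 checksum for each file and compares it with a previously recorded value. It is a generic illustration and not a description of how CERN's storage services perform these checks; the manifest layout (relative path mapped to hex digest) and the function names are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(manifest: dict[str, str], root: Path) -> list[str]:
    """Return relative paths whose file is missing or whose checksum differs.

    `manifest` maps a path relative to `root` to the digest recorded
    when the file was archived (an assumed, illustrative layout).
    """
    damaged = []
    for relative_path, expected in sorted(manifest.items()):
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(relative_path)
    return damaged
```

A check of this kind only detects damage. Repair depends on the “handling of multiple data copies” item in the same list, since a bad copy has to be restored from a good one.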

24 Covered in section 4 of ISO 16363

25 Helps Address the Goals Below.
Data Preservation & Certification of Trusted Digital Repositories: Helps Address the Goals Below.
Data Management Plans: Sharing, Re-Use; Reproducibility of Results
F.A.I.R. and Open Data: Requires effort & Resources

26 Concluding Remarks
Data Preservation is a Journey – Not a Destination
“Once you stop pedalling, you stop & fall off”
Data Preservation is not an Island – it is part of a much bigger picture, including the full data lifecycle
You can’t share or re-use data, nor reproduce results, if you haven’t first preserved it

27 (Self-)Certification
Requires us to formalise / document some of our existing practices… (incl. “bit preservation”)
To “complete” work in certain areas (e.g. disaster preparedness / recovery)
It needs effort / knowledge from a wide range of groups / people
We also need to define a “PoW for Preservation”
Important milestone: update of the European Strategy for Particle Physics (ESPP): ~

28 How to move forward?
In some cases, I know (we all know) who the suspects are
Typically “senior” people – ideally should include also some younger people for continuity / knowledge transfer
In other cases I do not know: do the GLs?
We do not have to address all metrics in parallel (but could do e.g. for section 5 – “Risk Management”)
Formal CERN documents, such as OCs, need to be prepared carefully: hopefully few (just one?) of these
First step: identify suspects then discuss together (including with EP, SIS & expts) how to address metrics
Suspects need not be / become experts in ISO 16363
Some existing thoughts already in DPHEP Wiki (DPHEP-IB e-group)
Level of involvement: from a few hours (e.g. if the information exists, e.g. in PowerPoint, but not in a document with a DOI) up (e.g. for disaster recovery)

29 Volunteers Please Step Forward!

