Exploring the boundaries of MARC21 — creating a metadata schema for the CERN Open Data Portal Patricia Herterich CERN GS-SIS, Humboldt-Universität zu Berlin.

1 Exploring the boundaries of MARC21 — creating a metadata schema for the CERN Open Data Portal Patricia Herterich CERN GS-SIS, Humboldt-Universität zu Berlin @pherterich – ORCID: 0000-0002-4542-9906 ELAG 2015, Stockholm

2 CERN and High-Energy Physics
• LHC: 27 km, 4 detectors
• 10,000 scientists & engineers from 100 countries
• Higgs boson discovery in July 2012
Copyright: CERN

3 Research Data in HEP

4 Data sharing in HEP
• Data policies available for the 4 LHC experiments:
• Data supplementary to publications available through the repository HEPdata
• More and more supplementary data integrated into the main information system for HEP, INSPIRE

5 INSPIRE MARC21
• Most metadata derived from and linked to publication

6 The CERN Open Data Portal

7 The CERN Open Data Portal
• Public access point to data (including software and documentation) produced at CERN
• Launched in November 2014
• Access to 27 TB of CMS data + educational data from all 4 LHC experiments
• Datasets get minted DOIs
• Based on Invenio, a digital library software developed at CERN

8 The metadata challenge

9
• Metadata input for the portal is through MARCXML
• MARC21 had to be extended to host the necessary metadata
• Broad interpretation of some fields
• Creation of new customised fields
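The idea of feeding the portal with MARCXML that mixes standard fields and customised ones can be illustrated with a minimal sketch. It is not the portal's actual schema: the title, the placeholder DOI, and the local tag 956 with its subfields are hypothetical and chosen only for illustration. What is standard is the MARCXML namespace, the use of field 024 with first indicator 7 and subfield $2 to name the identifier scheme, and the reservation of the 9XX range for local use.

```python
import xml.etree.ElementTree as ET

# Namespace of the Library of Congress MARCXML ("slim") schema.
MARC_NS = "http://www.loc.gov/MARC21/slim"


def datafield(record, tag, subfields, ind1=" ", ind2=" "):
    """Append one MARC data field with its subfields to the record."""
    field = ET.SubElement(
        record,
        f"{{{MARC_NS}}}datafield",
        {"tag": tag, "ind1": ind1, "ind2": ind2},
    )
    for code, value in subfields:
        sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
        sub.text = value
    return field


def build_record():
    """Build a tiny record; all values below are made up for illustration."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    # Standard field: title statement.
    datafield(record, "245", [("a", "Example collision dataset")])
    # Standard field: other identifier, with $2 naming the scheme (DOI).
    datafield(record, "024", [("a", "10.0000/EXAMPLE"), ("2", "doi")], ind1="7")
    # Hypothetical customised field in the locally defined 9XX range.
    datafield(record, "956", [("a", "CMS"), ("b", "2010")])
    return record


# Serialise with the MARCXML namespace as the default namespace.
ET.register_namespace("", MARC_NS)
record = build_record()
marcxml = ET.tostring(record, encoding="unicode")
print(marcxml)
```

Keeping customised fields in the 9XX range is one way to avoid clashing with fields that MARC21 already defines; a broad reinterpretation of an existing field would instead reuse a standard tag.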

10 Current implementation


12 Going beyond MARC21 and extending the scope of research data management in HEP

13 The CERN Analysis Preservation system
• A closed system to preserve data and associated objects and information to allow reproducibility of an analysis
• Discovery tool for data and analyses
• Integration of structured analysis preservation info into publication approval workflows

14 Current prototype

15 More metadata challenges…
• Even more complex metadata than for the CERN Open Data Portal, thus MARC21 is not an option
• Invenio uses JSON internally
• JSON is used by most of the collaborations’ databases
• Chance to create a standard data model for the HEP community to facilitate data and information exchange
• Can be extended to JSON-LD
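How a plain JSON record "can be extended to JSON-LD" can be sketched in a few lines: the record itself stays unchanged, and an added `@context` maps each key to an IRI. This is only an illustration under stated assumptions. The record keys, the values, and the `http://example.org/hep#` vocabulary are hypothetical and are not the data model of the CERN Analysis Preservation system; only `http://schema.org/name` is an existing term.

```python
import json

# A plain JSON record of the kind a collaboration database might hold.
# Keys and values are hypothetical.
record = {
    "title": "Example dimuon analysis",
    "collaboration": "CMS",
    "final_state": ["muon", "muon"],
}

# Hypothetical context: maps each key to an IRI so that the same
# record can also be read as linked data.
CONTEXT = {
    "title": "http://schema.org/name",
    "collaboration": "http://example.org/hep#collaboration",
    "final_state": "http://example.org/hep#finalState",
}


def to_json_ld(plain_record, context):
    """Return a copy of the record with a JSON-LD @context attached."""
    return {"@context": context, **plain_record}


doc = to_json_ld(record, CONTEXT)
print(json.dumps(doc, indent=2))
```

Because the context is purely additive, consumers that only understand plain JSON can ignore `@context` and keep reading the same keys.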


17 An ontology for HEP data analysis?
• Kick-Off workshop in May 2015
• Collaboration with DASPOS (Data and Software Preservation for Open Science, Notre Dame University, Indiana) and Data Semantics Lab (Wright State University, Ohio)
• Work will continue throughout the year to complete the modelling and formalise the ontologies for implementation

18 Example: Detector Final State

19 Next steps
• Model the mind map as graphs and formalise the ontologies
• Have an improved prototype of the CERN Analysis Preservation system ready by the end of the summer

20 Acknowledgements
CERN IT: J. Cowton, P. Fokianos, J. Kunčar, T. Smith, T. Šimko
CERN SIS: S. Dallmeier-Tiessen, L. Rueda, S. Mele
ALICE: M. Gheata, C. Grigoras
ATLAS: K. Cranmer, L. Heinrich, D. Rousseau, F. Socher
CMS: A. Calderon, A. Huffman, K. Lassila-Perini, T. McCauley, A. Rao, A. Rodriguez Marrero
LHCb: S. Amerio, B. Couturier, A. Trisovic
CERN CernVM: J. Blomer
CERN EOS: L. Mascetti
DASPOS: M. Hildreth, C. Vardeman
DPHEP: F. Berghaus, J. Shiers
All the participants of the VoCamp at Notre Dame University in May 2015
Work sponsored by the Wolfgang Gentner Programme of the Federal Ministry of Education and Research
