1
Exploring the boundaries of MARC21: creating a metadata schema for the CERN Open Data Portal
Patricia Herterich
CERN GS-SIS, Humboldt-Universität zu Berlin
@pherterich
patricia.herterich@cern.ch
ORCID: 0000-0002-4542-9906
ELAG 2015, Stockholm
2
CERN and High-Energy Physics
LHC: 27 km, 4 detectors
10,000 scientists & engineers from 100 countries
Higgs boson discovery in July 2012
Copyright: CERN
3
Research Data in HEP
4
Data sharing in HEP
Data policies available for the 4 LHC experiments: http://opendata.cern.ch/collection/data-policies
Data supplementary to publications available through the repository HEPdata
More and more supplementary data integrated into the main information system for HEP, INSPIRE
5
INSPIRE
MARC21
Most metadata derived from and linked to publication
6
The CERN Open Data Portal http://opendata.cern.ch/
7
The CERN Open Data Portal
Public access point to data (including software and documentation) produced at CERN
Launched in November 2014
Access to 27 TB of CMS data + educational data from all 4 LHC experiments
Datasets get minted DOIs
Based on Invenio, a digital library software developed at CERN
8
The metadata challenge
9
Metadata input for the portal is through MARCXML
MARC21 had to be extended to host the necessary metadata
Broad interpretation of some fields
Creation of new customised fields
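To make the idea of MARCXML input with customised fields concrete, here is a minimal sketch in Python. It is not the portal's actual schema: the record values are invented, and the local field 990 is a hypothetical stand-in for a customised field (MARC21 reserves the 9XX range for local use). The helper name `get_subfields` is also an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") documents.
MARC_NS = {"m": "http://www.loc.gov/MARC21/slim"}

# Illustrative record only: all values are made up, and the local
# field 990 is a hypothetical example of a customised field.
SAMPLE_RECORD = """\
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Example dataset title</subfield>
  </datafield>
  <datafield tag="024" ind1="7" ind2=" ">
    <subfield code="a">10.0000/EXAMPLE.DOI</subfield>
    <subfield code="2">DOI</subfield>
  </datafield>
  <datafield tag="990" ind1=" " ind2=" ">
    <subfield code="a">hypothetical local value</subfield>
  </datafield>
</record>
"""


def get_subfields(record, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    path = f"m:datafield[@tag='{tag}']/m:subfield[@code='{code}']"
    return [element.text for element in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE_RECORD)
title = get_subfields(record, "245", "a")       # standard title field
doi = get_subfields(record, "024", "a")         # identifier, source in $2
local_note = get_subfields(record, "990", "a")  # hypothetical local field
print(title, doi, local_note)
```

A reader that only knows standard MARC21 would still find the title and identifier, but would need out-of-band documentation to interpret the local field, which is the trade-off of extending the format.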
10
Current implementation https://github.com/cernopendata/opendata.cern.ch/tree/production/invenio_opendata/testsuite/data
12
Going beyond MARC21
And extending the scope of research data management in HEP
13
The CERN Analysis Preservation system
A closed system to preserve data and associated objects and information to allow reproducibility of an analysis
Discovery tool for data and analyses
Integration of structured analysis preservation info into publication approval workflows
14
Current prototype
15
More metadata challenges…
Even more complex metadata than for the CERN Open Data Portal, thus MARC21 is not an option
Invenio uses JSON internally
JSON is used by most of the collaborations’ databases
Chance to create a standard data model for the HEP community to facilitate data and information exchange
Can be extended to JSON-LD
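The step from plain JSON to JSON-LD can be sketched as follows. This is an illustration under stated assumptions, not the data model of the CERN Analysis Preservation system: the field names (`title`, `collaboration`, `final_state`), the vocabulary base `http://example.org/hep-analysis#`, the record identifier, and the helper `to_jsonld` are all hypothetical. Only the `@context` and `@id` keywords come from JSON-LD itself.

```python
import copy
import json

# Hypothetical vocabulary base; a real model would point to a published ontology.
VOCAB = "http://example.org/hep-analysis#"

# Plain JSON record with invented field names and values.
analysis_record = {
    "title": "Example analysis",
    "collaboration": "CMS",
    "final_state": ["muon", "muon"],
}


def to_jsonld(record, vocab, record_id):
    """Return a copy of `record` with a JSON-LD @context and @id added.

    Each top-level key is mapped to an IRI built from `vocab`, so the
    same JSON can also be read as linked data.
    """
    context = {key: vocab + key for key in record}
    document = {"@context": context, "@id": record_id}
    document.update(copy.deepcopy(record))  # leave the input untouched
    return document


jsonld_doc = to_jsonld(analysis_record, VOCAB, "http://example.org/analysis/1")
print(json.dumps(jsonld_doc, indent=2))
```

Because the context is added alongside the existing keys, tools that consume the plain JSON keep working, while linked-data tools can resolve each key to a term in the shared vocabulary.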
16
https://drive.google.com/file/d/0B9fGRYX4RNaSWktUTjZ0WlpYZzQ/view?usp=sharing
17
An ontology for HEP data analysis?
Kick-off workshop in May 2015
Collaboration with DASPOS (Data and Software Preservation for Open Science, Notre Dame University, Indiana) and Data Semantics Lab (Wright State University, Ohio)
Work will continue throughout the year to complete the modelling and formalise the ontologies for implementation
18
Example: Detector Final State
19
Next steps
Model mindmap in graphs & formalise ontologies
Have an improved prototype of the CERN Analysis Preservation system ready by the end of the summer
20
Acknowledgements
CERN IT: J. Cowton, P. Fokianos, J. Kunčar, T. Smith, T. Šimko
CERN SIS: S. Dallmeier-Tiessen, L. Rueda, S. Mele
ALICE: M. Gheata, C. Grigoras
ATLAS: K. Cranmer, L. Heinrich, D. Rousseau, F. Socher
CMS: A. Calderon, A. Huffman, K. Lassila-Perini, T. McCauley, A. Rao, A. Rodriguez Marrero
LHCb: S. Amerio, B. Couturier, A. Trisovic
CERN CernVM: J. Blomer
CERN EOS: L. Mascetti
DASPOS: M. Hildreth, C. Vardeman
DPHEP: F. Berghaus, J. Shiers
All the participants of the VoCamp at Notre Dame University in May 2015
Work sponsored by the Wolfgang Gentner Programme of the Federal Ministry of Education and Research