Data Publishing
Tim Smith, CERN/IT
CERN – IT Department, CH-1211 Genève 23, Switzerland
Easy, in essence…
Challenging, in practice
–Bit Rot
–Media Verification
–Media Migration
–Technology tracking
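Media verification against bit rot is commonly done with fixity checks: a digest is recorded at ingest and recomputed later. The sketch below is only an illustration under that assumption, not a description of CERN's actual tooling; the file name, the choice of SHA-256 and the helper names are all hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, recorded_digest):
    """True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest


def flip_one_bit(path, offset=0):
    """Simulate bit rot by flipping the lowest bit of one byte (demo only)."""
    with open(path, "r+b") as handle:
        handle.seek(offset)
        byte = handle.read(1)
        handle.seek(offset)
        handle.write(bytes([byte[0] ^ 0x01]))


def demo():
    """Record a digest, corrupt the file, and return both check results."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dataset.bin")  # hypothetical file name
        with open(path, "wb") as handle:
            handle.write(b"event data " * 1000)
        recorded = sha256_of(path)
        before = verify(path, recorded)
        flip_one_bit(path, offset=42)
        after = verify(path, recorded)
        return before, after
```

Calling `demo()` returns the check result before and after the simulated corruption. In a real archive the recorded digests would be stored separately from the media being verified.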
Open Data as a Service
–REST API
–OAI-PMH API
–Open Data Pilot
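For readers unfamiliar with OAI-PMH, harvesting is a plain HTTP GET with a `verb` parameter. The sketch below only builds request URLs and makes no network call. The base URL is a placeholder assumption and the helper names are hypothetical; the parameter names (`verb`, `metadataPrefix`, `set`, `from`, `resumptionToken`) come from the OAI-PMH protocol.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint; substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict harvesting to one set
    if from_date:
        params["from"] = from_date    # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url, token):
    """Build the follow-up request for the next page of a long result list."""
    params = {"verb": "ListRecords", "resumptionToken": token}
    return f"{base_url}?{urlencode(params)}"
```

A harvester would fetch the first URL, read the `resumptionToken` element from the XML response if present, and keep requesting `resume_url(...)` until no token is returned.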
Low Barriers
Beware the False Summit
–Data Publication
–Science
Digital Dark Ages
Scientific method
–Propose hypotheses to explain phenomena
–Test the hypotheses' predictions through repeatable experiment
–Share observations and conclusions for independent scrutiny, reproduction and verification
Publication: Preparation (standardisation), issuing
Accessible Normalisation
Interpretable
Raw → Reconstructed → Reduced → Published
Data Reduction / Analysis SW: 10M LoC
Zenodo – GitHub bridge
.zenodo.json
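A `.zenodo.json` file placed in the repository root lets the author supply the metadata used when a GitHub release is archived. The sketch below generates such a file from Python. Every value is a placeholder, and the field names reflect my understanding of Zenodo's deposit metadata, so they should be checked against the Zenodo documentation before use.

```python
import json

# Hypothetical metadata; every value below is a placeholder.
metadata = {
    "title": "Example analysis code",
    "description": "Analysis code accompanying an example dataset.",
    "upload_type": "software",
    "creators": [{"name": "Doe, Jane", "affiliation": "Example Institute"}],
    "license": "MIT",  # assumption: check the accepted identifier format
    "keywords": ["open data", "analysis"],
}


def render_zenodo_json(meta):
    """Serialise the metadata as the text of a .zenodo.json file."""
    return json.dumps(meta, indent=2, sort_keys=True) + "\n"


def write_zenodo_json(meta, path=".zenodo.json"):
    """Write the file; the repository root is the expected location."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_zenodo_json(meta))
```

Keeping the metadata in the repository means it is versioned with the code it describes.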
Code ↔ Data ↔ Paper
Interpretable
Raw → Reconstructed → Reduced → Published
–Steps: Calibrate, Filter, Transform, Select
–Published data: Anonymised, Standardised, Annotated
–Data Reduction / Analysis: Calibration data, Conditions data, Formatters, Filter/Selection algorithms, Statistical Models
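The chain from raw to published data can be pictured as a sequence of small functions, each of which depends on auxiliary inputs (calibration constants, selection thresholds) that must be published too for the result to be interpretable. The toy sketch below is purely illustrative: the `Event` type, the function names and all numbers are hypothetical and bear no relation to real experiment software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: int
    energy: float        # hypothetical quantity, arbitrary units
    good_quality: bool   # hypothetical detector-quality flag


def calibrate(events, scale):
    """Apply a calibration constant (stands in for calibration data)."""
    return [Event(e.event_id, e.energy * scale, e.good_quality) for e in events]


def filter_quality(events):
    """Drop events that fail the quality flag."""
    return [e for e in events if e.good_quality]


def select(events, min_energy):
    """Keep only events at or above an energy threshold."""
    return [e for e in events if e.energy >= min_energy]


def publish(events, units):
    """Emit standardised, annotated records without internal flags."""
    return [{"id": e.event_id, "energy": e.energy, "units": units} for e in events]


def run_chain(raw, scale=2.0, min_energy=10.0, units="GeV"):
    """Raw -> reconstructed -> reduced -> published."""
    reconstructed = filter_quality(calibrate(raw, scale))
    reduced = select(reconstructed, min_energy)
    return publish(reduced, units)
```

With `raw = [Event(1, 4.0, True), Event(2, 6.0, True), Event(3, 9.0, False)]`, only event 2 survives the chain. Without `scale` and `min_energy`, a reader of the published record could not tell why the other events are absent.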
Repeatability
Capture
–Entire workflow
–With data, code, statistical models, documentation
–Environment, Virtual Machines
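Short of archiving a full virtual machine, a lightweight first step is to record the software environment alongside the results. The sketch below uses only the Python standard library; the manifest layout and function names are assumptions for illustration, not an established format.

```python
import json
import platform
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings for installed distributions."""
    found = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            continue  # skip distributions with unreadable metadata
        if name:
            found.add(f"{name}=={version}")
    return sorted(found)


def capture_environment():
    """Collect interpreter, platform and package versions in one dict."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": installed_packages(),
    }


def write_manifest(path="environment.json"):
    """Store the manifest next to the analysis outputs (hypothetical file name)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(capture_environment(), handle, indent=2)
```

Such a manifest records versions but not the binaries themselves, which is why the slide also lists virtual machines.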
Verification and Reproduction
Good software development practice:
–Code test suite: unit & regression
Publish data and analysis code together
–Workflow and environment captured
–Automated test of the result: rerun, confirmed
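An automated test of the result can be as simple as a regression test that reruns the published analysis on the published data and compares the outcome with the published value. The sketch below shows the idea with `unittest`; the data, the `analysis` function and the reference value are hypothetical stand-ins.

```python
import math
import unittest

# Hypothetical stand-ins for the published dataset and the published result.
PUBLISHED_DATA = [1.0, 2.0, 3.0, 4.0]
PUBLISHED_RESULT = 2.5


def analysis(data):
    """Toy analysis: the mean of the measurements."""
    return math.fsum(data) / len(data)


class RerunConfirmsPublishedResult(unittest.TestCase):
    def test_unit_single_value(self):
        # Unit test: the analysis behaves sensibly on a trivial input.
        self.assertEqual(analysis([7.0]), 7.0)

    def test_regression_against_published_value(self):
        # Regression test: rerunning on the published data reproduces the result.
        rerun = analysis(PUBLISHED_DATA)
        self.assertTrue(math.isclose(rerun, PUBLISHED_RESULT, rel_tol=1e-9))


def rerun_confirmed():
    """Run the suite and report whether the published result was reproduced."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(RerunConfirmsPublishedResult)
    outcome = unittest.TextTestRunner(verbosity=0).run(suite)
    return outcome.wasSuccessful()
```

Running such a suite automatically whenever code or data changes turns "rerun, confirmed" into a routine check.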