OPEN DATA Patricia Herterich 16.04.2015. On the way to Open Science… Open Source Open Access Open Data Open Science.

Slides:



Advertisements
Similar presentations
EPrints - Introducing EPrints 3 Software William J Nixon Digital Library Development Manager, University of Glasgow With many thanks to Les Carr and the.
Advertisements

DSpace: the MIT Libraries Institutional Repository MacKenzie Smith, MIT EDUCAUSE 2003, November 5 th Copyright MacKenzie Smith, This work is the.
U.S./DOE Policies and Programs for HEP Data Preservation Amber Boehnlein U.S. Department of Energy Office of High Energy Physics.
Global Health and health Informatics: Serving the underserved Paul Biondich, MD MS Regenstrief Institute & OpenMRS.
DANS is een instituut van KNAW en NWO Data Archiving and Networked Services The Front Office-Back Office model: supporting research data management in.
Web Accessible Virtual Research Environment for Ecosystem Science Community Presentation by Siddeswara Guru.
Data Publishing Workflows: Strategies and Standards
DATA PRESERVATION IN ALICE FEDERICO CARMINATI. MOTIVATION ALICE is a 150 M CHF investment by a large scientific community The ALICE data is unique and.
Open Research Data – the Citizen’s Perspective Improving Data Access and Research Transparency (DART) in Switzerland November 7th 2014 Bern, Hotel Bern.
Sara Bowman Center for Open Science Open Science Framework: Facilitating Transparency and Reproducibility.
Software Cluster Improve Collaboration and Community Engagement Work with diverse communities that contribute to the sustainability of scientific software.
Using the HUB to Study Disaster Events Eric Letvin Director, Disaster and Failure Studies Program Engineering Laboratory National Institute of Standards.
Open Data, Open Source: preparing for Big Data in Metabolomics Rob L Davidson #MetSoc2015 This presentation DOI: /m9.figshare
Key integrating concepts Groups Formal Community Groups Ad-hoc special purpose/ interest groups Fine-grained access control and membership Linked All content.
Good practice in Research Data Management Module 6: Tools, training and support.
The Earth System CoG Collaboration Environment Sylvia Murphy and Cecelia DeLuca (NOAA/CIRES), and Luca Cinquini (NASA/JPL) AGU Ocean Sciences February.
Publish Your Work BIM Curriculum 04. Topics  External Collaboration  Sharing the BIM model  Sharing Documents  Sharing the 3D model  Reviewing 
CERN – IT Department CH-1211 Genève 23 Switzerland t CERN Open Source Collaborative tools: Digital Library Software Tim Smith CERN/IT.
Teaching and Learning with Open Science Grid quarknet.fnal.gov/e-labs/
Updates from EOSDIS -- as they relate to LANCE Kevin Murphy LANCE UWG, 23rd September
Providing Access to Your Data: Tracking Data Usage Robert R. Downs, PhD NASA Socioeconomic Data and Applications Center (SEDAC) Center for International.
Open Data, Open Source: preparing for Big Data in Metabolomics Rob L Davidson #MetSoc2015 This presentation DOI: /m9.figshare
Managing Research Data – The Organisational Challenge at Oxford James A J Wilson Friday 6 th December,
Dimitris Koureas, PhD Natural History Museum London Linking layers of biodiversity data: Informatics challenges for the long tail research RDA - Long Tail.
CERN IT Department CH-1211 Genève 23 Switzerland t Internet Services Job Monitoring for the LHC experiments Irina Sidorova (CERN, JINR) on.
PLoS ONE Application Journal Publishing System (JPS) First application built on Topaz application framework Web 2.0 –Uses a template engine to display.
USGS Metadata in the Broader Picture 1994 Executive Order – Metadata must be created for all Federally-funded research – Federal Geographic Data.
Shaping the practice and measuring the impact of Open Science Gentner Day 2014 Patricia Herterich Humboldt-Universität zu Berlin & CERN.
GBIF Mid Term Meetings 2011 Biodiversity Data Portals for GBIF Participants: The NPT Global Biodiversity Information Facility (GBIF) 3 rd May 2011.
SHARE (SHared Access Research Ecosystem) Tyler Walters Co-Chair, SHARE Steering Group (a joint committee of the ARL, the AAU, and the APLU) Eric Celeste.
ReproZip Packing Experiments for Sharing and Publication Fernando Chirigati, Juliana Freire | NYU-Poly Dennis Shasha | NYU.
Electronic labnotes Mari Wigham COMMIT/. Information WUR  Organising, sharing, finding and reusing data  Expertise in: ● Modelling data.
Symposium on Global Scientific Data Infrastructures Panel Two: Stakeholder Communities in the DWF Ann Wolpert, Massachusetts Institute of Technology Board.
CERN Content Management System Support ATLAS Requirements S. Goldfarb – 19 May 2010 (On behalf of the ATLAS Collaboration)
Firmware - 1 CMS Upgrade Workshop October SLHC CMS Firmware SLHC CMS Firmware Organization, Validation, and Commissioning M. Schulte, University.
Children’s Health Exposure Analysis Resource (CHEAR) CHEAR Center for Data Science Susan Teitelbaum, PhD November 4, 2015.
INFSO-RI Enabling Grids for E-sciencE ARDA Experiment Dashboard Ricardo Rocha (ARDA – CERN) on behalf of the Dashboard Team.
Exploring the boundaries of MARC21 — creating a metadata schema for the CERN Open Data Portal Patricia Herterich CERN GS-SIS, Humboldt-Universität zu Berlin.
| nectar.org.au NECTAR TRAINING Module 2 Virtual Laboratories and eResearch Tools.
U.S. Department of the Interior U.S. Geological Survey Decision Support Tools and USGS Data Management Best Practices Cassandra Ladino USGS Chesapeake.
Earth System Curator and Model Metadata Discovery and Display for CMIP5 Sylvia Murphy and Cecelia Deluca (NOAA/CIRES) Hannah Wilcox (NCAR/CISL) Metafor.
Access to EU microdata for research purposes
Mike Hildreth DASPOS Update Mike Hildreth representing the DASPOS project 1.
11 Researcher practice in data management Margaret Henty.
1 The NORC Data Enclave for Sensitive Microdata Timothy M. Mulcahy Senior Research Scientist, NORC/University of Chicago,
Toward a common data and command representation for quantum chemistry Malcolm Atkinson Director 5 th April 2004.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI strategy and Grand Vision Ludek Matyska EGI Council Chair EGI InSPIRE.
ATLAS Data preservation April 2015 Roger Jones for the ATLAS Collaboration.
SciencePAD Open Software for Open Science Alberto Di Meglio – CERN.
Open Science (publishing) as-a-Service Paolo Manghi (OpenAIRE infrastructure) Institute of Information Science and Technologies Italian Research Council.
Open Science and Research – Services for Research Data Management © 2014 OKM ATT 2014–2017 initiative Licenced under.
SOFTWARE LIFECYCLE. What functions would ISEES perform?
Research Data Management 26 th April 2016 Federica Fina, Data Scientist, University of St Andrews Library.
| 1 Anita de Waard, VP Research Data Collaborations Elsevier RDM Services May 20, 2016 Publishing The Full Research Cycle To Support.
PARTHENOS-project.eu EOSC market demand for art, humanties and cultural heritage Amsterdam– EGI Conference– 7/4/2016 Franco Niccolucci Scientific Coordinator,
Usecases: 1.ISIS Neutron Source 2.DP for HEP Matthew Viljoen STFC, UK APARSEN-EGI workshop: preserving big data for research Amsterdam Science Park 4-6.
ODIN – ORCID and DATACITE Interoperability Network ODIN: Connecting research and researchers Sergio Ruiz - DataCite Funded by The European Union Seventh.
Kathleen Shearer Data management: The new frontier for libraries.
CyVerse Data Store Managing Your ‘Big’ Data. Welcome to the Data Store Manage and share your data across all CyVerse platforms.
INSPIREHEP … and data Sünje Dallmeier-Tiessen (CERN) for many collaborators in GS-SIS and IT-CIS.
What is Open Science and How do I do it?
Vision for CERN IT Department
Graduation Project Kick-off presentation - SET
What, why and best practices in open research
Data management for reproducible research
OpenML Workshop Eindhoven TU/e,
EOSCpilot All Hands Meeting 8 March 2018 Pisa
Research Infrastructures: Ensuring trust and quality of data
Brian Matthews STFC EOSCpilot Brian Matthews STFC
Research data lifecycle²
Presentation transcript:

OPEN DATA Patricia Herterich

On the way to Open Science… Open Source Open Access Open Data Open Science

Benefits of Open Science For society: –Public availability & reusability of scientific data –Public accessibility & transparency of scientific communication For scientific communities: Reproducibility of research results Leveraging web-based tools to facilitate scientific collaboration

Re- producible research

Open Data: incentives Funder policies

Open Data: initiatives

HEP Open Data

Data in High- Energy Physics

But how to make them open?

Release

Some numbers… At the public release: –Serving ~15 GB per hour [usage ~50 times now] –After a day or two was about ~4 GB per hour [~20 times now] Typical day now: –~1000 visitors, out of which about ~10 people download EOS files ~400 people look at detailed record pages resulting in various amounts GB being served

…and other impact We know that the CODP release resulted in: -New collaborations -Re-use of primary datasets for machine learning and “real physics” analysis -New data “mash-ups” -Adaption of code examples for new analysis

Metadata challenges

Our solution

Small scale data

Code

Open Data is just the tip of the RDM iceberg… An analysis capturing and management tool for HEP

Data Analysis Preservation Capture –Entire workflow –With data, code, statistical models, documentation –Environment, Virtual Machines –OAIS compatible Interoperability with –Experiments’ databases –Existing platforms such as the CERN Open Data Portal, INSPIRE

Sources V. Stodden, J. Borwein, and D.H. Bailey. “Setting the default to reproducible". In: computational science research. SIAM News 46 (2013), pp ATLAS Collaboration (2014). ATLAS Data Access Policy. CERN Open Data Portal. DOI: /OPENDATA.ATLAS.T9YR.Y7MZ ALICE Collaboration (2013). ALICE data preservation strategy. CERN Open Data Portal. DOI: /OPENDATA.ALICE.54NE.X2EA /OPENDATA.ALICE.54NE.X2EA CMS Collaboration (2012). CMS data preservation, re-use and open access policy. CERN Open Data Portal. DOI: /OPENDATA.CMS.UDBF.JKR /OPENDATA.CMS.UDBF.JKR9 LHCb Collaboration (2013). LHCb External Data Access Policy. CERN Open Data Portal. DOI: /OPENDATA.LHCb.HKJW.TWSZ /OPENDATA.LHCb.HKJW.TWSZ

Websites ern-makes-public-first-data-lhc-experiments ern-makes-public-first-data-lhc-experiments access-literature-ploss-data-policy/ access-literature-ploss-data-policy/ plan-at-epfl/ plan-at-epfl/

Acknowl- edgements My colleagues from the CODP and DAPF team Work sponsored by the Wolfgang Gentner Programme of the Federal Ministry of Education and Research