The Economics of Data Sharing

CMMI Workshop, February 6, 2016
Anita de Waard (ORCID 0000-0002-9034-4119)
VP Research Data Collaborations, Elsevier RDM Services
a.dewaard@elsevier.com

How do we get scientists to share their data? How do we make data repositories sustainable? How do we create effective and sustainable ecosystems for storing, sharing and reusing data, and get people to use them?
- The economics of science
- Cost recovery models of data repositories
- Some examples that work
- Some thoughts on the future

Two Economies of Science [1]
Debit economy (like a pie): a single pile of ‘stuff’ gets divided, and a thing can belong to only one person at a time. “If you get more, I get less.”
- Examples: money, jobs, samples, equipment, space, etc.
- Behaviors: hoarding, secrecy, (cut-throat) competition; winning by owning (and not sharing).
Credit economy (like a song): credit comes from visibility, so the more you give away, the more you benefit. “Only if I share do I really own.” (“You need me to do you!” JW)
- Examples: papers, citations, good ideas (if credited), skills.
- Behaviors: open access, the citation game, collaboration with the top X; winning by sharing (to establish priority and visibility).
Where does DATA fit?
[1] Paula Stephan, “How Economics Shapes Science”, Harvard University Press, 2012: http://www.jstor.org/stable/j.ctt2jbqd1

RDA IG Repository Cost Recovery
Interviewed 22 repositories worldwide.
Current income streams:
- Structural funding
- Mostly data access charges
- Mostly data deposit fees
- Membership fees (for deposits and/or access)
- Serial project funding
- Support from the host institution
New models under consideration:
- Sponsorships/services for the commercial sector
- Contracts for specific services (hosting, archiving, curation)
- Expanding the number of affiliated institutions
- Deposit fees
- More services for “national memory institutes”
Some comments:
- Some countries fund repositories structurally (not the US!)
- Some repositories are embedded in scholarly practice
- It is hard to come up with new models: no time, no skill sets!

Four Types of Repositories
[Diagram labels: research question, object of study, raw data, processed data, tables/figures, data with paper, curated record; methods, software, publication; method, analysis, curate.]
Institutional/local repositories (size: GB; number of files: billions): Deep Blue (UMich): 80k; MIT DSpace: 75k; HAL (France): 60k; DSpace (Cambridge): 1.5k; of which data: hundreds.
Non-domain repositories (size: MB; number of files: millions): Figshare: 1.2M; DataDryad: 3k; Dataverse: 58k.
Local storage/instrument repositories (size: PB; number of files: trillions): NOAA: 20 TB / NASA streaming: > 24 PB/day; NASA Reverb: 12 PB data; NSSD: > 230 TB of digital data; NSIDC: 1 PB data; …: 1 PB total; ALMA Telescope: 40 TB/day.
Domain repositories (size: kB; number of files: 100ks): PetDB: 6k; PDB: 100k; NIST ASD: 170k.

Where is data sharing happening?
YES:
- Astronomy: telescopes
- High-energy physics: accelerators
- Earth science: satellites
- Social science: censuses
- Medicine (sometimes): patient data in large studies
- Life science: sequence data
Big equipment that no single lab/person can run; can’t do science without it; tools in place to be effective.
NO:
- Low-temperature physics: cryostats
- Earth science: samples
- Materials science: catalysts, microscopes, etc.
- Social science: interviews
- Medicine: individual patient data
- Neuroscience: microscopes
Small equipment that a single lab/person can run; can do science without sharing; no effective tools in place.
[Diagram labels: communicate, prepare, observe, analyze, ponder.]

Connecting small science: identify the entities in your observations from the start.
[Diagram: two experiments, each with prepare, analyze, communicate steps.]

Connecting small science: compare the outcomes of interactions with these entities.
[Diagram: two experiments, each with prepare, analyze, communicate steps.]

Connecting small science: build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments. Reason collectively!
[Diagram: shared observations across two experiments (think, prepare, analyze, communicate).]

A small change for small science: Urban Legend [2]
- Encourage sharing of raw data files plus experimental metadata.
- Add metadata to your experiment while you are performing it.
- Improved data practices made the lab more productive and more creative, and enabled effective and novel collaborations.
Lesson: split data storage and curation from data sharing!
- Provide a direct reward for storage: now we can find our own data!
- Enable simple upload to an embargoed dataset when the owner is ready.
[2] Tripathy et al., 2014: http://www.frontiersin.org/10.3389/conf.fninf.2014.18.00077/event_abstract

Addressing the fear of scooping with embargoes: the current workflow
1. Researcher creates datasets.
2. Researcher writes a paper and publishes it in a journal.
3. (Sometimes) the dataset gets posted to a repository.
4. Researcher reports (post hoc) to the institution and funder.
[Diagram: researcher, dataset, journal paper, data repository, institution, funding agency.]

Addressing the fear of scooping with embargoes: problems with the current workflow
i. Too much work for researchers.
ii. Data posting is not mandatory.
iii. No links between data and paper.
iv. Funders and institutions are informed as an afterthought.
[Diagram: researcher, dataset, journal paper, data repository, institution, funding agency.]

Addressing the fear of scooping with embargoes: the proposed workflow
1. Researcher creates datasets and posts them to a repository (under embargo, not publicly viewable).
2. Funder is automatically notified of the dataset posting.
3. Researcher writes a paper and publishes it in a journal; the embargo is lifted and the data linked. NB: this also allows release of unused data, for negative results and reproducibility.
4. Funder and institution get a report on publication and embargo lifting.
[Diagram: researcher, dataset, journal paper, data repository, institution, funding agency.]
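The proposed workflow behaves like a small state machine: a dataset starts out embargoed, and publication (or a deliberate release of unused data) makes it public while the funder and institution are kept informed. The Python sketch below models those four steps in memory under stated assumptions: the class and method names, the notification strings, and the placeholder DOI are all hypothetical, and no real repository or funder API is implied.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Status(Enum):
    EMBARGOED = auto()  # posted to the repository, not publicly viewable
    PUBLIC = auto()     # embargo lifted, visible to everyone


@dataclass
class Dataset:
    dataset_id: str
    owner: str
    status: Status = Status.EMBARGOED
    linked_paper: Optional[str] = None  # DOI of the linked paper, once published


@dataclass
class EmbargoWorkflow:
    """Hypothetical in-memory model of the four-step embargo workflow."""
    datasets: dict[str, Dataset] = field(default_factory=dict)
    notifications: list[str] = field(default_factory=list)

    def deposit(self, dataset_id: str, owner: str) -> Dataset:
        # Step 1: the researcher posts the dataset under embargo.
        ds = Dataset(dataset_id, owner)
        self.datasets[dataset_id] = ds
        # Step 2: the funder is notified automatically (assumed here: a log entry).
        self.notifications.append(f"funder: {owner} deposited {dataset_id} (embargoed)")
        return ds

    def publish_paper(self, dataset_id: str, paper_doi: str) -> None:
        # Step 3: publishing the paper lifts the embargo and links the data.
        ds = self.datasets[dataset_id]
        ds.status = Status.PUBLIC
        ds.linked_paper = paper_doi
        # Step 4: funder and institution get a report on the release.
        for party in ("funder", "institution"):
            self.notifications.append(f"{party}: {dataset_id} released, linked to {paper_doi}")

    def release_unused(self, dataset_id: str) -> None:
        # Data not used in any paper (e.g. negative results) can still be released.
        ds = self.datasets[dataset_id]
        ds.status = Status.PUBLIC
        self.notifications.append(f"funder: {dataset_id} released without a paper")


if __name__ == "__main__":
    wf = EmbargoWorkflow()
    wf.deposit("ds-001", "alice")
    wf.publish_paper("ds-001", "10.1234/placeholder")  # placeholder DOI, not a real paper
    print("\n".join(wf.notifications))
```

In a real system the notifications would presumably feed funder and institutional reporting systems; here they are simply appended to a list so the order of events is easy to inspect.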

A System for Linking Data Links: Scholix
- ICSU-WDS/RDA Publishing Data Services Working Group, merged with the National Data Service pilot.
- Cross-stakeholder, with input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others.
- Proposed long-term architecture and interoperability framework: www.scholix.org
- Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 million links from various sources).
- Making links between datasets and articles available could (and should) encourage data citation and deposition.
- Together with the FORCE11 Data Citation Principles, this can encourage research object citation and credit metrics.

IUPAC has recommendations for which word to use to describe a given property, but the vocabulary itself is not very accessible or usable, and thus it is not universally implemented. Each site decides how it labels a given property, which hinders indexing and reuse of the data across silos. Structured capture of information using an ELN such as Hivebench enables researchers to report data using a consistent vocabulary without extra effort.
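To make the vocabulary problem concrete: if each site labels the same property differently, records from different silos cannot be indexed together until their labels are mapped to one shared term. The following is a minimal Python sketch of such a mapping; the labels and terms are hypothetical examples, not the IUPAC vocabulary, and the sketch does not describe how Hivebench works internally.

```python
from typing import Optional

# Hypothetical controlled vocabulary: shared term -> labels seen at different sites.
# These labels are illustrative only; they are not taken from IUPAC recommendations.
CONTROLLED_VOCABULARY = {
    "melting point": {"mp", "m.p.", "melting pt", "melting point"},
    "boiling point": {"bp", "b.p.", "boiling pt", "boiling point"},
}

# Invert the mapping once so each label lookup is a single dict access.
LABEL_TO_TERM = {
    label: term
    for term, labels in CONTROLLED_VOCABULARY.items()
    for label in labels
}


def normalize_label(label: str) -> Optional[str]:
    """Return the shared term for a site-specific label, or None if unknown."""
    return LABEL_TO_TERM.get(label.strip().lower())


def normalize_record(record: dict[str, float]) -> dict[str, float]:
    """Re-key a record by shared terms; unknown labels are kept unchanged."""
    normalized = {}
    for label, value in record.items():
        normalized[normalize_label(label) or label] = value
    return normalized


if __name__ == "__main__":
    site_a = {"m.p.": 80.0}
    site_b = {"Melting Pt": 80.0}
    print(normalize_record(site_a) == normalize_record(site_b))
```

Both example records normalize to {"melting point": 80.0}, so they can be indexed together. Unknown labels are kept rather than dropped, so no data is lost; if two labels in the same record map to the same term, the later one wins.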

A System for a New Data Economics: NIH Data Commons
The Commons option: direct funding.
[Diagram: NIH BD2K provides credits to the user; the user uses the credits in the Commons; search engines index the Commons, which enables search.]
(After Phil Bourne, December 2015.)
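The credit flow on this slide (NIH BD2K provides credits; users spend them in the Commons) can be sketched as a simple ledger. This Python sketch is only an illustration under assumptions: the class and method names, the provider name, and the integer credit units are hypothetical, and it is not how the NIH Data Commons was actually implemented.

```python
from dataclasses import dataclass, field


class InsufficientCredits(Exception):
    """Raised when a user tries to spend more credits than they hold."""


@dataclass
class CommonsLedger:
    """Hypothetical ledger for a credits-based Commons model."""
    user_balances: dict[str, int] = field(default_factory=dict)
    provider_earnings: dict[str, int] = field(default_factory=dict)

    def grant(self, user: str, credits: int) -> None:
        # The funder (NIH BD2K on the slide) issues credits to a user.
        if credits <= 0:
            raise ValueError("credits must be positive")
        self.user_balances[user] = self.user_balances.get(user, 0) + credits

    def spend(self, user: str, provider: str, credits: int) -> None:
        # The user spends credits with a Commons provider of their choice.
        if credits <= 0:
            raise ValueError("credits must be positive")
        balance = self.user_balances.get(user, 0)
        if credits > balance:
            raise InsufficientCredits(f"{user} has {balance} credits, needs {credits}")
        self.user_balances[user] = balance - credits
        self.provider_earnings[provider] = self.provider_earnings.get(provider, 0) + credits


if __name__ == "__main__":
    ledger = CommonsLedger()
    ledger.grant("alice", 100)
    ledger.spend("alice", "hypothetical-storage-provider", 30)
    print(ledger.user_balances, ledger.provider_earnings)
```

In this sketch, what a provider earns depends entirely on which services users choose to spend their credits on, which is one way to read the “data economy” idea on the slide.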

Drivers for Data Sharing: A Study in Behavioral Economics
- Study scholarly reward systems from the point of view of economics.
- Develop an economic model for the entire scholarly rewards ecosystem: career, prestige, tenure, finances, etc.
Two intended outcomes:
- Understanding current behavior with respect to data sharing: can we explain what we see, and the differences between domains?
- A theoretical foundation for recommending policies and practices to stakeholders such as funders, publishers and standards bodies.
A small group is working on it and planning a first meeting: Mike Huerta (NLM), Micah Altman (MIT), Fran Berman (RPI), Carol Tenopir (TN), Carole Palmer (UW), Greg Gordon (SSRN). Thoughts? Want to join?

In summary:
- The economy of science: pies vs. songs.
- RDA Data Repositories Cost Recovery IG: different types of repositories serve different types of science; we need to move from ‘small’ to ‘big’ science thinking.
- Some examples of successful data sharing:
  - Online electronic lab notebooks: making them too easy not to use
  - RDA Scholix: linking systems of links using existing technology
  - The NIH Data Commons: enabling a data economy in practice
- Some things we can do:
  - Embargo pilots: circumvent the fear of scooping
  - Drivers for data sharing report: science is a human endeavor

Thank you!
Anita de Waard, a.dewaard@elsevier.com
Links:
https://www.hivebench.com
https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
http://www.journals.elsevier.com/softwarex/
https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
https://rd-alliance.org/bof-data-search.html
https://data.mendeley.com/
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
https://www.force11.org/
http://www.nationaldataservice.org/
https://rd-alliance.org/
https://www.elsevier.com/about/open-science/research-data