Archiving Research Data, Dryad, and Publishers
Neil Beagrie, Charles Beagrie Ltd
Bloomsbury Conference, June 2010
With contributions from Julia Chruszcz, Peter Williams, and Todd Vision

Overview
The Challenge;
The Dryad Consortium;
Supplementary Data and Publishers;
Research Data Preservation Costs (KRDS);
The Future.

The Challenge

PRC Global Study
[Figure: chart from the PRC global study; only the per-item sample sizes, ranging from n=841 to n=3759, survive in this transcript.]
Source: PRC global study (forthcoming)

Requesting Data
Wicherts et al. (2006, Am. Psychol. 61, 726) requested data from the 141 most recent articles in American Psychological Association (APA) journals.
6 months later, after … 400 e-mails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…
Only 27% of authors shared their data.

The Dryad Consortium of Scholarly Societies and publishers (and libraries)

Archiving at publication
Avoids loss, corruption, and obsolescence of data files;
It is the point in time when authors are best able to ensure the correctness of data and metadata;
Authors have an incentive to deposit their data in order to complete the publication process;
Journals are best able to monitor compliance with policy;
In short, the GenBank model works.

Incentives to authors
Access to colleagues' data
Visibility and citability
  – Another way for work to have high impact
Integration
  – Combinability with other data adds value
Long-term preservation
  – Including data format migration
Ad hoc data sharing can be burdensome
  – Deposition to multiple specialized repositories
  – Fulfilling individual requests for data takes effort

Joint Data Archiving Policy
DEPOSIT AT PUBLICATION
  – As a condition for publication, all data used in the paper should be archived in an appropriate public archive.
REPEATABILITY
  – Data should be given with sufficient detail so that, together with the paper content, each result in the published paper may be re-created.
EMBARGO
  – Authors may elect to have the data publicly available at time of publication or, if the archive allows, opt to embargo access to the data.
EXCEPTIONS
  – Exceptions may be granted at the discretion of the editor, especially for sensitive information such as the location of endangered species.
COORDINATION
  – The aim is for the Dryad consortium of journals to adopt this policy simultaneously.

That's all well and good, but where's this appropriate public archive?

A mosaic of specialized databases
There are a growing number of databases to which deposition is encouraged or required (GenBank, TreeBASE)
  – And others are emerging
A world in which every datatype had its own required database, each with its own submission system:
  – Would be a huge burden on authors
  – Would inevitably leave some data orphaned
  – Might never be financially possible

Overcoming the submission burden
Integrating journal submission and data submission
  – Prepopulating bibliographic metadata
  – Handshaking with specialized repositories
Enhancing low-quality author-provided metadata
  – Human curation
  – Machine-assisted metadata enhancement

The Dryad Digital Repository

The Repository
Dryad is a repository (at Duke) for datasets underlying scientific research articles;
Its initial focus has been evolution and ecology;
Participating journals subscribe to the Joint Data Archiving Policy;
Dryad datasets will have DOIs and Creative Commons CC-Zero licenses.
Project
Funded by the National Science Foundation;
Sustainability plan a key deliverable.

Supplementary Data and Publishers

Overview
Consultancy for Dryad Sustainability: covered areas of the draft business plan and sustainability for Dryad
Presenting one of the contributions (publishers) to the section on Comparators and Costs
Outcomes from desk research and 12 interviews with publishers/data publishers, plus some additional input drawn from Keeping Research Data Safe
Very brief presentation: an article is in preparation for the Learned Publishing Oct 2010 issue; KRDS2 is available from JISC

Interviewees
Journal of Clinical Investigation
Journal of the American Medical Association
Molecular Phylogenetics and Evolution (Elsevier)
Journal of Heredity (OUP)
Ecological Society of America
Wiley-Blackwell + Ecology Letters
Royal Society
Federation of American Societies for Experimental Biology
OECD Publishing
Internet Archaeology and Archaeology Data Service
Pangaea: Publishing Network for Geoscientific & Environmental Data
Dataverse Network (Social Sciences, Harvard)

Some Findings: growth
Many interviewees stated that supplementary data and materials are showing rapid growth
3 gave figures:
  – from 32 articles in 2000 to 251 in 2009 (784% of the 2000 figure, i.e. an increase of 684%);
  – from 6% in 2005 to 38% in 2009;
  – from 2% a decade ago to 87% in 2009.
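As a quick arithmetic check on the first set of figures (32 articles in 2000, 251 in 2009), here is a minimal Python sketch. The function names are illustrative and not from the presentation. It separates two measures that are easy to conflate: the later count as a percentage of the earlier one, and the percentage increase, which is always 100 points lower.

```python
def percent_of(old: float, new: float) -> float:
    """Return `new` expressed as a percentage of `old`."""
    return new / old * 100.0


def percent_increase(old: float, new: float) -> float:
    """Return the growth from `old` to `new` as a percentage of `old`."""
    return (new - old) / old * 100.0


# Figures quoted on the slide: articles with supplementary data.
articles_2000 = 32
articles_2009 = 251

share_of_baseline = percent_of(articles_2000, articles_2009)
growth = percent_increase(articles_2000, articles_2009)

print(f"2009 count as % of 2000 count: {share_of_baseline:.1f}%")
print(f"Increase from 2000 to 2009:    {growth:.1f}%")
```

The other two figures on the slide are already shares of articles (6% to 38%, and 2% to 87%), so their growth is most naturally read in percentage points.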

Some Findings: workflow
Supplementary data have grown organically at the various journals investigated (author driven);
Both the work and the costs are being absorbed into the daily running of journals;
In 4 cases there was minimal impact on work duties; in 5 others there was a significant but often unquantified impact (two of these might be considered data publications, with a focus on publishing data papers or datasets); and in 3 cases the information was not available or unknown;
The variation can be explained by the level of effort or importance applied (the greatest effort is associated with copy editing, format migration, addition of metadata, etc., whilst the least is required for simply hosting the material) and/or by high levels of automation in the workflow.

Some Findings: costs
These were in most cases unknown or only partially known;
Costs mentioned but usually not quantified include digital storage costs, salary costs of journal staff, and long-term preservation costs;
Detailed cost information was really only available from Internet Archaeology via the Archaeology Data Service, which had participated in an activity-based costing study (KRDS2);
Internet Archaeology archiving costs reflect those for a dataset publisher, so they are a comparator for only part of Dryad's content: large datasets.

Some Findings: revenue
Only author fees and journal subscription fees were mentioned as current revenue sources for the supplementary materials in journals;
3 journals interviewed have author charges for supplementary materials (see next slide);
The data archiving and sharing organisations interviewed relied primarily on (uncertain) research grants and temporary or recurrent core funding, but one had access to a small endowment and another has a charging policy for some depositors.

Some Findings: author charges
Journal of Clinical Investigation: authors are charged $300 for supplemental data to appear online with accepted articles;
Ecological Archives: submission of appendices and supplements is free up to 10 MB. Above this, there is a fee of $250 for the first 1 GB and $50 for each subsequent GB. The fee for publication of a data paper is $250 for publication of the abstract in the relevant journal plus publication of up to 10 MB in Ecological Archives. An additional $250 is charged for data sets between 10 MB and 1 GB, and for larger datasets there is an additional $50 per GB fee;
The Federation of American Societies for Experimental Biology (FASEB) charges $100 for each supplemental file.
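The Ecological Archives fee schedule quoted above can be sketched as a small Python calculation. This is one reading of the slide, not the publisher's own rules. The function names are hypothetical. It assumes 1 GB = 1000 MB, that a part-used GB beyond the first is charged as a whole GB, and that data papers over 1 GB pay the $250 tier plus $50 per further GB.

```python
import math

FREE_LIMIT_MB = 10   # free allowance quoted on the slide
MB_PER_GB = 1000     # assumption: decimal gigabytes
FIRST_GB_FEE = 250   # USD, covers anything above 10 MB up to 1 GB
EXTRA_GB_FEE = 50    # USD per subsequent GB


def supplement_fee(size_mb: float) -> int:
    """Fee in USD for appendices and supplements of the given size."""
    if size_mb <= FREE_LIMIT_MB:
        return 0
    size_gb = size_mb / MB_PER_GB
    if size_gb <= 1:
        return FIRST_GB_FEE
    # Assumption: any started GB beyond the first is billed in full.
    extra_gb = math.ceil(size_gb - 1)
    return FIRST_GB_FEE + EXTRA_GB_FEE * extra_gb


def data_paper_fee(size_mb: float) -> int:
    """Fee in USD for a data paper: a $250 base (abstract plus up to
    10 MB), then the same size-based tiers as supplements (assumed)."""
    return 250 + supplement_fee(size_mb)


for size in (5, 500, 2500):
    print(size, "MB:", supplement_fee(size), "USD supplement,",
          data_paper_fee(size), "USD data paper")
```

Under these assumptions a 2.5 GB supplement is billed as the first GB plus two further GB.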

Keeping Research Data Safe (KRDS1 & KRDS2): JISC-funded studies of Research Data Preservation Costs (separate Dryad costing project by Lori Eakin-Richards based on the KRDS approach)

KRDS: what did we learn?
Whole-of-service costing / seeing the Big Picture
Selection of 2009 Allocation of UKDA Activity Costs:
Activity                       Share of costs
Acquisition                    5.8%
Ingest                         21.5%
A. Storage + Pres. Planning    3.1%
Access                         16.9%

KRDS: Implications
Changing view of digital preservation costs:
  – getting stuff in and out costs much more than keeping it (bit preservation + migration);
  – staff costs are c. 70% of total costs;
  – importance of economies of scale and automation;
  – findings of KRDS and the Dryad Repository's own activity costing projections fed into Dryad sustainability planning.
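The "in and out versus keeping" point can be illustrated with the selected 2009 UKDA cost shares quoted on the previous slide. This is a rough Python sketch. Grouping Acquisition, Ingest and Access as "in and out", and the storage and preservation planning line as "keeping", is an interpretation made here rather than something the slides state. The four shares are only a selection and do not sum to 100%.

```python
# Selected 2009 UKDA activity cost shares (percent of total), as quoted.
ukda_cost_shares = {
    "Acquisition": 5.8,
    "Ingest": 21.5,
    "A. Storage + Pres. Planning": 3.1,
    "Access": 16.9,
}

# Assumed grouping for illustration only.
in_and_out_activities = ("Acquisition", "Ingest", "Access")
keeping_activities = ("A. Storage + Pres. Planning",)

in_and_out = sum(ukda_cost_shares[name] for name in in_and_out_activities)
keeping = sum(ukda_cost_shares[name] for name in keeping_activities)
ratio = in_and_out / keeping

print(f"Getting data in and out: {in_and_out:.1f}% of total costs")
print(f"Keeping data:            {keeping:.1f}% of total costs")
print(f"Ratio: roughly {ratio:.0f} to 1")
```

On this grouping, the in-and-out activities account for over ten times the share of the keeping activities, which is the shift in perspective the slide describes.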

Future Plans
Dryad sustainability plan being put to Dryad member societies and publishers;
Dryad extending consortium to new members
  – achieving economies of scale;
Bid to JISC to establish Dryad-UK;
Extending KRDS research and implementations.

Further Information
Dryad: see [link not preserved in transcript]
Keeping Research Data Safe 2 (KRDS2) webpage at [link not preserved in transcript]
KRDS2 report available from JISC website: 10/keepingresearchdatasafe2.aspx#downloads