JISC 2005
OPEN DATA!
Peter Murray-Rust, University of Cambridge
"Power corrupts; PowerPoint* corrupts absolutely" (Tufte)
(*under duress: these slides are machine-unfriendly)
[©Peter Murray-Rust: Reusable under Creative Commons]

Need for Open Data
- Machine-understandable data
- Vital for eScience and Semantic Web
- Often micropublished (i.e. many independent articles)
- Open capture highly variable
  - Bioinformatics – fairly successful
  - Chemistry – completely unsuccessful

Problems for Open Data
- Additional to Open Access
- Requires explicit effort for Openness
- Not recognised as a serious problem
- Need culture change in funders, authors, editors, publishers (1° and 2°)
- Major problems are apathy and unclear or antagonistic licenses

Machine understandability
ALL spectra are originally like this …
… hundreds more lines …
but humans require pictures …

Destruction by publication
This was once machine-understandable data; the publisher has destroyed it
[©Peter Murray-Rust 2005: Text reusable under Creative Commons]

“Open Access” is not enough
Once machine-understandable, but no longer. Now only human-readable.

Photons may be good enough for HUMANS… but MACHINES need DATA

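As an illustration of what "machines need data" means in practice, here is a minimal sketch. It assumes a spectrum is available as plain numeric x/y pairs, one pair per line. The values, the text layout, and the function names `parse_spectrum` and `strongest_peak` are all hypothetical and are not taken from the slides or from any particular instrument format.

```python
# Hypothetical spectrum: one "x y" pair per line (illustrative values only,
# not real measurements and not a specific instrument file format).
RAW_SPECTRUM = """\
1.0 0.02
1.5 0.10
2.0 0.85
2.5 0.30
3.0 0.05
"""


def parse_spectrum(text):
    """Return a list of (x, y) float pairs from whitespace-separated lines."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x_str, y_str = line.split()
        points.append((float(x_str), float(y_str)))
    return points


def strongest_peak(points):
    """Return the (x, y) pair with the largest y value."""
    return max(points, key=lambda p: p[1])


points = parse_spectrum(RAW_SPECTRUM)
print(strongest_peak(points))
```

With the numbers intact, a program can locate the strongest signal in a few lines. Once the same spectrum is published only as a picture, this kind of direct reuse is no longer possible without lossy re-digitisation.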
Open Access and Open Data
- Many OA evangelists don’t understand data
- OA only guarantees access to human retinas
- Much current OA says nothing about re-use and redistribution of text or data
- NOT PDF, NOT Word, NOT PowerPoint

Publishers Destroy Scientific Data
- 80-99% of high-quality scientific data never leaves the laboratory
- Few publishers support the publication of eData. They trash it
- Few support robotic extraction
- Many actively forbid it
- Others sell it back to the originators

Non-OpenData policy (ACS)
"What is important to realize is that a subscription to an STM journal is no longer [...] a subscription; in fact, it is an access fee to a database maintained by the publisher"
RUDY M. BAUM, Editor-in-Chief, C&E News, September Volume 82, Number 38 p. 7

ACS Copyright on Supplemental Data
"Electronic Supporting Information files* are available without a subscription to ACS Web Editions. All files are copyrighted by the American Chemical Society. Files may be downloaded for personal use; users are not permitted to reproduce, republish, redistribute, or resell any Supporting Information, either in whole or in part, in either machine-readable form or any other form. For permission to reproduce this material, contact the ACS Copyright Office …"
(*i.e. scientific FACTS provided by the author – PM-R)

Recommendations for OpenData
- Funders must promote publication
- Authors should use Creative Commons
- Editors to promote publication at source
- Publishers should provide data licenses
- Citation junkies and RAE to credit data
- Institutional repositories to encourage

Human readability
Humans want spectra like this
but publishers require…