Data archiving in evolutionary biology Michael Whitlock
Why publicly archive data? Error checking, meta-analysis, new uses, increased citations.
Error checking: More than half of published papers contain statistical errors. 5-10% of papers contain errors that change the conclusions. (Gore et al. 1977, Kanter and Taylor 1994, McGuigan 1995, Hurlbert and White 1993)
New uses: Bumpus' (1898) data has been used numerous times, in ways he never imagined.
Increased citations: "Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations." - Piwowar et al. (2007) PLOS ONE
Two further reasons to publicly archive data: teaching and learning, and data security and back-ups.
Data sharing/archiving in ecology and evolution: previous policies. Some types of archiving are already required (e.g. DNA sequences in GenBank, phylogenies in TreeBASE). Data sharing is already required by many journals and by most major funding agencies.
Most data is lost to science very quickly, through loss of files, loss of researchers, loss of context, etc.
Most evolutionary biologists want data archiving. 95% of scientists in evolution and ecology think that data should be publicly archived (S. Carrier, J. Greenberg, H. Lapp, R. Scherle, A. Thompson, T. Vision, and H. White, unpublished manuscript). 78% of editorial board members voted for mandatory archiving (11% against). Supported by the executive councils of each society.
Evolutionary biology journals adopting data archiving policies: The American Naturalist, Evolution, Journal of Evolutionary Biology, Molecular Ecology, Evolutionary Applications, Genetics, Heredity, Molecular Biology and Evolution, Systematic Biology, Paleobiology, BMC Evolutionary Biology.
Joint data archiving policy: "This journal requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, TreeBASE, Dryad, or the Knowledge Network for Biocomplexity. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species."
Buffering authors' IP concerns: Embargoes allow time for further use. Archiving is required only for data used in the paper. Archived data should be cited fairly: encourage citation of the original paper, not accession numbers. Archiving is required by funding bodies in any case.
What to archive? Raw data at the individual level: sufficient to re-create the results in the paper, but not necessarily the whole dataset from the project. A readme file that explains any missing details (header names, units, etc.).
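As an illustration of the readme point, here is a minimal sketch of how a readme could be generated alongside a raw data file so that every column header is explained with its units. It is not part of any archive's tooling: the function name `build_readme`, the column names, and the example values are all hypothetical, and the layout of the readme is only one reasonable choice.

```python
import csv
import io


def build_readme(csv_text, columns, title):
    """Return readme text that documents every column of a raw data file.

    columns maps each header name to a (units, description) pair.
    Raises ValueError if any header in the data file is left unexplained.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [name for name in header if name not in columns]
    if missing:
        raise ValueError(f"no description for columns: {missing}")
    lines = [title, "", "Columns:"]
    for name in header:
        units, description = columns[name]
        lines.append(f"- {name} ({units}): {description}")
    return "\n".join(lines) + "\n"


# Hypothetical individual-level data, invented for this example only.
example_csv = "individual_id,sex,wing_length\nA01,F,61.2\nA02,M,63.5\n"
example_columns = {
    "individual_id": ("no units", "unique label for each measured individual"),
    "sex": ("F or M", "sex of the individual"),
    "wing_length": ("mm", "length of the right wing"),
}

readme_text = build_readme(example_csv, example_columns, "Data for: (paper title)")
print(readme_text)
```

Refusing to write the readme when a header is undocumented is a deliberate choice in this sketch: it catches the "loss of context" problem at archiving time rather than years later.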
Data archiving: Preserving our legacy