Data Citation Proposal
Based on work by Mark A. Parsons and the ESIP Preservation and Stewardship Cluster, especially Ruth Duerr, Curt Tilmes, and Bruce Barkstrom.
Purpose of Data Citation
- Give credit to data creators and stewards
- Let data creators see how researchers are using their data
- Track the impact of a data set
- Provide accountability for creators and stewards
- Aid reproducibility through an unambiguous connection to the precise data used
From Parsons, modified by Lynnes
How “data citation” is currently done
1. Not mentioned, just used, e.g., in tables or figures
2. Reference to the name or source of the data in the text
3. URL in the text (with varying degrees of specificity)
4. Citation of a related paper (e.g., the CRU temperature records recommend citing two old journal articles that do not contain the actual data or a full description of methods)
5. Citation of the actual data set, typically using the recommended citation given by the data center
6. Citation of the data set including a persistent identifier/locator, typically a DOI
From Parsons, et al.
Current GES DISC Policy
CITING OUR DATA
GES DISC Data Use Acknowledgment
Distribution of GES DISC data sets is funded by NASA's Science Mission Directorate (SMD). The data are not copyrighted and are open to all for both commercial and non-commercial uses.
If you used GES DISC data for a publication (research or otherwise), or for any other purpose, we request that you include the following acknowledgment:
"The data used in this effort were acquired as part of the activities of NASA's Science Mission Directorate, and are archived and distributed by the Goddard Earth Sciences (GES) Data and Information Services Center (DISC)."
We would appreciate receiving a copy of your publication, which can be forwarded to...
Basic data citation form and content
Author(s). Year. Title, [version]. [editor(s)]. Publisher. Location. [date accessed]. [subset used].
From: Parsons, Mark A., Ruth Duerr, and Jean-Bernard Minster. Data citation and peer-review. Eos, Trans. AGU 91 (34): doi: /2010EO
An Example Citation
Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated CLPX-Ground: ISA snow depth transects and related measurements. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed at
- Authors: the people whose intellectual effort went into the dataset, i.e., algorithm developers
- Year: the year the data were produced
- Title: the Data Set Long Name
- Editor(s): people who have added significant value to the dataset
- City and Publisher: Greenbelt, MD: Goddard Earth Sciences Data and Information Services Center
- Data access date and location
From Parsons, et al.
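The basic form and the field meanings above lend themselves to mechanical assembly from a few metadata fields. The following is only a minimal sketch in Python, not part of the original proposal: the `DataCitation` record and the `format_citation` and `join_names` helpers are hypothetical names, and the "Location: Publisher" ordering is an assumption taken from the example citation rather than a stated rule. Optional elements (version, editors, access information, subset) are skipped when absent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical container for the elements of the basic citation form."""
    authors: List[str]
    year: str
    title: str
    publisher: str
    location: str
    version: Optional[str] = None
    editors: List[str] = field(default_factory=list)
    access_info: Optional[str] = None  # access date and/or locator
    subset: Optional[str] = None


def join_names(names: List[str]) -> str:
    """Join names as 'A', 'A and B', or 'A, B, and C'."""
    if len(names) <= 1:
        return "".join(names)
    if len(names) == 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + ", and " + names[-1]


def format_citation(c: DataCitation) -> str:
    """Assemble a citation string; optional parts are omitted when empty."""
    # Strip a trailing period so initials such as "D." do not yield "D..".
    authors = join_names(c.authors).rstrip(".")
    title = c.title if c.version is None else f"{c.title}, {c.version}"
    parts = [f"{authors}. {c.year}. {title}."]
    if c.editors:
        parts.append(f"Edited by {join_names(c.editors)}.")
    parts.append(f"{c.location}: {c.publisher}.")
    if c.access_info:
        parts.append(f"Data set accessed {c.access_info}.")
    if c.subset:
        parts.append(f"Subset used: {c.subset}.")
    return " ".join(parts)


# Fields taken from the example citation; access information is left out
# because it is not given in the slide text.
example = DataCitation(
    authors=["Cline, D.", "R. Armstrong", "R. Davis", "K. Elder", "G. Liston"],
    year="2002",
    title="CLPX-Ground: ISA snow depth transects and related measurements",
    publisher="National Snow and Ice Data Center",
    location="Boulder, CO",
    editors=["M. Parsons", "M. J. Brodzik"],
)
print(format_citation(example))
```

Keeping the elements as separate fields, rather than storing one pre-formatted string, would let the same record feed a metadata entry, a README, or a checkout page with different layouts.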
Implementation
- Store information in GCMD entry, under “Data Set Citation”
- Requested “Dataset Editor” field from GCMD
- Generate stable, top-level locations for each dataset, e.g.,
- Generate individualized citations for each dataset, e.g.:
  Chung-Lin Shie, Long Chiu, Robert Adler, I-I Lin, Eric J. Nelkin, and Joe Ardizzone, Surface Turbulent Fluxes, 1x1 deg Monthly Grid, Set1 and Set2. Edited by A. Savtchenko. Greenbelt, MD: Goddard Earth Sciences Data and Information Services Center, Accessed at
- Add to READMEs at the top OR add a special file to the URL set for download
- Present within Mirador at the Checkout stage
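The "add to READMEs at the top" step could be automated along the following lines. This is a sketch under stated assumptions: READMEs are treated as plain text, the citation string has already been generated, and both the `add_citation_to_readme` helper and the "Recommended citation:" label are hypothetical choices rather than GES DISC conventions.

```python
CITATION_HEADER = "Recommended citation:"  # hypothetical label, not a GES DISC convention


def add_citation_to_readme(readme_text: str, citation: str) -> str:
    """Return the README text with a citation block placed at the top.

    The operation is idempotent: if the README already starts with the
    same block, the text is returned unchanged.
    """
    block = f"{CITATION_HEADER}\n{citation}\n\n"
    if readme_text.startswith(block):
        return readme_text
    return block + readme_text


# Placeholder README and citation text, for illustration only.
readme = "README for an example data set\nFile format notes follow.\n"
citation = "A. Author. 2010. Example Data Set Long Name. Greenbelt, MD: Example Data Center."
print(add_citation_to_readme(readme, citation))
```

Making the step idempotent means it can be rerun over a whole archive of READMEs without stacking duplicate citation blocks.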
Backup Slides
“We found that few policies recommend robust data citation practices: in our preliminary evaluation, only one-third of repositories (n=26), 6% of journals (n=307), and 1 of 53 funders suggested a best practice for data citation. We manually reviewed 500 papers published between 2000 and 2010 across six journals; of the 198 papers that reused datasets, only 14% reported a unique dataset identifier in their dataset attribution, and a partially-overlapping 12% mentioned the author name and repository name. Few citations to datasets themselves were made in the article references section.”
“Data Citation in the Wild.” Valerie Enriquez, Sarah Walker Judson, Nicholas M. Weber, Suzie Allard, Robert B. Cook, Heather A. Piwowar, Robert J. Sandusky, Todd J. Vision, Bruce Wilson
From Parsons, et al.
Tracking citation
“Tracking Dataset Citations Using Common Citation Tracking Tools Doesn’t Work” (Heather Piwowar, DataONE)
- Traditional fields such as author and date are too imprecise
- Web of Science, Scopus, and other tools don’t handle identifiers
From Parsons, et al.
Accountability
- A new standard of accountability in a post-climategate world
- Data “publication” needs to be tied to promotion, tenure, etc.
- Implies peer review (see the AGU Position Statement on Data)
- What is peer review?
  - An assertion of accuracy or validity?
  - An audit of complete documentation and sound practice?
  - Related to, but different from, QA. How does it overlap with curation and stewardship?
  - Earth System Science Data is one approach, but it is not universally applicable.
  - Open or informal review, or usage comments within the metadata
- Versioning and transparency are essential
From Parsons, et al.
Author, Year, Title, Editor, Publisher, Location
Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated CLPX-Ground: ISA snow depth transects and related measurements. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed at
From Parsons, et al.
Location
Gary King; Langche Zeng, 2006, "Replication Data Set for 'When Can History be Our Guide? The Pitfalls of Counterfactual Inference'" hdl:1902.1/DXRXCFAWPK UNF:3:DaYlT6QSX9r0D50ye+tXpA== Murray Research Archive [distributor]
From Parsons, et al.
Location
König-Langlo, Gert and Hartwig Gernandt Compilation of radiosonde data from the Antarctic Georg-Forster station of the German Democratic Republic from 1985 to Bremerhaven, Germany: Alfred Wegener Institute for Polar and Marine Research Data set accessed doi: /PANGAEA
From Parsons, et al.
Doing it as best we can...
Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct Sep. 2008, 84°N, 75°W; 44°N, 10°W. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed at doi: /xxx.
Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct Sep. 2008, Tiles (15,2; 16,0; 16,1; 16,2; 17,0; 17,1). Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed at doi: /xxx.
Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated CLPX-Ground: ISA snow depth transects and related measurements, Version 2.0, shapefiles. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed at doi: /xxx.
From Parsons, et al.
Thank You
Much of this talk comes from:
Parsons, Mark A., Ruth Duerr, and Jean-Bernard Minster. Data citation and peer-review. Eos, Trans. AGU 91 (34): doi: /2010EO
Duerr, Ruth E., Robert R. Downs, Curt Tilmes, Bruce Barkstrom, W. Christopher Lenhardt, Joe Glassy, Luis E. Bermudez, and Peter Slaughter (submitted). On the utility of identification schemes for digital Earth science data: An assessment and recommendations. Earth Science Informatics.
A lot of discussion at:
Photo courtesy NOAA
From Parsons, et al.