Responsible Data Use and Local Data Management Ruth Duerr National Snow and Ice Data Center.

Presentation transcript:

1 Responsible Data Use and Local Data Management. Ruth Duerr, National Snow and Ice Data Center

2 Responsible Data Use and Local Data Management; Presented 8 Nov 2011, APECS Webinar
Overview
- Responsible Data Use
  - Fair access and use
  - Data restrictions
  - Citation and credit
  - Providing feedback
- Local Data Management
  - File names
  - Directory structures
  - Backing up your data
  - Data formats
  - Documentation and metadata

3 Responsible Data Use (or what you should do if you find yourself re-using someone else’s data)

4 Your Responsibilities as a Data User
- Determining the suitability of data for your purposes
- Following applicable data access and use policies
- Giving credit to archives and data creators
- Providing the data source with feedback about any errors or limitations you discover in the data

5 Just because it is “good” data doesn’t mean that it is right for your project!
Corollary: Just because it isn’t right for your project doesn’t mean that it is “bad” data!

6 Hints for Determining Data Suitability
- Read any papers, documentation and metadata provided – it is there for a reason!
  - See http://nsidc.org/data/mod10a1v5.html for an example of a fairly well-documented data set
- If you still have questions, assess support availability and, if acceptable, ask!
  - See http://nsidc.org/data/g02199.html for an example of a poorly documented data set with an extremely low level of available support
- Be aware that, due to documentation and support limitations, the best data for your purposes may not be available to or usable by you

7 A few words about data access and use
- The trend in many disciplines is towards greater data sharing, but…
- Norms vary by discipline (and country); for example, you may need to
  - Submit an application for access
  - Sign a data transfer and usage agreement
  - Travel to the repository to obtain access
- Moreover, there are legitimate reasons for restricting access, for example:
  - To protect the confidentiality of human subjects
  - To protect the rights of local and traditional knowledge holders
  - To protect information that if released may cause harm (e.g., location of endangered species, sacred sites, etc.) [1]
- It is your responsibility to understand and follow the norms for the data you are using
[1] See IPY Data Policy at classic.ipy.org/Subcommittees/final_ipy_data_policy.pdf

8 Would you share your data if you didn’t know that you were going to be given credit for your work?
So cite the data you use!

9 Data Citation – Now
- Currently data citation standards and requirements vary
  1. From journal to journal
  2. From repository to repository
  3. From discipline to discipline
  4. Sometimes from author to author
- Do your best to honor these existing norms
- What might a data citation look like?
  Zwally, H.J., R. Schutz, C. Bentley, J. Bufton, T. Herring, J. Minster, J. Spinhirne, and R. Thomas. 2003. GLAS/ICESat L1A Global Altimetry Data V018, 15 October to 18 November 2003. National Snow and Ice Data Center. Data set accessed 2011-07-21 at doi:10.3334/NSIDC/gla01.
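The pattern of the example citation (authors, year, title, archive, access date, identifier) can be assembled mechanically once those parts are recorded. The following is only a sketch of that one pattern; the function name and parameter names are illustrative assumptions, and journals or repositories may require a different layout.

```python
def format_data_citation(authors, year, title, archive, accessed, doi):
    """Join citation parts in the order used by the slide's example citation.

    The function and its parameter names are hypothetical, chosen for this
    illustration; they do not come from any citation standard.
    """
    return (
        f"{authors}. {year}. {title}. {archive}. "
        f"Data set accessed {accessed} at doi:{doi}."
    )


citation = format_data_citation(
    authors=(
        "Zwally, H.J., R. Schutz, C. Bentley, J. Bufton, T. Herring, "
        "J. Minster, J. Spinhirne, and R. Thomas"
    ),
    year=2003,
    title=(
        "GLAS/ICESat L1A Global Altimetry Data V018, "
        "15 October to 18 November 2003"
    ),
    archive="National Snow and Ice Data Center",
    accessed="2011-07-21",
    doi="10.3334/NSIDC/gla01",
)
print(citation)
```

Recording the access date and identifier alongside the data when you download it makes this step trivial later.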

10 Data Citation – In the Near Future
- DataCite and other groups are working to make data citation a normal part of the scientific process
- For example, as of this year (2011), Thomson Reuters Web of Science and Web of Knowledge include published data sets (i.e., those that have a DOI)

11 Why provide feedback?
- Prevent other users from repeating your mistakes
- Improve the data or their documentation
- Better science, perhaps even new results, papers, and collaborators

12 A few words about providing feedback
- Feedback to a PI
  - Your reasons for using someone else’s data are likely different from their reasons for acquiring it in the first place
  - So, they probably weren’t thinking of your needs when they acquired, documented and made it available
  - Yet, if they thought their data would be useful to a community, they would probably be eager to help
  - Diplomacy and tact may be called for (especially if you really think you’ve found an error, not just a documentation problem)
- Feedback to a data center is almost always welcome

13 Local Data Management (or managing your own data)

14 The 5 P’s matter! (prior planning prevents poor performance)

15 Local Data Management
- File names
- Directory structures
- Backing up your data
- Data formats
- Documentation and metadata

16 Dilbert’s file naming convention

17 Assign descriptive file names
- File names should be unique and reflect the file contents
- Bad file names
  - Mydata
  - 2001_data
- A better file name might be bigfoot_agro_2000_gpp.tif
  - BigFoot is the project name
  - Agro is the field site name
  - 2000 is the calendar year
  - GPP represents Gross Primary Productivity data
  - tif is the file type – GeoTIFF
- But only if you document the naming convention!
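One way to keep a naming convention documented is to write it down as code that both builds and parses names. The sketch below assumes the project_site_year_variable.extension pattern of the bigfoot_agro_2000_gpp.tif example; the function names are hypothetical, and the rule that parts may not contain underscores or spaces is an assumption added so that names can be split unambiguously.

```python
def build_file_name(project, site, year, variable, extension):
    """Build a name following the project_site_year_variable.ext convention."""
    parts = [project, site, str(year), variable]
    for part in parts:
        # Assumed rule: the underscore is reserved as the separator.
        if not part or "_" in part or " " in part:
            raise ValueError(f"invalid name part: {part!r}")
    return "_".join(part.lower() for part in parts) + "." + extension.lower()


def parse_file_name(name):
    """Split a conforming file name back into its documented parts."""
    stem, extension = name.rsplit(".", 1)
    project, site, year, variable = stem.split("_")
    return {
        "project": project,
        "site": site,
        "year": int(year),
        "variable": variable,
        "extension": extension,
    }


name = build_file_name("BigFoot", "Agro", 2000, "GPP", "tif")
print(name)
print(parse_file_name(name))
```

A name such as Mydata fails to parse under this convention, which is a quick way to spot files that were named ad hoc.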

19 Organize files logically
Make sure your file system is logical and efficient
Example folders: Biodiversity, Lake, Experiments, Field work, Grassland
Example files: Biodiv_H20_heatExp_2005_2008.csv, Biodiv_H20_predatorExp_2001_2003.csv, Biodiv_H20_planktonCount_start2001_active.csv, Biodiv_H20_chla_profiles_2003.csv, …
Courtesy of S. Hampton, UC-Santa Barbara

20 Backup Your Data!!!

21 Why? You think it's easy to recover data off a
- Broken DVD?
- A burned up memory stick?
- A crashed hard drive?
- A drowned laptop?

22 Backing up your data files
- Create back-up copies often
  - Ideally three copies: the original, one on-site (external), and one off-site
  - Frequency based on need / risk
  - Higher value data should be backed up more often
  - Sensor data collected at high frequency should be backed up more frequently
- Ensure that all backup copies are identical to the original files
  - Use checksums or file comparisons
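A checksum comparison of the kind mentioned above can be done with Python's standard hashlib module. This is a minimal sketch; SHA-256 is an assumption (the slides do not name an algorithm), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_copies(original, copies):
    """Return the backup paths whose checksum differs from the original's."""
    expected = file_checksum(original)
    return [Path(copy) for copy in copies if file_checksum(copy) != expected]
```

An empty list from mismatched_copies means every copy is byte-for-byte identical to the original; any path it returns should be re-copied and checked again.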

23 Test your backups
- Automatically test backup copies of files frequently to ensure they are viable
  - Media degrade over time
  - Test copies using checksums or file comparisons
- Be certain that you can recover from a data loss
  - Periodically test your ability to restore information (at least once a year)
  - Simulate an actual loss by trying to recover solely from the backed-up copies
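A restore test can be rehearsed in a scratch directory: record a manifest of checksums when the backup is made, then later restore from the backup alone and compare against that manifest. The sketch below makes several assumptions: the backup is a plain directory copy, SHA-256 is the checksum, and the function names are hypothetical. Real backup tools have their own restore commands, which would replace the copytree call.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def failed_restores(backup_dir, manifest):
    """Restore the backup into a scratch directory and list files that are
    missing or differ from the manifest recorded at backup time."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        # Stand-in for a real restore step: copy solely from the backup.
        shutil.copytree(backup_dir, restored)
        found = build_manifest(restored)
    return sorted(name for name in manifest if found.get(name) != manifest[name])
```

Keeping the manifest somewhere other than the backup medium means a degraded medium cannot silently corrupt both the data and the record used to check it.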

24 Data Formats – Best Practices
- Don’t use a proprietary format!
  - These have a short shelf life and will probably become unreadable after a few years
- Don’t invent your own format!
  - No one but you will have the tools to read it
- Use open source, well-documented, community-based standard formats wherever possible, especially if they are self-describing

25 Self-describing data formats
- Information describing the data contents of the file is embedded within the data file itself:
  - Names for various fields
  - Data types – standardized, portable, machine independent
  - Pointers to various fields, making it efficient to extract the particular fields you want without reading the entire file
  - Attributes and flags related to the primary fields, with extra information such as units, fill values, etc.
- These formats include a standard API and portable data access libraries in a variety of languages
- There are tools that can open and work with arbitrary files, using the embedded descriptions to interpret the data.
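The idea of embedding the description with the data can be illustrated with a toy example. The sketch below uses JSON from Python's standard library purely to show the principle; it is not HDF or NetCDF, has none of their efficiency features (such as pointers to fields), and the field name, units, and values are invented for illustration.

```python
import json

# A toy self-describing record: each field carries its own type, units,
# and fill value next to the values. All names and numbers are made up.
record = {
    "fields": {
        "snow_depth": {
            "dtype": "float",
            "units": "cm",
            "fill_value": -9999.0,
            "values": [12.5, -9999.0, 14.0],
        }
    }
}


def read_field(text, name):
    """Read one field using only the description embedded in the file text.

    Fill values are replaced with None so missing data cannot be mistaken
    for a measurement.
    """
    field = json.loads(text)["fields"][name]
    fill = field["fill_value"]
    values = [None if value == fill else value for value in field["values"]]
    return values, field["units"]


values, units = read_field(json.dumps(record), "snow_depth")
print(values, units)
```

The reader needs no outside knowledge of what -9999.0 means or which units apply, which is the property the slide describes.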

26 Some example self-describing formats
- HDF – Hierarchical Data Format
  - HDF4 and HDF5 versions are in use today
  - A NASA variant called HDF-EOS is used within the Earth Observing System program
- NetCDF – Network Common Data Form
  - Widely used by agencies including NASA and NOAA
  - Climate and Forecast (CF) metadata conventions help standardize how data are described in NetCDF files

27 Documentation (metadata)

28 Poor data practice results in loss of information
[Figure: information content (vertical axis) against time (horizontal axis). Labels: time of publication, specific details, general details, accident, retirement or career change, death.] (Michener et al. 1997)

29 Don't you think it would be more efficient if you didn't have to remember
- the name of that file?
- and the directory where you put it?
- the units those measurements were taken in?
- which sample site was which?
- etc.

30 Making Your Research Easier and Cheaper
Write it down!

31 Needed Documentation
- What: Title of Data Set and Keywords Describing the Data Set
- Why: Description and Purpose of the Data Set
- When: Temporal Coverage of the Data Set
- Who: Data Set Creator and Contact
- Where: Geographic Extent and Location of Data Set Coverage
- How: How the Data Set was Created and How to Access the Data
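The six questions above can double as a checklist that is enforced rather than remembered. The following sketch uses a Python dataclass with one field per question; the class name, the helper function, and the sample values are all hypothetical, and this is not a formal metadata standard.

```python
from dataclasses import dataclass, fields


@dataclass
class DataSetRecord:
    """One free-text answer per documentation question (illustrative only)."""
    what: str = ""   # title and keywords
    why: str = ""    # description and purpose
    when: str = ""   # temporal coverage
    who: str = ""    # creator and contact
    where: str = ""  # geographic extent and location
    how: str = ""    # how the data were created and how to access them


def missing_answers(record):
    """Return the names of the questions that are still unanswered."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Made-up sample values, for illustration only.
draft = DataSetRecord(
    what="Example lake chlorophyll profiles; keywords: chlorophyll, lake",
    when="2003-01-01 to 2003-12-31",
)
print(missing_answers(draft))
```

Running the check before archiving or sharing a data set shows at a glance which questions still need an answer.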

32 Documentation Best Practices
- Document your conventions as they’re established
  - Revise documentation contemporaneously, not “after the fact”
  - This work is also the basis for end-user or reviewer documentation
- What should you document? Everything!
  - Data import, manipulation, QC procedures, special flags and encoding
  - Naming conventions, layouts, headings, units and abbreviations
    - Does TEMP mean “temporary,” “air temperature at time of observation,” or something else?
  - Formulae and constants
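Ambiguous headings such as TEMP can be caught by keeping a small data dictionary next to the data and checking every file header against it. This is a sketch under assumptions: the dictionary layout, the column names other than TEMP, the units, and the function name are all invented for illustration.

```python
import csv
import io

# Hypothetical data dictionary: every column heading gets a meaning and units.
DATA_DICTIONARY = {
    "SITE": {"meaning": "sample site identifier", "units": None},
    "TEMP": {"meaning": "air temperature at time of observation",
             "units": "degrees Celsius"},
}


def undocumented_columns(csv_text, dictionary):
    """Return header names in the CSV text that the dictionary does not define."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [column for column in header if column not in dictionary]


sample = "SITE,TEMP,WSPD\nA1,-3.5,4.2\n"
print(undocumented_columns(sample, DATA_DICTIONARY))
```

Any heading the check reports is a prompt to write down what it means before the meaning is forgotten.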

33 References and Resources
- Michener, W.K., Brunt, J.W., Helly, J.J., Kirchner, T.B. and S.G. Stafford. 1997. “Nongeospatial metadata for the ecological sciences.” Ecological Applications 7(1):330-342.
- Data management training materials in development are available at http://wiki.esipfed.org/index.php/Data_Management_Course_Outline
- A short list of data management related resources available on the web can be found at http://wiki.esipfed.org/index.php/Data_Management_Resources

