Smart and Simple Data Management

Slides:



Advertisements
Similar presentations
Electronic theses and copyright Janet Aucock Head of Repository services March 2014.
Advertisements

Selecting a Data Sharing Repository. 2 Why Share Data? Enabling others to replicate and verify results as part of the scientific process Allows researchers.
Documenting the Resource Malcolm Polfreman
Open Exeter Project Team
EPSRC expectations on research data: What researchers need to know 12/03/2015 Masud Khokhar and Hardy Schwamm.
Social Science Data and ETDs: Issues and Challenges Joan Cheverie Georgetown University Myron Gutmann ICPSR – University of Michigan Austin McLean ProQuest.
Open for ^ Business Research Data Services & Data Management Planning Ryan Schryver Wendt Commons is our.
U.S. Department of the Interior U.S. Geological Survey Planning for Data Management Creating data management plans for your project.
Libra: Thesis and Dissertation Submission. What is Libra? UVA’s institutional repository, providing online archiving and access for the scholarly output.
Team working in distributed environments M253 Project Logs Faculty of Computer Studies Arab Open University Kuwait Branch 9/19/20151Kwuait Branch.
Recordkeeping for Good Governance Toolkit Digital Recordkeeping Guidance Funafuti, Tuvalu – June 2013.
ACCESS for VALIDITY ACCESS for INNOVATION. Starting January 2011 for NEW proposals Not voluntary – “integral part” of proposal and FastLane Required for.
Elements of a Data Management Plan Bill Michener University Libraries University of New Mexico Data Management Practices for.
Thesis Format and Submission
Choosing Between Data Sharing Repositories for Engineering Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
How Not to Be the Only One Who Knows About Your Research Sharing and Archiving for Posterity Melanie Radik and Raphael Fennimore Library & Technology Services.
DalSpace A content repository for Dalhousie community members.
DOE Data Management Plan Requirements
Information Management and the Departing Employee.
Using the DMPTool for data management plans Kathleen Fear February 27, 2014.
Writing a Data Management Plan with the DMPTool Kathleen Fear January 15, 2015.
Writing a successful data management plan Kathleen Fear October 17, 2013.
Research Data Management in the Humanities: an Introduction to the Basics Open Exeter Project Team.
Writing a data management plan (DMP) Stephen Grace and David McElroy Writing a DMP workshop, UEL 5 March 2015.
PhD-course Research Data Management (RDM) Expert Centre Research Data.
Data Rescue! Data Management Series: Workshop 1 HUMANS RESEARCH DATA SERVICE.
Data Workflow Data Management Workshop ELIZABETH WICKES, DATA CURATOR HEIDI IMKER, DIRECTOR RESEARCH DATA SERVICE.
Introduction to Metadata
Screening for Patients’ Health Insurance and Confidentiality Needs
Technology Transfer Office
Slides Template for Module 3 Contextual details needed to make data meaningful to others CC BY-NC.
Documenting and automating your work
Protein Sequences in a Major Scientific Database
Open Exeter Project Team
Pre-Course Assignment
Do You Copy That? A Presentation about Copyright
Slides Template for Module 5
Copyright and Open Licensing
Copyright and Open Licensing
Data Management What? Why? How?.
Opening a World of Opportunity
An Overview of Data-PASS Shared Catalog
Publishing software and data
Institutional role in supporting open access, open science, open data
Year 7 E-Me Web design.
General Finnish DMP Guidance
Getting Started with Data Management
Data Management: Documentation & Metadata
Open Access to your Research Papers and Data
SFU Open Access Policy Endorsed by Senate January 9, 2017
Introduction to Research Data Management
Unit 10: MATH The Unit Organizer Betchan , Strauss, Gonzales
Planning and Writing your Thesis
Back to Table of Contents
Protecting Yourself from Fraud including Identity Theft
Data Management Planning
Research Data Management
What is BankMobile? A process to select how to receive student refunds and student payroll payments It is fast, secure, and convenient. Go to:
Research Data Management
September 18th – September 20th
INTRODUCTION TO PUBLIC DISCLOSURE RESPONSE
Crowdfunding Let’s Grow State Getting Started
Online Safety: Rights and Responsibilities
Policy Frameworks: building a firm foundation for your IR
CARL Guide to Author Rights
Copyright and Open Licensing
Data + Research Elements What Publishers Can Do (and Are Doing) to Facilitate Data Integration and Attribution David Parsons – Lawrence, KS, 13th February.
Getting Started with Data Management & DMPTool
Research Data Dr Aoife Coffey, Research Data Coordinator
TALKING POINTS Introduce yourself
Presentation transcript:

Smart and Simple Data Management

What data? Data is defined, consistent with OMB circular A-110, as: the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications but generally does not include ... lab notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, physical objects, such as lab specimens

National Data Policy - 2013 OSTP Memo: Increasing Access to the Results of Federally Funded Scientific Research “requiring researchers to better account for and manage the digital data resulting from federally funded scientific research”   Data management plans will be come compulsory Providing public access to data will become more routine http://www.whitehouse.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research

Data Management Practices complete records find understand back-up migrate

Data Management Reality I’ve lost two patent claims because we didn’t have complete records in lab notebooks. I panic every time I try to find my students’ data. I know I won’t be able to publish anything after my grad student leave because it’s too hard to understand what he/she did. We don’t have the assay results with substrate X, because my student didn’t back-up the computer. And we’re out of substrate X. The only reason I was able to do the analysis was because on a whim I decided to migrate some of my data off of floppy disks to CDs back in the 90s. I wish I had done all of it.

What are the elements of good data management?

Plan Quantitative or Qualitative? Instrument data? Experiment data? Observation data? Ethnographic data? Survey data?

Data Management Plans Usually short (2 page) documents submitted as part of a grant application Required by many federal funding agencies (more expected) and some funding announcements require them already regardless of agency policy When required, generally subject to panel review Currently DMP expectations are variable

Where to start with a DMP? On a tight deadline for a grant? RDS Review via researchdata@library.illinois.edu or DMPtool.org Revisit post-submission to refine Have more time? Craft a DMP that you can implement and reuse Ten Simple Rules for a Good DMP: Michener (2015) PLOS Comp Bio DOI: 10.1371/journal.pcbi.1004525 We’re still happy to review! researchdata@library.illinois.edu

Organize

Organization Distinguish Raw Data Be Consistent in File Naming and Directory Structures Define and Exercise Version Control Used with permission from “Piled Higher and Deeper” by Jorge Cham http://www.phdcomics.com/comics/archive.php?comicid=1323. Copyright 2010 Jorge Cham.

Consistent File and Folder Naming For quickly finding and sorting files and folders, the names should be consistent but unique. Avoid special characters. project name/acronym experiment/instrument type site location information (if applicable) researcher initials date (consistently formatted, i.e. YYYY-MM-DD) version number (w/ leading zeros)

Then Make it Clear and Known

Date Tip vs. BAM Co-Exp Run 01 2014-09-04.txt Run 1 B anth meth Sept 4 .txt BAM Rxn 2 2014_09_04.txt 20140904_meth_3.txt

Document

Describing Data Enables someone unfamiliar with your data to find, evaluate, understand, and reuse your data Helps YOU to remember details about your data after time has passed Examples Read Me File: http://hdl.handle.net/2142/28689 Standardized Metadata: https://www.icpsr.umich.edu/icpsrweb/neutral/ddi25/studies/35383 Codebook: https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/35383 Data Dictionaries: Page 4-1 of this User Guide http://goo.gl/Ugqiqo

Basic Documentation Types of Documentation: Descriptive (e.g., creator, title, keywords) Structural (e.g., relation to other files) Administrative (e.g., software & hardware requirements, rights information) If you know the data will be deposited in a repository, understand the documentation requirements early in the process

Data Documentation Continuum Informal ReadMe Formal Schema Low-Barrier Low-Quality High-Quality High-Barrier Fast Easy Irregular Incomplete Standardized Rich Slow Skilled

Controlled Vocabulary Take the guess work out of choosing between: a preferred spelling behavior vs behaviour a scientific or popular term pig vs porcine vs Sus scrofa domesticus determining which synonym to use record vs entry determining which abbreviation to use (if you have to) USA vs US

Storage & Backup

Storage & Backup Number of copies Location of copies Schedule Robustness Efficacy Photograph used under a CC-By license. Photography taken by Gail Steinhart and available at http://www.flickr.com/photos/gailst/7824341752/

The 3-2-1 Rule 3: copies of data 2: kinds of media 1: copy remote “There are two kinds of people in the world - those who have had a hard drive failure, and those who will.” 3: copies of data 2: kinds of media 1: copy remote

Data Security All research data is a theft target (not just military) Risk extends beyond just “getting hacked” Confidentiality (premature disclosure or violation of law/NDA) Integrity (data tampered with, not trustworthy) Availability (data/system disappears, cannot work for days or weeks) For human subject research, IRB can offer advice on security Tech Services Security Office welcomes questions (securitysupport@illinois.edu)  http://go.illinois.edu/uoficloud for a guide to selecting a cloud platform

Workflow mapping activity We’ve given you a ton of information on approaches to organization Getting started can be overwhelming Workflow mapping can help At the start…to have a plan In the middle…to double check At the end…to know what you’ve done Think of it as retracing your steps

Workflow mapping activity Take the post-it notes Lay out five in a row Think of a specific project, or generalize Write out your activities for that project, from start to finish Go high level – try to just use five

Workflow mapping activity Once you’ve written out the activities… Think of any data, scripts, or other files that you touch, both items you use and items you make. Write down what they are Their file names and/or location And if/where they’re backed up

Workflow mapping activity And now you have a pretty solid piece of documentation! Number your stickies Take a cell phone picture, redraw this in PowerPoint, or just stick them in your handout

Archiving and Sharing

Anticipate the Long Term What happens to your data in… 2 years? 5 years? 20 years? Will the formats be valid? Will the hardware be defunct? Will anyone know where it is?

 Anticipate the Long Term Keep Private Make Public More Control More Responsibility Less Control Higher Potential Impact Keep Private Make Public 

Data “Publication” Making datasets (in and of themselves) publically accessible improves transparency and reproducibility of research save time by reducing duplication of effort (yours or theirs) makes the data itself independently discoverable   another way to expose to your work maybe you’ll need that data again some day Save time: repositories often have specific requirements for data structure, documentation/metadata, file content and organization, and redaction of confidential data. If you choose a repository in advance, you can follow their guidelines from the start, avoiding extra processing/cleanup later Save Money- Some repositories charge an up-front deposit fee to the researcher. Better to know from the start if you will have to pay to use a specific repository- you can often ask for an advance cost estimate from the repository, and build the cost into funding requests and your project’s budget, and/or do a cost comparison between repositories. MEET FUNDER/PUBLISHER REQUIREMENTS- As we’ve mentioned, more and more funders and publishers are requiring data management plans or data sharing statements, explaining how you will make your data publicly available (or justify why you will not). Assessing your project’s data sharing concerns and selecting a repository in advance that meets funder/publisher requirements will help you do this. Also if you’ve done your homework, you can be more confident that you’ll actually be able to follow through on this plan. -PROVIDE PEACE OF MIND – Can you open files you created 5 years ago? 10 years ago? Do you know where the data from your first research project is? Would you be able to understand it now? No more worries about floppy disks, CDs, misplaced/corrupted or unintelligible data. You will be able to find and use your data later, and so will others.

Limitations for Data Publication protecting confidentiality and personal privacy recognizing proprietary interests, business confidential information, and intellectual property rights preserving the balance between the relative value of long-term preservation and access and the associated cost and administrative burden Intellectual property is a concept that includes copyright law Facts are not copyrightable, but a particular expression of facts (i.e. a table of statistics) can be copyrighted Intellectual property: who owns the data? Who controls it? The PI for the grant that pays your salary? The company that contracted with your advisor for a specific project? The university? “The results of research or development carried on at the University by any of its faculty, employees, students, or other users of its facilities and having the expenses thereof paid from university funds or from funds under the control of the University, belong to the University and are to be used and controlled in ways to produce the greatest benefit to the University and to the public.” University Statutes, Article XII Section 3. The company or organization that gave you the data? Did you sign a contract with them? If using third party data, know the licensing and reuse contract stipulations on that data, it will affect your ability to share/publish, and under what license (usually cannot be a more permissive license than the source data’s license, and may have other requirements/restrictions). If you don’t know- ask!

Illinois Data Bank (databank.illinois.edu) A self-serve publishing platform that centralizes, preserves, and provides persistent and reliable access to Illinois research. can be linked to related materials, such as articles, theses, code, and other datasets can include files of any format and self-deposit sizes up to 15 GB/file via Box.com, larger sizes welcome by arrangement can be deposited for immediate release or temporarily embargoed receive a stable, unique identifier (DOI) for persistent access and ease of citation are registered in a central, world-wide catalog for better discovery are professionally managed and curated by the Research Data Service staff at the University Library are preserved for a minimum of 5 years

website: researchdataservice.illinois.edu Contact visit: 310-312 Main Library call: (217) 300-3513 website: researchdataservice.illinois.edu

Data Management Workshop Series Intro to Data Management Hand-on activities: data survey, inventory, and catalog Data Documentation Hands-on activities: using documentation and documentation building Data Sharing Hands-on activities: exploring repositories and considerations for sharing Data Workflows More detailed workflow mapping activity, great for large and small project teams Check your handout for the schedule and registration info!

Useful. Tell your friends and colleagues Useful? Tell your friends and colleagues! Our goal is for every graduate student and faculty member on campus to see this presentation. Not Useful? Tell us!