Data Workflow Data Management Workshop ELIZABETH WICKES, DATA CURATOR HEIDI IMKER, DIRECTOR RESEARCH DATA SERVICE.

Slides:



Advertisements
Similar presentations
Research Data Service at the IT Pro Forum HEIDI IMKER, DIRECTOR.
Advertisements

August 14, 2015 Research data management – an introduction Slides provided by the DaMaRO Project, University of Oxford Research Services.
EZID (easy-eye-dee) is a service that makes it simple for digital object producers (researchers and others) to obtain and manage long-term identifiers.
Information guide.
EPSRC expectations on research data: What researchers need to know 12/03/2015 Masud Khokhar and Hardy Schwamm.
Career Services Center Employer Training. This is the main login page. The link can be found at Employers.
5-7 November 2014 DR Workflow Practical Digital Content Management from Digital Libraries & Archives Perspective.
Go to your school’s web locker site school name.schoolweblockers.com) Your user name is the first letter of your first name, the first 4.
32. 2 “The Obama Administration is committed to the proposition that citizens deserve easy access to the results of scientific research their tax dollars.
October 24, 2015 Research data management – a brief introduction Slides provided by the DaMaRO Project, University of Oxford Research Services.
June 3, 2016 Research data management – an introduction Slides provided by the DaMaRO Project, University of Oxford Research Services.
Dataset citation Clickable link to Dataset in the archive Sarah Callaghan (NCAS-BADC) and the NERC Data Citation and Publication team
1 Taking Notes. 2 STOP! Have I checked all your Source cards yet? Do they have a yellow highlighter mark on them? If not, you need to finish your Source.
JORDANHILL SCHOOL WEBSITE AND PODCAST TRAINING SESSION How to use ICT to enhance teaching and learning.
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
Data Rescue! Data Management Series: Workshop 1 HUMANS RESEARCH DATA SERVICE.
Science Notebook Guide Who needs a Science Notebook? What materials do I need to make a Science Notebook? When is it due? Where will I keep it? Why do.
TRELLO. WHAT IS TRELLO!? Trello is a project management tool that makes collaboration easy. This visual list tool can do so much more, whether you're.
Jacynthe Touchette, MSI JGH Health Sciences Library
Author’s Name/s Goes Here, Author’s Name/s Goes Here
NRF Open Access Statement
Hidden Slide for Instructor
Interactive Science Notebook (Ack: Mrs Ryan, AZ science teacher)
Development Environment
Protein Sequences in a Major Scientific Database
Open Exeter Project Team
Incorporating W3C’s DQV and PROV in CISER’s Data Quality Review and
BIO1130 Lab 2 Scientific literature
Unit 2 – Data Collection & Problem Solving
How to open source your Puppet configuration
SAA-COSA Annual Meeting Session 301: Capstone Officials & Public Records Arian D. Ravanbakhsh Office of the Chief Records Officer for the U.S. Government.
Center for Open Science: Practical Steps for Increasing Openness
How to set up Composition book and use Cornell Notes
Data Workflows Elizabeth Wickes, Data Curator Research data service
Data Workflow Data Management Workshop
Data Documentation Data Management Series: Workshop 2
Publishing software and data
Institutional role in supporting open access, open science, open data
Unit 2 – Data Collection & Problem Solving
Adding Assignments and Learning Units to Your TSS Course
Jay Bhatt Drexel University Libraries
Matthew Harp Arizona State University
Interactive Science Notebook (Ack: Mrs Ryan, AZ science teacher)
What is Google Classroom?
How to publish your research
Module 6: Preparing for RDA ...
Open Access to your Research Papers and Data
UNITY TEAM PROJECT TOPICS: [1]. Unity Collaborate
How to set up Composition book and use Cornell Notes
If you haven’t downloaded the fonts yet, go back to the file that says “START HERE.” It will look way better once you do!
Integrating Google Classroom into Middle School and High School Education Reed Peterson.
Logbook A logbook should be started BEfORE anything is done on a project — before the problem has been selected, and before the details have been worked.
Lesson 12.
Building Skills for High School & College Success
denblogs.com/jendorman
Go to =>
Repository Platforms for Research Data Interest Group: Requirements, Gaps, Capabilities, and Progress Robert R. Downs1, 1 NASA.
Blackboard Tutorial (Student)
I've Got To Write A Research Paper ! ! !.
Tonga Institute of Higher Education IT 141: Information Systems
USER MANUAL - WORLDSCINET
CMSC201 Computer Science I for Majors Final Exam Information
Digital Stewardship Curriculum
BIT 115: Introduction To Programming
Research Data Management
Software Development Life Cycle (SDLC)
Tonga Institute of Higher Education IT 141: Information Systems
Mrs. Shivers Classroom Crockett Intermediate School Paris, TX
Project management Initiative & Self Learning Confidence in leadership
USER MANUAL - WORLDSCINET
Presentation transcript:

Data Workflow Data Management Workshop ELIZABETH WICKES, DATA CURATOR HEIDI IMKER, DIRECTOR RESEARCH DATA SERVICE

Knowledge around data policies, resources, archiving, & preservation Consultation for data management planning & implementation Workshops on data management, documentation, and data publishing Data Management Plan reviews and DOI minting services Solutions for public access to research data Centralized, private storage for active (“working”) data (with NCSA) Research Data Service (RDS) The Research Data Service provides the Illinois research community with expertise, tools, and infrastructure to manage and steward research data. visit: researchdataservice.illinois.edu or

Expertise Knowledge around data policies, tools, resources, archiving, and preservation Consultation and workshops for data management planning and implementation Tools Data Management Plan creation wizard (DMPTool.org) Tools for data citation (DOI minting) Infrastructure (in progress) Illinois Data Bank (self-deposit institutional data repository) What do we do?

Workflow Workshop Goals Know the tools you use the data you use where it all lives where it all goes Learn How your data workflow works Points where you need clarification How your collaboration with others could be improved Practice Mapping out your workflow

Required Materials 1 color, 3x3 post it note 4 colors, smaller post it notes 1 color the same as the 3x3 note Large paper, divided into 3 horizontal sections Handout with cheat sheet Pen or pencil

Why is documentation important (in the short term)? Because other humans exist Those humans will need to make use of your stuff Many of those humans are sitting next to you right now But humans move around, particularly students So at some point, there will be humans in this room, using your stuff, and have no idea who you are.

Documentation Content ProjectDatasetData fileDatum Detail Workflow Experimental Procedures Transformations Workflows Analysis

Like retracing your steps… When you lose your phone or keys, one of the best ways to figure out where it might be is retracing your steps. You think about what you’ve done, step by step, and in theory you can recreate everything you’ve touched in the process. Thus, workflow mapping can be a useful first step in identifying where you may have tucked data away.

What data do you have? Input Source data Data from other people Process Temporary files Intermediate datasets Output Output data Data for other people Data that goes into reports or other final products

What do you do to that data? Input Ingest Process Clean Train Test Output Analysis Write up Backup

So how do you science? Input data Output data investigate data make test data clean the data train a model test the model get other data in check the algorithm write some scripts make some charts join in other data save stats analysis clean the data again

So how do you science? Input data Output data investigate data make test data clean the data train a model test the model get other data in check the algorithm write some scripts make some charts join in other data save stats analysis clean the data again SCIENCE.

So how do you science? Input data Output data investigate data make test data clean the data train a model test the model get other data in check the algorithm write some scripts make some charts join in other data save stats analysis clean the data again SCIENCE. Publications Don’t forget about us!

So you’ve got stuff When working in large teams, it is particularly important to know: What you receive from others (input data and materials) What you make for others

Activity: Workflow Map This will be our main activity today, so feel free to take your time on these steps and ask questions The intention is not to capture every detail of your workflow, but to help you get a feel for the big picture and points where you may need clarification or other help.

Example workflows

The Board Inputs Outputs Data Products Activities Scripts, Software, & Tools Data Input/Outp ut (from others) Data (NOT for others) Tools Used Notes or Annotations Workflow activities

Make this your own You know what you do best Use your own voice and words Just be sure you’ll be able to understand them later

Step 1: Identify your inputs Take the pink post-its and write “input” in the top left corner Think of the project you want to map in this workflow. Write the input data for this project on the pink post-it. This can be raw data This can be data you get from someone else, and write down their name as well! If you get data from more than one source, you can make more than one post-it. Just be sure to label them all “input.” Put the pink post-it to the left side of the center section of your sheet of paper.

Data Input (from others)

Step 1: Start working on the steps Take 4 regular yellow post-its and line them up in the middle of your paper, after the input post-it. Add more as necessary, but try to think at a very high level. Use the smaller yellow post-its to add annotations. Examples: ingest data make training set train model test model etc.

Data Input (from others) Activity 1…Activity 2…Activity 3…Activity 4… Loop here until model fits

Step 2: How do you process the data? Look at the individual steps you’ve written down: Is there a script, software package, reference resource, or other tool you need in order to complete the step? If so, write it on one of the darker blue post-its. Are there temporary, scratch, or other data files created for during this step? If so, make note of them on a lighter blue post-it Are you making data for other people in this step? Use a pink post-it Write down their name(s) on it as well. Continue this process with each of the steps of your workflow

Data Input (from others) Activity 1…Activity 2…Activity 3…Activity 4… script1.pyanalysis.r some other software test dataset training dataset model visualizations for paper sqlite database Loop here until model fits

Step 3: What are my outputs? When you get to steps where you create data that is handed off to other people or that you need someone else’s help to complete, note that information down on a pink post-it and label it “output.” Put output post-its at the end of your workflow line, or above the step where they’re generated if they’re produced before your workflow is complete.

Data Input (from others) Activity 1…Activity 2…Activity 3…Activity 4… report for developer model fitting script for dev script1.pyanalysis.r some other software test dataset training dataset model visualizations for paper sqlite database Loop here until model fits

Step 4: Where were there problems? Did you run into something you don’t know, need to look up, or need to finish? Make a note on a red post-it and place it by the appropriate point in your workflow.

Activity discussion What did we learn from this? Are there points where you need more interaction with your team than you realized? What data is for your personal use only? Where is it stored and how do you manage it? What data needs to go to other people? How are you sharing it? Are you keeping backups of it as well? Where? Homework: Take a picture of this workflow back to your team. Where do their workflows hook into yours? Where do yours hook into theirs? Are there ways you can improve data sharing within the team?

Why is documentation important (for the long term)?

National Data Policy OSTP MEMO: INCREASING ACCESS TO THE RESULTS OF FEDERALLY FUNDED SCIENTIFIC RESEARCH “requiring researchers to better account for and manage the digital data resulting from federally funded scientific research” Data management plans will be come compulsory Providing public access to data will become more routine federally-funded-research

Agency Policies

Publisher Policies

Bioinformatics “All data on which the conclusions given in the publication are based must be publicly available…” Genome Research “Genome Research will not publish manuscripts where data used and/or reported in the paper is not freely available in either a public database or on the Genome Research website. There are no exceptions.”

(Discussions of) Mistakes Are Public

Data “Publication” Making datasets (in and of themselves) publically accessible improves transparency and reproducibility of research save time by reducing duplication of effort (yours or theirs) makes the data itself independently discoverable another way to expose to your work maybe you’ll need that data again some day

In Action – Warnow Paper Citations as of Sept 26th 45 S. Mirarab, M. S. Bayzid, B. Boussau, T. Warnow, Science 346, (2014).

In Action – Warnow Data S. Mirarab, M. S. Bayzid, B. Boussau, T. Warnow, IDEALS (2014). Downloads as of Sept 26th: 1522

Illinois Data Bank (databank.illinois.edu) can be linked to related materials, such as articles, theses, code, and other datasets can include files of any format and sizes up to 15 GB/file via Box.com can be deposited for immediate release or temporarily embargoed receive a stable, unique identifier (DOI) for persistent access and ease of citation are registered in a central, world-wide catalog for better discovery are professionally managed and curated by the Research Data Service staff at the University Library are preserved for a minimum of 5 years A self-serve publishing platform that centralizes, preserves, and provides persistent and reliable access to Illinois research.