Data Workflows. Elizabeth Wickes, Data Curator, Research Data Service

Presentation transcript:

Data Workflows. Elizabeth Wickes, Data Curator, Research Data Service, University of Illinois Urbana-Champaign

Workflow Workshop Goals. Know: the tools you use, the stuff you use, where it all lives, and where it all goes. Learn: how your project workflow works, points where you need clarification, and how your collaboration with others could be improved. Practice: mapping out your workflow.

Materials. Preferred: a few pieces of paper, a pencil and/or pens in several colors, and Post-it notes in as many colors as you can find. Minimally: a piece of paper and a writing instrument. Alternatively: your imagination.

What data do you have? Input: source data; data from other people. Process: temporary files; intermediate datasets. Output: output data; data for other people; data that goes into reports or other final products.

And what do you do to it? Stages: Input, Process, Output. Activities: ingest, clean, train, test, analysis, write up, backup.

So how do you science? [Diagram, shown in three builds: input data and output data connected by a tangle of activities: make some charts, join in other data, investigate data, get other data in, check the algorithm, clean the data, clean the data again, test the model, write some scripts, make test data, save stats analysis, train a model. The second build labels the tangle “SCIENCE.” The third build adds “Publications” with the note “Don’t forget about us!”]

But what do I do? We’re going to cover an activity to help you think about your projects. It can be used prospectively to help plan, or retrospectively to pick up the pieces.

Choose a project. Something you’re just wrapping up? Something you’re in the middle of? Something you’re planning for next year?

Activity: Workflow Map. The intention is not to capture every detail of your workflow, but to help you get a feel for the big picture and the points where you may need clarification or other help. Default to thinking very high level and generalized. Remember to use specific, short, and meaningful names you’ll understand 6 months from now.

Approaching an initial workflow. Think about these 3 questions: What kind of evidence will help answer your research question? Be as specific as possible, but don’t be afraid to generalize at this stage. What will you do? Use verbs: read, write, script, compute, process, document, etc. What will you make? Use nouns or named entities: numbers, words, data, graphics, articles, metadata, databases, etc.

The Board & the Pieces. What you make: digital objects or physical objects; objects for you or objects for others. What you do: activity/action. What you use: source object/data; tool you use.

Make this your own. You know what you do best. Use your own voice and words. Just be sure you’ll be able to understand them later. So document your changes, maybe?

Start with your activities: lay out about 5-7 big yellow stickies in a row in the center, and write down what you will do (action statements, please). Activities in the example: Harvest data; Split data pkgs up; Explore data & QA; Extract desired values; Do SCIENCE! & math. It is fine to be very general about activities. The point is to note that you’ll do them! It is also fine to end your workflow at a meaningful breakpoint.

Then think about order, location, etc. Reorder the activities as necessary. Write down any data sources or other details that would be helpful context.

Each activity note makes a column holding the resources that are made and the resources that are used. We’ll do the resources used first.

Think first about the data resources you’ll be using for each activity, and place a small yellow sticky in the associated column naming either the data source or the data file used in the process. You might be unsure about the resource, or there might not be a resource. Data resources in the example: OAI-PMH datastore; Data pkgs from; Split data files; Split data files; My clean data????

Second, use a small pink sticky note to note the tool you use. Examples might be a database system, a script you have, a module, or a software package. Tools in the example: scrape.py, lxml, Split.py, lxml, pandas, pandas, R??

Use as many as you need. Okay to repeat! (In the example, lxml and pandas each appear twice.)

Note the data products that you’ll be making (in the example: XML chunk files). Use another color to distinguish another kind of data type or purpose, e.g. if that data will go to another human (in the example: Access metadata).

Make a note if you’re unsure (“???”). Data products in the example: XML chunk files; Indiv. XML files; Jupyter notebook; Aggreg. data file; ???. Other notes: Access metadata; My notes; Documentation notes.
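The example board turns “XML chunk files” into “Indiv. XML files.” The slides do not show how Split.py does this, so the following is only a sketch of one plausible approach. It assumes the chunks are OAI-PMH ListRecords responses, uses the standard-library xml.etree.ElementTree as a stand-in for lxml (the tool named on the board), and the function name split_records and the sample identifiers are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Tiny made-up chunk standing in for one harvested file (assumption).
SAMPLE_CHUNK = f"""<OAI-PMH xmlns="{OAI_NS}">
  <ListRecords>
    <record><header><identifier>example:1</identifier></header></record>
    <record><header><identifier>example:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""


def split_records(chunk_xml: str) -> dict[str, str]:
    """Map each record identifier to that record serialized as its own XML string."""
    root = ET.fromstring(chunk_xml)
    records = {}
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        records[identifier] = ET.tostring(record, encoding="unicode")
    return records


if __name__ == "__main__":
    for identifier, xml_text in split_records(SAMPLE_CHUNK).items():
        print(identifier, len(xml_text), "characters")
```

Each returned string could then be written to its own file; keying on the identifier keeps the link between an individual file and the record it came from.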

Start out very general if you need to.

Use the red stickies to note any pain points or questions. Then add who can help or answer your question. Questions in the example: How do I write documentation? Not sure what it’ll be. Which stats? Do I need an API? Will my computer have enough space? How bad is it to switch platforms?

The finished example board. Data products: XML chunk files; Indiv. XML files; Jupyter notebook; Aggreg. data file; ???. Other notes: Access metadata; My notes; Documentation notes. Activities: Harvest data; Split data pkgs up; Explore data & QA; Extract desired values; Do SCIENCE! & math. Data resources: OAI-PMH datastore; Data pkgs from; Split data files; Split data files; My clean data????. Tools: scrape.py, lxml, Split.py, lxml, pandas, pandas, R??. Questions: How do I write documentation? Not sure what it’ll be. Which stats? Do I need an API? Will my computer have enough space? How bad is it to switch platforms?
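One way to keep a board like this after the sticky notes come down is to record it as plain data. The sketch below is only an illustration: the ActivityColumn class and the unresolved helper are hypothetical names, and the assignment of each note to a column is an assumption read off the order of the notes in the example (the truncated “Data pkgs from” note is shortened to “Data pkgs”).

```python
from dataclasses import dataclass, field


@dataclass
class ActivityColumn:
    """One column of the workflow board: an activity and its sticky notes."""
    activity: str                                        # what you do
    uses: list[str] = field(default_factory=list)        # data resources used
    tools: list[str] = field(default_factory=list)       # tools used
    makes: list[str] = field(default_factory=list)       # data products made
    questions: list[str] = field(default_factory=list)   # pain points


# Column placement of each note is an assumption, not stated in the slides.
board = [
    ActivityColumn("Harvest data", uses=["OAI-PMH datastore"],
                   tools=["scrape.py", "lxml"], makes=["XML chunk files"],
                   questions=["Do I need an API?"]),
    ActivityColumn("Split data pkgs up", uses=["Data pkgs"],
                   tools=["Split.py", "lxml"], makes=["Indiv. XML files"]),
    ActivityColumn("Explore data & QA", uses=["Split data files"],
                   tools=["pandas"], makes=["Jupyter notebook"]),
    ActivityColumn("Extract desired values", uses=["Split data files"],
                   tools=["pandas"], makes=["Aggreg. data file"]),
    ActivityColumn("Do SCIENCE! & math", uses=["My clean data????"],
                   tools=["R??"], makes=["???"], questions=["Which stats?"]),
]


def unresolved(columns):
    """Return (activity, note) pairs for notes marked unsure and for open questions."""
    pairs = []
    for col in columns:
        notes = col.uses + col.tools + col.makes
        pairs += [(col.activity, note) for note in notes if "??" in note]
        pairs += [(col.activity, question) for question in col.questions]
    return pairs


if __name__ == "__main__":
    for activity, note in unresolved(board):
        print(f"{activity}: {note}")
```

Listing the unresolved notes per activity gives a short to-do list of the points where clarification or help is still needed.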

Now take another look. Are there deadlines you can trace back and add? Looking at the stuff that you are making: What folders do you need? Where should those folders be? What should your file names be? Looking at the tools you use: What documentation do you need about them to understand your project in a few years, or for another person to take it up? Do you need to save/back up the software or scripts to include as a reference in a future project? Add annotations to your board to indicate this. Use the back of your worksheet to document the folder structure.
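Once the folder and file-name questions have answers, they can be written down as a small script so the structure is reproducible. This is a minimal sketch under stated assumptions: the folder names in FOLDERS and the date-plus-label naming pattern are hypothetical choices loosely following the input/process/output split, not a layout prescribed by the workshop.

```python
from datetime import date
from pathlib import Path
import tempfile

# Hypothetical folder names; replace with the ones from your own worksheet.
FOLDERS = ["data/source", "data/intermediate", "data/output", "scripts", "docs"]


def make_project(root: Path) -> list[Path]:
    """Create the folder skeleton under root and return the created paths."""
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def dated_name(label: str, ext: str, when: date) -> str:
    """Build a sortable file name from an ISO date and a short label."""
    slug = "-".join(label.lower().split())
    return f"{when.isoformat()}_{slug}.{ext}"


if __name__ == "__main__":
    # A temporary directory keeps the demonstration from touching real files.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in make_project(Path(tmp)):
            print(folder.relative_to(tmp))
    print(dated_name("Harvest data", "xml", date(2024, 1, 31)))
```

Putting the ISO date first means the files sort chronologically in any file browser, and the short label mirrors the activity names on the board.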