Data Rescue! Data Management Series: Workshop 1 HUMANS RESEARCH DATA SERVICE.



Introductions!

Research Data Service (RDS)
The Research Data Service provides the Illinois research community with expertise, tools, and infrastructure to manage and steward research data.
◦ Knowledge around data policies, resources, archiving, & preservation
◦ Consultation for data management planning & implementation
◦ Workshops on data management, documentation, and data publishing
◦ Data Management Plan reviews and DOI minting services
◦ Solutions for public access to research data
◦ Centralized, private storage for active (“working”) data (with NCSA)
visit: researchdataservice.illinois.edu or

What do we do?
Expertise
◦ Knowledge around data policies, tools, resources, archiving, and preservation
◦ Consultation and workshops for data management planning and implementation
Tools
◦ Data Management Plan creation wizard (DMPTool.org)
◦ Tools for data citation (DOI minting)
Infrastructure
◦ Illinois Data Bank (self-deposit institutional data repository)

Workshop goals
Know
◦ what you have
◦ where it lives
◦ how to access it
Learn
◦ Organization strategies
◦ File naming pointers
◦ Types of documentation
Practice
◦ Identifying and grouping your data
Not a digitization workshop – but you may find some of this useful in thinking about digitization planning.

Everyone should have
◦ Handout
◦ 1 x Red post-it to write things that were confusing or questions
◦ 1 x Blue post-it to write things that were helpful, or to indicate that you’re done with an activity
◦ 1 blank pad of post-it notes
◦ Pen or pencil

What data do you have?
Research data…
◦ Research projects
◦ Administrative
◦ Students
◦ Recordings
◦ Specimens
◦ Classes

Activity 1: Mini survey
Please take a few minutes to complete a mini assessment survey. You will not be turning this survey in to us, so feel free to be as honest as you’d like. Change the wording of the questions as you like so that they better speak to your data or research objects.

Activity 2: Data inventory
You came here to rescue something, right? Or to get a fresh start for new projects. Take a moment to think of those things.
Step 1 column: Identify the data/projects you work with, e.g. specimens, government databases, instrument data, images, etc. You may choose to answer this in terms of all the data your lab has, just the data you work with, or other files your projects depend on. Write the name of this data or project in the rows under the Step 1 column.

Activity 2: Data inventory (Steps 2 and 3)
◦ What type of project is this data used for? Examples: class, grant, personal, student projects, etc.
◦ What type of data is this? e.g. file formats, content areas, etc.
◦ Where is the data located? e.g. file path, cabinet location, shelves, office, etc.
◦ How do you access this data? e.g. specific software, hardware, physical access, etc.
◦ Is the data backed up and how? e.g. cloud, external hard drive, flash drive, or not at all
Optional: Scratch these questions out and add something of your own if one of these doesn’t make sense.

Activity 2: Data inventory Step 4: Inventory assessment Turn your sheet back over and look at your initial survey answers. Do you want to change anything? More data than you expected? More independent/dependent than you expected?

Before we go on… Before we move on to the last part of our activity… Let’s take a moment to discuss some essential elements of organizing data files or projects in a digital platform.

Consistent File and Folder Naming
For quickly finding and sorting files and folders, the names should be consistent but unique. Avoid special characters.
◦ project name/acronym
◦ experiment/instrument type
◦ site location information (if applicable)
◦ researcher initials
◦ date (consistently formatted, i.e. YYYYMMDD)
◦ version number (w/ leading zeros)
General theme: Scale ruins all informality – Think ahead.
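One way to keep names consistent is to generate them rather than type them. The following is a minimal sketch, not a prescribed convention: the function name `build_name`, the underscore separator, the `v03` version style, and the example values (including the initials `EW`) are all assumptions made for illustration.

```python
import re
from datetime import date


def build_name(project, instrument, site, initials, when, version, ext):
    """Assemble a file name from the components listed on the slide.

    Hypothetical helper: separator, ordering, and version style are assumptions.
    """
    parts = [
        project,
        instrument,
        site,                      # may be empty when not applicable
        initials,
        when.strftime("%Y%m%d"),   # consistently formatted date
        f"v{version:02d}",         # version number with a leading zero
    ]
    # Replace runs of special characters (anything but letters, digits, hyphens)
    # with a hyphen, and drop components that are empty.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", p).strip("-") for p in parts if p]
    return "_".join(cleaned) + "." + ext


print(build_name("BAM", "CoExp", "", "EW", date(2014, 9, 4), 3, "txt"))
# prints: BAM_CoExp_EW_20140904_v03.txt
```

Because every name passes through the same function, the order of components and the date format cannot drift as the number of files grows.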

Date Tip
BAM Co-Exp Run txt
BAM Co-Exp Run txt
BAM Co-Exp Run txt
vs.
Run 1 B anth meth Sept 4.txt
BAM Rxn _09_04.txt
_meth_3.txt
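The practical payoff of a YYYYMMDD date is that an ordinary alphabetical sort is also a chronological sort. The sketch below illustrates this with made-up run dates and a hypothetical `run_` prefix; neither comes from the slides.

```python
from datetime import date

# Hypothetical run dates, deliberately out of order.
run_dates = [date(2014, 9, 4), date(2014, 10, 12), date(2013, 12, 1)]

# Year-month-day with zero padding.
iso_names = [f"run_{d.strftime('%Y%m%d')}.txt" for d in run_dates]

# Month_day_year without padding, as people often type it by hand.
informal_names = [f"run_{d.month}_{d.day}_{d.year}.txt" for d in run_dates]

print(sorted(iso_names))
# ['run_20131201.txt', 'run_20140904.txt', 'run_20141012.txt']  (chronological)

print(sorted(informal_names))
# ['run_10_12_2014.txt', 'run_12_1_2013.txt', 'run_9_4_2014.txt']  (not chronological)
```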

Controlled Vocabulary
Take the guesswork out of choosing between:
◦ a preferred spelling: behavior vs behaviour
◦ a scientific or popular term: pig vs porcine vs Sus scrofa domesticus
◦ determining which synonym to use: record vs entry
◦ determining which abbreviation to use (if you have to): USA vs US
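A controlled vocabulary can be as simple as a lookup table that maps every known variant to the one term you have agreed to use. The sketch below is illustrative only: which term counts as "preferred" in each pair is an arbitrary assumption here, and `PREFERRED` and `normalize` are hypothetical names.

```python
# Variant (lowercase) -> preferred term. The choice of preferred terms is an
# assumption for this example; a real project would decide and document its own.
PREFERRED = {
    "behaviour": "behavior",
    "porcine": "pig",
    "sus scrofa domesticus": "pig",
    "entry": "record",
    "us": "USA",
}


def normalize(term):
    """Return the preferred term for a known variant, else the term unchanged."""
    key = " ".join(term.lower().split())  # ignore case and extra whitespace
    return PREFERRED.get(key, term.strip())


print(normalize("Behaviour"))              # behavior
print(normalize("Sus scrofa domesticus"))  # pig
print(normalize("record"))                 # record
```

Keeping the table in one documented place means the decision is made once instead of every time someone enters data.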

Some examples Putting this all together, we can look at an example project.

Noble (2009)’s Bioinformatics project structure Noble WS (2009) A Quick Guide to Organizing Computational Biology Projects. PLoS Comput Biol 5(7): e doi: /journal.pcbi

An example data project
JeopardyHTMLPlayers/
    Player-1.html
    Player-2.html
    Player-3.html
    etc…
playerDataFiles/
    Player-1.csv
    Player-2.csv
    Player-3.csv
playerdata.csv
jeopardy_scrape.ipynb
jeopardy_dataprep.ipynb
jeopardy_analysis.rmd
jeopardy_analysis.html
readme.txt
visualizations/
    bystate.png
    byregion.png
    kenjennings.png
    etc…
Notes on this structure:
◦ Code to produce these graphs is stored & documented in the rmd file.
◦ Store one distinct entity type per file. The semantic link between the contents of these files is encoded in the file names, which carry the unique entity IDs (e.g. Player-1). Document the meaning of these ID numbers.
◦ Separate folders mean I don’t have to filter from a giant list of files. Make folders as large numbers of similar files are created; they are not always required.
◦ Separate scripts by purpose to keep code from being cluttered.
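Because each player's HTML file and CSV file share the same ID in the file name (for example Player-1), the related files can be matched up mechanically. This is a minimal sketch assuming the folder layout shown in the example project; `group_by_entity_id` is a hypothetical helper name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_entity_id(paths):
    """Group paths by file name without extension, treated here as the entity ID."""
    groups = defaultdict(list)
    for p in paths:
        groups[PurePosixPath(p).stem].append(p)
    return dict(groups)


# Paths assumed from the example project layout.
files = [
    "JeopardyHTMLPlayers/Player-1.html",
    "playerDataFiles/Player-1.csv",
    "JeopardyHTMLPlayers/Player-2.html",
    "playerDataFiles/Player-2.csv",
]

pairs = group_by_entity_id(files)
print(pairs["Player-1"])
# ['JeopardyHTMLPlayers/Player-1.html', 'playerDataFiles/Player-1.csv']
```

This only works when the ID scheme is applied consistently, which is one reason to document what the ID numbers mean.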

Basic Documentation
Types of Documentation:
◦ Descriptive (e.g., creator, title, keywords)
◦ Structural (e.g., relation to other files)
◦ Administrative (e.g., software & hardware requirements, rights information)
If you know the data will be deposited in a repository, understand the documentation requirements early in the process.
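The three documentation types map naturally onto the sections of a plain-text readme. The sketch below shows one possible layout; the function `render_readme`, the section headings, and every example value are assumptions for illustration, not a required format.

```python
def render_readme(descriptive, structural, administrative):
    """Render three dictionaries of fields as a sectioned plain-text readme."""
    sections = [
        ("DESCRIPTIVE", descriptive),
        ("STRUCTURAL", structural),
        ("ADMINISTRATIVE", administrative),
    ]
    lines = []
    for title, fields in sections:
        lines.append(title)
        for key, value in fields.items():
            lines.append(f"  {key}: {value}")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"


# All values below are placeholders.
text = render_readme(
    descriptive={"title": "Example dataset", "creator": "A. Researcher"},
    structural={"related files": "playerdata.csv is built from playerDataFiles/"},
    administrative={"software": "any CSV reader", "rights": "not yet decided"},
)
print(text)
```

Writing the readme to `readme.txt` alongside the data is then a single file write; a repository with formal requirements may call for a richer schema than this.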

Data Documentation Continuum
◦ Informal ReadMe: Low-Barrier, Fast, Easy, Irregular, Incomplete, Low-Quality
◦ Formal Schema: High-Barrier, Slow, Skilled, Standardized, Rich, High-Quality

Activity 3: Arranging your local data catalog
Step 1: Write the name of each data/project from Activity 2 onto a post-it note.
We’re now going to group these items in a few different ways. Use your worksheet to take notes on how effective each strategy is.
◦ Group 1: type of project
◦ Group 2: type of data
◦ Group 3: method of access
Can you think of other combinations?
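The same regrouping exercise can be tried on a digital copy of the inventory. This is a small sketch under assumed field names (`project`, `data_type`, `access`) and invented inventory rows; the worksheet itself defines neither.

```python
from collections import defaultdict

# Invented inventory rows; field names are assumptions for this example.
inventory = [
    {"name": "field photos", "project": "grant", "data_type": "images", "access": "external hard drive"},
    {"name": "lab notebook scans", "project": "grant", "data_type": "images", "access": "cloud"},
    {"name": "course survey", "project": "class", "data_type": "spreadsheet", "access": "cloud"},
]


def group_by(items, key):
    """Return {value of key: [item names]} for the given inventory field."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item["name"])
    return dict(groups)


for field in ("project", "data_type", "access"):
    print(field, group_by(inventory, field))
```

Comparing the three groupings side by side plays the same role as rearranging the post-it notes: it shows which attribute splits the items into the most useful piles.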

Activity 4: Workflow mapping
Think of your normal analysis workflow or pick one that you commonly perform. Determine some of the core steps or actions that you take. Write each step on a post-it note and place them in order. Start very general and add detail as needed.
When one of these actions involves data:
◦ Draw a line in the middle of the post-it
◦ Write down where that data is located and backed up
Draw a diagram of your workflow on your worksheet.

Activity discussion/wrap-up
◦ What did we learn from this?
◦ Which grouping made the most sense?
◦ How do these groups compare to how things are currently stored at your home institution? Are any better or worse?
Homework: take 5 minutes to sketch out a new structure for organizing your data files. Is this possible to implement? Is it possible to maintain over time?