IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF-1115210).

Slides:



Advertisements
Similar presentations
Data Entry and Manipulation Lesson 4: Data Collection Entry and Manipulation CC image by Cobalt123 on Flickr.
Advertisements

Sunday Business Systems Using Access More Efficiently Tips and tricks to make things easy.
Unit 27 Spreadsheet Modelling
Chapter 5 Database Concepts. Why Study Databases? Databases have incredible value to business. Probably the most important technology for supporting operations.
Chapter 5 Database Concepts. Why Study Databases? Databases have incredible value to businesses. Very important technology for supporting operations.
John Porter Why this presentation? The forms data take for analysis are often different than the forms data take for archival storage Spreadsheets are.
Chapter 5 Database Concepts.
Spreadsheet in excel o Spreadsheet in excel o Uses of spreadsheet o Advantages Prepared by: Yusra Waseem 8 th C.
Basic Concept of Data Coding Codes, Variables, and File Structures.
Database Design IST 7-10 Presented by Miss Egan and Miss Richards.
MS Access 2007 IT User Services - University of Delaware.
Introduction to Data Management and Relational Databases.
U.S. Department of the Interior U.S. Geological Survey Tutorials on Data Management Lesson 2.1: Data Collection Entry and Manipulation CC image by Cobalt123.
Microsoft Access Database software. What is a database? … a database is an organized collection of data. A collection of data of similar information compiled.
MS Access: Database Concepts Instructor: Vicki Weidler.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
 A databases is a collection of data organized to make it easy to search and easy to retrieve in a useful, usable form.
Introduction to SPSS Edward A. Greenberg, PhD
Best Practices for Preparing Data Sets Non-CO2 Synthesis Workshop Boulder, Colorado October 2008 Compiled by: A. Dayalu, Harvard University Adapted.
© Paradigm Publishing, Inc Access 2010 Level 2 Unit 1Advanced Tables, Relationships, Queries, and Forms Chapter 1Designing the Structure of Tables.
Administrative Software Chapter 7 Teaching and Learning with Technology.
CC&E Best Data Management Practices, April 19, 2015 Please take the Workshop Survey 1.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Open Your Mind to Open Source MPDO’s & EOPR’s Centre for IT & eGovernance AMR-APARD Hyderabad Welcome!
With Microsoft Office 2007 Intermediate© 2008 Pearson Prentice Hall1 PowerPoint Presentation to Accompany GO! with Microsoft ® Office 2007 Intermediate.
DATABASES Southern Region CEO Wednesday 13 th October 2010.
Why use a Database B8 B8 1.
Presented By: Gail Rose-Innes Camps Bay High School ICT & CAT Department Microsoft Access 2010.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
WHAT IS A DATABASE? A DATABASE IS A COLLECTION OF DATA RELATED TO A PARTICULAR TOPIC OR PURPOSE OR TO PUT IT SIMPLY A GENERAL PURPOSE CONTAINER FOR STORING.
Database Management Systems.  Database management system (DBMS)  Store large collections of data  Organize the data  Becomes a data storage system.
DATABASE MANAGEMENT SYSTEMS CMAM301. Introduction to database management systems  What is Database?  What is Database Systems?  Types of Database.
Copyright 2006 Prentice-Hall, Inc. Essentials of Systems Analysis and Design Third Edition Joseph S. Valacich Joey F. George Jeffrey A. Hoffer Chapter.
Microsoft Office XP Illustrated Introductory, Enhanced Tables and Queries Using.
1/62 Introduction to and Using MS Access Database Management and Analysis Yunho Song.
A Simple Guide to Using SPSS ( Statistical Package for the Social Sciences) for Windows.
SQL Fundamentals  SQL: Structured Query Language is a simple and powerful language used to create, access, and manipulate data and structure in the database.
Managing Your Data: Assign Descriptive File Names Robert Cook Oak Ridge National Laboratory Section: Local Data Management Version 1.0 October 2012.
Microsoft Access.  What is Data ?  Data vs. Information  Database History.  What is a Database?  Examples for Small and Large Databases.  Types.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Databases and Speadsheets
Building Dashboards SharePoint and Business Intelligence.
Microsoft Access is a database program to manage sort retrieve group filter for certain records.
DBMS Using Access Note: If using software other that Access, consult manufacturer’s manual.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Data Entry and Assembly. Data Acquisition  Best Practices for Creating Data  Data Entry Options  Data Manipulation Options  Gathering Existing Data.
How Not to Lose Track of Your Research Organization and Planning Resources at Brandeis Melanie Radik and Raphael Fennimore Library & Technology Services.
Introduction to Information and Computer Science
Lesson 13 Databases Unit 2—Using the Computer. Computer Concepts BASICS - 22 Objectives Define the purpose and function of database software. Identify.
Chapter 10 Designing Databases. Objectives:  Define key database design terms.  Explain the role of database design in the IS development process. 
Data Organization Quality Assurance and Transformations.
1 PEER Session 02/04/15. 2  Multiple good data management software options exist – quantitative (e.g., SPSS), qualitative (e.g, atlas.ti), mixed (e.g.,
Managing Your Data: Assign Descriptive File Names Robert Cook Oak Ridge National Laboratory Version 1.0 Review Date.
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
PREPARED BY: PN. SITI HADIJAH BINTI NORSANI. LEARNING OUTCOMES: Upon completion of this course, students should be able to: 1. Understand the structure.
 A collection of data organized in a way that allows:  Easy access  Easy retrieval of specific data  Easy use of the data.
Connecting to External Data. Financial data can be obtained from a number of different data sources.
Database Concepts and Applications in HRIS
Database (Microsoft Access). Database A database is an organized collection of related data about a specific topic or purpose. Examples of databases include:
Todd Quinn – Business & Economics Librarian
Tutorials on Data Management
Data Management: The Data Repatriation Re-integration Step or …
Microsoft Excel 2007 – Level 2
Agenda About Excel/Calc Spreadsheets Key Features
Manipulating and Sharing Data in a Database
Using Access More Efficiently
Spreadsheets, Modelling & Databases
Introduction to Access
This material is based upon work supported by the National Science Foundation under Grant #XXXXXX. Any opinions, findings, and conclusions or recommendations.
Presentation transcript:

iDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Best Practices for Data Entry and Data Manipulation Amber Budden, DataONE Edward Gilbert, Symbiota September 15 – 17, 2015 Arizona State University (ASU) Biodiversity Knowledge Integration Center (BioKIC) Tri-Trophic TCN, DataONE, ASU, iDigBio, Symbiota, and NCEAS Brought to you by: Tri-Trophic TCN, DataONE, ASU, iDigBio, Symbiota, and NCEAS

2 Topics Best Practices for Creating Data Files Data Entry Options Data Manipulation Options CC image by JISC on Flickr

3 Desired outcomes Recognize inconsistencies that can make a dataset difficult to understand and/or manipulate Describe characteristics of stable data formats and list reasons for using these formats Identify data entry tools Identify validation measures that can be performed as data is entered Describe the basic components of a relational database

4 Goals of Data Entry Create data sets that are: o Valid o Organized to support ease of use CC image by Travis S on Flickr

5 Example: Poor Data Entry Inconsistency between data collection events – Location of Date information – Inconsistent Date format – Column names – Order of columns

6 Software used in addition to Excel

7 Frequency and Purpose of Excel use

8 Practices when using Excel

9 Example: Poor Data Entry Inconsistency between data collection events – Different site spellings, capitalization, spaces in site names—hard to filter – Codes used for site names for some data, but spelled out for others – Mean1 value is in Weight column – Text and numbers in same column – what is the mean of 12, “escaped < 15”, and 91?

10 Best Practices Columns of data are consistent: only numbers, dates, or text Consistent Names, Codes, Formats (date) used in each column Data are all in one table, which is much easier for a statistical program to work with than multiple small tables which each require human intervention

11 Best Practices Create descriptive column names without spaces or special characters o Soil T30  Soil_Temp_30cm o Species-Code  Species_Code (avoid using -,+,*,^ in column names. Some software may interpret these symbols as an operator) Use a descriptive file name. For instance, a file named SEV_SmallMammalData_v csv indicates the project the data is associated with (SEV), the theme of the data (SmallMammalData) and also when this version of the data was created (v ). This name is much more helpful than a file named mydata.xls.

12 Best Practices Missing data o Preferably leave field empty (NULL = no value) o In numeric fields, use a distinct value such as 9999 to indicate a missing value o In text fields, use NA (“Not Applicable” or “Not Available”) o Use Data flags in a separate column to qualify missing value DateTimeNO3_N_ConcNO3_N_Conc_Flag M E1 M1 = missing; no sample collected E1 = estimated from grab sample

13 Best Practices Enter complete lines of data Sorting an Excel file with empty cells is not a good idea!

14 Best Practices For the long term, store data in a consistent format that can be read well in to the future and that can be used by any application now or in the future Appropriate file types include: o Non-proprietary: Open, documented standard o Common usage by research community: Standard representation (Unicode aka UTF-8) o Unencrypted o Uncompressed (DwC-A zip files) CSV (comma seperated) for tabular data o txt

15 References 1.Best Practices for Preparing Environmental Data Sets to Share and Archive. September Les A. Hook, Suresh K. Santhana Vannan, Tammy W. Beaty, Robert B. Cook, and Bruce E. Wilson Wieczorek et al (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard PLoSONE, 7:1, e29715

16 Data Entry Tools Googledocs Forms Spreadsheets

17 Googledocs Forms

18

19 Data Entry Tools: Excel

20 Excel: Data Validation 20

21 Spreadsheet vs. Relational Database Great for charts, graphs, calculations Flexible about cell content type—cells in same column can contain numbers or text Lack record integrity-- can sort a column independently of all others) Easy to use – but harder to maintain as complexity and size of data grows Easy to query to select portions of data Data fields are typed – For example, only integers are allowed in integer fields Columns cannot be sorted independently of each other Steeper learning curve than a spreadsheet