U.S. Department of the Interior U.S. Geological Survey Tutorials on Data Management Lesson 2.1: Data Collection Entry and Manipulation CC image by Cobalt123.

Presentation transcript:

U.S. Department of the Interior U.S. Geological Survey Tutorials on Data Management Lesson 2.1: Data Collection Entry and Manipulation CC image by Cobalt123 on Flickr

Data Entry and Manipulation. Provided by DataONE.
Lesson Topics
 Best Practices for Creating Data Files
 Data Entry Options
 Data Manipulation Options
CC image by JISC on Flickr

Learning Objectives
 Recognize inconsistencies that can make a data set difficult to understand or manipulate
 Describe the characteristics of stable data formats and list reasons for using these formats
 Identify data entry tools
 Identify validation measures that can be performed as data are entered
 Describe the basic components of a relational database

The Data Lifecycle

Goals of Data Entry
 Create data sets that are:
   – Valid
   – Organized to support ease of use
CC image by Travis S on Flickr

Example: Poor Data Entry
Inconsistency between data collection events:
– Location of date information
– Inconsistent date format
– Column names
– Order of columns

Example: Poor Data Entry
Inconsistency between data collection events:
– Different site spellings, capitalization, and spaces in site names make the data hard to filter
– Codes are used for site names in some data, but the names are spelled out in others
– The Mean1 value is in the Weight column
– Text and numbers appear in the same column: what is the mean of 12, “escaped < 15”, and 91?
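
The mixed-column problem on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the original lesson: it separates the entries that parse as numbers from those that do not, so the text entries can be reviewed before any statistic is computed. The function name `split_numeric` and the variable names are assumptions made for illustration; the three values are the ones quoted on the slide.

```python
def split_numeric(values):
    """Separate entries that parse as numbers from entries that do not."""
    numbers, rejects = [], []
    for raw in values:
        try:
            numbers.append(float(raw))
        except (TypeError, ValueError):
            rejects.append(raw)
    return numbers, rejects


# The three values quoted on the slide, as they might sit in a Weight column.
weight_column = ["12", "escaped < 15", "91"]
numbers, rejects = split_numeric(weight_column)

print("numeric entries:", numbers)
print("entries needing review:", rejects)

# Only compute a mean once every entry in the column is numeric.
if not rejects:
    print("mean:", sum(numbers) / len(numbers))
```

A note such as “escaped < 15” arguably belongs in a separate comments or flag column, which leaves the measurement column purely numeric.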

Best Practices
 Each column holds one consistent type of data: only numbers, only dates, or only text
 Names, codes, and formats (for example, dates) are used consistently within each column
 Data are all in one table, which is much easier for a statistical program to work with than multiple small tables that each require human intervention

Best Practices
 Create descriptive column names without spaces or special characters
   – Soil T30 -> Soil_Temp_30cm
   – Species-Code -> Species_Code (avoid -, +, *, and ^ in column names; some software may interpret these symbols as operators)
 Use a descriptive file name. For instance, a file named SEV_SmallMammalData_v csv indicates the project the data are associated with (SEV), the theme of the data (SmallMammalData), and when this version of the data was created (v ). This name is much more helpful than a file named mydata.xls.
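
Part of the column-naming advice can be automated. The sketch below, which is an illustration rather than anything from the original lesson, replaces spaces and special characters with underscores. The helper name `clean_column_name` is hypothetical. Making a name descriptive (Soil T30 to Soil_Temp_30cm) still needs a person who knows what the column means.

```python
import re


def clean_column_name(name):
    """Replace each run of characters other than letters, digits, and
    underscores with a single underscore, then trim stray underscores."""
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip())
    return cleaned.strip("_")


# The two column names used as examples on the slide.
for original in ["Species-Code", "Soil T30"]:
    print(original, "->", clean_column_name(original))
```

The hyphen in Species-Code becomes an underscore, which matches the slide's suggested Species_Code. Soil T30 only becomes Soil_T30, so the descriptive part of the rename remains a manual step.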

Best Practices
 Missing data
   – Preferably leave the field empty (NULL = no value)
   – In numeric fields, use a distinct value such as 9999 to indicate a missing value
   – In text fields, use NA (“Not Applicable” or “Not Available”)
   – Use data flags in a separate column to qualify missing values
Example columns: Date, Time, NO3_N_Conc, NO3_N_Conc_Flag
Flag codes: M1 = missing; no sample collected. E1 = estimated from grab sample.
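
To show how empty fields and a separate flag column work together, here is a small Python sketch using only the standard library. The column names and the flag codes M1 and E1 come from the slide. The dates, times, and concentrations are hypothetical values invented for illustration, because the slide's table values are not available here.

```python
import csv
import io

# Hypothetical rows; only the column names and flag codes come from the slide.
raw = """Date,Time,NO3_N_Conc,NO3_N_Conc_Flag
2009-03-20,1530,0.013,
2009-03-21,1600,,M1
2009-03-22,1545,0.021,E1
"""

FLAG_MEANINGS = {
    "M1": "missing; no sample collected",
    "E1": "estimated from grab sample",
}

measured = []
for row in csv.DictReader(io.StringIO(raw)):
    # An empty field means "no value"; keep it as None, never as 0.
    value = float(row["NO3_N_Conc"]) if row["NO3_N_Conc"] else None
    if value is not None:
        measured.append(value)
    note = FLAG_MEANINGS.get(row["NO3_N_Conc_Flag"], "no flag")
    print(row["Date"], value, note)

# Statistics use only the rows that actually hold a value.
mean_conc = sum(measured) / len(measured)
print("mean of available values:", mean_conc)
```

Because the missing value is an empty field and not a stand-in number, it cannot leak into the mean by accident, and the flag column still records why it is missing.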

Best Practices
 Enter complete lines of data. Sorting an Excel file that contains empty cells is not a good idea!

Best Practices
 For the long term, store data in a consistent format that can be read well into the future and that can be used by any application now or in the future
 Appropriate file types are:
   – Non-proprietary: open, documented standard
   – In common usage by the research community: standard representation (ASCII, Unicode)
   – Unencrypted
   – Uncompressed
 ASCII-formatted files will be readable into the future
 Use ASCII (comma-separated) for tabular data
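
As an illustration of the comma-separated ASCII recommendation, the sketch below writes a small table as plain CSV text with Python's standard `csv` module. The site and species codes are taken from the example table later in this lesson. The variable names are assumptions, and an in-memory buffer stands in for a real file.

```python
import csv
import io

rows = [
    {"Site": "A", "Species_Code": "BOGR2", "Flowering": "y"},
    {"Site": "B", "Species_Code": "HODR", "Flowering": "y"},
]

# An in-memory buffer keeps the sketch self-contained. For a real file,
# open(path, "w", newline="", encoding="ascii") could be used instead.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["Site", "Species_Code", "Flowering"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Raises UnicodeEncodeError if any character falls outside ASCII.
csv_text.encode("ascii")
print(csv_text)
```

The result is uncompressed, unencrypted text that any spreadsheet, database, or statistics package can read.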

Entity
 Is a noun or object
 What to keep information on: people, places, events, concepts
 Symbolized by a square or rectangle
 Types of entities:
   – Core (aka Fundamental, Kernel, Standard)
   – Attributive (aka Dependent, Characteristic)
   – Associative (aka Junction table)
   – Reference (aka Validation, Code, or Type)
Provided by Tom Chatfield, BLM

Attribute
 Data or information about an entity; describes an entity
 Adjective
 Belongs to the entity
Example entity: Employee. Attributes: Employee First Name, Employee Last Name, Social Security Number, Birth Date, Gender Type Code
Provided by Tom Chatfield, BLM

Data Naming Conventions – Logical vs. Physical Names
 The logical name is the real name of the attribute
 The physical name is the name used in the software application
 If possible, the two names should match
Provided by Tom Chatfield, BLM

Naming Conventions
Entity
 A noun or noun phrase
 Singular
 Examples: Course, Student
Provided by Tom Chatfield, BLM

Attribute Names
 An attribute name should be composed of:
   – Object class term (entity word)
   – Property term
   – Qualifying term (optional)
   – Representative term (aka class)
Provided by Tom Chatfield, BLM

Definitions
 A definition is:
   – A statement of the meaning of a word or phrase
   – The act of making clear or distinct
 Entity definition: defines the entity
   – Concise
   – Unambiguous
 Attribute definition: defines the attribute
   – NOT the entity that the attribute contains information about
   – NOT the uses of the data (where, when, how, or by whom)
   – NOT the codes and the values the codes represent
Provided by Tom Chatfield, BLM

References
1. Best Practices for Preparing Environmental Data Sets to Share and Archive. September Les A. Hook, Suresh K. Santhana Vannan, Tammy W. Beaty, Robert B. Cook, and Bruce E. Wilson.

Data Entry Tools
 Google Docs Forms
 Spreadsheets
"Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government."

Google Docs Forms

Data Entry Tools: Excel

Excel: Data Validation
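
The idea behind validation at entry time is that each value is checked against a rule before it is accepted. The Python sketch below illustrates that idea outside of Excel and does not describe how Excel implements it. The function name `validate_record`, the rule set, and the ISO date format are assumptions chosen for illustration; the site codes A, B, and C and the y/n flowering values come from the example table later in this lesson.

```python
from datetime import date

ALLOWED_SITES = {"A", "B", "C"}  # site codes from the lesson's example table


def validate_record(record):
    """Return a list of problems found in one entered record.
    An empty list means the record passed every check."""
    problems = []
    if record.get("Site") not in ALLOWED_SITES:
        problems.append("Site must be one of " + ", ".join(sorted(ALLOWED_SITES)))
    try:
        date.fromisoformat(record.get("Date", ""))
    except (TypeError, ValueError):
        problems.append("Date must be in YYYY-MM-DD format")
    if record.get("Flowering") not in {"y", "n"}:
        problems.append("Flowering must be y or n")
    return problems


good = {"Site": "A", "Date": "2010-02-13", "Flowering": "y"}
bad = {"Site": "a ", "Date": "2/13/2010", "Flowering": "yes"}

print(validate_record(good))
for problem in validate_record(bad):
    print("rejected:", problem)
```

Rejecting the second record at entry time prevents the inconsistent site spellings and date formats shown in the poor data entry examples.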

Spreadsheet vs. Relational Database
Spreadsheet:
 Great for charts, graphs, and calculations
 Flexible about cell content type: cells in the same column can contain numbers or text
 Lacks record integrity: a column can be sorted independently of all others
 Easy to use, but harder to maintain as the complexity and size of the data grow
Relational database:
 Easy to query to select portions of data
 Data fields are typed: for example, only integers are allowed in integer fields
 Columns cannot be sorted independently of each other
 Steeper learning curve than a spreadsheet

What is a relational database?
 A set of tables
 Relationships
 A command language
Example tables (* appears to mark each table's key column):
Sample sites: *siteID, site_name, latitude, longitude, description
Species: *speciesID, species_name, common_name, family, order
Samples: *sampleID, siteID, sample_date, speciesID, height, flowering, flag, comments
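
The three tables on this slide could be declared as follows. This is a minimal sketch using Python's built-in `sqlite3` module, not the database used in the original lesson. The table and column names follow the slide. The column types, the NOT NULL rules, and the inserted rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sites (
    siteID      INTEGER PRIMARY KEY,
    site_name   TEXT NOT NULL,
    latitude    REAL,
    longitude   REAL,
    description TEXT
);
CREATE TABLE species (
    speciesID    INTEGER PRIMARY KEY,
    species_name TEXT NOT NULL,
    common_name  TEXT,
    family       TEXT,
    "order"      TEXT   -- quoted because ORDER is an SQL keyword
);
CREATE TABLE samples (
    sampleID    INTEGER PRIMARY KEY,
    siteID      INTEGER NOT NULL REFERENCES sites(siteID),
    sample_date TEXT NOT NULL,
    speciesID   INTEGER NOT NULL REFERENCES species(speciesID),
    height      REAL,
    flowering   TEXT,
    flag        TEXT,
    comments    TEXT
);
""")

# Hypothetical rows: one site, one species, one sample that links them.
conn.execute("INSERT INTO sites (siteID, site_name) VALUES (1, 'A')")
conn.execute("INSERT INTO species (speciesID, species_name) VALUES (1, 'BOGR2')")
conn.execute(
    "INSERT INTO samples (siteID, sample_date, speciesID, flowering) "
    "VALUES (1, '2010-02-13', 1, 'y')"
)

# A sample that points at a site that does not exist is refused.
try:
    conn.execute(
        "INSERT INTO samples (siteID, sample_date, speciesID) "
        "VALUES (99, '2010-02-13', 1)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("orphan sample rejected:", orphan_rejected)
```

The `REFERENCES` clauses are the relationships: every row in samples must point to an existing site and an existing species.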

Database Features: Explicit control over data types
Example columns: Date, Site, Height, Flowering
Advantages:
 Quality control
 Performance
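
Typed fields act as quality control because a value of the wrong type is refused at entry. The sketch below illustrates this with Python's built-in `sqlite3` module. SQLite is flexibly typed by default, so this example adds an explicit CHECK constraint to enforce the numeric type; many other database systems enforce column types directly. The table name `plant_obs` and the height values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE plant_obs (
    Date      TEXT NOT NULL,
    Site      TEXT NOT NULL,
    Height    REAL CHECK (Height IS NULL OR typeof(Height) = 'real'),
    Flowering TEXT CHECK (Flowering IN ('y', 'n'))
)
""")

# A numeric height is accepted.
conn.execute("INSERT INTO plant_obs VALUES ('2010-02-13', 'A', 12.5, 'y')")

# Free text in the numeric column is refused by the CHECK constraint.
try:
    conn.execute(
        "INSERT INTO plant_obs VALUES ('2010-02-13', 'B', 'escaped < 15', 'y')"
    )
    text_rejected = False
except sqlite3.IntegrityError:
    text_rejected = True

print("text in Height rejected:", text_rejected)
```

A spreadsheet would have accepted the text entry silently, which is how mixed columns arise in the first place.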

Relationships are defined between tables
First table:
Date       Site  Species  Flowering?
2/13/2010  A     BOGR2    y
2/13/2010  B     HODR     y
4/15/2010  B     BOER4    y
4/15/2010  C     PLJA     n
Second table (coordinate values are not preserved in this transcript):
Site  Latitude  Longitude
A
B
C
Combined table: Date, Site, Species, Flowering?, Latitude, Longitude, with each of the four observations carrying the coordinates of its site.
Mix and match data on the fly
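
Combining two tables on a shared column is done with a join. The sketch below reproduces this slide's example with Python's built-in `sqlite3` module. The observation rows come from the slide, with dates rewritten in ISO format. The coordinates are hypothetical placeholders, because the slide's values are not available here, and the table names `observations` and `sites` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (Site TEXT PRIMARY KEY, Latitude REAL, Longitude REAL);
CREATE TABLE observations (
    Date TEXT, Site TEXT REFERENCES sites(Site), Species TEXT, Flowering TEXT
);
""")

# Hypothetical coordinates, for illustration only.
conn.executemany(
    "INSERT INTO sites VALUES (?, ?, ?)",
    [("A", 34.1, -106.3), ("B", 35.2, -106.6), ("C", 36.3, -106.9)],
)
# Observation rows from the slide.
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?)",
    [
        ("2010-02-13", "A", "BOGR2", "y"),
        ("2010-02-13", "B", "HODR", "y"),
        ("2010-04-15", "B", "BOER4", "y"),
        ("2010-04-15", "C", "PLJA", "n"),
    ],
)

# Each observation picks up the coordinates of its site.
joined = conn.execute("""
    SELECT o.Date, o.Site, o.Species, o.Flowering, s.Latitude, s.Longitude
    FROM observations AS o
    JOIN sites AS s ON s.Site = o.Site
    ORDER BY o.Date, o.Site
""").fetchall()

for row in joined:
    print(row)
```

Each site's coordinates are stored once and attached to observations only when a query asks for them, so a correction to a coordinate needs to be made in one place.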

Powerful command language called Structured Query Language (SQL)
This table is called SoilTemp. Its columns are Date, Plot, Treatment, SensorDepth, and Soil_Temperature (most row values are not preserved in this transcript).
SQL examples:
Select Date, Plot, Treatment, SensorDepth, Soil_Temperature from SoilTemp where Date = ' '
Select * from SoilTemp where Treatment='N' and SensorDepth='0'
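
The two queries on this slide can be tried with Python's built-in `sqlite3` module. In this sketch the table and column names follow the slide. The rows are hypothetical, apart from the plot A, treatment N, depth 0, temperature 15.1 combination, which appears on the slide. The column types are assumptions, and the date used in the first query is an invented example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE SoilTemp (
    Date TEXT, Plot TEXT, Treatment TEXT,
    SensorDepth INTEGER, Soil_Temperature REAL
)
""")
conn.executemany(
    "INSERT INTO SoilTemp VALUES (?, ?, ?, ?, ?)",
    [
        ("2010-02-01", "C", "R", 30, 12.4),  # hypothetical row
        ("2010-02-01", "B", "C", 10, 13.0),  # hypothetical row
        ("2010-02-02", "C", "R", 30, 11.9),  # hypothetical row
        ("2010-02-02", "A", "N", 0, 15.1),
    ],
)

# First query: all measurements taken on one date.
on_date = conn.execute(
    "SELECT Date, Plot, Treatment, SensorDepth, Soil_Temperature "
    "FROM SoilTemp WHERE Date = ?",
    ("2010-02-01",),
).fetchall()

# Second query: treatment N measured at the surface (depth 0).
n_surface = conn.execute(
    "SELECT * FROM SoilTemp WHERE Treatment = ? AND SensorDepth = ?",
    ("N", 0),
).fetchall()

print(on_date)
print(n_surface)
```

The `?` placeholders pass values separately from the SQL text, which avoids quoting mistakes when the values come from user input.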

Data Entry with a Database
 Forms can be created that make entering data into a relational database as easy as entering it into Excel. The screenshot on the original slide shows embedded forms that were quickly generated in MS Access for adding data to three tables in a database of plant cover measurements.

Conclusion
 Be aware of best practices when designing data file structures
 Choose a data entry method that allows some validation of data as it is entered
 Consider investing time in learning how to use a database if data sets are large or complex
CC image by fo.ol on Flickr

If you want to try a database, consider one of these:
 Personal, single-user databases can be developed in MS Access; each database is stored as a file on the user’s computer. MS Access comes with easy GUI tools to create databases, run queries, and write reports.
 A more robust database that is free, accommodates multiple users, and runs on Windows or Linux is MySQL. GUI interfaces for MySQL include phpMyAdmin (free) and Navicat (inexpensive).

To learn more about designing a relational database:
 Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design (2nd Edition) by Michael J. Hernandez. Addison-Wesley

Data Manipulation
 Useful for analyzing, subsetting, and transforming data
 Can be used to quality assure data
 Options include SAS, SPSS, R, and Matlab
 Not free:
   – SAS: has outstanding support
   – SPSS: has a user-friendly GUI
   – Matlab: analysis and visualization platform that has “toolboxes” available for different disciplines, such as modeling or genomic analyses

R
 Free (
 Produces publication-quality graphics
 Lots of forums from which to get help
 Other software (such as Kepler for developing workflows) can integrate analytical components written in R