

John Porter

Why this presentation?
- The forms data take for analysis often differ from the forms data take for archival storage.
- Spreadsheets are widely used for simple analyses, but they have poor archival qualities:
  - Different versions over time are not compatible.
  - Formulas are hard to capture or display.
  - They allow (even encourage) users to structure data in ways that are hard to use with other software.
- Our goal with archived data is to store them in a form that can be processed automatically, with minimal human intervention.

Data that can be automated
Below is a picture of a data spreadsheet that could NOT be easily automated. Why not?

Ugly Data Problems
- Dates are not stored consistently.
  - Sometimes the date is stored with a label (e.g., “Date:5/23/2005”), sometimes in its own cell (10/2/2005).
- Values are labeled inconsistently.
  - Sometimes “Conductivity Top”, other times “conductivity_top”.
  - For salinity, sometimes two cells are used for top and bottom; in other cases they are combined in one cell.
- Data coding is inconsistent.
  - Sometimes YSI_Model_30, sometimes “YSI Model 30”.
  - Tide State is sometimes a text description, sometimes a number.
- The order of values in the “mini-table” for a given sampling date differs.
  - “Meter Type” comes first in the 5/23 table and second in the 10/2 table.

Ugly Data: Additional Problems
- Confusion between numbers and text.
  - For most software, 39% or <30 are considered TEXT, not numbers (what is the average of 349 and <30?).
- Different types of data are stored in the same columns.
  - Many software products require that a single column contain either TEXT or NUMBERS (but not both!).
- The spreadsheet loses interpretability if it is sorted.
  - Dates are related to a set of attributes only by their position in the file. Once sorted, that relationship is lost.
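The number-versus-text problem can be seen in a few lines of Python. This is only a sketch: the cell values are hypothetical, and `to_number` is an illustrative helper, not part of any particular package. It shows that entries such as "<30" and "39%" cannot be read as numbers, so an average can only be taken over the cells that remain.

```python
def to_number(cell):
    """Return the cell as a float, or None if it is not a plain number."""
    try:
        return float(cell)
    except ValueError:
        return None

# Hypothetical spreadsheet cells, all read in as text
cells = ["349", "<30", "39%", "12.5"]

parsed = [to_number(c) for c in cells]
numeric = [v for v in parsed if v is not None]
rejected = [c for c, v in zip(cells, parsed) if v is None]

# The mean silently ignores "<30" and "39%", which may not be what was intended
mean_numeric = sum(numeric) / len(numeric)
print(rejected, mean_numeric)
```

A person has to decide what "<30" should mean before it can be averaged; software cannot make that decision on its own.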

Best Practices
- We’ve seen that a spreadsheet or word processor can create datasets that can only be interpreted with human intervention.
  - The “ugly spreadsheet” example would be hard to analyze even in a spreadsheet, except with lots of case-by-case human decisions.
- But what are some principles that characterize good archival data?
  - Keep in mind that good data formats for archiving and sharing may not be the ones you prefer for viewing or analysis!

Best Practices for Archiving
Many of these are taken from Cook et al., Best Practices for Preparing Ecological Data Sets to Share and Archive, Bulletin of the Ecological Society of America, 2001.
- Data formats should be consistent over time.
  - Adding new data should not add new columns to a data table, only new rows.
- Columns of data should be consistent.
  - Each column should include only a single kind of data.
  - Every value in a column should be of the same type:
    - Text or “string” data
    - Integer numbers
    - Floating point or real numbers
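One way to check the "single kind of data per column" rule is to classify every cell and list the columns that contain more than one kind. The sketch below uses only the Python standard library; the sample table and the helper names `kind` and `column_kinds` are hypothetical. It assumes every row has the same number of cells as the header, and it takes the strict view that a column mixing integers and floating point values also counts as mixed.

```python
import csv
import io

def kind(cell):
    """Classify one cell as 'integer', 'float', 'text', or 'blank'."""
    cell = cell.strip()
    if cell == "":
        return "blank"
    try:
        int(cell)
        return "integer"
    except ValueError:
        pass
    try:
        float(cell)
        return "float"
    except ValueError:
        return "text"

def column_kinds(csv_text):
    """Map each column name to the set of cell kinds found in it (blanks ignored)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kinds = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name, cell in row.items():
            k = kind(cell)
            if k != "blank":
                kinds[name].add(k)
    return kinds

# Hypothetical data table; the Salinity column mixes numbers and text
sample = """Site,Depth_cm,Salinity
A,30,28.5
B,45,<30
C,60,31.0
"""

kinds = column_kinds(sample)
mixed = sorted(name for name, ks in kinds.items() if len(ks) > 1)
print(mixed)
```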

Best Practices for Archiving
- Lines or rows of data should be complete.
- Data tables should be designed to be machine readable, not human readable.
- A table with incomplete rows should never be sorted; a table with complete rows is OK to sort.
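A common reason rows are incomplete is that a date is typed once and left blank on the rows beneath it. The sketch below shows one possible repair, a "fill down" step, after which the table can be sorted safely. The rows, the ISO date format, and the `fill_down` helper are all assumptions made for illustration.

```python
# Hypothetical rows in which the date appears only on the first line of each block
rows = [
    {"Date": "2005-05-23", "Variable": "Salinity_top", "Value": "28"},
    {"Date": "", "Variable": "Salinity_bottom", "Value": "30"},
    {"Date": "2005-10-02", "Variable": "Salinity_top", "Value": "25"},
    {"Date": "", "Variable": "Salinity_bottom", "Value": "27"},
]

def fill_down(rows, column):
    """Copy the last non-blank value of `column` into the blank cells below it."""
    filled, last = [], None
    for row in rows:
        if row[column] != "":
            last = row[column]
        filled.append({**row, column: last})
    return filled

complete = fill_down(rows, "Date")

# Every row now carries its own date, so sorting no longer breaks the link
by_variable = sorted(complete, key=lambda r: (r["Variable"], r["Date"]))
for r in by_variable:
    print(r)
```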

Best Practices for Archiving
- Data are easiest to archive if each column in a data table has a single line at the top labeling the column with a descriptive name.
- Column names should start with a letter and not include SPACES or symbols (other than an underscore, e.g., My_Data, to take the place of a space).
  - +, -, *, &, ^ are often treated as operators, so using them in column names causes confusion.
  - Some software uses spaces to identify different columns.

BAD                     GOOD                     Comment
Species Name            Species_name             No spaces!
Age-Class               Age_Class or AgeClass    No symbols!
30cm deep temperature   Temp_30_cm_deep          Start with a letter and put what is being measured first
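The mechanical part of these naming rules can be automated. The following is a minimal sketch, not a standard routine: `clean_column_name` and the "X_" prefix are hypothetical choices. It replaces spaces and symbols with underscores and makes sure the name starts with a letter.

```python
import re

def clean_column_name(name):
    """Turn a free-form label into a name made of letters, digits, and underscores."""
    # Replace each run of spaces or symbols with a single underscore
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", name.strip()).strip("_")
    if cleaned and not cleaned[0].isalpha():
        # Hypothetical prefix so the name starts with a letter
        cleaned = "X_" + cleaned
    return cleaned

names = ["Species Name", "Age-Class", "30cm deep temperature"]
cleaned = [clean_column_name(n) for n in names]
print(cleaned)
```

Note that the third name comes out as X_30cm_deep_temperature, which is legal but not as good as Temp_30_cm_deep. Putting what is being measured first still takes a human decision.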

Best Practices for Archiving
- Avoid storing data for separate sites or dates in separate worksheets within a single spreadsheet.
  - Each worksheet is a separate data table, and so needs to be documented and processed separately, greatly increasing the work needed.
- Usually you can just add a column for “site” and combine all the data into a single (large) data table.
  - This is OK because computers are good at dealing with large, consistent blocks of data.
  - They don’t work as well with esoteric, segmented data.
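Combining per-site worksheets into one table is a small job once each sheet has been read in. In the sketch below each worksheet is represented as a list of rows keyed by a site name; the site names, values, and the `combine` helper are hypothetical.

```python
# Hypothetical worksheets, one per site, each with the same columns
sheets = {
    "Site_A": [
        {"Date": "2005-05-23", "Salinity": 28.0},
        {"Date": "2005-10-02", "Salinity": 25.0},
    ],
    "Site_B": [
        {"Date": "2005-05-23", "Salinity": 31.0},
    ],
}

def combine(sheets):
    """Stack all worksheets into one table, recording the source in a Site column."""
    table = []
    for site, rows in sheets.items():
        for row in rows:
            table.append({"Site": site, **row})
    return table

combined = combine(sheets)
for row in combined:
    print(row)
```

The result is a single table that needs to be documented and processed only once.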

Special Issues
- What should be done about missing data?
  - Often missing data values can be left blank, or a special value (e.g., 9999) inserted.
- Sometimes there are other special issues.
  - Example: in meteorological data, days that had some precipitation, but not enough to measure, were marked with a “T” for “trace” in data sheets.
  - Problem: this mixes numbers (rain amounts) with a text string. What is the average of 10, 5, T and 3?
- Solution(s):
  - Substitute a small amount (e.g., ½ of the smallest measurable value), so T becomes 0.005 if you can measure 0.01 cm of rain.
  - Leave the rain column blank or 0, but add an additional column that contains “T” for trace, “N” for none, or “M” for measured.
  - This is an example of using a “Data Flag”: a column that helps describe or qualify the data in another column.
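The data flag approach can be sketched as follows. The raw entries extend the example above with one hypothetical zero, and `split_precip` is an illustrative helper. It stores a trace as 0 in the numeric column and records "T", "N", or "M" in a separate flag column. Blank (missing) entries are not handled in this sketch.

```python
def split_precip(cell):
    """Return (amount, flag) for one raw precipitation entry.

    Flags: "T" trace, "N" none, "M" measured.
    """
    cell = cell.strip()
    if cell.upper() == "T":
        return (0.0, "T")  # amount stored as 0; the flag records the trace
    amount = float(cell)
    return (amount, "M" if amount > 0 else "N")

raw = ["10", "5", "T", "3", "0"]  # hypothetical data sheet entries
amounts, flags = zip(*(split_precip(c) for c in raw))

# The numeric column is now all numbers, so averages are well defined
measured = [a for a, f in zip(amounts, flags) if f == "M"]
mean_measured = sum(measured) / len(measured)
print(amounts, flags, mean_measured)
```

Because the flag is kept separately, an analyst can later choose whether trace days count as zero, as a small amount, or are left out.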

Summary of Best Practices
- Descriptive column labels with no spaces or symbols
- Stable columns, each containing a single type of data
- Complete lines of data in each row
- Consistent codes for weather, tide state, etc.
- Missing values left blank or filled with special codes
- Data flags to qualify or describe data when needed
- All the data in a single data table

Transforming Data
- Because the best way to enter or view data is often not the best way to archive them, we need to do transformations.
- Example: you have a set of permanent plots where you are tallying the ground cover of plants.
  - You have set up a spreadsheet with a row for each date/plot, with columns for the cover of each species.

Good
- Columns each with one kind of data
- Good column labels
Problems
- What happens if you encounter a new species in a plot?
- The data format/structure needs to change by adding a new column, something we’d like to avoid!

An Alternative Way
- Uses a “species” column to indicate the species.
- Adding a new species is just a question of adding a new data code, rather than a new column (keeps data structures the same).
- Optionally, you also don’t need data lines for species for which cover was zero.

Transformations
Fortunately, these two formats are relatively easy to interconvert using Pivot Tables in Excel, or using external software (e.g., SPSS vars-to-cases).
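The same conversion can also be scripted. The sketch below is one possible approach in plain Python, not the Pivot Table or SPSS procedure itself. The plot data, species names, and both helper functions are hypothetical. It goes from one column per species to one row per species, drops zero-cover rows, and then converts back.

```python
# Hypothetical "wide" table: one row per date/plot, one column per species
wide = [
    {"Date": "2005-05-23", "Plot": "P1", "Spartina": 40, "Salicornia": 0},
    {"Date": "2005-05-23", "Plot": "P2", "Spartina": 10, "Salicornia": 25},
]
ID_COLUMNS = ("Date", "Plot")

def wide_to_long(rows, id_columns, drop_zero=True):
    """One output row per date/plot/species; zero-cover rows are optional."""
    long_rows = []
    for row in rows:
        ids = {c: row[c] for c in id_columns}
        for name, cover in row.items():
            if name in id_columns:
                continue
            if drop_zero and cover == 0:
                continue
            long_rows.append({**ids, "Species": name, "Cover": cover})
    return long_rows

def long_to_wide(rows, id_columns):
    """Rebuild one column per species, filling absent species with 0."""
    species = sorted({r["Species"] for r in rows})
    table = {}
    for r in rows:
        key = tuple(r[c] for c in id_columns)
        if key not in table:
            table[key] = {**dict(zip(id_columns, key)), **{s: 0 for s in species}}
        table[key][r["Species"]] = r["Cover"]
    return list(table.values())

long_table = wide_to_long(wide, ID_COLUMNS)
round_trip = long_to_wide(long_table, ID_COLUMNS)
print(len(long_table), round_trip == wide)
```

One caveat with dropping zero rows: a plot in which every species had zero cover would disappear from the long table, so keep the zeros if such plots can occur.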

Other Tools
- There are other tools, such as Database Management Systems (DBMS) and statistical software (e.g., SAS, SPSS), that can also be used for managing data.
  - Unlike spreadsheets, this software enforces many of the best practices that are the responsibility of the user in spreadsheets.
  - They have additional tools that help with quality control and quality assurance.
  - They are more trouble to set up initially than a spreadsheet, but in the long run can save time and trouble.

Getting Help
- Feel free to contact me to discuss your data and how it might best be formatted for archiving.
- Once the data are ready to be archived, the next step is to prepare metadata, but that is another topic!