
General good advice on data handling Peter Shaw

Introduction
- We have spent the last 11 weeks picking up some technical details about various aspects of data handling and analysis.
- This week I do not intend to introduce any new names or techniques (unless you specifically ask), just to round off with a few unifying thoughts and snippets of good advice.

Project design
- Get it right before you start!
- It is not hard to get a balanced design, though you may well have to make some sacrifices about the number of treatments / sites / replicates etc.
- Check your design with staff - that’s what they’re paid for!
- It can be impossible to fix a bad design: Rothamsted once had to throw away 50 years of meticulously collected data because the faulty experimental design made the data useless.
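One way to check balance before any fieldwork is to count the planned replicates in each treatment x site cell. This is only a minimal sketch: the treatment names, site names and replicate numbers are invented for illustration, and "balanced" is taken here to mean that every cell holds the same non-zero number of replicates.

```python
from collections import Counter
from itertools import product


def replicate_counts(design, treatments, sites):
    """Count replicates in every treatment x site cell, including empty cells."""
    counts = Counter(design)
    return {cell: counts.get(cell, 0) for cell in product(treatments, sites)}


def is_balanced(design, treatments, sites):
    """True if every cell holds the same, non-zero number of replicates."""
    values = set(replicate_counts(design, treatments, sites).values())
    return len(values) == 1 and 0 not in values


# Hypothetical plan: 2 treatments x 2 sites, 3 replicates in each cell.
treatments = ["control", "fertilised"]
sites = ["A", "B"]
balanced_plan = [(t, s) for t in treatments for s in sites for _ in range(3)]

# The same plan after one plot is lost.
unbalanced_plan = balanced_plan[:-1]

print(is_balanced(balanced_plan, treatments, sites))    # True
print(is_balanced(unbalanced_plan, treatments, sites))  # False
```

Running the check on the plan, rather than on the collected data, is the point: the gap shows up while it can still be fixed.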

Data Collection
- Keep a notebook, and write things down as you go along (dating each entry).
- This is best done on the spot - by the time you get home you will have forgotten some important details.
- Often you have to fall back on Operational Taxonomic Units (OTUs: Sp1, Sp2, small pink thin species, etc). Fine - this is more honest than trying to shoehorn an unfamiliar specimen into a known species.
- Make sure that you keep such specimens carefully for ID, and that these IDs are recorded in the relevant lab/field notebook. Take it from me - trying to fathom out how to decode entries like “?blue-brown oddity: 2 specimens” after a year’s absence is playing Russian roulette with your datamatrix!

Once data are written down..
- You need to transcribe them into a PC.
- This procedure is easy to skimp on, as you look forward to the analyses ahead!
- GIGO - Garbage In, Garbage Out! If you allow errors to creep in at this stage, all subsequent analyses will be suspect if not downright invalidated.
- Entering species data into spreadsheets is particularly tedious due to the predominance of zeroes.

Metadata
- These are data about data: information that sets the actual measurements in context.
- Some forms of metadata are essential for analysis and must be held within the datamatrix: date, depth, sample number, time, observer, plot number etc.
- Others are immaterial for the analyses but crucial for write-ups and replicability: details of methods used, site location etc. The notebooks that hold these data are essential documents in archives.
[Figure: layout of the datamatrix as blocks of columns: metadata (site, date, plot etc), raw site data (pH, elevation etc), raw species data, log-transformed data etc. Approximate sizes marked on the slide: 6ish, 4-10ish, 10-100.]
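As a rough sketch of what "held within the datamatrix" can look like in a file, the example below writes the metadata columns first, followed by the raw site and species columns. All column names and values are hypothetical, and CSV is just one convenient format, not one the slides prescribe.

```python
import csv
import io

# Hypothetical column names: metadata first, then the raw measurements.
METADATA_COLUMNS = ["site", "date", "plot", "sample_no", "observer"]
SITE_COLUMNS = ["pH", "elevation_m"]
SPECIES_COLUMNS = ["Sp1", "Sp2"]
FIELDNAMES = METADATA_COLUMNS + SITE_COLUMNS + SPECIES_COLUMNS

# Invented example rows, one per sample.
rows = [
    {"site": "Heath", "date": "2004-05-12", "plot": 1, "sample_no": 1,
     "observer": "PS", "pH": 4.1, "elevation_m": 35, "Sp1": 0, "Sp2": 3},
    {"site": "Heath", "date": "2004-05-12", "plot": 2, "sample_no": 2,
     "observer": "PS", "pH": 4.4, "elevation_m": 36, "Sp1": 2, "Sp2": 0},
]

# Write to an in-memory buffer; a real file opened with newline="" works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerows(rows)

print(buffer.getvalue())
```

Every row carries its own site, date, plot and observer, so any subset of rows can still be traced back to the notebook.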

Debugging and verifying
- Once data are in, go back and check every entry against the notebook.
- I find it helpful to photocopy notebook pages, so I can cross out or highlight entries once validated against the data file.
- Even then, don’t believe the data! Use boxplots to check for outliers. What are your units? Often you need to convert raw data into a derived format (densities per unit area, mg 100 g⁻¹ etc). Don’t change source data but create new variables, and ensure that each variable is unambiguously labelled.
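The advice about derived variables can be sketched as follows: the raw count stays exactly as entered, and the density goes into a new column whose name states its units. The field names and numbers are assumptions made up for the example.

```python
def add_density(records, count_key, area_m2_key, new_key):
    """Return new records with a density column added; source columns are untouched."""
    derived = []
    for record in records:
        row = dict(record)  # copy, so the raw record is never modified
        row[new_key] = row[count_key] / row[area_m2_key]
        derived.append(row)
    return derived


# Hypothetical raw data: counts of one species in 0.25 m2 quadrats.
raw = [
    {"plot": 1, "Sp1_count": 12, "quadrat_area_m2": 0.25},
    {"plot": 2, "Sp1_count": 3, "quadrat_area_m2": 0.25},
]

with_density = add_density(raw, "Sp1_count", "quadrat_area_m2", "Sp1_density_per_m2")

print(with_density[0]["Sp1_density_per_m2"])  # 48.0
print(with_density[0]["Sp1_count"])           # 12, the raw count is still there
print("Sp1_density_per_m2" in raw[0])         # False, the source data are unchanged
```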

Outliers 1
- These are datapoints which “clearly” lie outside the range of the rest of the dataset, and show up on boxplots or scattergraphs.
- Always eyeball the data, and check outliers. Usually they result from a typing mistake and are easily remedied.
- Sometimes they are clearly an error in the notebook - how you sort this out depends on your judgement, experience and intuition. If in doubt ask!
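A boxplot conventionally draws as separate points any values lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule numerically so that suspect entries can be listed and checked against the notebook. The 1.5 multiplier is the usual convention rather than anything the slides specify, the pH values are invented, and quartile definitions differ slightly between packages, so borderline points may be flagged differently elsewhere.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return (index, value) pairs lying beyond k * IQR from the quartiles."""
    q1, _median, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical pH readings; 42.0 looks like 4.2 typed with the decimal point misplaced.
ph = [4.1, 4.3, 4.0, 4.4, 4.2, 42.0, 4.5, 4.3]

print(flag_outliers(ph))  # [(5, 42.0)]
```

The function only flags entries. Whether a flagged value is a typing error, a notebook error or a genuine observation is still a matter for the notebook and your judgement.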

Outliers 2
- Then you get the awkward sort! The notebook is adamant and the entry looks plausible, but the datapoint looks odd. Now what?
- It is legitimate to exclude such points from further analysis, although you should record this fact in your methods section. Be careful, as you may be removing the most interesting observation!

Multivariate techniques..
- Are especially sensitive to outliers: watch as one data point has its decimal point entered one place out:
[Figure not reproduced in this transcript.]
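A rough numerical illustration of this sensitivity, using a plain Pearson correlation as a stand-in, since correlations and covariances are the raw material of many multivariate methods (principal components analysis, for example). The data are invented, and this is not the example shown on the original slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: y rises almost linearly with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_clean = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2]

# The same data with one entry (8.0) typed as 80.0.
y_typo = list(y_clean)
y_typo[3] = 80.0

r_clean = pearson_r(x, y_clean)
r_typo = pearson_r(x, y_typo)

print(round(r_clean, 3))  # very close to 1
print(round(r_typo, 3))   # far weaker, from a single mistyped value
```

One slipped decimal point in eight observations is enough to make a near-perfect relationship look like almost none at all.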

Missing data
- These are sadly common. You knocked the tube on the floor, you lost the sample…
- Don’t put a zero (-1, etc) there! This is tantamount to saying that you actually measured this value.
- SPSS has a specific solution to missing data - enter a “.” (full stop, decimal point etc). That data point will be excluded from analyses.
- Check the options in each technique used to see how missing values are handled. They cause insurmountable difficulties for many analyses, and either the variable or the observation will have to be excluded.
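To see why a zero is not a harmless placeholder, compare a mean computed with the missing value excluded against one where a zero was typed in its place. This is a minimal sketch with invented numbers. Python's `None` is used as the missing-value marker here, playing the role that “.” plays in SPSS.

```python
from statistics import mean


def mean_ignoring_missing(values):
    """Mean of the recorded values only; None marks a missing observation."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no recorded values to average")
    return mean(present)


# Hypothetical measurements; the third tube was knocked on the floor.
recorded = [5.0, 7.0, None, 6.0]

# The same data with a zero typed in for the lost sample.
zero_filled = [5.0, 7.0, 0.0, 6.0]

print(mean_ignoring_missing(recorded))  # 6.0
print(mean(zero_filled))                # 4.5
```

The zero drags the mean down by a quarter, and nothing in the data file would reveal that the value was never measured.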