Managing and Curating Data Chapter 8



Introduction
– Data organization
– Data management
– Data curation
Raw data are required to repeat a scientific study.
Any data collected with public funds must, by law, be made available to other scientists and to the public.

Step 1: Managing Raw Data
Raw data come from various sources:
– Data loggers
– Handwritten notes
These data must be transferred to an organized format, then checked and analyzed.
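
As a minimal sketch of transferring logger output into an organized format, the Python below assumes a hypothetical logger that writes one `timestamp, temperature` reading per line, with `#` comment lines; real loggers use their own formats. Lines that cannot be parsed are reported for checking rather than silently dropped.

```python
from datetime import datetime

def parse_logger_lines(lines):
    """Turn raw logger lines into records; collect lines that fail to parse."""
    records, problems = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and logger header comments
        try:
            stamp, value = (field.strip() for field in line.split(","))
            records.append({
                "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M"),
                "temp_c": float(value),
            })
        except ValueError as err:
            # Keep the raw line so it can be checked against the logger file.
            problems.append((lineno, line, str(err)))
    return records, problems

# Hypothetical logger output (invented values).
raw = [
    "# logger 07, air temperature (C)",
    "2024-05-01 06:00, 12.5",
    "2024-05-01 07:00, 1e",      # corrupted reading
    "2024-05-01 08:00, 14.0",
]
records, problems = parse_logger_lines(raw)
print(len(records), problems[0][0])  # 2 3
```

Each record becomes one row of the organized data set; the `problems` list is the starting point for checking.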

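Spreadsheets
– Row: a single observation
– Column: a single measured or observed variable
Enter data as soon as possible:
– To detect mistakes while they can still be corrected
– Because memory doesn't last long
– To create a second copy of the data
– To allow timely analysis
Proofread the data: check the entries against the original records.
[Example spreadsheet "Garden Yield" with the labels Number, Biomass, Carrots, Peppers, and Broccoli; the cell values were not preserved.]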
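
To make the row/column layout and proofreading concrete, here is a sketch in which each CSV row is one observation and each column one variable. The proofreading step uses double data entry (typing the same field sheet twice and comparing the copies cell by cell), which is one common way to proofread and is an assumption here rather than the chapter's prescribed method; the column names and values are made up.

```python
import csv
import io

# Hypothetical column names: one column per measured variable.
FIELDS = ["plant_id", "vegetable", "biomass_g"]

def read_table(text):
    """Read CSV text into a list of row dicts (one dict per observation)."""
    return list(csv.DictReader(io.StringIO(text)))

def compare_entries(first, second, fields):
    """Return (row_number, column, value_1, value_2) for every cell that differs."""
    if len(first) != len(second):
        raise ValueError("the two copies have different numbers of rows")
    mismatches = []
    for row_number, (a, b) in enumerate(zip(first, second), start=1):
        for column in fields:
            if a[column] != b[column]:
                mismatches.append((row_number, column, a[column], b[column]))
    return mismatches

# Two copies of the same (invented) field sheet, typed in separately.
copy_1 = """plant_id,vegetable,biomass_g
1,carrot,12.4
2,pepper,30.1
3,broccoli,45.0
"""
copy_2 = """plant_id,vegetable,biomass_g
1,carrot,124
2,pepper,30.1
3,broccoli,45.0
"""
diffs = compare_entries(read_table(copy_1), read_table(copy_2), FIELDS)
print(diffs)  # [(1, 'biomass_g', '12.4', '124')]
```

Every mismatch is then resolved by going back to the original field sheet; here it reveals a dropped decimal point.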

Metadata: data about data
"Must have" metadata:
– Name and contact information of the data collector
– Location of data collection
– Name of the study
– Source of funding
– Description of the organization of the data file
– Methods used to collect the data
– Types of experimental units
– Description of abbreviations
– Explicit description of the data in the columns and rows
In some cases, metadata may be created before the data are collected.
Assembling metadata is very important, because these details are easily forgotten.
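
One way to keep the "must have" items together is a small machine-readable metadata record stored alongside the data file. The field names below are hypothetical labels for the items in the list, the values are invented, and the check simply reports which required items are missing or empty.

```python
import json

# Hypothetical field names, one per "must have" item.
REQUIRED_METADATA = [
    "collector_name", "collector_contact", "location", "study_name",
    "funding_source", "file_organization", "methods",
    "experimental_units", "abbreviations", "column_descriptions",
]

def missing_metadata(metadata):
    """Return required fields that are absent or empty."""
    return [key for key in REQUIRED_METADATA if not metadata.get(key)]

# Invented example; the funding source has not been filled in yet.
metadata = {
    "collector_name": "A. Researcher",
    "collector_contact": "a.researcher@example.org",
    "location": "Community garden, plot 4",
    "study_name": "Garden yield survey",
    "funding_source": "",
    "file_organization": "One row per plant, one column per variable",
    "methods": "Plants harvested and weighed",
    "experimental_units": "Individual plants",
    "abbreviations": {"NA": "not available"},
    "column_descriptions": {"biomass_g": "Plant biomass in grams"},
}

print(missing_metadata(metadata))  # ['funding_source']

# Save the metadata as JSON next to the data file it describes.
metadata_text = json.dumps(metadata, indent=2)
```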

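Step 3: Checking the Data
Outliers: measurements or observations that fall outside the range of the bulk of the data, for example values above the upper decile (90th percentile) or below the lower decile (10th percentile).
Outliers increase the variance of the data. A larger variance lowers statistical power, so outliers increase the chance of a Type II error (failing to reject a false null hypothesis).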
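
The decile rule can be sketched with Python's `statistics.quantiles`, which with `n=10` returns nine cut points (the 10th through 90th percentiles). The biomass values are invented, with one suspiciously large value, and the last lines show that value inflating the sample variance.

```python
import statistics

def decile_outliers(values):
    """Return values below the 10th percentile or above the 90th percentile."""
    cuts = statistics.quantiles(values, n=10)  # 9 cut points: 10th ... 90th
    low, high = cuts[0], cuts[-1]
    return [v for v in values if v < low or v > high]

# Invented biomass values (g); 118 looks suspicious.
biomass = [15, 12, 17, 118, 14, 16, 20, 15, 19, 18]
print(decile_outliers(biomass))  # [12, 118]

# The single large value inflates the sample variance.
without_118 = [v for v in biomass if v != 118]
print(statistics.variance(biomass), statistics.variance(without_118))
```

By construction, the decile rule flags the most extreme values of any sample (here the low value 12 as well as 118), so a flagged value is a prompt to look more closely, not proof of an error.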

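How to Deal with Outliers
Do not delete them; discarding inconvenient data could be considered fraud.
Delete a value only if it is an error or the data are no longer valid.
Think about them:
– They may suggest interesting hypotheses
– A large body of science is devoted to outliers
– What type of distribution do your data have? A value that looks extreme under one distribution may be unremarkable under another, such as a skewed one.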
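
The question about distribution can be made concrete: whether a value looks extreme depends on the scale and distribution assumed. The sketch below, using invented right-skewed values, compares the z-score of the largest value on the raw scale and after a log transformation; the log transformation is chosen here only for illustration.

```python
import math
import statistics

def z_score(values, target):
    """How many sample standard deviations `target` lies from the mean."""
    return (target - statistics.mean(values)) / statistics.stdev(values)

# Invented, right-skewed biomass values (g).
biomass = [2, 3, 4, 5, 8, 12, 20, 35, 60]
log_biomass = [math.log(v) for v in biomass]

z_raw = z_score(biomass, 60)
z_log = z_score(log_biomass, math.log(60))
print(f"raw scale z = {z_raw:.2f}, log scale z = {z_log:.2f}")
```

On the raw scale the largest value lies more than two standard deviations above the mean; on the log scale it is less extreme, which points toward a skewed distribution rather than a bad measurement.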

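Errors and Missing Data
Errors are often outliers and can be identified.
Sources of error:
– Mistyping (e.g., misplaced decimal points)
– Instrument errors
– Field-entry mistakes
Checking the data can reduce errors.
Never leave blank cells in a spreadsheet: enter NA (not available) for a missing value, and enter zero only when the measured value really is zero.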
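
A minimal sketch of reading a column that follows the missing-data rule: `NA` is read as a recorded missing value (`None`), zero stays a real zero, and blank cells are reported because they are ambiguous. The CSV layout and column name are assumptions for illustration.

```python
import csv
import io

MISSING_CODE = "NA"  # code for "not available"

def load_column(text, column):
    """Parse one numeric column; return (values, blank_rows).

    "NA" becomes None (a recorded missing value). A blank cell is also
    stored as None but reported, because it is ambiguous: not measured,
    or forgotten?
    """
    values, blank_rows = [], []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        cell = row[column].strip()
        if cell == "":
            blank_rows.append(row_number)
            values.append(None)
        elif cell == MISSING_CODE:
            values.append(None)
        else:
            values.append(float(cell))
    return values, blank_rows

# Invented sheet: row 2 is properly coded NA, row 3 was left blank.
sheet = """plant_id,biomass_g
1,12.4
2,NA
3,
4,0
"""
values, blank_rows = load_column(sheet, "biomass_g")
print(values)      # [12.4, None, None, 0.0]
print(blank_rows)  # [3]
```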

Detecting Outliers and Errors
Three techniques:
– Calculating column statistics
– Checking the ranges and precision of column values
– Graphical exploratory data analysis

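Detecting Outliers and Errors (cont.)
Column statistics:
– Mean, median, standard deviation, variance
Use logical functions to check your columns.
Range-check your data (minimum and maximum of each column).
[Example table: column statistics (Mean, Median, St Dev, Variance, Min, Max) for a carrot data set with columns Carrot Id #, Length, and Biomass; the values were not preserved intact.]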
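
Column statistics and a logical range check can be sketched with Python's standard `statistics` module. The biomass values and the plausible range of 0 to 50 g are invented; in practice the limits come from what is biologically or instrumentally possible.

```python
import statistics

def column_summary(values):
    """Mean, median, standard deviation, variance, min and max of one column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "st_dev": statistics.stdev(values),       # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
        "min": min(values),
        "max": max(values),
    }

def out_of_range(values, low, high):
    """Logical check: indices of values outside the plausible range [low, high]."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]

# Invented carrot biomass values (g); 118.0 may be a mistyped 11.8.
biomass = [12.0, 9.5, 14.2, 118.0, 10.1, 11.3]
summary = column_summary(biomass)
print(summary["mean"], summary["median"], summary["max"])
print(out_of_range(biomass, low=0, high=50))  # [3]
```

A mean far above the median, together with a maximum far above the rest, is the kind of pattern column statistics expose.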

Graphical Exploratory Data Analysis
– Box plots (univariate)
– Stem-and-leaf plots (univariate)
– Scatterplots (bivariate or multivariate)
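
The numbers behind a box plot can be computed directly. This sketch uses quartiles and the common Tukey convention of drawing whiskers to the most extreme values within 1.5 × IQR of the box and plotting anything beyond as a separate point; the slide does not specify a convention, so treat this as one reasonable choice. The data are invented.

```python
import statistics

def box_plot_stats(values):
    """Quartiles, whisker ends, and outlying points for a Tukey-style box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Invented carrot biomass values (g).
biomass = [12.0, 9.5, 14.2, 118.0, 10.1, 11.3]
stats = box_plot_stats(biomass)
print(stats["outliers"])  # [118.0]
```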

Stem-and-leaf plots
Example: vegetable biomass values 7, 15, 35, 36, 37, 23, 27, 21, 42

Stem (tens) | Leaf (units)
0 | 7
1 | 5
2 | 1 3 7
3 | 5 6 7
4 | 2
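
A stem-and-leaf display like this one can be generated with a few lines of Python, assuming non-negative integer values with tens as stems and units as leaves.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems visible
        lines.append(f"{stem} | " + " ".join(str(leaf) for leaf in leaves[stem]))
    return lines

biomass = [7, 15, 35, 36, 37, 23, 27, 21, 42]
for line in stem_and_leaf(biomass):
    print(line)
```

Keeping empty stems in the output preserves gaps in the data, which is part of what makes the display useful for spotting isolated values.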

Scatter plots
Use scatter plots to see how traits relate to one another.
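
Plotting libraries are the usual tool for scatter plots; to keep the sketch dependency-free, the hypothetical function below draws a rough text scatter plot of one trait against another. The carrot length and biomass values are invented.

```python
def text_scatter(xs, ys, width=20, height=8):
    """Return the lines of a rough text scatter plot of ys against xs (top line first)."""
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1  # avoid dividing by zero if all x are equal
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # larger y values go nearer the top
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]

# Invented carrot measurements: length (cm) and biomass (g).
length = [10, 12, 15, 18, 20, 24, 26]
biomass = [5, 7, 9, 12, 14, 17, 19]
for line in text_scatter(length, biomass):
    print(line)
```

Points that land in the same character cell are drawn once, so this is only a quick look; a real plotting tool is better for anything beyond a first check.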

Creating an Audit Trail
Examining data for outliers and errors is a form of quality assurance/quality control (QA/QC) for research.
Document how you performed QA/QC in your metadata.
Your audit trail allows others to reanalyze your data and recreate your results.
An audit trail may also be required as legal documentation.
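
An audit trail can be as simple as an append-only log of every change, recording what was changed, the old and new values, and why. The class and function below are hypothetical helpers, not part of any library; the example logs the correction of a mistyped decimal point, which keeps the original value on record rather than erasing it.

```python
import json
from datetime import datetime, timezone

class AuditTrail:
    """Append-only record of QA/QC actions on a data set (hypothetical helper)."""

    def __init__(self):
        self.entries = []

    def record(self, row, column, old, new, reason):
        self.entries.append({
            "when": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "row": row, "column": column,
            "old": old, "new": new, "reason": reason,
        })

    def to_json_lines(self):
        """One JSON object per line, suitable for storing with the metadata."""
        return "\n".join(json.dumps(entry) for entry in self.entries)

def apply_correction(rows, trail, row, column, new, reason):
    """Change one cell and log what changed and why; the old value stays in the log."""
    old = rows[row][column]
    rows[row][column] = new
    trail.record(row, column, old, new, reason)

# Invented data: the field sheet shows 11.8, but 118.0 was typed in.
rows = [{"plant_id": 4, "biomass_g": 118.0}]
trail = AuditTrail()
apply_correction(rows, trail, 0, "biomass_g", 11.8,
                 "decimal point mistyped; field sheet reads 11.8")
print(trail.to_json_lines())
```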