Experimental design and analysis Graphical Exploration of Data  Gerry Quinn & Mick Keough, 1998 Do not copy or distribute without permission of authors.



Graphical displays
Exploration
– assumptions (normality, equal variances)
– unusual values
– which analysis?
Analysis
– model fitting
Presentation/communication of results

Space shuttle data
NASA meeting Jan 27th 1986
– day before launch of shuttle Challenger
Concern about low air temperatures at launch
Low temperatures affect the O-rings that seal joints of rocket motors
Previous data studied

[Figure: O-ring failure vs temperature, pre-1986. x-axis: joint temperature (°F); y-axis: number of incidents]

[Figure: O-ring failure vs temperature. x-axis: joint temperature (°F); y-axis: number of incidents]

Checking assumptions - exploratory data analysis (EDA)
Shape of sample (and therefore population)
– is distribution normal (symmetrical) or skewed?
Spread of sample
– are variances similar in different groups?
Are outliers present
– observations very different from the rest of the sample?

Distributions of biological data
Bell-shaped symmetrical distribution: normal
Skewed asymmetrical distribution: log-normal, Poisson
[Figure: two probability curves, Pr(y) against y, one symmetrical and one skewed]

Common skewed distributions
Log-normal distribution: μ proportional to σ
measurement data, e.g. length, weight etc.
Poisson distribution: μ = σ²
count data, e.g. numbers of individuals
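A minimal sketch of the two mean-variance relationships, using closed-form moments rather than survey data: for a log-normal variable the standard deviation is a fixed multiple of the mean, and for a Poisson variable the variance equals the mean. The function names, the parameter values and the truncation point of the Poisson sum are illustrative assumptions, not part of the original slides.

```python
import math

def lognormal_mean_sd(mu_log, sigma_log):
    """Mean and standard deviation of a log-normal variable whose
    logarithm has mean mu_log and standard deviation sigma_log."""
    mean = math.exp(mu_log + sigma_log ** 2 / 2)
    sd = mean * math.sqrt(math.exp(sigma_log ** 2) - 1)
    return mean, sd

def poisson_mean_var(lam, upper=200):
    """Mean and variance of a Poisson(lam) variable, summed from the
    probability mass function over 0..upper (the cut-off is an assumption)."""
    p = math.exp(-lam)            # Pr(Y = 0)
    mean = 0.0
    second_moment = 0.0
    for k in range(upper + 1):
        mean += k * p
        second_moment += k * k * p
        p *= lam / (k + 1)        # Pr(Y = k + 1) from Pr(Y = k)
    return mean, second_moment - mean ** 2

# Hypothetical parameter values, chosen only to show the pattern.
for mu_log in (0.0, 1.0, 2.0):
    mean, sd = lognormal_mean_sd(mu_log, 0.5)
    print(f"log-normal: mean={mean:.2f}  sd={sd:.2f}  sd/mean={sd / mean:.3f}")

for lam in (0.5, 2.0, 8.0):
    mean, var = poisson_mean_var(lam)
    print(f"Poisson:    mean={mean:.2f}  variance={var:.2f}")
```

The ratio sd/mean stays constant as the log-normal mean grows, which is the pattern a log transformation is meant to remove.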

Exploring sample data

Example data set
Quinn & Keough (in press)
Surveys of 8 rocky shores along Point Nepean coast
10 sampling times ( )
15 quadrats (0.25 m²) at each site
Numbers of all gastropod species and % cover of macroalgae recorded from each quadrat

Frequency distributions
Observations grouped into classes
[Figure: two panels, NORMAL and LOG-NORMAL. x-axis: value of variable (class); y-axis: number of observations]
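Grouping observations into classes can be sketched as follows. The counts are hypothetical (they are not the Point Nepean data), and the function name and class width are assumptions made for illustration.

```python
from collections import Counter

def frequency_distribution(values, class_width):
    """Group observations into equal-width classes.
    Returns (class_lower_bound, count) pairs, including empty classes."""
    counts = Counter((v // class_width) * class_width for v in values)
    lowest, highest = min(counts), max(counts)
    n_classes = int((highest - lowest) // class_width) + 1
    return [(lowest + i * class_width, counts.get(lowest + i * class_width, 0))
            for i in range(n_classes)]

# Hypothetical counts per quadrat, right-skewed like many count data.
counts_per_quadrat = [0, 1, 1, 2, 3, 3, 4, 7, 8, 12, 15, 21, 0, 2, 5]

# Print a text histogram: one '#' per observation in each class.
for lower, count in frequency_distribution(counts_per_quadrat, 5):
    print(f"{lower:>3}-{lower + 4:<3} {'#' * count}")
```

A long tail of sparsely filled classes on the right is the signature of a skewed sample.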

[Figure: frequency distribution, survey 5, all shores combined. Total no. quadrats = 120. x-axis: number of Cellana per quadrat; y-axis: frequency]

Dotplots
Each observation represented by a dot
[Figure: dotplot of number of Cellana per quadrat, Cheviot Beach survey 5. No. quadrats = 15]

Boxplot
[Figure: annotated boxplot of VARIABLE by GROUP. Labels: median, hinge, spread, largest value, smallest value, outlier (*). Each of the four sections holds 25% of values]
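The quantities labelled on a boxplot can be computed directly. This is a sketch under stated assumptions: hinges are taken as the medians of the lower and upper halves of the sorted sample, "spread" is read as the distance between the hinges, and outliers are flagged with the common 1.5 × spread rule. The slide does not state which rule its software used, and the data are hypothetical.

```python
import statistics

def boxplot_summary(values, fence_factor=1.5):
    """Median, hinges, hinge spread and flagged outliers for one group."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2                     # each half includes the median when n is odd
    lower_hinge = statistics.median(data[:half])
    upper_hinge = statistics.median(data[n - half:])
    spread = upper_hinge - lower_hinge
    low_fence = lower_hinge - fence_factor * spread
    high_fence = upper_hinge + fence_factor * spread
    return {
        "smallest": data[0],
        "lower_hinge": lower_hinge,
        "median": statistics.median(data),
        "upper_hinge": upper_hinge,
        "largest": data[-1],
        "spread": spread,
        # Assumed rule: anything beyond the fences is drawn as an outlier (*).
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical sample with one extreme value.
print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30]))
```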

[Figure: four boxplot panels. 1. IDEAL; 2. SKEWED; 3. OUTLIERS; 4. UNEQUAL VARIANCES]

[Figure: boxplots of Cellana numbers in survey 5. x-axis: site; y-axis: number of Cellana per quadrat]

Scatterplots
Plotting bivariate data
Value of two variables recorded for each observation
Each variable plotted on one axis (x or y)
Symbols represent each observation
Assess relationship between two variables

[Figure: scatterplot for Cheviot Beach survey 5. Axes: % cover of Hormosira per quadrat and number of Cellana per quadrat]

Scatterplot matrix
Abbreviated to SPLOM
Extension of scatterplot
For plotting relationships between 3 or more variables on one plot
Bivariate plots in multiple panels on SPLOM

SPLOM for Cheviot Beach survey 5
CELLANA - numbers of Cellana
SIPHALL - numbers of Siphonaria
HORMOS - % cover of Hormosira
n = 15 quadrats
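The panel layout of a SPLOM can be sketched as a grid in which each off-diagonal cell pairs one variable on the y-axis with another on the x-axis. The variable names come from the slide; the function name and the convention that rows give the y-variable are assumptions.

```python
from itertools import permutations

variables = ["CELLANA", "SIPHALL", "HORMOS"]

def splom_panels(names):
    """Map each off-diagonal grid cell (row, col) to (y_variable, x_variable)."""
    return {(row, col): (names[row], names[col])
            for row, col in permutations(range(len(names)), 2)}

panels = splom_panels(variables)
for (row, col), (y_var, x_var) in sorted(panels.items()):
    print(f"panel ({row}, {col}): {y_var} against {x_var}")
```

With k variables there are k × (k − 1) bivariate panels, so three variables give six, each pair appearing once with the axes swapped.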

Transformations
Improve normality.
Remove relationship between mean and variance.
Make variances more similar in different populations.
Reduce influence of outliers.
Make relationships between variables more linear (regression analysis).

Log transformation
Lognormal → Normal
y' = log(y)
Measurement data

Power transformation
Poisson → Normal
y' = √(y), i.e. y' = y^0.5, y' = y^0.25
Count data

Arcsin √ transformation
Square → Normal
y' = sin⁻¹(√(y))
Proportions and percentages
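The log, power and arcsin square-root transformations can be written as three small functions. This is a sketch, not the authors' code: the function names are hypothetical, base 10 is assumed for the logarithm (the slide does not give a base), and the arcsin function assumes proportions between 0 and 1, so percentages would first be divided by 100.

```python
import math

def log_transform(y):
    """Log transformation for measurement data; requires y > 0.
    Base 10 is an assumption."""
    return math.log10(y)

def power_transform(y, power=0.5):
    """Root transformation for count data: 0.5 is the square root,
    0.25 the fourth root."""
    return y ** power

def arcsin_sqrt_transform(p):
    """Arcsin square-root transformation for a proportion in [0, 1];
    the result is in radians."""
    return math.asin(math.sqrt(p))

# Hypothetical values, one per data type.
print(log_transform(100.0))          # a measurement
print(power_transform(16))           # a count, square root
print(power_transform(16, 0.25))     # a count, fourth root
print(arcsin_sqrt_transform(0.5))    # a proportion (50% cover)
```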

Outliers
Observations very different from rest of sample - identified in boxplots.
Check if mistakes (e.g. typos, broken measuring device) - if so, omit.
Extreme values in skewed distribution - transform.
Alternatively, do analysis twice - outliers in and outliers excluded.
Worry if influential.
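Running an analysis twice, with outliers in and with outliers excluded, might look like the sketch below. The "analysis" here is only a mean and a median, and both the data and the function name are hypothetical; a large shift between the two runs suggests the outlier is influential.

```python
import statistics

def analyse_twice(values, suspects):
    """Summarise a sample with the suspect observations in and excluded."""
    kept = list(values)
    for suspect in suspects:
        kept.remove(suspect)      # drop one occurrence of each suspect value
    return {
        "outliers in": (statistics.mean(values), statistics.median(values)),
        "outliers excluded": (statistics.mean(kept), statistics.median(kept)),
    }

# Hypothetical counts; 30 is the observation flagged in a boxplot.
sample = [2, 3, 3, 4, 5, 5, 6, 7, 30]
for label, (mean, median) in analyse_twice(sample, [30]).items():
    print(f"{label}: mean={mean:.2f}  median={median}")
```

In this example the mean moves much more than the median when the value is dropped, which is the kind of influence the slide says to worry about.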

Assumptions not met?
Check and deal with outliers
Transformation
– might fix non-normality and unequal variances
Nonparametric rank test
– does not assume normality
– does assume similar variances
– Mann-Whitney-Wilcoxon
– only suitable for simple analyses
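A rank test replaces the observations with their ranks in the pooled sample. The sketch below computes only the Mann-Whitney U statistics, with tied values sharing their average rank; it does not compute a p-value, and the function names and data are assumptions for illustration.

```python
def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1      # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) computed from the rank sum of sample_a."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = midranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a

# Hypothetical counts from two sites.
site_1 = [1, 2, 2, 4, 6]
site_2 = [3, 5, 7, 8, 9]
print(mann_whitney_u(site_1, site_2))
```

Because only the ranks enter the calculation, a single extreme count has no more weight than any other largest value, which is why the test does not need normality.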

Category or line plot
[Figure: mean number of Cellana per quadrat plotted against survey, for Cheviot Beach and Sorrento]