Sihua Peng, PhD Shanghai Ocean University

Modern Biostatistics
4. Sampling and experimental design with R
Sihua Peng, PhD, Shanghai Ocean University, 2017.10

Contents
- Introduction to R
- Data sets
- Introductory Statistical Principles
- Sampling and experimental design with R
- Graphical data presentation
- Simple hypothesis testing
- Introduction to Linear models
- Correlation and simple linear regression
- Single factor classification (ANOVA)
- Nested ANOVA
- Factorial ANOVA
- Simple Frequency Analysis

4 Sampling and experimental design with R
A fundamental assumption of nearly all statistical procedures is that samples are collected randomly from populations. For a sample to truly represent a population, it must be collected without bias. R has a rich array of randomization tools to help researchers randomize their sampling and experimental designs.

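4.1 Random sampling
Biological surveys involve the collection of observations from naturally existing populations. Ideally, every possible observation should have an equal likelihood of being selected as part of the sample. The sample() function draws random samples; for example, to draw five numbers from 1 to 37:
> sample(1:37, 5, replace=FALSE)
[1]  2 16 28 30 20
With replace=TRUE, sampling is with replacement (each selected item is put back and can be drawn again); with replace=FALSE, sampling is without replacement (each item can be drawn at most once). Because the draw is random, the output differs from run to run.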
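Because every call to sample() gives a different result, it can help to fix the random number generator's seed with set.seed() so that a randomized design can be regenerated exactly later. A minimal sketch; the seed value 2017 is arbitrary, and the die-roll example is added only to contrast sampling with replacement:

```r
# Fix the seed so the same "random" draw can be regenerated later
set.seed(2017)
draw1 <- sample(1:37, 5, replace = FALSE)  # 5 distinct values from 1..37

set.seed(2017)
draw2 <- sample(1:37, 5, replace = FALSE)  # same seed, therefore same draw

# Sampling with replacement: values may repeat (e.g. 20 rolls of a die)
rolls <- sample(1:6, 20, replace = TRUE)

print(draw1)
print(identical(draw1, draw2))  # TRUE
print(rolls)
```

Recording the seed alongside the design makes the allocation auditable without storing the random draw itself.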

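The same function can draw a random subset of sites, here from the row names (site names) of the MacNally data set:
> MACNALLY <- read.table("macnally.csv", header=TRUE, sep=",")
> sample(row.names(MACNALLY), 5, replace=FALSE)
[1] "Arcadia" "Undera" "Warneet" "Tallarook"
[5] "Donna Buang"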
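To draw whole rows of a data frame rather than only their names, one option is to sample row indices. Since macnally.csv is not reproduced here, the sketch below uses a small hypothetical stand-in data frame: the five site names are taken from the output above, and the ID column is arbitrary.

```r
# Hypothetical stand-in for the MacNally data (site names as row names)
SITES <- data.frame(ID = 1:5,
                    row.names = c("Arcadia", "Undera", "Warneet",
                                  "Tallarook", "Donna Buang"))

# Sample 3 row indices without replacement, then keep those rows
idx <- sample(nrow(SITES), 3, replace = FALSE)
SUBSET <- SITES[idx, , drop = FALSE]  # drop = FALSE keeps a data frame
print(SUBSET)
```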

Selecting random coordinates from a rectangular grid
Suppose we require 10 random quadrat locations within a 100 × 200 m grid. This can be done by using the runif() function to generate two sets of uniformly distributed random coordinates, X between 0 and 100 m and Y between 0 and 200 m:
> data.frame(X=runif(10,0,100), Y=runif(10,0,200))
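If the quadrats must instead correspond to fixed grid cells (so that no two quadrats overlap), one alternative is to enumerate the cells and sample cell indices without replacement. This sketch assumes, purely for illustration, that the 100 × 200 m grid is divided into 10 × 10 m cells, and it returns the lower-left corner of each chosen cell:

```r
cell_size <- 10          # assumed cell width and height (m)
nx <- 100 / cell_size    # 10 cells along X
ny <- 200 / cell_size    # 20 cells along Y

# Enumerate all cells by the coordinates of their lower-left corners
cells <- expand.grid(X = (seq_len(nx) - 1) * cell_size,
                     Y = (seq_len(ny) - 1) * cell_size)

# Choose 10 distinct cells at random
QUADRATS <- cells[sample(nrow(cells), 10, replace = FALSE), ]
print(QUADRATS)
```

Sampling whole cells trades continuous positions for a guarantee that quadrats never overlap.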

Random coordinates of an irregular shape
Consider designing an experiment in which a number of point quadrats (let's say five) are to be established in a State Park. As represented in the figure to the right, the site is not a regular rectangle, so the technique above is not appropriate. Instead, we first generate a matrix of site boundary coordinates (GPS latitude and longitude) and then use functions from the sp package to generate five random coordinates within that boundary.

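> LONG <- c(145.450, 145.456, 145.459, 145.457, 145.451, 145.450)
> LAT <- c(37.525, 37.526, 37.528, 37.529, 37.530, 37.525)
> XY <- cbind(LONG, LAT)
> plot(XY, type='l')
> library(sp)
> XY.poly <- Polygon(XY)
> XY.points <- spsample(XY.poly, n=8, type='random')
> XY.points
The first vector holds longitudes (x coordinates) and the second latitudes (y coordinates); the last vertex repeats the first so that the boundary ring is closed. spsample() treats n as an approximate sample size, so eight points are requested and only the first five are used.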
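For readers without the sp package, the same idea can be sketched in base R with rejection sampling: draw uniform points in the polygon's bounding box and keep only those that fall inside the boundary, judged by a standard ray-casting (even-odd) point-in-polygon test. This is an illustrative alternative, not a description of how spsample() works internally; the functions in_polygon() and random_points_in_polygon() are hypothetical names introduced here.

```r
# Even-odd (ray-casting) test: is point (px, py) inside polygon (vx, vy)?
in_polygon <- function(px, py, vx, vy) {
  inside <- FALSE
  j <- length(vx)
  for (i in seq_along(vx)) {
    # Does edge j -> i straddle the line y = py, crossing it right of px?
    if (((vy[i] > py) != (vy[j] > py)) &&
        (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Rejection sampling: keep bounding-box points that fall inside the polygon
random_points_in_polygon <- function(vx, vy, n) {
  pts <- data.frame(X = numeric(0), Y = numeric(0))
  while (nrow(pts) < n) {
    x <- runif(1, min(vx), max(vx))
    y <- runif(1, min(vy), max(vy))
    if (in_polygon(x, y, vx, vy)) pts <- rbind(pts, data.frame(X = x, Y = y))
  }
  pts
}

# Boundary coordinates from the State Park example
LONG <- c(145.450, 145.456, 145.459, 145.457, 145.451, 145.450)
LAT <- c(37.525, 37.526, 37.528, 37.529, 37.530, 37.525)
PARK.POINTS <- random_points_in_polygon(LONG, LAT, 5)
print(PARK.POINTS)
```

Unlike the approximate n of spsample(), this loop returns exactly n points, at the cost of discarding candidates that fall outside the boundary.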


These points can then be plotted on the map; here only the first five sampled points are drawn:
> points(XY.points[1:5])

Random coordinates along a line
If sampling locations must fall along a line that represents an irregular feature such as a river, or along a very long line, we can first generate a matrix of X,Y coordinates marking the major changes of direction in the line, and then use the spsample() function to generate a set of random coordinates along it.

> X <- c(0.77, 0.5, 0.55, 0.45, 0.4, 0.2, 0.05)
> Y <- c(0.9, 0.9, 0.7, 0.45, 0.2, 0.1, 0.3)
> XY <- cbind(X, Y)
> library(sp)
> XY.line <- Line(XY)
> XY.points <- spsample(XY.line, n=10, type='random')
> plot(XY, type="l")
> points(XY.points)
> coordinates(XY.points)
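To show what uniform random sampling along a line involves, here is a base-R sketch (an illustrative alternative, not spsample()'s internals): a distance is drawn uniformly between 0 and the total length of the line, and the point at that distance is found by linear interpolation within the segment it falls in. The function name random_points_on_line() is hypothetical.

```r
# Draw n points uniformly along a polyline with vertices (x[i], y[i])
random_points_on_line <- function(x, y, n) {
  seg_len <- sqrt(diff(x)^2 + diff(y)^2)      # length of each segment
  cum_len <- c(0, cumsum(seg_len))            # distance along line at each vertex
  d <- runif(n, 0, cum_len[length(cum_len)])  # random distances along the line
  seg <- findInterval(d, cum_len, rightmost.closed = TRUE)  # segment index
  frac <- (d - cum_len[seg]) / seg_len[seg]   # relative position within segment
  data.frame(X = x[seg] + frac * diff(x)[seg],
             Y = y[seg] + frac * diff(y)[seg])
}

# Vertices of the line from the example above
X <- c(0.77, 0.5, 0.55, 0.45, 0.4, 0.2, 0.05)
Y <- c(0.9, 0.9, 0.7, 0.45, 0.2, 0.1, 0.3)
LINE.POINTS <- random_points_on_line(X, Y, 10)
print(LINE.POINTS)
```

Drawing distances rather than segments keeps the density uniform per unit length, so long segments receive proportionally more points. The result can be overlaid on plot(cbind(X, Y), type = "l") with points(LINE.POINTS).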


4.2 Experimental design
Randomization is also important for reducing confounding effects. Experimental design determines the order in which observations are collected and/or the physical layout of the manipulation or survey. Good experimental design aims to reduce the risks of bias and confounding effects.
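Randomizing the order of data collection can be as simple as shuffling the list of units before measuring them, so that gradual drift (in an instrument, an observer or the time of day) is not aligned with any treatment. A sketch with 12 hypothetical sample IDs, three replicates of four treatments:

```r
# Hypothetical sample IDs: treatments A-D, replicates 1-3 (A-1, A-2, ...)
samples <- paste0(rep(c("A", "B", "C", "D"), each = 3), "-",
                  rep(1:3, times = 4))

# Randomize the order in which the samples will be measured
run_order <- data.frame(Run = seq_along(samples),
                        Sample = sample(samples),
                        stringsAsFactors = FALSE)
print(run_order)
```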

4.2.1 Fully randomized treatment allocation
Suppose we design an experiment to investigate the effect of fertilizer on the growth rate of a plant species. We intend to have four fertilizer treatments (A, B, C and D), with six replicate plants per treatment. The plant seedlings are all in individual pots housed in a greenhouse, and to assist with watering we want to place all the seedlings on a large table arranged in a 4 × 6 matrix. To reduce the impact of any potentially confounding effects (such as variations in water, light and temperature), fertilizer treatments should be assigned to seedling positions completely at random.

gl() function
gl() generates factors by specifying the pattern of their levels:
gl(n, k, length = n*k, labels = 1:n)
n: an integer giving the number of levels.
k: an integer giving the number of replications.
length: an integer giving the length of the result.
labels: an optional vector of labels for the resulting factor levels.
> gl(2, 8, labels = c("Control", "Treat"))
 [1] Control Control Control Control Control Control Control Control Treat
[10] Treat   Treat   Treat   Treat   Treat   Treat   Treat
Levels: Control Treat
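The length argument matters when it differs from n*k: the pattern of levels is then recycled. The sketch below shows this with k = 1, and also checks the call used for the fertilizer example (4 levels, 6 replicates each, 24 values):

```r
# k = 1 with length 8: the two levels alternate and the pattern recycles
alternating <- gl(2, 1, 8, labels = c("Control", "Treat"))
print(alternating)

# Four treatments, each repeated six times, 24 values in total
TREATMENTS <- gl(4, 6, 24, c('A', 'B', 'C', 'D'))
print(table(TREATMENTS))  # 6 of each level
```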

Solution
This can be done by first generating a factor (containing the levels A, B, C and D, each repeated six times), then using the sample() function to randomize the order of the treatments, and finally arranging the result in a 4 × 6 matrix:
> TREATMENTS <- gl(4, 6, 24, c('A','B','C','D'))
> matrix(sample(TREATMENTS), nrow=4)
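A randomized layout is easier to use at the greenhouse table if every position is listed with its treatment, and it is worth checking that each treatment still appears six times. A minimal sketch, assuming table rows are numbered 1 to 4 and columns 1 to 6; set.seed() (arbitrary value) is added only so that the layout can be regenerated:

```r
set.seed(1)  # arbitrary seed, so the same layout can be reproduced
TREATMENTS <- gl(4, 6, 24, c('A', 'B', 'C', 'D'))
LAYOUT <- matrix(sample(TREATMENTS), nrow = 4)  # 4 rows x 6 columns

# One line per pot position: row, column and assigned treatment
POSITIONS <- data.frame(Row = as.vector(row(LAYOUT)),
                        Column = as.vector(col(LAYOUT)),
                        Treatment = as.vector(LAYOUT),
                        stringsAsFactors = FALSE)
print(LAYOUT)
print(table(POSITIONS$Treatment))  # each treatment should appear 6 times
```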

4.2.2 Randomized complete block treatment allocation
When the conditions under which an experiment is conducted are expected to be heterogeneous enough to substantially increase the variability in the response variable, experimental units are grouped into blocks. Each level of the treatment factor is then randomly allocated to a single unit within each block.

paste() and replicate() functions
paste() concatenates its arguments into character strings, separated by a space by default (the sep argument changes the separator):
> paste("Hello", "world")
[1] "Hello world"
> paste("A", 1:6, sep = "")
[1] "A1" "A2" "A3" "A4" "A5" "A6"
replicate() evaluates an expression repeatedly. For example, to draw three random numbers from the standard normal distribution and repeat this process five times (one column per repetition):
> replicate(5, rnorm(3))
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,] -0.2098800 -0.2891009 -1.5106925 -0.2941538  0.1072428
[2,]  0.1974659 -1.4352968  1.9620301 -0.5745457 -0.6394548
[3,] -0.3012112  0.5387016 -0.6761314 -0.4704064  0.2069635
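Two related details, shown as a short sketch: base R's paste0() is shorthand for paste(..., sep = ""), and when the replicated expression returns a vector of length m, replicate() binds the results into an m-row matrix with one column per repetition.

```r
# paste0() concatenates with no separator, like paste(..., sep = "")
labels_a <- paste("A", 1:6, sep = "")
labels_b <- paste0("A", 1:6)
print(identical(labels_a, labels_b))  # TRUE

# Each rnorm(3) call returns 3 values, so 5 replicates give a 3 x 5 matrix
draws <- replicate(5, rnorm(3))
print(dim(draws))
```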

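Solution
Each of the six blocks receives every treatment exactly once, in an independently randomized order: replicate() generates one random permutation of A to D per block, and these permutations form the columns of a matrix.
> TREATMENTS <- replicate(6, sample(c('A','B','C','D')))
> colnames(TREATMENTS) <- paste('Block', 1:6, sep='')
> TREATMENTS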
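Before the design is used, it can be worth confirming that every block contains each treatment exactly once, and listing one experimental unit per line as a field sheet. A sketch that rebuilds the block allocation above (with an arbitrary seed for reproducibility); the names complete and FIELD_SHEET are illustrative:

```r
set.seed(2017)  # arbitrary seed, so the allocation can be regenerated
TREATMENTS <- replicate(6, sample(c('A', 'B', 'C', 'D')))
colnames(TREATMENTS) <- paste('Block', 1:6, sep = '')

# TRUE for a block (column) if it holds each of A-D exactly once
complete <- apply(TREATMENTS, 2, function(block) {
  identical(sort(block), c('A', 'B', 'C', 'D'))
})
print(complete)

# Long format: one row per experimental unit within each block
FIELD_SHEET <- data.frame(Block = rep(colnames(TREATMENTS), each = 4),
                          Unit = rep(1:4, times = 6),
                          Treatment = as.vector(TREATMENTS),
                          stringsAsFactors = FALSE)
print(head(FIELD_SHEET))
```

Because as.vector() reads the matrix column by column, the four units of Block1 come first, then Block2, and so on, matching the rep(..., each = 4) block labels.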