Sihua Peng, PhD Shanghai Ocean University


1 Modern Biostatistics 2. Data sets
Sihua Peng, PhD, Shanghai Ocean University, 2018.10

2 Contents
Introduction to R
Data sets
Introductory Statistical Principles
Sampling and experimental design with R
Graphical data presentation
Simple hypothesis testing
Introduction to Linear models
Correlation and simple linear regression
Single factor classification (ANOVA)
Nested ANOVA
Factorial ANOVA
Simple Frequency Analysis

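3 R functions
Each function performs a specific task and is called by its name followed by parentheses, for example:
mean(): average value
sum(): summation
plot(): plotting
sort(): sorting
log(), log2(), log10(): natural, base-2 and base-10 logarithms
exp(), sin(), cos(), sd(): exponential, sine, cosine and standard deviation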
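A short sketch of how these functions are called. The vector x is a made-up example, not data from the slides.

```r
# Hypothetical example vector (not from the slides)
x <- c(4, 1, 9, 16)

avg    <- mean(x)   # arithmetic mean: 7.5
total  <- sum(x)    # 30
sorted <- sort(x)   # ascending order by default
spread <- sd(x)     # sample standard deviation

log(x)              # natural logarithm of each element
log2(x)             # base-2 logarithm
log10(x)            # base-10 logarithm
exp(1)              # Euler's number e
```

Each function is applied to the whole vector at once; log(), log2(), log10(), exp(), sin() and cos() return one value per element, while mean(), sum() and sd() return a single summary value.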

4 Data frames: An example

5 Data frames: An example
First, generate the three variables separately (the site labels are excluded because they are not variables):
> HABITAT <- factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna", "Mixed", "Mixed", "Mixed", "Mixed"))
> GST <- c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6)
> EYR <- c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)

6 Data frames: An example
Next, use the names of the vectors as arguments in the data.frame() function to amalgamate the three separate variables into a single data frame (data set), which we will call MACNALLY.
> MACNALLY <- data.frame(HABITAT, GST, EYR)

7 Data frames: An example
Notice that each vector (variable) becomes a column in the data frame and that each row represents a single sampling unit. By default, the rows are named using numbers corresponding to the number of rows in the data frame. However, these can be altered to reflect the names of the sampling units by assigning a list of alternative names to the row.names() property of the data frame.

8 Data frames: An example
> row.names(MACNALLY) <- c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne", "Lysterfield", "Red Hill", "Devilbend", "Olinda")
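The steps from the previous slides can be combined into one runnable script. This is only a sketch that repeats the slide values; str() and dim() are added here to inspect the result and are not part of the original slides.

```r
# Build the three variables exactly as on the slides
HABITAT <- factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                    "Mixed", "Mixed", "Mixed", "Mixed"))
GST <- c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6)
EYR <- c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)

# Combine them into a data frame: one column per variable, one row per site
MACNALLY <- data.frame(HABITAT, GST, EYR)

# Replace the default numeric row names with the site names
row.names(MACNALLY) <- c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                         "Lysterfield", "Red Hill", "Devilbend", "Olinda")

str(MACNALLY)   # compact summary of column types
dim(MACNALLY)   # number of rows and columns: 8 3
```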

9 Access the data in a data frame
MACNALLY$HABITAT access the Column 1 MACNALLY$GST access the Column 2 MACNALLY$EYR access the Colum 3 MACNALLY[1,]  First row MACNALLY[,3]  Third column MACNALLY[3,2]  Element of third row and second column i=1:4; MACNALLY[i,]  rows from 1 to 4 MACNALLY[,2:3] cloumns from 2 to 3

10 Importing (reading) data
Comma-separated file:
> MACNALLY <- read.table('macnally.csv', header=TRUE, row.names=1, sep=',')
Tab-separated file:
> MACNALLY <- read.table('macnally.txt', header=TRUE, row.names=1, sep='\t')
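To try the import without a file on disk, read.table() can parse a character string through its text argument. In this sketch, the inline text and the SITE column header are assumptions standing in for the contents of macnally.csv, which the slides do not show.

```r
# Hypothetical file contents: the first column holds the site names
csv_text <- paste(
  "SITE,HABITAT,GST,EYR",
  "Reedy Lake,Mixed,3.4,0",
  "Pearcedale,Gipps.Manna,3.4,9.2",
  "Warneet,Gipps.Manna,8.4,3.8",
  sep = "\n"
)

# header = TRUE : the first line holds the column names
# row.names = 1 : the first column becomes the row names
# stringsAsFactors = TRUE : read HABITAT as a factor (not the default in R >= 4.0)
MACNALLY <- read.table(text = csv_text, header = TRUE, row.names = 1,
                       sep = ",", stringsAsFactors = TRUE)

MACNALLY
```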

11 Reviewing a data frame - fix()
A data frame can also be viewed as a simple spreadsheet in a separate window by using the name of the data frame as an argument in the fix() function. The fix() function also enables simple editing of the data frame.
> fix(MACNALLY)

12 Saving and loading of R objects
Any object in R (including data frames) can be saved into a native R workspace image file (*.RData), either individually or as a collection of objects, using the save() function. For example:
> save(MACNALLY, file='macnally.RData')
The saved object(s) can be loaded during subsequent sessions by providing the name of the saved workspace image file as an argument to the load() function. For example:
> load("macnally.RData")
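A minimal save and load round trip, sketched with a shortened data frame and a file in the session's temporary directory (both are assumptions made so the example leaves no file behind in the working directory).

```r
# Shortened example data (first three sites only)
MACNALLY <- data.frame(GST = c(3.4, 3.4, 8.4), EYR = c(0, 9.2, 3.8))
original <- MACNALLY

# Save the object to a temporary .RData file
path <- file.path(tempdir(), "macnally.RData")
save(MACNALLY, file = path)

# Remove the object, then restore it under its original name
rm(MACNALLY)
loaded_names <- load(path)   # load() returns the names of the restored objects

loaded_names
```

Note that load() restores objects under the names they had when saved, so it silently overwrites any existing object with the same name.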

13 Exporting (writing) data
The write.table() function is used to save data frames.
> write.table(MACNALLY, "macnally.csv", quote = F, row.names = T, sep = ",")
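The export can be checked by writing the file and reading it back. This sketch uses a shortened data frame and a temporary file path, both assumptions for illustration. With row.names = TRUE, write.table() writes a header that has no entry for the row-name column, and read.table() then uses the first column as row names because the header is one field shorter than the data lines.

```r
# Shortened example data (first three sites only)
MACNALLY <- data.frame(
  HABITAT = c("Mixed", "Gipps.Manna", "Gipps.Manna"),
  GST = c(3.4, 3.4, 8.4),
  EYR = c(0, 9.2, 3.8),
  row.names = c("Reedy Lake", "Pearcedale", "Warneet")
)

# Write to a temporary file with the same options as on the slide
path <- file.path(tempdir(), "macnally.csv")
write.table(MACNALLY, path, quote = FALSE, row.names = TRUE, sep = ",")

# The header line names only the three data columns
header_line <- readLines(path)[1]

# Read the file back; the first column is picked up as row names
back <- read.table(path, header = TRUE, sep = ",")
```

If the file is meant for a spreadsheet program, passing col.names = NA adds a blank header entry above the row names so that the columns line up.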

14 Dummy data sets - generating random data
Normal
> # generate 5 random numbers from a normal
> # distribution with a mean of 10 and a standard
> # deviation of 1
> rnorm(5, mean=10, sd=1)
Log-normal
> # generate 5 random numbers from a log-normal
> # distribution whose logarithm has a mean of 2 and a
> # standard deviation of 1
> rlnorm(5, meanlog=2, sdlog=1)

15 Dummy data sets - generating random data
Poisson
> # generate 5 random numbers from a Poisson
> # distribution with a lambda parameter of 4
> rpois(5, lambda=4)
Binomial
> # generate 5 random numbers from a binomial
> # distribution based on 10 Bernoulli trials and
> # a prob. of 0.5
> rbinom(5, size=10, prob=.5)
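Because these functions return different numbers on every call, a dummy data set is usually made reproducible by fixing the random seed first. The sketch below adds set.seed(), which is not on the slides; the seed value 42 is arbitrary.

```r
# Fix the seed so the same "random" numbers are produced on every run
set.seed(42)

normal_draws    <- rnorm(5, mean = 10, sd = 1)
lognormal_draws <- rlnorm(5, meanlog = 2, sdlog = 1)
poisson_draws   <- rpois(5, lambda = 4)
binomial_draws  <- rbinom(5, size = 10, prob = 0.5)

# Resetting the seed reproduces the first sequence exactly
set.seed(42)
repeated <- rnorm(5, mean = 10, sd = 1)
identical(normal_draws, repeated)   # TRUE
```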

16 Manipulating data sets
Subsets of data frames – data frame indexing
> # extract all the bird densities from sites that have GST values greater than 3
> subset(MACNALLY, GST>3)
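A sketch of subset() on the example data, next to the equivalent logical indexing. The combined condition and the select argument in the last call are additions for illustration.

```r
# Rebuild the example data frame from the earlier slides
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3),
  row.names = c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                "Lysterfield", "Red Hill", "Devilbend", "Olinda")
)

# Sites with GST greater than 3 (Cranbourne, with GST exactly 3, is dropped)
by_subset <- subset(MACNALLY, GST > 3)

# The same selection written with logical row indexing
by_index <- MACNALLY[MACNALLY$GST > 3, ]

# Conditions can be combined, and select keeps only the named columns
mixed_only <- subset(MACNALLY, GST > 3 & HABITAT == "Mixed", select = c(GST, EYR))
```

Inside subset() the column names can be used directly, without the MACNALLY$ prefix that logical indexing requires.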

17 The %in% matching operator
Subset the MACNALLY dataset to those rows whose HABITAT is 'Montane Forest' or 'Foothills Woodland':
> MACNALLY[MACNALLY$HABITAT %in% c("Montane Forest", "Foothills Woodland"),]
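The eight-site example built on the earlier slides contains only the 'Mixed' and 'Gipps.Manna' habitats, so the call above would return no rows for it. The sketch below applies %in% to that small data frame instead; the set of wanted habitats is an assumption chosen for illustration.

```r
# Rebuild the example data frame from the earlier slides
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3),
  row.names = c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                "Lysterfield", "Red Hill", "Devilbend", "Olinda")
)

# %in% returns TRUE for each row whose HABITAT appears in the given set;
# values absent from the data (here "Montane Forest") simply match nothing
wanted   <- c("Gipps.Manna", "Montane Forest")
is_match <- MACNALLY$HABITAT %in% wanted
matched  <- MACNALLY[is_match, ]

# Negating the test keeps the remaining rows
others <- MACNALLY[!is_match, ]

# droplevels() removes factor levels that no longer occur in the subset
matched <- droplevels(matched)
```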

18 Sorting datasets
> MACNALLY[order(MACNALLY$HABITAT, MACNALLY$GST), ]
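A sketch of how order() drives the sort on the example data. The descending sort in the last line is an addition not shown on the slide.

```r
# Rebuild the example data frame from the earlier slides
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3),
  row.names = c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                "Lysterfield", "Red Hill", "Devilbend", "Olinda")
)

# order() returns the row positions that would sort the data:
# first by HABITAT (in factor-level order), then by GST within each habitat
row_order <- order(MACNALLY$HABITAT, MACNALLY$GST)
sorted <- MACNALLY[row_order, ]

# Sort by GST from largest to smallest
by_gst_desc <- MACNALLY[order(MACNALLY$GST, decreasing = TRUE), ]
```

The original data frame is left unchanged; the sorted copy has to be assigned to a name if it is needed later.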
