Why R?
- Free
- Powerful (add-on packages)
- Online help from the statistical community
- Code-based (can build programs)
- Publication-quality graphics
Why not?
- Takes time to learn the code
- Very simple statistics may be faster with “point-and-click” software (e.g. Statistica, JMP)
Why generalized linear models (GLMs)?
Most ecological data FAIL these two assumptions of parametric statistics:
- Variance is independent of the mean (“homoscedasticity”)
- Data are normally distributed
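Taylor's power law
$\mathrm{Variance} = a \cdot \mathrm{Mean}^{b}$
Most ecological data have $1 < b < 2$, so the variance increases with the mean.
[Figure: variance plotted against mean]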
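One common way to estimate a and b is a straight-line fit on the log scale, since log(Variance) = log(a) + b * log(Mean). The sketch below assumes made-up means and variances (constructed to follow the law exactly with a = 2 and b = 1.5); the names m, v and fit are illustrative only.

```r
# Made-up means and variances that follow Variance = 2 * Mean^1.5 exactly
m <- c(1, 2, 4, 8, 16)
v <- 2 * m^1.5

# On the log scale the power law is a straight line:
# log(Variance) = log(a) + b * log(Mean)
fit <- lm(log(v) ~ log(m))

b <- unname(coef(fit)[2])       # slope estimates b
a <- unname(exp(coef(fit)[1]))  # back-transformed intercept estimates a
b; a
```

With real data the points scatter around the line, so the estimates of a and b come with uncertainty.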
Many types of ecological data are expected to be non-normal
- Count data are expected to be Poisson
  Examples: population size, species richness
- Binary (0,1) data are expected to be binomial
  Examples: survivorship, species presence
Workshop in R & GLMs
- Session 1: Basic commands + linear models
- Session 2: Testing parametric assumptions
- Session 3: How generalized linear models work
- Session 4: Model simplification and overdispersion
Exercise
1. Open R. “>” is the command prompt.
2. Write:
    x <- "hello"
    x
3. What do the arrow keys do? And the “End” key?
Ready!
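Exercise
    x <- 5
    y <- 1
    x+y; x*y; x/y; x^y
    sqrt(x); log(x); exp(x)
Careful!
- Capitalization matters: Y and y are different.
- Spaces around an operator do not matter: x<-5 is the same as x <- 5. But do not put a space inside the arrow: x < - 5 asks whether x is less than -5.
- “;” means a new command follows.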
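The spacing rule is easy to check at the prompt. A minimal sketch (the names spaced and cmp are illustrative only):

```r
x <- 5            # assignment: x is now 5
spaced <- x+1     # same result as x + 1
cmp <- x < - 5    # a comparison, not an assignment: is x less than -5?
cmp               # FALSE
x                 # still 5
```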
Vectors
    x <- c(8,2,5,9)
“c” means combine.
Vectors
    x <- rep(0,4)          # 0,0,0,0
    x <- 1:4               # 1,2,3,4
    x <- seq(1,7, by=2)    # 1,3,5,7
Create a vector called “test” containing 0,0,0,0,2,4,6,8,10 using all of the commands c, rep, seq:
    test <- c(rep(0,4), seq(2,10,by=2))
Vectors
Select an element of your vector (x = 1,3,5,7):
    x[2]         # 3
    x[c(1,3)]    # 1,5
    x[2:4]       # 3,5,7
Change an element of your vector (x = 1,3,5,7):
    x[1] <- 9; x    # 9,3,5,7
Matrices
    Dog <- c(1,4,6,8)
    Cat <- c(2,3,5,7)
    Animals <- cbind(Dog, Cat)
Dog and Cat are vectors; Animals is a matrix with the columns Dog and Cat.
Logical operators
    x <- 5; y <- 6
    x > y     # false
    x < y     # true
    x == y    # false
    x != y    # true
True is the same as 1, false is the same as 0:
    2 + (x>=y)    # 2
    2 + (x<=y)    # 3
Logical operators
    x <- c(1,2,3,4); y <- c(5,6,7,8)
    z <- x[y>=7]; z    # 3,4
Useful for quickly making subsets of your data!
    x <- c(1,0.01,3,0.02)
In this vector, change all values <1 to 0:
    x[x<1] <- 0
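Because true counts as 1, a logical test can be used both to count and to subset. A small sketch on made-up plot counts (the names counts, occupied, n_occupied and nonzero are illustrative only):

```r
# Made-up counts of individuals in six plots
counts <- c(12, 0, 7, 0, 3, 25)

occupied <- counts > 0         # one TRUE/FALSE value per plot
n_occupied <- sum(occupied)    # each TRUE counts as 1, so this is 4
nonzero <- counts[occupied]    # keeps 12, 7, 3, 25

n_occupied; nonzero
```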
Conditional operators
    x <- 5; z <- 0
    if (x>4) {z <- 2}; z    # 2
Could have a large program running in { }.
Loops
    y <- 0; x <- 0
    for (y in 1:20) {x <- x + 0.5; print(x)}
Useful for programming randomization procedures.
Bootstrap example:
    y <- 0; x <- 1:50
    output <- rep(0,1000)
    for (y in 1:1000) {output[y] <- var(sample(x, replace=TRUE))}
    mean(output)
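The bootstrap loop can be extended a little to show what the resampled variances are for. The sketch below is one possible reading, not part of the original slides: the seed value is arbitrary (it only makes the run repeatable), and the percentile interval via quantile() is an added illustration.

```r
set.seed(1)                      # arbitrary seed so the run is repeatable
x <- 1:50
output <- rep(0, 1000)           # one slot per bootstrap replicate

for (y in 1:1000) {
  resample <- sample(x, replace = TRUE)   # draw 50 values with replacement
  output[y] <- var(resample)              # variance of this resample
}

mean(output)                              # average bootstrap variance
var(x)                                    # variance of the original data
ci <- quantile(output, c(0.025, 0.975))   # rough 95% percentile interval
ci
```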
Writing programs
I encourage you to use the script editor!
1. File > New script
2. Write your code
3. Select the code you want to run (CTRL-A selects all code)
4. Run the code (CTRL-R)
5. File > Save as
R script files are always *.R
Entering data
1. In Excel, give your data columns/rows and text data simple one-word labels (e.g. "treatment").
2. Format cells so there are < 8 digits per cell.
3. Save as a "csv" file.
4. Use the following command to find and load your file (invent a dataframe name; here it is "diane"):
    diane <- read.table(file.choose(), sep=",", header=TRUE)
5. Check it is there!
    diane
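To try the import step without a spreadsheet, the sketch below writes a small made-up csv to a temporary file and reads it back. The file contents and the name csv_path are assumptions for illustration; an explicit path replaces file.choose() so the code runs without a file dialog.

```r
# Made-up data written to a temporary csv file (a stand-in for your own file)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("treatment,biomass",
             "control,2.5",
             "herbicide,1.2",
             "control,3.1"), csv_path)

# Same call as above, with an explicit path instead of file.choose()
diane <- read.table(csv_path, sep = ",", header = TRUE)

diane                      # print the whole dataframe
str(diane)                 # column names and types
nrow(diane); ncol(diane)   # number of rows and columns
```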
Dataframes
- Dataframes are analogous to spreadsheets.
- Best if all columns in your dataframe have the same length.
- Missing values are coded as "NA" in R.
- If you coded your missing values with a different label in your spreadsheet (e.g. "none") then:
    read.table(….., na.strings="none")
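A short sketch of what na.strings does, using made-up data passed through the text argument of read.table so that no file is needed (the name d and the values are illustrative only):

```r
# Made-up csv content in which a missing value was typed as "none"
csv_text <- "plot,biomass
1,2.5
2,none
3,4.0"

d <- read.table(text = csv_text, sep = ",", header = TRUE,
                na.strings = "none")

d$biomass                       # the "none" entry is now NA
is.na(d$biomass)                # TRUE only for the missing value
mean(d$biomass, na.rm = TRUE)   # mean of the non-missing values
```

Without na.strings the "none" entry would force the whole biomass column to be read as text rather than numbers.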
Dataframes
Two ways to identify a column (called "treatment") in your dataframe (called "diane"):
    diane$treatment
OR
    attach(diane); treatment
At the end of the session, remember to:
    detach(diane)
Summary statistics
    length(x)
    mean(x)
    var(x)
    cor(x,y)
    sum(x)
    summary(x)    # minimum, maximum, mean, median, quartiles
What is the correlation between two variables in your dataset?
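These functions can be tried on two short made-up vectors before moving to your own dataset (the values below are illustrative only):

```r
# Two made-up variables measured on the same five units
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

length(x)    # 5
mean(x)      # 6
var(x)       # 10
sum(x)       # 30
cor(x, y)    # 0.8
summary(x)   # minimum, quartiles, median, mean, maximum
```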
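Factors
- A factor has several discrete levels (e.g. control, herbicide).
- When a text column is read into a dataframe, versions of R before 4.0.0 automatically treat it as a factor; from R 4.0.0 onward it stays as text unless you set stringsAsFactors=TRUE or convert it yourself.
- To manually convert a vector (numeric or text) to a factor:
    x <- as.factor(x)
- To check if your vector is a factor, and what the levels are:
    is.factor(x); levels(x)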
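A brief sketch of converting and checking factors, using made-up treatment labels and doses (the names treatment, dose and dose_f are illustrative only). A text vector typed at the prompt is not a factor until it is converted:

```r
# Made-up treatment labels typed in as text
treatment <- c("control", "herbicide", "control", "herbicide")
is.factor(treatment)                # FALSE: still a plain text vector

treatment <- as.factor(treatment)
is.factor(treatment)                # TRUE
levels(treatment)                   # "control" "herbicide"

# A numeric vector can be converted in the same way
dose <- c(0, 10, 0, 20)
dose_f <- as.factor(dose)
levels(dose_f)                      # "0" "10" "20"
```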
Homework
1. Download R on your computer. Either go to [link missing] and follow the CRAN download links, or go directly to [link missing].
2. Instruction manuals for R are found at the main webpage: follow the links to Documentation > Manuals. I recommend "An Introduction to R".
3. Write a short program that:
- Allows you to import the data from Lakedata_06.csv (posted on [link missing])
- Makes lake area into a factor called AreaFactor:
    Area 0 to 5 ha: small
    Area 5.1 to 10 ha: medium
    Area > 10 ha: large
Hints
You will need to:
1. Tell R how long AreaFactor will be.
2. Assign cells in AreaFactor to each of the 3 levels.
3. Make AreaFactor into a factor, then check that it is a factor.
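The three hints can be sketched on a made-up vector of lake areas. This is one possible approach, not the only solution: the values in Area are invented (they are not from Lakedata_06.csv), and the order of the assignments matters because each later line overwrites part of the earlier one.

```r
# Made-up lake areas in hectares (not the homework data)
Area <- c(2.0, 7.5, 12.3, 5.0, 10.0, 0.8)

# Hint 1: create an empty text vector of the right length
AreaFactor <- rep("", length(Area))

# Hint 2: fill in the levels with logical subsetting, largest class first
AreaFactor[Area > 10]  <- "large"
AreaFactor[Area <= 10] <- "medium"   # everything up to 10 ha ...
AreaFactor[Area <= 5]  <- "small"    # ... then overwrite lakes up to 5 ha

# Hint 3: convert to a factor and check
AreaFactor <- as.factor(AreaFactor)
is.factor(AreaFactor)
levels(AreaFactor)
```

Note that levels() lists the levels alphabetically (large, medium, small), not in order of size.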