Published by Lisa Alexander. Modified over 8 years ago.
1
Lecture 1: Introduction
Olivier MISSA, om502@york.ac.uk
Advanced Research Skills
2
Aims
- Introduce the use of R for advanced statistical analyses beyond "Statistics for Ecologists".
- Demonstrate these analyses on a broad range of questions and situations.
- Develop your understanding of statistical programming.
- Empower you to tackle future analytical challenges on your own.
3
Aims
Other skills will be developed too:
- Produce posters using CorelDraw (a graphics package).
- Learn how to write a grant proposal.
4
Learning Outcomes
At the end of the module, you should be able to:
- Determine which test to use for significance testing.
- Explore the inherent structure of your data through a wide range of multivariate techniques.
- Work out which model "best explains" the variable you are interested in.
- Produce high-quality graphs (ready for publication) using R's full graphical capabilities.
5
Organisation: Staff
- Olivier Missa (OM), module organiser, R sessions: om502@york.ac.uk
- Emma Rand (ER), R sessions: er13@york.ac.uk
- Phil Roberts (PTR), CorelDraw session: ptr2@york.ac.uk
- Peter Mayhew (PJM), grant-writing session: pjm19@york.ac.uk
6
Organisation: Structure
- 9 theoretical lectures (OM) on advanced statistics.
- 9 practical sessions (OM & ER) on using R.
- 1 practical session (PTR) on CorelDraw.
- 1 tutorial session (PJM) on grant writing.
7
Organisation: Content
- L1: Introduction
- L2-L4: Linear Models
- L5-L6: GLMs & Mixed-effects Models
- L7: Non-Linear Models
- L8-L9: Multivariate Analyses
Each lecture is accompanied by a practical session.
8
Organisation: Assessment
Open data analysis exercise: a written report with Introduction, Materials & Methods, Results and Discussion sections. Particular emphasis is placed on justifying the analyses and interpreting the results properly.
9
What is R?
"R is a language and environment for statistical computing and graphics" (R website).
R is a programming language, in fact a dialect of S. S was developed in the 1980s by John Chambers at Bell Labs. Bell Labs then sold S to MathSoft (now Insightful Co.), which developed it further into S-Plus, a commercial statistical package.
In the 1990s, two statisticians from New Zealand, Ross Ihaka & Rob Gentleman, reimplemented S from scratch as R. Since then R has continued to grow in scale and scope. It is currently maintained by a core team of about 20 people across the globe.
10
Why use R?
The key benefits:
- Free: it won't cost you a penny, ever.
- Open: how things are calculated is not hidden.
- Fully customisable: the user is in full control.
- Cutting-edge stats: professional statisticians use it to create new techniques.
- Very widespread (increasingly so): thousands of contributors (packages), millions of users.
- Supported by an international user community happy to provide help and assistance.
11
Why use R?
The drawback: a steep learning curve.
- You need to learn the language.
- You need to know what you are doing (stats).
12
What is R good for?
Absolutely everything (to do with data):
- Statistics
- Modelling
- Programming / simulations
- Graphics (from very simple to complex, 2D, 3D, ...)
- Databases (simple relational functions)
- Bioinformatics (Bioconductor project)
- A platform for interacting with other software (e.g. GGobi, WinBUGS, MySQL, GRASS GIS)
13
Example of a session
> data(volcano)
> dim(volcano)
[1] 87 61
> volcano
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] ... [,61]
 [1,]  100  100  101  101  101  101  101 ...   103
 [2,]  101  101  102  102  102  102  102 ...   104
 ...
[87,]   97   97   97   98   98   99   99 ...    94
> volcano[1:3,1:3]
     [,1] [,2] [,3]
[1,]  100  100  101
[2,]  101  101  102
[3,]  102  102  103
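A few more indexing idioms may make the output above more concrete. This is a minimal sketch that uses only base R. The variable names (`corner`, `last_row`, `first_col`, `top_left`) are illustrative and do not come from the lecture.

```r
# Load the built-in 87 x 61 elevation matrix (datasets package)
data(volcano)

# Dimensions are reported as rows first, then columns
dims <- dim(volcano)

# A single cell, a whole row and a whole column
corner    <- volcano[1, 1]     # top-left cell
last_row  <- volcano[87, ]     # vector of 61 values
first_col <- volcano[, 1]      # vector of 87 values

# head() on a matrix keeps the first 6 rows; here combined with 5 columns
top_left <- head(volcano[, 1:5])
```

Leaving one index empty (`volcano[87, ]`) selects everything along that dimension.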
14
> range(volcano)
[1]  94 195
> mean(volcano)
[1] 130.1879
> sd(volcano)   ## older R: one SD per column; recent versions return a single SD of all values
 [1]  6.902227  7.565538  8.203669  8.735686 ...
 [8] 11.165554 11.735217 12.733854 13.668694 ...
...
> ?sd           ## help('sd') does the same
> sd            ## typing a function's name prints its code (older R version shown)
function (x, na.rm = FALSE)
{
    if (is.matrix(x))
        apply(x, 2, sd, na.rm = na.rm)
    else if (is.vector(x))
        sqrt(var(x, na.rm = na.rm))
    else if (is.data.frame(x))
        sapply(x, sd, na.rm = na.rm)
    else sqrt(var(as.vector(x), na.rm = na.rm))
}
...
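Recent versions of R treat a matrix passed to `sd()` as one long vector. The column-by-column behaviour shown above therefore has to be requested explicitly with `apply()`. As a sketch, the block below computes both versions. It also checks the pooled value against the textbook formula, which uses the n - 1 denominator, as `var()` does. The names `col_sd`, `all_sd` and `manual_sd` are illustrative.

```r
data(volcano)

# One standard deviation per column (the older sd() behaviour shown above)
col_sd <- apply(volcano, 2, sd)

# A single standard deviation of all 5307 elevations pooled together
all_sd <- sd(as.vector(volcano))

# The same pooled value from the formula sqrt(sum((x - mean)^2) / (n - 1))
v <- as.vector(volcano)
manual_sd <- sqrt(sum((v - mean(v))^2) / (length(v) - 1))
```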
15
> sd(as.vector(volcano))
[1] 25.83233
> summary(as.vector(volcano))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   94.0   108.0   124.0   130.2   150.0   195.0
> volcano.v <- as.vector(volcano)
> dim(volcano.v)
NULL
> length(volcano.v)
[1] 5307
> 61*87
[1] 5307
> volcano.v[1:87] == volcano[,1]
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE ...
[87] TRUE
> volcano.v[1:61] == volcano[1,]
... only three values (out of 61) show "TRUE". The matrix is stored column by column, so the first 87 elements of volcano.v are column 1, not row 1.
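R stores a matrix in column-major order. Element (i, j) therefore sits at position (j - 1) * nrow + i of the underlying vector. The sketch below checks this with one arbitrary cell (i = 3, j = 2), recovers row 1 by taking every 87th element, and rebuilds the matrix with `matrix()`, which fills column by column by default.

```r
data(volcano)
volcano.v <- as.vector(volcano)
n_row <- nrow(volcano)   # 87

# Element (i, j) of the matrix is element (j - 1) * nrow + i of the vector
i <- 3
j <- 2
same_cell <- volcano.v[(j - 1) * n_row + i] == volcano[i, j]

# Row 1 is therefore every 87th element, starting at position 1
row1 <- volcano.v[seq(1, length(volcano.v), by = n_row)]

# Refilling a matrix column by column restores the original layout
rebuilt <- matrix(volcano.v, nrow = n_row)
```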
16
> plot(volcano)
Not very useful: given a matrix, plot() only draws column 1 against column 2. It only shows that elevations in these two columns tend to be correlated.
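The correlation that the scatterplot hints at can be put into a number. This short sketch computes the Pearson correlation between the first two columns. It writes the explicit form of the plot as a comment so that no graphics device is needed. `r12` is an illustrative name.

```r
data(volcano)

# plot(volcano) is equivalent to the explicit scatterplot:
# plot(volcano[, 1], volcano[, 2])

# Pearson correlation between the elevations in columns 1 and 2
r12 <- cor(volcano[, 1], volcano[, 2])
```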
17
> plot(volcano)
> plot(volcano.v, pch=20)
> hist(volcano, prob=TRUE,
+      xlab="volcano elevation (m)")
> x <- seq(90,200,1)   ## not needed: curve() generates its own x values
> curve(dnorm(x, mean=mean(volcano.v),
+             sd=sd(volcano.v)), add=TRUE)
> shapiro.test(volcano.v)
Error in shapiro.test(volcano.v) :
  sample size must be between 3 and 5000
> smpl <- sample(volcano.v, 5000)   ## random subsample: W varies slightly between runs
> shapiro.test(smpl)

        Shapiro-Wilk normality test

data:  smpl
W = 0.9358, p-value < 2.2e-16
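`sample()` draws a different subsample on each run, so W changes slightly every time. Fixing the random seed makes the test reproducible. In this sketch the seed value 42 is an arbitrary choice. The block also computes the peak height of the overlaid normal curve, which should equal 1 / (sd * sqrt(2 * pi)).

```r
data(volcano)
volcano.v <- as.vector(volcano)

# shapiro.test() accepts 3 to 5000 observations, so test a random subsample;
# set.seed() (arbitrary value) makes the subsample and the result reproducible
set.seed(42)
smpl <- sample(volcano.v, 5000)
res <- shapiro.test(smpl)

# The curve overlaid on the histogram is a normal density with the sample
# mean and SD; its height at the mean is 1 / (s * sqrt(2 * pi))
m <- mean(volcano.v)
s <- sd(volcano.v)
dens_at_mean <- dnorm(m, mean = m, sd = s)
```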
18
> library(nortest)         ## package of normality tests
> ad.test(volcano)         ## Anderson-Darling

        Anderson-Darling normality test

data:  volcano
A = 106.2715, p-value < 2.2e-16

> cvm.test(volcano)        ## Cramér-von Mises
> lillie.test(volcano)     ## Lilliefors (Kolmogorov-Smirnov)
> pearson.test(volcano)    ## Pearson chi-square
> sf.test(smpl)            ## Shapiro-Francia (also limited to 5000 values)
> qqnorm(volcano.v)
> qqline(volcano.v, col="red")
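A QQ plot and the Shapiro-Francia test rest on the same idea: compare the ordered data with the quantiles expected under normality. The sketch below uses only base R, since nortest may not be installed. It computes W' as the squared correlation between the sorted data and normal quantiles at plotting positions (i - 3/8) / (n + 1/4). This follows the usual textbook definition, and `sf.test()` may differ in detail. It also measures how straight the QQ plot is, using `qqnorm(plot.it = FALSE)`. The seed and the names `W_sf` and `r_qq` are illustrative assumptions.

```r
data(volcano)
volcano.v <- as.vector(volcano)
set.seed(42)                         # arbitrary seed, for reproducibility
smpl <- sample(volcano.v, 5000)

# Shapiro-Francia W': squared correlation of ordered data with
# expected normal quantiles (plotting positions with a = 3/8)
n <- length(smpl)
expected <- qnorm(ppoints(n, a = 3/8))
W_sf <- cor(sort(smpl), expected)^2

# qqnorm() can return the QQ-plot coordinates without drawing;
# a perfectly straight QQ plot would give a correlation of 1
qq <- qqnorm(volcano.v, plot.it = FALSE)
r_qq <- cor(qq$x, qq$y)
```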
19
> x <- 10*(1:nrow(volcano))   ## 10, 20, ..., 870 (87 rows)
> y <- 10*(1:ncol(volcano))   ## 10, 20, ..., 610 (61 columns)
> image(x, y, volcano)
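The coordinate vectors must match the matrix. `image()` expects `length(x)` to equal `nrow(z)` (cell midpoints) or `nrow(z) + 1` (cell boundaries), and likewise for `y` and `ncol(z)`. The volcano data are documented as lying on a 10 m by 10 m grid. This sketch checks the lengths and computes the mapped extent in metres. The names `fits` and `extent` are illustrative.

```r
data(volcano)

# One coordinate per row (x) and per column (y), 10 m apart
x <- 10 * (1:nrow(volcano))   # 10, 20, ..., 870
y <- 10 * (1:ncol(volcano))   # 10, 20, ..., 610

# Here x and y give cell midpoints, so their lengths must equal the dimensions
fits <- length(x) == nrow(volcano) && length(y) == ncol(volcano)

# Distance between the first and last grid points, in metres
extent <- c(x = diff(range(x)), y = diff(range(y)))
```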
23
> x <- 10*(1:nrow(volcano))
> y <- 10*(1:ncol(volcano))
> image(x, y, volcano)
> image(x, y, volcano, asp=1)
> image(x, y, volcano, asp=1,
+       col = terrain.colors(100),
+       axes = FALSE)
> contour(x, y, volcano,
+         levels = seq(90, 200, by=5),
+         add = TRUE, col = "peru")
> image(x, y, volcano, asp=1,
+       col = terrain.colors(100),
+       axes = FALSE)
> contour(x, y, volcano,
+         levels = seq(90, 200, by=10),
+         add = TRUE, col = "peru")
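The same image-plus-contour map is drawn twice, changing only the contour spacing, so it can be wrapped in a function. `volcano_map()` is a hypothetical helper written for this sketch and is not part of the lecture or of base R. It draws the map with contours every `by` metres, returns the contour levels invisibly, and is called twice side by side using `par(mfrow = ...)`.

```r
data(volcano)

# Hypothetical helper: elevation map with contour lines every `by` metres
volcano_map <- function(by = 10) {
  x <- 10 * (1:nrow(volcano))
  y <- 10 * (1:ncol(volcano))
  lev <- seq(90, 200, by = by)
  image(x, y, volcano, asp = 1, col = terrain.colors(100), axes = FALSE)
  contour(x, y, volcano, levels = lev, add = TRUE, col = "peru")
  invisible(lev)   # return the levels used, without printing them
}

# Two panels side by side, then restore the previous layout
op <- par(mfrow = c(1, 2))
lev5  <- volcano_map(by = 5)
lev10 <- volcano_map(by = 10)
par(op)
```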
24
Gallery of other volcano graphs
[Figure panels: image + contour; persp; surface3d; persp with shading]
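The "persp with shading" panel can be reproduced with base graphics. This sketch follows the volcano example in the `persp()` help page: the relief is doubled, lighting comes from `ltheta`, and grid lines are hidden with `border = NA`. `persp()` invisibly returns a 4 x 4 viewing matrix, which `trans3d()` uses here to mark the summit. The summit marker is an addition for illustration. `surface3d` belongs to the separate rgl package and is not used here.

```r
data(volcano)

z <- 2 * volcano            # exaggerate the relief
x <- 10 * (1:nrow(z))       # 10 m spacing
y <- 10 * (1:ncol(z))

# Shaded perspective plot without grid lines; keep the viewing matrix
pm <- persp(x, y, z, theta = 135, phi = 30, col = "green3", scale = FALSE,
            ltheta = -120, shade = 0.75, border = NA, box = FALSE)

# Project the highest grid cell onto the 2D plot and mark it
top <- which(volcano == max(volcano), arr.ind = TRUE)[1, ]
summit <- trans3d(x[top[1]], y[top[2]], max(z), pm)
points(summit, pch = 19, col = "red")
```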
25
More classical graphs
[Figure panels: histogram + theoretical curve; boxplot; stripchart; barplot; pie chart; 3D models]
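Several of these classical graphs come from a single base-R call each. The sketch below also uses `plot = FALSE` to get the numbers behind the histogram and the boxplot without drawing them. The 10 m histogram bins, the two elevation bands used for the barplot and pie chart, and the choice of column 30 for the stripchart are arbitrary illustrative choices.

```r
data(volcano)
volcano.v <- as.vector(volcano)

# Numbers behind a histogram and a boxplot, without drawing them
h <- hist(volcano.v, breaks = seq(90, 200, by = 10), plot = FALSE)
b <- boxplot(volcano.v, plot = FALSE)   # b$stats: whiskers, hinges, median

# Barplot and pie chart of elevations grouped into two (arbitrary) bands
bands <- table(cut(volcano.v, breaks = c(90, 140, 200)))
barplot(bands, ylab = "Number of grid cells")
pie(bands)

# Stripchart of one column's elevations, jittered to reduce overplotting
stripchart(volcano[, 30], method = "jitter", pch = 20)
```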