Published by Kelley Morrison. Modified over 8 years ago.
1
Sihua Peng, PhD, Shanghai Ocean University, 2016.10
2
Four VIPs in statistics: Gosset, Pearson, Fisher, Neyman
3
William Sealy Gosset
William Sealy Gosset (1876–1937) was an English statistician. He published under the pen name "Student" and developed Student's t-distribution.
4
Karl Pearson
Karl Pearson (1857–1936) was an English mathematician and biostatistician. He is credited with establishing the discipline of mathematical statistics. In 1911 he founded the world's first university statistics department, at University College London. Many familiar statistical terms and methods, such as the standard deviation, principal component analysis, and the chi-square test, were introduced by him.
5
Ronald Fisher
Sir Ronald Aylmer Fisher (1890–1962) was an English statistician and biologist. Many familiar statistical methods and distributions, such as the F-distribution, Fisher's linear discriminant, Fisher's exact test, Fisher's permutation test, and the von Mises–Fisher distribution, bear his name or grew out of his work.
6
Jerzy Neyman
Jerzy Neyman (1894–1981) was a Polish mathematician and statistician who spent most of his professional career at the University of California, Berkeley. Neyman was the first to introduce the modern concept of the confidence interval into statistical inference.
7
References
8
Dr. Murray Logan
Dr. Logan is the author of our textbook and an associate lecturer in the School of Biological Sciences, Monash University, Australia.
Homepage: http://users.monash.edu.au/~murray/index.html
The data sets in this book: http://users.monash.edu.au/~murray/BDAR/index.html
9
Contents
1. Introduction to R
2. Data sets
3. Introductory Statistical Principles
4. Sampling and experimental design with R
5. Graphical data presentation
6. Simple hypothesis testing
7. Introduction to Linear models
8. Correlation and simple linear regression
9. Single factor classification (ANOVA)
10. Nested ANOVA
11. Factorial ANOVA
12. Simple Frequency Analysis
10
1. Introduction to R
R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics, University of Auckland, New Zealand, during the 1990s.
11
VIPs of R: Robert Gentleman and Ross Ihaka
Ross Ihaka: https://www.stat.auckland.ac.nz/~ihaka/
Robert Gentleman: https://en.wikipedia.org/wiki/Robert_Gentleman_(statistician)
12
What R does and does not do
What R does:
- data handling and storage: numeric, textual
- matrix algebra
- hash tables and regular expressions
- high-level data analytic and statistical functions
- classes ("OO")
- graphics
- a programming language: loops, branching, subroutines
What R does not do:
- it is not a database, but it connects to DBMSs
- its language interpreter can be very slow, but it allows you to call your own C/C++ code
- it has no spreadsheet view of data, but it connects to Excel/MS Office
- it has no professional/commercial support
13
Download R
https://www.r-project.org/
14
Download R
15
Install R
16
The R environment
After installation, you can run R.
17
The R environment
Object: R is an object-oriented language, and everything in R is an object. For example, a single number is an object, a variable is an object, output is an object, and a data set is an object that is itself a collection of objects.
Vector: A collection of one or more objects of the same type (e.g. all numbers or all characters).
Function: A set of instructions carried out on one or more objects. Functions are typically used to perform specific, common tasks that would otherwise require many instructions.
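The three ideas above can be seen at the console with a minimal sketch. Everything is an object, `class()` reports what kind of object it is, and a function is itself an object that bundles instructions. The names `heights` and `mean_height` are hypothetical and chosen only for illustration.

```r
# A single number is an object: a numeric vector of length 1
n <- 42
class(n)            # "numeric"
length(n)           # 1

# A vector: several entries of the same type
heights <- c(1.62, 1.75, 1.80)
class(heights)      # "numeric"
length(heights)     # 3

# A function is also an object; it packages instructions for reuse
mean_height <- function(x) {
  sum(x) / length(x)
}
class(mean_height)  # "function"
mean_height(heights)
```

In practice the built-in `mean()` does the same job. The hand-written version only shows that a function is just another named object.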
18
The R environment
Parameter: The kind of information that can be passed to a function.
Argument: The specific information passed to a function to determine how the function should perform its task.
Operator: A symbol that has a predefined meaning. Familiar operators include +, -, * and /, which respectively perform addition, subtraction, multiplication and division.
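The sketch below may help separate parameters from arguments. A parameter is the name in the function definition, and an argument is the value supplied in a particular call. The function `area` is hypothetical; `round()` and its `digits` parameter are part of base R.

```r
# Hypothetical function: width and height are its parameters
area <- function(width, height) {
  width * height
}
area(width = 3, height = 4)   # 3 and 4 are the arguments of this call

# round() has a parameter called digits; here we pass the argument 2
r <- round(3.14159, digits = 2)

# The four arithmetic operators
a <- 7 + 3   # addition
b <- 7 - 3   # subtraction
m <- 7 * 3   # multiplication
d <- 7 / 2   # division

# Operators are functions too: `+`(7, 3) means the same as 7 + 3
same <- `+`(7, 3) == a
```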
19
Expressions, Assignment and Arithmetic
> 2 + 3                 # an expression; R prints the evaluated output
[1] 5
> VAR1 <- 2 + 3         # assign the expression to the object VAR1
> VAR2 <- 9             # assign 9 to the object VAR2
> VAR2 - 1              # print the contents of VAR2 minus 1
[1] 8
> ANS1 <- VAR1 * VAR2   # evaluated expression assigned to ANS1
> ANS1                  # print the contents of ANS1
[1] 45
20
Expressions, Assignment and Arithmetic
Objects can be concatenated (joined together) to create objects with multiple entries using the c() (concatenate) function.
> c(1, 2, 6)        # concatenate 1, 2 and 6
[1] 1 2 6
> c(VAR1, ANS1)     # concatenate the contents of VAR1 and ANS1
[1]  5 45
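c() joins whole vectors as well as single numbers. As a small sketch, it always returns one flat vector rather than a nested structure. The names `v` and `w` are illustrative only.

```r
VAR1 <- 2 + 3
VAR2 <- 9
ANS1 <- VAR1 * VAR2

v <- c(VAR1, ANS1)        # 5 45
w <- c(v, c(100, 200))    # c() flattens its inputs: 5 45 100 200
length(w)                 # 4
```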
21
R workspaces
Workspace: Throughout an R session, all objects that have been created are stored in the R global environment, called the workspace.
> ls()               # list the current objects in the R environment
[1] "ANS1" "VAR1" "VAR2"
> rm(VAR1, VAR2)     # remove the VAR1 and VAR2 objects
> rm(list = ls())    # remove all user-defined objects
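`exists()` gives a quick way to check what `rm()` did. It reports whether an object of a given name can still be found. This sketch reuses the object names from the slide.

```r
VAR1 <- 5
VAR2 <- 9
ANS1 <- VAR1 * VAR2
ls()               # includes "ANS1" "VAR1" "VAR2"

rm(VAR1, VAR2)     # remove two objects from the workspace
exists("VAR1")     # FALSE: VAR1 is gone
exists("ANS1")     # TRUE: ANS1 is still in the workspace
```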
22
R workspaces
save.image(): saves the workspace, and thus all of its objects (vectors, functions, etc.)
q(): quits R
getwd(): displays the current working folder
setwd(): sets the working folder
help(): displays help for a function, e.g.
> help(mean)
> ?mean
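The following sketch combines these functions. It switches to R's temporary folder so that the real working folder is not touched, saves the workspace, deletes an object, and brings it back with `load()`. The file name `demo.RData` is an arbitrary choice for illustration.

```r
# Remember the current working folder, then work in a temporary one
old_wd <- getwd()
setwd(tempdir())

VAR1 <- 5
save.image(file = "demo.RData")   # save every object in the workspace
rm(VAR1)                          # VAR1 is now gone
load("demo.RData")                # restore the saved objects
print(VAR1)                       # 5 again

setwd(old_wd)                     # return to the original working folder
```

Without a `file` argument, `save.image()` writes to `.RData` in the current working folder. That is why changing folders with `setwd()` matters.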
23
Vectors - variables
The basic data storage unit in R is called a vector. A vector is a collection of one or more entries of the same class (type).
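Because every entry of a vector must share one class, R quietly converts (coerces) mixed inputs to a common type. A short sketch:

```r
num <- c(1, 2, 3)
class(num)          # "numeric"
chr <- c("a", "b")
class(chr)          # "character"
lgl <- c(TRUE, FALSE)
class(lgl)          # "logical"

# Mixing types forces every entry into one common class
mixed <- c(1, "a", TRUE)
class(mixed)        # "character"
mixed               # "1" "a" "TRUE"
```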
24
Factors
To properly accommodate factorial (categorical) variables, R has an additional class of vector called a factor, which stores the vector along with a list of the levels of the factorial variable. The factor() function converts a vector into a factor.
> SHADE <- c("no", "no", "no", "no", "no", "full", "full", "full", "full", "full")
> SHADE
 [1] "no"   "no"   "no"   "no"   "no"   "full" "full" "full"
 [9] "full" "full"
> SHADE <- factor(SHADE)
> SHADE
 [1] no   no   no   no   no   full full full full full
Levels: full no
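Levels are sorted alphabetically by default, which is why `full` comes before `no` above. The sketch below inspects the levels and then sets their order explicitly with the `levels` argument of `factor()`. The name `SHADE2` is illustrative.

```r
SHADE <- factor(c("no", "no", "no", "no", "no",
                  "full", "full", "full", "full", "full"))
levels(SHADE)        # "full" "no"
nlevels(SHADE)       # 2
table(SHADE)         # number of observations per level

# Choose the level order explicitly, e.g. to make "no" the first level
SHADE2 <- factor(SHADE, levels = c("no", "full"))
levels(SHADE2)       # "no" "full"
as.integer(SHADE2)   # the integer codes stored behind the labels
```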
25
Matrices
A vector has only a single dimension: it has length. However, vectors can be combined into a matrix (a 2-dimensional array).
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY1 <- cbind(X, Y)    # bind as columns: a 5 x 2 matrix
XY2 <- rbind(X, Y)    # bind as rows: a 2 x 5 matrix
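`dim()` confirms the shape that `cbind()` and `rbind()` produce. `matrix()` turns a single vector into a matrix by filling it column by column. The matrix `M` is an illustrative addition.

```r
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY1 <- cbind(X, Y)   # each vector becomes a column
XY2 <- rbind(X, Y)   # each vector becomes a row
dim(XY1)             # 5 2
dim(XY2)             # 2 5
colnames(XY1)        # "X" "Y"

# matrix() reshapes one vector, filling column by column
M <- matrix(1:6, nrow = 2)
M                    # row 1: 1 3 5; row 2: 2 4 6
```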
26
To access the data in matrices
XY1[1, ]      # first row
XY1[, 2]      # second column
XY1[2, 2]     # the value in the second row and second column
XY1[1:3, ]    # rows 1 to 3
XY1[, 1:2]    # columns 1 to 2
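This sketch rebuilds `XY1` and shows what each of these indexing forms returns. It also shows that columns created by `cbind()` can be selected by name.

```r
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY1 <- cbind(X, Y)

XY1[1, ]      # first row, as a named vector: X = 16.92, Y = 8.37
XY1[, 2]      # second column: the Y values
XY1[2, 2]     # 12.93
XY1[1:3, ]    # a 3 x 2 matrix
XY1[, "X"]    # select a column by its name
```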
27
Data frames
Data frames are generated by combining multiple vectors such that each vector becomes a separate column in the data frame. In this way, a data frame is similar to a matrix in which each column can hold a different vector type. We will discuss data frames in detail in the next chapter.
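Here is a minimal sketch of a data frame whose columns have different types. The `HEIGHT` and `COUNT` values are made up purely for illustration and are not data from the textbook.

```r
# Hypothetical example data: three vectors of different types
SHADE  <- factor(c("no", "no", "full", "full"))
HEIGHT <- c(12.1, 10.4, 7.3, 8.8)
COUNT  <- c(3L, 5L, 2L, 4L)

# Each vector becomes one column of the data frame
DF <- data.frame(SHADE, HEIGHT, COUNT)
str(DF)                    # factor, numeric and integer columns
dim(DF)                    # 4 rows, 3 columns
DF$HEIGHT                  # extract one column as a vector
DF[DF$SHADE == "no", ]     # keep only the rows where SHADE is "no"
```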
28
Working with scripts
A collection of one or more commands is called a script. In R, a script is a plain text file with a separate command on each line; it can be created and read in any text editor. A script is read into R by passing the full filename of the script file as an argument to the source() function:
> source("filename.R")
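To try `source()` without creating a file by hand, the sketch below writes a two-line script to a temporary file and then runs it. The temporary file stands in for `"filename.R"`, and its contents are illustrative.

```r
# Write a two-line script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("VAR1 <- 2 + 3",
             "ANS1 <- VAR1 * 9"), script)

source(script)   # run every command in the file, in order
ANS1             # 45: objects created by the script are now in the workspace
```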
29
A typical script may look like the following:
30
R Packages
What is an R package?
An R package is a collection of functions with detailed descriptions and examples. Each package contains R functions, data, help files, description files, and so on.
31
Common R packages (1)
- ade4: Exploratory and Euclidean methods for analyzing ecological data
- ape: Phylogenetic and evolutionary analysis
- apTreeshape: Analysis of phylogenetic tree shape
- cluster: Cluster analysis
- geiger: Speciation rates and evolutionary analysis
- ouch: Phylogenetic comparative analysis
- pgirmess: Ecological data analysis
32
R Packages
Common R packages (2)
- phangorn: Phylogenetic analysis
- picante: Phylogenetic diversity of communities
- seqinr: DNA sequence analysis
- SDMTools: Species distribution modelling tools
- vegan: Ordination of plant communities and biodiversity calculation
- graphics: Plotting figures
- lattice: Lattice (Trellis) graphics
33
R Packages
How to install a package? Install “ade4”
34
How to install a package?
A simple way to install an R package:
> install.packages("vegan")
35
Using Packages
A package must be loaded before its functions can be used, so loading the package is the first step. In the console, enter the following command:
> library(vegan)
The functions within a package are then used just like the basic functions built into R.
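A common pattern combines installing and loading: install a package only if it is missing, then attach it. The helper `use_package` below is hypothetical and not part of any package. `requireNamespace()`, `install.packages()` and `library(..., character.only = TRUE)` are base R. The demonstration uses `stats`, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install pkg only if it is not available, then attach it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

use_package("stats")            # already installed, so it is only attached
"package:stats" %in% search()   # TRUE: the package is on the search path
sd(c(2, 4, 4, 4, 5, 5, 7, 9))   # a stats function, called like any built-in
```

With a contributed package such as `vegan`, the first call would download it from CRAN, and later calls would only attach it.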
36
Control Flow
Branch statements: if/else; switch.
Loop statements: for; while; repeat.
37
if / else if / else
if (condition) expr
x <- 100
if (x > 0) k <- log10(x)
paste("result of k:", k)
if (condition) expr1 else expr2
x1 <- 1000
if (x1 < 0) {
  paste("You input a negative value:", x1)
} else {
  k1 <- log(x1, 10)
  paste("The result is:", k1)
}
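In R, `if`/`else` is an expression that returns a value. It can therefore be assigned directly, and longer chains of conditions use `else if`. The function `sign_label` is a hypothetical example.

```r
x <- 100
k <- if (x > 0) log10(x) else NA   # the whole if/else yields a value
paste("result of k:", k)           # "result of k: 2"

# Hypothetical example of a chain of conditions using else if
sign_label <- function(v) {
  if (v < 0) {
    "negative"
  } else if (v == 0) {
    "zero"
  } else {
    "positive"
  }
}
sign_label(-5)                     # "negative"
```

At the top level of the console, `else` must be on the same line as the closing brace of the `if` branch. Otherwise R treats the `if` as already complete.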
38
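switch
switch(EXPR, ...)
x <- 3
switch(x, 2 + 2, mean(1:10), rnorm(4), h = 3)
switch(x, 2 + 2, mean(1:10), (rnorm(4)), h = 3)
switch(x, (2 + 2), mean(1:10), {rnorm(4)}, h = 3)
Because x is the number 3, each call returns the third alternative, rnorm(4). The syntax is very flexible: an alternative may be a plain expression, a parenthesized expression, or a braced block.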
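`switch()` behaves differently depending on EXPR. A number selects an alternative by position. A character string selects one by name, and an unnamed final alternative serves as the default. An empty named alternative falls through to the next one. The function `animal_sound` is hypothetical.

```r
switch(2, "a", "b", "c")       # numeric EXPR: the 2nd alternative, "b"
is.null(switch(5, "a", "b"))   # out of range: switch returns NULL invisibly

# Hypothetical example of character matching
animal_sound <- function(animal) {
  switch(animal,
         dog    = "woof",
         cat    = ,            # empty: falls through to the next alternative
         kitten = "meow",
         "unknown")            # unnamed last alternative: the default
}
animal_sound("cat")            # "meow"
animal_sound("cow")            # "unknown"
```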
39
for loop
for (name in expr1) expr2
name: the loop variable, which takes each value of expr1 in turn
Sum <- 0
for (i in 1:100) {
  Sum <- Sum + i
}
paste("This is result:", Sum)
N <- 2
switch(N,
       {paste("I am the first:", 1)},
       {total <- 0; for (i in 1:100) {total <- total + i}; paste("This is second:", total)},
       rnorm(4))
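The loop above adds 1 to 100 one term at a time. As a quick cross-check, the sketch below compares it with R's vectorized `sum()` and with the closed form n(n + 1)/2. In everyday R code, vectorized functions like `sum()` are usually preferred over explicit loops.

```r
Sum <- 0
for (i in 1:100) {
  Sum <- Sum + i       # add each value of i in turn
}
Sum                    # 5050

Sum == sum(1:100)      # TRUE: same total without an explicit loop
Sum == 100 * 101 / 2   # TRUE: closed form n(n + 1) / 2
```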
40
Hilbert matrix
A Hilbert matrix is a square matrix whose elements are A(i, j) = 1 / (i + j - 1).
41
Multiple for loops
Constructing a Hilbert matrix of order 4:
n <- 4
x <- matrix(0, nrow = n, ncol = n)
for (i in 1:n) {
  for (j in 1:n) {
    x[i, j] <- 1 / (i + j - 1)
  }
}
42
Multiple for loops
> x
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.5000000 0.3333333 0.2500000
[2,] 0.5000000 0.3333333 0.2500000 0.2000000
[3,] 0.3333333 0.2500000 0.2000000 0.1666667
[4,] 0.2500000 0.2000000 0.1666667 0.1428571
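The nested loops work, but the same matrix can also be built without loops. `outer()` applies a function to every (row, column) pair of two index vectors. The helper `hilbert` is a hypothetical name. The sketch checks that it agrees with the loop version.

```r
# Hypothetical helper: order-n Hilbert matrix via outer()
hilbert <- function(n) {
  idx <- seq_len(n)
  # outer() evaluates the function for every (row, column) pair at once
  outer(idx, idx, function(a, b) 1 / (a + b - 1))
}
H <- hilbert(4)

# Loop version from the slide, for comparison
n <- 4
x <- matrix(0, nrow = n, ncol = n)
for (i in 1:n) {
  for (j in 1:n) {
    x[i, j] <- 1 / (i + j - 1)
  }
}
all.equal(H, x)   # TRUE
```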
43
while loop
while (condition) expr
As long as the condition is TRUE, the expression is executed repeatedly.
Sum1 <- 0
i <- 1
while (i < 101) {
  Sum1 <- Sum1 + i
  i <- i + 1
}
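A `while` loop tests its condition before every pass. This sketch shows two consequences. The loop variable ends one step past the last value that was processed. A loop whose condition is FALSE from the start never runs at all. The name `n_runs` is illustrative.

```r
Sum1 <- 0
i <- 1
while (i < 101) {      # the condition is checked before each pass
  Sum1 <- Sum1 + i
  i <- i + 1
}
Sum1                   # 5050
i                      # 101: the first value for which i < 101 is FALSE

n_runs <- 0
while (FALSE) n_runs <- n_runs + 1
n_runs                 # 0: the body was never executed
```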
44
repeat loop
repeat expr
A repeat loop has no condition of its own, so it needs a break statement to exit the loop.
Sum2 <- 0
i <- 1
repeat {
  Sum2 <- Sum2 + i
  i <- i + 1
  if (i > 100) break
}
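Unlike `while`, a `repeat` body always runs at least once, because nothing is checked until the code inside reaches `break`. This sketch also shows `next`, which skips the rest of the current pass. Here it is used to add up only the odd numbers from 1 to 100. The names `odd_sum` and `j` are illustrative.

```r
Sum2 <- 0
i <- 1
repeat {
  Sum2 <- Sum2 + i
  i <- i + 1
  if (i > 100) break    # without break, repeat would loop forever
}
Sum2                    # 5050

odd_sum <- 0
j <- 0
repeat {
  j <- j + 1
  if (j > 100) break
  if (j %% 2 == 0) next # skip even numbers
  odd_sum <- odd_sum + j
}
odd_sum                 # 2500, i.e. 1 + 3 + ... + 99
```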
45
References
Logan, M. Biostatistical Design and Analysis Using R: A Practical Guide. Wiley-Blackwell.
MacFarland, T. W. Introduction to Data Analysis and Graphical Presentation in Biostatistics with R. Springer.