1
Modern Biostatistics
Sihua Peng, PhD, Shanghai Ocean University, 2018.10
2
Four VIPs in statistics
Gosset Pearson Fisher Neyman
3
William Sealy Gosset
William Sealy Gosset (1876–1937) was an English statistician. He published under the pen name "Student" and developed Student's t-distribution.
4
Karl Pearson
Karl Pearson (1857–1936) was an English mathematician and biostatistician. He has been credited with establishing the discipline of mathematical statistics. In 1911 he founded the world's first university statistics department at University College London. Many familiar statistical terms, such as standard deviation, principal component analysis, and the chi-square test, were proposed by him.
5
Ronald Fisher
Sir Ronald Aylmer Fisher (1890–1962) was an English statistician and biologist. Many familiar statistical terms, such as the F-distribution, Fisher's linear discriminant, Fisher's exact test, Fisher's permutation test, and the von Mises–Fisher distribution, bear his name or derive from his work.
6
Jerzy Neyman
Jerzy Neyman (1894–1981) was a Polish mathematician and statistician who spent most of his professional career at the University of California, Berkeley. Neyman was the first to introduce the modern concept of a confidence interval into statistical hypothesis testing.
7
References
8
Dr. Murray Logan
Dr. Logan is the author of our textbook and an associate lecturer in the School of Biological Sciences, Monash University, Australia. The data sets in this book:
9
Contents
  Introduction to R
  Data sets
  Introductory Statistical Principles
  Sampling and experimental design with R
  Graphical data presentation
  Simple hypothesis testing
  Introduction to Linear models
  Correlation and simple linear regression
  Single factor classification (ANOVA)
  Nested ANOVA
  Factorial ANOVA
  Simple Frequency Analysis
10
1. Introduction to R
R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics, University of Auckland, New Zealand, during the 1990s.
11
VIPs of R Ross Ihaka Robert Gentleman
12
What R does and does not do
What R does:
  data handling and storage: numeric, textual
  matrix algebra
  hash tables and regular expressions
  high-level data analytic and statistical functions
  classes ("OO")
  graphics
  programming language: loops, branching, subroutines
What R does not do:
  is not a database, but connects to DBMSs
  the language interpreter can be very slow, but allows calling your own C/C++ code
  no spreadsheet view of data, but connects to Excel/MS Office
  no professional / commercial support
13
Download R
14
Download R
15
Install R
16
The R environment
After installation, you can run R.
17
The R environment
Object: R is an object-oriented language and everything in R is an object. For example, a single number is an object, a variable is an object, output is an object, a data set is an object that is itself a collection of objects, etc.
Vector: A collection of one or more objects of the same type (e.g. all numbers or all characters).
Function: A set of instructions carried out on one or more objects. Functions are typically used to perform specific and common tasks that would otherwise require many instructions.
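As a minimal sketch of these three terms, the lines below create a single number, a vector, and use a built-in function. The object names `n` and `lengths_cm` and their values are made up for illustration only.

```r
# A single number is an object; class() reports what kind of object it is.
n <- 42
class(n)             # "numeric"

# A vector collects entries of the same type (hypothetical measurements).
lengths_cm <- c(12.5, 14.1, 9.8)
class(lengths_cm)    # "numeric"
length(lengths_cm)   # 3

# A function is itself an object that carries out instructions on other objects.
class(mean)          # "function"
mean(lengths_cm)     # average of the three measurements
```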
18
The R environment
Parameter: The kind of information that can be passed to a function.
Argument: The specific information passed to a function to determine how the function should perform its task.
Operator: A symbol that has a pre-defined meaning. Familiar operators include + - * and /, which respectively perform addition, subtraction, multiplication and division.
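The difference between a parameter and an argument is easiest to see in a small function. The function `rounded_mean` below is hypothetical and exists only to illustrate the vocabulary.

```r
# 'x' and 'digits' are the parameters: the kinds of information the
# (hypothetical) function rounded_mean() can accept.
rounded_mean <- function(x, digits = 1) {
  round(mean(x), digits)
}

# c(2, 4, 9) and 2 are the arguments: the specific values passed in this call.
result <- rounded_mean(c(2, 4, 9), digits = 2)
result        # 5

# Operators are symbols with a pre-defined meaning; in R they can also be
# called like ordinary functions.
`+`(2, 3)     # 5
```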
19
Expressions, Assignment and Arithmetic
> 2 + 3                  # an expression; the evaluated output follows
[1] 5
> VAR1 <- 2 + 3          # assign expression to the object VAR1
> VAR2 <- 9              # assign expression to the object VAR2
> VAR2 - 1               # print the contents of VAR2 minus 1
[1] 8
> ANS1 <- VAR1 * VAR2    # evaluated expression assigned to ANS1
> ANS1                   # print the contents of ANS1
[1] 45
20
Expressions, Assignment and Arithmetic
Objects can be concatenated (joined together) to create objects with multiple entries using the c() (concatenation) function.
> c(1, 2, 6)             # concatenate 1, 2 and 6
[1] 1 2 6
> c(VAR1, ANS1)          # concatenate the contents of VAR1 and ANS1
[1]  5 45
21
R workspaces
Workspaces: Throughout an R session, all objects that have been created are stored within the R global environment, called the workspace.
> ls()                   # list current objects in the R environment
[1] "ANS1" "VAR1" "VAR2"
> rm(VAR1, VAR2)         # remove the VAR1 and VAR2 objects
> rm(list = ls())        # remove all user-defined objects
22
R workspaces
save.image()  saves the workspace and thus all its objects (vectors, functions, etc.)
q()           quits R
getwd()       displays the current working folder
setwd()       sets the working folder
help()        opens the help page for a function:
> help(mean)
> ?mean
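A short sketch of how these commands fit together is given below. It assumes you are happy to write into R's temporary folder; the file name `demo_workspace.RData` and the object `demo_value` are placeholders chosen for this example.

```r
old_dir <- getwd()            # remember the current working folder
setwd(tempdir())              # switch to a scratch folder for this demo

demo_value <- 1:3             # a hypothetical object to keep
save.image(file = "demo_workspace.RData")   # write the workspace to a file

saved <- file.exists("demo_workspace.RData")
saved                         # TRUE if the workspace file was written

setwd(old_dir)                # return to the original working folder
```

A saved workspace can later be restored with load("demo_workspace.RData") from the same folder.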
23
Vectors - variables
The basic data storage unit in R is called a vector. A vector is a collection of one or more entries of the same class (type).
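The "same class" rule can be checked directly. In the sketch below (all values are invented), mixing types inside c() makes R convert every entry to a common class.

```r
counts <- c(4, 8, 15)          # a numeric vector
class(counts)                  # "numeric"

species <- c("cod", "eel")     # a character vector
class(species)                 # "character"

# Mixed entries are coerced to one class, here character.
mixed <- c(1, "a", TRUE)
class(mixed)                   # "character"
mixed                          # "1" "a" "TRUE"

counts[2]                      # entries are accessed by position: 8
```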
24
Factors
To properly accommodate factorial (categorical) variables, R has an additional class of vector called a factor, which stores the vector along with a list of the levels of the factorial variable. The factor() function converts a vector into a factor vector.
> SHADE <- c("no", "no", "no", "no", "no", "full", "full", "full", "full", "full")
> SHADE
 [1] "no"   "no"   "no"   "no"   "no"   "full" "full" "full" "full" "full"
> SHADE <- factor(SHADE)
> SHADE
 [1] no   no   no   no   no   full full full full full
Levels: full no
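To see what a factor stores, the following sketch inspects a shortened SHADE vector. The reordering step at the end is an optional extra, not something the slide prescribes.

```r
SHADE <- factor(c("no", "no", "full", "full", "full"))

levels(SHADE)        # "full" "no"  (levels are sorted alphabetically by default)
as.integer(SHADE)    # 2 2 1 1 1    (each entry is stored as a level index)
table(SHADE)         # counts per level: full 3, no 2

# The level order can be set explicitly if "no" should come first.
SHADE2 <- factor(SHADE, levels = c("no", "full"))
levels(SHADE2)       # "no" "full"
```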
25
Matrices
A vector has only a single dimension: it has length. However, vectors can be combined into a matrix (a 2-dimensional array).
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY1 <- cbind(X, Y)    # X and Y become columns
XY2 <- rbind(X, Y)    # X and Y become rows
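The difference between cbind() and rbind() shows up in the dimensions of the result. The sketch below also uses matrix(), which is not on the slide, as one more way to turn a vector into a matrix.

```r
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)

dim(cbind(X, Y))    # 5 2: five rows, two columns
dim(rbind(X, Y))    # 2 5: two rows, five columns

# matrix() reshapes a single vector; values fill column by column by default.
m <- matrix(1:6, nrow = 2)
m[1, ]              # 1 3 5
```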
26
To access the data in matrices
XY1[1,]      first row
XY1[,2]      second column
XY1[2,2]     the value in the second row and second column
XY1[1:3,]    rows 1 to 3
XY1[,1:2]    columns 1 to 2
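Applied to the X and Y data used for XY1, the indexing forms give the values below. Selecting by column name and by a logical condition are additional options shown here as a sketch.

```r
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY1 <- cbind(X, Y)

XY1[2, 2]      # 12.93
XY1[1, ]       # first row: X = 16.92, Y = 8.37
XY1[, "Y"]     # a column can also be selected by name

# Rows can be selected with a logical condition, here X greater than 15.
big_x <- XY1[XY1[, "X"] > 15, ]
nrow(big_x)    # 3 (rows 1, 2 and 4)
```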
27
Data frames
Data frames are generated by combining multiple vectors together such that each vector becomes a separate column in the data frame. In this way, a data frame is similar to a matrix in which each column can represent a different vector type. We will discuss data frames in detail in the next chapter.
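As a small preview, the sketch below combines a factor and a numeric vector into one data frame. The name `plots` and the values are invented for illustration.

```r
plots <- data.frame(
  SHADE  = factor(c("no", "no", "full", "full")),
  HEIGHT = c(16.92, 24.03, 7.61, 15.49)     # hypothetical measurements
)

sapply(plots, class)    # SHADE is "factor", HEIGHT is "numeric"
nrow(plots)             # 4

# Columns are accessed with $ and can be filtered by another column.
plots$HEIGHT[plots$SHADE == "full"]    # 7.61 15.49
```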
28
Working with scripts
A collection of one or more commands is called a script. In R, a script is a plain text file with a separate command on each line, and it can be created and read in any text editor. A script is read into R by providing the full filename of the script file as an argument to the source() function.
> source("filename.R")
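To try source() without creating a file by hand, the sketch below writes a two-line script into a temporary file and then runs it. The path comes from tempfile(), and the object names A and B are arbitrary.

```r
# Create a small script file in R's temporary folder.
script_path <- tempfile(fileext = ".R")
writeLines(c("A <- 2 + 3",
             "B <- A * 9"), script_path)

# Run every command in the script, in order.
source(script_path)

B    # 45
```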
29
A typical script may look like the following:
30
R Packages
What is an R package?
An R package is a collection of functions with detailed descriptions and examples. Each package contains R functions, data, help files, description files, and so on.
31
Common R packages (1)
Name          Description
ade4          Using the Euclidean method to analyze ecological data
ape           Phylogeny and evolutionary analysis
apTreeshape   Phylogenetic tree analysis
cluster       Cluster analysis
geiger        Speciation rate and evolutionary analysis
ouch          Phylogenetic comparison
pgirmess      Ecological data analysis
32
Common R packages (2)
Name          Description
phangorn      Phylogenetic analysis
picante       Analysis of phylogenetic diversity of communities
seqinr        DNA sequence analysis
SDMTools      Species distribution model tools
vegan         Ordination of vegetation and plant communities, and biodiversity calculation
graphics      Plotting figures
lattice       Lattice graphics
33
R Packages
How to install a package?
Install "ade4"
34
How to install a package?
A simple way to install an R package:
> install.packages("vegan")
35
Using Packages
A package must first be loaded before its functions can be used, so loading the package is the first step. In the console, enter the following command:
library(vegan)
The functions within a package are then used just like the basic functions built into R.
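Because library() stops with an error when a package is missing, it can help to check first. The following is a sketch of one way to do that, assuming "vegan" is the package of interest; the message text is made up.

```r
pkg <- "vegan"

# requireNamespace() returns TRUE or FALSE instead of raising an error.
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  library(pkg, character.only = TRUE)     # load and attach the package
} else {
  message("Package '", pkg, "' is not installed; run install.packages() first.")
}

# A function can also be called without attaching its package, via '::'.
stats::median(c(1, 5, 9))    # 5
```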
36
Control Flow
Branch statements: if/else; switch.
Loop statements: for; while; repeat.
37
if/else
if (condition) expr
x = 100
if (x > 0) k <- log10(x)
paste("result of k:", k)

if (condition) expr1 else expr2
x1 = 1000
if (x1 < 0) {
  paste("You entered a negative value:", x1)
} else {
  k1 <- log(x1, 10)
  paste("The result is:", k1)
}
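The same branch can be wrapped in a function so that it is easy to try with different inputs. The name `safe_log10` and the decision to treat zero like a negative value are assumptions made for this sketch.

```r
# Hypothetical helper: take log10 only when the input is positive.
safe_log10 <- function(x) {
  if (x <= 0) {
    paste("You entered a non-positive value:", x)
  } else {
    paste("The result is:", log10(x))
  }
}

safe_log10(1000)    # "The result is: 3"
safe_log10(-5)      # "You entered a non-positive value: -5"
```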
38
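Switch: switch(statement, list)
x = 3
switch(x, 2 + 2, mean(1:10), rnorm(4), h = 3)
switch(x, 2 + 2, mean(1:10), (rnorm(4)), h = 3)
Here x = 3 selects the third alternative, rnorm(4). Each alternative can be written bare, in parentheses such as (2 + 2), or in braces such as {rnorm(4)}. The writing method is very flexible.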
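switch() behaves differently for numbers and for character strings, which the sketch below tries to make concrete. The function `describe` and its labels are invented for illustration.

```r
# With a number, switch() picks the alternative at that position.
switch(2, 2 + 2, mean(1:10), rnorm(4))    # 5.5

# With a character string, switch() matches the names of the alternatives;
# a final unnamed alternative acts as the default.
describe <- function(type) {
  switch(type,
         mean   = "average",
         median = "middle value",
         "unknown")
}
describe("median")    # "middle value"
describe("mode")      # "unknown"

# A number beyond the last alternative yields NULL.
is.null(switch(5, "a", "b"))    # TRUE
```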
39
for loop: > for (name in expr1) expr2
name: the loop variable
Sum = 0
for (i in 1:100) {
  Sum = Sum + i
}
paste("This is result:", Sum)

N = 2
switch(N,
  {paste("I am the first:", 1)},
  {sum = 0; for (i in 1:100) {sum = sum + i}; paste("This is second:", sum)},
  rnorm(4))
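The loop result can be checked against R's built-in sum(), and the same pattern fills a vector entry by entry. The names `total` and `squares` are arbitrary choices for this sketch.

```r
# Add up 1..100 with a loop and compare with the vectorised sum().
total <- 0
for (i in 1:100) {
  total <- total + i
}
total          # 5050
sum(1:100)     # 5050

# seq_along() produces the positions of a vector, which is handy for filling it.
squares <- numeric(5)
for (i in seq_along(squares)) {
  squares[i] <- i^2
}
squares        # 1 4 9 16 25
```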
40
Hilbert matrix
A Hilbert matrix is a square matrix whose elements are A(i, j) = 1 / (i + j - 1).
41
Multiple for loops
Constructing a Hilbert matrix of order 4
> n = 4; x = matrix(0, nrow = n, ncol = n)
> for (i in 1:n) {
    for (j in 1:n) {
      x[i, j] = 1 / (i + j - 1)
    }
  }
42
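Multiple for loops
> x
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.5000000 0.3333333 0.2500000
[2,] 0.5000000 0.3333333 0.2500000 0.2000000
[3,] 0.3333333 0.2500000 0.2000000 0.1666667
[4,] 0.2500000 0.2000000 0.1666667 0.1428571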
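Nested loops are not the only way to build this matrix. As an alternative sketch, outer() evaluates a function for every (i, j) pair at once; the function names `hilbert_loop` and `hilbert_outer` are made up for this comparison.

```r
# Nested-loop construction of an order-n Hilbert matrix.
hilbert_loop <- function(n) {
  x <- matrix(0, nrow = n, ncol = n)
  for (i in 1:n) {
    for (j in 1:n) {
      x[i, j] <- 1 / (i + j - 1)
    }
  }
  x
}

# Vectorised alternative: outer() builds all i + j - 1 values in one call.
hilbert_outer <- function(n) {
  idx <- seq_len(n)
  1 / outer(idx, idx, function(i, j) i + j - 1)
}

all.equal(hilbert_loop(4), hilbert_outer(4))    # TRUE
```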
43
while loop: > while (condition) expr
As long as the condition is satisfied, the expression is executed.
Sum1 = 0
i = 1
while (i < 101) {
  Sum1 = Sum1 + i
  i = i + 1
}
44
Repeat loop: > repeat expr
The repeat loop requires a break statement to jump out of the loop.
Sum2 = 0
i = 1
repeat {
  Sum2 = Sum2 + i
  i = i + 1
  if (i > 100) break
}
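while and repeat are most useful when the number of iterations is not known in advance. The sketch below, with an arbitrarily chosen limit of 1000, counts how many doublings are needed to exceed that limit, once with each loop type.

```r
limit <- 1000    # arbitrary threshold for this example

# while: the condition is tested before every pass.
value <- 1
steps_while <- 0
while (value <= limit) {
  value <- value * 2
  steps_while <- steps_while + 1
}

# repeat: loops until break is reached, so the test sits inside the body.
value <- 1
steps_repeat <- 0
repeat {
  value <- value * 2
  steps_repeat <- steps_repeat + 1
  if (value > limit) break
}

steps_while     # 10 (2^10 = 1024 is the first power of 2 above 1000)
steps_repeat    # 10
```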
45
References
Murray Logan. Biostatistical Design and Analysis Using R: A Practical Guide. Wiley-Blackwell.
Thomas W. MacFarland. Introduction to Data Analysis and Graphical Presentation in Biostatistics with R. Springer.