Sihua Peng, PhD Shanghai Ocean University


1 Modern Biostatistics
Sihua Peng, PhD, Shanghai Ocean University, 2016.10

2 Four VIPs in statistics
Gosset, Pearson, Fisher, Neyman

3 William Sealy Gosset
William Sealy Gosset (1876–1937) was an English statistician. He published under the pen name "Student" and developed Student's t-distribution.

4 Karl Pearson
Karl Pearson (1857–1936) was an English mathematician and biostatistician. He is credited with establishing the discipline of mathematical statistics. In 1911 he founded the world's first university statistics department, at University College London. Many familiar statistical terms and methods, such as the standard deviation, principal component analysis, and the chi-square test, were introduced by him.

5 Ronald Fisher
Sir Ronald Aylmer Fisher (1890–1962) was an English statistician and biologist. Many familiar statistical methods bear his name or grew out of his work, such as the F-distribution (named in his honor by George Snedecor), Fisher's linear discriminant, Fisher's exact test, Fisher's permutation test, and the von Mises–Fisher distribution. The F-distribution arises frequently as the null distribution of a test statistic, most notably in the analysis of variance (ANOVA).
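To make the role of the F-distribution in ANOVA concrete, here is a minimal R sketch. It assumes a small, made-up data set (the values and the names `y` and `group` are hypothetical), computes the one-way ANOVA F statistic by hand, takes its upper-tail probability from the F-distribution with `pf()`, and cross-checks both numbers against R's built-in `anova(lm(...))` table.

```r
# Hypothetical data: three groups of four observations each
y <- c(4.2, 5.1, 4.8, 5.5,  6.3, 6.9, 7.1, 6.5,  5.0, 5.6, 5.2, 4.9)
group <- factor(rep(c("A", "B", "C"), each = 4))

# Between-group sum of squares: group means around the grand mean
grand_mean  <- mean(y)
group_means <- tapply(y, group, mean)
n_per_group <- tapply(y, group, length)
ss_between  <- sum(n_per_group * (group_means - grand_mean)^2)

# Within-group sum of squares: each observation around its own group mean
ss_within <- sum((y - ave(y, group))^2)

df_between <- nlevels(group) - 1          # k - 1
df_within  <- length(y) - nlevels(group)  # N - k
F_stat <- (ss_between / df_between) / (ss_within / df_within)

# Under H0 (all group means equal), F_stat follows F(df_between, df_within),
# so the p-value is the upper-tail probability of that F-distribution
p_value <- pf(F_stat, df_between, df_within, lower.tail = FALSE)

# Cross-check against R's built-in ANOVA table
tab <- anova(lm(y ~ group))
print(tab)
```

The hand calculation is only for illustration; in practice `anova()` or `aov()` produces the same F value and p-value directly.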

6 Jerzy Neyman
Jerzy Neyman (1894–1981) was a Polish mathematician and statistician who spent most of his professional career at the University of California, Berkeley. He introduced the modern concept of the confidence interval and, together with Egon Pearson, developed the Neyman–Pearson theory of hypothesis testing.

7 References

8 Dr. Murray Logan
Dr. Murray Logan is the author of our textbook and an associate lecturer in the School of Biological Sciences, Monash University, Australia. The data sets in this book:

9 Contents
- Introduction to R
- Data sets
- Introductory statistical principles
- Sampling and experimental design with R
- Graphical data presentation
- Simple hypothesis testing
- Introduction to linear models
- Correlation and simple linear regression
- Single factor classification (ANOVA)
- Nested ANOVA
- Factorial ANOVA
- Simple frequency analysis

10 1. Introduction to R
R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics, University of Auckland, New Zealand, during the 1990s.

11 VIPs of R
Ross Ihaka, Robert Gentleman

12 What R does and does not
What R does:
- data handling and storage: numeric, textual
- matrix algebra
- hash tables and regular expressions
- high-level data analytic and statistical functions
- classes ("OO")
- graphics
- programming language: loops, branching, subroutines
Limitations:
- it is not a database, but it connects to DBMSs
- its language interpreter can be very slow, but it can call your own C/C++ code
- it has no spreadsheet view of data, but it connects to Excel/MS Office
- it has no professional / commercial support

13 Download R

14 Download R

15 Install R

16 The R environment
Once R is installed, you can run it.

17 The R environment
Object: R is an object-oriented language, and everything in R is an object. For example, a single number is an object, a variable is an object, output is an object, and a data set is an object that is itself a collection of objects.
Vector: A collection of one or more objects of the same type (e.g. all numbers or all characters).
Function: A set of instructions carried out on one or more objects. Functions are typically used to perform specific, common tasks that would otherwise require many instructions.
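A short R sketch of these three ideas: a single number, a vector, and a function are all objects. The object names and values below, including the function `mean_plus_sd`, are hypothetical and chosen only for illustration.

```r
# A single number is an object
x <- 42

# Vectors: one or more entries of the same type
weights <- c(2.5, 3.1, 4.7)        # all numbers
species <- c("cod", "tuna", "eel") # all characters

class(x)         # "numeric"
length(weights)  # 3
class(species)   # "character"

# A function is also an object: a set of instructions applied to other objects.
# mean_plus_sd is a hypothetical example that bundles two steps into one call.
mean_plus_sd <- function(v) {
  mean(v) + sd(v)
}
result <- mean_plus_sd(weights)
print(result)
```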

18 The R environment
Parameter: The kind of information that can be passed to a function.
Argument: The specific information passed to a function to determine how the function should perform its task.
Operator: A symbol that has a predefined meaning. Familiar operators include +, -, * and /, which perform addition, subtraction, multiplication and division, respectively.
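The difference between a parameter and an argument is easiest to see in a call. In this sketch, `digits` is a parameter of the built-in `round()` function and `2` is the argument supplied for it; `scale_by` and its `multiplier` parameter are hypothetical names used only for illustration.

```r
# `digits` is a parameter of round(); the value 2 passed to it is an argument
round(3.14159, digits = 2)     # 3.14

# A hypothetical function with two parameters; `multiplier` has a default value
scale_by <- function(value, multiplier = 10) {
  value * multiplier
}
scale_by(3)                    # only `value` supplied, default multiplier used: 30
scale_by(3, multiplier = 2)    # both arguments supplied: 6

# The four arithmetic operators
7 + 2   # 9
7 - 2   # 5
7 * 2   # 14
7 / 2   # 3.5
```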

19 Expressions, Assignment and Arithmetic
> 2 + 3                # an expression
[1] 5                  # the evaluated output
> VAR1 <- 2 + 3        # assign the expression to the object VAR1
> VAR2 <- 9            # assign the expression to the object VAR2
> VAR2 - 1             # print the contents of VAR2 minus 1
[1] 8
> ANS1 <- VAR1 * VAR2  # assign the evaluated expression to ANS1
> ANS1                 # print the contents of ANS1
[1] 45                 # the evaluated output

20 Expressions, Assignment and Arithmetic
Objects can be concatenated (joined together) to create objects with multiple entries using the c() (concatenate) function.
> c(1, 2, 6)       # concatenate 1, 2 and 6
[1] 1 2 6          # printed output
> c(VAR1, ANS1)    # concatenate the contents of VAR1 and ANS1
[1]  5 45          # printed output

21 R workspaces
> ls()              # list the current objects in the R environment
[1] "ANS1" "VAR1" "VAR2"
> rm(VAR1, VAR2)    # remove the VAR1 and VAR2 objects
> rm(list = ls())   # remove all user-defined objects
Workspaces: Throughout an R session, all objects that have been created are stored in the R global environment, called the workspace.

22 R workspaces
getwd()        displays the current working folder
setwd()        sets the working folder
save.image()   saves the workspace, and thus all its objects (vectors, functions, etc.); by default to a file named .RData in the working folder
load()         loads a previously saved workspace, and thus all its objects
q()            quits R
help()         displays help for a function, e.g. > help(mean) or > ?mean
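A minimal sketch of saving and restoring a workspace. It assumes the code is run at the top level of an R session (so the objects live in the global environment), and it writes to a temporary folder via `tempdir()` rather than the working folder; the file name `session.RData` is hypothetical.

```r
VAR1 <- 2 + 3
VAR2 <- 9

# Save the whole workspace to a file in a temporary folder
ws_file <- file.path(tempdir(), "session.RData")
save.image(file = ws_file)

# Remove two objects, then restore them from the saved workspace
rm(VAR1, VAR2)
exists("VAR1")   # FALSE
load(ws_file)
exists("VAR1")   # TRUE
VAR1 * VAR2      # 45
```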

23 Vectors - variables
The basic data storage unit in R is called a vector. A vector is a collection of one or more entries of the same class (type).
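Because every entry of a vector must share one class, R silently converts (coerces) mixed entries to a common type. The sketch below shows numeric, character and logical vectors and what happens when they are mixed; the vector names and values are hypothetical.

```r
lengths_cm <- c(12.1, 15.3, 9.8)   # numeric vector
sites      <- c("A", "B", "C")     # character vector
shaded     <- c(TRUE, FALSE, TRUE) # logical vector

class(lengths_cm)  # "numeric"
class(sites)       # "character"
class(shaded)      # "logical"

# Mixing types forces every entry to a single class (here, character)
mixed <- c(1, "A", TRUE)
class(mixed)       # "character"
mixed              # "1" "A" "TRUE"

# Entries are accessed with square brackets; indexing starts at 1
lengths_cm[2]      # 15.3
```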

24 Factors
To properly accommodate factorial (categorical) variables, R has an additional class of vector called a factor, which stores the vector along with a list of the levels of the factorial variable. The factor() function converts a vector into a factor.
> SHADE <- c("no", "no", "no", "no", "no", "full", "full", "full", "full", "full")
> SHADE
 [1] "no"   "no"   "no"   "no"   "no"   "full" "full" "full" "full" "full"
> SHADE <- factor(SHADE)
> SHADE
 [1] no   no   no   no   no   full full full full full
Levels: full no
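By default, factor() sorts the levels alphabetically, which is why "full" is listed before "no" above. This sketch inspects the levels and shows how the `levels` argument of factor() fixes a different order; `SHADE2` is a hypothetical object added for illustration.

```r
SHADE <- factor(c("no", "no", "no", "no", "no",
                  "full", "full", "full", "full", "full"))
levels(SHADE)    # "full" "no"  (alphabetical by default)
nlevels(SHADE)   # 2
table(SHADE)     # number of entries at each level: full 5, no 5

# Setting the level order explicitly
SHADE2 <- factor(c("no", "no", "full"), levels = c("no", "full"))
levels(SHADE2)       # "no" "full"
as.integer(SHADE2)   # 1 1 2  (the internal integer codes follow the level order)
```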

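25 Matrices
A vector has only a single dimension: it has length. However, vectors can be arranged into a matrix (a two-dimensional array), for example by binding them together as columns with cbind() or as rows with rbind().
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY1 <- cbind(X, Y)   # X and Y become the two columns
XY2 <- rbind(X, Y)   # X and Y become the two rows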
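Using the same X and Y vectors, this sketch checks the shapes produced by cbind() and rbind() with dim(), and shows that a single vector can also be reshaped into a matrix with matrix(). The object name `M` is hypothetical.

```r
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)

XY1 <- cbind(X, Y)   # 5 rows x 2 columns
XY2 <- rbind(X, Y)   # 2 rows x 5 columns
dim(XY1)             # 5 2
dim(XY2)             # 2 5

# matrix() fills a single vector into the given dimensions, column by column
M <- matrix(c(X, Y), nrow = 5, ncol = 2)
all(M == XY1)        # TRUE: same values in the same positions
```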

26 To access the data in Matrices
XY1[1, ]     the first row
XY1[, 2]     the second column
XY1[2, 2]    the value in the second row and second column
XY1[1:3, ]   rows 1 to 3
XY1[, 1:2]   columns 1 to 2
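A runnable version of these indexing rules, using the XY1 matrix built from X and Y. The last line, which selects a column by its name, goes slightly beyond the slide and works because cbind() uses the vector names as column names.

```r
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY1 <- cbind(X, Y)

XY1[1, ]     # first row: X = 16.92, Y = 8.37
XY1[, 2]     # second column: the Y values
XY1[2, 2]    # row 2, column 2: 12.93
XY1[1:3, ]   # rows 1 to 3, returned as a 3 x 2 matrix
XY1[, "Y"]   # the same column as XY1[, 2], selected by name
```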

27 Data frames
Data frames are generated by combining multiple vectors so that each vector becomes a separate column of the data frame. A data frame is therefore similar to a matrix, except that each column can be a vector of a different type. We will discuss data frames in detail in the next chapter.
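As a preview, this sketch combines three vectors of different types into one data frame; a matrix could not hold them together without coercing everything to one type. The pairing of the X values with the SITE and SHADE vectors is hypothetical. `stringsAsFactors = FALSE` keeps SITE as character regardless of the R version in use.

```r
X     <- c(16.92, 24.03, 7.61, 15.49, 11.77)
SHADE <- factor(c("no", "no", "full", "full", "full"))
SITE  <- c("S1", "S2", "S3", "S4", "S5")

# Each vector becomes one column, and each column keeps its own type
DF <- data.frame(SITE, X, SHADE, stringsAsFactors = FALSE)
str(DF)
sapply(DF, class)          # SITE "character", X "numeric", SHADE "factor"

DF$X[2]                    # a column selected by name: 24.03
DF[DF$SHADE == "full", ]   # only the rows where SHADE is "full"
```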

28 Working with scripts
A collection of one or more commands is called a script. In R, a script is a plain text file with a separate command on each line, and it can be created and read in any text editor. A script is read into R by passing the full filename of the script file as an argument to the source() function.
> source("filename.R")
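To see source() in action without creating files by hand, this sketch writes a two-line script to a temporary file with writeLines() and then runs it. The file name `example_script.R` and the script contents are hypothetical. Calling `source(script_file, echo = TRUE)` would also print each command as it runs.

```r
# Write a small two-line script to a temporary file
script_file <- file.path(tempdir(), "example_script.R")
writeLines(c(
  "VAR1 <- 2 + 3",
  "ANS1 <- VAR1 * 9"
), script_file)

# source() reads the file and runs each command in turn
source(script_file)
ANS1   # 45
```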

29 A typical script may look like the following:

30 References
Logan, M. Biostatistical Design and Analysis Using R: A Practical Guide. Wiley-Blackwell.
MacFarland, T. W. Introduction to Data Analysis and Graphical Presentation in Biostatistics with R. Springer.
