Published by Stuart Jefferson. Modified over 9 years ago.
1
Example of multivariate data
What is R?
R is a language and environment for statistical computing and graphics. It is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS. R can be extended easily via packages. About eight packages are supplied with the R distribution, and many more are available through the CRAN family of Internet sites, covering a very wide range of modern statistics.
2
The R environment
R is a fully planned and coherent system that includes:
- an effective data handling and storage facility,
- a suite of operators for calculations on arrays (matrices),
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display (on-screen or on hardcopy),
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
Download R for free at: http://www.r-project.org/
3
R Download
4
R Download
5
R Download
6
R packages
7
R Console
8
Import data in R
9
Import data in R
10
Install packages
11
Install packages
12
Install packages
13
R script
14
R script
15
RStudio
16
RStudio
17
Import data in RStudio
18
Install packages in RStudio
19
R in Linux
20
R in Linux
21
Essential commands in R
22
Vectors in R
# Character vector:
> c("Huey","Dewey","Louie")
[1] "Huey"  "Dewey" "Louie"
# Logical vector:
> c(T,T,F,T)
[1]  TRUE  TRUE FALSE  TRUE
# Numeric vector:
> c(2,3,5,7,9)
[1] 2 3 5 7 9
# Functions that create vectors: c ("concatenate"), seq ("sequence"), rep ("replicate")
> c(42,57,12,39)
[1] 42 57 12 39
> seq(4,9)
[1] 4 5 6 7 8 9
> rep(1:2,5)
[1] 1 2 1 2 1 2 1 2 1 2
> rep(1:2,c(3,4))
[1] 1 1 1 2 2 2 2
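The three vector-building functions above accept further arguments. As a minimal sketch (the variable names s1, s2, s3, r1 and r2 are illustrative and do not come from the slides), seq() can take a step size or a target length, and rep() can repeat each element instead of the whole vector:

```r
s1 <- seq(4, 9)                    # default step of 1, as on the slide
s2 <- seq(0, 1, by = 0.25)         # explicit step size
s3 <- seq(1, 10, length.out = 4)   # four equally spaced values from 1 to 10
r1 <- rep(1:2, each = 3)           # repeat each element three times
r2 <- rep(1:2, times = c(3, 4))    # same call as rep(1:2,c(3,4)) on the slide

print(s2)
print(s3)
print(r1)
```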
23
Factors in R
Factors: a data structure that makes it possible to assign meaningful names to categories.
> pain=c(0,3,2,2,1)
> fpain=factor(pain,levels=0:3)
> levels(fpain)=c("none","mild","medium","severe")
> fpain
[1] none   severe medium medium mild
Levels: none mild medium severe
> levels(fpain)
[1] "none"   "mild"   "medium" "severe"
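The two-step construction above (factor() followed by an assignment to levels()) can also be written in one call with the labels argument. The sketch below is one possible variant, not the presenter's code; counts and codes are illustrative names. It also shows how to tabulate a factor and inspect the integer codes stored underneath the names:

```r
pain <- c(0, 3, 2, 2, 1)

# One-step alternative: map levels 0..3 to names via 'labels'
fpain <- factor(pain, levels = 0:3,
                labels = c("none", "mild", "medium", "severe"))

counts <- table(fpain)       # number of observations per category
codes <- as.integer(fpain)   # underlying integer codes (1 = first level)

print(fpain)
print(counts)
print(codes)
```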
24
Matrices and arrays
> x=1:12
> dim(x)=c(3,4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> x=matrix(1:12,nrow=3,byrow=T)
> rownames(x)=LETTERS[1:3]
> x
  [,1] [,2] [,3] [,4]
A    1    2    3    4
B    5    6    7    8
C    9   10   11   12
> t(x)
     A B  C
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
LETTERS: built-in variable that contains the capital letters A-Z.
t(x): the transpose of the matrix x.
25
Matrices and arrays
# Use the functions cbind and rbind to "bind" vectors together columnwise or rowwise.
> cbind(A=1:4,B=5:8,C=9:12)
     A B  C
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> rbind(A=1:4,B=5:8,C=9:12)
  [,1] [,2] [,3] [,4]
A    1    2    3    4
B    5    6    7    8
C    9   10   11   12
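Once a matrix exists, individual elements, rows and margins can be extracted. The following sketch reuses the 3 x 4 matrix from the slides; the names el, row_b, rs and cs are illustrative only:

```r
x <- matrix(1:12, nrow = 3, byrow = TRUE)
rownames(x) <- LETTERS[1:3]

el <- x["B", 3]      # element in row "B", column 3
row_b <- x[2, ]      # entire second row, returned as a vector
rs <- rowSums(x)     # one sum per row
cs <- colSums(x)     # one sum per column

print(el)
print(row_b)
print(rs)
print(cs)
```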
26
Data frames
Data frame: a list of vectors and/or factors of the same length, which are related "across", such that data in the same position come from the same experimental unit (subject, animal, etc.).
> conc=c(5,12,20,24,35,40)
> vol=c(20,25,33,40,50,55)
> d=data.frame(conc,vol)
> d
  conc vol
1    5  20
2   12  25
3   20  33
4   24  40
5   35  50
6   40  55
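Because a data frame is a list of equal-length columns, each column can be pulled out by name and rows can be filtered with a logical condition. A short sketch using the same conc and vol values (n, third_vol and high are illustrative names):

```r
conc <- c(5, 12, 20, 24, 35, 40)
vol <- c(20, 25, 33, 40, 50, 55)
d <- data.frame(conc, vol)

str(d)                     # compact overview: 6 observations of 2 variables
n <- nrow(d)               # number of experimental units
third_vol <- d$vol[3]      # third value of the 'vol' column
high <- d[d$conc > 20, ]   # rows whose concentration exceeds 20

print(high)
```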
27
Data manipulation in R
Data: "Soil"
Soil properties of two adjacent locations on Wimbledon Common: a sandy lowland heath (site 1) and adjoining spoil mounds of calcareous clay (site 2).
Parameters:
- Site: site number
- rep: quadrat replicate number
- pH
- cond: electrical conductivity of soil solution
- OM: percentage organic matter composition of soil
- H2O: percentage water content of soil after drying to 105°F
28
Read data in R
A comment in R is marked with #.
# Import a .csv file:
> Soil=read.csv("E:/Multivariate_analysis/Data/Soil.csv",header=T)
> Soil
  Site rep  pH cond OM H2O
1    1   1 4.5   55 26  17
2    1   1 5.4   60 16  21
3    1   3 5.1   49 NA  18
4    1   4 4.8   55 27  18
5    2   1 7.6  155  5  25
6    2   2 7.8  124 NA  35
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
# Import a .txt file:
> Soil=read.table("E:/Multivariate_analysis/Data/Soil.txt",header=T)
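The file paths above exist only on the presenter's machine. To try the import step without that file, one option is the text argument of read.csv, which parses a character string as if it were a file. The sketch below assumes the CSV layout matches the printed table; csv_text is a hypothetical name:

```r
# Inline copy of the values printed on the slide (layout of Soil.csv is assumed)
csv_text <- "Site,rep,pH,cond,OM,H2O
1,1,4.5,55,26,17
1,1,5.4,60,16,21
1,3,5.1,49,NA,18
1,4,4.8,55,27,18
2,1,7.6,155,5,25
2,2,7.8,124,NA,35
2,3,7.2,141,6,32
2,4,7.3,166,8,29"

Soil <- read.csv(text = csv_text, header = TRUE)

str(Soil)    # column types as inferred by read.csv
print(Soil)
```

The string "NA" is read as a missing value by default, so the OM column stays numeric.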
29
Data manipulation in R
# Display the column names of the "Soil" data:
> names(Soil)
[1] "Site" "rep"  "pH"   "cond" "OM"   "H2O"
# Display the row names:
> rownames(Soil)
[1] "1" "2" "3" "4" "5" "6" "7" "8"
# Display the dimensions of the "Soil" data:
> dim(Soil)
[1] 8 6
# 8 rows (observations), 6 columns (variables)
30
Data manipulation in R
# Select the second column of the data:
> Soil[,2]
[1] 1 1 3 4 1 2 3 4
# or:
> Soil$rep
[1] 1 1 3 4 1 2 3 4
# Select the third row of the data:
> Soil[3,]
  Site rep  pH cond OM H2O
3    1   3 5.1   49 NA  18
# Select rows 2, 4, and 5:
> Soil[c(2,4,5),]
  Site rep  pH cond OM H2O
2    1   1 5.4   60 16  21
4    1   4 4.8   55 27  18
5    2   1 7.6  155  5  25
31
Data manipulation in R
# Display the length of the second column:
> length(Soil[,2])
[1] 8
# Add a new column log.pH containing the natural logarithm of pH:
> Soil2=transform(Soil,log.pH=log(Soil$pH))
> Soil2
  Site rep  pH cond OM H2O   log.pH
1    1   1 4.5   55 26  17 1.504077
2    1   1 5.4   60 16  21 1.686399
3    1   3 5.1   49 NA  18 1.629241
4    1   4 4.8   55 27  18 1.568616
5    2   1 7.6  155  5  25 2.028148
6    2   2 7.8  124 NA  35 2.054124
7    2   3 7.2  141  6  32 1.974081
8    2   4 7.3  166  8  29 1.987874
32
Data manipulation in R
# Delete the third column (pH) of the "Soil2" data:
> Soil3=Soil2[,-3]
> Soil3
  Site rep cond OM H2O   log.pH
1    1   1   55 26  17 1.504077
2    1   1   60 16  21 1.686399
3    1   3   49 NA  18 1.629241
4    1   4   55 27  18 1.568616
5    2   1  155  5  25 2.028148
6    2   2  124 NA  35 2.054124
7    2   3  141  6  32 1.974081
8    2   4  166  8  29 1.987874
33
Data manipulation in R
# Select the first four columns of the "Soil" data:
> Soil4=Soil[,1:4]
> Soil4
  Site rep  pH cond
1    1   1 4.5   55
2    1   1 5.4   60
3    1   3 5.1   49
4    1   4 4.8   55
5    2   1 7.6  155
6    2   2 7.8  124
7    2   3 7.2  141
8    2   4 7.3  166
34
Data manipulation in R
# Obtain a subset of the "Soil" data with cond > 100:
> Soil5=subset(Soil,Soil$cond>100)
> Soil5
  Site rep  pH cond OM H2O
5    2   1 7.6  155  5  25
6    2   2 7.8  124 NA  35
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
# Obtain a subset of the "Soil" data with cond > 100 and H2O < 32:
> Soil6=subset(Soil,Soil$cond>100&Soil$H2O<32)
> Soil6
  Site rep  pH cond OM H2O
5    2   1 7.6  155  5  25
8    2   4 7.3  166  8  29
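The same filter can be written in two other common ways: subset() evaluates column names inside the data frame, so the Soil$ prefix is optional, and plain logical indexing with square brackets gives the same rows here. The sketch below rebuilds the Soil values from the slides by hand so it runs on its own; by_subset and by_index are illustrative names:

```r
# Soil values as printed on the slides, entered by hand
Soil <- data.frame(
  Site = c(1, 1, 1, 1, 2, 2, 2, 2),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

by_subset <- subset(Soil, cond > 100 & H2O < 32)    # column names used directly
by_index  <- Soil[Soil$cond > 100 & Soil$H2O < 32, ] # logical row indexing

print(by_subset)
print(by_index)
```

The two forms differ when the condition itself contains NA: subset() drops such rows, whereas bracket indexing returns a row of NA for each. Neither cond nor H2O has missing values here, so the results agree.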
35
Data manipulation in R
# Obtain a subset of the "Soil" data with no missing values (NA) in OM:
> Soil7=subset(Soil,!is.na(Soil$OM))
> Soil7
  Site rep  pH cond OM H2O
1    1   1 4.5   55 26  17
2    1   1 5.4   60 16  21
4    1   4 4.8   55 27  18
5    2   1 7.6  155  5  25
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
# Obtain a subset of the "Soil" data with missing values (NA) in OM:
> Soil8=subset(Soil,is.na(Soil$OM))
> Soil8
  Site rep  pH cond OM H2O
3    1   3 5.1   49 NA  18
6    2   2 7.8  124 NA  35
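The is.na() filter above checks a single column. When missing values may occur in any column, base R offers complete.cases() and na.omit(), which check every column at once. This is an alternative to the slide's approach, sketched with the Soil values entered by hand; ok, Soil_complete and Soil_omit are illustrative names:

```r
# Soil values as printed on the slides, entered by hand
Soil <- data.frame(
  Site = c(1, 1, 1, 1, 2, 2, 2, 2),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

ok <- complete.cases(Soil)    # TRUE for rows with no NA in any column
Soil_complete <- Soil[ok, ]   # keep only the complete rows
Soil_omit <- na.omit(Soil)    # same rows, dropped in one call

print(Soil_complete)
```

Since OM is the only column with missing values in this data set, the result contains the same rows as Soil7.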
36
Data manipulation in R
# Identify which observations have pH < 7:
> which(Soil$pH<7)
[1] 1 2 3 4
# Observations (rows) 1, 2, 3, and 4 have pH < 7.
# Identify which observations have missing values for OM:
> which(is.na(Soil$OM))
[1] 3 6
# Observations 3 and 6 have missing values for OM.
# Identify which observation has pH = 5.4:
> which(Soil$pH==5.4)
[1] 2
# Identify which observations are not from Site 1:
> which(Soil$Site!=1)
[1] 5 6 7 8
37
Data manipulation in R
# Order the "Soil" data by pH (increasing):
> Soil9=Soil[order(Soil$pH),]
> Soil9
  Site rep  pH cond OM H2O
1    1   1 4.5   55 26  17
4    1   4 4.8   55 27  18
3    1   3 5.1   49 NA  18
2    1   1 5.4   60 16  21
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
5    2   1 7.6  155  5  25
6    2   2 7.8  124 NA  35
# Order the "Soil" data by pH (decreasing):
> Soil10=Soil[order(-Soil$pH),]
> Soil10
  Site rep  pH cond OM H2O
6    2   2 7.8  124 NA  35
5    2   1 7.6  155  5  25
8    2   4 7.3  166  8  29
7    2   3 7.2  141  6  32
2    1   1 5.4   60 16  21
3    1   3 5.1   49 NA  18
4    1   4 4.8   55 27  18
1    1   1 4.5   55 26  17
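Negating the column works for numeric data; order() also has a decreasing argument that does the same job, and it accepts several keys to break ties. The sketch below uses only the pH and cond columns from the slides (entered by hand); idx_dec and idx_tie are illustrative names:

```r
# pH and cond values as printed on the slides
Soil <- data.frame(
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166)
)

idx_dec <- order(Soil$pH, decreasing = TRUE)  # same ordering as order(-Soil$pH)
idx_tie <- order(Soil$cond, Soil$pH)          # sort by cond, ties broken by pH

print(Soil[idx_dec, ])
print(Soil[idx_tie, ])
```

Rows 1 and 4 share cond = 55, so the second key (pH) decides that row 1 comes before row 4.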
38
Data manipulation in R
# Save the "Soil10" data from the R console to your computer:
> write.table(Soil10,file="E:/Multivariate_analysis/pH_Order_Soil.csv",row.names=F,col.names=names(Soil10),quote=F,sep=",")
# Load a package in R (after installing it):
> library(MASS) # load the package called MASS
# Get help with R functions:
> help(read.table)
# or:
> ?read.table
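For comma-separated output, write.csv is a shorter alternative to write.table with sep=",". The sketch below is not the presenter's code: it writes a small three-row excerpt of the values to a temporary file (out_file is a hypothetical name, chosen so the example does not depend on an E: drive) and reads it back to confirm the round trip:

```r
# Three rows taken from the ordered data, entered by hand
Soil10 <- data.frame(
  Site = c(2, 2, 1),
  pH   = c(7.8, 7.6, 5.4),
  OM   = c(NA, 5, 16)
)

out_file <- tempfile(fileext = ".csv")                 # temporary path, assumed for illustration
write.csv(Soil10, file = out_file, row.names = FALSE)  # header row is written by default
back <- read.csv(out_file)                             # read the file back in

print(back)
unlink(out_file)                                       # remove the temporary file
```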
39
Get help in R
40
Simple summary statistics
# Calculate the mean, standard deviation, variance, median, sum, and maximum and minimum values for "cond" in the "Soil" data:
> mean(Soil$cond)
[1] 100.625
> sd(Soil$cond)
[1] 50.54824
> var(Soil$cond)
[1] 2555.125
> median(Soil$cond)
[1] 92
> sum(Soil$cond)
[1] 805
> max(Soil$cond)
[1] 166
> min(Soil$cond)
[1] 49
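The cond column has no missing values, so the functions above work directly. For a column such as OM, which contains NA, these functions return NA unless na.rm = TRUE is supplied. The sketch below illustrates this and also computes a per-site mean with tapply; it uses the Site, cond and OM values from the slides entered by hand, and om_raw, om_mean and site_means are illustrative names:

```r
# Site, cond and OM values as printed on the slides
Soil <- data.frame(
  Site = c(1, 1, 1, 1, 2, 2, 2, 2),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8)
)

om_raw  <- mean(Soil$OM)                 # NA, because OM contains missing values
om_mean <- mean(Soil$OM, na.rm = TRUE)   # mean of the six observed values

site_means <- tapply(Soil$cond, Soil$Site, mean)  # mean conductivity per site

print(om_raw)
print(om_mean)
print(site_means)
```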
41
Graphics in R
42
Graphics in R