Download presentation
Presentation is loading. Please wait.
Published byMoshe Eyles Modified over 9 years ago
1
Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU Empowered by Higher Education Quality Enhancement Project (HEQEP) Department of Statistics Rajshahi University, Rajshahi-6205, Bangladesh March 21-23, 2013 Installation and Data Structures of R
2
Statistical Programming Language S developed at Bell Labs, 1976. Licensed as S-Plus in 1983. 1990 : R An open source program similar to S Developed by Robert Gentleman and Ross Ihaka (Auckland, NZ) 1997: Developed international “R-core” team Updated versions available every couple months For more: http://cran.r-project.org/mirrors.htmlhttp://cran.r-project.org/mirrors.html History of R
3
R is a free computer programming language, developed by renowned Statisticians. It is open-source and runs on Windows, Linux and Macintosh. R has excellent graphing capabilities. R has an excellent built-in help system. R's language has a powerful, easy to learn syntax with many built-in statistical functions. The language is easy to extend with user-written functions. Advantage of R
4
To obtain and install R on your computer Choose the appropriate item from the “Packages” menu Go to http://cran.r-project.org/mirrors.htmlhttp://cran.r-project.org/mirrors.html to choose a mirror near you Click on your favorite operating system (Windows, Linux, or Mac) Download and install from the “base” To install additional packages Start R on your computer Here, CRAN = Comprehensive R Archive Network.
5
To obtain and install R on your computer
8
Double Click
11
Command Prompt Tools bar Menu bar The R Environment
12
For clear screen ctrl+ L The R Environment
13
> Creating a Script File
14
Working in R: As Calculator OperatorSymbol Addition+ Subtraction- Multiplication* Division/ Power^ or ** Numeric Operators 4 +2 =6 4 – 2 = 2 4 * 2 = 8 4 / 2 = 2 4 ^ 2 = 16
15
Numeric 5, 5.76, etc Logical Values corresponding to True or False Character Strings Sequences of characters (blue, male, Rahim, etc) Variables are assigned by the operator <- or = Data type need not to be declared. a = 5 (or, a <- 5) b = “blue” c = a^2 + 5 c > aetc Variables & Assignment Operator
16
Data Structure Vectors Matrices Arrays Factors Lists Data frames
17
c() to concatenate elements or sub-vectors rep() to repeat elements or patterns seq() to generate sequences > c(2, 7, 9) > [1] 2 7 9 > a = c(2, 7, 9) > b = c(3, 5, 8, a) > b > [1] 2 7 9 2 7 9 rep(value(s), number of repetition) > rep(5,10) [1] 5 5 5 5 5 5 5 5 5 5 > rep(c(2,4,6),3) [1] 2 4 6 2 4 6 2 4 6 Vector Here we introduce three functions, c, seq, and rep, that are used to create vectors in various situations. seq(initial value, Terminated value, increment) > seq(2, 10, 2) > [1] 2 4 6 8 10
18
h = c(21,25, 19, 22, 23, 20)# Numeric vector h [1] 21 25 19 22 23 20 name = c(“Rahim”, “Rani”, “Raju”) # Character vector name [1] “Rahim” “Rani” “Raju” c = h > 22 # Logical vector c [1] FALSE TRUE FALSE FALSE TRUE FALSE a = c(1,2,3,4,5) a [1] 1 2 3 4 5 a = 1:5 a [1] 1 2 3 4 5 Vector
19
w = c(1, 3, 5, 2, 10) > w[3] # the third element of w >[1] 5 > w[3:5] # the third to fifth element of w, inclusive >[1] 5 2 10 > w[w>3] # elements in w greater than 3 >w[-2]# all except the second element >[1] 1 5 2 10 > w[w>2 & w<=5)# greater than 2 and less than or equal to 5 Vector Indexing
20
w = c(1, 3, 5, 2, 10) length(w)sum(w) cumsum(w)min(w) max(w)range(w) sum(w)mean(w) median(w)var(w) std(w)summary(w) abs(10-50)sort(w) sort(w, decreasing=T)etc Vector Vector used in functions
21
Specific R keyword help(keyword) ?keyword HTML > ?mean # information on mean command > help(mean) > help(median) > help.start() CRAN Full Manual help.start() HTML Finding "vague" topic help.search(“topic”) ??topic Working in R: Using help
22
# Generate a 3 by 4 array > x <- 1:12 > dim(x) <- c(3,4) > x [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 The dim assignment function sets or changes the dimension attribute of x, causing R to treat the vector of 12 numbers as a 3 × 4 matrix. Notice that the storage is column-major; that is, the elements of the first column are followed by those of the second, etc. # Generate a 4 by 5 array > A <- array(1:20, dim = c(4,5)) > A [,1] [,2] [,3] [,4] [,5] [1,] 1 5 9 13 17 [2,] 2 6 10 14 18 [3,] 3 7 11 15 19 [4,] 4 8 12 16 20 Array & Matrix A matrix in mathematics is just a two-dimensional array of numbers. Matrices and arrays are represented as vectors with dimensions:
23
Array & Matrix A matrix in mathematics is just a two-dimensional array of numbers. Matrices and arrays are represented as vectors with dimensions: # 3 x 2 matrix of 0 > Y <- matrix(0, nrow=3, ncol=2) > Y [,1] [,2] [1,] 0 0 [2,] 0 0 [3,] 0 0 # Generate a 3 by 2 Matrix > A = matrix(1:12, nrow=3, byrow=T) > A [,1] [,2] [,3] [,4] [1,] 1 2 3 4 [2,] 5 6 7 8 [3,] 9 10 11 12 > A[,2] # 2nd column of matrix A [1] 2 6 10 > A[3, ] # 3rd row of matrix A [1] 9 10 11 12 > A[2,2] # (2, 2) th element of matrix A [1] 2 6 10
24
Basic operations – Matrix R commandPurpose (output) A+B addition of A and B matrices A * Belement by element products A %*% Bproduct of A and B matrices t(A)transpose of matrix A solve(A)inverse of matrix A cbind()forms matrices by binding together matrices horizontally, or column-wise rbind()forms matrices by binding together matrices vertically, or row-wise
25
> A.mat <- matrix(c(19,8,11,2,18,17,15,19,10),nrow=3) > A.mat [,1] [,2] [,3] [1,] 19 2 15 [2,] 8 18 19 [3,] 11 17 10 > inv.A <- solve(A.mat) # inverse of matrix A.mat > t(A.mat) # transpose of matrix A.mat > A.mat %*% inv.A Basic operations – Matrix
26
> a=matrix(1:9,nrow=3) > b=matrix(2:10, nrow=3) > a [,1] [,2] [,3] [1,] 1 4 7 [2,] 2 5 8 [3,] 3 6 9 > b [,1] [,2] [,3] [1,] 2 5 8 [2,] 3 6 9 [3,] 4 7 10 > cbind(a,b) [,1] [,2] [,3] [,4] [,5] [,6] [1,] 1 4 7 2 5 8 [2,] 2 5 8 3 6 9 [3,] 3 6 9 4 7 10 > rbind(a,b) [,1] [,2] [,3] [1,] 1 4 7 [2,] 2 5 8 [3,] 3 6 9 [4,] 2 5 8 [5,] 3 6 9 [6,] 4 7 10 Basic operations – Matrix Cov.matrix = cov(b)Cor.matrix = cor(b) Row.mean = apply(b, 1, mean)Col.mean = apply(b, 2, mean) NOTE: apply(X, MARGIN, FUN)
27
vector: an ordered collection of data of the same type. > a = c(7,5,1) > a[2] [1] 5 list: an ordered collection of data of arbitrary types. > a = list(Name="Rahim",age=c(12, 23,10), Married = F) > a $Name [1] "Rahim" $age [1] 12 23 10 $Married [1] FALSE Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string). List
28
Data frames Data frame is supposed to represent the typical data table that researchers come up with – like a spreadsheet. It is a rectangular table with rows and columns with same length; data within each column has the same type (e.g. number, text, logical), but different columns may have different types. Example: > a localisation tumorsize progress 1 proximal 6.3 FALSE 2 distal 8.0 TRUE 3 proximal 10.0 FALSE
29
We illustrate how to construct a data frame from the following car data. MakeModelCylinderWeightMileageType HondaCivicV4217033Sporty Chevrolet BerettaV4265526Compact FordEscortV4234533Small EagleSummitV4256033Small VolkswagenJettaV4233026Small BuickLe SabreV6332523Large MitsubishiGalantV4274525Compact DodgeGrand CaravanV6373518Van ChryslerNew YorkerV6345022Medium AcuraLegendV6326520Medium Making data frames
30
> Make <- c("Honda","Chevrolet","Ford","Eagle","Volkswagen","Buick","Mitsbusihi", + "Dodge","Chrysler","Acura") > Model <- c("Civic","Beretta","Escort","Summit","Jetta","Le Sabre","Galant", + "Grand Caravan","New Yorker","Legend") > Cylinder <-c (rep("V4",5),"V6","V4",rep("V6",3)) > Weight <- c(2170, 2655, 2345, 2560, 2330, 3325, 2745, 3735, 3450, 3265) > Mileage <- c(33, 26, 33, 33, 26, 23, 25, 18, 22, 20) > Type <- c("Sporty","Compact",rep("Small",3),"Large","Compact","Van", + rep("Medium",2))
31
Now data.frame() function combines the six vectors into a single data frame. > Car Car MakeModelCylinderWeightMileageType 1 HondaCivicV4217033Sporty 2 Chevrolet BerettaV4265526Compact 3 FordEscortV4234533Small 4 EagleSummitV4256033Small 5 VolkswagenJettaV4233026Small 6 BuickLe SabreV6332523Large 7 MitsubishiGalantV4274525Compact 8 DodgeGrand CaravanV6373518Van 9 ChryslerNew YorkerV6345022Medium 10 AcuraLegendV6326520Medium Making data frames
32
> names(Car) [1] "Make" "Model" "Cylinder“ "Weight" "Mileage" "Type" > Car[1,] Make Model Cylinder Weight Mileage Type 1 Honda Civic V4 2170 33 Sporty > Car[10,4] [1] 3265 > Car$Mileage [1] 33 26 33 33 26 23 25 18 22 20 > mean(Car$Mileage) #average mileage of the 10 vehicles [1] 25.9 > min(Car$Weight) [1] 2170 Making data frames
33
> table(Car$Type) # gives a frequency table Compact Large Medium Small Sporty Van 2 1 2 3 1 1 > table(Car$Make, Car$Type) # Cross tabulation Compact Large Medium Small Sporty Van Acura 0 0 1 0 0 0 Buick 0 1 0 0 0 0 Chevrolet 1 0 0 0 0 0 Chrysler 0 0 1 0 0 0 Dodge 0 0 0 0 0 1 Eagle 0 0 0 1 0 0 Ford 0 0 0 1 0 0 Honda 0 0 0 0 1 0 Mitsbusihi 1 0 0 0 0 0 Volkswagen 0 0 0 1 0 0 Making data frames
34
> Make.Small <- Car$Make[Car$Type == "Small"] > summary(Car$Mileage) # gives summary statistics Min. 1st Qu. Median Mean 3rd Qu. Max. 18.00 22.25 25.50 25.90 31.25 33.00 Making data frames
35
> b = data.frame(x=rnorm(10), y=rnorm(10), z=rnorm(10)) > b x y z 1 -1.7651180 0.462309932 0.09230914 2 -0.7340731 -1.681826091 0.66648791 3 -0.4968900 1.728658405 -0.68281664 4 -1.3217873 0.307030157 0.24192745 5 -0.2070019 0.003892192 1.19591807 6 -0.9633084 0.060328696 -1.40424843 7 -1.1323626 1.079521099 1.63552915 8 -0.7301976 -1.422012899 -0.16695860 9 0.2979073 0.528152338 0.65995778 10 -0.5759655 0.655296337 -0.39156127 > cor(b) x y z x 1.0000000000 0.0007151043 0.12151913 y 0.0007151043 1.0000000000 -0.05770153 z 0.1215191317 -0.0577015345 1.00000000 > apply(b,1,var) [1] 1.42472853 1.39573092 1.80047438 0.85041478 0.57226442 0.56454121 [7] 2.14379987 0.39516798 0.03357767 0.44098693 Making data frames
36
> b = data.frame(x=rnorm(10), y=rnorm(10), z=rnorm(10)) > b x y z 1 -1.7651180 0.462309932 0.09230914 2 -0.7340731 -1.681826091 0.66648791 3 -0.4968900 1.728658405 -0.68281664 4 -1.3217873 0.307030157 0.24192745 5 -0.2070019 0.003892192 1.19591807 6 -0.9633084 0.060328696 -1.40424843 7 -1.1323626 1.079521099 1.63552915 8 -0.7301976 -1.422012899 -0.16695860 9 0.2979073 0.528152338 0.65995778 10 -0.5759655 0.655296337 -0.39156127 attach(b) lm.D9 <- lm(y ~ x)# Regression of y on x lm.D90 <- lm(weight ~ group - 1) # omitting intercept anova(lm.D9) summary(lm.D9 Making data frames
37
Data Entry using Data Editor R has a Data Editor with spreadsheet-like interface. The interface quite useful for small data sets. Suppose we want to construct a data frame based on following data RollBstat101Bstat102 47017880 47027565 47036070 47047268
38
To do this – type > result <- data.frame(Roll=integer(0), Bstat101=numeric(0), Bstat102=numeric(0)) > result <- edit(result) Then enter the data in the Data Editor and close Editor > result # To see the data > result <- edit(result) # To modify the data Data Entry using Data Editor
39
An entire data frame can be read directly with the read.table() function. # Reading data from Excel.csv File > data1 <- read.table(file= “d:/RFiles/data1.csv", header=T, sep=“,”) > data1 <- read.csv(file= “d:/RFiles/data1.csv", header=T ) > data1 # Reading data from text file data2 <- read.table(file= “d:/RFiles/data3.txt", header=T, sep=“\t” ) > data2 > attach(data1) > detach(data1) Reading data from File
40
Importing from other statistical systems Package foreign on cran provides import facilities for files produced by the following statistical software. > read.mtp # imports a `Minitab Portable Worksheet’ > read.xport # reads a file in SAS format > read.spss # reads files created by spss Package Rstreams on cran contain functions > readSfile # reads binary objects produced by S-PLUS > data.restore # reads S-PLUS data dumps (created by data.dump)
41
Thanks
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.