Training on R For 3rd and 4th Year Honours Students, Dept. of Statistics, RU Empowered by Higher Education Quality Enhancement Project (HEQEP) Department.

Training on R
For 3rd and 4th Year Honours Students, Dept. of Statistics, RU
Empowered by Higher Education Quality Enhancement Project (HEQEP)
Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh
March 21-23, 2013
Installation and Data Structures of R

History of R
- The statistical programming language S was developed at Bell Labs, 1976.
- Licensed as S-Plus in 1983.
- 1990: R, an open-source program similar to S, developed by Robert Gentleman and Ross Ihaka (Auckland, NZ).
- 1997: The international "R-core" team was formed.
- Updated versions are available every couple of months.
- For more: http://cran.r-project.org/mirrors.html

Advantages of R
- R is a free computer programming language, developed by renowned statisticians.
- It is open source and runs on Windows, Linux and Macintosh.
- R has excellent graphing capabilities.
- R has an excellent built-in help system.
- R's language has a powerful, easy-to-learn syntax with many built-in statistical functions.
- The language is easy to extend with user-written functions.

To obtain and install R on your computer
- Go to http://cran.r-project.org/mirrors.html to choose a mirror near you.
- Click on your operating system (Windows, Linux, or Mac).
- Download and install from the "base".
To install additional packages
- Start R on your computer.
- Choose the appropriate item from the "Packages" menu.
Here, CRAN = Comprehensive R Archive Network.

[Screenshot: To obtain and install R on your computer; callout: Double Click]

The R Environment
[Screenshot of the R window with callouts: Menu bar, Toolbar, Command prompt]

The R Environment
To clear the screen, press Ctrl+L.

Creating a Script File

Working in R: As a Calculator
Numeric operators
  Operator        Symbol
  Addition        +
  Subtraction     -
  Multiplication  *
  Division        /
  Power           ^ or **
Examples
  4 + 2 = 6
  4 - 2 = 2
  4 * 2 = 8
  4 / 2 = 2
  4 ^ 2 = 16
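
The operator table above can be tried directly at the prompt. The following is a minimal sketch; the variable names are chosen here for illustration only, and the integer-division and remainder operators (%/% and %%) are additions not listed on the slide.

```r
# Basic arithmetic, stored in variables so the results can be inspected
add_result  <- 4 + 2      # 6
sub_result  <- 4 - 2      # 2
mul_result  <- 4 * 2      # 8
div_result  <- 4 / 2      # 2
pow_result  <- 4 ^ 2      # 16; 4 ** 2 gives the same value

# Two further operators (not on the slide)
int_div     <- 7 %/% 2    # integer division: 3
remainder   <- 7 %% 2     # remainder: 1

# Parentheses control the order of evaluation
grouped     <- (4 + 2) * 3   # 18
print(c(add_result, sub_result, mul_result, div_result, pow_result))
```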

Variables & Assignment Operator
- Numeric: 5, 5.76, etc.
- Logical: values corresponding to TRUE or FALSE
- Character strings: sequences of characters ("blue", "male", "Rahim", etc.)
- Variables are assigned with the operator <- or =.
- Data types need not be declared.
> a = 5            # or, a <- 5
> b = "blue"
> c = a^2 + 5
> c > a
etc.
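
As a small illustration of the three basic types, the sketch below assigns one value of each kind and asks R for its class. The names total and flag are chosen here for illustration; total is used instead of c so that the name of the built-in c() function is not reused.

```r
a <- 5                 # numeric
b <- "blue"            # character string
total <- a^2 + 5       # numeric, computed from another variable
flag <- total > a      # logical: the result of a comparison

# class() reports the type R inferred; no declaration was needed
class(a)       # "numeric"
class(b)       # "character"
class(flag)    # "logical"
print(flag)    # TRUE, since 30 > 5
```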

Data Structures
- Vectors
- Matrices
- Arrays
- Factors
- Lists
- Data frames

Vector
Here we introduce three functions, c, seq, and rep, that are used to create vectors in various situations.
- c() to concatenate elements or sub-vectors
- rep() to repeat elements or patterns
- seq() to generate sequences
> c(2, 7, 9)
[1] 2 7 9
> a = c(2, 7, 9)
> b = c(3, 5, 8, a)
> b
[1] 3 5 8 2 7 9
rep(value(s), number of repetitions)
> rep(5, 10)
 [1] 5 5 5 5 5 5 5 5 5 5
> rep(c(2,4,6), 3)
[1] 2 4 6 2 4 6 2 4 6
seq(initial value, terminal value, increment)
> seq(2, 10, 2)
[1]  2  4  6  8 10
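
The same three functions accept named arguments, which often makes the intent clearer. This is a brief sketch; the variable names are illustrative, and the by, length.out, times and each arguments are standard arguments of seq() and rep() that the slide does not show.

```r
# seq() with a named increment, and with a requested number of points
evens   <- seq(2, 10, by = 2)           # 2 4 6 8 10
quarter <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# rep() can repeat the whole pattern, or each element in turn
pattern <- rep(c(2, 4, 6), times = 3)   # 2 4 6 2 4 6 2 4 6
doubled <- rep(c(2, 4, 6), each = 2)    # 2 2 4 4 6 6

# c() flattens its arguments into a single vector
a <- c(2, 7, 9)
b <- c(3, 5, 8, a)                      # 3 5 8 2 7 9
print(b)
```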

Vector
> h = c(21, 25, 19, 22, 23, 20)        # Numeric vector
> h
[1] 21 25 19 22 23 20
> name = c("Rahim", "Rani", "Raju")    # Character vector
> name
[1] "Rahim" "Rani"  "Raju"
> c = h > 22                           # Logical vector
> c
[1] FALSE  TRUE FALSE FALSE  TRUE FALSE
> a = c(1,2,3,4,5)
> a
[1] 1 2 3 4 5
> a = 1:5
> a
[1] 1 2 3 4 5

Vector Indexing
> w = c(1, 3, 5, 2, 10)
> w[3]              # the third element of w
[1] 5
> w[3:5]            # the third to fifth elements of w, inclusive
[1]  5  2 10
> w[w>3]            # elements in w greater than 3
[1]  5 10
> w[-2]             # all except the second element
[1]  1  5  2 10
> w[w>2 & w<=5]     # greater than 2 and less than or equal to 5
[1] 3 5
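
Indexing also works on the left-hand side of an assignment, and which() returns the positions that satisfy a condition rather than the values. A short sketch under those two additions (neither appears on the slide; the variable names are illustrative):

```r
w <- c(1, 3, 5, 2, 10)

first_last <- w[c(1, 5)]          # pick elements by a vector of positions: 1 10
big_pos    <- which(w > 3)        # positions of elements greater than 3: 3 5
mid_vals   <- w[w > 2 & w <= 5]   # values in the interval (2, 5]: 3 5

# Replace the second element in a copy, leaving w itself unchanged
w2 <- w
w2[2] <- 30
print(w2)                         # 1 30 5 2 10
```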

Vectors used in functions
> w = c(1, 3, 5, 2, 10)
length(w)     sum(w)
cumsum(w)     min(w)
max(w)        range(w)
mean(w)       median(w)
var(w)        sd(w)
summary(w)    abs(10-50)
sort(w)       sort(w, decreasing=TRUE)
etc.
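
For reference, here is what several of these functions return for the vector w used above. This is a sketch with illustrative variable names; note that the standard deviation function in base R is sd().

```r
w <- c(1, 3, 5, 2, 10)

n_w      <- length(w)     # 5
sum_w    <- sum(w)        # 21
cum_w    <- cumsum(w)     # 1 4 9 11 21
range_w  <- range(w)      # 1 10 (minimum and maximum)
mean_w   <- mean(w)       # 4.2
median_w <- median(w)     # 3
var_w    <- var(w)        # 12.7 (sample variance, divisor n - 1)
sd_w     <- sd(w)         # square root of the sample variance
desc_w   <- sort(w, decreasing = TRUE)   # 10 5 3 2 1

print(summary(w))         # minimum, quartiles, mean and maximum in one call
```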

Working in R: Using help
- Specific R keyword: help(keyword) or ?keyword
- Full manual in HTML (CRAN): help.start()
- Finding a "vague" topic: help.search("topic") or ??topic
> ?mean            # information on mean command
> help(mean)
> help(median)
> help.start()

Array & Matrix
A matrix in mathematics is just a two-dimensional array of numbers. Matrices and arrays are represented as vectors with dimensions:
# Generate a 3 by 4 array
> x <- 1:12
> dim(x) <- c(3,4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
- The dim assignment function sets or changes the dimension attribute of x, causing R to treat the vector of 12 numbers as a 3 × 4 matrix.
- Notice that the storage is column-major; that is, the elements of the first column are followed by those of the second, etc.
# Generate a 4 by 5 array
> A <- array(1:20, dim = c(4,5))
> A
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
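
The column-major rule extends to arrays with more than two dimensions: the first index varies fastest, then the second, then the third. The sketch below illustrates this with a 2 × 3 × 4 array; the array B and its size are chosen here purely as an example.

```r
# A vector given a 3 x 4 shape: element [2, 3] is the 8th value stored
x <- 1:12
dim(x) <- c(3, 4)
elem_23 <- x[2, 3]              # 8

# A three-dimensional array: 4 "slices", each a 2 x 3 matrix
B <- array(1:24, dim = c(2, 3, 4))
dims_B   <- dim(B)              # 2 3 4
elem_231 <- B[2, 3, 1]          # 6: last element of the first slice
elem_112 <- B[1, 1, 2]          # 7: first element of the second slice

print(B[, , 1])                 # the first 2 x 3 slice
```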

Array & Matrix
# 3 x 2 matrix of 0
> Y <- matrix(0, nrow=3, ncol=2)
> Y
     [,1] [,2]
[1,]    0    0
[2,]    0    0
[3,]    0    0
# Generate a 3 by 4 matrix, filled row by row
> A = matrix(1:12, nrow=3, byrow=TRUE)
> A
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> A[,2]      # 2nd column of matrix A
[1]  2  6 10
> A[3, ]     # 3rd row of matrix A
[1]  9 10 11 12
> A[2,2]     # (2, 2)th element of matrix A
[1] 6

Basic operations – Matrix
  R command   Purpose (output)
  A + B       addition of A and B matrices
  A * B       element-by-element products
  A %*% B     product of A and B matrices
  t(A)        transpose of matrix A
  solve(A)    inverse of matrix A
  cbind()     forms matrices by binding together matrices horizontally, or column-wise
  rbind()     forms matrices by binding together matrices vertically, or row-wise
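
To make the difference between * and %*% concrete, the following sketch applies the operations from the table to two small 2 × 2 matrices. The matrices A and B and the result names are chosen here for illustration only.

```r
# matrix() fills column by column, so A is [2 1; 1 3] and B is [1 2; 0 1]
A <- matrix(c(2, 1, 1, 3), nrow = 2)
B <- matrix(c(1, 0, 2, 1), nrow = 2)

sum_AB   <- A + B         # element-wise sum:      [3 3; 1 4]
elem_AB  <- A * B         # element-wise product:  [2 2; 0 3]
prod_AB  <- A %*% B       # matrix product:        [2 5; 1 5]
t_B      <- t(B)          # transpose of B:        [1 0; 2 1]
inv_A    <- solve(A)      # inverse of A

# A times its inverse should be the identity, up to rounding error
check_id <- A %*% inv_A
print(round(check_id, 10))
```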

Basic operations – Matrix
> A.mat <- matrix(c(19,8,11,2,18,17,15,19,10), nrow=3)
> A.mat
     [,1] [,2] [,3]
[1,]   19    2   15
[2,]    8   18   19
[3,]   11   17   10
> inv.A <- solve(A.mat)    # inverse of matrix A.mat
> t(A.mat)                 # transpose of matrix A.mat
> A.mat %*% inv.A          # identity matrix, up to rounding error

Basic operations – Matrix
> a = matrix(1:9, nrow=3)
> b = matrix(2:10, nrow=3)
> a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> b
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
[3,]    4    7   10
> cbind(a,b)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    4    7    2    5    8
[2,]    2    5    8    3    6    9
[3,]    3    6    9    4    7   10
> rbind(a,b)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
[4,]    2    5    8
[5,]    3    6    9
[6,]    4    7   10
> Cov.matrix = cov(b)
> Cor.matrix = cor(b)
> Row.mean = apply(b, 1, mean)
> Col.mean = apply(b, 2, mean)
NOTE: apply(X, MARGIN, FUN), where MARGIN = 1 applies FUN to each row and MARGIN = 2 to each column.
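
The role of the MARGIN argument can be checked on the matrix b from this slide. The sketch below also passes a small user-written function to apply(); that function and the variable names are illustrative additions.

```r
b <- matrix(2:10, nrow = 3)

row_mean <- apply(b, 1, mean)    # one value per row:    5 6 7
col_mean <- apply(b, 2, mean)    # one value per column: 3 6 9

# rowMeans() and colMeans() are base R shortcuts for the same results
same_rows <- all.equal(row_mean, rowMeans(b))
same_cols <- all.equal(col_mean, colMeans(b))

# FUN may be any function, including one written on the spot
col_spread <- apply(b, 2, function(v) max(v) - min(v))   # 2 2 2
print(col_spread)
```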

List
vector: an ordered collection of data of the same type.
> a = c(7,5,1)
> a[2]
[1] 5
list: an ordered collection of data of arbitrary types.
> a = list(Name="Rahim", age=c(12, 23, 10), Married=FALSE)
> a
$Name
[1] "Rahim"

$age
[1] 12 23 10

$Married
[1] FALSE

- Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string).
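
List elements can be reached in several ways, and the choice of brackets matters. The sketch below uses the list from this slide under the name person; the City component is a hypothetical addition used only to show how a new element is attached.

```r
person <- list(Name = "Rahim", age = c(12, 23, 10), Married = FALSE)

by_dollar <- person$Name          # access by name: "Rahim"
by_double <- person[["age"]]      # double brackets return the element itself
third_age <- person[[2]][3]       # third value of the second element: 10
by_single <- person["age"]        # single brackets return a list of length 1

# Adding a new component (hypothetical example value)
person$City <- "Rajshahi"
print(names(person))              # "Name" "age" "Married" "City"
```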

Data frames
- A data frame is meant to represent the typical data table that researchers come up with, like a spreadsheet.
- It is a rectangular table whose columns all have the same length; data within each column has the same type (e.g. number, text, logical), but different columns may have different types.
Example:
> a
  localisation tumorsize progress
1     proximal       6.3    FALSE
2       distal       8.0     TRUE
3     proximal      10.0    FALSE

Making data frames
We illustrate how to construct a data frame from the following car data.
  Make        Model          Cylinder  Weight  Mileage  Type
  Honda       Civic          V4        2170    33       Sporty
  Chevrolet   Beretta        V4        2655    26       Compact
  Ford        Escort         V4        2345    33       Small
  Eagle       Summit         V4        2560    33       Small
  Volkswagen  Jetta          V4        2330    26       Small
  Buick       Le Sabre       V6        3325    23       Large
  Mitsubishi  Galant         V4        2745    25       Compact
  Dodge       Grand Caravan  V6        3735    18       Van
  Chrysler    New Yorker     V6        3450    22       Medium
  Acura       Legend         V6        3265    20       Medium

> Make <- c("Honda","Chevrolet","Ford","Eagle","Volkswagen","Buick","Mitsubishi",
+           "Dodge","Chrysler","Acura")
> Model <- c("Civic","Beretta","Escort","Summit","Jetta","Le Sabre","Galant",
+            "Grand Caravan","New Yorker","Legend")
> Cylinder <- c(rep("V4",5),"V6","V4",rep("V6",3))
> Weight <- c(2170, 2655, 2345, 2560, 2330, 3325, 2745, 3735, 3450, 3265)
> Mileage <- c(33, 26, 33, 33, 26, 23, 25, 18, 22, 20)
> Type <- c("Sporty","Compact",rep("Small",3),"Large","Compact","Van",
+           rep("Medium",2))

Making data frames
Now the data.frame() function combines the six vectors into a single data frame.
> Car <- data.frame(Make, Model, Cylinder, Weight, Mileage, Type)
> Car
         Make         Model Cylinder Weight Mileage    Type
1       Honda         Civic       V4   2170      33  Sporty
2   Chevrolet       Beretta       V4   2655      26 Compact
3        Ford        Escort       V4   2345      33   Small
4       Eagle        Summit       V4   2560      33   Small
5  Volkswagen         Jetta       V4   2330      26   Small
6       Buick      Le Sabre       V6   3325      23   Large
7  Mitsubishi        Galant       V4   2745      25 Compact
8       Dodge Grand Caravan       V6   3735      18     Van
9    Chrysler    New Yorker       V6   3450      22  Medium
10      Acura        Legend       V6   3265      20  Medium
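
The same construction can be tried on a cut-down version of the table. This sketch uses only the first three cars and three of the six columns; the name small_car is illustrative, and stringsAsFactors = FALSE is set explicitly so that text columns stay character vectors regardless of the R version in use.

```r
# Three equal-length vectors, one per column
Make    <- c("Honda", "Chevrolet", "Ford")
Mileage <- c(33, 26, 33)
Type    <- c("Sporty", "Compact", "Small")

# Column names are taken from the vector names
small_car <- data.frame(Make, Mileage, Type, stringsAsFactors = FALSE)

size      <- dim(small_car)             # 3 rows, 3 columns
col_types <- sapply(small_car, class)   # type of each column
str(small_car)                          # compact overview of the structure
```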

Making data frames
> names(Car)
[1] "Make"     "Model"    "Cylinder" "Weight"   "Mileage"  "Type"
> Car[1,]
   Make Model Cylinder Weight Mileage   Type
1 Honda Civic       V4   2170      33 Sporty
> Car[10,4]
[1] 3265
> Car$Mileage
 [1] 33 26 33 33 26 23 25 18 22 20
> mean(Car$Mileage)    # average mileage of the 10 vehicles
[1] 25.9
> min(Car$Weight)
[1] 2170

Making data frames
> table(Car$Type)              # gives a frequency table

Compact   Large  Medium   Small  Sporty     Van
      2       1       2       3       1       1
> table(Car$Make, Car$Type)    # Cross tabulation

             Compact Large Medium Small Sporty Van
  Acura            0     0      1     0      0   0
  Buick            0     1      0     0      0   0
  Chevrolet        1     0      0     0      0   0
  Chrysler         0     0      1     0      0   0
  Dodge            0     0      0     0      0   1
  Eagle            0     0      0     1      0   0
  Ford             0     0      0     1      0   0
  Honda            0     0      0     0      1   0
  Mitsubishi       1     0      0     0      0   0
  Volkswagen       0     0      0     1      0   0

Making data frames
> Make.Small <- Car$Make[Car$Type == "Small"]    # makes of the cars of type "Small"
> summary(Car$Mileage)                           # gives summary statistics
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.00   22.25   25.50   25.90   31.25   33.00
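
Logical conditions like the one above can select whole rows as well, and summaries can be computed per group. The sketch below rebuilds the car data with five of its columns so that it runs on its own; subset() and tapply() are base R functions not used on the slides, and the result names are illustrative.

```r
Car <- data.frame(
  Make     = c("Honda", "Chevrolet", "Ford", "Eagle", "Volkswagen",
               "Buick", "Mitsubishi", "Dodge", "Chrysler", "Acura"),
  Cylinder = c(rep("V4", 5), "V6", "V4", rep("V6", 3)),
  Weight   = c(2170, 2655, 2345, 2560, 2330, 3325, 2745, 3735, 3450, 3265),
  Mileage  = c(33, 26, 33, 33, 26, 23, 25, 18, 22, 20),
  Type     = c("Sporty", "Compact", rep("Small", 3), "Large", "Compact",
               "Van", rep("Medium", 2)),
  stringsAsFactors = FALSE
)

# Makes of the small cars, as on the slide
small_makes <- Car$Make[Car$Type == "Small"]

# Rows with weight above 3000, keeping only two columns
heavy <- subset(Car, Weight > 3000, select = c(Make, Weight))

# Mean mileage within each cylinder group
mileage_by_cyl <- tapply(Car$Mileage, Car$Cylinder, mean)
print(mileage_by_cyl)
```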

Making data frames
> b = data.frame(x=rnorm(10), y=rnorm(10), z=rnorm(10))
> b
            x            y           z
1  -1.7651180  0.462309932  0.09230914
2  -0.7340731 -1.681826091  0.66648791
3  -0.4968900  1.728658405 -0.68281664
4  -1.3217873  0.307030157  0.24192745
5  -0.2070019  0.003892192  1.19591807
6  -0.9633084  0.060328696 -1.40424843
7  -1.1323626  1.079521099  1.63552915
8  -0.7301976 -1.422012899 -0.16695860
9   0.2979073  0.528152338  0.65995778
10 -0.5759655  0.655296337 -0.39156127
> cor(b)
             x             y           z
x 1.0000000000  0.0007151043  0.12151913
y 0.0007151043  1.0000000000 -0.05770153
z 0.1215191317 -0.0577015345  1.00000000
> apply(b,1,var)    # variance of each row
 [1] 1.42472853 1.39573092 1.80047438 0.85041478 0.57226442 0.56454121
 [7] 2.14379987 0.39516798 0.03357767 0.44098693

Making data frames
> b = data.frame(x=rnorm(10), y=rnorm(10), z=rnorm(10))
> b
            x            y           z
1  -1.7651180  0.462309932  0.09230914
2  -0.7340731 -1.681826091  0.66648791
3  -0.4968900  1.728658405 -0.68281664
4  -1.3217873  0.307030157  0.24192745
5  -0.2070019  0.003892192  1.19591807
6  -0.9633084  0.060328696 -1.40424843
7  -1.1323626  1.079521099  1.63552915
8  -0.7301976 -1.422012899 -0.16695860
9   0.2979073  0.528152338  0.65995778
10 -0.5759655  0.655296337 -0.39156127
> attach(b)
> lm.D9 <- lm(y ~ x)          # Regression of y on x
> lm.D90 <- lm(y ~ x - 1)     # omitting intercept
> anova(lm.D9)
> summary(lm.D9)
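
The regression calls can also be written without attach() by passing the data frame through the data argument of lm(). This is a sketch under two assumptions not in the slides: a fixed seed (the value 123 is arbitrary) so that the random data are reproducible, and the names fit and fit0 for the two models.

```r
set.seed(123)    # arbitrary seed, only to make the random numbers repeatable
b <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))

fit  <- lm(y ~ x, data = b)        # regression of y on x, with intercept
fit0 <- lm(y ~ x - 1, data = b)    # same regression, omitting the intercept

coef_fit  <- coef(fit)             # two coefficients: (Intercept) and x
coef_fit0 <- coef(fit0)            # one coefficient: x

print(anova(fit))                  # analysis of variance table
print(summary(fit))                # estimates, standard errors, R-squared
```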

Data Entry using Data Editor
R has a Data Editor with a spreadsheet-like interface. The interface is quite useful for small data sets.
- Suppose we want to construct a data frame based on the following data:
  Roll  Bstat101  Bstat102
  4701  78        80
  4702  75        65
  4703  60        70
  4704  72        68

Data Entry using Data Editor
- To do this, type
> result <- data.frame(Roll=integer(0), Bstat101=numeric(0), Bstat102=numeric(0))
> result <- edit(result)
- Then enter the data in the Data Editor and close the Editor.
> result                    # To see the data
> result <- edit(result)    # To modify the data
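
Because edit() needs an interactive session, the same small table can alternatively be typed directly into data.frame(). The sketch below does that with the four records shown earlier; the Total column is a hypothetical addition to show how a derived column is created.

```r
# Same data as would be typed into the Data Editor
result <- data.frame(
  Roll     = c(4701, 4702, 4703, 4704),
  Bstat101 = c(78, 75, 60, 72),
  Bstat102 = c(80, 65, 70, 68)
)

# A derived column (hypothetical): total marks over both courses
result$Total <- result$Bstat101 + result$Bstat102

# Mean mark in each course
course_means <- colMeans(result[, c("Bstat101", "Bstat102")])
print(result)
print(course_means)
```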

Reading data from File
An entire data frame can be read directly with the read.table() function.
# Reading data from a CSV file (e.g. saved from Excel)
> data1 <- read.table(file="d:/RFiles/data1.csv", header=TRUE, sep=",")
> data1 <- read.csv(file="d:/RFiles/data1.csv", header=TRUE)
> data1
# Reading data from a tab-delimited text file
> data2 <- read.table(file="d:/RFiles/data3.txt", header=TRUE, sep="\t")
> data2
> attach(data1)
> detach(data1)
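
To try these functions without the files under d:/RFiles, one can first write a small data frame to temporary files and then read it back. In this sketch the data frame marks and the temporary paths are illustrative; tempfile(), write.csv() and write.table() are base R functions not used on the slide.

```r
# A small data frame to write out (illustrative values)
marks <- data.frame(Roll = 4701:4704, Bstat101 = c(78, 75, 60, 72))

# Comma-separated file: write, then read back with read.csv()
csv_path <- tempfile(fileext = ".csv")
write.csv(marks, csv_path, row.names = FALSE)
data1 <- read.csv(csv_path, header = TRUE)

# Tab-delimited text file: write, then read back with read.table()
txt_path <- tempfile(fileext = ".txt")
write.table(marks, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)
data2 <- read.table(txt_path, header = TRUE, sep = "\t")

print(data1)
print(data2)
```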

Importing from other statistical systems
Package foreign on CRAN provides import facilities for files produced by the following statistical software.
> read.mtp        # imports a "Minitab Portable Worksheet"
> read.xport      # reads a file in SAS format
> read.spss       # reads files created by SPSS
Package Rstreams on CRAN contains functions
> readSfile       # reads binary objects produced by S-PLUS
> data.restore    # reads S-PLUS data dumps (created by data.dump)

Thanks
