Introduction to R 02.10.2017 Samal Dharmarathna.


1 Introduction to R Samal Dharmarathna

2 Today’s Class
A brief introduction to the programming language R
Installation
Objects
Operators
Generating and manipulating data
Functions
Plotting and a brief look at packages
Elementary analysis
Representing data in a useful manner
Hands on R

3 What is R?
R is a language and environment for statistical computing and graphics, similar to the S language and environment.
R provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.
The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
Well-designed, publication-quality plots can be produced, including mathematical symbols and formulae.
R is available as Free Software.
Source:

4 Why is R interesting?
R is an interpreted language, not a compiled one: commands typed at the keyboard are executed directly, without first building a complete program as in most computer languages (C, Fortran, Java, …).
R's syntax is very simple and intuitive.
R manual
R packages
Very active R community (Stack Overflow, R-bloggers, etc.)

5 What is RStudio?
RStudio is a set of integrated tools designed to help you be more productive with R. It includes:
A console
A syntax-highlighting editor that supports direct code execution
A variety of robust tools for plotting, viewing history, debugging and managing your workspace
For more features of RStudio:

6 Installing R and RStudio
Download R from the following link (Windows, Mac, Linux)
Download RStudio from the following link (Windows, Mac, Linux)

7 Creating Objects
Some objects
> a <- 15
> a
[1] 15
> 25 -> b
> b
[1] 25
> c <- a + b
> c
[1] 40
Simple calculations
> (10 + 5)*5
[1] 75
R distinguishes between lowercase and UPPERCASE names
> y <- 5
> Y <- 50
> y
[1] 5
> Y
[1] 50
Objects in memory
> ls()
[1] "a" "b" "c" "y" "Y"
Clear all objects from memory
> rm(list = ls(all.names = TRUE))

8 Creating Objects
All objects have two intrinsic attributes: mode and length.
Mode: the basic type of the elements of the object. There are four main modes: numeric, character, complex, and logical (FALSE or TRUE).
Length: the number of elements of the object.

9 Mode and Length
> p <- 7
> mode(p)
[1] "numeric"
> length(p)
[1] 1
> q <- "Hello"
> mode(q)
[1] "character"
> length(q)
[1] 1
> s <- TRUE
> mode(s)
[1] "logical"
> length(s)
[1] 1
> t <- 5i
> mode(t)
[1] "complex"
> length(t)
[1] 1
> w <- (1:5)
> w
[1] 1 2 3 4 5
> mode(w)
[1] "numeric"
> length(w)
[1] 5

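10 Overview of the types of objects representing data
Object      Modes                                                        Several modes possible in the same object?
vector      Numeric, character, complex, logical                         No
factor      Numeric, character                                           No
array       Numeric, character, complex, logical                         No
matrix      Numeric, character, complex, logical                         No
data frame  Numeric, character, complex, logical                         Yes
ts          Numeric, character, complex, logical                         No
list        Numeric, character, complex, logical, function, expression, …  Yes
Source: R for Beginners by Emmanuel Paradis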
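As a rough illustration of the table above, the sketch below builds one object of several of these types and inspects it. The object names (v, f, m, df, l) and their contents are made up for this example and do not come from the slides.

```r
# Illustrative objects; all names and values are hypothetical
v  <- c(1.5, 2, 3)                        # vector: a single mode (numeric)
f  <- factor(c("low", "high", "low"))     # factor: categorical data
m  <- matrix(1:6, nrow = 2, ncol = 3)     # matrix: a single mode
df <- data.frame(id = 1:3, name = c("a", "b", "c"),
                 stringsAsFactors = FALSE) # columns may differ in mode
l  <- list(1, "two", TRUE)                # list: elements may differ in mode

mode(v)            # mode of the vector
class(f)           # class of the factor
dim(m)             # number of rows and columns
sapply(df, mode)   # mode of each column of the data frame
sapply(l, mode)    # mode of each element of the list
```

Only the data frame and the list hold more than one mode here, which matches the last column of the table.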

11 Reading and writing data in a file
read.table(file, header = FALSE, sep = "")
file : the name of the file, possibly with its path, or remote access to a file by URL
header : a logical (FALSE or TRUE) indicating whether the file contains the names of the variables on its first line
sep : the field separator used in the file, for instance sep = "\t" if it is a tabulation
write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ", row.names = TRUE, col.names = TRUE)
x : the name of the object to be written
file : the name of the file
append : if TRUE, adds the data without erasing any already existing in the file

12 Reading and writing data in a file cont’d….
write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ", row.names = TRUE, col.names = TRUE)
quote : a logical or a numeric vector; if TRUE, the variables of mode character and the factors are written within "", otherwise the numeric vector indicates the numbers of the variables to write within "" (in both cases the names of the variables are written within "", but not if quote = FALSE)
sep : the field separator used in the file
row.names : a logical indicating whether the names of the rows are written in the file
col.names : same for the names of the columns
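The arguments described above can be tried out with a small round trip. This is only a sketch: the data frame trips and its values are hypothetical, and a temporary file is used so that nothing is written into the working directory.

```r
# Hypothetical data frame used only for this illustration
trips <- data.frame(mode = c("A", "B", "C"), share = c(40, 35, 25))

# Write to a temporary, tab-separated file without row names
path <- tempfile(fileext = ".txt")
write.table(trips, file = path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)

# Read it back; header = TRUE because the first line holds the variable names
trips2 <- read.table(path, header = TRUE, sep = "\t",
                     stringsAsFactors = FALSE)
trips2
```

With header = FALSE (the default), the first line would be read as data rather than as variable names.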

13 Operators Source: R for Beginners by Emmanuel Paradis

14 Generating Data
Regular Sequences
A regular sequence of integers:
> x <- 1:20
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
The operator ‘:’ has priority over the arithmetic operators within an expression:
> 1:(10-2)
[1] 1 2 3 4 5 6 7 8
> 1:10-2
 [1] -1  0  1  2  3  4  5  6  7  8

15 Generating Data cont’d….
Random Sequences
It is useful in statistics to be able to generate random data, and R can do so for a large number of probability density functions. These functions are of the form rfunc(n, p1, p2, ...), where func indicates the probability distribution, n the number of data generated, and p1, p2, ... the values of the parameters of the distribution.
> rnorm(10, mean = 0, sd = 1)
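A short sketch of the rfunc(n, p1, p2, ...) pattern with three common distributions. The seed value and the variable names are arbitrary choices for this example; set.seed is used only so that repeated runs give the same draws.

```r
set.seed(42)                           # fix the seed so the draws are reproducible
x <- rnorm(10, mean = 0, sd = 1)       # 10 draws from a standard normal
u <- runif(5, min = 0, max = 1)        # 5 draws from a uniform on [0, 1]
b <- rbinom(5, size = 10, prob = 0.5)  # 5 draws from a binomial(10, 0.5)

length(x)   # number of values generated
range(u)    # smallest and largest uniform draw
b           # counts between 0 and 10
```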

16 Accessing the values of an object: the indexing system
The indexing system is an efficient and flexible way to selectively access the elements of an object.
> z <- 1:10
> z
 [1]  1  2  3  4  5  6  7  8  9 10
> z[5]
[1] 5
> z[7] <- 70
> z
 [1]  1  2  3  4  5  6 70  8  9 10
> m <- matrix(1:6, 2, 3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> m[,3]
[1] 5 6
> m[2,]
[1] 2 4 6

17 Arithmetic and simple functions
There are numerous functions in R to manipulate data. The simplest one, c, concatenates the objects listed in parentheses.
> c(1:5, seq(10,11,0.2))
 [1]  1.0  2.0  3.0  4.0  5.0 10.0 10.2 10.4 10.6 10.8 11.0
> p <- 1:4
> p
[1] 1 2 3 4
> q <- rep(1,4)
> q
[1] 1 1 1 1
> r <- p+q
> r
[1] 2 3 4 5
> p <- 1:4
> p
[1] 1 2 3 4
> s <- 10
> s
[1] 10
> t <- p*s
> t
[1] 10 20 30 40

18 Matrix computation
R has facilities for matrix computation and manipulation. The functions rbind and cbind bind matrices with respect to the rows and the columns, respectively.
> m1 <- matrix(1, nrow = 2, ncol = 2)
> m1
     [,1] [,2]
[1,]    1    1
[2,]    1    1
> m2 <- matrix(2, nrow = 2, ncol = 2)
> m2
     [,1] [,2]
[1,]    2    2
[2,]    2    2
> rbind(m1,m2)
     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    2    2
[4,]    2    2
> cbind(m1,m2)
     [,1] [,2] [,3] [,4]
[1,]    1    1    2    2
[2,]    1    1    2    2

19 The operator for the product of two matrices is ‘%*%’
> m3 <- rbind(m1,m2) %*% cbind(m1,m2)
> m3
     [,1] [,2] [,3] [,4]
[1,]    2    2    4    4
[2,]    2    2    4    4
[3,]    4    4    8    8
[4,]    4    4    8    8
> m4 <- cbind(m1,m2) %*% rbind(m1,m2)
> m4
     [,1] [,2]
[1,]   10   10
[2,]   10   10
The diagonal:
> diag(m3)
[1] 2 2 8 8
The transposition:
> m5 <- matrix(1:8, nrow = 2, ncol = 4)
> m5
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
> t(m5)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
[4,]    7    8

20 Plotting with R
R offers a remarkable variety of graphics. Each graphical function has a large number of options, which makes the production of graphics very flexible.
There are two kinds of graphical functions: the high-level plotting functions, which create a new graph, and the low-level plotting functions, which add elements to an existing graph.
Plots can be saved as image or PDF files.
> hist(y)   (histogram of the frequencies of y)
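The distinction between high-level and low-level plotting functions, and saving a plot to a file, can be sketched as follows. The data y, the seed and the file name are assumptions made for this example; the file is written to a temporary directory.

```r
set.seed(1)
y <- rnorm(100)                        # hypothetical data to plot

# Open a PDF graphics device; everything drawn until dev.off() goes to this file
out_file <- file.path(tempdir(), "histogram.pdf")
pdf(out_file)

hist(y, main = "Histogram of y")       # high-level: creates a new graph
abline(v = mean(y), lty = 2)           # low-level: adds a line to the existing graph

dev.off()                              # close the device to finish writing the file
```

Replacing pdf() with png() would save the same plot as an image file instead.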

21 R Packages
A set of packages is distributed with the base installation of R.
In addition to the default packages, a number of contributed packages are available on the CRAN web site.
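A minimal sketch of working with packages. The name "somePackage" is a placeholder rather than a real package, so the installation lines are left commented out; the rest uses only packages that ship with R.

```r
# Packages that ship with the base installation of R
base_pkgs <- rownames(installed.packages(priority = "base"))
base_pkgs

# Load an installed package for the current session
library(stats)

# A contributed package would be installed from CRAN once and then loaded, e.g.
# ("somePackage" is a placeholder, not a real package name):
# install.packages("somePackage")
# library(somePackage)
```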

22 Hands on R and RStudio

23 Let’s Install R and RStudio
Download R from the following link (Windows, Mac, Linux)
Download RStudio from the following link (Windows, Mac, Linux)

24 Create Some Objects
Some objects
> a <- 25
> a
[1] 25
> 35 -> b
> b
[1] 35
> c <- a + b
> c
[1] 60
Simple calculations
> (20 + 5)*4
[1] 100
R distinguishes between lowercase and UPPERCASE names
> y <- 10
> Y <- 30
> y
[1] 10
> Y
[1] 30
Objects in memory
> ls()
[1] "a" "b" "c" "y" "Y"
Clear all objects from memory
> rm(list = ls(all.names = TRUE))

25 Percentiles and Quartiles
> w <- seq(1,15,2)
> w
[1]  1  3  5  7  9 11 13 15
> quantile(w)
  0%  25%  50%  75% 100%
 1.0  4.5  8.0 11.5 15.0
> quantile(w, c(0.3,0.6,0.9))
 30%  60%  90%
 5.2  9.4 13.6

26 Mean and Median
> z <- c(seq(1,9,2),seq(10,16,2))
> z
[1]  1  3  5  7  9 10 12 14 16
> summary(z)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   5.000   9.000   8.556  12.000  16.000
> median(z)
[1] 9
> mean(z)
[1] 8.555556

27 Simple Functions
> f1 <- function(x,y){
    x+y
  }
> f1(5,6)
[1] 11
> f2 <- function(x,y){
    x*y
  }
> f2(5,6)
[1] 30

28 Functions with multiple tasks
> f3 <- function(x,y){
    z1 <- 2*x + y
    z2 <- x + 2*y
    z3 <- 2*x + 2*y
    z4 <- x/y
    return(c(z1,z2,z3,z4))
  }
> f3(1,2)
[1] 4.0 5.0 6.0 0.5

29 List indices (double square brackets)
> f4 <- function(x,y){
    z5 <- x + y
    z6 <- x + 2*y
    list(z5, z6)
  }
> f4(2,5)   # Answer will list both z5 & z6
[[1]]
[1] 7

[[2]]
[1] 12

30
> f4(2,5)[[1]]   # Only display z5
[1] 7
> f4(2,5)[[2]]   # Only display z6
[1] 12
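As a variation on f4, list elements can also be given names and then accessed by name instead of by position. The function f5 and the element names total and weighted are hypothetical and are not part of the slides.

```r
# f5 and the element names 'total' and 'weighted' are hypothetical
f5 <- function(x, y) {
  list(total = x + y, weighted = x + 2 * y)
}

res <- f5(2, 5)
res$total           # access by name with $, gives 7
res[["weighted"]]   # access by name with double square brackets, gives 12
res[[1]]            # access by position still works, gives 7
```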

31 Elementary Analysis of a data set: some tips
[Chart: number of person trips by mode, as a percentage; modes A to G]

32 Observations: Mean:…………… Median:…………. Mean:…………… Median:…………. Conclusions:

33 Mode distribution for different purposes
[Chart: percentage (%) by purpose; purposes A to I]
Observations:……………………
Conclusions:…………………….

34 Analysis of mode sharing for different purposes
a. Purpose A b. Purpose B Observations:…………………… Conclusions:…………………….
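The kind of elementary analysis asked for in these exercises could be started along the following lines. The percentages below are invented purely for illustration and are NOT the data shown in the slides; the names share and out_file are likewise assumptions.

```r
# Hypothetical percentages for modes A to G; NOT the data from the slides
share <- c(A = 30, B = 20, C = 15, D = 12, E = 10, F = 8, G = 5)

sum(share)      # the shares add up to 100
mean(share)     # average share per mode
median(share)   # middle value of the sorted shares, here 12

# Represent the data as a bar chart, saved to a temporary PDF file
out_file <- file.path(tempdir(), "mode_share.pdf")
pdf(out_file)
barplot(share, xlab = "Mode", ylab = "Percentage of person trips")
dev.off()
```

Comparing the mean with the median gives a first hint of how unevenly the trips are spread across modes.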

