Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to R Workshop By Stephen O. Opiyo MCIC-Computational Biology Laboratory Day 1.

Similar presentations


Presentation on theme: "Introduction to R Workshop By Stephen O. Opiyo MCIC-Computational Biology Laboratory Day 1."— Presentation transcript:

1 Introduction to R Workshop By Stephen O. Opiyo MCIC-Computational Biology Laboratory Day 1

2 Outline R workshop outline (http://kh- tps.osu.edu/Routline.pdf) Introduction to R (Day 1) Graphics (data visualization) in R (Day 2) Basic Statistics using R (Day 2)

3 Getting Started Assume that R for Windows and Macs already installed on your laptop. (Instructions for installations given at the webpage below) http://kh-tps.osu.edu/RWindows.pdf http://kh-tps.osu.edu/RMac.pdf Open R on your Windows or Mac

4 R on Windows

5 R on MACs

6 History of R Idea of R came from S developed at Bell Labs since 1976 S intended to support research and data analysis projects S to S-Plus licensed to Insightful (“S-Plus”) R: Open source platform similar to S developed by Robert Gentleman and Ross Ihaka (U of Auckland, NZ) during the 1990s. Since 1997: international “R-core” developing team Updated versions available every two months http://www.r-project.org/

7 What is R? Data handling and storage: numeric, textual Matrix algebra Tables and regular expressions Graphics Data analysis

8 R is not –a database –a collection of “black boxes” –a spreadsheet software package –commercially supported

9 Useful reading materials R for Beginners http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf An Introduction to R” by Longhow Lam http://cran.r-project.org/doc/contrib/Lam-IntroductionToR_LHL.pdf Practical Regression and Anova using R http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf An R companion to ‘Experimental Design http://cran.r-project.org/doc/contrib/Vikneswaran-ED_companion.pdf The R Guide http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf R for Biologists http://cran.r-project.org/doc/contrib/Martinez-RforBiologistv1.1.pdf

10 Useful reading materials Multilevel Modeling in R http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf R reference cards http://cran.r-project.org/doc/contrib/refcard.pdf http://cran.r-project.org/doc/contrib/Short-refcard.pdf http://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf R reference card data mining http://cran.r-project.org/doc/contrib/Short-refcard.pdf RStudio - Documentation http://www.rstudio.com/ide/docs/

11 Useful books Learning Rstudio for R Statistical Computing by Mark Van Der Loo, Edwin De Jonge Paperback, 126 pages Published December 25th 2012 by Packt Publishing ISBN 1782160604 (ISBN13: 9781782160601) Getting Started with RStudio By: John Verzani Publisher: O'Reilly Media, Inc. Pub. Date: September 22, 2011 Print ISBN-13: 978-1-4493-0903-9 R Graphics Cookbook by Winston Chang (Jan 3, 2013) R For Dummies by Meys, Joris, de Vries The R Book by Michael J. Crawley

12 RStudio

13 RStudio is a free open source integrated development environment for R (http://www.rstudio.com/ide/) Assume RStudio for Windows and Macs already installed on your laptop. (Instructions for installations given at the webpage below) http://kh-tps.osu.edu/RSWindows.pdf http://kh-tps.osu.edu/RSMaC.pdf

14 Using RStudio Script editor View help, plots & files; manage packages View variables in workspace and history file R console

15 Open RStudio and set up a working directory

16 Datasets and R scripts Save all the files to R_workshop directory - Day 1 R files (D1_Example_1.R to D1_Example_12.R ) = 12 files - Day 1 data files (D1_Data_1.csv or txt to D1_Data_2.csv or txt) = 4 files - Day 2 R files (D2_Example_1.R to D2_Example_3.R ) = 3 files - Day 2 data files (D1_Data_1.csv to D1_Data_3.csv) = 3 files

17 R: Session management

18 R: session management You can enter a command at the command prompt in a console (>). To quit R, use >q(). Result is stored in an object using the assignment operator: (<-) or the equal character (=). Test<-2 and Test=2 Test is an object with a value of 2 To print (show) the object just enter the name of the object Test

19 Naming object in R Object names cannot contain `strange' symbols like !, +, -, #. A dot (.) and an underscore (_) are allowed, also a name starting with a dot. Object names can contain a number but cannot start with a number.(E.g., Example_1, not 1Example_1) R is case sensitive, X and x are two different objects, as well as temp.1 and temP.1

20 R_Example_1 Example1<-c(1,2,3,4,5) Example1 1Example<-c(1,2,3,4,5) Example2<-c("Second", "Example") Example2 2Example<-c("Second", "Example") Temp.1<-c("Second", "Example") Temp.1

21 R: session management Your R objects are stored in a workspace Workspace is not saved on disk unless you tell R to do so. Objects are lost when you close R and not save the objects. R session are saved in a file.RData.

22 R: session management Just checking what the current working directory type getwd() To list the objects in your workspace: ls() To remove objects you no longer need: rm() To remove ALL objects in your workspace: rm(list=ls()) or use Remove all objects To save your workspace to a file, you may type save.image() or use Save Workspace… in the File menu The default workspace file is called.RData

23 Basic data types

24 Basic data types: Logical, numeric, and character, etc.

25 Basic data type (Example 2) Logical (True or False) Numerical (relating to numbers) Characters (letters, numerical digits, common punctuation marks (such as "." or "-”), etc.

26 Vectors and Matrices A vector –Ordered collection of data of the same type –Example: last names of all students in this workshop –In R, single number is a vector of length 1 A matrix –Rectangular table of data of the same type –Example: Mean intensities of all genes measured during a microarray experiment

27 Vectors (Example 3) Vector: Ordered collection of data of the same data type > x <- c(5.2, 1.7, 6.3) > log(x) [1] 1.6486586 0.5306283 1.8405496 > y <- 1:5 > z <- seq(1, 1.4, by = 0.1) > yz<-y + z [1] 2.0 3.1 4.2 5.3 6.4 > length(y) [1] 5 > M<-mean(y + z) [1] 4.2

28 Operation on vector elements (Example 4) Mydata <- c(2,3.5,-0.2) Vector (c=“concatenate”) > Mydata [1] 2 3.5 -0.2 > x4 0 [1] TRUE TRUE FALSE > x5 0] [1] 2 3.5 > x6<-Mydata[-c(1,3)] [1] 3.5 Test on the elements Extract the positive elements Remove elements 1 and 3

29 Operation on vector elements (Example 4) > Colors <- c("Red","Green","Red") Character vector > Colors[2] [1] "Green" > x1 <- 25:30 > x1 [1] 25 26 27 28 29 30 Number sequences > x2<-x1[3:5] [1] 27 28 29 Various elements 3 to 5 >x3<-x1[c(2,6)]Elements 2 and 6

30 > x <- c(5,-2,3,-7) > y <- c(1,2,3,4)*10Operation on all the elements > y [1] 10 20 30 40 > sort(x)Sorting a vector in ascending order [1] -7 -2 3 5 > sort(x, decreasing=TRUE) Sort in descending order [1] 5 3 -2 -7 > order(x) [1] 4 2 3 1Element order for sorting > y[order(x)] [1] 40 20 30 10Operation on all the components > rev(x)Reverse a vector [1] -7 3 -2 5 Operation on vector elements (Example 5)

31 Matrices (Example 6) Matrix: Rectangular table of data of the same type > m <- matrix(1:12) Create matrix using the “matrix function” m [,1] [1,] 1 [2,] 2 [3,] 3 [4,] 4 [5,] 5 [6,] 6 [7,] 7 [8,] 8 [9,] 9 [10,] 10 [11,] 11 [12,] 12 V<-1:12 V1<-c(1,2,3,4,5,6,7,8,9,10,11,12)

32 Matrices (Example 6) Matrix: Rectangular table of data of the same type > mr <- matrix(1:12, 4) four rows mr [,1] [,2] [,3] [1,] 1 5 9 [2,] 2 6 10 [3,] 3 7 11 [4,] 4 8 12

33 Matrices (Example 6) Matrix: Rectangular table of data of the same type > mm <- matrix(1:12, 4, byrow = T); m By row creation [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 [3,] 7 8 9 [4,] 10 11 12 > tmm<-t(mm) t is transpose [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 > dim(mm) dim is dimension [1] 4 3four rows and three columns > dim(t(mm)) [1] 3 4 three rows and four columns

34 Matrices (Example 6) Matrix: Rectangular table of data of the same type > x <- c(3,-1,2,0,-3,6) > x.mat <- matrix(x,ncol=2) Matrix with 2 cols > x.mat [,1] [,2] [1,] 3 0 [2,] -1 -3 [3,] 2 6  x.matr <- matrix(x,ncol=2, byrow=T)By row creation > x.matr [,1] [,2] [1,] 3 -1 [2,] 2 0 [3,] -3 6

35 0peration on matrices (Example 6) > x.matr[,2] 2 nd col [1] -1 0 6 > x.matr[c(1,3),]1 st and 3 rd lines [,1] [,2] [1,] 3 -1 [2,] -3 6 > x.mat[-2,]remove second row (No 2 nd line) [,1] [,2] [1,] 3 -1 [2,] -3 6

36 R as a calculator (Example 7) In R, plus (+), minus (-), division(/), and multiplication (*) > 5 + (6 + 7) * pi^2 [1] 133.3049 > log(exp(1)) [1] 1 > sin(pi/3)^2 + cos(pi/3)^2 [1] 1 > Sin(pi/3)^2 + cos(pi/3)^2 Error: couldn't find function "Sin” Take few minutes to do any calculations

37 Lists and data frames

38 Lists A list is an ordered collection of data of arbitrary types. Lists don’t need to be of the same mode or type and they can be a numeric vector, a logical value and a function and so on. A component of a list can be referred as aa[[I]] or aa$x, where aa is the name of the list and x is a name of a component of aa.

39 Lists (Example 8) aa[[1]] is the first component of aa, while aa[1] is the sublist consisting of the first component of aa only. list: an ordered collection of data of arbitrary types. > doe = list(name="john", age=28, married=F) doe[[1]] [1] "john" > doe[1] $name [1] "john” > doe$name [1] "john“ > doe$age [1] 28 Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string). But both types support both access methods.

40 Lists are very flexible (Example 8) > my.list <- list(c(5,4,-1),c("X1","X2","X3")) > my.list [[1]]: [1] 5 4 -1 [[2]]: [1] "X1" "X2" "X3" > my.list[[1]] [1] 5 4 -1 > my.list1 <- list(c1=c(5,4,-1),c2=c("X1","X2","X3")) > my.list1$c2[2:3] [1] "X2" "X3"

41 Lists: Session (Example 8) Empl <- list(employee=“Anna”, spouse=“Fred”, children=3, child.ages=c(4,7,9)) Empl[[4]] [1] 4 7 9 Empl$child.ages [1] 4 7 9 Empl[4] # a sublist consisting of the 4 th component of Empl $child.ages [1] 4 7 9 #letters is a function for a to z #toupper(letters) for capital A to Z #LETTERS for capital A to Z names(Empl) <- letters[1:4] names(Empl) [1] "a" "b" "c" "d" Empl1 <- c(Empl, service=8) $a [1] "Anna" $b [1] "Fred" $c [1] 3 $d [1] 4 7 9 $service [1] 8 unl<-unlist(Empl) # converts it to a vector. Mixed types will be converted to character, giving a character vector. unl a b c d1 d2 d3 service "Anna" "Fred" "3" "4" "7" "9" "8"

42 More lists (Example 8) > x <- c(3,-1,2,0,-3,6) > x.mat <- matrix(x,ncol=2) Matrix with 2 cols > x.mat [,1] [,2] [1,] 3 -1 [2,] 2 0 [3,] -3 6 > dimnames(x.mat) <- list(c("L1","L2","L3"), c("R1","R2")) > x.mat R1 R2 L1 3 -1 L2 2 0 L3 -3 6

43 Data frame Data frame: Rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types. Example of a data frame with 10 rows and 3 columns

44 Importing and exporting data in R

45 Importing Data (Exercise 9) The easiest way to enter data in R is to work with a text file, in which the columns are separated by tabs; or comma-separated values (csv) files. Example of importing data are provided below. mydata <- read.table("D1_Data_1.csv", header=TRUE) mydata_correct <- read.table("D1_Data_1.csv", sep=”, ", header=TRUE) mydata <- read.csv("D1_Data_1.csv", header=TRUE) mydatab<- read.table("D1_Data_1.txt", header=TRUE) Mydatab_correct<- read.table("D1_Data_1.txt",, sep=”,\t", header=TRUE) mydatab<- read.delim("D1_Data_1.txt", header=TRUE) Import from Web URL Data_URL<-read.table(file=http://kh-tps.osu.edu/D1_Data_1.csv, header=TRUE, sep=“,”)http://kh-tps.osu.edu/D1_Data_1.csv Importing data in Rstudio using (Import Dataset) on the Workspace Importing data from Excel, SAS, SPSS see the link below http://www.statmethods.net/input/importingdata.html

46 Keyboard Input (Exercise 10) # create a data frame from scratch age <- c(25, 30, 56, 49, 12, 16, 60, 34, 45, 22) gender <- c("male", "female", "male","male", "female", "male","male", "female", "male","male") weight <- c(160, 110, 220, 100, 65, 120, 179, 134, 165, 153) mydata <- data.frame(age,gender,weight)

47 Viewing Data (Exercise 10) There are a number of functions for listing the contents of an object or dataset. # list the variables in mydata names(mydata) # list the structure of mydata str(mydata) # dimensions of an object dim(mydata)

48 Viewing Data (Example 10) # class of an object (numeric, matrix, dataframe, etc) class(mydata) # print mydata mydata # print first 6 rows of mydata head(mydata) # print first 2 rows of mydata head(mydata, n=2) print last 6 rows of mydata tail(mydata) # print last 2 rows of mydata tail(mydata, n=2)

49 Missing Data (Example 11) In R, missing values are represented by the symbol NA (not available). Impossible values (e.g., dividing by zero) are represented by the symbol NaN (not a number). Testing for Missing Values is.na(x) # returns TRUE of x is missing y <- c(1,2,3,NA) is.na(y) # returns a vector (F F F T) Excluding missing values from analyses Arithmetic functions on missing values yield missing values. x <- c(1,2,NA,3) mean(x) # returns NA mean(x, na.rm=TRUE) # returns 2

50 Exporting Data (Example 12) To A csv File write.table(mydata, "c: mydata.csv", sep=", ") To To A Tab Delimited Text File write.table(mydata, "c: mydata.csv”,, sep=“\t ") Exporting R objects into other formats. For SPSS, SAS and Stata. you will need to load the foreign packages. foreign For Excel, you will need the xlsReadWrite package.xlsReadWrite

51 Exercise

52 Exercise 1 1.Download/import data D1_Data_2.csv and name it Data_1. 2.Get the names of the variables in the Data_1. 3.2)Create a new data from the first four column of Data_1 and named it Data_2. 4.Get the names of the variable of Data_2. 5.What is the length of Data_2? 6.Remove the second column from Data_2 and create a new data called Data_3. 7.List the first six rows (lines) of Data_3. 8.List the last six rows of Data_3. 9.List the first 4 rows of Data_3. 10.Sort the third column of Data_3 and called it Data_4.


Download ppt "Introduction to R Workshop By Stephen O. Opiyo MCIC-Computational Biology Laboratory Day 1."

Similar presentations


Ads by Google