Introduce to R chris.


1 Introduce to R chris

2 Outline
Overview
Fundamental Usage
Data Structures
Flow Control
Functions and Operators
Demo

3 Overview
What is R, and why should I use it?
Scripting language: simpler and easier to debug than compiled languages (e.g. Java, Pascal, Fortran).
Evolved from the S language:
  1970s: S, developed by John Chambers and colleagues at Bell Labs
  1990s: R, written by Ross Ihaka and Robert Gentleman
R is a free, open-source implementation of the S language (S-Plus is the commercial one).

4 R User Interface
Batch or command-line processing
  bash$ R        to start
  R> q()         to quit
Graphics windows
  > X11()
  > postscript()
  > dev.off()
File paths are relative to the working directory
  > getwd()
  > setwd()
Load a package with library()
GUIs, tcltk

5 Why should I use it?
Advantages:
Easily carry out statistical analyses.
Works well with large data sets.
Generates sophisticated graphs.
Comes with an extensive set of built-in functions.

6 Statistical Computing Environment
R Statistical techniques: Linear and Non-linear Modeling Classical Statistical Tests Time-Series Analysis Classification Clustering And Numerous Graphing Techniques Etc…

7 Graphical Environments
One of R’s strengths is the ease with which well-designed, publication-quality plots can be produced. Mathematical symbols and formulas can be placed where needed in plots. The design of a graphic can be fully controlled when the defaults are not satisfactory.

8 Easily Extendable
R can be easily extended via packages available from the Comprehensive R Archive Network (CRAN). Packages cover a wide range of modern statistics.

9 Overview Summary Open source and open development.
Design and deployment of portable, extensible, and scalable software. Interoperability with other languages: C, XML. Variety of statistical and numerical methods. High quality visualization and graphics tools. Effective, extensible user interface. Innovative tools for producing documentation and training materials: vignettes. Supports the creation, testing, and distribution of software and data modules: packages.

10 Sold? Simple to Learn Powerful Free (No per seat fees)
Publication Ready Graphics Open Source Easy transition to S-Plus

11 Fundamental Usage R as a Calculator R as a Graphics Tool Variables
Basic (Atomic) Data Types Missing Values

12 R as a Calculator
> log2(32)
[1] 5
> print(sqrt(2))
[1] 1.414214
> pi
[1] 3.141593
> seq(0, 5, length=6)
[1] 0 1 2 3 4 5
> 1+1:10
 [1]  2  3  4  5  6  7  8  9 10 11

13 R as a Graphics Tool > plot(sin(seq(0, 2*pi, length=100)))

14 Variables
> a <- 49
> sqrt(a)
[1] 7
> b <- "The dog ate my homework"
> sub("dog","cat",b)
[1] "The cat ate my homework"
> c <- (1+1==3)
> c
[1] FALSE
> as.character(c)
[1] "FALSE"

15 Basic (Atomic) Data Types
Logical
> x <- T; y <- F
> x; y
[1] TRUE
[1] FALSE
Numerical
> a <- 5; b <- sqrt(2)
> a; b
[1] 5
[1] 1.414214
Character
> a <- "1"; b <- 1
> a; b
[1] "1"
[1] 1
> a <- "character"
> b <- "a"; c <- a
> a; b; c
[1] "character"
[1] "a"
[1] "character"

16 Missing Values
Variables of each data type (numeric, character, logical) can also take the value NA: not available.
  o NA is not the same as 0
  o NA is not the same as ""
  o NA is not the same as FALSE
  o NA is not the same as NULL
Operations that involve NA may or may not produce NA:
> NA==1
[1] NA
> 1+NA
[1] NA
> max(c(NA, 4, 7))
[1] NA
> max(c(NA, 4, 7), na.rm=T)
[1] 7
> NA | TRUE
[1] TRUE
> NA & TRUE
[1] NA
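
To make the NA rules concrete, here is a minimal sketch using a made-up vector `x` (the values are illustrative, not from the slides). It shows how is.na() locates missing values and how na.rm changes a summary.

```r
# Hypothetical measurements with one missing value
x <- c(3.1, NA, 4.7)

is_missing <- is.na(x)              # element-wise test for NA
n_missing  <- sum(is_missing)       # TRUE counts as 1, so this counts the NAs
mean_all   <- mean(x)               # NA, because one input is unknown
mean_obs   <- mean(x, na.rm = TRUE) # mean of the observed values only

# Note: x == NA is never the right test; it returns NA for every element
never_use <- x == NA
```

Testing with is.na() rather than `==` matters because any comparison with NA is itself NA.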

17 Objects
Every object has a mode, a length, attributes and a class.
Mode
  Atomic modes: logical, numeric, complex or character
  Other modes: list, function, expression, call, ...
Length
  Atomic vector: number of elements
  Matrix, array: product of dimensions
  List: number of components
  Data frame: number of columns
Attributes
  Subordinate names, e.g. variable names within a data frame
Class
  Allows for an object-oriented style of programming

18 Vectors
vector: an ordered collection of data of the same type
> a <- c(1,2,3)
> a*2
[1] 2 4 6
Example: the mean spot intensities of all spots on a microarray form a numeric vector.
In R, a single number is the special case of a vector with 1 element.
Other vector types: character strings, logical


22 Vectors
Vector: ordered collection of data of the same data type
> x <- c(5.2, 1.7, 6.3)
> log(x)
[1] 1.6486586 0.5306283 1.8405496
> y <- 1:5
> z <- seq(1, 1.4, by = 0.1)
> y + z
[1] 2.0 3.1 4.2 5.3 6.4
> length(y)
[1] 5
> mean(y + z)
[1] 4.2

23 Vectors
> Mydata <- c(2,3.5,-0.2)           # Vector (c = "concatenate")
> Colors <- c("Red","Green","Red")  # Character vector
> x1 <- 25:30                       # Number sequence
> x1
[1] 25 26 27 28 29 30
> Colors[2]                         # One element
[1] "Green"
> x1[3:5]                           # Several elements
[1] 27 28 29

24 Operations on Vector Elements
> Mydata
[1]  2.0  3.5 -0.2
Test the elements
> Mydata > 0
[1]  TRUE  TRUE FALSE
Extract the positive elements
> Mydata[Mydata>0]
[1] 2.0 3.5
Remove elements
> Mydata[-c(1,3)]
[1] 3.5
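
The indexing styles above (logical, negative, positional) can be combined with names. The sketch below assumes a small named vector `scores`, invented for illustration.

```r
# Hypothetical named numeric vector
scores <- c(a = 2, b = 3.5, c = -0.2)

positive <- scores[scores > 0]     # logical index: keep elements where TRUE
idx      <- which(scores > 0)      # integer positions of the TRUE elements
by_name  <- scores[c("a", "c")]    # character index: select by name
dropped  <- scores[-2]             # negative index: everything except element 2

# Indexing also works on the left-hand side of an assignment
clipped <- scores
clipped[clipped < 0] <- 0          # replace negative values with 0
```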

25 Matrices and Arrays
matrix: rectangular table of data of the same type
Example: the expression values for a set of genes in 30 tissue biopsies form a numeric matrix with one row per gene and 30 columns.
array: 3-, 4-, ... dimensional matrix
Example: the red and green foreground and background values for the spots on 120 arrays form a 4 x (number of spots) x 120 (3D) array.

26 Matrix
A matrix is a vector with an additional attribute (dim) that defines the number of rows and columns.
Only one mode (numeric, character, complex, or logical) is allowed.
It can be created using matrix():
> x <- matrix(data=0, nr=2, nc=2)
or
> x <- matrix(0, 2, 2)

27 Matrices
Matrix: rectangular table of data of the same type
> m <- matrix(1:12, 4, byrow = T); m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12

28 Matrices
> y <- -1:2
> m.new <- m + y
> t(m.new)
     [,1] [,2] [,3] [,4]
[1,]    0    4    8   12
[2,]    1    5    9   13
[3,]    2    6   10   14
> dim(m)
[1] 4 3
> dim(t(m.new))
[1] 3 4
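
The sum m + y works because R recycles the shorter vector down the columns of the matrix. The sketch below spells this out. The use of sweep() for the column-wise case is one possible approach added here, not something the slides show.

```r
m <- matrix(1:12, nrow = 4, byrow = TRUE)
y <- -1:2

# A matrix is stored column by column, so a length-4 vector added to a
# 4-row matrix lines up with the rows: y[i] is added to every entry of row i
by_row <- m + y

# To add one value per column instead, sweep() over margin 2
col_offsets <- c(10, 20, 30)       # hypothetical per-column offsets
by_col <- sweep(m, 2, col_offsets, "+")
```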

29 Matrices
Matrix: rectangular table of data of the same type
> x <- c(3,-1,2,0,-3,6)
> x.mat <- matrix(x,ncol=2)            # Matrix with 2 columns
> x.mat
     [,1] [,2]
[1,]    3    0
[2,]   -1   -3
[3,]    2    6
> x.mat <- matrix(x,ncol=2, byrow=T)   # Filled by row
> x.mat
     [,1] [,2]
[1,]    3   -1
[2,]    2    0
[3,]   -3    6

30 Dealing with Matrices
> x.mat[,2]          # 2nd column
[1] -1  0  6
> x.mat[c(1,3),]     # 1st and 3rd rows
     [,1] [,2]
[1,]    3   -1
[2,]   -3    6
> x.mat[-2,]         # Without the 2nd row
     [,1] [,2]
[1,]    3   -1
[2,]   -3    6

31 Dealing with Matrices
> dim(x.mat)             # Dimensions
[1] 3 2
> t(x.mat)               # Transpose
     [,1] [,2] [,3]
[1,]    3    2   -3
[2,]   -1    0    6
> x.mat %*% t(x.mat)     # %*% is matrix multiplication
     [,1] [,2] [,3]
[1,]   10    6  -15
[2,]    6    4   -6
[3,]  -15   -6   45
> solve()                # Inverse of a square matrix
> eigen()                # Eigenvectors and eigenvalues

32 Generate a Matrix
> xmat <- matrix(1:12,nrow=3,byrow=T)
> xmat
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> length(xmat)
[1] 12
> dim(xmat)
[1] 3 4
> mode(xmat)
[1] "numeric"
> names(xmat)
NULL
> dimnames(xmat)
NULL

33 Generate a Matrix
> dimnames(xmat) <- list(c("A","B","C"), c("W","X","Y","Z"))
> dimnames(xmat)
[[1]]
[1] "A" "B" "C"

[[2]]
[1] "W" "X" "Y" "Z"

> xmat
  W  X  Y  Z
A 1  2  3  4
B 5  6  7  8
C 9 10 11 12

34 Generate a Matrix
> matrix(0,3,3)
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0

35 Diagonal Elements of a Matrix
> m <- matrix(1:12, 4, byrow = T)
> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
> diag(m)
[1] 1 5 9

36 Inverse of Matrices
> m <- matrix(c(1,3,5,9,11,13,15,19,21),3,byrow=T)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    9   11   13
[3,]   15   19   21
> solve(m)     # Returns the 3 x 3 inverse of m
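
A minimal sketch of how solve() and eigen() are typically used, assuming a small symmetric matrix `a` chosen here for illustration. It checks the inverse by multiplying back and uses the two-argument form of solve() for a linear system.

```r
# Hypothetical 2 x 2 symmetric, invertible matrix
a <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)

a_inv <- solve(a)             # matrix inverse
check <- a %*% a_inv          # should equal the identity up to rounding

b <- c(1, 2)
x <- solve(a, b)              # solves a %*% x = b without forming the inverse

ev <- eigen(a)                # list with $values and $vectors
```

For linear systems, solve(a, b) is usually preferable to solve(a) %*% b because it avoids computing the full inverse.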

37 rbind() & cbind()
> x <- c(1,2,3)
> y <- matrix(0,3,3)
> rbind(y,x)
  [,1] [,2] [,3]
     0    0    0
     0    0    0
     0    0    0
x    1    2    3
> cbind(y,x)
           x
[1,] 0 0 0 1
[2,] 0 0 0 2
[3,] 0 0 0 3

38 Array
Arrays generalize matrices by extending the dim attribute to more than two dimensions.
> xarr <- array(c(1:8,11:18,111:118),dim=c(2,4,3))   # row, col, array
> xarr
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8

, , 2

     [,1] [,2] [,3] [,4]
[1,]   11   13   15   17
[2,]   12   14   16   18

, , 3

     [,1] [,2] [,3] [,4]
[1,]  111  113  115  117
[2,]  112  114  116  118

39 Lists
list: ordered collection of data of arbitrary types.
Example:
> doe <- list(name="john",age=28,married=F)
> doe$name
[1] "john"
> doe$age
[1] 28
> doe[[3]]
[1] FALSE
Typically, vector elements are accessed by their index (an integer) and list elements by $name (a character string). But both types support both access methods. Slots are accessed
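
The difference between single and double brackets on a list is a common stumbling block. The sketch below reuses the `doe` list from the slide. The added `city` component is hypothetical.

```r
doe <- list(name = "john", age = 28, married = FALSE)

sub_list <- doe["age"]      # single brackets: a list of length 1
age      <- doe[["age"]]    # double brackets: the element itself (a number)
same_age <- doe$age         # $name is shorthand for [["name"]]

doe$city <- "unknown"       # assigning to a new name adds a component
doe$married <- NULL         # assigning NULL removes a component
```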

40 Lists Are Very Flexible
> my.list <- list(c(5,4,-1),c("X1","X2","X3"))
> my.list
[[1]]
[1]  5  4 -1

[[2]]
[1] "X1" "X2" "X3"

> my.list[[1]]
[1]  5  4 -1
> my.list <- list(c1=c(5,4,-1),c2=c("X1","X2","X3"))
> my.list$c2[2:3]
[1] "X2" "X3"

41 More Lists
> x.mat
     [,1] [,2]
[1,]    3   -1
[2,]    2    0
[3,]   -3    6
> dimnames(x.mat) <- list(c("L1","L2","L3"), c("R1","R2"))
> x.mat
   R1 R2
L1  3 -1
L2  2  0
L3 -3  6

42 Data Frames
data frame: rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types.
Represents the typical data table that researchers come up with, like a spreadsheet.
Example 1:
> a <- data.frame(localization,tumorsize,progress,row.names=patients)
> a
   localization tumorsize progress
XX     proximal              FALSE
XX       distal               TRUE
XX     proximal              FALSE
Example 2:
> L <- LETTERS[1:4]     # A B C D
> x <- 1:4              # 1 2 3 4
> data.frame(x,L)       # create data frame
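
A self-contained sketch of Example 1. The patient IDs and tumor sizes below are invented placeholders, since the slide does not give them, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# Hypothetical input vectors (values are placeholders, not from the slides)
patients     <- c("P1", "P2", "P3")
localization <- c("proximal", "distal", "proximal")
tumorsize    <- c(6.3, 8.0, 10.0)
progress     <- c(FALSE, TRUE, FALSE)

a <- data.frame(localization, tumorsize, progress,
                row.names = patients, stringsAsFactors = FALSE)

col_types  <- sapply(a, class)     # one type per column
one_column <- a$tumorsize          # a column is an ordinary vector
one_row    <- a["P2", ]            # a row is a one-row data frame
progressed <- a[a$progress, ]      # logical row selection
```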

43 What type is my data?
class                  Class from which object inherits (vector, matrix, function, logical, list, ...)
mode                   Numeric, character, logical, ...
storage.mode, typeof   Mode used by R to store object (double, integer, character, logical, ...)
is.function            Logical (TRUE if function)
is.na                  Logical (TRUE if missing)
names                  Names associated with object
dimnames               Names for each dim of array
slotNames              Names of slots of BioC objects
attributes             Names, class, etc.
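
To see how class, mode and typeof differ, the sketch below compares them on two small vectors chosen for illustration.

```r
x_int <- 1:3          # integer vector
x_dbl <- c(1.5, 2)    # double vector

describe <- function(obj) {
  c(class = class(obj), mode = mode(obj), typeof = typeof(obj))
}

# One row per object, one column per inspection function
kinds <- rbind(integer = describe(x_int),
               double  = describe(x_dbl))

fun_check <- is.function(mean)      # TRUE: mean is a function
nm        <- names(c(a = 1, b = 2)) # names attached to a vector
```

Both vectors share the mode "numeric", while typeof distinguishes how they are stored.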

44 Branching
if (logical expression) {
  statements
} else {
  alternative statements
}
The else branch is optional.
{ } are optional with one statement.
ifelse(logical expression, yes value, no value) is the vectorized form.
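
A short sketch contrasting if/else, which tests a single condition, with ifelse(), which works element by element. The function name `sign_label` and the input vector are made up for the example.

```r
# if / else: one condition, one branch taken
sign_label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

# ifelse: vectorized, returns one result per element of the test
v <- c(-2, 0, 5)
labels <- ifelse(v >= 0, "non-negative", "negative")
```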

45 Loops
Used when the same or similar tasks need to be performed multiple times: for all elements of a list, for all columns of an array, etc.
for (i in 1:10) {
  print(i*i)
}

i <- 1
while (i <= 10) {
  i <- i + sqrt(i)
}

j <- 1
repeat {
  print(j)
  j <- j + j/3
  if (j > 10) break
}
Also: repeat, break, next
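
The sketch below shows next and break inside a for loop and counts the iterations of the while loop from the slide. The variable names `squares` and `steps` are introduced here for illustration.

```r
# for with next and break: squares of the odd numbers up to 7
squares <- numeric(0)
for (i in 1:10) {
  if (i %% 2 == 0) next    # skip even numbers
  if (i > 7) break         # stop the loop entirely
  squares <- c(squares, i * i)
}

# while: repeat until the condition fails, counting the steps
i <- 1
steps <- 0
while (i <= 10) {
  i <- i + sqrt(i)
  steps <- steps + 1
}
```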

46 Regular Expressions
Tools for text matching and replacement, available in similar forms in many programming languages (Perl, Unix shells, Java).
> a <- c("CENP-F","Ly-9", "MLN50", "ZNF191", "CLH-17")
> grep("L", a)
[1] 2 3 5
> grep("L", a, value=T)
[1] "Ly-9"   "MLN50"  "CLH-17"
> grep("^L", a, value=T)
[1] "Ly-9"
> grep("[0-9]", a, value=T)
[1] "Ly-9"   "MLN50"  "ZNF191" "CLH-17"
> gsub("[0-9]", "X", a)
[1] "CENP-F" "Ly-X"   "MLNXX"  "ZNFXXX" "CLH-XX"
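
A few related base R functions, sketched on the same gene names: grepl() returns a logical vector instead of indices, and sub()/gsub() can strip parts of a string.

```r
ids <- c("CENP-F", "Ly-9", "MLN50", "ZNF191", "CLH-17")

has_digit <- grepl("[0-9]", ids)        # TRUE where the name contains a digit
prefix    <- sub("-.*$", "", ids)       # drop everything from the first hyphen on
digits    <- gsub("[^0-9]", "", ids)    # keep only the digits
```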

47 Storing Data
Every R object can be stored into and restored from a file with the commands save and load. This uses the XDR (external data representation) standard of Sun Microsystems and others, and is portable between MS-Windows, Unix and Mac.
> save(x, file="x.Rdata")
> load("x.Rdata")
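
A round-trip sketch of save and load. It writes to a temporary file and loads into a separate environment (both choices are assumptions made here so the example has no side effects on the workspace).

```r
x <- c(1, 2, 3)

path <- tempfile(fileext = ".RData")   # temporary file, removed with the session
save(x, file = path)

# load() restores objects under their original names; loading into a new
# environment keeps them from overwriting anything in the workspace
restored <- new.env()
loaded_names <- load(path, envir = restored)   # returns the restored names
x_back <- restored$x
```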

48 Importing and Exporting Data
There are many ways to get data in and out. Most programs (e.g. Excel), as well as humans, know how to deal with rectangular tables in the form of tab-delimited text files.
> x <- read.delim("filename.txt")
Also: read.table, read.csv, scan
> write.table(x, file="x.txt", sep="\t")
Also: write.matrix (in the MASS package), write
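
A minimal round trip through a tab-delimited file. The data frame contents and the temporary file are invented for the example, and quote = FALSE and row.names = FALSE are choices made here to keep the file plain.

```r
# Hypothetical table to export
df <- data.frame(gene  = c("CENP-F", "Ly-9"),
                 value = c(1.5, 2.5),
                 stringsAsFactors = FALSE)

path <- tempfile(fileext = ".txt")
write.table(df, file = path, sep = "\t", quote = FALSE, row.names = FALSE)

# read.delim assumes a header line and tab separators by default
back <- read.delim(path, stringsAsFactors = FALSE)
```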

49 Functions and Operators
Functions do things with data.
"Input": function arguments (0, 1, 2, ...)
"Output": function result (exactly one)
Example:
add <- function(a,b) {
  result <- a+b
  return(result)
}
Operators: shorthand for frequently used functions of one or two arguments.
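
Building on the add example, the sketch below illustrates a default argument, a function that returns several values bundled in one list, and the fact that an operator is itself a function. The names `range_stats` and `%+%` are hypothetical.

```r
# Default argument: b is 1 unless the caller supplies it
add <- function(a, b = 1) {
  a + b
}

# A function returns exactly one object; use a list to bundle several values
range_stats <- function(x) {
  list(min = min(x), max = max(x), spread = max(x) - min(x))
}

# Operators are functions: a + b is the same as `+`(a, b)
three <- `+`(1, 2)

# User-defined binary operator (names of the form %xyz%)
`%+%` <- function(a, b) paste(a, b)
greeting <- "hello" %+% "world"
```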

50 Frequently used operators
<-     Assign
+      Sum
-      Difference
*      Multiplication
/      Division
^      Exponent
%%     Mod
%*%    Matrix product
%/%    Integer division
%in%   Membership test (is element of)
|      Or
&      And
<      Less
>      Greater
<=     Less or equal
>=     Greater or equal
!      Not
!=     Not equal
==     Is equal

51 Frequently used functions
c                   Concatenate
cbind, rbind        Concatenate vectors (as columns, as rows)
min                 Minimum
max                 Maximum
length              # values
dim                 # rows, cols
floor               Largest integer not greater than x
which               TRUE indices
table               Counts
summary             Generic stats
sort, order, rank   Sort, order, rank a vector
print               Show value
cat                 Print as char
paste               c() as char
round               Round
apply               Repeat over rows, cols

52 Statistical functions
rnorm, dnorm, pnorm, qnorm    Normal distribution: random sample, density, cdf and quantiles
lm, glm, anova                Model fitting
loess, lowess                 Smooth curve fitting
sample                        Resampling (bootstrap, permutation)
.Random.seed                  Random number generation
mean, median                  Location statistics
var, cor, cov, mad, range     Scale statistics
svd, qr, chol, eigen          Linear algebra
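
A small sketch tying a few of these functions together, using simulated data. The seed, sample size and true slope of 2 are arbitrary choices made for the example.

```r
set.seed(1)                          # make the random draws reproducible

x <- rnorm(100)                      # 100 standard-normal values
y <- 2 * x + rnorm(100, sd = 0.1)    # linear signal plus small noise

fit   <- lm(y ~ x)                   # least-squares fit
slope <- coef(fit)[["x"]]            # estimated slope, close to 2

p <- pnorm(1.96)                     # cdf: P(Z <= 1.96), about 0.975
q <- qnorm(0.975)                    # quantile: inverse of pnorm, about 1.96

resampled <- sample(x, replace = TRUE)   # one bootstrap resample of x
```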

53 Graphical functions
plot              Generic plot (e.g. scatter plot)
points            Add points
lines, abline     Add lines
text, mtext       Add text
legend            Add a legend
axis              Add axes
box               Add box around all axes
par               Plotting parameters (lots!)
colors, palette   Use colors
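
These functions are usually layered: plot() starts a figure and the others add to it. The sketch below draws into a temporary PDF file (an assumption made so it also runs without a display). The curve and labels are illustrative.

```r
path <- tempfile(fileext = ".pdf")
pdf(path)                                   # open a file-based graphics device

x <- seq(0, 2 * pi, length.out = 100)
plot(x, sin(x), type = "l",                 # start the figure with a line plot
     xlab = "x", ylab = "sin(x)", main = "Sine curve")
abline(h = 0, lty = 2)                      # add a dashed horizontal line
points(pi / 2, 1, pch = 19, col = "red")    # mark the maximum
text(pi / 2, 0.85, "maximum")               # annotate it
legend("topright", legend = "sin(x)", lty = 1)

invisible(dev.off())                        # close the device to write the file
```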

54 Useful Functions
> seq(2,12,by=2)
[1]  2  4  6  8 10 12
> seq(4,5,length=5)
[1] 4.00 4.25 4.50 4.75 5.00
> rep(4,10)
 [1] 4 4 4 4 4 4 4 4 4 4
> paste("V",1:5,sep="")
[1] "V1" "V2" "V3" "V4" "V5"
> LETTERS[1:7]
[1] "A" "B" "C" "D" "E" "F" "G"

55 Vector Functions
> vec <- c(5,4,6,11,14,19)
> sum(vec)
[1] 59
> prod(vec)
[1] 351120
> mean(vec)
[1] 9.833333
> median(vec)
[1] 8.5
> var(vec)
[1] 34.96667
> sd(vec)
[1] 5.913262
> summary(vec)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.000   5.250   8.500   9.833  13.250  19.000
And also: min() max() cummin() cummax() range()

56 "apply" and Its Relatives (lapply, sapply)
Often we want to repeat the same function on all the rows or columns of a matrix, or on all the elements of a list. We could do this in a loop, but R offers a more compact alternative: the apply function.

57 Applying a Function to the Rows or Columns of a Matrix
If mat is a matrix and fun is a function (such as mean, var, lm ...) that takes a vector as its argument, then
  apply(mat,1,fun) applies fun to each row of mat
  apply(mat,2,fun) applies fun to each column of mat
In either case, the output is a vector when fun returns a single value.

58 apply
apply( arr, margin, fct )
Apply the function fct along some dimensions of the array arr, according to margin, and return a vector or array of the appropriate size.
For a 4 x 3 matrix x:
> apply(x, 1, sum)     # Row sums: one value per row
> apply(x, 2, sum)     # Column sums: one value per column
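
A worked sketch of apply on a concrete matrix. The matrix below is an assumption chosen for illustration, since the slide's own values are not available.

```r
# Assumed example matrix, filled by row
mat <- matrix(1:12, nrow = 4, byrow = TRUE)

row_sums   <- apply(mat, 1, sum)     # margin 1: one result per row
col_means  <- apply(mat, 2, mean)    # margin 2: one result per column

# When the function returns more than one value, the result is a matrix
# with one column per row or column processed
col_ranges <- apply(mat, 2, range)   # 2 x 3: min and max of each column
```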

59 lapply
Used when the same or similar tasks need to be performed multiple times, for all elements of a list or for all columns of an array. May be easier and faster than "for" loops.
lapply(li, fct)
The function fct is applied to each element of the list li. The result is a list whose elements are the individual function results.
> li <- list("klaus","martin","georg")
> lapply(li, toupper)
[[1]]
[1] "KLAUS"

[[2]]
[1] "MARTIN"

[[3]]
[1] "GEORG"

60 Relatives of “apply” lapply(list,fun) applies the function to every element of list tapply(x,factor,fun) uses the factor to split x into groups, and then applies fun to each group
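
A brief sketch of tapply and sapply. The tumor sizes and sites are invented values used only to show the grouping.

```r
# Hypothetical measurements and a grouping factor
size <- c(6.3, 8.0, 10.0, 7.1)
site <- factor(c("proximal", "distal", "proximal", "distal"))

# tapply: split size by site, then apply mean to each group
group_means <- tapply(size, site, mean)

# sapply: like lapply, but simplifies the result to a vector when possible
name_lengths <- sapply(list("klaus", "martin", "georg"), nchar)
```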

