Introduction to R, chris.



Outline: Overview, Fundamental Usage, Data Structures, Flow Control, Functions and Operators, Demo

Overview
What is R, and why should I use it?
Scripting language: simpler and easier to debug than compiled languages (e.g. Java, Pascal, Fortran).
Evolution from the S language:
1970s: John Chambers at Bell Labs
1990s: Ross Ihaka and Robert Gentleman
R is a free, open-source implementation of S/S-Plus.

R User Interface
Batch or command-line processing:
bash$ R          # start R
> q()            # quit
Graphics windows:
> X11()
> postscript()
> dev.off()
File paths are relative to the working directory:
> getwd()
> setwd()
Load a package with library()
GUIs: tcltk

Why should I use it?
Advantages:
Easily carry out statistical analyses.
Works well with huge data sets.
Generates sophisticated graphs.
Comes with an extensive set of built-in functions.

Statistical Computing Environment R Statistical techniques: Linear and Non-linear Modeling Classical Statistical Tests Time-Series Analysis Classification Clustering And Numerous Graphing Techniques Etc…

Graphical Environments
One of R’s strengths is the ease with which well-designed, publication-quality plots can be produced.
Mathematical symbols and formulas can be put where needed in plots.
Full control of the graphics design if the defaults are not satisfactory.
http://www.r-project.org/

Easily Extendable
R can be easily extended via packages available from the Comprehensive R Archive Network (CRAN). www.r-project.org
Packages cover a wide range of modern statistics.

Overview Summary Open source and open development. Design and deployment of portable, extensible, and scalable software. Interoperability with other languages: C, XML. Variety of statistical and numerical methods. High quality visualization and graphics tools. Effective, extensible user interface. Innovative tools for producing documentation and training materials: vignettes. Supports the creation, testing, and distribution of software and data modules: packages.

Sold? Simple to Learn Powerful Free (No per seat fees) Publication Ready Graphics Open Source Easy transition to S-Plus

Fundamental Usage R as a Calculator R as a Graphics Tool Variables Basic (Atomic) Data Types Missing Values

R as a Calculator
> log2(32)
[1] 5
> print(sqrt(2))
[1] 1.414214
> pi
[1] 3.141593
> seq(0, 5, length=6)
[1] 0 1 2 3 4 5
> 1+1:10
 [1]  2  3  4  5  6  7  8  9 10 11

R as a Graphics Tool > plot(sin(seq(0, 2*pi, length=100)))

Variables
> a <- 49
> sqrt(a)
[1] 7
> b <- "The dog ate my homework"
> sub("dog","cat",b)
[1] "The cat ate my homework"
> c <- (1+1==3)
> c
[1] FALSE
> as.character(c)
[1] "FALSE"

Basic (Atomic) Data Types
Logical
> x <- T; y <- F
> x; y
[1] TRUE
[1] FALSE
Numerical
> a <- 5; b <- sqrt(2)
> a; b
[1] 5
[1] 1.414214
Character
> a <- "1"; b <- 1
> a; b
[1] "1"
[1] 1
> a <- "character"
> b <- "a"; c <- a
> a; b; c
[1] "character"
[1] "a"
[1] "character"

Missing Values
Variables of each data type (numeric, character, logical) can also take the value NA: not available.
o NA is not the same as 0
o NA is not the same as ""
o NA is not the same as FALSE
o NA is not the same as NULL
Operations that involve NA may or may not produce NA:
> NA==1
[1] NA
> 1+NA
[1] NA
> max(c(NA, 4, 7))
[1] NA
> max(c(NA, 4, 7), na.rm=T)
[1] 7
> NA | TRUE
[1] TRUE
> NA & TRUE
[1] NA

Objects: Mode, Length, Attributes, Class
Mode
  Atomic modes: logical, numeric, complex or character
  Other modes: list, graphics, function, expression, call, ...
Length
  Atomic vector: number of elements
  Matrix, array: product of dimensions
  List (recursive): number of components
  Data frame: number of columns
Attributes
  Subordinate names, e.g. variable names within a data frame
Class
  Allows for an object-oriented style of programming
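The four properties listed above can be inspected directly in a session. A minimal sketch; the object names `v` and `df` are illustrative and do not come from the slides:

```r
v <- c(a = 1.5, b = 2.5, c = 3.5)   # named numeric vector (hypothetical data)
mode(v)          # "numeric"
length(v)        # 3, the number of elements
attributes(v)    # a list holding the names attribute
class(v)         # "numeric"

df <- data.frame(x = 1:3, y = c("u", "v", "w"))
mode(df)         # "list": a data frame is stored as a list of columns
length(df)       # 2, the number of columns
class(df)        # "data.frame"
```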

Vectors
vector: an ordered collection of data of the same type
> a <- c(1,2,3)
> a*2
[1] 2 4 6
Example: the mean spot intensities of all 15488 spots on a microarray form a numeric vector.
In R, a single number is the special case of a vector with 1 element.
Other vector types: character strings, logical

Vectors
Vector: ordered collection of data of the same data type
> x <- c(5.2, 1.7, 6.3)
> log(x)
[1] 1.6486586 0.5306283 1.8405496
> y <- 1:5
> z <- seq(1, 1.4, by = 0.1)
> y + z
[1] 2.0 3.1 4.2 5.3 6.4
> length(y)
[1] 5
> mean(y + z)
[1] 4.2

Vectors
> Mydata <- c(2,3.5,-0.2)              # Vector (c = "concatenate")
> Colors <- c("Red","Green","Red")     # Character vector
> x1 <- 25:30                          # Number sequences
> x1
[1] 25 26 27 28 29 30
> Colors[2]                            # One element
[1] "Green"
> x1[3:5]                              # Various elements
[1] 27 28 29

Operation on Vector Elements
> Mydata
[1]  2.0  3.5 -0.2
> Mydata > 0               # Test on the elements
[1]  TRUE  TRUE FALSE
> Mydata[Mydata>0]         # Extract the positive elements
[1] 2.0 3.5
> Mydata[-c(1,3)]          # Remove elements
[1] 3.5

Matrices and Arrays
matrix: rectangular table of data of the same type
Example: the expression values for 10000 genes for 30 tissue biopsies form a numeric matrix with 10000 rows and 30 columns.
array: matrix with 3, 4, ... dimensions
Example: the red and green foreground and background values for 20000 spots on 120 arrays form a 4 x 20000 x 120 (3D) array.

Matrix
A matrix is a vector with an additional attribute (dim) that defines the number of rows and columns.
Only one mode (numeric, character, complex, or logical) is allowed.
Can be created using matrix():
> x <- matrix(data=0, nrow=2, ncol=2)
or
> x <- matrix(0,2,2)
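To see that a matrix really is a vector plus a dim attribute, the attribute can be set by hand. A small sketch; the variable `v` is illustrative:

```r
v <- 1:6
dim(v) <- c(2, 3)     # adding dim turns the vector into a 2 x 3 matrix
is.matrix(v)          # TRUE
v[2, 3]               # 6: elements are filled column by column

x <- matrix(0, nrow = 2, ncol = 2)   # same result as matrix(0, 2, 2)
dim(x)                # 2 2
```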

Matrices
Matrix: rectangular table of data of the same type
> m <- matrix(1:12, 4, byrow = T); m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12

Matrices
> y <- -1:2
> m.new <- m + y
> t(m.new)
     [,1] [,2] [,3] [,4]
[1,]    0    4    8   12
[2,]    1    5    9   13
[3,]    2    6   10   14
> dim(m)
[1] 4 3
> dim(t(m.new))
[1] 3 4

Matrices
Matrix: rectangular table of data of the same type
> x <- c(3,-1,2,0,-3,6)
> x.mat <- matrix(x,ncol=2)               # Matrix with 2 cols
> x.mat
     [,1] [,2]
[1,]    3    0
[2,]   -1   -3
[3,]    2    6
> x.mat <- matrix(x,ncol=2, byrow=T)      # By row creation
> x.mat
     [,1] [,2]
[1,]    3   -1
[2,]    2    0
[3,]   -3    6

Dealing with Matrices
> x.mat[,2]                # 2nd col
[1] -1  0  6
> x.mat[c(1,3),]           # 1st and 3rd lines
     [,1] [,2]
[1,]    3   -1
[2,]   -3    6
> x.mat[-2,]               # No 2nd line
     [,1] [,2]
[1,]    3   -1
[2,]   -3    6

Dealing with Matrices
> dim(x.mat)               # Dimension
[1] 3 2
> t(x.mat)                 # Transpose
     [,1] [,2] [,3]
[1,]    3    2   -3
[2,]   -1    0    6
> x.mat %*% t(x.mat)       # %*% Matrix multiplication
     [,1] [,2] [,3]
[1,]   10    6  -15
[2,]    6    4   -6
[3,]  -15   -6   45
solve(A)                   # Inverse of a square matrix A
eigen(A)                   # Eigenvectors and eigenvalues of a square matrix A

Generate a Matrix
> xmat <- matrix(1:12,nrow=3,byrow=T)
> xmat
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> length(xmat)
[1] 12
> dim(xmat)
[1] 3 4
> mode(xmat)
[1] "numeric"
> names(xmat)
NULL
> dimnames(xmat)
NULL

Generate a Matrix
> dimnames(xmat) <- list(c("A","B","C"), c("W","X","Y","Z"))
> dimnames(xmat)
[[1]]
[1] "A" "B" "C"
[[2]]
[1] "W" "X" "Y" "Z"
> xmat
  W  X  Y  Z
A 1  2  3  4
B 5  6  7  8
C 9 10 11 12

Generate a Matrix
> matrix(0,3,3)
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0

Diagonal Element of a Matrix
> m <- matrix(1:12, 4, byrow = T)
> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
> diag(m)
[1] 1 5 9

Inverse of Matrices
> m <- matrix(c(1,3,5,9,11,13,15,19,21),3,byrow=T)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    9   11   13
[3,]   15   19   21
> solve(m)
        [,1]    [,2] [,3]
[1,] -0.5000  1.0000 -0.5
[2,]  0.1875 -1.6875  1.0
[3,]  0.1875  0.8125 -0.5

rbind() & cbind()
> x <- c(1,2,3)
> y <- matrix(0,3,3)
> rbind(y,x)
  [,1] [,2] [,3]
     0    0    0
     0    0    0
     0    0    0
x    1    2    3
> cbind(y,x)
           x
[1,] 0 0 0 1
[2,] 0 0 0 2
[3,] 0 0 0 3

Array
Arrays generalize matrices by extending the dim attribute to more than two dimensions.
> xarr <- array(c(1:8,11:18,111:118),dim=c(2,4,3))   # row, col, array
> xarr
, , 1
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
, , 2
     [,1] [,2] [,3] [,4]
[1,]   11   13   15   17
[2,]   12   14   16   18
, , 3
     [,1] [,2] [,3] [,4]
[1,]  111  113  115  117
[2,]  112  114  116  118

Lists
list: ordered collection of data of arbitrary types.
Example:
> doe <- list(name="john",age=28,married=F)
> doe$name
[1] "john"
> doe$age
[1] 28
> doe[[3]]
[1] FALSE
Typically, vector elements are accessed by their index (an integer) and list elements by $name (a character string). But both types support both access methods.
Slots are accessed by @name.
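The statement that both vectors and lists support access by index and by name can be checked as follows. A short sketch; the named vector `v` is an illustrative addition:

```r
doe <- list(name = "john", age = 28, married = FALSE)
doe[["age"]]      # 28, access by name (same as doe$age)
doe[[2]]          # 28, access by position

v <- c(first = 10, second = 20)   # hypothetical named vector
v["second"]       # element accessed by name
v[2]              # the same element accessed by index
```

Note that the `$` operator itself only works on lists (and data frames); for an atomic vector, names go inside `[ ]`.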

Lists Are Very Flexible
> my.list <- list(c(5,4,-1),c("X1","X2","X3"))
> my.list
[[1]]
[1]  5  4 -1
[[2]]
[1] "X1" "X2" "X3"
> my.list[[1]]
[1]  5  4 -1
> my.list <- list(c1=c(5,4,-1),c2=c("X1","X2","X3"))
> my.list$c2[2:3]
[1] "X2" "X3"

More Lists
> x.mat
     [,1] [,2]
[1,]    3   -1
[2,]    2    0
[3,]   -3    6
> dimnames(x.mat) <- list(c("L1","L2","L3"), c("R1","R2"))
> x.mat
   R1 R2
L1  3 -1
L2  2  0
L3 -3  6

Data Frames
data frame: rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types.
Represents the typical data table that researchers come up with, like a spreadsheet.
Example 1:
> a <- data.frame(localization,tumorsize,progress,row.names=patients)
> a
      localization tumorsize progress
XX348     proximal       6.3    FALSE
XX234       distal       8.0     TRUE
XX987     proximal      10.0    FALSE
Example 2:
> L <- LETTERS[1:4]      # A B C D
> x <- 1:4               # 1 2 3 4
> data.frame(x,L)        # create data frame
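Example 1 uses vectors that the slide does not define. A runnable sketch, assuming input vectors whose values are simply read off the printed table:

```r
# Hypothetical input vectors, reconstructed from the printed table
patients     <- c("XX348", "XX234", "XX987")
localization <- c("proximal", "distal", "proximal")
tumorsize    <- c(6.3, 8.0, 10.0)
progress     <- c(FALSE, TRUE, FALSE)

a <- data.frame(localization, tumorsize, progress, row.names = patients)
a$tumorsize          # one column, returned as a numeric vector
a["XX234", ]         # one row, selected by row name
a[a$progress, ]      # only the rows where progress is TRUE
```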

What type is my data?
class                   Class from which object inherits (vector, matrix, function, logical, list, ...)
mode                    Numeric, character, logical, ...
storage.mode, typeof    Mode used by R to store object (double, integer, character, logical, ...)
is.function             Logical (TRUE if function)
is.na                   Logical (TRUE if missing)
names                   Names associated with object
dimnames                Names for each dim of array
slotNames               Names of slots of BioC objects
attributes              Names, class, etc.
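A brief sketch of these inspection functions applied to one object; the matrix `m` is illustrative:

```r
m <- matrix(1:4, 2)
class(m)            # includes "matrix" (R >= 4.0 also reports "array")
mode(m)             # "numeric"
typeof(m)           # "integer"
storage.mode(m)     # "integer"
is.function(sum)    # TRUE
is.na(c(1, NA))     # FALSE TRUE
```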

Branching
if (logical expression) {
    statements
} else {
    alternative statements
}
The else branch is optional; { } are optional with one statement.
ifelse(logical expression, yes statement, no statement)
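A concrete instance of both forms, as a sketch with made-up values:

```r
x <- 7
if (x %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}
parity                                            # "odd"

# ifelse() is vectorized: the test is applied to every element
sizes <- ifelse(c(1, 4, 9) > 3, "big", "small")   # "small" "big" "big"
```

`if` expects a single logical value, whereas `ifelse` accepts a whole logical vector and returns a vector of the same length.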

Loops
When the same or similar tasks need to be performed multiple times; for all elements of a list; for all columns of an array; etc.
for(i in 1:10) {
    print(i*i)
}
i <- 1
while(i <= 10) {
    i <- i + sqrt(i)
}
j <- 1
repeat {
    print(j)
    j <- j + j/3
    if (j > 10) break
}
Also: repeat, break, next
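The slide mentions `next` and `break` without showing them together. A minimal sketch; the variable `squares` is illustrative:

```r
squares <- numeric(0)
for (i in 1:10) {
  if (i %% 2 == 0) next          # skip even numbers
  if (i > 7) break               # leave the loop once i exceeds 7
  squares <- c(squares, i * i)   # collect the squares of 1, 3, 5, 7
}
squares                          # 1 9 25 49
```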

Regular Expressions
Tools for text matching and replacement which are available in similar forms in many programming languages (Perl, Unix shells, Java)
> a <- c("CENP-F","Ly-9", "MLN50", "ZNF191", "CLH-17")
> grep("L", a)
[1] 2 3 5
> grep("L", a, value=T)
[1] "Ly-9"   "MLN50"  "CLH-17"
> grep("^L", a, value=T)
[1] "Ly-9"
> grep("[0-9]", a, value=T)
[1] "Ly-9"   "MLN50"  "ZNF191" "CLH-17"
> gsub("[0-9]", "X", a)
[1] "CENP-F" "Ly-X"   "MLNXX"  "ZNFXXX" "CLH-XX"

Storing Data
Every R object can be stored into and restored from a file with the commands “save” and “load”.
This uses the XDR (external data representation) standard of Sun Microsystems and others, and is portable between MS-Windows, Unix, Mac.
> save(x, file="x.Rdata")
> load("x.Rdata")
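A round trip through save and load might look like this. The sketch writes to a temporary file (an assumption made here so nothing is left in the working directory):

```r
x <- c(5.2, 1.7, 6.3)
f <- tempfile(fileext = ".Rdata")   # temporary path, used only for this sketch
save(x, file = f)
rm(x)                               # remove x from the workspace
load(f)                             # restores x under its original name
x                                   # 5.2 1.7 6.3
```

Note that load() recreates the object under the name it was saved with, so it can silently overwrite an existing variable of the same name.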

Importing and Exporting Data
There are many ways to get data in and out. Most programs (e.g. Excel), as well as humans, know how to deal with rectangular tables in the form of tab-delimited text files.
> x <- read.delim("filename.txt")
Also: read.table, read.csv, scan
> write.table(x, file="x.txt", sep="\t")
Also: write.matrix (in package MASS), write
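A sketch of a tab-delimited round trip. The data frame and the temporary file path are illustrative; row.names = FALSE and quote = FALSE are choices made here so the file reads back with the same columns:

```r
x <- data.frame(id = 1:3, gene = c("CENP-F", "Ly-9", "MLN50"))
f <- tempfile(fileext = ".txt")     # temporary path, used only for this sketch
write.table(x, file = f, sep = "\t", row.names = FALSE, quote = FALSE)
y <- read.delim(f)                  # expects a header line and tab separators
y
```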

Functions and Operators
Functions do things with data.
“Input”: function arguments (0, 1, 2, ...)
“Output”: function result (exactly one)
Example:
add <- function(a,b) {
    result <- a+b
    return(result)
}
Operators: shorthand notation for frequently used functions of one or two arguments.
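A few ways to call such a function, as a sketch. The default value b = 1 is an addition for illustration and is not part of the slide's definition:

```r
add <- function(a, b = 1) {   # b = 1 is a default value added for illustration
  a + b                       # the last evaluated expression is the result
}
add(2, 3)             # 5, arguments matched by position
add(2)                # 3, b falls back to its default
add(b = 10, a = 1)    # 11, arguments matched by name
`+`(2, 3)             # 5, operators are ordinary functions underneath
```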

Frequently used operators
<-     Assign
+      Sum
-      Difference
*      Multiplication
/      Division
^      Exponent
%%     Mod
%*%    Matrix product
%/%    Integer division
%in%   Membership (is an element in a set)
|      Or
&      And
<      Less
>      Greater
<=     Less or =
>=     Greater or =
!      Not
!=     Not equal
==     Is equal
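The less familiar operators in the table behave as follows. A small sketch with made-up values:

```r
r  <- 7 %% 3                   # 1, remainder
q  <- 7 %/% 3                  # 2, integer division
p  <- 2 ^ 10                   # 1024
el <- c(2, 5) %in% c(1, 2, 3)  # TRUE FALSE, tested element by element

A  <- matrix(1:4, 2)           # columns are (1, 2) and (3, 4)
mp <- A %*% A                  # matrix product: rows (7, 15) and (10, 22)
ew <- A * A                    # elementwise product: rows (1, 9) and (4, 16)
```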

Frequently used functions
c                    Concatenate
cbind, rbind         Concatenate vectors (as columns or rows)
min                  Minimum
max                  Maximum
length               # values
dim                  # rows, cols
floor                Largest integer not greater than the value
which                TRUE indices
table                Counts
summary              Generic stats
sort, order, rank    Sort, order, rank a vector
print                Show value
cat                  Print as char
paste                c() as char
round                Round
apply                Repeat over rows, cols

Statistical functions
rnorm, dnorm, pnorm, qnorm    Normal distribution: random sample, density, cdf and quantiles
lm, glm, anova                Model fitting
loess, lowess                 Smooth curve fitting
sample                        Resampling (bootstrap, permutation)
.Random.seed                  Random number generation
mean, median                  Location statistics
var, cor, cov, mad, range     Scale statistics
svd, qr, chol, eigen          Linear algebra
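A sketch that exercises the normal-distribution functions and a simple model fit. The simulated data, the seed, and the true slope of 2 are all assumptions chosen for illustration:

```r
set.seed(1)                          # make the random numbers reproducible
x <- rnorm(100)                      # 100 draws from N(0, 1)
p <- pnorm(1.96)                     # cdf, about 0.975
z <- qnorm(0.975)                    # quantile, about 1.96
d <- dnorm(0)                        # density at 0, about 0.3989

y   <- 2 * x + rnorm(100, sd = 0.1)  # simulated response with slope 2
fit <- lm(y ~ x)                     # least-squares fit
coef(fit)                            # intercept near 0, slope near 2
```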

Graphical functions
plot               Generic plot, e.g. scatter
points             Add points
lines, abline      Add lines
text, mtext        Add text
legend             Add a legend
axis               Add axes
box                Add box around all axes
par                Plotting parameters (lots!)
colors, palette    Use colors
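Several of these functions combine naturally in one figure. A sketch that draws into a temporary PDF file (an assumption made here so the example also runs without a display):

```r
f <- tempfile(fileext = ".pdf")      # temporary path, used only for this sketch
pdf(f)                               # open a PDF graphics device
x <- seq(0, 2 * pi, length.out = 100)
plot(x, sin(x), type = "l", main = "sin and cos")   # generic plot
lines(x, cos(x), lty = 2)                           # add a second curve
abline(h = 0, col = "grey")                         # add a horizontal line
legend("topright", legend = c("sin", "cos"), lty = 1:2)
dev.off()                            # close the device and write the file
```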

Useful Functions
> seq(2,12,by=2)
[1]  2  4  6  8 10 12
> seq(4,5,length=5)
[1] 4.00 4.25 4.50 4.75 5.00
> rep(4,10)
 [1] 4 4 4 4 4 4 4 4 4 4
> paste("V",1:5,sep="")
[1] "V1" "V2" "V3" "V4" "V5"
> LETTERS[1:7]
[1] "A" "B" "C" "D" "E" "F" "G"

Vector Functions
> vec <- c(5,4,6,11,14,19)
> sum(vec)
[1] 59
> prod(vec)
[1] 351120
> mean(vec)
[1] 9.833333
> median(vec)
[1] 8.5
> var(vec)
[1] 34.96667
> sd(vec)
[1] 5.913262
> summary(vec)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.000   5.250   8.500   9.833  13.250  19.000
And also: min() max() cummin() cummax() range()

"apply" and Its Relatives (lapply, sapply)
Often we want to repeat the same function on all the rows or columns of a matrix, or all the elements of a list.
We could do this in a loop, but R offers a more compact alternative: the apply family of functions.

Applying a Function to the Rows or Columns of a Matrix
If mat is a matrix and fun is a function (such as mean, var, lm ...) that takes a vector as its argument, then
apply(mat,1,fun) applies fun to each row of mat
apply(mat,2,fun) applies fun to each column of mat
In either case, the output is a vector when fun returns a single value per row or column.
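A short sketch of both margins, plus a case where fun returns more than one value, in which case apply returns a matrix rather than a vector. The matrix `mat` is illustrative:

```r
mat <- matrix(1:6, nrow = 2)     # rows are (1, 3, 5) and (2, 4, 6)
rs  <- apply(mat, 1, sum)        # row sums: 9 12
cs  <- apply(mat, 2, sum)        # column sums: 3 7 11
rg  <- apply(mat, 2, range)      # range() returns 2 values, so rg is a 2 x 3 matrix
```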

apply
apply( arr, margin, fct )
Apply the function fct along some dimensions of the array arr, according to margin, and return a vector or array of the appropriate size.
> x
     [,1] [,2] [,3]
[1,]    5    7    0
[2,]    7    9    8
[3,]    4    6    7
[4,]    6    3    5
> apply(x, 1, sum)
[1] 12 24 17 14
> apply(x, 2, sum)
[1] 22 25 20

lapply
When the same or similar tasks need to be performed multiple times for all elements of a list or for all columns of an array.
May be easier and faster than “for” loops.
lapply(li, fct)
To each element of the list li, the function fct is applied. The result is a list whose elements are the individual function results.
> li = list("klaus","martin","georg")
> lapply(li, toupper)
[[1]]
[1] "KLAUS"
[[2]]
[1] "MARTIN"
[[3]]
[1] "GEORG"

Relatives of “apply”
lapply(list,fun) applies the function to every element of list
tapply(x,factor,fun) uses the factor to split x into groups, and then applies fun to each group
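A sketch of tapply on made-up measurements, together with sapply, which works like lapply but simplifies the result to a vector where possible. The vectors `size` and `site` are hypothetical:

```r
size <- c(6.3, 8.0, 10.0, 4.1)                                 # hypothetical values
site <- factor(c("proximal", "distal", "proximal", "distal"))  # grouping factor
means <- tapply(size, site, mean)     # distal 6.05, proximal 8.15

sums <- sapply(list(1:3, 4:6), sum)   # 6 15, returned as a vector rather than a list
```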