Programming in R: Intro, data and programming structures
About R
R is an open-source, comprehensive statistical package, widely used around the world.
Unlike SAS or Minitab, R is a full programming language with object-oriented features.
R is available for Windows, Mac and Linux.
It is comparable to Matlab in structure; by origin it is a free implementation of the S language.
Advantages: easy and flexible programming; a large number of contributed packages covering most statistical and machine learning methods, including recent ones.
Drawbacks: slower than C and Fortran; because R keeps data in memory by default, it can struggle with data sets as large as those handled in, for example, Perl or Python.
Installing R
R project web site:
Getting help
Help on a specific function: help(function)
Help browser: help.start()
Search for something in help: help.search("expression")
Quick reminder of function arguments: args(function)
Examples of how to use a function: example(function)
If some method is not installed on the computer: RSiteSearch("expression")
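Preliminaries
R is case-sensitive.
Comments start with a hash mark: # R is a cool language!
Data assignment: use ->, <- or =, for example a <- 3, 3 -> b, c = 3
Variable types are called modes, e.g. numeric, character, logical, complex. Note that mode() reports "numeric" for both integers and doubles; typeof() distinguishes "integer" from "double".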
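A minimal sketch of the assignment forms and modes listed above. The variable names are illustrative; `d` is used instead of `c` only to avoid reusing the name of the c() function.

```r
# Three equivalent ways to assign the value 3
a <- 3
3 -> b
d = 3

# R is case-sensitive: 'A' and 'a' are different objects
A <- "three"

mode(a)       # "numeric"
mode(A)       # "character"
mode(TRUE)    # "logical"
mode(2 + 3i)  # "complex"

# An integer literal still has mode "numeric"; typeof() shows the storage type
mode(5L)      # "numeric"
typeof(5L)    # "integer"
```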
Working with vectors
Vectors are the 'workhorses' of R.
The function c() combines individual values (comma-separated) into a vector.
Print on screen by entering the variable name or by using the function print().
The [1] at the start of the printed output is the index of the first element shown on that line, not a row number; an R vector has no row or column orientation.
The length of the vector is obtained with the length() function.
The mode of the vector is obtained with the mode() function.
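The following sketch shows these vector basics on a small example; the values in `x` are made up for illustration.

```r
# Combine individual values into a vector
x <- c(2, 5, 1, 7)

# Two ways to print it
x
print(x)

length(x)  # 4
mode(x)    # "numeric"

# c() also joins existing vectors
y <- c(x, 10, 20)
length(y)  # 6
```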
Listing and removing objects
Listing defined objects (vectors, matrices, data frames): use the function ls() with no arguments.
Compact display of an R object: use the function str(). Useful for complex data frames and lists.
Removing objects: use the function rm().
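A short sketch of listing, inspecting and removing an object. The object name `my_vec` is hypothetical, and str() is assumed to be the "compact display" function the slide refers to.

```r
my_vec <- c(1.5, 2.5, 3.5)

# Names of objects defined in the current environment
ls()

# Compact display of the object's structure
str(my_vec)

# Remove the object; it no longer appears in ls()
rm(my_vec)
"my_vec" %in% ls()  # FALSE
```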
Sequences
Use the colon operator ':', seq() and rep()
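As an illustration, the three sequence tools can be used as follows; the particular arguments are chosen only for the example.

```r
# Colon operator: consecutive integers
s1 <- 1:5                        # 1 2 3 4 5

# seq(): explicit step or explicit length
s2 <- seq(2, 10, by = 2)         # 2 4 6 8 10
s3 <- seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00

# rep(): repeat a value or a whole vector
r1 <- rep(3, times = 4)          # 3 3 3 3
r2 <- rep(c(1, 2), times = 2)    # 1 2 1 2
r3 <- rep(c(1, 2), each = 2)     # 1 1 2 2
```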
Indexing
Indexing follows the format vector1[vector2]
Finding elements satisfying specific conditions
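A sketch of the vector1[vector2] format, assuming positions, ranges and negative indices are the cases of interest. Using which() to find positions that satisfy a condition is one common approach and is an assumption here, not something the slide names.

```r
x <- c(10, 20, 30, 40, 50)

x[2]         # 20
x[c(1, 3)]   # 10 30
x[2:4]       # 20 30 40
x[-1]        # 20 30 40 50 (negative index drops an element)

# Positions of the elements satisfying a condition
idx <- which(x > 25)  # 3 4 5
x[idx]                # 30 40 50
```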
Filtering
Filtering follows the indexing principles: a logical vector inside the brackets keeps the elements where it is TRUE.
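Filtering can be sketched as below; the data are invented, and subset() is shown only as one alternative to bracket filtering.

```r
x <- c(5, -2, 8, -7, 3)

pos   <- x[x > 0]           # 5 8 3
small <- x[x > 0 & x < 6]   # 5 3
neg   <- subset(x, x < 0)   # -2 -7

# Filtering on the left-hand side replaces the selected elements
x[x < 0] <- 0
x                           # 5 0 8 0 3
```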
Set operations on vectors
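The slide text names no functions, so the sketch below assumes the base R set functions union(), intersect(), setdiff(), setequal() and the %in% operator are the ones meant.

```r
a <- c(1, 2, 3, 4)
b <- c(3, 4, 5)

union(a, b)      # 1 2 3 4 5
intersect(a, b)  # 3 4
setdiff(a, b)    # 1 2 (elements of a not in b)
setequal(a, b)   # FALSE
2 %in% a         # TRUE (membership test)
```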
Other operations on vectors
Important: in R, operations with vectors are performed element-by-element.
Some operations:
Element-wise: + - * / ^ log exp sin cos sqrt
length: number of elements
sum: sum of all elements
mean, max, min, order
Logicals: TRUE or FALSE, e.g. a <- TRUE
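The element-by-element behaviour can be checked with a small example; the vectors are illustrative.

```r
x <- c(1, 4, 9)
y <- c(2, 2, 2)

x + y    # 3 6 11
x * y    # 2 8 18
x ^ y    # 1 16 81
sqrt(x)  # 1 2 3

# Summaries over the whole vector
sum(x)   # 14
max(x)   # 9

# order() returns the positions that would sort the vector
order(c(30, 10, 20))  # 2 3 1

# Comparisons produce logical vectors
x > 3    # FALSE TRUE TRUE
```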
Working with matrices
Use the function matrix(): a <- matrix(values, nrow = m, ncol = n)
values is a set of values enclosed in c(), or an already defined vector.
m is the number of rows and n is the number of columns of the matrix.
The number of values should equal m × n; a shorter vector is recycled to fill the matrix.
The values are entered column-wise by default.
The identifiers nrow= and ncol= can be omitted if the arguments are given in this order.
Note the double indexing: the first number is for the row and the second for the column.
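A minimal sketch of matrix creation and double indexing. The `byrow = TRUE` variant is an addition not mentioned on the slide, shown for contrast with the default column-wise fill.

```r
# Values are filled column by column
a <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
a
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

a[2, 3]  # 6 (row 2, column 3)
a[1, ]   # 1 3 5 (whole first row)
dim(a)   # 2 3

# Same values, positional arguments, filled row by row instead
b <- matrix(1:6, 2, 3, byrow = TRUE)
b[1, ]   # 1 2 3
```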
Matrix filtering
Follows the same principles as for vectors.
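As a sketch, rows of a matrix can be filtered by a condition on one of its columns; the matrix `m` and the threshold are invented for the example.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)  # columns (1,2,3) and (4,5,6)

# Keep the rows whose second column exceeds 4 (rows 2 and 3)
keep <- m[m[, 2] > 4, ]
keep
#      [,1] [,2]
# [1,]    2    5
# [2,]    3    6

# A condition on the whole matrix returns the matching values as a vector
m[m > 2]  # 3 4 5 6
```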
Editing sizes of matrices Adding and deleting rows and columns of matrices
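One way to add and delete rows and columns, assuming rbind(), cbind() and negative indices are the tools intended; the slide text does not name them.

```r
m <- matrix(1:4, nrow = 2)      # 2 x 2

m_row <- rbind(m, c(9, 9))      # add a row: 3 x 2
m_col <- cbind(m, c(7, 8))      # add a column: 2 x 3

m_less_col <- m_col[, -1]       # delete the first column: 2 x 2
m_less_row <- m_row[-3, ]       # delete the third row: 2 x 2

dim(m_row)       # 3 2
dim(m_col)       # 2 3
dim(m_less_col)  # 2 2
```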
Matrix operations
Matrix/vector multiplication
Matrix/matrix multiplication
Matrix operations
Matrix transpose: $b = a^{T}$
Matrix inverse: $b = a^{-1}$
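The matrix operations can be sketched with base R's %*%, t() and solve(); the slide text does not show which functions it used, so these are assumed, and the 2 x 2 matrix is made up.

```r
a <- matrix(c(2, 1, 0, 3), nrow = 2)  # rows: (2, 0) and (1, 3)
v <- c(1, 2)

a %*% v          # matrix/vector product: 2, 7 (as a 2 x 1 matrix)
a %*% a          # matrix/matrix product
a * a            # element-wise product, NOT matrix multiplication

b_t   <- t(a)      # transpose
b_inv <- solve(a)  # inverse

# A matrix times its inverse gives the identity (up to rounding)
a %*% b_inv
```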
Lists
A list is a collection of objects.
The objects can be of different types (e.g. "logical", "integer", "double", "character").
List indexing is different from vector and matrix indexing.
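The difference in indexing can be seen in a small sketch; the list `person` and its component names are hypothetical.

```r
person <- list(name = "Ann", age = 31, member = TRUE)

# Single brackets return a sub-list
sub <- person["age"]
class(sub)        # "list"

# Double brackets or $ return the component itself
person[["age"]]   # 31
person$name       # "Ann"
person[[3]]       # TRUE (by position)
```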
Lists
Adding and deleting components of lists
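A sketch of adding and deleting components, assuming assignment by name and assignment of NULL are the intended mechanisms; the component names are illustrative.

```r
lst <- list(a = 1, b = "two")

# Add components by assigning to a new name
lst$c <- c(3, 4)
lst[["d"]] <- TRUE

# Delete a component by assigning NULL
lst$b <- NULL

names(lst)   # "a" "c" "d"
length(lst)  # 3
```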
Lists
Accessing list components and values.
unlist() flattens a list into a vector, coercing the values to a common mode.
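The coercion performed by unlist() is easiest to see with a mixed list; both lists below are invented for illustration.

```r
scores <- list(math = 90, art = 75)

names(scores)            # "math" "art"
flat <- unlist(scores)   # named numeric vector
flat["math"]             # 90

# With mixed types everything is coerced to character
mixed <- list(1, "a", TRUE)
unlist(mixed)            # "1" "a" "TRUE"
```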
Data frames
Data frames are analogs of matrices whose columns can be of different modes.
Use the function data.frame(object 1, object 2, … , object k).
Matrices need to be protected; otherwise each column of a matrix is treated as a separate column of the data frame. Protection is done with the function I().
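A sketch of building a data frame, with and without protecting a matrix by I(). All object and column names are hypothetical.

```r
d <- data.frame(id = 1:3, name = c("a", "b", "c"))
str(d)

m <- matrix(1:6, nrow = 3)

# Unprotected: the two matrix columns become two data frame columns
d1 <- data.frame(id = 1:3, m)
ncol(d1)  # 3

# Protected with I(): the matrix is kept as a single column
d2 <- data.frame(id = 1:3, m = I(m))
ncol(d2)  # 2
```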
Data frames
Accessing components and values of data frames (both list and matrix indexing work)
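Both indexing styles can be tried on a small data frame; the column names and values are made up.

```r
d <- data.frame(id = 1:3, score = c(7.5, 9, 6))

# List-style access to a column
d$score          # 7.5 9.0 6.0
d[["score"]]     # same column

# Matrix-style access by row and column
d[2, "score"]    # 9
d[, 1]           # 1 2 3

# Filtering rows works as for matrices
high <- d[d$score > 7, ]
high$id          # 1 2
```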
Data frames Combining data frames
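Combining data frames can be sketched with rbind(), cbind() and merge(). The slide text names no functions, so this choice is an assumption, and the data are invented.

```r
a <- data.frame(id = 1:2, x = c(10, 20))
b <- data.frame(id = 3:4, x = c(30, 40))

# Stack rows (column names must match)
stacked <- rbind(a, b)
nrow(stacked)   # 4

# Put columns side by side (row counts must match)
extra <- data.frame(y = c("p", "q"))
wide <- cbind(a, extra)
ncol(wide)      # 3

# Join on a shared key column
info <- data.frame(id = c(2, 1), grp = c("B", "A"))
joined <- merge(a, info, by = "id")
joined          # id 1 gets grp A, id 2 gets grp B
```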
Factors
A factor can be seen as a vector with certain levels
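A minimal sketch of a factor with explicitly ordered levels; the labels are illustrative.

```r
f <- factor(c("low", "high", "low", "mid"),
            levels = c("low", "mid", "high"))

levels(f)       # "low" "mid" "high"
nlevels(f)      # 3

# Internally each value is stored as the index of its level
as.integer(f)   # 1 3 1 2

# Counts per level
table(f)        # low 2, mid 1, high 1
```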
Sorting
Sorting according to either numeric values or characters
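Sorting can be sketched with sort() for vectors and order() for data frame rows; these functions are assumed to be the ones intended, and the data are made up.

```r
x <- c(3, 1, 2)
sort(x)                     # 1 2 3
sort(x, decreasing = TRUE)  # 3 2 1

# Character vectors are sorted alphabetically
sort(c("pear", "apple", "fig"))  # "apple" "fig" "pear"

# Sort the rows of a data frame by one column using order()
d <- data.frame(name = c("c", "a", "b"), val = c(2, 3, 1))
d_sorted <- d[order(d$val), ]
d_sorted$val                # 1 2 3
```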
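Conditional execution
if (expr) { … } else { … }
If you need to connect several conditions, use '&', '&&', '|' or '||'. In an if () condition, prefer '&&' and '||', which work on single logical values; '&' and '|' operate element-wise on vectors.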
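A short sketch of if/else with connected conditions; the variable names and messages are illustrative.

```r
x <- 7

# && combines two single logical values inside if ()
if (x > 0 && x %% 2 == 1) {
  label <- "positive and odd"
} else if (x > 0) {
  label <- "positive and even"
} else {
  label <- "not positive"
}
label            # "positive and odd"

# & and | work element by element on whole vectors
v <- c(-1, 2, 5)
v > 0 & v < 3    # FALSE TRUE FALSE
v < 0 | v > 3    # TRUE FALSE TRUE
```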
Loops
for (name in expr1) { … }
while (condition) { … }
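Both loop forms can be illustrated as follows; the computations are arbitrary examples.

```r
# for: iterate over the elements of a vector
total <- 0
for (i in 1:5) {
  total <- total + i
}
total  # 15

# while: repeat as long as the condition holds
n <- 1
while (n < 100) {
  n <- n * 2
}
n      # 128
```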
Avoiding loops
The *apply() family of functions is among the most important in R for replacing explicit loops with compact code.
apply(m, dimcode, f, fargs)
- m: the matrix
- dimcode: 1 applies the function to rows, 2 applies the function to columns
- f: the function to be applied
- fargs: optional arguments to be supplied to f
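The arguments of apply() can be seen in a small sketch. The matrix contains an NA on purpose so that the optional argument `na.rm = TRUE` (passed on to the applied function) has a visible effect.

```r
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)  # rows: (1, 3, 5) and (NA, 4, 6)

# dimcode 1: apply sum to each row
apply(m, 1, sum)                 # 9 NA

# Optional arguments after the function are passed on to it
row_sums <- apply(m, 1, sum, na.rm = TRUE)
row_sums                         # 9 10

# dimcode 2: apply max to each column
col_max <- apply(m, 2, max, na.rm = TRUE)
col_max                          # 1 4 6
```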
Avoiding loops
Use apply over matrices and data frames.
Use lapply and sapply over lists.
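The difference between lapply() and sapply() is the shape of the result, as this sketch with an invented list shows.

```r
lst <- list(a = 1:3, b = c(10, 20))

# lapply always returns a list
means_list <- lapply(lst, mean)
means_list$a   # 2

# sapply simplifies the result to a vector when it can
means_vec <- sapply(lst, mean)
means_vec      # a = 2, b = 15

# sapply over a plain vector with an anonymous function
squares <- sapply(1:3, function(i) i^2)
squares        # 1 4 9
```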
Writing your own functions
A function returns the value of the last expression evaluated, so the body should end with the value to be returned.
You may also use return(value) to state explicitly which value the function returns.
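Both ways of returning a value are sketched below; the function names `square_sum` and `safe_sqrt` are hypothetical.

```r
# The last evaluated expression is the return value
square_sum <- function(x) {
  s <- sum(x^2)
  s
}
square_sum(c(1, 2, 3))  # 14

# return() leaves the function early with the given value
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA)
  }
  sqrt(x)
}
safe_sqrt(9)   # 3
safe_sqrt(-4)  # NA
```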
Writing your own functions
If several values need to be returned, a list may be used
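Returning several values in a list might look like this; `summarise_vec` is an illustrative name, not a built-in function.

```r
# Bundle several results into one named list
summarise_vec <- function(x) {
  list(mean = mean(x), range = range(x), n = length(x))
}

res <- summarise_vec(c(2, 4, 9))
res$mean   # 5
res$range  # 2 9
res$n      # 3
```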
Writing your own functions
Obligatory arguments and arguments with default values.
Arguments can be specified in any order when you call the function, provided you give their names.
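A sketch with one obligatory argument and one argument with a default; the function `power` is hypothetical.

```r
# 'base' is obligatory, 'exponent' defaults to 2
power <- function(base, exponent = 2) {
  base^exponent
}

power(3)                       # 9 (default exponent)
power(2, 3)                    # 8 (positional arguments)
power(exponent = 3, base = 2)  # 8 (named arguments in any order)
```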
Simulation
The key to simulation is random number generation (RNG).
R implements several RNGs; the default is the "Mersenne-Twister".
The sample() function can be used for random sampling from vectors and matrices.
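Random sampling with sample() can be sketched as below. The results differ from run to run, so no output values are shown.

```r
x <- 1:10

# Draw 3 distinct elements (sampling without replacement is the default)
s <- sample(x, 3)

# Draw with replacement, e.g. five coin flips
flips <- sample(c("H", "T"), 5, replace = TRUE)

# A random permutation of the whole vector
perm <- sample(x)

# The generator currently in use
RNGkind()[1]
```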
Simulation
R has functions to generate variates from many distributions, e.g. rnorm(), rbinom(), rchisq(), rpois(), rgamma(), and mvrnorm() (from the MASS package).
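A few of these generators in use; the parameter values are arbitrary, and mvrnorm() is left out so that the sketch needs only base R.

```r
# 5 draws from a standard normal distribution
z <- rnorm(5, mean = 0, sd = 1)

# 5 draws from Binomial(10, 0.3): integers between 0 and 10
k <- rbinom(5, size = 10, prob = 0.3)

# 5 draws from Poisson(2): non-negative integers
p <- rpois(5, lambda = 2)

# 5 draws from a chi-squared distribution with 3 degrees of freedom
q <- rchisq(5, df = 3)
```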
Simulation
The set.seed() function can be used to generate reproducible sequences of random numbers.
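Reproducibility can be checked by resetting the seed before repeating a draw; the seed value 123 is arbitrary.

```r
set.seed(123)
first <- rnorm(3)

# Resetting the same seed restarts the same random sequence
set.seed(123)
second <- rnorm(3)

identical(first, second)  # TRUE
```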