Published by Ernest Barrett. Modified over 6 years ago.
1
Programming in R: coding, debugging and optimizing
Katia Oleinik
Scientific Computing and Visualization, Boston University
2
if
if (condition) {
   command(s)
} else {
   command(s)
}
Comparison operators:
== equal
!= not equal
> (<) greater (less)
>= (<=) greater (less) or equal
Logical operators:
& and
| or
! not
3
if
> # define x
> x <- 7
> # simple if statement
> if ( x < 0 ) print("Negative")
> # simple if-else statement
> if ( x < 0 ) print("Negative") else print("Non-negative")
[1] "Non-negative"
> # if statement may be used inside other constructions
> y <- if ( x < 0 ) -1 else 0
> y
[1] 0
4
if
> # multiline if - else statement
> if ( x < 0 ) {
+    x <- x+10
+    print("x is negative: add 10")
+ } else if ( x == 0 ) {
+    print("x is equal zero")
+ } else {
+    print("x is positive")
+ }
[1] "x is positive"
Note: For multiline if-statements braces are necessary even for single-statement bodies. In an interactive session the else keyword must be on the same line as the closing brace of the preceding block; otherwise R treats the if statement as complete before it reaches else.
5
ifelse
ifelse (test_condition, true_value, false_value)
> # ifelse statement
> y <- ifelse ( x < 0, -1, 0 )
> # nested ifelse statement
> y <- ifelse ( x < 0, -1, ifelse (x > 0, 1, 0) )
6
ifelse
Best of all – ifelse statement operates on vectors!
> # ifelse statement on a vector
> digits <- 0 : 9
> (odd <- ifelse( digits %% 2 > 0, TRUE, FALSE ))
 [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
7
ifelse
Exercise: define a random vector of integers ranging from -10 to 10:
x <- as.integer( runif( 10, -10, 10 ) )
Create a vector y whose elements are equal to the absolute values of x.
Note: normally, you would use the abs() function to achieve this result.
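One possible solution to the exercise, as a minimal sketch: ifelse() keeps the non-negative elements and flips the sign of the negative ones. The set.seed() call is an addition for reproducibility and is not part of the original exercise.

```r
set.seed(1)                            # assumption: fixed seed so the run is repeatable
x <- as.integer(runif(10, -10, 10))    # random integers between -10 and 10
y <- ifelse(x < 0, -x, x)              # negate negative elements, keep the rest
print(x)
print(y)
```

The result can be compared against abs(x) to confirm that both approaches agree.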
8
switch
switch (statement, list)
> # simple switch statement
> x <- 3
> switch( x, 2, 4, 6, 8 )
[1] 6
> switch( x, 2, 4 )   # returns NULL since there are only 2 elements in the list
9
switch
switch (statement, name1 = str1, name2 = str2, ... )
> # switch statement with named list
> day <- "Tue"
> switch( day, Sun = 0, Mon = 1, Tue = 2, Wed = 3, … )
[1] 2
> # switch statement with a "default" value
> food <- "meat"
> switch( food, banana="fruit", carrot="veggie", "neither")
[1] "neither"
10
loops
There are 3 statements that provide explicit looping:
- repeat
- for
- while
Built-in constructs to control the looping:
- next
- break
Note: Use explicit loops only when they are really necessary. R has functions for implicit looping, apply(), sapply(), tapply(), and lapply(), which are more concise and often faster, and fully vectorized operations are usually faster still.
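As a small illustration of the note above, the sketch below computes the same squares three ways: with an explicit for loop, with sapply(), and with a vectorized expression. The variable names are made up for this example.

```r
n <- 5

# explicit loop: pre-allocate the result, then fill it element by element
squares_loop <- numeric(n)
for (i in 1:n) squares_loop[i] <- i^2

# implicit loop: sapply() calls the function once per element
squares_sapply <- sapply(1:n, function(i) i^2)

# vectorized arithmetic: no loop in R code at all
squares_vec <- (1:n)^2

print(squares_vec)
```

All three produce the same values; the vectorized form is the shortest and is usually the fastest for simple arithmetic.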
11
repeat
repeat { }
The repeat statement causes repeated evaluation of the body until break is requested. Be careful – an infinite loop may occur!
> # find the greatest odd divisor of an integer
> x <- 84
> repeat{
+    print(x)
+    if( x%%2 != 0) break
+    x <- x/2
+ }
[1] 84
[1] 42
[1] 21
>
12
for
for (object in sequence) {
   command(s)
}
> # print all words in a vector
> names <- c("Sam", "Paul", "Michael")
>
> for( j in names ){
+    print(paste("My name is" , j))
+ }
[1] "My name is Sam"
[1] "My name is Paul"
[1] "My name is Michael"
13
for
for (object in sequence) {
   command(s)
   if (…) next    # return to the start of the loop
   if (…) break   # exit from (innermost) loop
}
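A short sketch of next and break in one loop: even numbers are skipped with next, and the loop stops with break once the value exceeds 7. The conditions and the variable name odd_numbers are chosen only for illustration.

```r
odd_numbers <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next          # even: skip the rest of the body
  if (i > 7) break               # leave the loop entirely
  odd_numbers <- c(odd_numbers, i)
}
print(odd_numbers)
```

Growing odd_numbers with c() is acceptable for a handful of elements; for long loops a pre-allocated vector is preferable.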
14
while
while (test_statement) {
   command(s)
}
> # find the largest odd divisor of a given number
> x <- 84
> while (x %% 2 == 0){
+    x <- x/2
+ }
> x
[1] 21
>
15
loops
Exercise: Using any of the loop statements, print all the numbers from 0 to 30 that are divisible by 7. Use %%, the modulo operator, to check divisibility.
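One way to approach the exercise, sketched with a for loop; a while or repeat loop would work equally well. The vector divisible is an extra used here only to keep the matches for inspection.

```r
divisible <- integer(0)
for (i in 0:30) {
  if (i %% 7 == 0) {             # remainder 0 means i is divisible by 7
    print(i)
    divisible <- c(divisible, i)
  }
}
```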
16
function
myFun <- function (ARG, OPT_ARGs ){
   statement(s)
}
ARG: vector, matrix, list or a data frame
OPT_ARGs: optional arguments
Functions are a powerful element of R. They allow you to build on existing functions by writing your own custom functions.
17
function
myFun <- function (ARG, OPT_ARGs ){
   statement(s)
}
Naming:
- Variable naming rules apply.
- Avoid reusing the names of existing (built-in) functions.
Arguments:
- The argument list can be empty.
- Some (or all) of the arguments can have a default value ( arg1 = TRUE ).
- The argument '...' can be used to allow one function to pass on argument settings to another function.
Return value:
- The value returned by the function is the last value computed, but you can also use the return() statement.
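The rules above can be sketched in one small function. The name describe and its digits argument are hypothetical, invented for this example: digits has a default value, '...' is forwarded to mean(), return() gives an early exit, and otherwise the last computed value is returned.

```r
# hypothetical helper: rounded mean of a numeric vector
describe <- function(x, digits = 2, ...) {
  if (length(x) == 0) return(NA)   # explicit early return
  m <- mean(x, ...)                # '...' passes e.g. na.rm = TRUE on to mean()
  round(m, digits)                 # last value computed is the return value
}

describe(c(1, 2, NA, 4), na.rm = TRUE)   # na.rm travels through '...'
describe(c(1, 2, 4), digits = 0)         # override the default
describe(numeric(0))                     # takes the early-return branch
```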
18
function
> # simple function: calculate (x+1)^2
> myFun <- function (x) {
+    x^2 + 2*x + 1
+ }
> myFun(3)
[1] 16
>
19
function
> # function with optional arguments: calculate (x+a)^2
> myFun <- function (x, a=1) {
+    x^2 + 2*x*a + a^2
+ }
> myFun(3)
[1] 16
> myFun(3,2)
[1] 25
>
> # arguments can be passed by name (and out of order!)
> myFun( a = 2, x = 1)
[1] 9
20
function
> # Some optional arguments can be specified as '...' to pass them to another function
> myFun <- function (x, ... ) {
+    plot (x, ... )
+ }
>
> # print all the words together in one sentence
> myFun <- function ( ... ) {
+    print(paste ( ... ) )
+ }
> myFun("Hello", " R! ")
[1] "Hello  R! "
21
function
Local and global variables:
A variable assigned inside a function is local to that function. If it is read before it has been assigned locally, R takes its value from the global variable of the same name (if such a variable exists).
> # define a function
> myFun <- function (x) {
+    cat ("u=", u, "\n")   # no local u yet: the global value is read
+    u <- u + 1            # creates a local u; this will not affect the variable outside myFun()
+    cat ("u=", u, "\n")
+ }
>
> u <- 2      # define a variable
> myFun(5)    # execute the function
u= 2
u= 3
> cat ("u=", u, "\n")   # print the value of the variable
u= 2
22
function
Local and global variables:
If you want to modify the global variable from inside a function, you can use the super-assignment operator <<-. You should avoid doing this: it makes functions harder to reason about and to debug.
> # define a function
> myFun <- function (x) {
+    cat ("u=", u, "\n")   # reads the global value
+    u <<- u + 1           # this WILL affect the value of the variable outside myFun()
+    cat ("u=", u, "\n")
+ }
>
> u <- 2      # define a variable
> myFun(u)    # execute the function
u= 2
u= 3
> cat ("u=", u, "\n")   # print the value of the variable
u= 3
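The difference between ordinary assignment and super-assignment can be seen side by side in the following sketch. The names counter, increment_local and increment_global are invented for this illustration.

```r
counter <- 0

increment_local <- function() {
  counter <- counter + 1     # creates a local copy; the outer counter is untouched
  counter
}

increment_global <- function() {
  counter <<- counter + 1    # modifies the counter defined outside the function
  counter
}

a <- increment_local()
after_local <- counter       # outer value after the local assignment
b <- increment_global()
after_global <- counter      # outer value after the super-assignment
cat("local call returned", a, "; outer counter is", after_local, "\n")
cat("global call returned", b, "; outer counter is", after_global, "\n")
```

Returning the new value and reassigning it at the call site, as in counter <- increment_local(), achieves the same effect without hidden side effects.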
23
function
Call vector variables:
Functions do not change their arguments.
> # define a function
> myFun <- function (x) {
+    x <- 2
+    print (x)
+ }
>
> x <- 3             # assign value to x
> y <- myFun(x)      # call the function
[1] 2
> print(x)           # print value of x
[1] 3
24
function
Call vector variables:
If you want to change the value of the function's argument, reassign the return value to the argument.
> # define a function
> myFun <- function (x) {
+    x <- 2
+    print (x)
+ }
>
> x <- 3             # assign value to x
> x <- myFun(x)      # call the function
[1] 2
> print(x)           # print value of x
[1] 2
25
function
Finding the source code:
You can view the source code of most R functions by typing the function name without parentheses.
> # get the source code of lm() function
> lm
function (formula, data, subset, weights, na.action, method = "qr",
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
    contrasts = NULL, offset, ...)
{
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    . . .
    z
}
<environment: namespace:stats>
>
26
function
Finding the source code:
A generic function has many methods; the one that is called depends on the type of the argument.
> # get the source code of mean() function
> mean
function (x, ...)
UseMethod("mean")
<environment: namespace:base>
>
27
function
Finding the source code:
You can first explore the available methods and then choose the one you need.
> # list the methods of mean() function
> methods("mean")
[1] mean.Date       mean.POSIXct    mean.POSIXlt    mean.data.frame
[5] mean.default    mean.difftime
>
> # get source code
> mean.default
function (x, trim = 0, na.rm = FALSE, ...)
{
    if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    . . .
}
<environment: namespace:base>
28
apply
apply (OBJECT, MARGIN, FUNCTION, ARGs )
object: vector, matrix or a data frame
margin: 1 – rows, 2 – columns, c(1,2) – both
function: function to apply
args: possible arguments
Description: Returns a vector, array or list of values obtained by applying a function to the margins of an array or matrix.
30
apply
Example: Create a matrix and apply different functions to its rows and columns.
> # create 3x4 matrix
> x <- matrix( 1:12, nrow = 3, ncol = 4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> # find median of each row
> apply (x, 1, median)
[1] 5.5 6.5 7.5
>
31
apply
Example (same 3x4 matrix x as before):
> # find mean of each column
> apply (x, 2, mean)
[1]  2  5  8 11
>
32
apply
Example (same 3x4 matrix x as before):
> # create a new matrix with values 0 or 1 for even and odd elements of x
> apply (x, c(1,2), function (x) x%%2)
     [,1] [,2] [,3] [,4]
[1,]    1    0    1    0
[2,]    0    1    0    1
[3,]    1    0    1    0
>
33
lapply
lapply() function returns a list:
lapply(X, FUN, ...)
> # create a list
> x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE))
> # compute the mean of each list element
> lapply (x, mean)
$a
[1] 5.5

$beta
[1] 4.535125

$logic
[1] 0.3333333

>
34
sapply
sapply() function returns a vector or a matrix:
sapply(X, FUN, ... , simplify = TRUE, USE.NAMES = TRUE)
> # create a list
> x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE))
> # compute the mean of each list element
> sapply (x, mean)
        a      beta     logic
5.5000000 4.5351252 0.3333333
>
35
code sourcing
source ("file", ... )
file: file with the source code to load (usually with extension .r )
echo: if TRUE, each expression is printed after parsing, before evaluation.
36
code sourcing
Linux prompt:
katana:~ % emacs foo_source.r &
Text editor (foo_source.r):
# dummy function
foo <- function(x){
   x+1
}
R session:
> # load foo_source.r source file
> source ("foo_source.r")
> # create a vector
> x <- c(3,5,7)
> # call function
> foo(x)
[1] 4 6 8
37
code sourcing
> # load foo_source.r source file
> source ("foo_source.r", echo = TRUE)
> # dummy function
> foo <- function(x){
+    x+1;
+ }
> # create a vector
> x <- c(3,5,7)
> # call function
> foo(x)
[1] 4 6 8
38
code sourcing
Exercise:
- write a function that computes the logarithm of the inverse of a number, log(1/x)
- save it in a file with the .r extension
- load it into your workspace
- execute it
- try executing it with the input vector ( 2, 1, 0, -1 ).
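A minimal sketch of the function asked for in the exercise, defined inline rather than sourced from a file. The name inv_log is the one the debugging slides use; suppressWarnings() is added here only so the example runs quietly.

```r
# logarithm of the inverse of a number: log(1/x)
inv_log <- function(x) {
  y <- 1 / x
  log(y)
}

# 1/0 is Inf, so log gives Inf; 1/(-1) is -1, so log gives NaN with a warning
res <- suppressWarnings(inv_log(c(2, 1, 0, -1)))
print(res)
```

Without suppressWarnings(), R reports "NaNs produced" for the negative input, which is the behaviour the debugging tools are well suited to track down.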
39
debugging
R includes debugging tools:
cat () & print () – print out the values
browser () – pause the code execution and "browse" the code
debug (FUN) – execute function line by line
undebug (FUN) – stop debugging the function
40
debugging
inv_log.r:
# dummy function
inv_log <- function(x){
   y <- 1/x
   browser()
   y <- log(y)
}
R session:
> # load inv_log.r source file
> source ("inv_log.r", echo = TRUE)
> # dummy function
> inv_log <- function(x){
+    y<-1/x;
+    browser();
+    y<-log(y);
+ }
> inv_log (x)   # call function; x is assumed to be the exercise vector c(2, 1, 0, -1)
Called from: inv_log(x)
Browse[1]> y    # check the values of local variables
[1]  0.5  1.0  Inf -1.0
41
debugging
Browser commands:
<RET> – Go to the next statement if the function is being debugged. Continue execution if the browser was invoked.
c or cont – Continue execution without single stepping.
n – Execute the next statement in the function. This works from the browser as well.
where – Show the call stack.
Q – Halt execution and jump to the top level immediately.
To view the value of a variable whose name matches one of these commands, use the print() function, e.g. print(n).
42
debugging
inv_log.r:
# dummy function
inv_log <- function(x){
   y <- 1/x
   browser()
   y <- log(y)
}
R session:
> # load inv_log.r source file
> source ("inv_log.r", echo = TRUE)
> # dummy function
> inv_log <- function(x){
+    y<-1/x;
+    browser();
+    y<-log(y);
+ }
> inv_log (x)   # call function; x is assumed to be the exercise vector c(2, 1, 0, -1)
Called from: inv_log(x)
Browse[1]> y
[1]  0.5  1.0  Inf -1.0
Browse[1]> n
debug: y <- log(y)
Browse[2]>
Warning message:
In log(y) : NaNs produced
>
43
debugging
inv_log.r:
# dummy function
inv_log <- function(x){
   y <- 1/x
   y <- log(y)
}
R session:
> # load inv_log.r source file
> source ("inv_log.r", echo = TRUE)
> # dummy function
> inv_log <- function(x){
+    y<-1/x;
+    y<-log(y);
+ }
> debug(inv_log)     # debug mode
> inv_log (x)        # call function
Called from: inv_log(x)
debugging in: inv_log(x)
debug: {
    y <- 1/x
    y <- log(y)
}
Browse[2]>
. . .
> undebug(inv_log)   # exit debugging mode
44
timing
Use the system.time() function to measure the time of execution.
> # make a function
> g <- function(n) {
+    y = vector(length=n)
+    for (i in 1:n) y[i]=i/(i+1)
+    y
+ }
45
timing
Use the system.time() function to measure the time of execution.
> # make a function
> myFun <- function(x) {
+    y = vector(length=x)
+    for (i in 1:x) y[i]=i/(i+1)
+    y
+ }
> # execute the function, measuring the time of the execution
> system.time( myFun(100000) )
   user  system elapsed
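As a sketch of how the timing result can be used in a script, system.time() returns an object whose elements can be read by name. The function name fill_ratios is made up for this example, and the actual timings depend on the machine, so none are shown.

```r
# same computation as on the slide: y[i] = i / (i + 1)
fill_ratios <- function(n) {
  y <- numeric(n)                  # pre-allocated numeric vector
  for (i in 1:n) y[i] <- i / (i + 1)
  y
}

# assignments inside system.time() are visible afterwards
timing <- system.time(res <- fill_ratios(100000))

print(timing)                      # user, system and elapsed time in seconds
elapsed_seconds <- timing[["elapsed"]]
```

The "elapsed" element is wall-clock time, while "user" and "system" are CPU times charged to the R process.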
48
optimization
How to speed up the code? Use vectors!
> # using loops
> g1 <- function(x) {
+    y = vector(length=x)
+    for (i in 1:x) y[i]=i/(i+1)
+    y
+ }
> # using vectors
> x <- (1:100000)
> g2 <- function(x) {
+    x/(x+1)
+ }
>
49
optimization
How to speed up the code? Use vectors!
> # using loops
> g1 <- function(x) {
+    y = vector(length=x)
+    for (i in 1:x) y[i]=i/(i+1)
+    y
+ }
> # execute the function
> system.time( g1(100000) )
   user  system elapsed
> # using vectors
> x <- (1:100000)
> g2 <- function(x) {
+    x/(x+1)
+ }
> # execute the function
> system.time( g2(x) )
   user  system elapsed
52
optimization
How to speed up the code? Avoid dynamically expanding arrays
> vec1<-NULL
> # execute the command
> system.time(
+    for(i in 1:100000)
+       vec1 <- c(vec1,mean(1:100)))
   user  system elapsed
> vec2 <- vector(
+    mode="numeric",length=100000)
> # execute the command
> system.time(
+    for(i in 1:100000)
+       vec2[i] <- mean(1:100))
   user  system elapsed
53
optimization
How to speed up the code? Avoid dynamically expanding arrays
> f1<-function(x){
+    vec1 <- NULL
+    for(i in 1:100000)
+       vec1 <- c(vec1,mean(1:10))
+ }
> # execute the command
> system.time( f1(0) )
   user  system elapsed
> f2<-function(x){
+    vec2 <- vector(
+       mode="numeric",length=100000)
+    for(i in 1:100000)
+       vec2[i] <- mean(1:10)
+ }
> # execute the command
> system.time( f2(0) )
   user  system elapsed
55
optimization
How to speed up the code? Use optimized R functions, e.g. rowSums(), rowMeans(), table(), etc. In some simple cases it is worth writing your own!
> matx <- matrix
+    (rnorm(100000*10),100000,10)
> # execute the command
> system.time(apply(matx,1,mean))
   user  system elapsed
> matx <- matrix
+    (rnorm(100000*10),100000,10)
> # execute the command
> system.time(rowMeans(matx))
   user  system elapsed
56
optimization
How to speed up the code? Use optimized R functions, e.g. rowSums(), rowMeans(), table(), etc. In some simple cases it is worth writing your own!
> system.time(
+    for(i in 1:100000)mean(1:100))
   user  system elapsed
> system.time(
+    for(i in 1:100000)
+       sum(1:100) / length(1:100) )
   user  system elapsed
58
optimization
How to speed up the code?
- Use vectors
- Avoid dynamically expanding arrays
- Use optimized R functions, e.g. rowSums(), rowMeans(), table(), etc. In some simple cases it is worth writing your own implementation!
- Use the R compiler or C/C++ code
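Before swapping a general function for an optimized one, it is worth checking that the two agree. The sketch below does this for apply(m, 1, mean) and rowMeans(m) on a smaller matrix than the one used in the slides; the seed and matrix size are arbitrary choices for this illustration.

```r
set.seed(42)                                   # assumption: fixed seed for repeatability
m <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)

by_apply    <- apply(m, 1, mean)               # general-purpose: one R call per row
by_rowMeans <- rowMeans(m)                     # optimized: loop runs in compiled code

same_result <- isTRUE(all.equal(by_apply, by_rowMeans))
print(same_result)

# the timings can then be compared with system.time(); values vary by machine
t_apply    <- system.time(apply(m, 1, mean))
t_rowMeans <- system.time(rowMeans(m))
```

all.equal() is used rather than identical() because the two functions may accumulate floating-point sums in a different order.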
59
compiling
Use library(compiler):
cmpfun() - compile existing function
cmpfile() - compile source file
loadcmp() - load compiled source file
63
compiling
fsum.r:
# dummy function
fsum <- function(x){
   s <- 0
   for ( n in x) s <- s+n
   s
}
R session:
> # load compiler library
> library (compiler)
> # load function from a source file (if necessary)
> source ("fsum.r")
> fsumcomp <- cmpfun(fsum)
64
compiling
Using compiled functions can decrease the time of computation.
> # run non-compiled version
> system.time(fsum(1:100000))
   user  system elapsed
> # run compiled version
> system.time(fsumcomp(1:100000))
   user  system elapsed
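A self-contained sketch of the same comparison, defining fsum inline instead of sourcing fsum.r. One caveat: since R 3.4.0 functions are byte-compiled just in time by default, so on a current R installation the two timings may be nearly identical. No timing values are shown because they depend on the machine.

```r
library(compiler)

# dummy function from the slides: sum the elements with an explicit loop
fsum <- function(x) {
  s <- 0
  for (n in x) s <- s + n
  s
}

fsumcomp <- cmpfun(fsum)            # byte-compiled version of the same function

x <- 1:100000
same_value <- fsum(x) == fsumcomp(x)   # compilation must not change the result
print(same_value)

t_plain    <- system.time(fsum(x))
t_compiled <- system.time(fsumcomp(x))
```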
65
profiling
Profiling is a tool that can be used to find out how much time is spent in each function. Code profiling helps locate the parts of a program that will benefit most from optimization.
Rprof() – turn profiling on
Rprof(NULL) – turn profiling off
summaryRprof("Rprof.out") – summarize the output of the Rprof() function to show the amount of time used by different R functions.
66
profiling
Brownian Motion simulation.
Input: x - initial position, steps - number of steps
bm.R:
# slow version of BM function
bmslow <- function (x, steps){
   BM <- matrix(x, nrow=length(x))
   for (i in 1:steps){
      # sample a step from normal distribution (one value per coordinate)
      z <- rnorm(length(x))
      # attach the new position as a new column of the output matrix
      BM <- cbind (BM, BM[,i] + z)
   }
   return(BM)
}
67
profiling
Brownian Motion simulation.
Input: x - initial position, steps - number of steps
bm.R:
# a faster version of BM function
bm <- function (x, steps){
   # allocate enough space to hold the output matrix
   BM <- matrix(nrow = length(x), ncol=steps+1)
   # add initial point to the matrix
   BM[,1] = x
   # sample from normal distribution (delX, delY)
   z <- matrix(rnorm(steps*length(x)),nrow=length(x))
   for (i in 1:steps) BM[,i+1] <- BM[,i] + z[,i]
   return(BM)
}
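The remaining loop in bm() can also be replaced by a cumulative sum along each row. This is a swapped-in alternative and not part of the original slides; the name bm_cumsum is hypothetical. The sketch repeats bm() so that the block is self-contained and the two versions can be compared with the same random seed.

```r
# faster version from the slides, repeated here for comparison
bm <- function(x, steps) {
  BM <- matrix(nrow = length(x), ncol = steps + 1)
  BM[, 1] <- x
  z <- matrix(rnorm(steps * length(x)), nrow = length(x))
  for (i in 1:steps) BM[, i + 1] <- BM[, i] + z[, i]
  BM
}

# hypothetical alternative: positions are cumulative sums of start + steps
bm_cumsum <- function(x, steps) {
  z <- matrix(rnorm(steps * length(x)), nrow = length(x))
  increments <- cbind(x, z, deparse.level = 0)   # first column is the start point
  t(apply(increments, 1, cumsum))                # cumulative sum along each row
}

set.seed(1); path_loop   <- bm(c(0, 0), 50)
set.seed(1); path_cumsum <- bm_cumsum(c(0, 0), 50)
print(isTRUE(all.equal(path_loop, path_cumsum, check.attributes = FALSE)))
```

Both functions draw the random steps in the same order, so with the same seed they trace the same path up to floating-point rounding.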
68
profiling
> # load compiler library (if you have not done it before)
> require (compiler)
> # compile the source file (writes bm.Rc)
> cmpfile ("bm.R")
> # load functions from the compiled file
> loadcmp ("bm.Rc")
69
profiling
> # simulate 100 steps
> BMsmall <- bm(c(0,0),100)
> # plot the result
> plot(BMsmall[1,],BMsmall[2,],…)
70
profiling
> # start profiling slow function
> Rprof("bmslow.out")   # optional – provide output file name
> # run function
> BMS <- bmslow(c(0,0), )
> # finish profiling
> Rprof(NULL)
71
profiling
> # start profiling faster function
> Rprof("bm.out")   # optional – provide output file name
> # run function
> BM <- bm(c(0,0), )
> # finish profiling
> Rprof(NULL)
72
profiling
> summaryRprof("bmslow.out")
$by.self
          self.time self.pct total.time total.pct
"cbind"
"rnorm"
"bmslow"
…
> summaryRprof("bm.out")
$by.self
          self.time self.pct total.time total.pct
"bm"
"rnorm"
"matrix"
"+"
":"
…
73
This tutorial has been made possible by the Scientific Computing and Visualization group at Boston University.
Katia Oleinik