The Art of R Programming
  Chapter 15 – Writing Fast R Code
  Chapter 16 – Interfacing R to Other Languages
  Chapter 17 – Parallel R

Use vectorized functions instead of loops

    x <- runif( )
    y <- runif( )
    system.time(z <- x + y)
    # user system elapsed
    system.time(for (i in 1:length(x)) z[i] <- x[i] + y[i])
    # user system elapsed

  Calls to R functions are much slower than native code, and the loop makes several of them on every iteration.
  The following are all functions:
    "for"
    ":" (range operator)
    "[" (vector indexing)

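That "for", ":" and "[" are ordinary functions can be checked at the prompt. A minimal sketch; the variable names are illustrative and not from the slides:

```r
# Operators and control-flow keywords are functions; backticks refer to them by name.
is_fun <- sapply(list(`for`, `:`, `[`, `+`), is.function)

v <- c(10, 20, 30)
second <- `[`(v, 2)   # same as v[2]
rng    <- `:`(1, 3)   # same as 1:3
total  <- `+`(1, 2)   # same as 1 + 2

print(is_fun)
```

Each `x[i] + y[i]` in the loop is therefore several function calls, while `x + y` is a single call that does all the work in native code.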
Another example of vectorization

    oddCount <- function(x) return(length(which(x %% 2 == 1)))
    x <- sample(1: , , replace=TRUE)
    system.time(oddCount(x))
    # user system elapsed
    system.time({
      cnt <- 0
      for (i in 1:length(x)) if (x[i] %% 2 == 1) cnt <- cnt + 1
      cnt
    })
    # user system elapsed 1.353

Vectorized functions
  ifelse, which, any, all
  rowSums, colSums
  outer, lower.tri, upper.tri, expand.grid
  *apply family, BUT not "apply" itself (apply is implemented in R and loops internally)

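A few of these functions in use, as a small sketch on made-up data (the matrix and vectors are illustrative only):

```r
m <- matrix(1:6, nrow = 2)             # 2 x 3 matrix, filled column-wise

row_totals_fast <- rowSums(m)          # vectorized row sums
row_totals_slow <- apply(m, 1, sum)    # same result, but loops over rows in R code

signs <- ifelse(c(-2, 0, 3) < 0, "neg", "non-neg")   # element-wise conditional
has_even <- any(m %% 2 == 0)           # TRUE if at least one element is even
times_table <- outer(1:3, 1:3)         # 3 x 3 multiplication table
```

On a matrix this small the timing difference is invisible; it only matters once there are many rows.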
Bytecode compiler

    library(compiler)
    oddCountCompiled <- cmpfun(oddCountSequential)
    system.time(oddCountCompiled(x))
    # user system elapsed

  Faster, but still not as fast as the vectorized version.

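The slides do not show oddCountSequential. The sketch below assumes it is the loop-based counter from the earlier example wrapped in a function; that definition and the sample sizes are assumptions:

```r
library(compiler)

# Assumed definition: the loop-based odd counter, wrapped in a function.
oddCountSequential <- function(x) {
  cnt <- 0
  for (i in seq_along(x)) if (x[i] %% 2 == 1) cnt <- cnt + 1
  cnt
}
oddCountCompiled <- cmpfun(oddCountSequential)

x <- sample(1:100, 1000, replace = TRUE)   # illustrative sizes
# The compiled function must return the same count as the vectorized form.
stopifnot(oddCountCompiled(x) == sum(x %% 2 == 1))
```

Since R 3.4.0 the JIT compiler is on by default, so functions are byte-compiled automatically and an explicit cmpfun() call may make little difference there.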
Power Matrix example

    powers <- function(x, degree) {
      pw <- matrix(x, nrow=length(x))
      prod <- x
      for (i in 2:degree) {
        prod <- prod * x
        pw <- cbind(pw, prod)  # build the matrix sequentially
      }
      return(pw)
    }
    x <- runif( )
    system.time(powers(x, 8))
    # user system elapsed 0.258

Power Matrix example

    powers2 <- function(x, degree) {
      # allocate the matrix in one go
      pw <- matrix(nrow=length(x), ncol=degree)
      prod <- x
      pw[, 1] <- prod
      for (i in 2:degree) {
        prod <- prod * x
        pw[, i] <- prod
      }
      return(pw)
    }
    system.time(powers2(x, 8))
    # user system elapsed 0.14

Power Matrix example

    powers3 <- function(x, degree) {
      return(outer(x, 1:degree, "^"))
    }
    system.time(powers3(x, 8))
    # user system elapsed

    powers4 <- function(x, degree) {
      repx <- matrix(rep(x, degree), nrow=length(x))
      return(t(apply(repx, 1, cumprod)))
    }
    system.time(powers4(x, 8))
    # user system elapsed 6.322

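Before comparing timings, it is worth checking that the variants compute the same matrix. A compact sketch: powers_loop and powers_outer are hypothetical re-statements of the pre-allocated and outer-based versions, and x_small is made-up input:

```r
# Pre-allocated loop, in the spirit of powers2
powers_loop <- function(x, degree) {
  pw <- matrix(nrow = length(x), ncol = degree)
  pw[, 1] <- x
  # seq_len(degree)[-1] is empty when degree is 1, unlike 2:degree
  for (i in seq_len(degree)[-1]) pw[, i] <- pw[, i - 1] * x
  pw
}

# outer-based version, in the spirit of powers3
powers_outer <- function(x, degree) outer(x, 1:degree, "^")

x_small <- c(0.5, 2, 3)
same <- isTRUE(all.equal(powers_loop(x_small, 4), powers_outer(x_small, 4)))
print(same)
```

all.equal() is used instead of identical() because repeated multiplication and "^" can differ in the last floating-point bits.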
Profiling

    Rprof()
    invisible(powers(x, 8))
    Rprof(NULL)
    summaryRprof()

    $by.self
            self.time self.pct total.time total.pct
    "cbind"
    "*"

    $by.total
             total.time total.pct self.time self.pct
    "powers"
    "cbind"
    "*"

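A self-contained version of the profiling session might look like the sketch below. The output file, the input size, and the repetition count are assumptions, chosen only so that the sampler collects enough samples; grow_by_cbind is a hypothetical stand-in for powers():

```r
prof_file <- tempfile(fileext = ".out")   # assumed location for the profile data

grow_by_cbind <- function(x, degree) {    # grows the matrix with cbind, like powers()
  pw <- matrix(x, nrow = length(x))
  prod <- x
  for (i in 2:degree) {
    prod <- prod * x
    pw <- cbind(pw, prod)
  }
  pw
}

x <- runif(200000)                        # illustrative size
Rprof(prof_file, interval = 0.01)         # start sampling the call stack
for (r in 1:20) invisible(grow_by_cbind(x, 8))
Rprof(NULL)                               # stop sampling

prof <- summaryRprof(prof_file)
print(head(prof$by.self))                 # functions ranked by their own time
```

Rprof() is a sampling profiler, so code that finishes in a few milliseconds may produce no samples at all; repeating the call is a simple way around that.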
Memory Allocation and Copying

    # you have to run this block in one go
    z <- runif(10)
    tracemem(z)    # 0x76dc288
    z[3] <- 8
    tracemem(z)    # 0x76dc288  (same address: modified in place)
    z[20] <- 100
    tracemem(z)    # 0x4a52cf0  (new address: extending z reallocated it)

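Two consequences of this behavior, as a minimal sketch: assignment does not copy until one of the names is written to, and growing a vector element by element forces repeated reallocation. The variable names are illustrative, and tracemem() is only called if this R build supports memory profiling:

```r
z <- runif(10)
y <- z                          # no copy yet: y and z share the same memory
if (capabilities("profmem")) tracemem(z)   # report when the vector is duplicated
y[1] <- -1                      # first write to y triggers the copy
if (capabilities("profmem")) untracemem(z)

grown <- numeric(0)
for (i in 1:5) grown[i] <- i^2      # reallocates as the vector gets longer

prealloc <- numeric(5)
for (i in 1:5) prealloc[i] <- i^2   # fills storage that already exists
```

Both loops give the same values; only the second avoids the repeated allocation.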
Memory issue
  A vector is limited to 2^31 - 1 elements (about 2.1 billion), even if you are using a 64-bit OS with huge physical memory. (R 3.0.0 and later support longer vectors on 64-bit builds.)
  Workarounds:
    Chunking
    ff, bigmemory packages
    R on a 64-bit OS

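Chunking means reading and processing the data a piece at a time and keeping only running totals in memory. A minimal sketch using base R connections; the temporary file stands in for a data set too large to load, and chunk_size is an arbitrary illustrative value:

```r
# Small illustrative file: one number per line.
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:1000), path)

con <- file(path, open = "r")
total <- 0
n <- 0
chunk_size <- 100                        # arbitrary chunk size
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break          # end of file reached
  vals <- as.numeric(lines)
  total <- total + sum(vals)             # keep only running totals
  n <- n + length(vals)
}
close(con)

chunked_mean <- total / n
```

The ff and bigmemory packages take a different route and keep the data on disk or in shared memory behind a vector-like interface.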
Language Bindings
  You can write code in other languages and call it from R, or vice versa.
    RPy (to Python)
    Rcpp (to C/C++)
    etc.
  I will not go into details; ask me if you are interested.

Parallelization
  Two (major) packages
  – Rmpi: interface to MPI (Message Passing Interface)
  – snow: transparent parallelization

Snow Example

    library(snow)
    # create cluster
    cl <- makeCluster(rep("localhost", 8), type="SOCK")
    a <- matrix(rnorm( ), ncol=2)
    # sequential execution
    system.time(apply(a, 1, "%*%", c(1, 10)))
    # parallel execution
    system.time(parApply(cl, a, 1, "%*%", c(1, 10)))
    # destroy cluster
    stopCluster(cl)

Snow Functions
  clusterExport
  clusterEvalQ
  clusterApply
  clusterSetupRNG
    Needed so that each host generates a different random number series

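How these functions fit together, sketched with the parallel package that ships with R rather than snow itself. parallel provides clusterExport, clusterEvalQ and clusterApply under the same names; clusterSetRNGStream is used here in place of snow's clusterSetupRNG. The variable offset and the helper scale_up are hypothetical:

```r
library(parallel)                       # bundled with R; used here in place of snow

cl <- makeCluster(2)                    # two local worker processes
offset <- 10                            # hypothetical value the workers need

clusterExport(cl, "offset", envir = environment())         # copy `offset` to each worker
clusterEvalQ(cl, scale_up <- function(v) v * 2 + offset)   # define a helper on each worker
res <- clusterApply(cl, 1:4, function(i) scale_up(i))      # one task per element

clusterSetRNGStream(cl, iseed = 123)    # give each worker its own random stream
draws <- clusterEvalQ(cl, runif(1))     # one draw per worker

stopCluster(cl)
print(unlist(res))
```

Without the RNG setup step, workers started from the same state can produce identical "random" series, which silently ruins simulations.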
Cautions!
  Optimize only when necessary
  – Parallel code is more complex, less predictable, and harder to debug
  Consider Amdahl's law
    S = 1 / ((1 - P) + P / N)
    S: Speedup
    N: Number of processors
    P: Proportion of parallelizable code
  Figure by Daniels220, from Wikipedia
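Amdahl's law in its usual form, S = 1 / ((1 - P) + P / N), is easy to evaluate before investing in parallel code. A minimal sketch; the function name and the example values of P and N are illustrative:

```r
# Speedup predicted by Amdahl's law.
# P: proportion of the run time that can be parallelized; N: number of processors.
amdahl_speedup <- function(P, N) 1 / ((1 - P) + P / N)

s8   <- amdahl_speedup(0.9, 8)     # 90% parallelizable code on 8 processors
smax <- amdahl_speedup(0.9, Inf)   # upper bound, 1 / (1 - P)
print(c(s8, smax))
```

With 90% of the code parallelizable, 8 processors give a speedup of roughly 4.7, and no number of processors can exceed about 10.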