Writing Faster Code in R
R meetup
Arelia T. Werner
May 20th, 2015
Tectoria, Victoria, BC
Background
Different skill levels with R in this group
Me: easy to understand versus runs faster
I work with 'big' data, so faster code is useful
Also, faster code assists with debugging
I have a tendency to write for loops (I think this comes from learning from people who previously programmed in Fortran)
Example: Loop versus Function Speed
> system.time(for (i in 1:1000) { rnorm(100) })
   user  system elapsed
> system.time(replicate(1000, rnorm(100)))
   user  system elapsed
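The loop on this slide throws each draw away, while replicate() keeps every draw in a matrix, so the two calls do slightly different work. The block below is a minimal sketch of a like-for-like comparison, assuming the goal is a 100 x 1000 matrix of normal draws. The names `n_draws`, `n_reps`, `loop_result` and `rep_result` are illustrative and not from the talk. Timings depend on the machine, so none are shown.

```r
n_draws <- 100   # values per draw (as on the slide)
n_reps  <- 1000  # number of repetitions (as on the slide)

# Loop version: preallocate the result, then fill one column per iteration.
set.seed(1)
loop_time <- system.time({
  loop_result <- matrix(NA_real_, nrow = n_draws, ncol = n_reps)
  for (i in seq_len(n_reps)) {
    loop_result[, i] <- rnorm(n_draws)
  }
})

# replicate() version: builds a matrix of the same shape in one call.
set.seed(1)
rep_time <- system.time(
  rep_result <- replicate(n_reps, rnorm(n_draws))
)

print(loop_time)
print(rep_time)

# Both versions consume the random number stream in the same order,
# so the two matrices can be compared directly.
print(all.equal(loop_result, rep_result))
```

Preallocating `loop_result` matters here: growing an object inside a loop is a common reason loops look slow in R.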
Rules of thumb with Loops
Avoid nested loops at all costs
Use a counter with while loops
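A minimal sketch of both rules, under two assumptions that are not from the talk: the nested loop fills a simple multiplication table, and the "counter" rule is read as keeping an explicit iteration count with an upper limit so a while loop cannot run forever. All variable names are illustrative.

```r
x <- 1:200
y <- 1:300

# Nested-loop version: fill the table one cell at a time.
nested <- matrix(0, nrow = length(x), ncol = length(y))
for (i in seq_along(x)) {
  for (j in seq_along(y)) {
    nested[i, j] <- x[i] * y[j]
  }
}

# Vectorised version: outer() builds the same table in one call.
vectorised <- outer(x, y)
print(all.equal(nested, vectorised))

# while loop with an explicit counter and a cap on the number of iterations.
# Here `value` is halved until it drops below `tol`.
value    <- 1
tol      <- 1e-6
counter  <- 0
max_iter <- 100
while (value >= tol && counter < max_iter) {
  value   <- value / 2
  counter <- counter + 1
}
print(counter)
```

The cap `max_iter` is the safety net: if the stopping condition is never met, the loop still ends, and the counter shows how many iterations ran.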
Avoid loops with “apply”
> mv <- numeric(ncol(worldbank))
> system.time( for (i in 1:ncol(worldbank)) {
+   tmp <- is.na(worldbank[[i]])
+   mv[i] <- sum(tmp)
+ })
   user  system elapsed
> mv
[1]
> system.time(apply(worldbank, 2, function(x) sum(is.na(x))))
   user  system elapsed
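The same missing-value count can be written several ways. Since the worldbank file is not available here, the sketch below uses a small stand-in data frame `df` (hypothetical data, not from the talk) and compares the loop and apply() forms from the slide with two alternatives, vapply() and colSums().

```r
# Stand-in data frame with some missing values (hypothetical data).
df <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(5, 6, NA, 8),
  c = c(NA, NA, NA, 4)
)

# Loop version with a preallocated result vector.
mv_loop <- integer(ncol(df))
for (i in seq_len(ncol(df))) {
  mv_loop[i] <- sum(is.na(df[[i]]))
}

# apply() converts the data frame to a matrix, then works column by column.
mv_apply <- apply(df, 2, function(x) sum(is.na(x)))

# vapply() iterates over the columns directly and checks the return type.
mv_vapply <- vapply(df, function(x) sum(is.na(x)), integer(1))

# Fully vectorised: is.na() on a data frame returns a logical matrix.
mv_colsums <- colSums(is.na(df))

print(mv_loop)
print(mv_apply)
print(mv_vapply)
print(mv_colsums)
```

Which form is fastest depends on the size and types of the data, so it is worth timing them on the real data set rather than assuming.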
The best tool for microbenchmarking in R is the microbenchmark package. It provides very precise timings, making it possible to compare operations that only take a tiny amount of time, for example two ways of computing a square root. Instead of using microbenchmark(), you could use the built-in function system.time(). But system.time() is much less precise, so you'll need to repeat each operation many times with a loop, and then divide to find the average time of each operation. Alex will talk about this more.
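A minimal sketch of the comparison described above, assuming the two square-root methods are `sqrt(x)` and `x^0.5` and that 100,000 repetitions are enough for a rough average (both are assumptions, not taken from the slide). The microbenchmark part runs only if the package is installed.

```r
x <- runif(100)

# Base R: repeat each operation many times, then divide by the number of
# repetitions to get an average time per operation.
n <- 1e5
sqrt_time  <- system.time(for (i in seq_len(n)) sqrt(x)) / n
power_time <- system.time(for (i in seq_len(n)) x^0.5) / n
print(sqrt_time[["elapsed"]])
print(power_time[["elapsed"]])

# microbenchmark: runs each expression repeatedly and summarises the timings.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(sqrt(x), x^0.5))
}
```

The system.time() averages include the overhead of the for loop itself, which is one reason they are only a rough guide for very fast operations.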
worldbank <- read.table(" sep=":", header=TRUE)
worldbank <- worldbank[c(1,4,7,10,13,16,19,22)]