Programming and Simulations
Frank Witmer
6 January 2011
Outline
General programming tips
Programming loops
Simulation
  – Distributions
  – Sampling
  – Bootstrapping
General Programming Tips
Use meaningful variable names
Include more comments than you think necessary
Debugging your code
  – Since R is interpreted, variables defined outside of functions remain in the workspace and can be inspected after execution terminates
  – Built-in debugging support: debug(), browser(), trace()
  – But generally adding print statements in functions is sufficient
Syntax highlighting!
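A minimal sketch of print-statement debugging, assuming a made-up helper function (mean_positive is hypothetical and exists only for illustration):

```r
# Hypothetical helper: mean of the positive values in x.
mean_positive <- function(x) {
  keep <- x > 0
  # Debug print: report how many values survive the filter
  cat("mean_positive: keeping", sum(keep), "of", length(x), "values\n")
  # browser()  # uncomment to pause here and inspect 'keep' interactively
  mean(x[keep])
}

result <- mean_positive(c(-2, 4, 6, -1))
```

Calling debug(mean_positive) before running the function is an alternative that steps through it line by line without editing the code.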
Loops
Because R is an interpreted language, every iteration of a loop is evaluated step by step at run time, which adds overhead
So avoid explicit loops for computationally intensive analysis when a vectorized alternative exists
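As a rough illustration of the point about loops, the sketch below computes the same result with an explicit loop and with a single vectorized call. The vector length is an arbitrary choice, and actual timings will vary by machine and R version.

```r
x <- seq_len(100000)

# Loop version: one interpreted iteration per element.
squares_loop <- numeric(length(x))   # preallocate rather than growing the vector
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: a single call operating on the whole vector.
squares_vec <- x^2

# Both approaches should agree
same_result <- isTRUE(all.equal(squares_loop, squares_vec))
```

Wrapping each version in system.time() is one way to compare their running times.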
For & While loop syntax
for (variable in sequence) {
    expression
}
while (condition) {
    expression
}
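A small example of both loop forms, using made-up values:

```r
# for: iterate over each element of a sequence
total <- 0
for (i in 1:5) {
  total <- total + i
}

# while: repeat until the condition becomes FALSE
n <- 1
while (n < 100) {
  n <- n * 2
}
```

Here total ends up as the sum of 1 through 5, and n is the first power of two that is at least 100.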
if/else control statements
if ( condition1 ) {
    expression1
} else if ( condition2 ) {
    expression2
} else {
    expression3
}
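A sketch of the if/else chain inside a made-up function (classify is a hypothetical name used only for illustration):

```r
# Hypothetical function: label a single number by its sign.
classify <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

# if() tests a single condition, so apply the function element by element
labels <- sapply(c(-3, 0, 7), classify)
```

When typing at the top level, keep each else on the same line as the preceding closing brace; otherwise R treats the if statement as complete before it reaches the else.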
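Ways to avoid loops (sometimes)
tapply: apply a function (FUN) to a variable based on a grouping variable
lapply: apply a function (FUN) to each element of a given list (e.g. each column of a data frame); always returns a list
  – sapply: same as lapply, but the output is simplified to a vector or matrix where possible, which is more user-friendly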
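The three functions can be sketched on made-up data as follows (the values and town labels are invented for illustration):

```r
# Made-up data: home values (in $1000s) for two hypothetical towns.
value <- c(20, 30, 40, 10, 50)
town  <- c("A", "A", "B", "B", "B")

# tapply: mean of 'value' within each level of the grouping variable 'town'
mean_by_town <- tapply(value, town, mean)

# lapply: apply a function to each element of a list; always returns a list
scores <- list(x = 1:4, y = c(10, 20))
means_list <- lapply(scores, mean)

# sapply: same computation, simplified to a named numeric vector
means_vec <- sapply(scores, mean)
```

The lapply result has to be indexed as means_list$x or means_list[["x"]], while the sapply result can be used directly as a numeric vector.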
Data simulation
Can simulate data using the standard distribution functions; each distribution has a core name, e.g. norm, pois
Use the ‘r’ prefix to generate random values from the distribution
  – rnorm(numVals, mean, sd)
  – rpois(numVals, lambda), where lambda is the mean
Use set.seed() if you want your simulated data to be reproducible
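A short sketch of simulating data reproducibly. The seed, sample size, and distribution parameters are arbitrary choices for illustration.

```r
set.seed(42)                                   # arbitrary seed
heights <- rnorm(1000, mean = 170, sd = 10)    # 1000 normal draws
counts  <- rpois(1000, lambda = 3)             # 1000 Poisson draws with mean 3

# Resetting the seed reproduces exactly the same draws
set.seed(42)
heights_again <- rnorm(1000, mean = 170, sd = 10)
reproducible <- identical(heights, heights_again)
```

Without the second set.seed() call, the repeated rnorm() call would continue the random stream and return different values.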
Standard distribution functions
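R names its distribution functions by combining a one-letter prefix with the core name: d for the density, p for the cumulative probability, q for the quantile, and r for random draws. A brief sketch using the normal and Poisson distributions, with arbitrary argument values:

```r
dens <- dnorm(0)            # density of the standard normal at 0
prob <- pnorm(1.96)         # cumulative probability P(X <= 1.96)
quan <- qnorm(0.975)        # value with 97.5% of the probability below it

set.seed(1)                 # arbitrary seed, for reproducibility
draws <- rnorm(3)           # three random draws

# The same prefixes work with other core names, e.g. Poisson
pois_prob <- dpois(2, lambda = 3)   # P(X = 2) when the mean is 3
```

The p and q functions are inverses of each other, so pnorm(qnorm(0.975)) returns 0.975.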
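Sampling
Sample from a dataset using: sample(x, size, replace = FALSE)
  – x is the vector to sample from, size is the number of items to draw, and replace controls whether items can be drawn more than once
Can use to simulate survey results or bootstrap statistical estimates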
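A sketch of using sample() to simulate a survey, assuming a made-up population of yes/no responses (the population size, response rate, and seed are all invented):

```r
set.seed(2011)   # arbitrary seed

# Hypothetical population of 1000 survey responses (1 = yes, 0 = no)
population <- rbinom(1000, size = 1, prob = 0.3)

# Simulated survey: 50 respondents, sampled without replacement
survey <- sample(population, 50, replace = FALSE)
prop_yes <- mean(survey)     # estimated proportion of "yes"

# For a data frame, sample row indices and then subset the rows
df <- data.frame(id = 1:10, y = rnorm(10))
rows <- sample(nrow(df), 4)
subset_df <- df[rows, ]
```

Sampling row indices is the safer pattern for data frames, since sample() applied directly to a data frame draws columns rather than rows.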
Bootstrap overview
Method to measure the accuracy of estimates from a sample empirically
For a sample of size n, draw many random samples, also of size n, with replacement, and recompute the estimate on each
Two ways to bootstrap regression estimates
  – residual resampling: add resampled regression residuals to the fitted values to form a new dependent variable & re-estimate
  – data resampling: sample complete cases of the original data and estimate coefficients
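Both regression bootstraps can be sketched as below. This is an illustration under stated assumptions, not a definitive implementation: the data are simulated rather than taken from a real dataset, the number of replicates B is an arbitrary choice, and the residual-resampling variant adds the resampled residuals to the fitted values, which is the usual formulation.

```r
set.seed(123)    # arbitrary seed

# Simulated data standing in for a real dataset (true slope is 2)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = dat)

B <- 500   # number of bootstrap replicates (an arbitrary choice)

# Data resampling: draw n complete rows with replacement and refit
slope_data <- replicate(B, {
  idx <- sample(n, n, replace = TRUE)
  coef(lm(y ~ x, data = dat[idx, ]))[["x"]]
})

# Residual resampling: fitted values plus resampled residuals, then refit
slope_resid <- replicate(B, {
  y_star <- fitted(fit) + sample(resid(fit), n, replace = TRUE)
  coef(lm(y_star ~ x, data = dat))[["x"]]
})

# Bootstrap standard errors of the slope
se_data  <- sd(slope_data)
se_resid <- sd(slope_resid)
```

The two standard errors can be compared with the one reported by summary(fit); quantile(slope_data, c(0.025, 0.975)) gives a simple percentile interval for the slope.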
Recall: Boston Metadata
CRIM      per capita crime rate by town
ZN        proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS     proportion of non-retail business acres per town
CHAS      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX       nitrogen oxide concentration (parts per 10 million)
RM        average number of rooms per dwelling
AGE       proportion of owner-occupied units built prior to 1940
DIS       weighted distances to five Boston employment centres
RAD       index of accessibility to radial highways
TAX       full-value property-tax rate per $10,000
PTRATIO   pupil-teacher ratio by town
B         1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town
LSTAT     % lower status of the population
MEDV      median value of owner-occupied homes in $1000's