Programming and Simulations Frank Witmer 6 January 2011.


1 Programming and Simulations Frank Witmer 6 January 2011

2 Outline
General programming tips
Programming loops
Simulation
– Distributions
– Sampling
– Bootstrapping

3 General Programming Tips
Use meaningful variable names
Include more comments than you think necessary
Debugging your code
– Since R is interpreted, variables defined outside of functions remain available for inspection if execution terminates
– Built-in debugging support: debug(), browser(), trace()
– But generally adding print statements in functions is sufficient
Syntax highlighting!
– http://sourceforge.net/projects/npptor/
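A minimal sketch of the print-statement approach to debugging mentioned above. The function scale_values() and its input are hypothetical and used only for illustration; the commented debug() and undebug() calls are the built-in tools for interactive stepping.

```r
# Hypothetical helper (not from the slides): rescale a numeric vector to [0, 1]
scale_values <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # Print an intermediate value so it can be checked while the function runs
  cat("range of x:", rng, "\n")
  (x - rng[1]) / (rng[2] - rng[1])
}

scaled <- scale_values(c(2, 4, 6, 10))

# For interactive stepping instead of printing:
# debug(scale_values)    # step through the function on its next call
# undebug(scale_values)  # turn stepping off again
```

Placing browser() inside a function body pauses execution at that point in the same way, which can be handy when only one section of a long function is suspect.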

4 Loops
Because R is an interpreted language, every iteration of a loop is evaluated by the interpreter, which adds overhead at each step
So avoid explicit loops for computationally intensive analysis and prefer vectorized functions where possible
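As a rough illustration of the point above, the same calculation can be written as an explicit loop or as a single vectorized expression. The vector x and the squaring task are assumptions chosen for the example.

```r
x <- 1:100000

# Explicit loop: every iteration is evaluated by the interpreter
squares_loop <- numeric(length(x))   # preallocate the result vector
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized form: one call, with the iteration done in compiled code
squares_vec <- x^2

# Both approaches give the same values
same_result <- identical(squares_loop, squares_vec)
```

Wrapping each version in system.time() shows the difference in run time on a given machine.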

5 For & While loop syntax
for (variable in sequence) {
    expression
}
while (condition) {
    expression
}
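A short sketch of both loop forms in use. The variable names and the tasks (summing integers, repeated halving) are illustrative assumptions.

```r
# for loop: sum the integers 1 to 10
total <- 0
for (i in 1:10) {
  total <- total + i
}

# while loop: count the halvings needed to bring 100 below 1
value <- 100
n_halvings <- 0
while (value >= 1) {
  value <- value / 2
  n_halvings <- n_halvings + 1
}
```

The for loop suits a known number of iterations; the while loop suits cases where the stopping point depends on a value computed inside the loop.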

6 if/else control statements
if ( condition1 ) {
    expression1
} else if ( condition2 ) {
    expression2
} else {
    expression3
}
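A small example of the if / else if / else pattern. The function classify_temp() and its thresholds are hypothetical.

```r
# Hypothetical classifier: label a single temperature in degrees Celsius
classify_temp <- function(temp_c) {
  if (temp_c < 0) {
    "freezing"
  } else if (temp_c < 20) {
    "cool"
  } else {
    "warm"
  }
}

# if() tests a single condition, so apply the function to one value at a time
labels <- sapply(c(-5, 12, 30), classify_temp)
```

For element-wise tests on a whole vector, ifelse() is the vectorized alternative.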

7 Ways to avoid loops (sometimes)
tapply: apply a function (FUN) to a variable based on a grouping variable
lapply: apply a function (FUN) to each element of a given list
– sapply: same as lapply, but the output is simplified to a vector or matrix where possible, which is more user-friendly
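The three functions above can be sketched on small made-up inputs. The income, region, and scores objects are invented for illustration.

```r
# Hypothetical data: incomes and the region each observation belongs to
income <- c(40, 55, 38, 62, 47, 51)
region <- c("north", "south", "north", "south", "north", "south")

# tapply: mean income within each region
mean_by_region <- tapply(income, region, mean)

# A list whose elements have different lengths
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# lapply: always returns a list, one element per input element
lengths_list <- lapply(scores, length)

# sapply: same computation, simplified to a named vector
lengths_vec <- sapply(scores, length)
```

Here lengths_list has to be indexed with [[ ]] to get at each value, while lengths_vec can be used directly in arithmetic or plotting.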

8 Data simulation
Can simulate data using standard distribution functions, e.g. core names norm, pois
Use ‘r’ prefix to generate random values of the distribution
– rnorm(numVals, mean, sd)
– rpois(numVals, mean)
Use set.seed() if you want your simulated data to be reproducible
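A minimal sketch of simulating data reproducibly. The seed, sample size, and distribution parameters are arbitrary choices for the example; in rpois() the mean is passed as the lambda argument.

```r
# Fix the random number generator state so results can be reproduced
set.seed(42)
heights <- rnorm(1000, mean = 170, sd = 10)   # normal draws
counts  <- rpois(1000, lambda = 3)            # Poisson draws with mean 3

# Resetting the same seed regenerates exactly the same values
set.seed(42)
heights_again <- rnorm(1000, mean = 170, sd = 10)
reproducible <- identical(heights, heights_again)
```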

9 Standard distribution functions
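The body of this slide did not survive in the transcript. As a hedged sketch of R's standard naming scheme, each distribution's core name can be combined with the prefixes d (density), p (cumulative probability), q (quantile), and r (random values), shown here for the normal distribution. The variable names are illustrative.

```r
# d: density of the standard normal at 0
density_at_zero <- dnorm(0)

# p: cumulative probability P(X <= 1.96)
prob_below <- pnorm(1.96)

# q: quantile function, the inverse of p
upper_quantile <- qnorm(0.975)

# r: random draws from the distribution
set.seed(1)
draws <- rnorm(3)

# q and p undo each other
round_trip <- pnorm(qnorm(0.975))
```

The same four prefixes apply to other core names such as pois, binom, and unif.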

10 Sampling
Sample from a dataset using: sample(dataset, numItems, replace), where replace is TRUE or FALSE (the default is FALSE, i.e. sampling without replacement)
Can use to simulate survey results or bootstrap statistical estimates
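A brief sketch of sample() in both modes. The survey categories, probabilities, and data frame are invented for illustration.

```r
set.seed(123)

# Simulate 100 survey responses, sampling with replacement and unequal probabilities
responses <- c("yes", "no", "undecided")
survey <- sample(responses, size = 100, replace = TRUE, prob = c(0.5, 0.3, 0.2))
survey_counts <- table(survey)

# Draw 5 distinct rows from a data frame (sampling without replacement)
df <- data.frame(id = 1:20, value = rnorm(20))
subset_rows <- df[sample(nrow(df), size = 5), ]
```

Sampling row indices rather than values keeps each observation's variables together, which is what data resampling in a bootstrap requires.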

11 Bootstrap overview
Method to measure accuracy of estimates from a sample empirically
For a sample of size n, draw many random samples, also of size n, with replacement
Two ways to bootstrap regression estimates
– residual resampling: add resampled regression residuals to the fitted values to form a new dep. var. & re-estimate
– data resampling: sample complete cases of original data and estimate coefficients
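The two regression bootstraps can be sketched as follows. This is an illustration on simulated data, not the analysis from the presentation: the data-generating model, the seed, and the choice of 1000 replicates are all assumptions. Residual resampling is written in its usual form, adding resampled residuals to the fitted values.

```r
set.seed(2011)

# Simulated data (assumption): a linear relationship with normal noise
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)
dat <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = dat)
n_boot <- 1000

# Data resampling: draw complete rows with replacement and re-estimate
slope_data <- replicate(n_boot, {
  idx <- sample(n, size = n, replace = TRUE)
  coef(lm(y ~ x, data = dat[idx, ]))[["x"]]
})

# Residual resampling: keep x fixed, add resampled residuals to the fitted values
fitted_y <- fitted(fit)
res <- residuals(fit)
slope_resid <- replicate(n_boot, {
  boot_dat <- data.frame(x = dat$x,
                         y = fitted_y + sample(res, size = n, replace = TRUE))
  coef(lm(y ~ x, data = boot_dat))[["x"]]
})

# Bootstrap standard errors of the slope from each method
se_data  <- sd(slope_data)
se_resid <- sd(slope_resid)
```

Either standard error can be compared with the one reported by summary(fit). Residual resampling treats the predictor values as fixed and assumes the residuals are exchangeable, while data resampling makes neither assumption.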

12 Recall: Boston Metadata
CRIM     per capita crime rate by town
ZN       proportion of residential land zoned for lots over 25,000 sq. ft
INDUS    proportion of non-retail business acres per town
CHAS     Charles River dummy variable (=1 if tract bounds river; 0 otherwise)
NOX      Nitrogen oxide concentration (parts per 10 million)
RM       average number of rooms per dwelling
AGE      proportion of owner-occupied units built prior to 1940
DIS      weighted distances to five Boston employment centres
RAD      index of accessibility to radial highways
TAX      full-value property-tax rate per $10,000
PTRATIO  pupil-teacher ratio by town
B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT    % lower status of the population
MEDV     Median value of owner-occupied homes in $1000's

