Introduction to Programming in R Department of Statistical Sciences and Operations Research Computation Seminar Series Speaker: Edward Boone


1 Introduction to Programming in R Department of Statistical Sciences and Operations Research Computation Seminar Series Speaker: Edward Boone Email: elboone@vcu.edu

2 What is R? The R statistical programming language is a free, open-source environment based on the S language developed at Bell Labs. The language is very powerful for writing programs. Many statistical functions are already built in. Contributed packages extend the functionality to cutting-edge research methods. Because it is a programming language, you complete tasks by writing code.

3 Getting Started Where to get R? Go to www.r-project.org. Downloads: CRAN. Set your mirror: any one in the USA is fine. Select Windows 95 or later. Select base. Select R-2.4.1-win32.exe. The other files are for developers who wish to change the source code.

4 Getting Started The R GUI?

5 Getting Started Opening a script. This gives you a script window.

6 Getting Started Submitting a program: use the toolbar button, or right-click and choose Run selection, to submit the selected code.

7 Getting Started Basic assignment and operations. Arithmetic operations: +, -, *, /, ^ are the standard arithmetic operators. Matrix arithmetic: * is element-wise multiplication; %*% is matrix multiplication. Assignment: to assign a value to a variable use "<-".
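The operators on this slide can be tried on small inputs. The following is a minimal sketch; the variable names (x, y, A, B) and the values are made up for illustration.

```r
# Scalars: assignment with <- and the standard arithmetic operators
x <- 7
y <- 2
x + y    # addition
x / y    # division
x ^ y    # exponentiation

# Two small 2 x 2 matrices (matrix() fills column by column)
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(0, 1, 1, 0), nrow = 2)

elementwise <- A * B     # multiplies matching entries
matprod     <- A %*% B   # rows of A times columns of B

elementwise
matprod
```

The two results differ, which is the usual source of confusion when * is used where %*% was intended.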

8 Getting Started How to use help in R? R has a very good help system built in. If you know which function you want help with, use ?_______ with the function name in the blank. Ex: ?hist. If you don't know which function to use, use help.search("_______"). Ex: help.search("histogram").

9 Importing Data How do we get data into R? Remember we have no point and click… First make sure your data is in an easy-to-read format such as CSV (Comma Separated Values). Use code:
    D <- read.table("path", sep=",", header=TRUE)
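As a rough illustration of the import step, the sketch below first writes a tiny CSV to a temporary file and then reads it back. The file contents and the column names (Gender, Age) are invented for the example; in practice the path would point at your own file.

```r
# Create a small CSV in a temporary location (contents are made up)
path <- tempfile(fileext = ".csv")
writeLines(c("Gender,Age", "M,34", "F,29", "M,41"), path)

# header = TRUE tells R that the first line holds the column names
D <- read.table(path, sep = ",", header = TRUE)

str(D)    # column names and types
nrow(D)   # number of data rows
```

read.csv(path) is a shortcut that uses the same defaults for comma-separated files with a header row.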

10 Working with data. Accessing columns. D has our data in it…. But you can’t see it directly. To select a column use D$column.

11 Working with data. Subsetting data. Use a logical operator to do this. ==, !=, >, <, >= and <= are all logical (comparison) operators. Note that the "equals" logical operator is two = signs. Example:
    D[D$Gender == "M",]
This will return the rows of D where Gender is "M". Remember R is case sensitive! This code does nothing to the original dataset.
    D.M <- D[D$Gender == "M",]
gives a dataset with the appropriate rows.
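Column access and row subsetting can be sketched on a small data frame. The data below (columns Gender and Score) are hypothetical and only serve to make the example runnable.

```r
# A made-up data frame standing in for imported data
D <- data.frame(Gender = c("M", "F", "M", "F"),
                Score  = c(71, 85, 64, 90))

D$Score                        # one column, returned as a vector

D.M  <- D[D$Gender == "M", ]   # rows where Gender equals "M"
high <- D[D$Score > 70, ]      # rows where Score exceeds 70

D.M
high
nrow(D)                        # the original data frame is unchanged
```

The comma inside the brackets matters: the condition before it selects rows, and leaving the part after it empty keeps every column.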

12 Creating a Vector To create a vector use the c() function:
    b <- c(3,1,0.3,0.1)
This creates the vector b = (3, 1, 0.3, 0.1), which is used as a column vector in the matrix operations that follow.

13 Random Number Generation Random number generation is important in simulations as well as some model fitting techniques. Consider:
    X1 <- rnorm(100,5,2)
This generates a vector of 100 normal random variables with mean 5 and standard deviation 2.

14 Random Number Generation Generate two more vectors: X2 <- rnorm(100,15,3) X3 <- rnorm(100,22,5) This gives us two more vectors of normally distributed values.

15 Determining the Size of a Vector Use the length() function:
    n1 <- length(X1)
Use this only for vectors. On a matrix, length() returns the total number of elements; use dim(), nrow() or ncol() for matrices.

16 Creating a Vector of Repeated Values Often we want a vector of ones around. Use the rep() function:
    ones <- rep(1,n1)
This creates a vector of ones of length n1.

17 Creating a Matrix from Vectors Use the cbind() function:
    X <- cbind(ones,X1,X2,X3)
This binds the column vectors together into a matrix.
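The vector-building steps above, and the difference between length() on a vector and on a matrix, can be seen in a short sketch. The seed value is arbitrary and only makes the random draws repeatable; a two-column matrix is used to keep the output small.

```r
set.seed(1)                # arbitrary seed, for reproducibility only

X1   <- rnorm(100, 5, 2)   # 100 draws, mean 5, standard deviation 2
n1   <- length(X1)         # number of elements in the vector
ones <- rep(1, n1)         # vector of ones with the same length

X <- cbind(ones, X1)       # bind the vectors as columns of a matrix

length(X)   # total number of elements in the matrix, not the number of rows
dim(X)      # rows and columns
nrow(X)
ncol(X)
```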

18 Create a Regression Relationship Using our randomly generated data create a regression relationship. Use the code: Y <- X%*%b + rnorm(100,0,1)

19 Estimate a Regression Model Form the two sides of the normal equations, (X'X) b = X'Y. Use the code:
    XtX <- t(X)%*%X
    XtY <- t(X)%*%Y

20 Solve the normal equations To estimate the regression parameters solve the normal equations. Use the following code:
    bhat <- solve(XtX)%*%XtY
Check it against lm():
    bhat
    lm(Y ~ X1 + X2 + X3)
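Slides 12 to 20 can be collected into one runnable sketch. The seed and the name bhat2 are additions for illustration, and drop() is used so that Y is a plain vector rather than a one-column matrix; otherwise the code follows the slides.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
n  <- 100
X1 <- rnorm(n, 5, 2)
X2 <- rnorm(n, 15, 3)
X3 <- rnorm(n, 22, 5)

X <- cbind(ones = rep(1, n), X1, X2, X3)       # design matrix with intercept column
b <- c(3, 1, 0.3, 0.1)                         # true coefficients
Y <- drop(X %*% b) + rnorm(n, 0, 1)            # response with unit-variance noise

XtX <- t(X) %*% X
XtY <- t(X) %*% Y

bhat  <- solve(XtX) %*% XtY    # as on the slide: invert, then multiply
bhat2 <- solve(XtX, XtY)       # alternative: solve the linear system directly

fit <- lm(Y ~ X1 + X2 + X3)    # built-in least squares for comparison

cbind(normal_eq = drop(bhat), lm = coef(fit))
```

The two columns should agree up to rounding error. Passing both arguments to solve() avoids forming the inverse explicitly, which is generally the more numerically stable choice.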

21 Create a Regression Function Use the function() format:
    reg1 <- function(Y,X){
      res <- solve(t(X)%*%X)%*%t(X)%*%Y
      return(res)
    }
Don’t forget to return the result. Remember the code in braces is the function.

22 Try the function Use the data already created.
    reg1(Y,X)

23 Add to the function Use the list() function to return more than one result. Essentially, you are adding properties to the object that reg2 returns.
    reg2 <- function(Y,X){
      coeff <- solve(t(X)%*%X)%*%t(X)%*%Y
      resid <- Y - X%*%coeff
      # X already contains the column of ones, so the residual
      # degrees of freedom are length(Y) - length(coeff)
      mse <- t(resid)%*%resid/(length(Y)-length(coeff))
      res <- list(coeff,resid,mse)
      return(res)
    }

24 Try the function Use the data already created.
    reg2(Y,X)

25 Add names to the function properties The names() function allows you to name the properties of the returned list.
    reg3 <- function(Y,X){
      coeff <- solve(t(X)%*%X)%*%t(X)%*%Y
      resid <- Y - X%*%coeff
      # X already contains the column of ones, so the residual
      # degrees of freedom are length(Y) - length(coeff)
      mse <- t(resid)%*%resid/(length(Y)-length(coeff))
      res <- list(coeff,resid,mse)
      names(res) <- c('coeff','residuals','mse')
      return(res)
    }

26 Programming Goal: PRESS PRESS will give us the ability to demonstrate basic programming constructs in an application:
    Matrix operations
    Creating functions
    Loops
    Data subsetting and storage

27 Programming Goal: PRESS PRESS is the predictive sum of squares of a regression model. It is computed via $$\mathrm{PRESS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2$$ where $\hat{y}_{(i)}$ is the predicted value of $y_i$ using a model fit with all of the data except observation $i$.

28 Loops To construct a for loop use the following structure:
    for(i in 1:n){
      Operations…
    }

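29 PRESS
    PRESS <- function(Y,X){
      n1 <- length(Y)
      ind1 <- 1:n1
      presshold <- rep(0,n1)             # squared prediction error for each held-out observation
      for(i in 1:n1){
        X1 <- X[ind1 != i,]              # fit on every row except i
        Y1 <- Y[ind1 != i]
        coef1 <- reg3(Y1,X1)$coeff
        X2 <- X[ind1==i,,drop=FALSE]     # held-out row, kept as a 1-row matrix
        Y2 <- Y[ind1==i]                 # works whether Y is a vector or a one-column matrix
        Yp <- X2%*%coef1
        presshold[i] <- (Y2 - Yp)^2
      }
      # mean() returns the average squared prediction error (the sum of
      # squares divided by n1); use sum(presshold) for the total
      res <- mean(presshold)
      return(res)
    }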

30 Try the function Use the data already created. PRESS(Y,X)
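One way to check a leave-one-out loop is to compare it with the closed-form shortcut for linear regression, in which the leave-one-out residual equals e_i / (1 - h_ii), with e_i the ordinary residual and h_ii the i-th diagonal entry of the hat matrix X (X'X)^{-1} X'. The sketch below is self-contained and uses its own simulated data; the helper name ols and the seed are assumptions made for the example, not part of the slides. It computes the sum-of-squares form of PRESS.

```r
set.seed(7)                               # arbitrary seed, for reproducibility only
n <- 50
X <- cbind(ones = rep(1, n),
           X1 = rnorm(n, 5, 2),
           X2 = rnorm(n, 15, 3))
Y <- drop(X %*% c(3, 1, 0.3)) + rnorm(n)

# Hypothetical helper: least-squares coefficients from the normal equations
ols <- function(Y, X) solve(t(X) %*% X, t(X) %*% Y)

# Leave-one-out loop: refit without observation i, then predict it
sq_err <- numeric(n)
for (i in 1:n) {
  coef_i    <- ols(Y[-i], X[-i, , drop = FALSE])
  pred_i    <- drop(X[i, , drop = FALSE] %*% coef_i)
  sq_err[i] <- (Y[i] - pred_i)^2
}
press_loop <- sum(sq_err)

# Shortcut: one fit on all the data plus the hat-matrix diagonal
H <- X %*% solve(t(X) %*% X) %*% t(X)
e <- Y - drop(X %*% ols(Y, X))
press_hat <- sum((e / (1 - diag(H)))^2)

c(loop = press_loop, shortcut = press_hat)
```

The two numbers should match up to rounding error, which gives a quick sanity check on the indexing inside the loop.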

31 If…then constructs If you are interested in an if…then statement on a vector, use the ifelse() function: ifelse(condition, true action, false action). Example:
    X1 <- runif(15,0,1)
    X2 <- ifelse(X1<.5,1,0)
    cbind(X1,X2)
Did it work?

32 If…then constructs If you are not interested in a vector, then use the if{}else{} construct.
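The difference between the two constructs can be sketched as follows. The variable names and the "low"/"high" labels are made up for illustration.

```r
# Single value: if{}else{} evaluates one condition and runs one branch
x <- 0.3
if (x < 0.5) {
  label <- "low"
} else {
  label <- "high"
}
label

# Vector: ifelse() applies the test to every element
v <- c(0.1, 0.7, 0.4)
labels <- ifelse(v < 0.5, "low", "high")
labels
```

When typing at the top level of a script, keep else on the same line as the closing brace of the if block; otherwise R treats the if statement as complete before it reaches else.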

33 Source Files Source files allow you to store all of your created functions in a single file and have all those functions available to you. To load a self-created library use: source(path), where path is a quoted file path. Don’t forget that each \ in a Windows path needs to be written as \\ (a forward slash / also works).
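A minimal sketch of the workflow: save a function definition to a file, then load it with source(). The file here is created in a temporary location, and the function name square is hypothetical.

```r
# Write a one-function "library" to a temporary .R file
fn_file <- tempfile(fileext = ".R")
writeLines("square <- function(x) x^2", fn_file)

# Load every definition in the file into the current session
source(fn_file)

square(4)
```

For a file of your own, the path would be a string such as "C:/Rcode/myfunctions.R" (a made-up location), written with forward slashes or doubled backslashes.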

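34 Writing to a file To write to a file use the write.table() function:
    write.table(dataset, path, sep=",", col.names=TRUE, row.names=FALSE)
This will produce a comma separated value (csv) file. col.names=TRUE writes the header line, and row.names=FALSE leaves out the row labels.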
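A short round trip shows the effect of these arguments. The data frame and its column names (id, score) are invented for the example, and the file goes to a temporary location.

```r
dataset <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.25))   # made-up data
out <- tempfile(fileext = ".csv")

# col.names = TRUE writes the header line; row.names = FALSE omits row labels
write.table(dataset, out, sep = ",", col.names = TRUE, row.names = FALSE)

readLines(out)                                    # inspect the raw file
back <- read.table(out, sep = ",", header = TRUE)
back
```

write.csv(dataset, out, row.names = FALSE) is a convenience wrapper that produces the same kind of file.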

35 Linear Algebra Extras For eigenvalues and eigenvectors use the eigen() function. This gives an object that contains both the eigenvalues and eigenvectors. Example:
    eigen(XtX)
    $values
    [1] 77901.567997  1375.036486   456.253787     1.847225

    $vectors
                [,1]        [,2]        [,3]        [,4]
    [1,] -0.03534617 -0.02144023 -0.02084214  0.99892771
    [2,] -0.18185911 -0.21669130 -0.95864785 -0.03108754
    [3,] -0.54097195 -0.79158676  0.28253402 -0.03023690
    [4,] -0.82038239  0.57094272  0.02710023 -0.01620879
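The pieces of the returned object can be pulled out with $. The sketch below uses a small symmetric matrix chosen for illustration (not the XtX from the slides) and checks the defining relation A v = lambda v for the first eigenpair.

```r
A <- matrix(c(2, 1,
              1, 2), nrow = 2)   # small symmetric example matrix

e <- eigen(A)

e$values     # eigenvalues, largest first
e$vectors    # matching eigenvectors, one per column

# Check the first eigenpair: A %*% v should equal lambda * v
v1  <- e$vectors[, 1]
lhs <- drop(A %*% v1)
rhs <- e$values[1] * v1
all.equal(lhs, rhs)
```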

36 Summary R is a programming environment with many standard programming structures already included. Easy to create functions. No support. Allows users to create a library of functions.

37 Summary All of the R code and files can be found at: www.people.vcu.edu/~elboone2/CSS.htm

