STAT 534: Statistical Computing Hari Narayanan

STAT 534: Statistical Computing Hari Narayanan harin@uw.edu

Course objectives Write programs in R and C tailored to specifics of statistics problems you want to solve Familiarize yourself with: – optimization techniques – Markov Chain Monte Carlo (mcmc)

Logistics Class: – Tuesdays and Thursdays 12:00pm – 1:20pm Office hours: – Thursday 2:30pm – 4pm (Padelford B-301) or by appt Textbooks: – Robert & Casella : Introducing Monte Carlo Methods with R – Kernighan & Ritchie : C Programming Language Evaluation: – 4 assignments – 2 quizzes – Final project

x + y addition x - y subtraction x * y multiplication x / y division x ˆ y exponentiation x % y modular arithmetic x %/% y integer division x == y test for equality x <= y test for less-than-or-equal x >= y test for greater-than-or-equal x && y boolean and for scalars x || y boolean or for scalars x & y boolean and for vectors (vector x,y,result) x | y boolean or for vectors (vector x,y,result) !x boolean negation

Quick Quiz- write your answers in your notebook What would be the response if you type the following on the R console? 1. q; ls; assign(a,2+3); “a”<- 2+4, a <- 2+4, v=1,2;

Answer to 1. q, ls will yield descriptions of the corresponding functions whereas q() will yield a prompt whether the session should be stored before quitting, ls( ) will yield the list of objects stored in the workspace. assign(a,2+3) yields an error message while assign(“a”,2+3) assigns to object `a’ the number 2+3. “a”<- 2+4 will leave “a” as the character “a” while a <- 2+4 correctly assigns to `a’ the number 2+4. v=1,2 yields an error message while v= c(1,2) assigns to `v’ the vector 1,2.

Questions 2,3,4 2. v=c(1,2); u=c(-5,v,7);u 3. a=c(1,2);b=c(1,2,3);r=3*a+b-1;r 4. sort(c(4,3,7)); sort(c(3,4,7), decreasing =TRUE)

Answers 2,3,4 2. [1] -5 1 2 7 3. There will be a warning message that the longer object length is not a multiple of the shorter object length but this will be followed by the value of `r’ obtained by recycling `a’ to match the length of `b’ i.e. [1] 3 7 5 4. [1] 3 4 7 [1] 7 4 3

Questions 5,6 5. r<- c(TRUE,FALSE); r; r*2 6. a<-10; 5:a+1

Answers 5,6 5. [1] TRUE FALSE [1] 2 0 (while computing r*2, the vector `r’ is coerced to a numerical vector with TRUE replaced with `1’ and FALSE replaced with `0’.) 6. [1] 6 7 8 9 10 11 (`:’ takes precedence over `+’ so we have (5,6,7,8,9,10)+(1,1,1,1,1,1), otherwise answer would have been 5,6,7,8,9,10,11)

Questions 7,8 7. seq(2,25,by=7) 8. rep(c(1,2,3),c(4,2,1))

Answers 7,8 7. [1] 2 9 16 23 (cannot cross 23, step size 7) 8. [1] 1 1 1 1 2 2 3 (1,2,3 repeated respectively 4,2,1 times.)

Question 9,10 9. a<-seq(3,46,5); a; a[-1:-3]; a[-1]-a[-length(a)] 10. a<-seq(3,46,5);a; dim(a)=c(3,3);a

Answers 9,10 9. [1] 3 8 13 18 23 28 33 38 43 [1] 18 23 28 33 38 43 (first second and third elements of `a’ removed) [1] 5 5 5 5 5 5 5 5 ( (8,13,18,23,28,33,38,43) - (3 8 13 18 23 28 33 38 )) 10. [1] 3 8 13 18 23 28 33 38 43 [,1] [,2] [,3] [1,] 3 18 33 2,] 8 23 38 (columnwise filling of the 3x3 matrix) [3,] 13 28 43

Question 11 11. matrix(c(1,2,3,4,5,6), ncol=3) ; matrix(c(1,2,3,4,5,6), nrow=2) ; matrix(c(1,2,3,4,5,6), nrow=2, byrow=TRUE)

Answers 11 11. [,1] [,2] [,3] [1,] 1 3 5 [2,] 2 4 6 ( matrix filled column by column ) [,1] [,2] [,3] [1,] 1 3 5 [2,] 2 4 6 ( matrix filled column by column ) [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 (now filled row by row)

Question 12 12. m<- matrix(c(1:24),ncol=4); m; m[1,2]; m[1:4,2:4]; m[c(TRUE,FALSE,TRUE,FALSE,FALSE,FALSE),TRUE];

Answers 12 12. [,1] [,2] [,3] [,4] [1,] 1 7 13 19 [2,] 2 8 14 20 [3,] 3 9 15 21 [4,] 4 10 16 22 [5,] 5 11 17 23 [6,] 6 12 18 24 [1] 7 [,1] [,2] [,3] [1,] 7 13 19 [2,] 8 14 20 [3,] 9 15 21 [4,] 10 16 22 (rows 1,2,3,4 and cols 2,3,4) [,1] [,2] [,3] [,4] [1,] 1 7 13 19 [2,] 3 9 15 21 (rows 1, 3 and all columns)

Inputing matrices: binding cols or rows > cbind(c(1:5),c(5:9),c(9:13)) [,1] [,2] [,3] [1,] 1 5 9 [2,] 2 6 10 [3,] 3 7 11 [4,] 4 8 12 [5,] 5 9 13 >#or bind rows together as below > rbind(c(1:5),c(5:9),c(9:13)) [,1] [,2] [,3] [,4] [,5] [1,] 1 2 3 4 5 [2,] 5 6 7 8 9 [3,] 9 10 11 12 13

Using names for rows and columns > x [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 > rownames(x)<-LETTERS[1:3] > colnames(x)<-c("Red","Blue","Green","Yellow") > x Red Blue Green Yellow A 1 4 7 10 B 2 5 8 11 C 3 6 9 12

Transpose > cbind(c(1:2),c(5:6),c(9:10)) [,1] [,2] [,3] [1,] 1 5 9 [2,] 2 6 10 > t(cbind(c(1:2),c(5:6),c(9:10))) [,1] [,2] [1,] 1 2 [2,] 5 6 [3,] 9 10

*Altering elements of a matrix >z [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 > # Suppose we wish to alter z[1,2],z[1,4],z[3,4] respectively to -100,-200,-300

***** (skip first time) > #First build the corresponding index array ># Notice that if you write (1,2),(1,4),(3,4) one below the other and read the result column by column you get the sequence (1,1,3,2,4,4) > I <- array(c(1,1,3,2,4,4), dim=c(3,2)) > i [,1] [,2] [1,] 1 2 [2,] 1 4 [3,] 3 4 > # Now change z[i] > z[i]<-c(-100,-200,-300) > z [,1] [,2] [,3] [,4] [1,] 1 -100 7 -200 [2,] 2 5 8 11 [3,] 3 6 9 -300

Filtering on Matrices > z [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 z[z[,2] >= 5,] [,1] [,2] [,3] [,4] [1,] 2 5 8 11 [2,] 3 6 9 12 (all rows where second column entry is greater than or equal to 5)

*Apply() f<- function(x) {{y=x^3}} > apply(z,2,f) [,1] [,2] [,3] [,4] [1,] 1 64 343 1000 [2,] 8 125 512 1331 [3,] 27 216 729 1728 (the 2 in apply(z,2,f) refers to column-it means z is processed column by column ; note the result is always displayed column by column) > apply(z,1,f) [,1] [,2] [,3] [1,] 1 8 27 [2,] 64 125 216 [3,] 343 512 729 [4,] 1000 1331 1728 (confusing! Note that after applying the function on z row by row the result is displayed column by column)

Exercise b<-c(5:12); dim(b)= c(4,2) # What is b? d<-t(b); # What is d? dim(d)=c(1,8); # What is d? > x Red Blue Green Yellow A 1 4 7 10 B 2 5 8 11 C 3 6 9 12 #what is t(x)?

Answers > b [,1] [,2] [1,] 5 9 [2,] 6 10 [3,] 7 11 [4,] 8 12 > d [,1] [,2] [,3] [,4] [1,] 5 6 7 8 [2,] 9 10 11 12 > d [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 5 9 6 10 7 11 8 12 > t(x) A B C Red 1 2 3 Blue 4 5 6 Green 7 8 9 Yellow 10 11 12

*Exercise > z [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 > y =3];y > f<- function(x){x*x} > apply(y,1,f)

*Answer > y =3];y [,1] [,2] [,3] [1,] 4 7 10 [2,] 5 8 11 [3,] 6 9 12 > apply(y,1,f) [,1] [,2] [,3] [1,] 16 25 36 [2,] 49 64 81 [3,] 100 121 144

*sapply() for vectors > f<- function(x) {return(c(x,x^2))} > z<-c(-2:1) > f(z) [1] -2 -1 0 1 4 1 0 1 (just displays f(z) element by element as a vector) > sapply(z,f) [,1] [,2] [,3] [,4] [1,] -2 -1 0 1 [2,] 4 1 0 1 (displays z and f(z))

Lists Allows us to group objects of different sizes, dimensions and different modes. > x<-10; y<-c(1:5); z<-array(c(1:12),dim=c(3,4)) > lst<-list(x,y,z); lst [[1]] [1] 10 [[2]] [1] 1 2 3 4 5 [[3]] [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12

We could have made it more informative using `names’ > lst <- list("NUM"=x,"VECTOR"=y,"MATRX"=z) > names(lst) [1] "NUM" "VECTOR" "MATRX" #Names are also called `tags’, the object with the tag is its `value’ > lst $NUM [1] 10 $VECTOR [1] 1 2 3 4 5 $MATRX [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12

Operations on lists > lst_red<- lst[-3] (deleted the third object in the list) > lst_red $NUM [1] 10 $VECTOR [1] 1 2 3 4 5 > unlist(lst_red) NUM VECTOR1 VECTOR2 VECTOR3 VECTOR4 VECTOR5 10 1 2 3 4 5 (`looks inside’ each object of the argument list i.e. lst_red)

Accessing/indexing list elements > lst[[2]] [1] 1 2 3 4 5 # (gives the second object of the list) > lst[c(1,3)] $NUM [1] 10 $MATRX [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 (this is a sublist with first and third objects)

Adding elements to list > lstaug<-lst > lstaug$d<- c("RED","BLUE") #adding a new row > lstaug $NUM [1] 10 $VECTOR [1] 1 2 3 4 5 $MATRX [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 $d [1] "RED" "BLUE"

Deleting elements from list > lstaug $NUM [1] 10 $VECTOR [1] 1 2 3 4 5 $MATRX [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 $COLOR [1] "RED" "BLUE“ (lstaug contains four objects above; we will delete three of them) lstaug$MATRX<- NULL;lstaug$COLOR<-NULL;lstaug$NUM<-NULL; lstaug$VECTOR [1] 1 2 3 4 5

Exercise > x<-c("RED","BLUE") > y<-array(c(1:4),dim=c(2,2)) # Prepare a list containing x,y with x named “Character” and y named “vector”. Next add an element z to the list and Call it “Numeric”

Answer > lst<-list("CHARACTER"=x,"MATRIX"=y) > lst $CHARACTER [1] "RED" "BLUE" $MATRIX [,1] [,2] [1,] 1 3 [2,] 2 4 > lst$NUMERIC=10 > lst $CHARACTER [1] "RED" "BLUE" $MATRIX [,1] [,2] [1,] 1 3 [2,] 2 4 $NUMERIC [1] 10

Exercise contd. #From the list delete x Ans > lst<-lst[c(2,3)]; lst $MATRIX [,1] [,2] [1,] 1 3 [2,] 2 4 $NUMERIC [1] 10

Dataframes Intuitively a data frame is like a matrix, with a rows-and-columns structure. List of equal-length vectors, each column is one element of the list. The vectors might have different modes- numeric,logical, character etc.

An Example The scores of batsmen A,B,C,D,E at five venues CHE,MUM,HYD,DEL,KOL in 20 matches Are available in the dataframe ipl_scores. We will illustrate dataframe operations using this example

A quick look through head() > head(ipl_scores) venue A B C D E 1 CHE 19 31 31 37 31 2 MUM 20 31 40 31 31 3 HYD 24 30 31 36 33 4 DEL 24 25 47 30 26 5 KOL 24 32 32 28 29 6 CHE 7 29 48 41 37 (the first six rows with the header)

A quick look through str() #structure of ipl_scores > str(ipl_scores) 'data.frame': 20 obs. of 6 variables: $ venue: Factor w/ 5 levels "CHE","DEL","HYD",..: 1 5 3 2 4 1 5 3 2 4... $ A : num 19 20 24 24 24 7 21 18 13 12... $ B : num 31 31 30 25 32 29 30 23 41 22... $ C : num 31 40 31 47 32 48 34 39 44 43... $ D : num 37 31 36 30 28 41 28 32 39 34... $ E : num 31 31 33 26 29 37 28 31 24 32...

# How is it actually? > ipl_scores venue A B C D E 1 CHE 19 31 31 37 31 2 MUM 20 31 40 31 31 3 HYD 24 30 31 36 33 4 DEL 24 25 47 30 26 5 KOL 24 32 32 28 29 6 CHE 7 29 48 41 37 7 MUM 21 30 34 28 28 8 HYD 18 23 39 32 31 9 DEL 13 41 44 39 24 10 KOL 12 22 43 34 32 11 CHE 15 19 39 34 32 12 MUM 31 21 47 37 23 13 HYD 20 25 43 39 21 14 DEL 21 32 36 35 39 15 KOL 14 25 39 34 29 16 CHE 18 41 38 36 33 17 MUM 19 15 43 34 31 18 HYD 21 27 34 35 26 19 DEL 18 32 45 43 35 20 KOL 14 29 35 37 26

Ipl_scores is not authentic It was built like Ipl_scores2 is built below 1. Built five vectors a,b,d,g,h > a b c d g h<- round(5*rnorm(20)+25) 2. Built a character vector > v<-rep(c("DEL","MUM","CHE","HYD"),5)#ipl_scores has however five venues)

Ipl_scores2 > a [1] 25 23 24 23 21 16 20 17 15 9 13 11 27 20 24 22 21 25 29 11 > v [1] "DEL" "MUM" "CHE" "HYD" "DEL" "MUM" "CHE" "HYD" "DEL" "MUM" "CHE" "HYD" [13] "DEL" "MUM" "CHE" "HYD" "DEL" "MUM" "CHE" "HYD"

Ipl_scores2 >ipl_scores2<- data.frame(cbind("venue"=v,"A"=a,"B"=b,"C"=d," D"=g,"E"=h)) > head(ipl_scores2) venue A B C D E 1 DEL 25 22 50 32 23 2 MUM 23 30 38 33 23 3 CHE 24 29 41 32 19 4 HYD 23 33 49 33 27 5 DEL 21 36 51 45 29 6 MUM 16 30 53 48 25

Getting single columns and rows > ipl_scores$A [1] 19 20 24 24 24 7 21 18 13 12 15 31 20 21 14 18 19 21 18 14 (Column A displayed as a row) > ipl_scores[1,] A B C D E 1 19 31 31 37 31 (Row 1 displayed) names(ipl_scores) [1] "venue" "A" "B" "C" "D" "E"

Computations on rows and columns > colMeans(ipl_scores) A B C D E 18.65 28.00 39.40 35.00 29.85 #Let us now compute row means > rowMeans(ipl_scores) Error in rowMeans(ipl_scores) : 'x' must be numeric #Oops! That went wrong.The `venue’ column is not numeric. > rowMeans(ipl_scores[,-1]) [1] 29.8 30.6 30.8 30.4 29.0 32.4 28.2 28.6 32.2 28.6 27.8 31.8 29.6 32.6 28.2 [16] 33.2 28.4 28.6 34.6 28.2

Making an additional column of rowmeans and an additional row > rowMeans(ipl_scores)->r_M Now we will make r_M an additional column ipl_scoresaug<-cbind(ipl_scores,M=r_M) # Add a row of incomplete information >g<- data.frame(venue="MOH",A=30,B=NA,C=20,D=5, E=37,M=NA) >ipl_scoresaug <- rbind(ipl_scoresaug,g) >ipl_scoresaug

venue A B C D E M 1 CHE 19 31 31 37 31 29.8 2 MUM 20 31 40 31 31 30.6 3 HYD 24 30 31 36 33 30.8 4 DEL 24 25 47 30 26 30.4 5 KOL 24 32 32 28 29 29.0 6 CHE 7 29 48 41 37 32.4 7 MUM 21 30 34 28 28 28.2 8 HYD 18 23 39 32 31 28.6 9 DEL 13 41 44 39 24 32.2 10 KOL 12 22 43 34 32 28.6 # col M is new 11 CHE 15 19 39 34 32 27.8 #row 21 is new 12 MUM 31 21 47 37 23 31.8 13 HYD 20 25 43 39 21 29.6 14 DEL 21 32 36 35 39 32.6 15 KOL 14 25 39 34 29 28.2 16 CHE 18 41 38 36 33 33.2 17 MUM 19 15 43 34 31 28.4 18 HYD 21 27 34 35 26 28.6 19 DEL 18 32 45 43 35 34.6 20 KOL 14 29 35 37 26 28.2 21 MOH 30 NA 20 5 37 NA

Indexing/filtering data.frames Done as in the case of matrices ipl_scoresaug[2,3] will give 2° row 3° column entry. > ipl_scoresaug[4:10, ] -- rows 4 to 10 and all columns. > ipl_scoresaug[-c(1,2),-c(2)] --- interpreted as omit column 2 and rows 1 and 2.

Filtering dataframes > #picking all rows which have venue DEL > del<- b[b[,1]=“DEL”,];del venue A B C D E M 4 DEL 24 25 47 30 26 30.4 9 DEL 13 41 44 39 24 32.2 14 DEL 21 32 36 35 39 32.6 19 DEL 18 32 45 43 35 34.6

Adding a row of means to del g_del<- data.frame(venue="MDEL",A=mean(del[,2]),B =mean(del[,3]),C=mean(del[,4]),D=mean(del[, 5]),E=mean(del[,6]),M=mean(del[,7])) > g_del venue A B C D E M 1MDEL 19 32.5 43 36.75 31 32.45 m_del<-data.frame(rbind(del,g_del))

Dataframe `del’ with an additional `means’ row > m_del venue A B C D E M 4 DEL 24 25.0 47 30.00 26 30.40 9 DEL 13 41.0 44 39.00 24 32.20 14 DEL 21 32.0 36 35.00 39 32.60 19 DEL 18 32.0 45 43.00 35 34.60 5 MDEL 19 32.5 43 36.75 31 32.45

Functions f ) { # Do something } The return value of a function is the last expression in the body of the function that is to be evaluated Can be treated like any other R object. You can assign it to a variable Functions can be passed as arguments to other functions The `output’ of a function could be any R object eg. a list.

return value of a function is the last expression in the body of the function > g<- function(x){y=x+2;z=y^2;z-1} > g(2) [1] 15 # This is the same as > g<- function(x){y=x+2;z=y^2;return(z-1)} > g(2) [1] 15

You can assign a function to a variable > g <- function(x) { return(sin(x)) } > g function(x) { return(sin(x)) } Compare >y <- c(1:5) >y [1] 1 2 3 4 5

Can be passed as argument for other functions g2 <- function(g,a) {g(a)} > g2(sqrt,4) [1] 2 > g2(cos,pi) [1] -1 > g2(exp,1) [1] 2.718282

The `output’ of a function could be a list > g<- function(x,y,z){return (list(x,y*y,z))} > x<-c(1,2) > y<-array(c(1:6),dim=c(3,2)) > z<- c("RED","BLUE") > g(x,y,z) [[1]] [1] 1 2 [[2]] [,1] [,2] [1,] 1 16 [2,] 4 25 [3,] 9 36 (element by element multiplication of y with y) [[3]] [1] "RED" "BLUE"

Function does not alter values of global variables used in its construction > g<- function(x,y){x<-x+y;return ((x+y*y))} > x <- c(2,3) > y <- c(1,8) > g(x,y) [1] 4 75 > x [1] 2 3 > y [1] 1 8 (x,y have not changed)

Ifelse(boolean vector, vector, vector) > a <- seq(3,17,3);a [1] 3 6 9 12 15 > y <- ifelse(a % 4== 0, 1, 0) #y[i] acquires value 1 if a[i] is divisible by 4, else acquires value 0. > a % 4==0 [1] FALSE FALSE FALSE TRUE FALSE > y [1] 0 0 0 1 0

Ifelse(contd.) > a [1] 3 6 9 12 15 > y <- ifelse(a % 4== 0,TRUE,FALSE) > y [1] FALSE FALSE FALSE TRUE FALSE

Ifelse(ifelse..) x<- c(À`,`B`,`C`,`B`,À`,À`) (if you cut and paste Retype quotes!) > z<- ifelse(x=="B",2,3) > z [1] 3 2 3 2 3 3 > y<- ifelse(x=="A",1,ifelse(x=="B",2,3)) > y [1] 1 2 3 2 1 1 # first pass through x gives z, second pass modifies z to y

Exercise >b<- ipl_scoresaug > #change the venues in b as follows:CHE to MUM,DEL to KOL,KOL to CHE, HYD and MOH to MOH > b[,1]<- ifelse(b[,1]=="CHE","MUM",ifelse(b[,1]=="DEL","KOL",if else(b[,1]=="KOL","CHE","MOH"))) > b$venue [1] "MUM" "MOH" "MOH" "KOL" "CHE" "MUM" "MOH" "MOH" "KOL" "CHE" "MUM" "MOH" "MOH" "KOL" "CHE" "MUM" "MOH" "MOH" "KOL" "CHE" "MOH"

Answer > b[,1]<- ifelse(b[,1]=="CHE","MUM",ifelse(b[,1]=="DEL","KOL",ifelse(b[,1]=="KOL","CHE","MOH"))) > ipl_scoresaug[,1] [1] CHE MUM HYD DEL KOL CHE MUM HYD DEL KOL CHE MUM HYD DEL KOL CHE MUM HYD DEL KOL MOH Levels: CHE DEL HYD KOL MUM MOH

For loop a=N for (i in 1:a) {Do something} The `Do something’ could be for (j in fstart(i):fend(i)) {Do some special thing} Here fstart(i),fend(i) could be special functions defined earlier. Example fstart(i)= i-5;fend(i)=i-1;

For loop example m<- array(c(1:100),dim=c(10,10)) N<- 10 for(i in 2:N) { for(j in (i-1):i) {m[i,j]=0} } Makes all diagonal and subdiagonal elements below first row equal to zero

STAT 534: Statistical Computing Hari Narayanan

Similar presentations

Presentation on theme: "STAT 534: Statistical Computing Hari Narayanan"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

STAT 534: Statistical Computing Hari Narayanan

Similar presentations

Presentation on theme: "STAT 534: Statistical Computing Hari Narayanan"— Presentation transcript:

Similar presentations

About project

Feedback