A quick introduction to R programming. 陳景祥 (Steve Chen), Department of Statistics, Tamkang University
Major features of R
(1) Constants, simple variables, vector and array variables
(2) Input and output instructions
(3) Conditional expressions (e.g., if, else)
(4) Loop structures (e.g., for, while, repeat)
(5) User-defined program modules (e.g., function)
Programming => Freedom!
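A small sketch with one example of each feature in the list. All variable and function names (k, v, m, msg, total, n, i, square) are illustrative. Note that R's repeat loop has no "until" keyword; the loop is left with break.

```r
# Illustrative names only; one small example per listed feature.
k = 3                          # a simple (scalar) variable
v = c(2, 4, 6)                 # a vector variable
m = matrix(1:6, nrow = 2)      # a two-dimensional array (matrix)

# Conditional expression
if (k > 2) {
  msg = "k is greater than 2"
} else {
  msg = "k is at most 2"
}
cat(msg, "\n")                 # an output instruction

# for loop: add up the elements of v
total = 0
for (val in v) {
  total = total + val
}

# while loop: count n down from k to zero
n = k
while (n > 0) {
  n = n - 1
}

# repeat loop: R has no "until", so leave the loop with break
i = 0
repeat {
  i = i + 1
  if (i >= 5) break
}

# Program module: a user-defined function
square = function(x) x^2
square(v)                      # 4 16 36
```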
R output
The output of an R function is often stored in a list variable; you can view any component of that list.
> x = rnorm(20); y = rnorm(20)   # generate 20 samples from the standard normal distribution
> lm.result = lm(y ~ x)          # a simple linear regression
> lm.result                      # note that lm.result is a variable name
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
> names(lm.result)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"
> lm.result$coefficients
(Intercept)            x
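The same idea works for your own functions: return several results together in a list and pick out components with $ or [[ ]]. A minimal sketch; the function name describe() is hypothetical.

```r
# describe() is a hypothetical helper that, like lm(), returns a list.
describe = function(v) {
  list(mean = mean(v), sd = sd(v), n = length(v))
}

res = describe(c(60, 55, 64, 66, 55, 56))
names(res)        # "mean" "sd" "n"
res$n             # 6
res[["mean"]]     # same value as res$mean
```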
Simple programming examples
x = c(1,2,3)
y = c(11,12,13)
for (i in 1:3) {
  x[i] = x[i] + 100
}
x
# User-defined function: f1
f1 = function(a, b) {
  s = a + b       # "s" rather than "c", so the c() function is not masked
  return(s)
}
z = f1(x, y)      # now a = x, b = y
if (is.vector(z)) {
  cat("z is a vector!\n")
} else {
  cat("z is NOT a vector!\n")
}
fit = lm(y ~ x)   # simple regression
summary(fit)
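Because + works element by element on whole vectors, the for loop in the example can be replaced by a single vectorized statement. A sketch; add2 is a hypothetical name that does the same job as f1.

```r
x = c(1, 2, 3)
y = c(11, 12, 13)
x = x + 100                  # same result as the loop: 101 102 103

add2 = function(a, b) a + b  # hypothetical, equivalent to f1
z = add2(x, y)
z                            # 112 114 116
```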
R Example 1-(1)
> # scores and genders of 6 students, stored in the vectors scores and gender
> scores = c(60,55,64,66,55,56)
> gender = c("male","male","female","female","male","female")
> scores[2]            # the 2nd element of scores
[1] 55
> scores[2:4]          # 2:4 = c(2,3,4), the 2nd to 4th elements
[1] 55 64 66
> scores[c(1,3,4,5)]   # the 1st, 3rd, 4th and 5th elements
[1] 60 64 66 55
> scores[scores > 60]
[1] 64 66
> # compute the average
> mean(scores)
[1] 59.33333
> # compute the standard deviation
> sd(scores)
[1] 4.802777
R Example 1-(2)
> # compute the variance
> var(scores)
[1] 23.06667
> # compute the median
> median(scores)
[1] 58
> # compute the 25th percentile
> quantile(scores, 0.25)
  25%
55.25
> scores[gender == "male"]
[1] 60 55 55
> mean(scores[gender == "male"])
[1] 56.66667
> table(gender, scores)
        scores
gender   55 56 60 64 66
  female  0  1  0  1  1
  male    2  0  1  0  0
> hist(scores)   # histogram
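Example 1 computes the male average by subsetting by hand. A sketch of a shortcut: tapply() splits scores by gender and applies mean() to each group, and summary() gives several descriptive statistics in one call.

```r
scores = c(60, 55, 64, 66, 55, 56)
gender = c("male", "male", "female", "female", "male", "female")

group.means = tapply(scores, gender, mean)   # one mean per gender
group.means        # female 62, male 56.66667
summary(scores)    # min, quartiles, median, mean, max
```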
R Example 2: Regression
> ( IQ = round(rnorm(6, 110, 15)) )    # 6 samples from a normal distribution (mean 110, SD 15), rounded
[1]
> ( scores = *IQ + rnorm(6, 0, 2) )    # course score = * IQ + random error (normal, mean 0, SD 2)
[1]
> summary( lm(scores ~ IQ) )
Call:
lm(formula = scores ~ IQ)
Residuals:
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
IQ                                               ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error:  on 4 degrees of freedom
Multiple R-squared:  , Adjusted R-squared:
F-statistic:  on 1 and 4 DF, p-value:
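A reproducible sketch of Example 2. The seed (123) and the slope 0.6 are assumptions chosen for illustration, since the slide's coefficient is not shown. The last lines check lm() against the textbook least-squares formulas.

```r
set.seed(123)                                  # assumed seed, for reproducibility
IQ = round(rnorm(6, mean = 110, sd = 15))
scores = 0.6 * IQ + rnorm(6, mean = 0, sd = 2) # assumed slope 0.6

fit = lm(scores ~ IQ)
summary(fit)

# Least-squares estimates computed by hand
b1 = cov(IQ, scores) / var(IQ)       # slope
b0 = mean(scores) - b1 * mean(IQ)    # intercept
coef(fit)
c(b0, b1)                            # should match coef(fit)
```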
R input functions
Read a vector:
X = scan("c:/dir1/sample.txt")
Read a data frame:
read.table("c:/dir1/d.txt", header = TRUE)
read.csv("c:/dir1/d.csv", header = TRUE)
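A sketch that runs without the c:/dir1 files: it writes small made-up files to temporary locations with tempfile() and writeLines(), then reads them back with scan(), read.table() and read.csv().

```r
# Temporary files stand in for c:/dir1/...; their contents are made up.
vec.file = tempfile(fileext = ".txt")
writeLines(c("60 55 64", "66 55 56"), vec.file)
X = scan(vec.file)            # all numbers in the file, as one vector

tab.file = tempfile(fileext = ".txt")
writeLines(c("score gender", "60 male", "64 female"), tab.file)
d = read.table(tab.file, header = TRUE)   # first line gives column names

csv.file = tempfile(fileext = ".csv")
writeLines(c("score,gender", "60,male", "64,female"), csv.file)
d2 = read.csv(csv.file, header = TRUE)
```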
R output functions
Output a vector:
cat(scores, file = "c:/dir2/scores.txt")
write(scores, file = "c:/dir2/scores2.txt")
Output a data frame:
write.table(X, "c:/dir2/data.txt", row.names = FALSE, sep = " ")
write.csv(X, "c:/dir2/data.csv", row.names = FALSE)
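A round-trip sketch: write data out, read it back, and confirm nothing changed. File locations come from tempfile() instead of c:/dir2, and the data frame X is made up for the example.

```r
scores = c(60, 55, 64, 66, 55, 56)

vec.file = tempfile(fileext = ".txt")
cat(scores, file = vec.file)            # one line, values separated by spaces
back = scan(vec.file, quiet = TRUE)

vec.file2 = tempfile(fileext = ".txt")
write(scores, file = vec.file2)         # numeric vectors: 5 values per line by default
back2 = scan(vec.file2, quiet = TRUE)

X = data.frame(score = scores[1:3], gender = c("male", "male", "female"))
csv.file = tempfile(fileext = ".csv")
write.csv(X, csv.file, row.names = FALSE)
X2 = read.csv(csv.file)
```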
R special operations (1)
Vector operations
> x = c(1,2,3,4,5)      # c stands for concatenate
> y = c(10,20,30,40,50)
> x + y
[1] 11 22 33 44 55
> x - y
[1]  -9 -18 -27 -36 -45
> x * y
[1]  10  40  90 160 250
> y ^ x
[1]        10       400     27000   2560000 312500000
> x
[1] 1 2 3 4 5
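Two further facts about vector arithmetic, as a sketch: when lengths differ, R recycles the shorter vector, and sum(x * y) gives the inner product, which the matrix operator %*% also computes (as a 1 x 1 matrix).

```r
x = c(1, 2, 3, 4, 5)
y = c(10, 20, 30, 40, 50)

c(1, 2, 3, 4, 5, 6) + c(0, 10)   # shorter vector recycled: 1 12 3 14 5 16
x * 2                            # a scalar is recycled: 2 4 6 8 10
sum(x * y)                       # inner product: 550
x %*% y                          # same value, as a 1 x 1 matrix
```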
R special operations (2)
Data filtering with logical and index vectors (the same idea applies to matrices)
> x = c(1,2,3,4,5)
> y = c(10,20,30,40,50)
> x >= 3
[1] FALSE FALSE  TRUE  TRUE  TRUE
> ( x.index = x >= 3 )
[1] FALSE FALSE  TRUE  TRUE  TRUE
> x[x >= 3]          # or x[x.index]
[1] 3 4 5
> x[c(F,F,T,T,T)]
[1] 3 4 5
> x[c(3,4,5)]
[1] 3 4 5
> x[y >= 20]
[1] 2 3 4 5
> z = c("boy","girl","boy","boy","girl")
> y[z == "boy"]
[1] 10 30 40
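A sketch of two related tools: which() turns a logical vector into element positions, and a logical condition on one column selects rows of a matrix. The matrix m is made up for the example.

```r
x = c(1, 2, 3, 4, 5)
which(x >= 3)                  # positions where the condition holds: 3 4 5

# Made-up matrix: an id column and a score column
m = cbind(id = 1:4, score = c(55, 72, 64, 80))
m[m[, "score"] >= 60, ]        # keep rows whose score is at least 60
```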
R special operations (3)
List variables: storing function output
> x = rnorm(20); y = rnorm(20)   # two samples of size 20 from N(0,1)
> lm.result = lm(y ~ x)          # simple regression; the output is stored in lm.result
> lm.result
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
> names(lm.result)               # the components of the output
[1] "coefficients" "residuals" "effects" "rank" ……..
> lm.result$coefficients
(Intercept)            x
> lm.result$coefficients[2]
x
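A sketch with made-up, exactly linear data (y = 1 + 2x), so the coefficients are known in advance. It shows that an lm object is a list and that coef() and [[ ]] reach the same component as $.

```r
x = c(1, 2, 3, 4, 5)
y = 1 + 2 * x                      # made-up data on an exact line
lm.result = lm(y ~ x)

is.list(lm.result)                 # TRUE: the lm output is a list
lm.result$coefficients             # intercept 1, slope 2 (up to rounding)
coef(lm.result)                    # accessor function, same values
lm.result[["coefficients"]]["x"]   # the slope, selected by name
```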
R special operations (4)
Factor variables: for nominal (categorical) data, used in classification
> gender = c("male","male","female","female","male","female")
> gender2 = as.factor(gender)
> gender2
[1] male   male   female female male   female
Levels: female male
> levels(gender2)
[1] "female" "male"
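A sketch of what a factor holds: integer codes that point into a set of level labels. By default the levels are sorted alphabetically; the levels argument of factor() sets another order, for example to change the reference level in a regression.

```r
gender = c("male", "male", "female", "female", "male", "female")
gender2 = factor(gender)

levels(gender2)        # "female" "male" (alphabetical by default)
as.integer(gender2)    # 2 2 1 1 2 1: codes pointing into levels
table(gender2)         # counts per level

# Explicit level order, e.g. to make "male" the first (reference) level
gender3 = factor(gender, levels = c("male", "female"))
levels(gender3)
```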
R special operations (5)
Data frame variables
> iris
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
… setosa
… setosa
… virginica
… virginica
> iris$Sepal.Length
[1] …
[145] …
> iris$Species
[1] setosa setosa setosa setosa setosa setosa
…
[145] virginica virginica virginica virginica virginica virginica
Levels: setosa versicolor virginica
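A sketch of building a small data frame (the values are made up) and of inspecting the built-in iris data without printing all 150 rows. Columns of a data frame may have different types, and $ selects one column.

```r
d = data.frame(score = c(60, 55, 64),
               gender = c("male", "male", "female"))
d$score                     # one column, accessed with $
d[d$gender == "male", ]     # rows that satisfy a condition

dim(iris)                   # 150 rows, 5 columns
str(iris)                   # column types at a glance
head(iris, 3)               # first 3 rows only
tapply(iris$Sepal.Length, iris$Species, mean)   # mean sepal length per species
```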
R special operations (6)
Functions as arguments to other functions
f1 = function(x, f0, ...) {
  result = f0(x, ...)
  return(result)
}
y = rnorm(100)
f1(y, mean, trim = 0.05, na.rm = TRUE)   # here f0(x, ...) = mean(x, trim = 0.05, na.rm = TRUE)
[1]
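A sketch with a small made-up vector so each result can be checked by hand. The same f1 is given different functions as f0, including an anonymous function; the extra arguments in ... are forwarded to f0.

```r
f1 = function(x, f0, ...) {
  f0(x, ...)            # arguments in ... are passed on to f0
}

v = c(1, 2, 3, 4, 100)                # made-up data with one outlier
f1(v, mean)                           # 22: pulled up by the outlier
f1(v, mean, trim = 0.2)               # 3: one value dropped at each end first
f1(v, median)                         # 3
f1(v, function(z) max(z) - min(z))    # anonymous function: 99
```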
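R special operations (7)
Using cat() inside a function to report intermediate results
> f2 = function(x) {
+   mean(x)
+   var(x)
+ }
> f2(y)
[1]
Only the value of the last expression, var(x), is returned, so mean(x) is computed but never shown. cat() prints it explicitly:
> f2 = function(x) {
+   cat("mean of X = ", mean(x), "\n")
+   var(x)
+ }
> f2(y)
mean of X =
[1]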
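A sketch that checks this behaviour with a small made-up vector: capture.output() collects the text printed by cat(), while the return value is still just the variance. Inside the call, <- is needed for the assignment, because = would be read as a named argument.

```r
f2 = function(x) {
  cat("mean of X =", mean(x), "\n")   # printed as a side effect
  var(x)                              # the value returned
}

out = capture.output(v <- f2(c(1, 2, 3, 4)))
out    # the text printed by cat()
v      # the variance returned by f2
```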