A quick introduction to R programming. 淡江統計 陳景祥 (Steve Chen), steve@stat.tku.edu.tw

Major features
(1) Contains constants, simple variables, vectors, and arrays
(2) Has input and output instructions
(3) Has conditional expressions (e.g., if, else)
(4) Has loop structures (e.g., for, while, repeat)
(5) Can define program modules (e.g., functions)
Programming => Freedom!
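A minimal sketch of the five features in one short script. The variable names (n, v, size, total, k, w, r, square) are made up for illustration and do not come from the slides.

```r
# (1) A constant, a simple variable, and a vector
n = 5
v = c(2, 4, 6, 8, 10)

# (3) Conditional expression: if/else returns a value
size = if (n > 3) "large" else "small"

# (4) for loop: running total over the vector
total = 0
for (value in v) {
  total = total + value
}

# (4) while loop: count how many halvings bring w below 1
k = 0
w = 16
while (w >= 1) {
  w = w / 2
  k = k + 1
}

# (4) repeat loop: has no exit condition of its own, so use break
r = 0
repeat {
  r = r + 1
  if (r >= 3) break
}

# (5) Program module: a user-defined function
square = function(z) z^2

# (2) Output instruction
cat(size, total, k, r, square(n), "\n")   # large 30 5 3 25
```

Note that R's repeat has no "until" clause; the loop ends only when break is reached.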

R Output
The output of an R function is stored in a LIST variable, so you can choose to see the value of any component of the list.
> x = rnorm(20); y = rnorm(20)  # generate two samples of size 20 from the standard normal dist.
> lm.result = lm(y ~ x)  # a simple regression
> lm.result  # note that lm.result is a variable name
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
     0.2781      -0.2354

> names(lm.result)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"
> lm.result$coefficients
(Intercept)           x
  0.2781229  -0.2353573
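The same idea works for your own functions: bundle several results into one list and pick out components by name. A small sketch; describe() is a hypothetical helper, not part of R or of the slides.

```r
# Hypothetical helper: returns several results bundled in one list
describe = function(v) {
  list(n = length(v), mean = mean(v), range = range(v))
}

res = describe(c(3, 1, 4, 1, 5))
names(res)       # "n" "mean" "range"
res$mean         # 2.8
res[["range"]]   # 1 5
```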

Simple programming examples
x = c(1,2,3)
y = c(11,12,13)
for (i in 1:3) {
  x[i] = x[i]+100
}
x
# User defined function: f1
f1 = function(a,b) {
  c = a+b
  return(c)
}
z = f1(x,y)  # Now a = x, b = y
if (is.vector(z)) {
  cat("z is a vector! \n")
} else {
  cat("z is NOT a vector! \n")
}
fit = lm(y ~ x)  # Simple regression
summary(fit)

R Example 1-(1)
> # the scores and sexes of 6 students, stored in the vector variables scores and gender
> scores = c(60,55,64,66,55,56)
> gender = c("male","male","female","female","male","female")
> scores[2]  # the 2nd element of scores
[1] 55
> scores[2:4]  # 2:4 = c(2,3,4), the 2nd, 3rd, and 4th elements
[1] 55 64 66
> scores[c(1,3,4,5)]  # the 1st, 3rd, 4th, and 5th elements of scores
[1] 60 64 66 55
> scores[scores > 60]
[1] 64 66
> # compute the average
> mean(scores)
[1] 59.33333
> # compute the standard deviation
> sd(scores)
[1] 4.802777

R Example 1-(2)
> # compute the variance
> var(scores)
[1] 23.06667
> # compute the median
> median(scores)
[1] 58
> # compute the 25th percentile
> quantile(scores,0.25)
  25%
55.25
> scores[gender == "male"]
[1] 60 55 55
> mean(scores[gender == "male"])
[1] 56.66667
> table(gender, scores)
        scores
gender   55 56 60 64 66
  female  0  1  0  1  1
  male    2  0  1  0  0
> hist(scores)  # histogram
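Computing the mean for one group at a time generalizes to all groups at once with tapply(). A short sketch using the six scores and gender labels from Example 1 (labels as implied by the table output); the name group.means is made up for illustration.

```r
scores = c(60, 55, 64, 66, 55, 56)
gender = c("male", "male", "female", "female", "male", "female")

# Apply mean() to scores separately within each gender group
group.means = tapply(scores, gender, mean)
group.means    # female 62, male 56.66667

# Group sizes
table(gender)
```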

R Example 2: Regression
> ( IQ = round(rnorm(6,110,15)) )  # generate 6 samples from N(110, 15) and round them
[1] 118 121 107 108  87 131
> ( scores = 5 + 0.6*IQ + rnorm(6,0,2) )  # course score = 5 + 0.6 * IQ + random error (N(0,2))
[1] 79.09683 77.53507 69.82176 69.17460 55.32896 84.49939
> summary( lm(scores ~ IQ) )

Call:
lm(formula = scores ~ IQ)

Residuals:
      1       2       3       4       5       6
 2.4883 -1.0897  0.6060 -0.7132 -0.4453 -0.8461

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.69628    5.05144  -0.534 0.621786
IQ           0.67207    0.04476  15.014 0.000115 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.514 on 4 degrees of freedom
Multiple R-squared:  0.9826, Adjusted R-squared:  0.9782
F-statistic: 225.4 on 1 and 4 DF,  p-value: 0.0001147
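Because rnorm() draws new random numbers on every run, the figures in this example will differ each time. A sketch of a reproducible variant that also pulls individual numbers out of the summary; the seed value 1 and the names fit, s, and pred are arbitrary choices for illustration.

```r
set.seed(1)   # assumed seed, only so that reruns give the same numbers

IQ = round(rnorm(6, 110, 15))            # mean 110, standard deviation 15
scores = 5 + 0.6 * IQ + rnorm(6, 0, 2)   # true intercept 5, true slope 0.6

fit = lm(scores ~ IQ)
s = summary(fit)      # the summary is itself a list

coef(fit)                           # estimated intercept and slope
s$r.squared                         # multiple R-squared
s$coefficients["IQ", "Pr(>|t|)"]    # p-value for the slope

# Predicted score for a student with IQ = 100
pred = predict(fit, newdata = data.frame(IQ = 100))
pred
```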

R Input function
Read a vector:
X = scan("c:/dir1/sample.txt")
Read a data frame:
read.table("c:/dir1/d.txt", header=T)
read.csv("c:/dir1/d.csv", header=T)

R Output function
Output a vector:
cat(scores, file="c:/dir2/scores.txt")
write(scores, file="c:/dir2/scores2.txt")
Output a data frame:
write.table(X, "c:/dir2/data.txt", row.names = FALSE, sep=" ")
write.csv(X, "c:/dir2/data.csv", row.names = FALSE)
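A round-trip sketch that writes a vector and a data frame and reads them back. It uses tempfile() instead of the c:/dir paths so that it runs on any machine; the data frame contents and the names vec.file, csv.file, scores.back, and X.back are made up for illustration.

```r
scores = c(60, 55, 64, 66, 55, 56)

# Vector: write() then scan()
vec.file = tempfile(fileext = ".txt")
write(scores, file = vec.file)
scores.back = scan(vec.file, quiet = TRUE)

# Data frame: write.csv() then read.csv()
X = data.frame(id = 1:3, score = c(60.5, 55, 64))
csv.file = tempfile(fileext = ".csv")
write.csv(X, csv.file, row.names = FALSE)
X.back = read.csv(csv.file, header = TRUE)

identical(scores, scores.back)   # did the vector survive the round trip?
all.equal(X, X.back)             # did the data frame survive?

unlink(c(vec.file, csv.file))    # remove the temporary files
```

Setting row.names = FALSE matters here: without it, write.csv() adds a column of row names that read.csv() would read back as an extra data column.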

R special operations (1)
Vector operations
> x = c(1,2,3,4,5)  # c stands for concatenate
> y = c(10,20,30,40,50)
> x + y
[1] 11 22 33 44 55
> x - y
[1]  -9 -18 -27 -36 -45
> x * y
[1]  10  40  90 160 250
> y ^ x
[1]        10       400     27000   2560000 312500000
> x + 100
[1] 101 102 103 104 105
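The x + 100 case works because R recycles the shorter operand to match the longer one. A brief sketch of that rule with made-up values:

```r
x = c(1, 2, 3, 4)

# c(10, 20) is recycled to c(10, 20, 10, 20)
x + c(10, 20)   # 11 22 13 24

# A single number is recycled to every element
x * 2           # 2 4 6 8

# Elementwise product followed by sum() gives an inner product
sum(x * x)      # 30
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.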

R special operations (2)
Data filtering using vectors, matrices, and index (pointer) arrays
> x = c(1,2,3,4,5)
> y = c(10,20,30,40,50)
> x >= 3
[1] FALSE FALSE  TRUE  TRUE  TRUE
> ( x.index = x >= 3 )
[1] FALSE FALSE  TRUE  TRUE  TRUE
> x[x >= 3]  # or x[ x.index ]
[1] 3 4 5
> x[c(F,F,T,T,T)]
[1] 3 4 5
> x[c(3,4,5)]
[1] 3 4 5
> x[y >= 20]
[1] 2 3 4 5
> z = c("boy","girl","boy","boy","girl")
> y[z == "boy"]
[1] 10 30 40
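A few related indexing idioms, sketched on the same x and y: converting a logical filter to positions, combining two conditions, dropping elements, and counting matches.

```r
x = c(1, 2, 3, 4, 5)
y = c(10, 20, 30, 40, 50)

which(x >= 3)        # positions where the condition holds: 3 4 5
x[x >= 2 & y < 50]   # both conditions, elementwise: 2 3 4
x[-1]                # negative index drops the 1st element: 2 3 4 5
sum(x >= 3)          # TRUE counts as 1, so this counts matches: 3
```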

R special operations (3)
LIST variable: used to store function output
> x = rnorm(20); y = rnorm(20)  # 2 N(0,1) samples
> lm.result = lm(y ~ x)  # simple regression, output is stored in lm.result
> lm.result
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
     0.2781      -0.2354

> names(lm.result)  # the components of the output
[1] "coefficients" "residuals" "effects" "rank" ...
> lm.result$coefficients
(Intercept)           x
  0.2781229  -0.2353573
> lm.result$coefficients[2]
         x
-0.2353573
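A sketch of three equivalent ways to reach the slope inside the lm output. The seed is an arbitrary assumption for reproducibility, so the numbers will not match the slide; slope1, slope2, and slope3 are illustrative names.

```r
set.seed(123)   # assumed seed; the slide's values came from a different draw
x = rnorm(20); y = rnorm(20)
lm.result = lm(y ~ x)

is.list(lm.result)   # TRUE: the fitted model is stored as a list

slope1 = lm.result$coefficients[2]           # by component name, then position
slope2 = lm.result[["coefficients"]]["x"]    # by component name, then element name
slope3 = coef(lm.result)["x"]                # via the extractor function

identical(slope1, slope2)      # TRUE
length(residuals(lm.result))   # 20, one residual per observation
```

Extractor functions such as coef() and residuals() are usually preferable to $ because they do not depend on how the list is laid out internally.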

R special operations (4)
Factor variable: for nominal (categorical) data, used in classification
> gender = c("male","male","female","female","male","female")
> gender2 = as.factor(gender)
> gender2
[1] male   male   female female male   female
Levels: female male
> levels(gender2)
[1] "female" "male"
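By default the levels are sorted alphabetically, which is why "female" comes first above. A sketch of setting the level order explicitly with factor(); the chosen order is only for illustration.

```r
gender = c("male", "male", "female", "female", "male", "female")

# Explicit level order instead of the alphabetical default
gender2 = factor(gender, levels = c("male", "female"))

levels(gender2)       # "male" "female"
as.integer(gender2)   # internal integer codes: 1 1 2 2 1 2
table(gender2)        # counts per level, in level order
```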

R special operations (5)
Data-frame variables
> iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1            5.1         3.5          1.4         0.2    setosa
2            4.9         3.0          1.4         0.2    setosa
...
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
> iris$Sepal.Length
  [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
...
[145] 6.7 6.7 6.3 6.5 6.2 5.9
> iris$Species
  [1] setosa setosa setosa setosa setosa setosa
...
[145] virginica virginica virginica virginica virginica virginica
Levels: setosa versicolor virginica
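Data frames combine the column access shown here with the logical filtering from earlier slides. A sketch on the built-in iris data; the 5.5 cutoff and the names long.setosa and means are arbitrary choices for illustration.

```r
dim(iris)   # 150 rows, 5 columns

# Row filter: logical condition before the comma, all columns after it
long.setosa = iris[iris$Species == "setosa" & iris$Sepal.Length > 5.5, ]
nrow(long.setosa)

# Column selection by name, first 3 rows only
head(iris[, c("Sepal.Length", "Species")], 3)

# Mean sepal length per species
means = aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
means   # setosa 5.006, versicolor 5.936, virginica 6.588
```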

R special operations (6)
Functions, and a function as an argument
f1 = function(x,f0,...) {
  result = f0(x,...)
  return(result)
}
y = rnorm(100)
f1(y,mean,trim=0.05,na.rm=T)  # Now f0(x,...) = mean(x,trim=0.05,na.rm=T)
[1] -0.01223996
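A deterministic sketch of the same pattern, so the results can be checked by hand. The vector v and the set of functions passed in are made up for illustration.

```r
# Same wrapper as on the slide: apply f0 to x, forwarding extra arguments
f1 = function(x, f0, ...) {
  f0(x, ...)
}

v = c(1, 2, 3, 100)

f1(v, mean)                # 26.5
f1(v, mean, trim = 0.25)   # 2.5: drops the smallest and largest value first
f1(v, median)              # 2.5

# Pass several functions in turn
results = sapply(list(mean = mean, median = median, max = max),
                 function(f) f1(v, f))
results
```

The `...` argument is what lets f1 accept trim = 0.25 without knowing in advance which options f0 takes.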

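R special operations (7)
Use cat in a function for explanation
A function returns only the value of its last expression, so mean(x) in the first version below is computed but never shown. Printing it with cat makes it visible.
> f2 = function(x) {
+   mean(x)
+   var(x)
+ }
> f2(y)
[1] 0.9221541
> f2 = function(x) {
+   cat("mean of X = ",mean(x),"\n")
+   var(x)
+ }
> f2(y)
mean of X =  -0.01626653
[1] 0.9221541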
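When both numbers are needed later in the program, and not only on screen, an alternative is to return them together. A sketch; f3 and stats are hypothetical names, not from the slides.

```r
# Return both statistics as a named vector instead of printing one of them
f3 = function(x) {
  c(mean = mean(x), var = var(x))
}

stats = f3(c(2, 4, 6))
stats["mean"]   # 4
stats["var"]    # 4
```

cat() is convenient for messages, but its output cannot be stored in a variable; a returned value can.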
