R programming language Dardan Xhymshiti 15-JUN-2016
What is R?
R is a programming language for statistical computing and graphics.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
R is an open-source project licensed under the GNU General Public License and is currently developed by the R Development Core Team.
R statistical features
Many libraries are available for R. They implement a wide variety of statistical and graphical techniques: linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and more.
R can create publication-quality graphs, including mathematical symbols, as well as dynamic and interactive graphs.
R has Rd, its own LaTeX-like documentation format.
R programming features
Interpreted programming language.
Dynamically typed language.
Syntax similar to MATLAB.
Data structures: vectors, matrices, arrays, data frames (similar to tables in relational databases), and lists.
A scalar is represented as a vector of length one.
R has more than 7,801 additional libraries (packages).
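The data structures listed above can be sketched as follows. This is a minimal illustration; all variable names and values are made up for the example.

```r
# A vector: all elements share one type.
v <- c(1.5, 2.5, 3.5)

# A scalar is just a vector of length one.
s <- 42
length(s)
is.vector(s)

# A matrix: a two-dimensional structure, filled column by column.
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)

# An array generalises a matrix to more than two dimensions.
a <- array(1:24, dim = c(2, 3, 4))
dim(a)

# A list may hold elements of different types.
l <- list(label = "item", count = 3, flag = TRUE)

# A data frame: columns of equal length, possibly of different types.
df <- data.frame(id = 1:3, label = c("a", "b", "c"))
nrow(df)
```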
Downloading and installing
1. Open https://www.r-project.org/ in your browser.
2. Click on CRAN. You will see a list of mirror sites, organized by country.
3. Select a site near you.
4. Click "Download R for Windows".
5. Click on "base".
6. Click on the link for downloading the latest version of R.
7. When the download completes, double-click on the .exe file and answer the usual questions.
1. Entering commands
To get started, treat R as a calculator. Type:
    1+1
    max(1, 5, 10)
    min(1, 5, 10)
    5*4-3+1
2. Getting help on a function
Display the documentation for a function:
    help(function_name)
Use args for a quick reminder of the function's arguments:
    args(function_name)
Use example to see examples of using the function:
    example(function_name)
3. Printing something
If you enter a variable name or an expression at the command prompt, R prints its value:
    pi
    sqrt(2)
The print function knows how to format any R value for printing, including structured values such as matrices and lists:
    print(matrix(c(1,2,3,4),2,2))
    print(list("a", "b", "c"))
Use cat instead of print to combine multiple items into one continuous output:
    cat("The zero occurs at", 2*pi, "radians.", "\n")
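The difference between print and cat can be seen side by side. This is a small sketch; capture.output is used here only to collect the printed text into strings so it can be inspected, and the variable names are illustrative.

```r
x <- 2 * pi

print(x)                 # output carries the [1] index prefix
cat("x is", x, "\n")     # plain text, several items joined by spaces

# cat() separates items with a space by default; sep changes that.
cat("a", "b", "c", sep = "-")
cat("\n")

# Collect what each call writes, one string per output line.
printed <- capture.output(print(x))
catted  <- capture.output(cat("x is", x, "\n"))
```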
3. Setting variables
There is no need to declare variables. To initialize a variable, use the assignment operator (<-):
    x <- 3
    y <- 4
    z <- sqrt(x^2 + y^2)
    print(z)
    [1] 5
Values of different types can be assigned to the same variable:
    x <- 3
    print(x)
    [1] 3
    x <- c("one", "two", "three")
    print(x)
    [1] "one" "two" "three"
3. Setting variables (continued)
Setting a global variable:
    x <<- 3
Other assignment operators: = and ->
    foo = 3
    print(foo)
    [1] 3
    5 -> fum
    print(fum)
    [1] 5
Write mode(variable_name) to get the runtime type of the variable.
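The effect of the global assignment operator is easiest to see inside a function. The sketch below uses hypothetical names (counter, increment, local_only) chosen for illustration.

```r
counter <- 0

increment <- function() {
  # <<- modifies 'counter' in the enclosing environment
  # instead of creating a local variable.
  counter <<- counter + 1
  # <- creates a variable that exists only inside the function.
  local_only <- 99
  invisible(NULL)
}

increment()
increment()
print(counter)

foo = 3        # '=' assigns at the top level
5 -> fum       # '->' assigns from left to right

mode(foo)      # numeric
mode("text")   # character
mode(TRUE)     # logical
```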
4. Creating a vector
To create a vector, use the c(...) function.
A vector can contain numbers, strings, or logical values, but not a mixture.
    c(1,1,2,3,5,8,13,21)
    [1]  1  1  2  3  5  8 13 21
    c(1*pi, 2*pi, 3*pi, 4*pi)
    [1]  3.141593  6.283185  9.424778 12.566371
    c("one", "two", "three")
    [1] "one"   "two"   "three"
    c(TRUE, TRUE, FALSE, TRUE)
    [1]  TRUE  TRUE FALSE  TRUE
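Because a vector cannot hold a mixture of types, R converts mixed arguments of c to a single common type rather than raising an error. A short sketch with illustrative variable names:

```r
# Numbers mixed with a string: everything becomes a string.
mixed_num_chr <- c(1, "two", 3)
mode(mixed_num_chr)

# Logical values mixed with numbers: TRUE becomes 1, FALSE becomes 0.
mixed_lgl_num <- c(TRUE, 2, FALSE)
mode(mixed_lgl_num)
```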
5. Computing basic statistics
Mean: mean(x)
Median: median(x)
Standard deviation: sd(x)
Variance: var(x)
Correlation: cor(x, y)
Covariance: cov(x, y)
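As a quick illustration, the functions can be applied to two small vectors. The data below is made up for the example.

```r
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 5, 7, 9)

mean(x)     # arithmetic mean
median(x)   # middle value
var(x)      # sample variance (divides by n - 1)
sd(x)       # square root of the sample variance
cor(x, y)   # y is x shifted by a constant, so the correlation is 1
cov(x, y)   # sample covariance
```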
6. Creating sequences
Use the n:m expression to create the simple sequence n, n+1, n+2, ..., m:
    1:5
    [1] 1 2 3 4 5
Use seq to set the step size:
    seq(from=1, to=5, by=2)
    [1] 1 3 5
Use rep to repeat a value:
    rep(1, times=5)
    [1] 1 1 1 1 1
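A few further variations on the same functions, as a sketch. The length.out and each arguments go beyond what the slide shows, and the variable names are illustrative.

```r
# The colon operator also counts downwards.
down <- 5:1

# seq with a step size.
by_three <- seq(from = 1, to = 10, by = 3)

# seq with a fixed number of evenly spaced points instead of a step size.
even <- seq(from = 0, to = 1, length.out = 5)

# rep can repeat a whole vector, or each element in turn.
whole <- rep(c(1, 2), times = 3)
elementwise <- rep(c(1, 2), each = 2)
```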
7. Comparing vectors
You can compare a vector with another vector, or a vector with a scalar.
Comparison operators: ==, !=, <, >, <=, >=
The result is a vector of TRUE and FALSE logical values:
    v <- c(3, pi, 4)
    w <- c(pi, pi, pi)
    v == w
    [1] FALSE  TRUE FALSE
    v < w
    [1]  TRUE FALSE FALSE
    v != w
    [1]  TRUE FALSE  TRUE
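Comparing against a scalar works element by element in the same way. The sketch below also uses any, all, and logical indexing, which are base R features not covered on the slide.

```r
v <- c(3, pi, 4)

# The scalar 3 is compared with every element of v.
above_three <- v > 3

# Summarise a logical vector.
any(v == pi)       # is at least one element equal to pi?
all(v >= 3)        # are all elements at least 3?
sum(above_three)   # TRUE counts as 1, so this counts the matches

# A logical vector can select the matching elements.
selected <- v[above_three]
```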
8. Defining a function
Create a function by using the function keyword followed by a list of parameters and the function body.
Syntax:
    function(param_1, ..., param_n) {
      expressions
    }
A function can be passed as a parameter to another function.
    fn <- function(x) sd(x)/mean(x)
    fn(1:10)
    [1] 0.5504819
    gcd <- function(a, b) {
      if (b == 0) return(a)
      else return(gcd(b, a %% b))
    }
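Passing a function as a parameter can be sketched as follows. The helper apply_to and the name cv are hypothetical, introduced only for this example; sapply is a base R function that applies a function to every element of a list.

```r
# Coefficient of variation, as on the slide.
cv <- function(x) sd(x) / mean(x)

# A hypothetical helper that receives a function f and calls it on x.
apply_to <- function(f, x) f(x)
result <- apply_to(cv, 1:10)

# Base R functions such as sapply also take a function as an argument.
per_vector <- sapply(list(1:10, c(2, 4, 6)), cv)

# The recursive greatest common divisor from the slide, in compact form.
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)
gcd(48, 18)
```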
9. Accessing built-in datasets
The datasets package comes with the R installation and contains 104 datasets.
List the available datasets:
    data()
To access datasets in other packages, use the data function like this:
    data(dataset_name, package="package_name")
To learn more about a dataset:
    help(dataset_name)
To get summary statistics for a dataset:
    summary(dataset_name)
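As a concrete example, the sketch below uses mtcars, one of the datasets shipped in the datasets package. The choice of dataset is an assumption made for illustration.

```r
# Load the dataset by name from the datasets package.
data("mtcars", package = "datasets")

dim(mtcars)          # number of rows and columns
head(mtcars, 3)      # first three rows
summary(mtcars$mpg)  # summary statistics for one column
```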
10. Packages
View the list of installed packages:
    library()
Install a package from CRAN:
    install.packages("package_name")
11. Getting and setting the working directory
Get the working directory:
    getwd()
Set the working directory:
    setwd("path")
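A common pattern is to remember the current working directory before changing it, so it can be restored afterwards. The sketch below switches to the session's temporary directory, chosen here only because it is guaranteed to exist.

```r
# Remember where we started.
old_dir <- getwd()

# Switch to another directory (here: the session's temporary directory).
setwd(tempdir())
getwd()

# Restore the original working directory.
setwd(old_dir)
```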
12. Reading tabular data files
Place the data file (here phones.txt) in your working directory.
    phones <- read.table("phones.txt")
    phones <- read.table("phones.txt", sep=":")
    phones <- read.table("phones.txt", stringsAsFactors=FALSE)
In R versions before 4.0.0, character columns are converted to factors by default. Setting stringsAsFactors=FALSE keeps them as character strings, so each attribute shows its real type.
Use class(phones$Vi), where i = 1, ..., d, to get the type of attribute i.
    phones <- read.table("phones_header.txt", header=TRUE, stringsAsFactors=FALSE)
With header=TRUE, the first line of the file is used as the attribute names.
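The effect of these options can be tried without an existing file. The sketch below writes a small colon-separated file to a temporary location first; the file contents and the column names brand and price are invented for illustration and are not the contents of phones.txt.

```r
# Create a small sample file with a header line.
path <- tempfile(fileext = ".txt")
writeLines(c("brand:price", "alpha:199", "beta:349"), path)

# Read it with a header, a custom separator, and strings kept as strings.
phones <- read.table(path, header = TRUE, sep = ":", stringsAsFactors = FALSE)
class(phones$brand)   # character
class(phones$price)   # integer

# The same file with strings converted to factors.
phones_f <- read.table(path, header = TRUE, sep = ":", stringsAsFactors = TRUE)
class(phones_f$brand) # factor

# Remove the temporary file.
unlink(path)
```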