Introduction to R 02.10.2017 Samal Dharmarathna.


Today's Class
A brief introduction to the programming language R: installation, objects, operators, generating and manipulating data, functions, plotting, and a brief look at packages
Elementary analysis: representing data in a useful manner
Hands on R

What is R?
R is a language and environment for statistical computing and graphics, similar to the S language and environment.
R provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.
The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
Well-designed publication-quality plots can be produced, including mathematical symbols and formulae.
R is available as Free Software.
Source: https://www.r-project.org/about.html

Why is R interesting?
R is an interpreted language, not a compiled one: commands typed at the keyboard are executed directly, without first having to build a complete program as in most computer languages (C, Fortran, Java, ...).
R's syntax is simple and intuitive.
Help is easy to find: the R manuals, R packages, and a very active R community (Stack Overflow, R-bloggers, etc.).

What is RStudio?
RStudio is a set of integrated tools designed to help you be more productive with R. It includes:
A console
A syntax-highlighting editor that supports direct code execution
A variety of robust tools for plotting, viewing history, debugging and managing your workspace
For more on RStudio's features: https://www.rstudio.com/products/rstudio/features/

Installing R and RStudio
Download R from the following link (Windows, Mac, Linux): https://cran.r-project.org/
Download RStudio from the following link (Windows, Mac, Linux): https://www.rstudio.com/products/rstudio/download/

Creating Objects
Simple calculations and some objects:
> a <- 15
> a
[1] 15
> 25 -> b
> b
[1] 25
> c <- a + b
> c
[1] 40
> (10 + 5)*5
[1] 75
R distinguishes lowercase from UPPERCASE:
> y <- 5
> Y <- 50
> y
[1] 5
> Y
[1] 50
Objects in the memory:
> ls()
[1] "a" "b" "c" "y" "Y"
Clear objects in the memory:
> rm(list = ls(all.names = TRUE))

Creating Objects
All objects have two intrinsic attributes: mode and length.
Mode: the basic type of the elements of the object. There are four main modes: numeric, character, complex, and logical (FALSE or TRUE).
Length: the number of elements of the object.

Mode and Length
> p <- 7
> mode(p)
[1] "numeric"
> length(p)
[1] 1
> q <- "Hello"
> mode(q)
[1] "character"
> length(q)
[1] 1
> s <- TRUE
> mode(s)
[1] "logical"
> length(s)
[1] 1
> t <- 5i
> mode(t)
[1] "complex"
> length(t)
[1] 1
> w <- (1:5)
> w
[1] 1 2 3 4 5
> mode(w)
[1] "numeric"
> length(w)
[1] 5

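Overview of the types of objects representing data
Object       Modes                                     Several modes possible in the same object?
vector       numeric, character, complex, logical      No
factor       numeric, character                        No
array        numeric, character, complex, logical      No
matrix       numeric, character, complex, logical      No
data frame   numeric, character, complex, logical      Yes
ts           numeric, character, complex, logical      No
list         numeric, character, complex, logical, function, expression, ...   Yes
Source: R for Beginners by Emmanuel Paradis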
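A minimal sketch of what "several modes in the same object" means in practice. The object names (v, l, df) and the values are illustrative only and are not part of the original slides.

```r
# A vector holds a single mode: mixing values coerces them all to character.
v <- c(1, "a", TRUE)
mode(v)            # "character"

# A list can keep each element in its own mode.
l <- list(1, "a", TRUE)
sapply(l, mode)    # "numeric" "character" "logical"

# A data frame can hold columns of different modes.
df <- data.frame(id = 1:3, name = c("x", "y", "z"), stringsAsFactors = FALSE)
sapply(df, mode)   # id: "numeric", name: "character"
```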

Reading and writing data in a file
read.table(file, header = FALSE, sep = "")
file : the name of the file, possibly with its path, or a remote file given as a URL (http://...)
header : a logical (FALSE or TRUE) indicating whether the file contains the names of the variables on its first line
sep : the field separator used in the file, for instance sep = "\t" for a tab
write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ", row.names = TRUE, col.names = TRUE)
x : the name of the object to be written
file : the name of the file
append : if TRUE, adds the data without erasing any already existing in the file

Reading and writing data in a file, cont'd
write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ", row.names = TRUE, col.names = TRUE)
quote : a logical or a numeric vector; if TRUE, the variables of mode character and the factors are written within "", otherwise the numeric vector indicates the numbers of the variables to write within "" (in both cases the names of the variables are written within "", but not if quote = FALSE)
sep : the field separator used in the file
row.names : a logical indicating whether the names of the rows are written in the file
col.names : same for the names of the columns
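The two functions can be tried together in a short round trip. This is a sketch under assumptions: the data frame d, its column names and the temporary file are made up for illustration.

```r
# Illustrative data; the object and column names are hypothetical.
d <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

# Write to a temporary tab-separated file, without row names or quotes.
path <- tempfile(fileext = ".txt")
write.table(d, file = path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back; header = TRUE because the first line holds the variable names.
d2 <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(d2)

unlink(path)   # remove the temporary file
```

With row.names = FALSE the file contains only the header line and the data, which keeps it easy to read back with header = TRUE.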

Operators Source: R for Beginners by Emmanuel Paradis
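The operator table from the cited source is not reproduced in this transcript. As a rough sketch, a few commonly used arithmetic, comparison and logical operators look like this; the selection and the vector x are illustrative, not a copy of that table.

```r
7 %/% 2          # integer division: 3
7 %% 2           # modulo (remainder): 1
2 ^ 3            # exponentiation: 8

x <- c(1, 5, 10)
x > 3            # comparison is element-wise: FALSE TRUE TRUE
x > 3 & x < 8    # element-wise AND: FALSE TRUE FALSE
!(x == 5)        # negation: TRUE FALSE TRUE
```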

Generating Data
Regular Sequences
A regular sequence of integers:
> x <- 1:20
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
The operator ':' has priority over the arithmetic operators within an expression:
> 1:(10-2)
[1] 1 2 3 4 5 6 7 8
> 1:10-2
 [1] -1  0  1  2  3  4  5  6  7  8

Generating Data, cont'd
Random Sequences
It is useful in statistics to be able to generate random data, and R can do it for a large number of probability density functions. These functions are of the form rfunc(n, p1, p2, ...), where func indicates the probability distribution, n the number of data generated, and p1, p2, ... are the values of the parameters of the distribution.
> rnorm(10, mean = 0, sd = 1)
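A small sketch of the rfunc(n, p1, p2, ...) pattern for three distributions. The seed value is arbitrary and only makes the draws reproducible; no output is shown because the values depend on the seed and R version.

```r
set.seed(1)                              # arbitrary seed, for reproducibility only

x <- rnorm(10, mean = 0, sd = 1)         # 10 draws from a standard normal
u <- runif(5, min = 0, max = 1)          # 5 draws from a uniform on [0, 1]
b <- rbinom(5, size = 10, prob = 0.5)    # 5 draws from a binomial(10, 0.5)

length(x)   # 10
range(u)    # both values lie between 0 and 1
```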

Accessing the values of an object: the indexing system
The indexing system is an efficient and flexible way to selectively access the elements of an object.
> z <- 1:10
> z
 [1]  1  2  3  4  5  6  7  8  9 10
> z[5]
[1] 5
> z[7] <- 70
> z
 [1]  1  2  3  4  5  6 70  8  9 10
> m <- matrix(1:6, 2, 3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> m[,3]
[1] 5 6
> m[2,]
[1] 2 4 6
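Beyond single positions, indices can also be vectors, negative numbers or logical conditions. A minimal sketch, reusing the names z and m from above with fresh values:

```r
z <- 1:10
z[c(1, 3)]             # elements 1 and 3: 1 3
z[-1]                  # everything except the first element
z[z > 7]               # logical indexing: 8 9 10

m <- matrix(1:6, 2, 3)
m[2, 3]                # row 2, column 3: 6
m[, 3, drop = FALSE]   # keep the result as a 2 x 1 matrix instead of a vector
```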

Arithmetic and simple functions
There are numerous functions in R to manipulate data. The simplest one, c, concatenates the objects listed in parentheses.
> c(1:5, seq(10,11,0.2))
 [1]  1.0  2.0  3.0  4.0  5.0 10.0 10.2 10.4 10.6 10.8 11.0
Adding two vectors of the same length:
> p <- 1:4
> p
[1] 1 2 3 4
> q <- rep(1,4)
> q
[1] 1 1 1 1
> r <- p+q
> r
[1] 2 3 4 5
Multiplying a vector by a single value:
> p <- 1:4
> p
[1] 1 2 3 4
> s <- 10
> s
[1] 10
> t <- p*s
> t
[1] 10 20 30 40
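Multiplying a vector by a single number works because R recycles the shorter operand. A short sketch of the same rule with a length-2 vector, plus a few other simple functions; the values are illustrative.

```r
p <- 1:4
p + c(10, 20)    # the shorter vector is recycled: 11 22 13 24
sum(p)           # 10
prod(p)          # 24
rev(p)           # 4 3 2 1
```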

Matrix computation
R has facilities for matrix computation and manipulation. The functions rbind and cbind bind matrices with respect to the rows and the columns, respectively.
> m1 <- matrix(1, nrow = 2, ncol = 2)
> m1
     [,1] [,2]
[1,]    1    1
[2,]    1    1
> m2 <- matrix(2, nrow = 2, ncol = 2)
> m2
     [,1] [,2]
[1,]    2    2
[2,]    2    2
> rbind(m1,m2)
     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    2    2
[4,]    2    2
> cbind(m1,m2)
     [,1] [,2] [,3] [,4]
[1,]    1    1    2    2
[2,]    1    1    2    2

The operator for the product of two matrices is '%*%':
> m3 <- rbind(m1,m2) %*% cbind(m1,m2)
> m3
     [,1] [,2] [,3] [,4]
[1,]    2    2    4    4
[2,]    2    2    4    4
[3,]    4    4    8    8
[4,]    4    4    8    8
> m4 <- cbind(m1,m2) %*% rbind(m1,m2)
> m4
     [,1] [,2]
[1,]   10   10
[2,]   10   10
The diagonal:
> diag(m3)
[1] 2 2 8 8
The transposition:
> m5 <- matrix(1:8, nrow = 2, ncol = 4)
> m5
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
> t(m5)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
[4,]    7    8
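One point that is easy to miss: '*' multiplies element by element, while '%*%' is the matrix product. A minimal sketch with an arbitrary invertible 2 x 2 matrix (A is a made-up example) that also shows solve() for the inverse:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # arbitrary invertible 2 x 2 matrix

A * A             # element-wise product: entries 4, 1, 1, 9
A %*% A           # matrix product: entries 5, 5, 5, 10

A_inv <- solve(A) # matrix inverse
A %*% A_inv       # approximately the 2 x 2 identity matrix
```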

Plotting with R
R offers a remarkable variety of graphics. Each graphical function has a large number of options, which makes the production of graphics very flexible.
There are two kinds of graphical functions: the high-level plotting functions, which create a new graph, and the low-level plotting functions, which add elements to an existing graph.
Plots can be saved as image or PDF files.
> hist(y) (histogram of the frequencies of y)
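A sketch of a high-level call followed by a low-level one, saved to a PDF file. The data, the seed and the output path are assumptions made for illustration.

```r
set.seed(42)                          # arbitrary seed, for reproducibility only
y <- rnorm(100)                       # illustrative data
out <- tempfile(fileext = ".pdf")     # hypothetical output path

pdf(out)                              # open a PDF graphics device
hist(y, main = "Histogram of y")      # high-level: creates a new graph
abline(v = mean(y), lty = 2)          # low-level: adds a dashed line at the mean
dev.off()                             # close the device so the file is written

file.exists(out)                      # TRUE once the device is closed
```

Replacing pdf() with png() or jpeg() writes an image file instead; in RStudio the Plots pane also offers an export menu.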

R Packages
A set of packages is distributed with the base installation of R.
In addition to the default packages, a number of contributed packages are available on the CRAN web site: http://cran.r-project.org/src/contrib/PACKAGES.html
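A short sketch of the usual package workflow. The install line is commented out because it needs internet access, and the package named there is only an example, not one the slides prescribe.

```r
# install.packages("ggplot2")   # example only: downloads a contributed package from CRAN

library(stats)                  # attach a package; stats ships with base R
pkgs <- rownames(installed.packages())   # names of all installed packages
head(pkgs)
"stats" %in% pkgs               # TRUE
```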

Hands on R and RStudio

Let's Install R and RStudio
Download R from the following link (Windows, Mac, Linux): https://cran.r-project.org/
Download RStudio from the following link (Windows, Mac, Linux): https://www.rstudio.com/products/rstudio/download/

Create Some Objects
Simple calculations and some objects:
> a <- 25
> a
[1] 25
> 35 -> b
> b
[1] 35
> c <- a + b
> c
[1] 60
> (20 + 5)*4
[1] 100
R distinguishes lowercase from UPPERCASE:
> y <- 10
> Y <- 30
> y
[1] 10
> Y
[1] 30
Objects in the memory:
> ls()
[1] "a" "b" "c" "y" "Y"
Clear objects in the memory:
> rm(list = ls(all.names = TRUE))

Percentiles and Quartiles
> w <- seq(1,15,2)
> w
[1]  1  3  5  7  9 11 13 15
> quantile(w)
  0%  25%  50%  75% 100%
 1.0  4.5  8.0 11.5 15.0
> quantile(w, c(0.3,0.6,0.9))
 30%  60%  90%
 5.2  9.4 13.6

Mean and Median
> z <- c(seq(1,9,2),seq(10,16,2))
> z
[1]  1  3  5  7  9 10 12 14 16
> summary(z)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   5.000   9.000   8.556  12.000  16.000
> median(z)
[1] 9
> mean(z)
[1] 8.555556
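When comparing the two measures it helps to see how each reacts to an extreme value. A sketch that reuses z from above and appends one artificial outlier; the value 1000 is made up for illustration.

```r
z <- c(seq(1, 9, 2), seq(10, 16, 2))
z_out <- c(z, 1000)    # add one artificial extreme value

mean(z_out)            # 107.7: pulled strongly towards the outlier
median(z_out)          # 9.5: barely changed from 9
```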

Simple Functions
> f1 <- function(x,y){
    x+y
  }
> f1(5,6)
[1] 11
> f2 <- function(x,y){
    x*y
  }
> f2(5,6)
[1] 30

Functions with multiple tasks
> f3 <- function(x,y){
    z1 <- 2*x + y
    z2 <- x + 2*y
    z3 <- 2*x + 2*y
    z4 <- x/y
    return(c(z1,z2,z3,z4))
  }
> f3(1,2)
[1] 4.0 5.0 6.0 0.5

List indices (double square brackets)
> f4 <- function(x,y){
    z5 <- x + y
    z6 <- x + 2*y
    list(z5, z6)
  }
> f4(2,5)   # Answer will list both z5 & z6
[[1]]
[1] 7

[[2]]
[1] 12

> f4(2,5)[[1]]   # Only display z5
[1] 7
> f4(2,5)[[2]]   # Only display z6
[1] 12
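Positions like [[1]] are easy to mix up, so list elements are often given names. A sketch of that variant; the function name f5 and the element names are hypothetical and not from the slides.

```r
f5 <- function(x, y) {
  # Same calculations as f4, but each result is stored under a name.
  list(total = x + y, weighted = x + 2 * y)
}

res <- f5(2, 5)
res$total            # 7
res[["weighted"]]    # 12
names(res)           # "total" "weighted"
```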

Elementary Analysis of a data set: some tips
[Figure: No. of person trips in different modes (as a percentage); categories A to G]

Observations:
Mean: ............ Median: ............
Mean: ............ Median: ............
Conclusions:

Mode distribution for different purposes
[Figure: x-axis: Purpose (A to I); y-axis: Percentage (%), 0 to 100]
Observations: ............
Conclusions: ............

Analysis of mode sharing for different purposes
[Figure: panels a. Purpose A and b. Purpose B]
Observations: ............
Conclusions: ............