R workshop for Advanced users

Slides:



Advertisements
Similar presentations
R for Macroecology Functions and plotting. A few words on for  for( i in 1:10 )
Advertisements

Chapter 8 and 9 Review: Logical Functions and Control Structures Introduction to MATLAB 7 Engineering 161.
Tutorial 12 Working with Arrays, Loops, and Conditional Statements
Python plotting for lab folk Only the stuff you need to know to make publishable figures of your data. For all else: ask Sourish.
Programming Concepts MIT - AITI. Variables l A variable is a name associated with a piece of data l Variables allow you to store and manipulate data in.
Java Unit 9: Arrays Declaring and Processing Arrays.
Mathcad Variable Names A string of characters (including numbers and some “special” characters (e.g. #, %, _, and a few more) Cannot start with a number.
Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1) 1.
Chapter 5. Loops are common in most programming languages Plus side: Are very fast (in other languages) & easy to understand Negative side: Require a.
R-Graphics Day 2 Stephen Opiyo. Basic Graphs One of the main reasons data analysts turn to R is for its strong graphic capabilities. R generates publication-ready.
Sendplot Lori Shepherd April 06, sendplot The Problem: The Approach: How can we quickly generate useful data and get it back to scientists/collaborators.
Piotr Wolski Introduction to R. Topics What is R? Sample session How to install R? Minimum you have to know to work in R Data objects in R and how to.
A Brief Introduction to R Programming Darren J. Fitzpatrick, PhD The Bioinformatics Support Team 27/08/2015.
An Introduction to R graphics Cody Chiuzan Division of Biostatistics and Epidemiology Computing for Research I, 2012.
Hello.java Program Output 1 public class Hello { 2 public static void main( String [] args ) 3 { 4 System.out.println( “Hello!" ); 5 } // end method main.
R-Graphics Stephen Opiyo. Basic Graphs One of the main reasons data analysts turn to R is for its strong graphic capabilities. R generates publication-ready.
Lecture 26: Reusable Methods: Enviable Sloth. Creating Function M-files User defined functions are stored as M- files To use them, they must be in the.
Scientific Computing (w1) R Computing Workshops An Introduction to Scientific Computing workshop 1.
Ggplot2 A cool way for creating plots in R Maria Novosolov.
Plotting Complex Figures Using R
Digital Image Processing Introduction to M-function Programming.
STAT 534: Statistical Computing Hari Narayanan
INTRODUCTION TO MATLAB DAVID COOPER SUMMER Course Layout SundayMondayTuesdayWednesdayThursdayFridaySaturday 67 Intro 89 Scripts 1011 Work
Lattice- xyplot Yufeng Lin Haipeng Yao. How to use xyplot Xyplot(formula, data = parent.frame(), panel = if (is.null(groups)) "panel.xyplot" else "panel.superpose",
PHP Tutorial. What is PHP PHP is a server scripting language, and a powerful tool for making dynamic and interactive Web pages.
Pinellas County Schools
Review > mean(humidity, na.rm=T) > humidity[!is.na(humidity)] > X x Y Y[[1]][2,3] > zone.fac
16BIT IITR Data Collection Module If you have not already done so, download and install R from download.
Statistical Programming Using the R Language Lecture 2 Basic Concepts II Darren J. Fitzpatrick, Ph.D April 2016.
Introduction to R and Data Science Tools in the Microsoft Stack Jamey Johnston.
Lecture 2 Functions Trevor A. Branch FISH 553 Advanced R School of Aquatic and Fishery Sciences University of Washington.
PHP using MySQL Database for Web Development (part II)
Jefferson Davis Research Analytics
Introduction to R and Data Science Tools in the Microsoft Stack
PH2150 Scientific Computing Skills
ECE 1304 Introduction to Electrical and Computer Engineering
Conditional Expressions
Programming in R Intro, data and programming structures
Using R Graphs in R.
Digital Text and Data Processing
Array, Strings and Vectors
Introduction to R.
Python: Control Structures
Statistical Programming Using the R Language
Chapter 5 - Control Structures: Part 2
CS 115 Lecture 8 Structured Programming; for loops
Other Kinds of Arrays Chapter 11
Introduction to R Studio
2) Platform independent 3) Predefined functions
Even More on Dimensions
INTRODUCTION TO BASIC MATLAB
MATLAB DENC 2533 ECADD LAB 9.
11/10/2018.
PH2150 Scientific Computing Skills
Introduction to MATLAB
Use of Mathematics using Technology (Maltlab)
MATLAB Tutorial Dr. David W. Graham.
Principal Component Analysis (PCA)
PHP.
CSCI N207 Data Analysis Using Spreadsheet
Logical Operations In Matlab.
INTRODUCTION TO MATLAB
Lecture 7 – Delivering Results with R
Announcements P3 due today
R Course 1st Lecture.
Introduction to Programming
PHP an introduction.
R course 4th lecture.
R programming.
R for Statistics and Graphics
Presentation transcript:

R workshop for Advanced users Sohee Kang, PhD Math and Stats Learning Centre & Computer and Mathematical Sciences

Subscripting Subscripting can be used to access and manipulate the elements of objects like vectors, matrices, arrays, data frames and lists. Subscripting operations are fast and efficient, and should be the preferred method when dealing with data in R.

Numeric subscripts In R, the first element of an object has subscript 1. A vector of subscripts can be used to access multiple elements of an object. > x <- 1:10 > x [1] 1 2 3 4 5 6 7 8 9 10 > x[c(1,3,5)] [1] 1 3 5 Negative subscripts extract all elements of an object except the one specified. > x[-c(1,3,5)] [1] 2 4 6 7 8 9 10

character subscripts If a subscriptable object has names associated to it, a character string or vector of character strings can be used as subscripts. > x <- 1:10 > names(x) <- letters[1:10] > x[c("a", "b", "c")] a b c 1 2 3 Note: Negative character subscripts are not allowed.

logical subscripts We can use logical values to choose which elements of the object to access. Elements corresponding to TRUE in the logical vector are included, and elements corresponding to FALSE are ignored. > x <- 1:10; names(x) <- letters[1:10] > x>5 a b c d e f g h i j FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE > x[x > 5] f g h i j 6 7 8 9 10 # using logical subscript to modify the object > x[x > 5] <- 0 > x a b c d e f g h i j 1 2 3 4 5 0 0 0 0 0

subscripting multidimensional objects For multidimensional objects, subscripts can be provided for each dimension. To select all elements of a given dimension, use the “empty" subscript. > mat <- matrix(1:12, 3, 4, byrow=TRUE) > mat [,1] [,2] [,3] [,4] [1,] 1 2 3 4 [2,] 5 6 7 8 [3,] 9 10 11 12 > mat[5] [1] 6 > mat[2,2] > mat[1, ] [1] 1 2 3 4 > mat[c(1,3),] [,1] [,2] [,3] [,4] [1,] 1 2 3 4 [2,] 9 10 11 12

the order() function sorting rows of a matrix arbitrarily # sort the iris data frame by Sepal.Length > iris.sort <- iris[order(iris[,"Sepal.Length"]),] > head(iris.sort) Sepal.Length Sepal.Width Petal.Length Petal.Width Species 14 4.3 3.0 1.1 0.1 setosa 9 4.4 2.9 1.4 0.2 setosa 39 4.4 3.0 1.3 0.2 setosa 43 4.4 3.2 1.3 0.2 setosa 42 4.5 2.3 1.3 0.3 setosa 4 4.6 3.1 1.5 0.2 setosa Try to sort the iris data frame in decreasing order with respect to Sepal.Width.

the drop= argument avoiding dimension reduction By default, subscripting operations reduce the dimensions of an array whenever possible. To avoid that, we can use the drop=FALSE argument. > s1 <- mat[1,]; s1 [1] 1 2 3 4 > dim(s1) NULL > s2 <- mat[1,,drop=FALSE] > > s2 <- mat[1,,drop=FALSE]; s2 [,1] [,2] [,3] [,4] [1,] 1 2 3 4 > dim(s2) [1] 1 4

combined selections for matrices Suppose we want to get all the columns for which the element at the first row is less than 3: > mat[ , mat[1, ] <3] [,1] [,2] [1,] 1 2 [2,] 5 6 [3,] 9 10

complex logical expressions subscripting data frames > dat <- data.frame(a = seq(5, 20, by=3), b = c(8, NA, 12, 15, NA, 21)) > dat a b 1 5 8 2 8 NA 3 11 12 4 14 15 5 17 NA 6 20 21 > dat[dat$b < 10, ] 1 5 8 NA NA NA NA.1 NA NA > > # removing the missing values > dat[!is.na(dat$b) & (dat$b < 10), ] a b 1 5 8

the function subset() subscripting data frames The function subset()allows one to perform selections of the elements in a data frame in very simple way. > dat <- data.frame(a = seq(5, 20, by=3), b = c(8, NA, 12, 15, NA, 21)) > subset(dat, b < 10) a b 1 5 8 Note: The subset() function always returns a new data frame, matrix of vector, and is not adequate for modifying elements of a data frame.

Exercise 1

Loops and Functions A loop allows the program to repeatedly execute commands. Loops are common to many programming languages and their use may facilitate the implementation of many operations. There are three kinds of loops in R: `for' loops `while' loops `repeat' loops Note: Loops can be very inefficient in R. For that reason, their use is not advised, unless necessary.

`for' loops General form: for (variable in sequence) { set_of_expressions } > for(i in 1:10) { print(sqrt(i)) } [1] 1 [1] 1.414214 [1] 1.732051 ... [1] 3.162278

Easy Example: col.v <- rainbow(100) cex.v <- seq(1, 10, length.out=100) plot(0:1, 0:1, type="n") for(i in 1:200) { print(i) points(x=runif(1), y=runif(1), pch=16, col=sample(col.v, size=1), cex=sample(cex.v, size=1)) Sys.sleep(0.1) }

`while' loops General form: while (condition) { set_of_expressions } > a <- 0; b <- 1 > while(b < 10) { print(b) temp <- a+b a <- b b <- temp } [1] 1 [1] 2 [1] 3 [1] 5 [1] 8

`repeat' loops General form: repeat (condition) { set_of_expressions if (condition) { break } } > a <- 0; b <- 1 > repeat { print(b) temp <- a+b a <- b b <- temp if(b>=10){break} } [1] 1 [1] 2 [1] 3 [1] 5 [1] 8 Note: The loop is terminated by the break command.

cleaning the mess To have a cleaner version when working with loops, we can do: # Arithmetic Progression > x <- 1; d <- 2 > while (length(x) < 10) { position <- length(x) new <- x[position]+d x <- c(x,new) } > print(x) [1] 1 3 5 7 9 11 13 15 17 19

writing functions A function is a collection of commands that perform a specific task. General form: function.name <- function (arguments){ set_of_expressions return (answer) }

writing functions Example: Arithmetic Progression > AP <- function(a, d, n){ x <- a while (length(x) < n){ position <- length(x) new <- x[position]+d x <- c(x, new) } return(x)

writing functions Once you run this code, you will have available a new function called AP. To run the function, type on the console: > AP(1,2,10) [1] 1 3 5 7 9 11 13 15 17 19 > AP(1,0,10) [1] 1 1 1 1 1 1 1 1 1 1 Note that for d==0 the function is returning a sequence of ones. We can easily x this with an if statement.

the `if' statement General form: if (condition) { set_of_expressions } We can also combine the `if' with the `else' statement: } else {

the `if' statement > AP <- function(a, d, n){ if(d ==0) { return("Error: argument `d' should not be 0") break } else { x <- a while (length(x) < n){ position <- length(x) new <- x[position]+d x <- c(x, new) } return(x) > AP(1, 0, 3) [1] "Error: argument `d' should not be 0"

Exercise 2

Graphics R offers and incredible variety of graphs. Type this code to get a sense of what is possible: demo(graphics) x <- 10*(1:nrow(volcano)) y <- 10*(1:ncol(volcano)) image(x, y, volcano, col = terrain.colors(100), axes = FALSE) contour(x, y, volcano, levels = seq(90, 200, by = 5), add = TRUE, col = "peru") axis(1, at = seq(100, 800, by = 100)) axis(2, at = seq(100, 600, by = 100)) box() title(main = "Maunga Whau Volcano", font.main = 4) http://cran.r-project.org/web/views/Graphics.html

Managing graphics and graphical devices: opening multiple graphical devices data(iris) plot(iris$Sepal.Length, iris$Sepal.Width, pch=19) dev.new() plot(iris$Sepal.Length, iris$Petal.Length, pch=19) #you can also use "X11()", but it may not work in some Mac computers

jpeg(file="SepalLenght_vs_SepalWidth. jpeg") plot(iris$Sepal jpeg(file="SepalLenght_vs_SepalWidth.jpeg") plot(iris$Sepal.Length, iris$Sepal.Width, pch=19) dev.off #closes graphical device png(file="SepalLenght_vs_SepalWidth.png") dev.off() pdf(file="SepalLenght_vs_SepalWidth.pdf") postscript(file="SepalLenght_vs_SepalWidth.ps") # often used for publication

High level graphical functions (create a graph) iris[1:5,] plot(iris$Sepal.Length) plot(iris$Sepal.Length, iris$Sepal.Width) plot(iris$Petal.Length, iris$Petal.Width) plot(iris$Petal.Length, iris$Petal.Width, xlab="Sepal length (cm)", ylab="Petal Width (cm)", cex.axis=1.5, cex.lab=1.5, bty="n", pch=19) boxplot(iris$Sepal.Length ~ iris$Species) boxplot(iris$Sepal.Length ~ iris$Species, names=expression(italic("Iris setosa"), italic("Iris versicolor"), italic("Iris virginica")), ylab="Sepal length (cm)", cex.axis=1.5, cex.lab=1.5) coplot(iris$Petal.Length ~ iris$Petal.Width | iris$Sepal.Length, overlap=0, pch=19) pairs(iris) hist(iris$Sepal.Length)

Low level graphical functions (affect an existing graph) plot(iris$Petal.Length, iris$Petal.Width, xlab="Sepal length (cm)", ylab="Petal Width (cm)", cex=1.3, cex.axis=1.5, cex.lab=1.5, bty="n") points(iris$Petal.Length[iris$Species=="setosa"], iris$Petal.Width[iris$Species=="setosa"], cex=1.3, pch=19, col="red") points(iris$Petal.Length[iris$Species=="versicolor"], iris$Petal.Width[iris$Species=="versicolor"], cex=1.3, pch=19, col="blue") points(iris$Petal.Length[iris$Species=="virginica"], iris$Petal.Width[iris$Species=="virginica"], cex=1.3, pch=19, col="green") legend("topleft", c("Iris setosa", "I. versicolor", "I. virginica"), pch=19, col=c("red", "blue", "green"), cex=1.3) legend("bottomright", c(expression(italic("Iris setosa")), expression(italic("Iris versicolor")), expression(italic("Iris virginica"))), legend(1.3, 1.9, c(expression(italic("Iris setosa")),

Plotting in 2d The cars data frame is a two-column data set of cars speeds and stopping distances from the 1920s head(cars) speed dist 1 4 2 2 4 10 3 7 4 4 7 22 5 8 16 6 9 10

Plotting in 2d By default plot() produces a scatterplot >plot(cars) Axis labels are from the names in the data frame Axis scale is from the range of the data

Plotting in 2d To add details it’s better to use low-level functions. plot(cars,type="p") line(lowess(cars),col="red")

Combining plots To show multiple plots in one graphics window use the par() command with the mfrow Parameter: par(mfrow=c(2,2)) plot(cars,type="p") plot(cars,type="l") plot(cars,type="h") plot(cars,type="s")

Managing graphics and graphical devices: partitioning a graphical device layout(matrix(1:4, 2, 2)) #see help for "layout“ layout.show(4) #see help for "layout.show" plot(iris$Sepal.Length, iris$Sepal.Width, pch=19, cex.lab=1.5, cex.axis=1.5, xlab="Sepal length (cm)", ylab="Sepal width (cm)") plot(iris$Sepal.Length, iris$Petal.Length, pch=19, cex.lab=1.5, cex.axis=1.5, xlab="Sepal length (cm)", ylab="Petal length (cm)") plot(iris$Sepal.Length, iris$Petal.Width, pch=19, cex.lab=1.5, cex.axis=1.5, xlab="Sepal length (cm)", ylab="Petal width (cm)") plot(iris$Sepal.Length, iris$Petal.Width, pch=19, type="n", axes=F, bty="n", xlab="", ylab="") mtext("Sepal length", line=-3, cex=1.5) mtext("versus", line=-5, cex=1.5) mtext("other variables", line=-7, cex=1.5) mtext("in Anderson's Iris", line=-9, cex=1.5) layout(matrix(1:6, 3, 2)) layout.show(6) plot(iris$Sepal.Length, iris$Sepal.Width, pch=19) plot(iris$Sepal.Length, iris$Petal.Length, pch=19) plot(iris$Sepal.Length, iris$Petal.Width, pch=19) plot(iris$Sepal.Width, iris$Petal.Length, pch=19) plot(iris$Sepal.Width, iris$Petal.Width, pch=19) plot(iris$Petal.Length, iris$Petal.Width, pch=19)

Plotting in 3D Some data is well-suited to three dimensional plots The matrix volcano records elevations of the volcano Maunga Whau in New Zealand volcano[1:4,1:4] dim(volcano) x<-1:87 y<-1:61 #Default are a heat map image(x,y,volcano)

Plotting in 3D #Changing the color map image(x,y,volcano,col=terrain.colors(range(volcano)))

Plotting in 3D #A contour map contour(x,y,volcano) #A perspective plot persp(x,y,volcano)

The package ggplot2 The ggplot2 package was created in 2005 by Hadley Wickham It implements the object oriented design ideas of Leland Wilkinson’s The Grammar of Graphics.

The Package ggplot2 The function qplot() is a ggplot2 version of the regular function plot() library(ggplot2) qplot(speed,dist,data=cars)

The package ggplot2 Combining geoms can produce more complex plots library(ggplot2) library(foreign) dat <- read.dta("http:// www.ats.ucla.edu/stat/data/ ologit.dta") ggplot(dat, aes(x = apply, y = gpa)) + geom_boxplot(size = .75)

The package ggplot2 p<-ggplot(dat, aes(x = apply,y = gpa))+geom_boxplot(size = .75) p<-p+geom_jitter(alpha = .5) p