Statistical Programming Using the R Language


1 Statistical Programming Using the R Language
Lecture 2 Basic Concepts II Darren J. Fitzpatrick, Ph.D June 2017

2 Lecture I - Recap
Yesterday: basic usage of RStudio; some programming concepts (variables, data types, data structures, etc.); basic R syntax; dealing with data frames (indexing); reading and writing files.

3 Lecture 2 - Overview
Loops & Conditionals: the WHILE loop, the FOR loop, the if(){} statement. Plotting. Packages: installing, loading.

4 Loops & Control I Programming often deals with repetitive tasks.
We could code these tasks repetitively or encapsulate them in a loop – one piece of code does the same task a predetermined number of times. Loops - constructs that allow the automation of repetitive tasks without repeating the writing of code. Iteration – each pass through a loop. Control – the creation of a condition that determines the termination of a loop.

5 Loops & Control II: The WHILE loop
Create a loop to add 1 to variable x while x < 10.
General form: while(condition){do something}
Tedious Solution:
    x <- 0
    x <- x + 1
    ...
While Loop:
    x <- 0
    while(x < 10){
        x <- x + 1
    }
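As a minimal sketch of the WHILE loop on this slide, the version below also counts the iterations so the stopping point is easy to see. The counter n_iter is an illustrative addition and not part of the original slide.

```r
x <- 0
n_iter <- 0              # hypothetical counter, added only for illustration

while (x < 10) {         # the condition is checked before every pass
  x <- x + 1
  n_iter <- n_iter + 1
}

print(x)       # 10: the loop stops as soon as x < 10 is FALSE
print(n_iter)  # 10 passes through the loop
```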

6 Loops & Control III: The FOR loop
General form: for (i in start:finish){do something}
Tedious Solution:
    x <- 0
    x <- x + 1
    ...
For Loop:
    x <- 0
    for (i in 1:10){
        x <- x + 1
    }
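The FOR loop on this slide never uses the loop variable i inside the body. A small sketch, with an illustrative vector called squares that is not in the original, shows how i takes each value of start:finish in turn.

```r
squares <- numeric(5)        # illustrative result vector with 5 slots

for (i in 1:5) {
  squares[i] <- i^2          # i is 1, then 2, ... up to 5
}

print(squares)  # 1 4 9 16 25
```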

7 Conditionals I
Similar to the WHILE loop, conditionals allow commands to be executed only when a condition is met.
General form: if (condition){do something}
    a <- 10
    b <- 5
    if (a >= b){
        c <- a + b
    }
What would happen if the condition a >= b were not true, say, a < b?
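One way to explore the question on this slide is to swap the values so that the condition fails. This is only a sketch: the variable total and its NA placeholder are assumptions added so that the outcome can be printed.

```r
a <- 5
b <- 10
total <- NA              # hypothetical placeholder, not in the original slide

if (a >= b) {
  total <- a + b         # skipped, because 5 >= 10 is FALSE
}

print(total)  # NA: the body of the if statement never ran
```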

8 Conditionals II
The conditional if statement can be extended to any number of conditions. The else if() portion of the conditional can be repeated as often as required. In lecture one, we covered logical operators; these are what conditions are built from.
    if (condition 1){
        do something
    }else if (condition 2){
        do something else
    }else{
        do something
    }
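A concrete version of the if / else if / else template might look like the sketch below. The function name classify() is hypothetical and chosen only for this illustration.

```r
# Hypothetical helper: label a single number by its sign
classify <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"           # reached only when both conditions above are FALSE
  }
}

print(classify(-3))  # "negative"
print(classify(0))   # "zero"
print(classify(7))   # "positive"
```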

9 Some Examples – but first the preliminaries...
Yesterday you saved an R script (problems.R) and an R session (problems.RData) in your R_Course folder. We need to: reload the R session (.RData); open the script (.R) if it does not open automatically; reset the working directory.

10 Preliminaries I Load the session from yesterday – problems.RData

11 Preliminaries II Open your script (problems.R)

12 Preliminaries III Set the working directory (wd) to be the R_Course folder. To set the wd, follow the above and navigate to the R_Course folder.
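The same preliminaries can also be done from the console with setwd() and load(). The sketch below is self-contained, so it uses a temporary folder as a stand-in for R_Course and a made-up object called demo_value; on your own machine you would point setwd() at your real R_Course folder instead.

```r
# Stand-in for the real R_Course folder (assumption for this sketch)
course_dir <- file.path(tempdir(), "R_Course")
dir.create(course_dir, showWarnings = FALSE)

old_wd <- setwd(course_dir)            # setwd() returns the previous directory

demo_value <- 42                       # hypothetical object standing in for a session
save(demo_value, file = "problems.RData")
rm(demo_value)

load("problems.RData")                 # restores the saved object(s)
print(basename(getwd()))               # "R_Course"
print(demo_value)                      # 42

setwd(old_wd)                          # return to where we started
```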

13 Preliminaries IV
Yesterday, we read in a file called colon_cancer_data_set.txt and generated two data frames, affected and unaffected, from that data.
    df <- read.table('colon_cancer_data_set.txt', header=TRUE)
    affected <- df[which(df$Status=='A'), 1:7464]
    unaffected <- df[which(df$Status=='U'), 1:7464]
These variables should be available in the session problems.RData that you just loaded.
Note! You can list the variables in your workspace by running the ls() command in the console.
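If the course file is not to hand, the same row and column indexing can be tried on a toy data frame. Everything in this sketch (the name toy, the two gene columns and their values) is invented for illustration; only the indexing pattern mirrors the slide.

```r
# Toy stand-in for the colon cancer data: 2 expression columns plus Status
toy <- data.frame(gene1  = c(2.1, 3.4, 1.8, 2.9),
                  gene2  = c(5.0, 4.2, 6.1, 5.5),
                  Status = c("A", "U", "A", "U"))

# Rows where Status matches, columns 1 to 2 (the expression values only)
toy_affected   <- toy[which(toy$Status == "A"), 1:2]
toy_unaffected <- toy[which(toy$Status == "U"), 1:2]

print(toy_affected)
print(toy_unaffected)
```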

14 Problem I
Iterate over the columns of the affected data and calculate the mean of each column.
    for (i in 1:ncol(affected)){
        mean_exp <- mean(affected[,i])
        print(mean_exp)
    }
Printing the values illustrates the point but it doesn't allow you to store them in memory.

15 Problem II
Iterate over the columns of the affected data, calculate the mean of each column and store the results as a variable.
    mean_holder <- c()
    for (i in 1:ncol(affected)){
        mean_exp <- mean(affected[,i])
        mean_holder <- c(mean_holder, mean_exp)
    }
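A common variation on this pattern is to create the result vector at its final length first and then fill it by position, rather than extending it with c() on each pass. The sketch below uses a small random data frame called toy as an assumed stand-in for the affected data.

```r
set.seed(1)
toy <- data.frame(g1 = rnorm(5), g2 = rnorm(5), g3 = rnorm(5))  # stand-in data

mean_holder <- numeric(ncol(toy))        # one slot per column, filled with 0

for (i in seq_len(ncol(toy))) {          # seq_len() is safe even for 0 columns
  mean_holder[i] <- mean(toy[, i])       # fill slot i instead of growing the vector
}

print(mean_holder)
```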

16 FOR loops & apply()
    mean_holder <- c()
    for (i in 1:ncol(affected)){
        mean_exp <- mean(affected[,i])
        mean_holder <- c(mean_holder, mean_exp)
    }
    mean_a <- apply(affected, 2, mean)
The FOR loop and the apply() call produce the same values (apply() additionally keeps the column names). In R, loops are sometimes necessary but R has tricks to avoid them. This can have enormous implications for compute time on large data sets. R loops can be inefficient, particularly when, as here, a vector is grown with c() on every iteration!
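The equivalence can be checked directly on a small example. This sketch uses an invented data frame called toy in place of affected, and also includes colMeans(), a base R function that computes column means without an explicit loop.

```r
set.seed(1)
toy <- data.frame(g1 = rnorm(5), g2 = rnorm(5), g3 = rnorm(5))  # stand-in data

# Loop version, as on the slide
loop_means <- c()
for (i in 1:ncol(toy)) {
  loop_means <- c(loop_means, mean(toy[, i]))
}

apply_means <- apply(toy, 2, mean)   # 2 means "apply over columns"
col_means   <- colMeans(toy)         # dedicated base R function

print(names(apply_means))                          # "g1" "g2" "g3"
print(all.equal(loop_means, unname(apply_means)))  # TRUE
print(all.equal(apply_means, col_means))           # TRUE
```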

17 Basic Plotting R is suitable for making publication quality graphics.
R can generally create simple plots using a single function. We will look at the following plots: histograms (hist()), boxplots (boxplot()) and scatterplots (plot(); a scatterplot() function is also available, but it comes from the car package rather than base R).

18 Random Data To illustrate the plotting functions, I am just going to use some random data. Randomly generate 1000 data points pulled from a normal distribution. var1 <- rnorm(1000) var2 <- rnorm(1000) Note, random data is very useful if you want to figure out how a function works.
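Because rnorm() draws new values on every call, your plots will differ slightly from the ones on the slides. If reproducible numbers are wanted, set.seed() can be called first; the seed value 42 below is arbitrary.

```r
set.seed(42)             # arbitrary seed, chosen only for this sketch
first <- rnorm(5)

set.seed(42)             # resetting the seed replays the same sequence
second <- rnorm(5)

print(identical(first, second))  # TRUE
```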

19 Histograms I
To produce histograms, we use the hist() function.
    var1 <- rnorm(1000)
    var2 <- rnorm(1000)
    hist(var1)

20 Histograms II
    hist(var1, main='Distribution of Random Data', xlab='Variable 1', col='darkgrey')
    abline(v=mean(var1), col='red')

21 Histograms III
Using the par() function, it is possible to partition the plotting window into multiple squares so as to view multiple plots simultaneously.
    par(mfrow=c(1, 2)) # 1 row, 2 columns
    hist(var1, xlab='Variable 1', col='darkgrey')
    abline(v=mean(var1), col='red')
    hist(var2, xlab='Variable 2', col='brown')
    abline(v=mean(var2), col='red')
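Settings made with par() stay in force for later plots. One possible way to undo the partition afterwards, sketched here with the name old_par as an assumption, is to keep the value that par() returns and pass it back when finished.

```r
var1 <- rnorm(1000)
var2 <- rnorm(1000)

old_par <- par(mfrow = c(1, 2))   # par() returns the settings it replaced

hist(var1, xlab = 'Variable 1', col = 'darkgrey')
hist(var2, xlab = 'Variable 2', col = 'brown')

par(old_par)                      # back to a single plot per window
print(par("mfrow"))               # 1 1
```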

22 Histograms IV
Using the par() function, it is possible to partition the plotting window into multiple squares in order to view multiple plots simultaneously.

23 Colours R has an extensive repertoire of colour options for plots.
Plot colours are typically indicated by the col argument, e.g., col = 'darkred', col = 'gold', col = 'darksalmon'.
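The built-in colour names can be listed with the base R function colors(). The short sketch below checks the three names used on this slide and searches for related names; the variable names wanted and salmon_like are illustrative.

```r
wanted <- c('darkred', 'gold', 'darksalmon')

print(length(colors()))           # how many named colours R knows about
print(wanted %in% colors())       # TRUE TRUE TRUE

# Search the list for every colour name containing "salmon"
salmon_like <- grep('salmon', colors(), value = TRUE)
print(salmon_like)
```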

24 Annotating Plots with Text
It is possible to add text to plots using the text() function.
    hist(var1, xlab='Variable 1', col='darkgrey')
    abline(v=mean(var1), col='red')
    text(0.5, 187, as.character(round(mean(var1), 2)))
In my experience, the text() function is more hassle than it's worth and such changes are best made manually using something like Photoshop.
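Part of the hassle is that coordinates such as (0.5, 187) have to be found by trial and error for each data set. One possible alternative, offered here only as a sketch, is to take the coordinates from the object that hist() returns; the name h is an assumption.

```r
set.seed(1)
var1 <- rnorm(1000)

# hist() draws the plot and also returns the bin information
h <- hist(var1, xlab = 'Variable 1', col = 'darkgrey')
abline(v = mean(var1), col = 'red')

# Place the label at the mean, level with the tallest bar;
# pos = 4 puts the text to the right of that point
text(x = mean(var1), y = max(h$counts),
     labels = round(mean(var1), 2), pos = 4)
```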

25 Setting the limits on the x- and y-axes
    hist(var1, xlab='Variable 1', col='darkgrey', xlim=c(-6, 6), ylim=c(0, 200))
    abline(v=mean(var1), col='red')
    text(0.7, 200, as.character(round(mean(var1), 2)))

26 Boxplots I
Boxplots (or box and whisker plots) are also a useful way of visualising the distribution of data. Boxplots show the median and the quartiles, and clearly demarcate outliers. Boxplots are compact: you can visualise many of them together to get an overview of multiple distributions.
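The numbers behind a boxplot can be inspected with the base R function boxplot.stats(). The data below are invented so that one value is an obvious outlier.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)   # made-up data with one extreme value

s <- boxplot.stats(x)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(s$stats)  # 1.0 3.0 5.5 8.0 9.0

# Points drawn individually beyond the whiskers
print(s$out)    # 50
```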

27 Boxplots II
    boxplot(var1, var2, names=c('Variable 1', 'Variable 2'), col=c('darkgrey', 'lightgrey'))
Notice the use of vectors, c(), to specify multiple values.

28 Boxplots III Different ways of looking at the same data.
Do they capture the same information?

29 Scatterplots I
    plot(var1, var2, main='Scatterplot', xlab='Variable 1', ylab='Variable 2')
    plot(var1, var2, main='Scatterplot', xlab='Variable 1', ylab='Variable 2',
         col='red',
         pch=20,   # point type
         cex=0.2)  # point size

30 Scatterplots II
For plots that position points, the arguments pch and cex determine the point type and size, respectively. A selection of point types that can be set using the pch argument.
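A reference chart of point types can be drawn with plot() itself. This is a rough sketch: it draws the symbols for pch values 0 to 25 in a row and labels each with its number; the name pch_values and the layout choices are assumptions.

```r
pch_values <- 0:25                       # the numbered symbol types

# One point per pch value, all at the same height
plot(pch_values, rep(1, length(pch_values)),
     pch = pch_values, cex = 1.5, bg = 'grey',   # bg fills symbols 21 to 25
     ylim = c(0.5, 1.5), yaxt = 'n',
     xlab = 'pch value', ylab = '', main = 'Point types')

# Print each pch number just below its symbol
text(pch_values, rep(0.8, length(pch_values)),
     labels = pch_values, cex = 0.6)
```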

31 Additional Plotting Functions
We have looked at the hist(), boxplot() and plot() functions. R has other 'base package' functions for plotting that work similarly to the above, e.g. barplot(), pie(), pairs(), stripchart() and dotchart(). Note that scatterplot() is not part of base R; it is provided by the car package.

32 Packages
The base package in R consists of a repertoire of functions that come automatically with R. R has thousands of additional packages created by developers free of charge. We will install a third-party plotting package called ggplot2.
    install.packages('ggplot2') # To install package
R will prompt you a couple of times to install ggplot2 as a local library; type y (yes) for each prompt.
    library(ggplot2) # Load package for use

33 Slightly More Advanced Plotting
ggplot2 is perhaps the most elegant way of creating graphs in R. ggplot2 is a course in itself, so I will only give some examples of how it works. To read further: The quick way to start using ggplot2 is the qplot() function, which is part of the ggplot2 package.
The qplot() function:
    qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=, formula=, facets=, xlim=, ylim=, xlab=, ylab=, main=, sub=)

34 Slightly More Advanced Plotting – qplot() example
Make some data.
    var1 <- rnorm(1000)
    var2 <- rnorm(1000)
    lab1 <- rep('Variable_1', 1000)
    lab2 <- rep('Variable_2', 1000)
    var_df <- data.frame(vars=c(var1, var2), labs=c(lab1, lab2))
    qplot(labs, vars, data=var_df, geom="boxplot", fill=labs, main='qplot() example', xlab='', ylab='Random Variables')

35 Slightly More Advanced Plotting – qplot() example
    qplot(labs, vars, data=var_df, geom="boxplot", fill=labs, main='qplot() example', xlab='', ylab='Random Variables')
ggplot2 is a subject in itself. Below is a good starting point:
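For comparison, the same boxplot can be built with the full ggplot() interface, which is worth knowing because qplot() has been marked as deprecated in recent ggplot2 releases (3.4.0 onwards). This is a sketch, not the lecturer's code: the data are regenerated so that it runs on its own, and the plot is only drawn if ggplot2 is installed.

```r
set.seed(1)
var_df <- data.frame(vars = c(rnorm(1000), rnorm(1000)),
                     labs = rep(c('Variable_1', 'Variable_2'), each = 1000))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # aes() maps columns to plot roles; geom_boxplot() chooses the plot type
  p <- ggplot(var_df, aes(x = labs, y = vars, fill = labs)) +
    geom_boxplot() +
    labs(title = 'ggplot() example', x = '', y = 'Random Variables')

  print(p)   # printing the object draws the plot
}
```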

36 Lecture 2 – problem sheet
A problem sheet entitled lecture_2_problems.pdf is located on the course website. Some of the code required for the problem sheet has been covered in this lecture. Consult the help pages if unsure how to use a function. Please attempt the problems for the next mins. We will be on hand to help out. Solutions will be posted this afternoon.

37 Thank You

