Statistical Programming Using the R Language Lecture 2 Basic Concepts II Darren J. Fitzpatrick, Ph.D April 2016.


Lecture I - Recap
Yesterday:
- Basic usage of RStudio
- Some programming concepts: variables, data types, data structures, etc.
- Basic R syntax
- Dealing with data frames – indexing
- Reading and writing files

Trinity College Dublin, The University of Dublin
Lecture 2 - Overview
- Loops & Conditionals: the WHILE loop, the FOR loop, the if(){} statement
- Plotting
- Packages: installing, loading

Loops & Control I
Programming often deals with repetitive tasks. We could code these tasks repetitively or encapsulate them in a loop, where one piece of code does the same task a predetermined number of times.
- Loops: constructs that allow the automation of repetitive tasks without repeating the writing of code.
- Iteration: each pass through a loop.
- Control: the creation of a condition that determines the termination of a loop.

Loops & Control II
The WHILE loop
Create a loop to add 1 to variable x while x < 10.
Tedious Solution:
    x <- 0
    x <- x + 1
    ...
    x <- x + 1
While Loop:
    x <- 0
    while (x < 10) {
      x <- x + 1
    }
General form:
    while ( condition ){ do something }
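To make the role of the condition concrete, here is a minimal sketch of the WHILE loop that also counts its iterations. The variable name iterations is an illustrative addition and is not part of the lecture code.

```r
# Start from 0 and add 1 until the condition x < 10 is no longer true.
x <- 0
iterations <- 0  # illustrative counter, not in the original slide

while (x < 10) {
  x <- x + 1
  iterations <- iterations + 1
}

print(x)           # value of x once the loop has stopped
print(iterations)  # number of passes through the loop
```

The condition is checked before every pass, so the loop stops as soon as x reaches 10. If nothing inside the loop ever makes the condition false, the loop runs forever and has to be interrupted manually.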

Loops & Control III
The FOR loop
Tedious Solution:
    x <- 0
    x <- x + 1
    ...
    x <- x + 1
For Loop:
    x <- 0
    for (i in 1:10) {
      x <- x + 1
    }
General form:
    for (i in start:finish ){ do something }
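One detail of the start:finish form is worth a small sketch: if finish is smaller than start, the colon operator counts downwards instead of producing an empty sequence. The example below compares 1:n with seq_len(n) for n = 0. The counter names are illustrative only.

```r
n <- 0  # e.g. a data set that turned out to have no columns

# 1:0 is the vector c(1, 0), so this loop body runs twice.
count_colon <- 0
for (i in 1:n) {
  count_colon <- count_colon + 1
}

# seq_len(0) is an empty vector, so this loop body never runs.
count_seq <- 0
for (i in seq_len(n)) {
  count_seq <- count_seq + 1
}

print(count_colon)
print(count_seq)
```

For the loops in this lecture, 1:10 and 1:ncol(affected) behave as expected because the upper limit is at least 1. Using seq_len() is simply a defensive habit for cases where the limit might be zero.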

Conditionals I
Similar to the WHILE loop, conditionals allow commands to be executed only when a condition is met.
    a <- 10
    b <- 5
    if (a >= b) {
      c <- a + b
    }
General form:
    if ( condition ){ do something }
What would happen if the condition a >= b were not true, say, if a < b?
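A small sketch of the case where the condition is false. The values and the name sum_ab are chosen for illustration. The result is initialised to NA first so that it exists whether or not the if block runs.

```r
a <- 3
b <- 5

sum_ab <- NA  # placeholder so the variable exists either way

if (a >= b) {
  # This block is skipped because 3 >= 5 is FALSE.
  sum_ab <- a + b
}

print(sum_ab)  # still the placeholder value
```

When the condition is false, R simply skips the block and carries on. Without the placeholder line, sum_ab would never be created and any later use of it would raise an "object not found" error.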

Conditionals II
The conditional if statement can be extended to any number of conditions. The else if() portion of the conditional can be repeated as often as required. In lecture one, we covered logical operators, which are used to build the conditions.
    if ( condition 1 ){
      do something
    } else if ( condition 2 ){
      do something
    } else {
      do something
    }
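As a concrete instance of the if / else if / else pattern, the sketch below wraps it in a small function. The function name describe_sign is hypothetical and only serves the example.

```r
# Return a text description of the sign of a single number.
describe_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

print(describe_sign(5))
print(describe_sign(-2))
print(describe_sign(0))
```

The conditions are tested in order and only the first matching branch is executed. Note that else must appear on the same line as the closing brace of the previous block when typed at the console, otherwise R treats the if statement as finished.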

Some Examples – but first the preliminaries...
Yesterday you saved an R script (problems.R) and an R session (problems.RData) in your R_Course folder. We need to:
- Reload the R session (.RData)
- Open the script (.R) if it does not open automatically
- Reset the working directory

Preliminaries I
Load the session from yesterday – problems.RData

Preliminaries II
Open your script (problems.R)

Preliminaries III
Set the working directory (wd) to be the R_Course folder. To set the wd, follow the steps shown on the slide and navigate to the R_Course folder.

Preliminaries IV
Yesterday, we read in a file called colon_cancer_data_set.txt and generated two data frames, affected and unaffected, from that data.
    df <- read.table('colon_cancer_data_set.txt', header=TRUE)
    affected <- df[which(df$Status=='A'), 1:7464]    # rows with Status 'A'
    unaffected <- df[which(df$Status=='U'), 1:7464]  # rows with Status 'U'
These variables should be available in the session problems.RData that you just loaded.
Note! You can list the variables in your workspace by running the ls() command in the console.

Problem I
Iterate over the columns of the affected data and calculate the mean of each column.
    for (i in 1:ncol(affected)) {
      mean_exp <- mean(affected[,i])
      print(mean_exp)
    }
Printing the values illustrates the point but it doesn't allow you to store them in memory.

Problem II
Iterate over the columns of the affected data, calculate the mean of each column and store the results as a variable.
    mean_holder <- c()                         # empty vector to collect results
    for (i in 1:ncol(affected)) {
      mean_exp <- mean(affected[,i])
      mean_holder <- c(mean_holder, mean_exp)  # append the new mean
    }

FOR loops & apply()
    mean_holder <- c()
    for (i in 1:ncol(affected)) {
      mean_exp <- mean(affected[,i])
      mean_holder <- c(mean_holder, mean_exp)
    }
    mean_a <- apply(affected, 2, mean)  # 2 = apply the function over columns
The values produced by the FOR loop are the same as those returned by the apply() function. In R, loops are sometimes necessary but R has tricks to avoid them. This can have enormous implications for compute time on large data sets. R loops are inefficient, particularly when a result vector is grown one element at a time, as mean_holder is here.
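The comparison can be checked on a small stand-in data set. The sketch below uses a made-up data frame called toy (an assumption, since the colon cancer data is not reproduced here), fills a preallocated vector in a loop, and compares the result with apply() and with the built-in colMeans().

```r
set.seed(1)  # make the random numbers reproducible
toy <- data.frame(g1 = rnorm(5), g2 = rnorm(5), g3 = rnorm(5))  # hypothetical data

# Loop version, writing into a vector of the right length created up front.
loop_means <- numeric(ncol(toy))
for (i in seq_len(ncol(toy))) {
  loop_means[i] <- mean(toy[, i])
}

# Loop-free versions.
apply_means <- apply(toy, 2, mean)
col_means <- colMeans(toy)

# apply() and colMeans() keep the column names; the loop result has none.
print(all.equal(loop_means, unname(apply_means)))
print(all.equal(apply_means, col_means))
```

Preallocating with numeric() avoids copying the result vector on every pass, which is one of the main costs of the mean_holder <- c(mean_holder, mean_exp) pattern on large data.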

Basic Plotting
R is suitable for making publication quality graphics. R can generally create simple plots using a single function. We will look at the following plots:
- histograms ( hist() )
- boxplots ( boxplot() )
- scatterplots ( plot(), scatterplot() )

Random Data
To illustrate the plotting functions, I am just going to use some random data.
    var1 <- rnorm(1000)
    var2 <- rnorm(1000)
Each call randomly generates 1000 data points drawn from a standard normal distribution (mean 0, standard deviation 1). Note, random data is very useful if you want to figure out how a function works.
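Because rnorm() gives different numbers on every call, plots made from var1 and var2 will differ slightly from run to run. A minimal sketch of how to make the random data reproducible with set.seed() follows. The seed value 42 and the name var1_again are arbitrary choices for illustration.

```r
set.seed(42)               # fix the state of the random number generator
var1 <- rnorm(1000)

set.seed(42)               # same seed again
var1_again <- rnorm(1000)

print(identical(var1, var1_again))  # the two draws match exactly
print(length(var1))
```

Setting a seed at the top of a script means that anyone rerunning it gets the same random values and therefore the same figures.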

Histograms I
To produce histograms, we use the hist() function.
    var1 <- rnorm(1000)
    var2 <- rnorm(1000)
    hist(var1)

Histograms II
    hist(var1,
         main='Distribution of Random Data',  # plot title
         xlab='Variable 1',                   # x-axis label
         col='darkgrey')                      # bar colour
    abline(v=mean(var1), col='red')           # vertical line at the mean

Histograms III
Using the par() function, it is possible to partition the plotting window into multiple squares so as to view multiple plots simultaneously.
    par(mfrow=c(1, 2)) # 1 row, 2 columns
    hist(var1, xlab='Variable 1', col='darkgrey')
    abline(v=mean(var1), col='red')
    hist(var2, xlab='Variable 2', col='brown')
    abline(v=mean(var2), col='red')

Histograms IV
Using the par() function, it is possible to partition the plotting window into multiple squares in order to view multiple plots simultaneously.
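A setting made with par() stays in force for later plots, so it is often convenient to put the layout back afterwards. The sketch below relies on the fact that par() returns the previous values of the settings it changes. The name old_par is illustrative.

```r
set.seed(1)
var1 <- rnorm(1000)
var2 <- rnorm(1000)

# Switch to a 1 x 2 layout and remember the previous layout.
old_par <- par(mfrow = c(1, 2))

hist(var1, xlab = "Variable 1", col = "darkgrey")
abline(v = mean(var1), col = "red")
hist(var2, xlab = "Variable 2", col = "brown")
abline(v = mean(var2), col = "red")

# Restore the previous layout so that later plots use the full window.
par(old_par)
```

The same effect can be had by calling par(mfrow=c(1, 1)) by hand, assuming the layout was a single panel to begin with.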

Colours
R has an extensive repertoire of colour options for plots. Plot colours are typically indicated by the col argument, e.g.,
    col = 'darkred'
    col = 'gold'
    col = 'darksalmon'

Annotating Plots with Text
It is possible to add text to plots using the text() function.
    hist(var1, xlab='Variable 1', col='darkgrey')
    abline(v=mean(var1), col='red')
    text(0.5, 187, as.character(round(mean(var1), 2)))  # x position, y position, label
In my experience, the text() function is more hassle than it's worth and such changes are best made manually using something like Photoshop.

Setting the limits on the x- and y-axes
    hist(var1, xlab='Variable 1', col='darkgrey',
         xlim=c(-6, 6), ylim=c(0, 200))
    abline(v=mean(var1), col='red')
    text(0.7, 200, as.character(round(mean(var1), 2)))
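The label coordinates in the examples above are typed in by hand, so they only suit one particular random sample. One possible way to reduce the hassle is to compute the position from the histogram itself. This is a sketch, not the lecture's method: it uses hist(..., plot = FALSE) to get the bin counts without drawing, and the names h and top are illustrative.

```r
set.seed(1)
var1 <- rnorm(1000)

# Compute the histogram bins without drawing anything.
h <- hist(var1, plot = FALSE)
top <- max(h$counts)  # height of the tallest bar

# Draw the histogram with some headroom above the tallest bar.
hist(var1, xlab = "Variable 1", col = "darkgrey", ylim = c(0, top * 1.1))
abline(v = mean(var1), col = "red")

# Place the rounded mean just to the right of the red line, near the top.
text(mean(var1), top * 1.05, labels = round(mean(var1), 2), pos = 4)
```

Here pos = 4 asks text() to put the label to the right of the given coordinates, so it does not sit on top of the line.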

Boxplots I
Boxplots (or box and whisker plots) are also a useful way of visualising the distribution of data.
- Boxplots show the median and the quartiles, and clearly demarcate outliers.
- Boxplots are compact – you can visualise many of them together to get an overview of multiple distributions.
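The numbers behind a boxplot can be inspected directly with boxplot.stats(), which is the function boxplot() uses internally. A minimal sketch on random data follows. The name bp is illustrative.

```r
set.seed(1)
var1 <- rnorm(1000)

bp <- boxplot.stats(var1)

# Five values: lower whisker, lower hinge, median, upper hinge, upper whisker.
print(bp$stats)

# Points lying beyond the whiskers, which a boxplot draws individually.
print(bp$out)
```

By default the whiskers extend to the most extreme data point within 1.5 times the box length from the box. Anything further out is reported in bp$out.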

Boxplots II
    boxplot(var1, var2,
            names=c('Variable 1', 'Variable 2'),
            col=c('darkgrey', 'lightgrey'))
Notice the use of vectors, c(), to specify multiple values.

Boxplots III
Different ways of looking at the same data. Do they capture the same information?

Scatterplots I
    plot(var1, var2,
         main='Scatterplot',
         xlab='Variable 1',
         ylab='Variable 2')
    plot(var1, var2,
         main='Scatterplot',
         xlab='Variable 1',
         ylab='Variable 2',
         col='red',
         pch=20,   # point type
         cex=0.2)  # point size

Scatterplots II
For plots that position points, the arguments pch and cex determine the point type and size, respectively. A selection of point types that can be set using the pch argument.
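Since the chart of point types from the slide is not reproduced in this transcript, here is a minimal sketch that draws one for yourself. It plots the standard symbols numbered 0 to 25, each labelled with its pch value. The layout choices are illustrative.

```r
# Draw the 26 standard plotting symbols (pch 0 to 25) in one row.
pch_values <- 0:25

plot(pch_values, rep(1, length(pch_values)),
     pch = pch_values, cex = 1.5,
     xlab = "pch value", ylab = "", yaxt = "n",
     ylim = c(0.5, 1.5),
     main = "Point types")

# Label each symbol with its pch number just below it.
text(pch_values, 0.8, labels = pch_values, cex = 0.7)
```

Because pch accepts a vector, each point in the plot gets its own symbol. The same is true of col and cex.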

Additional Plotting Functions
We have looked at the hist(), boxplot() and plot() functions. R has other 'base package' functions for plotting that work similarly to the above, e.g.
    barplot()     pie()        pairs()
    stripchart()  dotchart()
Note that scatterplot() is not part of base R; it is provided by the car package.

Packages
The base package in R consists of a repertoire of functions that come automatically with R. R has thousands of additional packages created by developers free of charge. We will install a third party plotting package called ggplot2.
    install.packages('ggplot2') # To install package
R will prompt you a couple of times to install ggplot2 as a local library – type y (yes) for each prompt.
    library(ggplot2) # Load package for use
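A package only needs to be installed once, but it must be loaded in every new session. As a small sketch, requireNamespace() can be used to check whether a package is already installed before deciding what to do. The variable names pkg and is_installed are illustrative.

```r
pkg <- "ggplot2"

# TRUE if the package is installed and can be loaded, FALSE otherwise.
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  message(pkg, " is installed; load it with library(", pkg, ")")
} else {
  message(pkg, " is not installed; run install.packages('", pkg, "') first")
}
```

This check does not install or attach anything itself. It only reports which of the two steps from the slide is still needed.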

Slightly More Advanced Plotting
ggplot2 is perhaps the most elegant way of creating graphs in R. ggplot2 is a course in itself – I will give some examples of how it works. To read further:
The quick way to use ggplot2 is the qplot() function, which is part of the ggplot2 package.
The qplot() function:
    qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=, formula=, facets=, xlim=, ylim=, xlab=, ylab=, main=, sub=)

Slightly More Advanced Plotting – qplot() example
Make some data.
    var1 <- rnorm(1000)
    var2 <- rnorm(1000)
    lab1 <- rep('Variable_1', 1000)
    lab2 <- rep('Variable_2', 1000)
    var_df <- data.frame(vars=c(var1, var2), labs=c(lab1, lab2))
    qplot(labs, vars, data=var_df,
          geom="boxplot", fill=labs,
          main='qplot() example',
          xlab='', ylab='Random Variables')

Slightly More Advanced Plotting – qplot() example
    qplot(labs, vars, data=var_df,
          geom="boxplot", fill=labs,
          main='qplot() example',
          xlab='', ylab='Random Variables')
ggplot2 is a subject in itself. Below is a good starting point: graphs/ggplot2.html
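In recent ggplot2 releases qplot() is marked as deprecated, so it may be useful to see the same boxplot written with the full ggplot() interface. This is a sketch of an equivalent, not part of the original lecture. It rebuilds the example data and only draws the plot if ggplot2 is installed.

```r
set.seed(1)
var_df <- data.frame(
  vars = c(rnorm(1000), rnorm(1000)),
  labs = rep(c("Variable_1", "Variable_2"), each = 1000)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # aes() maps columns of var_df to the x-axis, y-axis and fill colour.
  p <- ggplot(var_df, aes(x = labs, y = vars, fill = labs)) +
    geom_boxplot() +
    labs(title = "ggplot() example", x = "", y = "Random Variables")

  print(p)  # draw the plot
}
```

The arguments map across fairly directly: data= becomes the first argument of ggplot(), geom="boxplot" becomes geom_boxplot(), and main, xlab and ylab move into labs().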

Lecture 2 – problem sheet
A problem sheet entitled lecture_2_problems.pdf is located on the course website. Some of the code required for the problem sheet has been covered in this lecture. Consult the help pages if unsure how to use a function. Please attempt the problems for the next mins. We will be on hand to help out. Solutions will be posted this afternoon.

Thank You