Introduction to R Workshop By Stephen O. Opiyo MCIC-Computational Biology Laboratory Day 1.


Outline
- R workshop outline ( tps.osu.edu/Routline.pdf)
- Introduction to R (Day 1)
- Graphics (data visualization) in R (Day 2)
- Basic Statistics using R (Day 2)

Getting Started
We assume that R for Windows or Mac is already installed on your laptop. (Installation instructions are given at the webpage below.) Open R on your Windows or Mac machine.

R on Windows

R on MACs

History of R
- The idea for R came from S, developed at Bell Labs since 1976.
- S was intended to support research and data analysis projects.
- S later became S-Plus, licensed to Insightful.
- R: an open-source platform similar to S, developed by Robert Gentleman and Ross Ihaka (University of Auckland, NZ) during the 1990s.
- Since 1997: an international "R-core" development team.
- Updated versions are available every two months.

What is R?
- Data handling and storage: numeric, textual
- Matrix algebra
- Tables and regular expressions
- Graphics
- Data analysis

R is not
- a database
- a collection of "black boxes"
- a spreadsheet software package
- commercially supported

Useful reading materials
- R for Beginners
- "An Introduction to R" by Longhow Lam
- Practical Regression and Anova using R
- An R companion to 'Experimental Design'
- The R Guide
- R for Biologists
- Multilevel Modeling in R
- R reference cards
- R reference card data mining
- RStudio - Documentation

Useful books
- Learning RStudio for R Statistical Computing by Mark Van Der Loo and Edwin De Jonge. Paperback, 126 pages. Published December 25th 2012 by Packt Publishing. ISBN (ISBN13: )
- Getting Started with RStudio by John Verzani. Publisher: O'Reilly Media, Inc. Pub. date: September 22, 2011. Print ISBN-13:
- R Graphics Cookbook by Winston Chang (Jan 3, 2013)
- R For Dummies by Joris Meys and Andrie de Vries
- The R Book by Michael J. Crawley

RStudio

RStudio is a free, open-source integrated development environment (IDE) for R. We assume that RStudio for Windows or Mac is already installed on your laptop. (Installation instructions are given at the webpage below.)

Using RStudio
- Script editor
- View help, plots & files; manage packages
- View variables in workspace and history file
- R console

Open RStudio and set up a working directory

Datasets and R scripts
Save all the files to the R_workshop directory:
- Day 1 R files (D1_Example_1.R to D1_Example_12.R) = 12 files
- Day 1 data files (D1_Data_1.csv or txt to D1_Data_2.csv or txt) = 4 files
- Day 2 R files (D2_Example_1.R to D2_Example_3.R) = 3 files
- Day 2 data files (D1_Data_1.csv to D1_Data_3.csv) = 3 files
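Setting the working directory can also be done from the console rather than through the RStudio menus. The following is a minimal sketch, assuming a folder called R_workshop placed inside R's temporary directory; that location is a hypothetical stand-in for wherever you actually saved the workshop files.

```r
# Hypothetical location: a folder named R_workshop inside R's temporary directory.
# In practice, replace this with the folder where you saved the workshop files.
workshop_dir <- file.path(tempdir(), "R_workshop")

# Create the folder if it does not exist yet
if (!dir.exists(workshop_dir)) {
  dir.create(workshop_dir)
}

old_dir <- setwd(workshop_dir)  # setwd() returns the previous working directory
getwd()                         # confirm the new working directory
list.files()                    # files that R can now read by name alone

setwd(old_dir)                  # restore the previous working directory
```

Once the working directory points at the workshop folder, file names such as D1_Data_1.csv can be used without a full path.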

R: Session management

R: session management
- You can enter a command at the command prompt (>) in the console.
- To quit R, type q().
- A result is stored in an object using the assignment operator (<-) or the equals sign (=): Test<-2 and Test=2 both create an object Test with a value of 2.
- To print (show) an object, just enter its name: Test
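As a short illustration of assignment and printing, the sketch below uses both assignment forms; the object names Test2 and total are made up for this example.

```r
Test <- 2              # assignment with the arrow operator
Test2 = 2              # assignment with the equals sign
Test                   # typing the name prints the stored value

total <- Test + Test2  # objects can be used in further calculations
total
```

Both forms store a value; the arrow form is the one used throughout the rest of these slides.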

Naming objects in R
- Object names cannot contain "strange" symbols like !, +, -, #.
- A dot (.) and an underscore (_) are allowed, and a name may start with a dot.
- Object names can contain a number but cannot start with one (e.g., Example_1, not 1Example_1).
- R is case sensitive: X and x are two different objects, as are temp.1 and temP.1.

R_Example_1
Example1<-c(1,2,3,4,5)
Example1
1Example<-c(1,2,3,4,5)              # error: a name cannot start with a number
Example2<-c("Second", "Example")
Example2
2Example<-c("Second", "Example")    # error: a name cannot start with a number
Temp.1<-c("Second", "Example")
Temp.1
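One way to check a candidate name without triggering an error is to compare it with what base R's make.names() would turn it into. The helper is_valid_name below is hypothetical and written only for this sketch.

```r
# Hypothetical helper: TRUE when 'name' is already a syntactically valid object name.
# make.names() rewrites invalid names, so an unchanged result means the name was valid.
is_valid_name <- function(name) {
  make.names(name) == name
}

candidates <- c("Example_1", "1Example", "Temp.1", ".hidden", "my-data")
valid <- is_valid_name(candidates)
names(valid) <- candidates
valid
```

Names that start with a number or contain a symbol such as - come back as FALSE, matching the rules listed above.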

R: session management
- Your R objects are stored in a workspace.
- The workspace is not saved to disk unless you tell R to do so.
- Objects are lost if you close R without saving them.
- A saved R session is stored in a file named .RData.

R: session management
- To check the current working directory: getwd()
- To list the objects in your workspace: ls()
- To remove objects you no longer need: rm()
- To remove ALL objects in your workspace: rm(list=ls()), or use "Remove all objects"
- To save your workspace to a file: save.image(), or use "Save Workspace…" in the File menu
- The default workspace file is called .RData
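The commands above can be tried in a short session such as the one sketched here. The file name workshop.RData and the objects a and b are assumptions for illustration; save() is used with an explicit file so that the default .RData file is left alone.

```r
a <- 1:3
b <- "text"
ls()                       # lists the names of the objects defined so far

rm(b)                      # remove a single object
ls()

# Hypothetical file name, written to R's temporary directory
ws_file <- file.path(tempdir(), "workshop.RData")

# Save the current objects; save.image(file = ws_file) does much the same
# for the whole global workspace
save(list = ls(), file = ws_file)

rm(a)                      # 'a' is gone from the workspace ...
load(ws_file)              # ... and is restored from the saved file
a
```

Loading the file brings back every object that was saved, so a is available again after load().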

Basic data types

Basic data types: Logical, numeric, and character, etc.

Basic data type (Example 2)
- Logical (TRUE or FALSE)
- Numeric (numbers)
- Character (letters, numerical digits, common punctuation marks such as "." or "-", etc.)
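The three basic types can be inspected with class(). This is a small sketch with made-up object names (flag, count, label), not the contents of the workshop's Example 2 script.

```r
flag  <- TRUE        # logical
count <- 42          # numeric
label <- "gene-1"    # character

types <- c(class(flag), class(count), class(label))
types

# Converting between types
as_number <- as.numeric("3.5") + 1   # character -> numeric
as_text   <- as.character(count)     # numeric -> character
as_number
as_text
```

Note that a number wrapped in quotes is a character value until it is converted with as.numeric().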

Vectors and Matrices
A vector
- Ordered collection of data of the same type
- Example: last names of all students in this workshop
- In R, a single number is a vector of length 1
A matrix
- Rectangular table of data of the same type
- Example: mean intensities of all genes measured during a microarray experiment

Vectors (Example 3)
Vector: ordered collection of data of the same data type
> x <- c(5.2, 1.7, 6.3)
> log(x)
[1] 1.6486586 0.5306283 1.8405496
> y <- 1:5
> z <- seq(1, 1.4, by = 0.1)
> yz <- y + z
> yz
[1] 2.0 3.1 4.2 5.3 6.4
> length(y)
[1] 5
> M <- mean(y + z)
> M
[1] 4.2

Operation on vector elements (Example 4)
> Mydata <- c(2,3.5,-0.2)      # vector (c = "concatenate")
> Mydata
[1]  2.0  3.5 -0.2
> x4 <- Mydata > 0             # test on the elements
> x4
[1]  TRUE  TRUE FALSE
> x5 <- Mydata[Mydata > 0]     # extract the positive elements
> x5
[1] 2.0 3.5
> x6 <- Mydata[-c(1,3)]        # remove elements 1 and 3
> x6
[1] 3.5

Operation on vector elements (Example 4)
> Colors <- c("Red","Green","Red")   # character vector
> Colors[2]
[1] "Green"
> x1 <- 25:30                        # number sequence
> x1
[1] 25 26 27 28 29 30
> x2 <- x1[3:5]                      # elements 3 to 5
> x2
[1] 27 28 29
> x3 <- x1[c(2,6)]                   # elements 2 and 6
> x3
[1] 26 30

Operation on vector elements (Example 5)
> x <- c(5,-2,3,-7)
> y <- c(1,2,3,4)*10           # operation on all the elements
> y
[1] 10 20 30 40
> sort(x)                      # sort a vector in ascending order
[1] -7 -2  3  5
> sort(x, decreasing=TRUE)     # sort in descending order
[1]  5  3 -2 -7
> order(x)                     # element order for sorting
[1] 4 2 3 1
> y[order(x)]                  # reorder y by the sorted order of x
[1] 40 20 30 10
> rev(x)                       # reverse a vector
[1] -7  3 -2  5

Matrices (Example 6)
Matrix: rectangular table of data of the same type
> m <- matrix(1:12)    # create a matrix using the matrix() function
> m
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
[11,]   11
[12,]   12
> V <- 1:12
> V1 <- c(1,2,3,4,5,6,7,8,9,10,11,12)

Matrices (Example 6)
Matrix: rectangular table of data of the same type
> mr <- matrix(1:12, 4)    # four rows
> mr
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

Matrices (Example 6)
Matrix: rectangular table of data of the same type
> mm <- matrix(1:12, 4, byrow = T); mm    # by-row creation
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
> tmm <- t(mm)             # t is transpose
> tmm
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> dim(mm)                  # dim is dimension
[1] 4 3                    # four rows and three columns
> dim(t(mm))
[1] 3 4                    # three rows and four columns

Matrices (Example 6)
Matrix: rectangular table of data of the same type
> x <- c(3,-1,2,0,-3,6)
> x.mat <- matrix(x,ncol=2)                # matrix with 2 columns
> x.mat
     [,1] [,2]
[1,]    3    0
[2,]   -1   -3
[3,]    2    6
> x.matr <- matrix(x,ncol=2, byrow=T)      # by-row creation
> x.matr
     [,1] [,2]
[1,]    3   -1
[2,]    2    0
[3,]   -3    6

Operations on matrices (Example 6)
> x.matr[,2]             # 2nd column
[1] -1  0  6
> x.matr[c(1,3),]        # 1st and 3rd rows
     [,1] [,2]
[1,]    3   -1
[2,]   -3    6
> x.matr[-2,]            # remove the second row
     [,1] [,2]
[1,]    3   -1
[2,]   -3    6

R as a calculator (Example 7)
In R: plus (+), minus (-), division (/), and multiplication (*)
> 5 + (6 + 7) * pi^2
[1] 133.3049
> log(exp(1))
[1] 1
> sin(pi/3)^2 + cos(pi/3)^2
[1] 1
> Sin(pi/3)^2 + cos(pi/3)^2
Error in Sin(pi/3) : could not find function "Sin"
Take a few minutes to try some calculations of your own.

Lists and data frames

Lists
A list is an ordered collection of data of arbitrary types. The components of a list do not need to be of the same mode or type; they can be a numeric vector, a logical value, a function, and so on. A component of a list can be referred to as aa[[i]] or aa$x, where aa is the name of the list, i is the position of the component, and x is the name of a component of aa.

Lists (Example 8)
aa[[1]] is the first component of aa, while aa[1] is the sublist consisting of the first component of aa only.
> doe = list(name="john", age=28, married=F)
> doe[[1]]
[1] "john"
> doe[1]
$name
[1] "john"

> doe$name
[1] "john"
> doe$age
[1] 28
Typically, vector elements are accessed by their index (an integer) and list elements by their name (a character string), but both types support both access methods.

Lists are very flexible (Example 8)
> my.list <- list(c(5,4,-1),c("X1","X2","X3"))
> my.list
[[1]]
[1]  5  4 -1

[[2]]
[1] "X1" "X2" "X3"

> my.list[[1]]
[1]  5  4 -1
> my.list1 <- list(c1=c(5,4,-1),c2=c("X1","X2","X3"))
> my.list1$c2[2:3]
[1] "X2" "X3"

Lists: Session (Example 8)
> Empl <- list(employee="Anna", spouse="Fred", children=3, child.ages=c(4,7,9))
> Empl[[4]]
[1] 4 7 9
> Empl$child.ages
[1] 4 7 9
> Empl[4]    # a sublist consisting of the 4th component of Empl
$child.ages
[1] 4 7 9

# letters is a built-in vector of the lowercase letters a to z
# toupper(letters) gives capital A to Z
# LETTERS is the built-in vector of capital A to Z
> names(Empl) <- letters[1:4]
> names(Empl)
[1] "a" "b" "c" "d"
> Empl1 <- c(Empl, service=8)
> Empl1
$a
[1] "Anna"

$b
[1] "Fred"

$c
[1] 3

$d
[1] 4 7 9

$service
[1] 8

> unl <- unlist(Empl1)    # converts the list to a vector; mixed types are converted to character, giving a character vector
> unl
      a       b       c      d1      d2      d3 service
 "Anna"  "Fred"     "3"     "4"     "7"     "9"     "8"

More lists (Example 8)
> x <- c(3,-1,2,0,-3,6)
> x.mat <- matrix(x,ncol=2)    # matrix with 2 columns
> x.mat
     [,1] [,2]
[1,]    3    0
[2,]   -1   -3
[3,]    2    6
> dimnames(x.mat) <- list(c("L1","L2","L3"), c("R1","R2"))
> x.mat
   R1 R2
L1  3  0
L2 -1 -3
L3  2  6

Data frame Data frame: Rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types. Example of a data frame with 10 rows and 3 columns
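To make the "same type within a column, different types across columns" point concrete, here is a minimal sketch with a made-up three-row data frame; the column names and values are hypothetical.

```r
# Hypothetical toy data: 3 rows, 3 columns of different types
df <- data.frame(
  name   = c("A", "B", "C"),        # character column
  weight = c(12.5, 20.1, 7.8),      # numeric column
  adult  = c(TRUE, TRUE, FALSE),    # logical column
  stringsAsFactors = FALSE          # keep text as character in older R versions
)

col_types <- sapply(df, class)      # one type per column
col_types
dim(df)                             # rows, then columns

df$weight                           # a single column is a vector
adults <- df[df$adult, ]            # rows where 'adult' is TRUE
adults
```

Each column behaves like a vector of a single type, while the data frame as a whole mixes types freely.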

Importing and exporting data in R

Importing Data (Example 9)
The easiest way to enter data in R is to work with a text file in which the columns are separated by tabs, or with a comma-separated values (csv) file. Examples of importing data are provided below.
mydata <- read.table("D1_Data_1.csv", header=TRUE)    # default separator is whitespace, so the comma-separated columns are not split
mydata_correct <- read.table("D1_Data_1.csv", sep=",", header=TRUE)
mydata <- read.csv("D1_Data_1.csv", header=TRUE)
mydatab <- read.table("D1_Data_1.txt", header=TRUE)
Mydatab_correct <- read.table("D1_Data_1.txt", sep="\t", header=TRUE)
mydatab <- read.delim("D1_Data_1.txt", header=TRUE)
Import from a Web URL:
Data_URL<-read.table(file= header=TRUE, sep=",")
Importing data in RStudio: use (Import Dataset) in the Workspace pane.
Importing data from Excel, SAS, SPSS: see the link below.
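The effect of the sep argument can be seen without the workshop files. The sketch below writes a tiny, made-up csv file to a temporary location and reads it back three ways; the file contents and object names are assumptions for illustration.

```r
# Write a small hypothetical csv file to a temporary location
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,weight,group",
             "1,10.5,a",
             "2,12.0,b",
             "3,9.8,a"), csv_file)

# Default separator is whitespace: each line is read as one field
wrong <- read.table(csv_file, header = TRUE)

# Telling read.table() that fields are separated by commas
right <- read.table(csv_file, header = TRUE, sep = ",")

# read.csv() uses sep = "," and header = TRUE by default
same <- read.csv(csv_file)

ncol(wrong)   # a single column
ncol(right)   # three columns
```

Checking ncol() or str() right after an import is a quick way to catch a wrong separator.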

Keyboard Input (Example 10)
# create a data frame from scratch
age <- c(25, 30, 56, 49, 12, 16, 60, 34, 45, 22)
gender <- c("male", "female", "male","male", "female", "male","male", "female", "male","male")
weight <- c(160, 110, 220, 100, 65, 120, 179, 134, 165, 153)
mydata <- data.frame(age,gender,weight)

Viewing Data (Example 10)
There are a number of functions for listing the contents of an object or dataset.
# list the variables in mydata
names(mydata)
# list the structure of mydata
str(mydata)
# dimensions of an object
dim(mydata)

Viewing Data (Example 10)
# class of an object (numeric, matrix, data frame, etc.)
class(mydata)
# print mydata
mydata
# print first 6 rows of mydata
head(mydata)
# print first 2 rows of mydata
head(mydata, n=2)
# print last 6 rows of mydata
tail(mydata)
# print last 2 rows of mydata
tail(mydata, n=2)

Missing Data (Example 11)
In R, missing values are represented by the symbol NA (not available). Undefined results (e.g., 0/0) are represented by the symbol NaN (not a number).
Testing for missing values:
is.na(x)    # returns TRUE if x is missing
y <- c(1,2,3,NA)
is.na(y)    # returns a vector (F F F T)
Excluding missing values from analyses: arithmetic functions on missing values yield missing values.
x <- c(1,2,NA,3)
mean(x)                # returns NA
mean(x, na.rm=TRUE)    # returns 2
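A few further checks on missing values, sketched with the same vector x; the object names n_missing, complete, and so on are made up for this example.

```r
x <- c(1, 2, NA, 3)

n_missing <- sum(is.na(x))        # count the missing values
n_missing

plain_mean <- mean(x)             # NA, because one value is missing
clean_mean <- mean(x, na.rm = TRUE)
clean_mean

complete <- x[!is.na(x)]          # keep only the non-missing values
complete

undefined <- 0 / 0                # NaN: the result is not defined
infinite  <- 1 / 0                # Inf: division of a non-zero number by zero
is.na(undefined)                  # NaN also counts as missing for is.na()
```

Dropping the missing values explicitly, as in complete, gives the same mean as using na.rm=TRUE.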

Exporting Data (Example 12)
To a csv file:
write.table(mydata, "c:/mydata.csv", sep=",")
To a tab-delimited text file:
write.table(mydata, "c:/mydata.txt", sep="\t")
Exporting R objects into other formats: for SPSS, SAS and Stata, you will need to load the foreign package. For Excel, you will need the xlsReadWrite package.
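A round trip is a simple way to confirm that an export worked. This sketch assumes a small made-up data frame and writes to R's temporary directory instead of c:/; row.names=FALSE is an added choice here so that the row numbers are not written as an extra column.

```r
# Hypothetical two-row data frame
mydata <- data.frame(age = c(25, 30), gender = c("male", "female"))

# Write to a temporary folder rather than a fixed drive letter
out_file <- file.path(tempdir(), "mydata.csv")
write.table(mydata, out_file, sep = ",", row.names = FALSE)

# Read the file back and compare with the original
check <- read.csv(out_file)
check
names(check)
```

Without row.names=FALSE, write.table() also writes the row names, which then show up as an extra first field when the file is opened in other programs.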

Exercise

Exercise 1
1. Download/import the data D1_Data_2.csv and name it Data_1.
2. Get the names of the variables in Data_1.
3. Create a new data set from the first four columns of Data_1 and name it Data_2.
4. Get the names of the variables of Data_2.
5. What is the length of Data_2?
6. Remove the second column from Data_2 and create a new data set called Data_3.
7. List the first six rows (lines) of Data_3.
8. List the last six rows of Data_3.
9. List the first 4 rows of Data_3.
10. Sort the third column of Data_3 and call it Data_4.