Introduction to the R language

Slides:



Advertisements
Similar presentations
An Introduction to R: Logic & Basics. The R language Command line Can be executed within a terminal Within Emacs using ESS (Emacs Speaks Statistics)
Advertisements

R for Macroecology Aarhus University, Spring 2011.
Training on R For 3 rd and 4 th Year Honours Students, Dept. of Statistics, RU Empowered by Higher Education Quality Enhancement Project (HEQEP) Department.
Introduction to M ATLAB Programming Ian Brooks Institute for Climate & Atmospheric Science School of Earth & Environment
Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
MATLAB – What is it? Computing environment / programming language Tool for manipulating matrices Many applications, you just need to get some numbers in.
R tutorial g/methods2.2010/R-intro.pdf.
Introduction to GTECH 201 Session 13. What is R? Statistics package A GNU project based on the S language Statistical environment Graphics package Programming.
R for Research Data Analysis using R Day1: Basic R Baburao Kamble University of Nebraska-Lincoln.
CS Lecture 03 Outline Sed and awk from previous lecture Writing simple bash script Assignment 1 discussion 1CS 311 Operating SystemsLecture 03.
An introduction to R Honors 207 Cognitive Science (These Slides were Shamelessly Stolen from Dr. Pablo Gomez, DePaul University)
Introduction to microarray data analysis with Bioconductor Katherine S. Pollard March 11, 2004 © Copyright 2004, all rights reserved.
Alternative text for elementary statistics –Elementary Concepts –Basic Statistics.
Guide To UNIX Using Linux Third Edition
EGR 105 Foundations of Engineering I Session 3 Excel – Basics through Graphing Fall 2008.
How to Use the R Programming Language for Statistical Analyses Part I: An Introduction to R Jennifer Urbano Blackford, Ph.D. Department of Psychiatry Kennedy.
Introduction to Array The fundamental unit of data in any MATLAB program is the array. 1. An array is a collection of data values organized into rows and.
Introduction to R: The Basics Rosales de Veliz L., David S.L., McElhiney D., Price E., & Brooks G. Contributions from Ragan. M., Terzi. F., & Smith. E.
Statistical Software An introduction to Statistics Using R Instructed by Jinzhu Jia.
AN ENGINEER’S GUIDE TO MATLAB
MATLAB and SimulinkLecture 11 To days Outline  Introduction  MATLAB Desktop  Basic Features  Branching Statements  Loops  Script file / Commando.
Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1) 1.
AN INTRODUCTION TO GRAPHICS IN R. Today Overview Overview –Gallery of R Graph examples High-Level Plotting Functions High-Level Plotting Functions Low-Level.
Introduction to MATLAB Session 1 Prepared By: Dina El Kholy Ahmed Dalal Statistics Course – Biomedical Department -year 3.
Hands-on Introduction to R. Outline R : A powerful Platform for Statistical Analysis Why bother learning R ? Data, data, data, I cannot make bricks without.
Data, graphics, and programming in R 28.1, 30.1, Daily:10:00-12:45 & 13:45-16:30 EXCEPT WED 4 th 9:00-11:45 & 12:45-15:30 Teacher: Anna Kuparinen.
Introduction to to R Emily Kalah Gade University of Washington Credit to Kristin Siebel for development of much of this PowerPoint.
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 6 Value- Returning Functions and Modules.
Session 3: More features of R and the Central Limit Theorem Class web site: Statistics for Microarray Data Analysis.
Piotr Wolski Introduction to R. Topics What is R? Sample session How to install R? Minimum you have to know to work in R Data objects in R and how to.
Eng Ship Structures 1 Introduction to Matlab.
Introduction to MATLAB Session 3 Simopekka Vänskä, THL Department of Mathematics and Statistics University of Helsinki 2011.
1 System Administration Introduction to Scripting, Perl Session 3 – Sat 10 Nov 2007 References:  chapter 1, The Unix Programming Environment, Kernighan.
Introduction to R / sma / Bioconductor Statistics for Microarray Data Analysis The Fields Institute for Research in Mathematical Sciences May 25, 2002.
Vectors and Matrices In MATLAB a vector can be defined as row vector or as a column vector. A vector of length n can be visualized as matrix of size 1xn.
Matlab Programming for Engineers Dr. Bashir NOURI Introduction to Matlab Matlab Basics Branching Statements Loops User Defined Functions Additional Data.
Installing R CRAN: –(R homepage: –Windows 95 and later  Base –rw2001.exe.
Hands-on Introduction to R. We live in oceans of data. Computers are essential to record and help analyse it. Competent scientists speak C/C++, Java,
R Programming Yang, Yufei. Normal distribution.
R packages/libraries Data input/output Rachel Carroll Department of Public Health Sciences, MUSC Computing for Research I, Spring 2014.
Lecture 26: Reusable Methods: Enviable Sloth. Creating Function M-files User defined functions are stored as M- files To use them, they must be in the.
STAT 534: Statistical Computing Hari Narayanan
INTRODUCTION TO MATLAB Dr. Hugh Blanton ENTC 4347.
Digital Image Processing Introduction to MATLAB. Background on MATLAB (Definition) MATLAB is a high-performance language for technical computing. The.
© 2015 by Wade Rogers Introduction to R Cytomics Workshop December, 2015.
Postgraduate Computing Lectures PAW 1 PAW: Physicist Analysis Workstation What is PAW? –A tool to display and manipulate data. Learning PAW –See ref. in.
R objects  All R entities exist as objects  They can all be operated on as data  We will cover:  Vectors  Factors  Lists  Data frames  Tables 
Data & Graphing vectors data frames importing data contingency tables barplots 18 September 2014 Sherubtse Training.
CS100A, Fall 1998, Lecture 201 CS100A, Fall 1998 Lecture 20, Tuesday Nov 10 More Matlab Concepts: plotting (cont.) 2-D arrays Control structures: while,
PHP Tutorial. What is PHP PHP is a server scripting language, and a powerful tool for making dynamic and interactive Web pages.
Lecture 11 Introduction to R and Accessing USGS Data from Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2013 This work was funded by National.
Arrays Chap. 9 Storing Collections of Values 1. Introductory Example Problem: Teachers need to be able to compute a variety of grading statistics for.
1-2 What is the Matlab environment? How can you create vectors ? What does the colon : operator do? How does the use of the built-in linspace function.
1 Introduction to R A Language and Environment for Statistical Computing, Graphics & Bioinformatics Introduction to R Lecture 3
16BIT IITR Data Collection Module If you have not already done so, download and install R from download.
Programming in R Intro, data and programming structures
Introduction to R Samal Dharmarathna.
Digital Text and Data Processing
Introduction to R.
Introduce to R chris.
Introduction to Python
Use of Mathematics using Technology (Maltlab)
PHP.
Web DB Programming: PHP
Basics of R, Ch Functions Help Managing your Objects
Topics Introduction to Value-returning Functions: Generating Random Numbers Writing Your Own Value-Returning Functions The math Module Storing Functions.
Vectors and Matrices In MATLAB a vector can be defined as row vector or as a column vector. A vector of length n can be visualized as matrix of size 1xn.
Data analysis with R and the tidyverse
Course: Statistics in Bioinformatics Date: 指導教授: 陳光琦 學生: 吳昱賢
R tutorial
Presentation transcript:

Introduction to the R language Ahmed Rebai Bioinformatics and Comparative Genome Analysis

Websites R www.r-project.org software; documentation; RNews. Bioconductor www.bioconductor.org software, data, and documentation; training materials from short courses; www.bioconductor.org/workshops/UCSC03/ucsc03.html mailing list.

R language: Overview Open source and open development. Design and deployment of portable, extensible, and scalable software. Interoperability with other languages: C, XML. Variety of statistical and numerical methods. High quality visualization and graphics tools. Effective, extensible user interface. Innovative tools for producing documentation and training materials: vignettes. Supports the creation, testing, and distribution of software and data modules: packages.

R user interface Batch or command line processing Graphics windows bash$ R to start R> q() to quit Graphics windows > X11() > postscript() > dev.off() File path is relative to working directory > getwd() > setwd() Load a package library with library() GUIs, tcltk

Getting Help Details about a specific command whose name you know (input arguments, options, algorithm): > ? t.test > help(t.test) See an example of usage: > demo(graphics) > example(mean) mean> x <- c(0:10, 50) mean> xm <- mean(x) mean> c(xm, mean(x, trim = 0.1)) [1] 8.75 5.50

Getting Help HTML search engine lets you search for topics with regular expressions: > help.search Find commands containing a regular expression or object name: > apropos("var") [1] "var.na" ".__M__varLabels:Biobase" [3] "varLabels" "var.test" [5] "varimax" "all.vars" [7] "var" "variable.names" [9] "variable.names.default" "variable.names.lm"

Getting Help Vignettes contain text and executable code: > library(tkWidgets) > vExplorer() > openVignette() Created using the Sweave() function. .Rnw files produce a PDF file and a vignette. To see code for a function, type the name with no parentheses or arguments: > plot

R as a Calculator > log2(32) [1] 5 > print(sqrt(2)) [1] 1.414214 > pi [1] 3.141593 > seq(0, 5, length=6) [1] 0 1 2 3 4 5 > 1+1:10 [1] 2 3 4 5 6 7 8 9 10 11

R as a Graphics Tool > plot(sin(seq(0, 2*pi, length=100)))

Variables > a <- 49 numeric > sqrt(a) [1] 7 > b <- "The dog ate my homework" > sub("dog","cat",b) [1] "The cat ate my homework" > c <- (1+1==3) > c [1] FALSE > as.character(b) [1] "FALSE" numeric character string logical

Missing Values Variables of each data type (numeric, character, logical) can also take the value NA: not available. o NA is not the same as 0 o NA is not the same as “” o NA is not the same as FALSE o NA is not the same as NULL Operations that involve NA may or may not produce NA: > NA==1 [1] NA > 1+NA > max(c(NA, 4, 7)) > max(c(NA, 4, 7), na.rm=T) [1] 7 > NA | TRUE [1] TRUE > NA & TRUE [1] NA

Vectors vector: an ordered collection of data of the same type > a <- c(1,2,3) > a*2 [1] 2 4 6 Example: the mean spot intensities of all 15488 spots on a microarray is a numeric vector In R, a single number is the special case of a vector with 1 element. Other vector types: character strings, logical

Matrices and Arrays matrix: rectangular table of data of the same type Example: the expression values for 10000 genes for 30 tissue biopsies is a numeric matrix with 10000 rows and 30 columns. array: 3-,4-,..dimensional matrix Example: the red and green foreground and background values for 20000 spots on 120 arrays is a 4 x 20000 x 120 (3D) array.

Lists list: ordered collection of data of arbitrary types. Example: > doe <- list(name="john",age=28,married=F) > doe$name [1] "john“ > doe$age [1] 28 > doe[[3]] [1] FALSE Typically, vector elements are accessed by their index (an integer) and list elements by $name (a character string). But both types support both access methods. Slots are accessed by @name.

Data Frames data frame: rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types. Represents the typical data table that researchers come up with – like a spreadsheet. Example: > a <-data.frame(localization,tumorsize,progress,row.names=patients) > a localization tumorsize progress XX348 proximal 6.3 FALSE XX234 distal 8.0 TRUE XX987 proximal 10.0 FALSE

What type is my data? Class from which object inherits (vector, matrix, function, logical, list, … ) mode Numeric, character, logical, … storage.mode typeof Mode used by R to store object (double, integer, character, logical, …) is.function Logical (TRUE if function) is.na Logical (TRUE if missing) names Names associated with object dimnames Names for each dim of array slotNames Names of slots of BioC objects attributes Names, class, etc.

Subsetting Individual elements of a vector, matrix, array or data frame are accessed with “[ ]” by specifying their index, or their name > a localization tumorsize progress XX348 proximal 6.3 0 XX234 distal 8.0 1 XX987 proximal 10.0 0 > a[3, 2] [1] 10 > a["XX987", "tumorsize"] > a["XX987",] XX987 proximal 10 0

Example: subset rows by a vector of indices localization tumorsize progress XX348 proximal 6.3 0 XX234 distal 8.0 1 XX987 proximal 10.0 0 > a[c(1,3),] > a[-c(1,2),] > a[c(T,F,T),] > a$localization [1] "proximal" "distal" "proximal" > a$localization=="proximal" [1] TRUE FALSE TRUE > a[ a$localization=="proximal", ] subset rows by a vector of indices subset rows by a logical vector subset columns comparison resulting in logical vector subset the selected rows

Functions and Operators Functions do things with data “Input”: function arguments (0,1,2,…) “Output”: function result (exactly one) Example: add <- function(a,b) { result <- a+b return(result) } Operators: Short-cut writing for frequently used functions of one or two arguments.

Frequently used operators <- Assign + Sum - Difference * Multiplication / Division ^ Exponent %% Mod %*% Dot product %/% Integer division %in% Subset | Or & And < Less > Greater <= Less or = >= Greater or = ! Not != Not equal == Is equal

Frequently used functions Concatenate cbind,rbind Concatenate vectors min Minimum max Maximum length # values dim # rows, cols floor Max integer in which TRUE indices table Counts summary Generic stats Sort, order, rank Sort, order, rank a vector print Show value cat Print as char paste c() as char round Round apply Repeat over rows, cols

Statistical functions rnorm, dnorm, pnorm, qnorm Normal distribution random sample, density, cdf and quantiles lm, glm, anova Model fitting loess, lowess Smooth curve fitting sample Resampling (bootstrap, permutation) .Random.seed Random number generation mean, median Location statistics var, cor, cov, mad, range Scale statistics svd, qr, chol, eigen Linear algebra

Graphical functions plot Generic plot eg: scatter points Add points lines, abline Add lines text, mtext Add text legend Add a legend axis Add axes box Add box around all axes par Plotting parameters (lots!) colors, palette Use colors

Branching if (logical expression) { statements } else { alternative statements else branch is optional { } are optional with one statement ifelse (logical expression, yes statement, no statement)

Loops When the same or similar tasks need to be performed multiple times; for all elements of a list; for all columns of an array; etc. for(i in 1:10) { print(i*i) } i<-1 while(i<=10) { i<-i+sqrt(i) Also: repeat, break, next

Regular Expressions Tools for text matching and replacement which are available in similar forms in many programming languages (Perl, Unix shells, Java) > a <- c("CENP-F","Ly-9", "MLN50", "ZNF191", "CLH-17") > grep("L", a) [1] 2 3 5 > grep("L", a, value=T) [1] "Ly-9" "MLN50" "CLH-17" > grep("^L", a, value=T) [1] "Ly-9" > grep("[0-9]", a, value=T) [1] "Ly-9" "MLN50" "ZNF191" "CLH-17" > gsub("[0-9]", "X", a) [1] "CENP-F" "Ly-X" "MLNXX" "ZNFXXX" "CLH-XX"

Storing Data Every R object can be stored into and restored from a file with the commands “save” and “load”. This uses the XDR (external data representation) standard of Sun Microsystems and others, and is portable between MS-Windows, Unix, Mac. > save(x, file=“x.Rdata”) > load(“x.Rdata”)

Importing and Exporting Data There are many ways to get data in and out. Most programs (e.g. Excel), as well as humans, know how to deal with rectangular tables in the form of tab-delimited text files. > x <- read.delim(“filename.txt”) Also: read.table, read.csv, scan > write.table(x, file=“x.txt”, sep=“\t”) Also: write.matrix, write

Importing Data: caveats Type conversions: by default, the read functions try to guess and auto convert the data types of the different columns (e.g. number, factor, character). There are options as.is and colClasses to control this. Special characters: the delimiter character (space, comma, tabulator) and the end-of-line character cannot be part of a data field. To circumvent this, text may be “quoted”. However, if this option is used (the default), then the quote characters themselves cannot be part of a data field. Except if they themselves are within quotes… Understand the conventions your input files use and set the quote options accordingly.

Bioconductor R software project for the analysis and comprehension of biomedical and genomic data. Gene expression arrays (cDNA, Affymetrix) Pathway graphs Genome sequence data Started in 2001 by Robert Gentleman, Dana Farber Cancer Institute. About 25 core developers, at various institutions in the US and Europe. Tools for integrating biological metadata from the web (annotation, literature) in the analysis of experimental metadata. End-user and developer packages.

Example R/BioC Packages methods Class/method tools tools tkWidgets Sweave(),gui tools marrayTools, marrayPlots Spotted cDNA array analysis affy Affymetrix array analysis annotate Link microarray data to metadata on the web mva, cluster, clust, class Clustering and classification t.test, prop.test, wilcox.test Statistical tests