Using the ‘R’ Language for Bioinformatics

1 Using the ‘R’ Language for Bioinformatics
Based on “R Programming” lecture notes by Stephen Eglen.
Eglen SJ. A quick guide to teaching R programming to computational biology students. PLoS Comput Biol Aug;5(8). PMID:

2 What is R?
Computing environment, similar to Matlab.
Very popular in many areas of statistics and computational biology.
Interactive data analysis tool and programming language for scripts/functions.
Extensive set of built-in statistical functions and graphical display tools.
Publication of data analysis methods via packages.

3 History
S language came from Bell Labs (Becker, Chambers, and Wilks).
Commercial version S-plus (1988).
R developed as a combination of S and Scheme by Ross Ihaka & Robert Gentleman (NZ).
1993: first announcement.
1995: 0.60 release, now under GPL.
Dec 2011: release (stable, multi-platform).
R-core now ~20 people, key academics in the field, including John Chambers.

4 Strengths of R
GPL’d, available on many platforms.
Excellent development team with Apr/Oct release cycle.
Source always available to examine/edit.
Fast for vectorized calculations.
Foreign-language interface (C/Fortran) when speed is crucial, or for interfacing with existing code.
Good collection of numerical/statistical routines.
Comprehensive R Archive Network (CRAN): ~1550 packages.
On-line documentation, with examples.
High-quality graphics (pdf, postscript, quartz, x11, bitmaps).
Often used just for plotting ...
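
The point about vectorized calculations can be illustrated with a small sketch. The vector size and the helper name loop_sum_sq are illustrative assumptions, not taken from the slides, and the timings will vary by machine.

```r
## Sum of squares computed two ways: an explicit loop and a vectorized expression.
n <- 1e5
v <- seq_len(n)

## Hypothetical helper: accumulates the sum one element at a time.
loop_sum_sq <- function(v) {
  total <- 0
  for (val in v) {
    total <- total + val^2
  }
  total
}

loop_result <- loop_sum_sq(v)
vec_result  <- sum(v^2)        # one vectorized call, no explicit loop

all.equal(loop_result, vec_result)

## Compare elapsed times; the vectorized form is typically much faster.
system.time(loop_sum_sq(v))
system.time(sum(v^2))
```

Both forms give the same answer; the vectorized one pushes the loop down into compiled code, which is usually where the speed difference comes from.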

5 R Graphics
[Figures omitted. Credits: Jean YH Yang; gpQuality]

6 Using R
R can run on the server (command line only).
It is nicer to install an R application on your own computer: this gives some menu commands, a bit of a GUI, and a History file.
Package manager to install packages.
Online help.

7 Your first R session
Open R and type the following:
    x <- rnorm(50, mean=4)
    x
    mean(x)
    range(x)
    hist(x)
    ## check help -- how to change title?
    ?hist
    hist(x, main="my first plot")
    q()
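
Because rnorm() draws random numbers, each session prints different values. A minimal sketch of making the draw repeatable, assuming an arbitrary seed of 1 and a hypothetical variable name first.draw (neither appears in the slides):

```r
set.seed(1)                 # arbitrary seed, chosen only for illustration
x <- rnorm(50, mean = 4)
first.draw <- x             # keep a copy of the first draw

set.seed(1)                 # resetting the seed replays the same random stream
x <- rnorm(50, mean = 4)

identical(x, first.draw)    # TRUE
length(x)                   # 50
summary(x)                  # quartiles and mean of the sample
```

Setting a seed is optional for exploration, but it helps when results need to be compared across sessions.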

8 Objects & Functions
R manipulates objects. Each object has a name and a type (vector, matrix, list, ...).
Object names may contain letters (case sensitive), digits, and periods, and must start with a letter.
Objects are set by way of assignment. Use the assignment operator <- rather than = (Does “i = i+1” make sense?)
    x <- 200
    half.x <- x / 2
    ## threshold <-     (value missing in the source)
    age <- c(15, 19, 30)
    age[2]        ## use [] for accessing an element of a vector
    length(age)   ## use () for calling a function
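
A short sketch of these rules in action. The checks with class() and exists() are additions for illustration and are not part of the original slide.

```r
x <- 200
half.x <- x / 2              # periods are allowed inside names
age <- c(15, 19, 30)

class(age)                   # "numeric"
length(age)                  # 3
age[2]                       # 19

## <- reads as "gets", which avoids the odd-looking equation i = i + 1.
i <- 1
i <- i + 1

## Names are case sensitive: 'Age' is a different (here undefined) object.
exists("age")
exists("Age")
```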

9 Functions have Arguments
Usage: round(x, digits = 0)
    x <- c(2.091, , 7.925)   ## middle value missing in the source
    round()                  ## required arg is missing
    round(x)
    round(x, digits = 2)
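
The same pattern of one required argument plus one argument with a default can be sketched with a user-defined function. The function rescale and its argument max.value are hypothetical names invented for this example.

```r
## Hypothetical helper: scale values so the largest equals max.value (default 1).
rescale <- function(x, max.value = 1) {
  x / max(x) * max.value
}

v <- c(2, 4, 8)
rescale(v)                    # max.value falls back to its default of 1
rescale(v, max.value = 100)   # argument supplied by name
rescale(v, 100)               # same call, argument matched by position

round(3.14159, digits = 2)    # the built-in follows the same convention
```

Naming optional arguments, as in the second call, tends to make code easier to read than relying on position.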

10 Operators
Most operators will be familiar, but some may not:
    x <- 10
    x == 4      ## test for equality
    x !=        ## not equal? (operand missing in the source)
    7 %/% 2     ## division, ignoring remainder
    7 %% 2      ## remainder
    x <-        ## assignment (value missing in the source)
Raising to a power can be done in two ways:
    all.equal(10.1 ** 2.5, 10.1^2.5)
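
A small sketch showing how integer division and remainder fit together, and why all.equal() is used above rather than ==. The variable names are illustrative.

```r
a <- 7
b <- 2
quotient  <- a %/% b               # 3
remainder <- a %% b                # 1
quotient * b + remainder == a      # TRUE

## Floating-point results: == can be too strict, all.equal() allows a tolerance.
0.1 + 0.2 == 0.3                   # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE

## Both power operators denote the same operation.
identical(2 ** 10, 2^10)           # TRUE
```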

11 Vectors
Vectors are a fundamental object for R.
Scalars (single values) are treated as a vector of length 1.
    y <- c(10, 20, 40)   ## c() combines values into a vector, assigned here to y
    y[2]                 ## recall 2nd value from y
    length(y)
    x <- 5
    length(x)
Some operations work element by element, others on the whole vector. Try the following:
    y <- c(20, 49, 16, 60, 100)
    min(y)
    range(y)
    sqrt(y)
    log(y)
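
The distinction between element-by-element and whole-vector operations can be sketched as follows. The comparison y > 50 and the logical indexing are additions for illustration.

```r
y <- c(20, 49, 16, 60, 100)

## Element by element: the result has the same length as y.
roots <- sqrt(y)
length(roots)        # 5
y * 2                # the scalar 2 is recycled across all elements
y > 50               # logical vector, one entry per element

## Whole vector: a single value (or a pair, for range()).
min(y)               # 16
range(y)             # 16 100
sum(y > 50)          # 2, counts the TRUE values

## Logical indexing combines the two ideas.
y[y > 50]            # 60 100
```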

12 Strings
Strings are text. Vectors can contain strings or numbers, but not both.
String operators: nchar, substr (like Perl), grep (like Unix)
    s <- c('apple', 'bee', 'cars', 'danish', 'egg')
    nchar(s)
    substr(s, 2, 3)
    grep('e', s)
    grep('^e', s)   ## regexps
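
A sketch of what these calls return for the same vector, plus what happens when strings and numbers are mixed. The value = TRUE argument, grepl(), and the variable mixed are additions for illustration.

```r
s <- c('apple', 'bee', 'cars', 'danish', 'egg')

grep('e', s)                  # 1 2 5  (indices of matching elements)
grep('e', s, value = TRUE)    # "apple" "bee" "egg"
grepl('^e', s)                # FALSE FALSE FALSE FALSE TRUE
substr(s, 2, 3)               # "pp" "ee" "ar" "an" "gg"

## Mixing types in c(): the numbers are coerced to strings.
mixed <- c(1, 'two', 3)
class(mixed)                  # "character"
```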

13 Data frames
Data frame is a special kind of list; all elements are vectors of same length.
This is like a matrix, but each column can be of a different type.
Useful for reading in tabular data from a file (see read.csv).
    names <- c("joe", "fred", "harry")
    a <- c(24, 19, 30)
    ht <- c(1.7, 1.8, 1.75)
    s <- c(TRUE, FALSE, TRUE)
    d <- data.frame(name=names, age=a, height=ht, student=s)
    d$age
    names(d)
    d[2,]   ## access 2nd row
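
Since the slide points to read.csv, here is a minimal sketch of building the same table from CSV text. The inline string csv.text stands in for a file on disk and is an assumption made so the example is self-contained.

```r
## Inline CSV text stands in for a file; read.csv(text = ...) parses it directly.
csv.text <- "name,age,height,student
joe,24,1.7,TRUE
fred,19,1.8,FALSE
harry,30,1.75,TRUE"

d <- read.csv(text = csv.text)

str(d)                            # one column per variable, each with its own type
d[d$student, ]                    # rows where student is TRUE
mean(d$height)                    # average of the height column
d[d$age > 20, c("name", "age")]   # filter rows, then select columns
```

With a real file, the call would be read.csv("path/to/file.csv") with the path replaced by the file's actual location.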

14 Creating Graphs
Plot function
    x <- seq(from=0, to=2*pi, len=1000)
    y <- cos(2*x)
    ## just provide data; sensible labelling
    plot(x, y)
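
A sketch of the same plot with explicit labels, written to a PDF file. The temporary output path, the title text, and the dashed reference line are illustrative choices, not part of the original slide.

```r
x <- seq(from = 0, to = 2 * pi, length.out = 1000)
y <- cos(2 * x)

out.file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(out.file, width = 6, height = 4)     # open a PDF graphics device
plot(x, y, type = "l",
     main = "cos(2x) on [0, 2*pi]",
     xlab = "x (radians)", ylab = "cos(2x)")
abline(h = 0, lty = 2)                   # dashed reference line at y = 0
dev.off()                                # close the device so the file is written

file.exists(out.file)
```

Omitting the pdf() and dev.off() calls draws the plot in an on-screen window instead.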

