Presentation is loading. Please wait.

Presentation is loading. Please wait.

John D. Cook M. D. Anderson Cancer Center

Similar presentations


Presentation on theme: "John D. Cook M. D. Anderson Cancer Center"— Presentation transcript:

1 John D. Cook M. D. Anderson Cancer Center www.JohnDCook.com
Why and how people use R John D. Cook M. D. Anderson Cancer Center

2 What is R? Open source statistical language
De facto standard for statistical research Grew out of Bell Labs’ S (1976, 1988) Influenced by Scheme, Fortran Quirky, flawed, and an enormous success Fortran: unit offset vectors, column-major storage, cryptic names. As Dennis Ritchie said about C.

3 R in data analysis Languages used in Kaggle.com data analysis competition 2011 Source:

4 R in bioinformatics (2012) Bioinformatics means a lot of different things. Imagine that those using Perl are not doing statistics.

5 So what is using R like?

6 "Using R is a bit akin to smoking.
The beginning is difficult, one may get headaches and even gag the first few times. But in the long run, it becomes pleasurable and even addictive. Yet, deep down, for those willing to be honest, there is something not fully healthy in it.“ -- Francois Pinard

7 “… R has a unique and somewhat prickly syntax and tends to have a steeper learning curve than other languages.” Drew Conway John Myles White

8 The R Inferno by Patrick Burns pages/Tutor/R_inferno.pdf

9 What about speed? Refer to Julia slides

10 Maybe 100x slower than C++, though it varies greatly.

11 What about tool support?

12 Limited compared to, for example, first release of Visual Studio (1995).

13 So why do statisticians use R?
“The best thing about R is that it was written by statisticians. The worst thing about R ...” Bo Cowgill, Google

14 R is a DSL To understand a DSL, start with D, not L.
The alternative to R isn’t Python or C#, it’s SAS. People love their DSL, and will use it outside of its domain.

15 What are statisticians like?
Different priorities than software developers Different priorities than mathematicians Learn bits of R in parallel with statistics

16 Why a statistical DSL? Statistical functions easily accessible
Convenient manipulation of tables Vector operations Smooth handling of missing data Patterns for common tasks Bruce Payette: It’s a shell!

17 Some advantages of R Batteries included, one namespace
Contrast Python + matplotlib + SciPy + IPython Designed for interactive data analysis Easier to program than, e.g., SAS Open source, interpreted, portable Succinct notation for querying and filtering Succinct notation for linear regression Open source: i.e. can avoid purchasing bureaucracy, students and colleagues in poor areas can use same software

18 Examples Set all NA elements of x to 0.
x[ is.na(x) ] <- 10 Fit a linear regression model to w as a function of x, y, and z, including a constant term and all first order interaction terms except xz. model ~ lm(w ~ (x + y + z)^2 – x:z)

19 Simple regression growth tannin 12 0 10 1 8 2 11 3 6 4 7 5 2 6 3 7 3 8

20 Regression example > data <- read.table("regression.txt", header=T) > attach(data) > names(data) [1] "growth" "tannin" > model <- lm( growth ~ tannin ) > summary(model) ... Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) e-06 *** tannin *** Residual standard error: on 7 degrees of freedom Multiple R-squared: , Adjusted R-squared: F-statistic: on 1 and 7 DF, p-value:

21 Recycling Strange, dangerous, or convenient, depending on your perspective If you add two vectors of different lengths, the shorter will be recycled to be as long as the longer vector. Example: > 1:6 + c(10, 100)

22 Recycling cont. > x <- matrix(1:6, ncol = 2) > x [,1] [,2]
[1,] [2,] [3,] > x + c(10, 100) [1,] [2,] [3,]

23 Lessons from R for language designers
Users may have very different priorities than computer scientists. Users will put up with a lot of crap to get their work done. Users will a familiar tool over a better tool if at all feasible.

24 Resources http://www.r-project.org/
R_language_for_programmers.html “The Art of R Programming” by Normal Matloff @RLangTip


Download ppt "John D. Cook M. D. Anderson Cancer Center"

Similar presentations


Ads by Google