Computing for Research I Spring 2013 Primary Instructor: Elizabeth Garrett-Mayer Introduction to R March 5.


1 Computing for Research I Spring 2013 Primary Instructor: Elizabeth Garrett-Mayer Introduction to R March 5

2 Check out online resources
http://people.musc.edu/~elg26/teaching/methods2.2010/R-intro.pdf
http://www.ats.ucla.edu/stat/r/
http://www.statmethods.net/about/learningcurve.html
http://www.mayin.org/ajayshah/KB/R/index.html
http://processtrends.com/Learn_R_Toolkit.htm

3 R. Kabacoff on learning R after SPSS and SAS (http://www.statmethods.net/about/learningcurve.html)
Why R has a Steep Learning Curve: A long answer to a simple question...
I have been a hardcore SAS and SPSS programmer for more than 25 years, a Systat programmer for 15 years and a Stata programmer for 2 years. But when I started learning R recently, I found it frustratingly difficult. Why?
I think that there are two reasons why R can be challenging to learn quickly. First, while there are many introductory tutorials (covering data types, basic commands, the interface), none alone is comprehensive. In part, this is because much of the advanced functionality of R comes from hundreds of user-contributed packages. Hunting for what you want can be time-consuming, and it can be hard to get a clear overview of what procedures are available.
The second reason is more ephemeral. As users of statistical packages, we tend to run one prescribed procedure for each type of analysis. Think of PROC GLM in SAS. We can carefully set up the run with all the parameters and options that we need. When we run the procedure, the resulting output may be a hundred pages long. We then sift through this output, pulling out what we need and discarding the rest.
The paradigm in R is different. Rather than setting up a complete analysis at once, the process is highly interactive. You run a command (say, fit a model), take the results and process them through another command (say, a set of diagnostic plots), take those results and process them through another command (say, cross-validation), etc. The cycle may include transforming the data and looping back through the whole process again. You stop when you feel that you have fully analyzed the data. It may sound trite, but this reminds me of the paradigm shift from top-down procedural programming to object-oriented programming we saw a few years ago. It is not an easy mental shift for many of us to make.
In the end, however, I believe that you will feel much more intimately in touch with your data and in control of your work. And it's fun!

4 Installing R
http://cran.r-project.org/
Choose appropriate interface
– Windows
– Mac
– Linux
Follow install instructions

5 R interface
Batching file: File -> Open script
Run commands: Ctrl-R
Save session output: sink([filename]) … sink()
Quit session: q()
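A minimal sketch of the sink() pattern from this slide. The output file is a temporary file chosen only for illustration; in practice you would supply your own filename.

```r
# Divert printed output to a file, then restore the console.
out_file <- tempfile(fileext = ".txt")  # hypothetical output file
sink(out_file)                          # start sending output to the file
print(summary(c(1, 2, 3, 4, 100)))      # this goes to the file, not the console
sink()                                  # stop diverting; back to the console
saved <- readLines(out_file)            # read back what was captured
print(saved)
```

Note that sink() captures printed output only; it does not save the objects in the workspace.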

6 General Syntax
result <- function(object(s), options…)
function(object(s), options…)
Object-oriented programming
Note that ‘result’ is an object
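The two forms above might look like this in practice. The vector x is made up for illustration, and cars is a data set that ships with R.

```r
x <- c(2, 4, 6, 8)

result <- mean(x)   # assigned: the value is stored in 'result', nothing is printed
mean(x)             # not assigned: the value is printed and then discarded

# 'result' is an object, and so is the output of a model fit
fit <- lm(dist ~ speed, data = cars)
class(result)       # "numeric"
class(fit)          # "lm"
names(fit)          # components stored inside the fitted-model object
```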

7 First things first:
help([function]) or ?function
help.search("linear model") or ??"linear model"
help.start()

8 Choosing your default
setwd("[pathname for directory]")
getwd()
Need "\\" instead of "\" when giving paths
.RData and .Rhistory are saved in this directory
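A short sketch of changing and restoring the working directory. The temporary directory and the Windows-style path are placeholders, not paths from the course.

```r
old_dir <- getwd()        # remember where we started
new_dir <- tempdir()      # hypothetical stand-in for your project directory
setwd(new_dir)
getwd()                   # now reports the new directory

# Inside an R string a single backslash starts an escape sequence,
# so a Windows path needs doubled backslashes (forward slashes also work).
win_path <- "C:\\Users\\me\\project"   # hypothetical path, never used below
n_chars <- nchar("\\")                 # 1: "\\" is a single backslash character

setwd(old_dir)            # go back to the original directory
```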

9 Start with data
read.table
read.csv
scan
dget
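To make the reading functions concrete, the sketch below writes a tiny made-up CSV file to a temporary location and reads it back. The column names (ID, AGE, REGION) are assumptions chosen to match later slides.

```r
# Create a small example file (made-up values)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("ID,AGE,REGION", "1,34,1", "2,61,2", "3,47,1"), csv_file)

data  <- read.csv(csv_file)                              # comma-separated, header row
data2 <- read.table(csv_file, header = TRUE, sep = ",")  # same file via read.table
ages  <- scan(text = "34 61 47")                         # scan reads a plain vector

str(data)   # check that the columns were read as expected
```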

10 Extracting variables from data
Use $: data$AGE
Note it is case-sensitive!
attach([data]) and detach([data])
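A small illustration with a made-up data frame. It shows what happens when the case of a variable name is wrong, and how attach() makes columns available by name.

```r
data <- data.frame(AGE = c(34, 61, 47), REGION = c(1, 2, 1))  # made-up values

data$AGE                        # the AGE column
wrong_case <- data$age          # wrong case: no such column, so this is NULL
is.null(wrong_case)             # TRUE

attach(data)                    # columns become visible by name
mean_age <- mean(AGE)
detach(data)                    # undo the attach when finished
mean_age
```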

11 Descriptive statistics
summary
mean, median
var
quantile
range, max, min
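These functions could be applied to a numeric vector as follows; the ages are invented for the example.

```r
age <- c(34, 61, 47, 52, 29)   # made-up values

summary(age)                   # min, quartiles, median, mean, max in one call
mean(age)                      # 44.6
median(age)                    # 47
var(age)                       # sample variance
quantile(age, probs = c(0.25, 0.75))
range(age)                     # 29 61
max(age) - min(age)            # width of the range
```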

12 Missing values
Sometimes cause ‘error’ message
na.rm=T
na.action=na.omit
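A sketch of the two usual ways to handle NA values. Summary functions such as mean take na.rm, while modeling functions such as lm take an argument spelled na.action. The data are made up.

```r
x <- c(5, NA, 7, 9)
mean(x)                 # NA: a missing value propagates by default
mean(x, na.rm = TRUE)   # 7: the NA is dropped before averaging

# Modeling functions use na.action instead of na.rm
d <- data.frame(x = c(1, 2, NA, 4, 5), y = c(2, 4, 6, 8, 10))
fit <- lm(y ~ x, data = d, na.action = na.omit)
nobs(fit)               # 4: the row with the missing x was omitted
```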

13 Data Objects
data.frame, as.data.frame, is.data.frame
– names([data])
– row.names([data])
matrix, as.matrix, is.matrix
– dimnames([data])
factor, as.factor, is.factor
– levels([factor])
arrays
lists
functions
vectors
scalars
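A brief sketch of converting between these object types and inspecting their labels. All names and values are invented for illustration.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
is.matrix(m)            # TRUE
dimnames(m)             # row and column labels

df <- as.data.frame(m)  # convert the matrix to a data frame
is.data.frame(df)       # TRUE
names(df)               # "a" "b" "c"
row.names(df)           # "r1" "r2"

f <- as.factor(c("low", "high", "low"))
is.factor(f)            # TRUE
levels(f)               # "high" "low" (sorted alphabetically by default)
```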

14 Creating and manipulating
combine: c
cbind: combine as columns
rbind: combine as rows
list: make a list
rep(x,n): repeat x n times
seq(a,b,i): create a sequence between a and b in increments of i
seq(a,b, length=k): create a sequence between a and b with length k with equally spaced increments
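The constructors listed above, sketched with made-up values. The example spells the argument length.out in full; length= works through R's partial argument matching.

```r
v <- c(1, 2, 3)                       # combine values into a vector
as_cols <- cbind(v, v^2)              # 3 rows, 2 columns
as_rows <- rbind(v, v^2)              # 2 rows, 3 columns
stuff <- list(id = 7, values = v)     # a list can mix types and lengths

rep(0, 4)                             # 0 0 0 0
s1 <- seq(0, 1, 0.25)                 # 0.00 0.25 0.50 0.75 1.00
s2 <- seq(0, 1, length.out = 5)       # same sequence, specified by length
```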

15 ifelse
ifelse(condition, true, false)
– agelt50 <- ifelse(data$AGE<50,1,0)
– for equality must use "=="
– "or" is indicated by "|", e.g., young.or.old <- ifelse(data$AGE<… | data$AGE>65,1,0)
cut(x, breaks)
– agegrp <- cut(data$AGE, breaks=c(0,50,60,130))
– agegrp <- cut(data$AGE, breaks=c(0,50,60,130), labels=c(0,1,2))
– agegrp <- cut(data$AGE, breaks=c(0,50,60,130), labels=F)
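A runnable sketch of ifelse and cut on made-up ages. The cutoffs 30 and 65 in the "or" example are assumptions for illustration only, not values taken from the slide.

```r
data <- data.frame(AGE = c(25, 50, 55, 60, 72))   # made-up values

agelt50 <- ifelse(data$AGE < 50, 1, 0)            # 1 0 0 0 0

# Hypothetical cutoffs: flag anyone under 30 or over 65
young.or.old <- ifelse(data$AGE < 30 | data$AGE > 65, 1, 0)   # 1 0 0 0 1

# cut() builds intervals that are closed on the right by default,
# so AGE 50 falls in (0,50] even though agelt50 is 0 for that person
agegrp <- cut(data$AGE, breaks = c(0, 50, 60, 130))
levels(agegrp)                                    # "(0,50]" "(50,60]" "(60,130]"

# labels = FALSE returns integer group codes instead of a factor
agegrp.num <- cut(data$AGE, breaks = c(0, 50, 60, 130), labels = FALSE)
agegrp.num                                        # 1 1 2 2 3
```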

16 Looking at objects
dim
length
sort
attributes
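For instance, on a small made-up matrix and vector:

```r
m <- matrix(c(3, 1, 2, 6, 5, 4), nrow = 2)

dim(m)             # 2 3: rows, then columns
length(m)          # 6: total number of elements
sort(c(3, 1, 2))   # 1 2 3
attributes(m)      # a list; for a plain matrix it holds only $dim
```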

17 Subsetting
Use [ ]
Vectors
– data$AGE[data$REGION==1]
– data$AGE[data$LOS<10]
Matrices & Dataframes
– data[data$AGE<50, ]
– data[, 2:5]
– data[data$AGE<50, 2:5]
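The subsetting expressions above can be tried on a made-up data frame. The ID and COST columns are assumptions added so that columns 2:5 exist.

```r
data <- data.frame(ID     = 1:5,
                   AGE    = c(34, 61, 47, 52, 29),
                   REGION = c(1, 2, 1, 2, 1),
                   LOS    = c(3, 12, 7, 15, 2),
                   COST   = c(10, 40, 25, 55, 8))   # made-up values

age_region1 <- data$AGE[data$REGION == 1]   # 34 47 29
age_shortlos <- data$AGE[data$LOS < 10]     # 34 47 29

under50      <- data[data$AGE < 50, ]       # selected rows, all columns
cols2to5     <- data[, 2:5]                 # all rows, columns 2 through 5
under50_cols <- data[data$AGE < 50, 2:5]    # both at once
under50_cols
```

The comma matters: the part before it selects rows and the part after it selects columns.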

18 Some math
abs(x)
sqrt(x)
x^k
log(x) (natural log, by default)
choose(n,k)
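A few quick evaluations of these functions:

```r
abs(-3)              # 3
sqrt(16)             # 4
2^10                 # 1024
log(exp(1))          # 1: log() is the natural log by default
log(100, base = 10)  # 2: another base can be requested explicitly
choose(5, 2)         # 10: number of ways to choose 2 items from 5
```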

19 Matrix Manipulation
Matrix multiplication: A%*%B
Transpose: t(X)
diag(X)
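A small worked example with a made-up 2 x 2 matrix. Note that matrix() fills column by column, and that * is elementwise while %*% is the matrix product.

```r
A <- matrix(1:4, nrow = 2)   # rows are (1, 3) and (2, 4)
I2 <- diag(2)                # diag(k) builds a k x k identity matrix

AI <- A %*% I2               # multiplying by the identity returns A
AA <- A %*% A                # matrix product: rows (7, 15) and (10, 22)
A * A                        # elementwise product, not the same thing

At <- t(A)                   # transpose: rows are (1, 2) and (3, 4)
d  <- diag(A)                # diag of a matrix extracts its diagonal: 1 4
```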

20 Table
table(x,y)
tabulate(x)
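A sketch with made-up categorical data. table cross-classifies two variables, while tabulate counts occurrences of the integers 1, 2, ... up to the largest value present.

```r
x <- c("a", "b", "a", "a", "b")
y <- c("yes", "no", "no", "yes", "yes")

tab <- table(x, y)            # two-way table of counts
tab
tab["a", "yes"]               # 2: cells can be indexed by label

counts <- tabulate(c(1, 2, 2, 5))
counts                        # 1 2 0 0 1: values 3 and 4 never occur
```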

21 Statistical Tests and CIs
t.test
fisher.test and binom.exact (binom.exact is in the epitools package; base R provides binom.test)
wilcox.test
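A minimal sketch of these tests on simulated data. binom.test from base R is used here as a stand-in for binom.exact, which is not part of base R; the group sizes, means and counts are all invented.

```r
set.seed(1)
g1 <- rnorm(20, mean = 10, sd = 2)   # simulated group 1
g2 <- rnorm(20, mean = 12, sd = 2)   # simulated group 2

tt <- t.test(g1, g2)                 # two-sample t test
tt$conf.int                          # CI for the difference in means
wt <- wilcox.test(g1, g2)            # rank-based alternative
wt$p.value

counts <- matrix(c(8, 2, 3, 7), nrow = 2)   # made-up 2 x 2 table
ft <- fisher.test(counts)            # Fisher's exact test
ft$p.value

bt <- binom.test(7, 20)              # exact test and CI for a proportion
bt$conf.int
```

Each test returns an object, so individual pieces such as the p-value or the confidence interval can be extracted with $.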

22 Plots
hist
boxplot
plot
– pch, type, lwd
– xlab, ylab
– xlim, ylim
– xaxt, yaxt
axis
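A sketch of the plotting functions and options listed above, using simulated data. When run non-interactively, R writes the figures to Rplots.pdf by default.

```r
set.seed(2)
x <- rnorm(50)
y <- 2 * x + rnorm(50)               # simulated data

h <- hist(x)                         # histogram; also returns the bin counts
boxplot(y)

plot(x, y,
     pch = 16, type = "p", lwd = 1,  # solid points
     xlab = "x (simulated)", ylab = "y (simulated)",
     xlim = c(-4, 4), ylim = c(-10, 10),
     xaxt = "n")                     # suppress the default x axis
axis(1, at = -4:4)                   # then draw a custom x axis
```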

23 Plot Layout
par(mfrow=c(2,1))
par(mfrow=c(1,1))
par(mfcol=c(2,2))
help(par)
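One way to use these settings is sketched below with simulated data: set a two-row layout, draw two plots, then restore the previous layout.

```r
set.seed(4)
z <- rnorm(100)                     # simulated data

op <- par(mfrow = c(2, 1))          # 2 rows, 1 column; par() returns the old setting
layout_now <- par("mfrow")          # 2 1
hist(z)                             # goes in the top panel
boxplot(z, horizontal = TRUE)       # goes in the bottom panel
par(op)                             # restore the previous layout
```

mfrow fills the panels row by row; mfcol fills them column by column.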

24 Probability Distributions
Normal:
– rnorm(N,m,s): generate random normal data
– dnorm(x,m,s): density at x for normal with mean m, std dev s
– qnorm(p,m,s): quantile associated with cumulative probability of p for normal with mean m, std dev s
– pnorm(q,m,s): cumulative probability at quantile q for normal with mean m, std dev s
Binomial
– rbinom
– etc.
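The r/d/q/p naming pattern can be sketched as follows. The means, standard deviations and probabilities are arbitrary choices for illustration.

```r
set.seed(3)
draws <- rnorm(1000, mean = 50, sd = 10)   # 1000 random normal values

dens <- dnorm(50, mean = 50, sd = 10)      # density at the mean: 1 / (10 * sqrt(2 * pi))
q975 <- qnorm(0.975, mean = 0, sd = 1)     # about 1.96
p196 <- pnorm(1.96, mean = 0, sd = 1)      # about 0.975

# qnorm and pnorm are inverses of each other
pnorm(qnorm(0.3))                          # 0.3

# The same prefixes exist for the binomial
rbinom(5, size = 10, prob = 0.3)           # 5 random counts out of 10 trials
dbinom(2, size = 2, prob = 0.5)            # 0.25: probability of 2 successes in 2 trials
```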

25 Libraries
Additional packages that can be loaded (next lecture)
Example: epitools library
library(help=[libname])
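A hedged sketch of loading a package. The splines package is used as a stand-in because it is bundled with R; epitools must be installed separately, so it is loaded only if it is already present.

```r
library(splines)                       # stand-in: a package bundled with R
loaded <- "package:splines" %in% search()
loaded                                 # TRUE once the package is attached

# epitools is a contributed package; load it only if it is installed
if (requireNamespace("epitools", quietly = TRUE)) {
  library(epitools)
}

# library(help = splines)              # lists the package's contents (not run here)
```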

26 Keeping things tidy
ls() and objects()
rm()
rm(list=ls())
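A short illustration with two throwaway objects. Be aware that the last command deletes every object in the current workspace, so run it only when that is what you want.

```r
a <- 1
b <- "two"
ls()                                   # lists the objects in the workspace

rm(a)                                  # remove one object by name
a_exists <- exists("a", inherits = FALSE)
print(a_exists)                        # FALSE

rm(list = ls())                        # remove everything in the workspace
print(length(ls()))                    # 0
```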

