Published by Cody Little. Modified over 9 years ago.
1
Ann Arbor ASA
‘Up and Running’ With R
Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association, in cooperation with the Department of Statistics and the Center for Statistical Consultation and Research of the University of Michigan
October 27th, 2010
2
R Class Agenda
Brief Introduction to R
Using R Help
Introduction to Functions Available in R
Working with Data
Importing/Exporting Data
Graphs
Simple Models
Writing Functions/Programming
Ann Arbor ASA (Up and Running with R)
3
What is R?
R is a computing language commonly used for statistical analysis.
R is open source, which means that the source code is available to all users.
R is free software; download it at http://www.r-project.org/
4
More About R
Most statistical analysis in R is done using predefined functions.
These functions are available in many different packages.
When you download R, you have access to many functions from the 'base' package.
More advanced functions will require that you download other packages.
5
What can you do with R?
Many statistical methods are readily available, such as linear modeling, linear mixed modeling, multivariate analysis, clustering, non-parametric methods, and classification.
R is well known for producing high-quality graphics.
Simple plots are easy, and with a little more practice users can produce publishable graphics!
6
Time to Launch R
Find R on your computer: Start>Statistical Software Packages>R
Go to the File menu and choose 'New script'.
This opens the editor window, where we will type our script.
It is more convenient to type here than in your workspace.
Try typing in both the workspace and the editor window.
7
Data Objects in R
Users create different data objects in R.
Data objects include variables, arrays of numbers, character strings, functions, and other more complicated structures.
The assignment operator '<-' lets you give data objects names of your choice.
Type 'a <- 7' in your editor window.
Submit this command by highlighting it and pressing Ctrl+R.
Practice creating different data objects and submit them to the workspace.
8
Data Objects in R
Type 'objects()'.
This shows that you have created the object 'a' during this R session.
You can recall previously submitted commands by using the up/down arrow keys.
You can remove this object by typing 'rm(a)'.
Try removing some objects you created, then type 'objects()' to check that they are no longer listed.
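A short sketch of the create/list/remove cycle described on the last two slides. The objects b and s are made up for illustration and are not part of the original class materials.

```r
# Create a few objects with the assignment operator
a <- 7
b <- c(1.5, 2.5, 3.5)   # a numeric vector (hypothetical example object)
s <- "hello"            # a character string (hypothetical example object)

# List the objects created so far in this session
objects()

# Remove one object and check whether it is still listed
rm(a)
a_still_listed <- "a" %in% objects()
a_still_listed
```

After rm(a), only the objects you have not removed should remain in the listing.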
9
Getting Help in R
To get help on a specific function, type 'help(function_name)' or '?function_name'.
These look only in the packages that are currently loaded, so sometimes no help page is found.
In that case type '??function_name', which searches the help pages of all installed packages.
Try searching for help on 'hist' or 'lm'.
Two popular R resource websites: Rseek.org and nabble.com
10
A Simple Example to Get You Started
To set up a vector named x, use the R command:
    x <- c(5, 4, 3, 6)
This is an assignment statement using the function c(), which creates a vector by concatenating its arguments.
Perform vector/matrix arithmetic (applied element by element):
    v <- 3*x - 5
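To make the element-by-element behaviour concrete, here is the slide's example with the values it produces written out.

```r
x <- c(5, 4, 3, 6)
v <- 3*x - 5    # each element of x is multiplied by 3, then 5 is subtracted
v               # 10 7 4 13
length(x)       # 4
```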
11
R Reference Card (created by Tom Short)
There are thousands of functions available in R; this Reference Card covers enough of them to give a strong working knowledge.
Let’s take a minute to look at the organization of the Reference Card and try out a few of the functions available!
12
Generating Sequences/Replicating Objects
Sequences: submit the following commands
    seq(-5, 5, by=.2)
    seq(length=51, from=-5, by=.2)
Both produce a sequence from -5 to 5 in steps of 0.2.
Replications: submit the following commands
    rep(x, times=5)
    rep(x, each=5)
Both replicate x 5 times, but in a different order: 'times' repeats the whole vector 5 times, while 'each' repeats each element 5 times before moving on to the next.
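A small sketch comparing the two seq() calls and the two rep() calls. It uses 2 replications instead of 5 only to keep the printed output short; x is the vector from the earlier example.

```r
# Two ways of writing the same sequence
s1 <- seq(-5, 5, by = 0.2)
s2 <- seq(length = 51, from = -5, by = 0.2)
length(s1)                  # 51
isTRUE(all.equal(s1, s2))   # TRUE

# 'times' versus 'each'
x <- c(5, 4, 3, 6)
r_times <- rep(x, times = 2)   # 5 4 3 6 5 4 3 6
r_each  <- rep(x, each = 2)    # 5 5 4 4 3 3 6 6
r_times
r_each
```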
13
Working with Data Sets
There are many data sets available for use in R.
Type 'data()' to see what’s available.
We will work with the trees data set.
Type 'data(trees)'; this data set is now ready to use in R.
The following are useful commands:
    summary(trees)   # summary of variables
    dim(trees)       # dimensions of the data set
    names(trees)     # see variable names
    attach(trees)    # attach the variable names for use in R
14
Extracting Data
R has saved the data set trees as a data frame object; check this by typing 'class(trees)'.
A data frame is indexed like a matrix, by row and column: data.frame[rows, columns]
    trees[c(1:2), 2]                 # the first 2 rows of the 2nd column
    trees[3, c("Height", "Girth")]   # columns can also be referenced by name
    trees[-c(10:20), "Height"]       # variable Height, skipping rows 10-20
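The commands from the last two slides can be tried together on the built-in trees data. The final line, which uses the $ operator instead of attach(), is an addition for illustration and not part of the original slides.

```r
data(trees)
class(trees)    # "data.frame"
dim(trees)      # 31 3
names(trees)    # "Girth" "Height" "Volume"

trees[1:2, 2]                      # Height for rows 1 and 2
trees[3, c("Height", "Girth")]     # one row, two columns selected by name
h <- trees[-c(10:20), "Height"]    # Height with rows 10 through 20 dropped
length(h)                          # 20

# A column can also be reached without attach()
mean(trees$Height)
```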
15
Extracting Data (continued)
The subset() command is very useful for extracting data that meet a logical condition.
The 1st argument is the data; the 2nd argument is the logical subset requirement.
    subset(trees, Height>80)    # subset where all tree heights >80
'subset(trees, Height 10)': subset where all tree heights 10
'subset(trees, Height 11)': subset where all tree heights 11
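Conditions can also be combined with & (and) and | (or). The sketch below shows one simple and two compound subsets; the particular thresholds and operators in the compound examples are illustrative assumptions and are not taken from the slide.

```r
# Single condition, as on the slide
tall <- subset(trees, Height > 80)

# Compound conditions (thresholds chosen for illustration only)
both   <- subset(trees, Height < 80 & Girth > 10)   # both conditions must hold
either <- subset(trees, Height < 80 | Girth > 11)   # at least one must hold

nrow(tall)
nrow(both)
nrow(either)
```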
16
Importing Data
The most common (and easiest) file to import is a text file, using the read.table() command.
R needs to be told where the file is located.
You can set the working directory, which tells R where all your files are located, by typing
    setwd("C:\\Users\\hicksk\\Desktop")
OR you can point to the working directory through the menu by going to File>Change dir… and choosing the location of your files
OR you can include the full path to your file in your read.table() command
17
Using the read.table() command
Go to the ASA Ann Arbor Chapter’s website, look under the R Classes section, open 'furniture.zip' and save the files to your desktop.
Remember, we must tell R where these files are located to read them in properly:
    read.table("C:\\Users\\hicksk\\Desktop\\furniture.txt", header=TRUE, sep="")
In a Windows path it is important to use double backslashes \\ rather than a single backslash \, because R treats a single backslash inside a string as the start of an escape sequence. Forward slashes / also work.
Tell R whether your data has column names with header=TRUE or header=FALSE.
sep="" tells R that the fields are separated by whitespace.
18
Using read.table() (cont’d)
Remember, another way of specifying the file’s location is to set the working directory first and then read in the file:
    setwd("C:\\Users\\hicksk\\Desktop")
    read.table("furniture.txt", header=TRUE, sep="")
OR we can set the working directory through the menu by going to File>Change dir… and choosing the folder that contains the file. We can then read the file with the same command:
    read.table("furniture.txt", header=TRUE, sep="")
19
read.table(), read.csv() and Missing Values
It is also popular to import csv files, since Excel files are easily converted to csv files.
read.csv() and read.table() are very similar, although they handle missing values differently.
read.csv() automatically assigns NA to empty fields in numeric columns.
read.table() with sep="" cannot detect an empty field in a whitespace-separated file: the row simply has too few fields and the read fails. You must therefore write NA in place of missing values before reading the file into R.
20
read.table(), read.csv() and Missing Values (cont’d)
Let’s remove a data entry from both "furniture.txt" and "furniture.csv".
From the first row, erase 100 from the Area column.
Now try to read in the data from these two files using read.table() and read.csv().
You should see that read.table() cannot read the data unless you put an entry (such as NA) in place of the missing value.
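If you do not have the furniture files at hand, the same behaviour can be reproduced with two small temporary files. This is only a sketch: the rows below are made-up stand-ins and are not the real furniture data, although the column names Area, Cost and Type follow the slides.

```r
# Made-up stand-in data written to temporary files
csv_file <- tempfile(fileext = ".csv")
txt_file <- tempfile(fileext = ".txt")
writeLines(c("Area,Cost,Type", ",120,chair", "150,200,table"), csv_file)
writeLines(c("Area Cost Type", "120 chair", "150 200 table"), txt_file)

# read.csv(): the empty first field is read as NA
csv_data <- read.csv(csv_file)
csv_data$Area

# read.table(): row 1 has only two fields, so the read fails with an error
txt_result <- tryCatch(
  read.table(txt_file, header = TRUE, sep = ""),
  error = function(e) e
)
inherits(txt_result, "error")

# Writing NA in place of the missing entry lets read.table() succeed
writeLines(c("Area Cost Type", "NA 120 chair", "150 200 table"), txt_file)
txt_data <- read.table(txt_file, header = TRUE, sep = "")
txt_data$Area
```

tryCatch() is used here only so that the script keeps running after the failed read; typed directly in the workspace, the failing read.table() call simply prints an error message.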
21
Other Options for Importing Data
When you download R, you should have automatically obtained the foreign package.
By submitting 'library(foreign)', you will have many more options for importing data: read.xport(), read.spss(), read.dta(), read.mtp()
For more information on these options, submit help() on the function of interest, for example 'help(read.spss)'.
22
Exporting Data
You can export data by using the write.table() command:
    write.table(trees, "treesDATA.txt", row.names=FALSE, sep=",")
The 1st argument specifies that we want the trees data set exported.
The 2nd argument is the name of the file to be written. By default the file is written to the current working directory unless you give a full path.
row.names=FALSE tells R that we do not wish to write the row names.
sep="," tells R to write the file comma-delimited.
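A quick way to check an export is to read the file back in. The sketch below writes to R's temporary directory rather than the working directory, which is a choice made here so the example leaves no files behind.

```r
# Write the trees data to a comma-delimited file in a temporary folder
out_file <- file.path(tempdir(), "treesDATA.txt")
write.table(trees, out_file, row.names = FALSE, sep = ",")

# Look at the first few lines of the file that was written
readLines(out_file, n = 3)

# Read it back and compare dimensions with the original
trees_back <- read.csv(out_file)
dim(trees_back)
dim(trees)
```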
23
Furniture Data Set
Let’s assign a name to the furniture data set as we read it in so we can do some analysis:
    furn <- read.table("furniture.txt", sep="", header=TRUE)
To get a better understanding of our data set, use some of the commands we saw earlier:
    dim(furn)
    summary(furn)
    names(furn)
    attach(furn)
24
Graphs in R Using the Furniture Data
R can produce both very simple and very complex graphs.
We will only get a brief introduction today, but I encourage you to investigate further.
Let’s start by making a simple scatter plot of the Area and Cost variables from our furniture data set:
    plot(Area, Cost, main="Area vs Cost", xlab="Area", ylab="Cost")
We have told R to put Area on the x-axis and Cost on the y-axis, and provided a title and axis labels.
25
Graphs in R
Let’s look at the distribution of our variables using some different graphs in R:
    hist(Area)             # histogram of Area
    hist(Cost)             # histogram of Cost
    boxplot(Cost ~ Type)   # boxplot of Cost by Type
We can make the boxplot much prettier:
    boxplot(Cost ~ Type, main="Boxplot of Cost by Type", col=c("orange", "green", "blue"), xlab="Type", ylab="Cost")
26
Graphs in R
We can also look at a scatter plot matrix of all variables in a data set by using the pairs() function:
    pairs(furn)
Or we can look at a correlation/covariance matrix of the numeric variables:
    cor(furn[, c(2:3)])
    cov(furn[, c(2:3)])
27
Graphs in R/Simple Models
Let’s perform a simple linear regression using the furniture data set:
    m1 <- lm(Cost ~ Area)
    summary(m1)
    coef(m1)
    fitted.values(m1)
    residuals(m1)
We can also plot the residuals against the fitted values:
    plot(fitted.values(m1), residuals(m1))
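The same steps can be tried without the furniture file by using the built-in trees data. In this sketch Volume and Girth stand in for Cost and Area; that substitution is an assumption made for illustration, and the object name m is hypothetical.

```r
# Simple linear regression of Volume on Girth (trees data as a stand-in)
m <- lm(Volume ~ Girth, data = trees)

summary(m)                 # coefficient table, R-squared, etc.
coef(m)                    # intercept and slope
head(fitted.values(m))     # first few fitted values
head(residuals(m))         # first few residuals

# With an intercept in the model, least-squares residuals sum to
# (numerically) zero
sum(residuals(m))
```

Passing data = trees to lm() avoids the need for attach(); both approaches find the variables.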
28
Graphs in R/Simple Models
Let’s continue with our scatter plot of Area and Cost:
    plot(Area, Cost, main="Cost Regression Example", xlab="Area", ylab="Cost")
    abline(lm(Cost ~ Area), col=3, lty=1)
    lines(lowess(Cost ~ Area), col=3, lty=2)
Now let’s interactively add a legend:
    legend(locator(1), c("Linear", "Lowess"), lty=c(1,2), col=3)
You can click on your graph to place the legend where you wish!
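locator() needs an interactive graphics window, so a script that runs unattended can place the legend with a keyword such as "topleft" instead. The sketch below does this with the trees data as a stand-in for the furniture data and saves the plot to a PDF; the file name is hypothetical.

```r
# Save the plot to a PDF in the temporary folder (hypothetical file name)
plot_file <- file.path(tempdir(), "trees_regression.pdf")
pdf(plot_file)

plot(trees$Girth, trees$Volume, main = "Volume Regression Example",
     xlab = "Girth", ylab = "Volume")
abline(lm(Volume ~ Girth, data = trees), col = 3, lty = 1)    # linear fit
lines(lowess(trees$Girth, trees$Volume), col = 3, lty = 2)    # lowess smooth

# Fixed legend position instead of locator(1)
legend("topleft", c("Linear", "Lowess"), lty = c(1, 2), col = 3)

dev.off()   # close the PDF device so the file is written
```

To draw on screen instead, leave out the pdf() and dev.off() lines.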
29
Graphs in R/Simple Models
Now let’s identify different points on the graph:
    identify(Area, Cost, row.names(furn))
This makes it easy to identify outliers.
We can use the locator() command to quantify differences between the regression fit and the lowess line:
    locator(2)
Now let’s compare predicted values of Cost when Area is equal to 250.
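One possible way to make such a comparison numerically, rather than by clicking on the graph, is to use predict() for the linear model and to interpolate the lowess curve with approx(). This is a sketch on the trees data (Girth = 12 standing in for Area = 250), not necessarily the method used in the class.

```r
# Linear model prediction at a chosen predictor value
fit <- lm(Volume ~ Girth, data = trees)
pred_lm <- predict(fit, newdata = data.frame(Girth = 12))

# lowess() returns the smoothed curve as x/y points;
# approx() interpolates linearly between those points
smooth <- lowess(trees$Girth, trees$Volume)
pred_lowess <- approx(smooth$x, smooth$y, xout = 12, ties = mean)$y

pred_lm
pred_lowess
pred_lm - pred_lowess   # difference between the two fits at Girth = 12
```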
30
Multivariate Analysis
Now let’s fit a multiple regression (more than one predictor) using both Area and Type as predictors in the model:
    m2 <- lm(Cost ~ Area + Type)
    summary(m2)
Now let’s see if this larger model is significantly better than the simple model by using ANOVA:
    anova(m1, m2)
The ANOVA table compares the two nested regression models by testing the null hypothesis that the Type predictor is not needed in the model.
Since the p-value < .05, we have evidence to conclude that Type is an important predictor.
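The same nested-model comparison can be sketched with the trees data, where Height plays the role of the added predictor. The model names tm1 and tm2 are hypothetical and chosen so they do not overwrite m1 and m2.

```r
# Smaller model and larger (nested) model
tm1 <- lm(Volume ~ Girth, data = trees)
tm2 <- lm(Volume ~ Girth + Height, data = trees)

# F test of the null hypothesis that Height is not needed
cmp <- anova(tm1, tm2)
cmp

# The p-value sits in the second row of the table
p_value <- cmp[2, "Pr(>F)"]
p_value < 0.05   # TRUE would be evidence that Height improves the model
```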
31
Writing Functions
You can easily write your own programs and functions in R.
Type in the following function named f1:
    f1 <- function(m, n) {
      result <- m + n
      return(result)
    }
Now type 'f1(3,5)'; you should see 8, the sum of the values 3 and 5 that you passed in.
32
Working with If-Then Statements
Here’s an example of how if-then works in R:
You’ll see that since 10>5, it printed "GO BLUE".
You can tell R to do multiple things using the following structure:
    if (logical condition) {do this and this and this}
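The code for this example does not appear in the transcript. A minimal sketch consistent with the description (10>5, prints "GO BLUE") might look like this; the variable msg in the second part is made up for illustration.

```r
# One action when the condition is TRUE
if (10 > 5) print("GO BLUE")

# Several actions inside braces
if (10 > 5) {
  msg <- "GO BLUE"
  print(msg)
  print(nchar(msg))   # 7, the number of characters in the message
}
```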
33
If-Else Conditions
We can make if-then statements slightly more complex using if-else conditions. Here’s an example:
    if (4 > 5) {
      print("Happy Halloween")
      print("BOO")
    } else {
      print("Merry XMAS")
      print("HO HO HO")
    }
Since 4>5 is false, R runs the else block.
34
For Loop/While Loop
For loops can be quite helpful when writing functions. Here’s an example:
    for (i in 1:5) {
      print(i + 1)
    }
While loops are also quite handy. Here’s an example:
    f2 <- function(x) {
      while (x < 5) {
        x <- x + 1
        print(x)
      }
    }
    f2(-5)
35
Practice Problem #1
Create a sequence that starts at 0 and goes to 5 with a step of 0.5.
Replicate 'a b c' 3 times.
Replicate 'a' 3 times, 'b' 3 times, 'c' 3 times in one command.
36
Practice Problem #2
Make a histogram of the "Girth" variable from the 'trees' data set. Include a title.
Make a boxplot of the "Height" variable from the 'trees' data set. Color it blue and label your axes.
Make a scatter plot of Girth and Height. Add the regression line.
37
Practice Problem #3
Create a simple linear model with Girth as the predictor and Height as the response. Extract the coefficients.
Now add Volume to the model. How can we tell if this model is preferred to the simpler model?
38
Practice Problem #4
Fix x at a number smaller than 5. Use a 'while loop' to create a sequence that starts at x and increases by 2 until you reach 20.
Create a function that will return the product of any two numbers.
39
Thank you for your attention!
Additional R Resources:
R project home: http://www.r-project.org
R documentation: http://www.r-project.org/other-docs.html
R help forum: http://www.nabble.com/R-help-f13820.html
R Journal: http://journal.r-project.org/
R Graphical Gallery: http://addictedtor.free.fr/graphiques/
R Graphical Manual: http://bm2.genes.nig.ac.jp/RGM2/
R Seek: http://www.rseek.org/
40
Acknowledgements/References
Thank you to Brady West for allowing the use of his R introductory materials.
http://www.r-project.org
http://addictedtor.free.fr/graphiques/