
1 5/2/2015330 Lecture 41 STATS 330: Lecture 4

2 Housekeeping
My contact details…. Plus much else on the course web page www.stat.auckland.ac.nz/~lee/330/ or via Cecil


4 Today's lecture: R for graphics
Aim of the lecture: to show you how to use R to produce the plots shown in the last few lectures.

5 Getting data into R
• In 330, as in many cases, data comes in 2 main forms:
  - as a text file
  - as an Excel spreadsheet
• We need to convert from these formats to R
• Data in R is organized in data frames:
  - row by column arrangement of data (as in Excel)
  - variables are columns
  - rows are cases (individuals)

6 Text files to R
• Suppose we have the data in the form of a text file
• Edit the text file (use Notepad or similar) so that:
  - the first row consists of the variable names
  - each row of data (i.e. data on a complete case) corresponds to one line of the file
• Suppose data fields are separated by spaces and/or tabs
• Then, to create a data frame containing the data, we use the R function read.table

7 Example: the cherry tree data
Suppose we have a text file called cherry.txt (probably created using Notepad or maybe Word, but saved as a text file).
  - First line: variable names
  - Data for each tree on a separate line, separated by "white space" (spaces or tabs)

8 Creating the data frame
In R, type
    cherry.df = read.table(file.choose(), header=TRUE)
and press the return key. This brings up the dialog to select the file cherry.txt containing the data: click to select the file, then click to load the data.

9 Check all is OK!
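A non-interactive sketch of the read-and-check steps on the last few slides. The file contents are a hypothetical three-row miniature of cherry.txt (illustrative values only), written to a temporary file so that the example runs without the file.choose() dialog.

```r
# Hypothetical miniature of cherry.txt: header line, then one tree per line
cherry_path <- tempfile(fileext = ".txt")
writeLines(c("diameter height volume",
             "8.3 70 10.3",
             "8.6 65 10.3",
             "8.8 63 10.2"),
           cherry_path)

# header = TRUE takes the variable names from the first line;
# the default separator is any white space (spaces or tabs)
cherry.df <- read.table(cherry_path, header = TRUE)

# Quick checks that the data arrived as expected
dim(cherry.df)    # number of rows (cases) and columns (variables)
str(cherry.df)    # type of each column
head(cherry.df)   # first few rows
```

If a numeric column shows up as character in str(), the usual cause is a stray non-numeric entry in the text file.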

10 Getting data from a spreadsheet (1)
  - Create the spreadsheet in Excel
  - Save it as Comma Delimited Text (CSV)
  - This is a text file with all cells separated by commas
  - The file is called cherry.csv

11 Getting data from a spreadsheet (2)
In R, type
    cherry.df = read.table(file.choose(), header=TRUE, sep=",")
and proceed as before.
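As a hedged aside, base R also provides read.csv, a wrapper around read.table whose defaults include header=TRUE and sep=",". The sketch below uses a hypothetical two-row CSV written to a temporary file to compare the two calls.

```r
# Hypothetical miniature of cherry.csv (illustrative values only)
cherry_csv <- tempfile(fileext = ".csv")
writeLines(c("diameter,height,volume",
             "8.3,70,10.3",
             "8.6,65,10.3"),
           cherry_csv)

# The form used on the slide
from_read_table <- read.table(cherry_csv, header = TRUE, sep = ",")

# The shorthand: header = TRUE and sep = "," are already the defaults
from_read_csv <- read.csv(cherry_csv)

# Compare the two data frames
all.equal(from_read_table, from_read_csv)
```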

12 Getting data from the R330 package
• The package R330 contains several data sets used in the course, including the cherry tree data
• To access the data frame:
  - Install the R330 package (see Appendix A.10 of the coursebook)
  - In R, type
      > library(R330)
      > data(cherry.df)

13 Data frames and variables
• Suppose we have read in data and made a data frame
• At this point R doesn't know about the variables in the data frame, so we can't use e.g. the variable diameter in R commands
• We need to say
      attach(cherry.df)
  to make the variables in cherry.df visible to R
• Alternatively, say cherry.df$diameter (better)
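A minimal sketch of the three ways to reach a variable. It assumes a stand-in cherry.df built from R's built-in trees data set (31 black cherry trees), with Girth renamed to diameter; the course's own cherry.df is assumed to have the same shape. with() is not mentioned on the slide; it is shown as a further option that avoids attach.

```r
# Assumption: build a stand-in for cherry.df from the built-in `trees` data
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

# 1. The $ form: always explicit about which data frame is meant
m_dollar <- mean(cherry.df$diameter)

# 2. with(): evaluate an expression inside the data frame, no attach needed
m_with <- with(cherry.df, mean(diameter))

# 3. attach(): makes the columns visible by name until detach() is called
attach(cherry.df)
m_attach <- mean(diameter)
detach(cherry.df)

c(m_dollar, m_with, m_attach)
```

One reason the $ form is safer: after attach, later changes to cherry.df are not reflected in the attached copy, and attached names can mask other objects.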

14 Scatterplots
In R, there are 2 distinct sets of functions for graphics: one for ordinary graphics, one for Trellis. E.g. for scatterplots, we use either plot (ordinary R) or xyplot (Trellis). In the next few slides, we discuss plot.

15 Simple plotting
    plot(cherry.df$height, cherry.df$volume,
         xlab="Height (feet)",
         ylab="Volume (cubic feet)",
         main="Volume versus height for 31 black cherry trees")
i.e. label the axes (give units if possible) and give a title.


17 Alternative form of plot
    plot(volume ~ height,
         xlab="Height (feet)",
         ylab="Volume (cubic feet)",
         main="Volume versus height for 31 black cherry trees",
         data=cherry.df)
We don't need to use the $ notation with this form. Note the reversal of x and y: the formula is written y ~ x.

18 Colours, points, etc
    par(bg="darkblue")
    plot(cherry.df$height, cherry.df$volume,
         xlab="Height (feet)",
         ylab="Volume (cubic feet)",
         main="Volume versus height for 31 black cherry trees",
         pch=19, fg="white",
         col.axis="lightblue", col.main="white",
         col.lab="white", col="white", cex=1.3)
Type ?par for more info.
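Because par changes settings for every later plot on the device, it can help to save the old settings and restore them afterwards. The sketch below assumes a stand-in cherry.df built from the built-in trees data and draws to a null PDF device so it runs without a display; neither choice comes from the slides.

```r
# Assumption: stand-in for cherry.df built from the built-in `trees` data
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

pdf(NULL)                          # off-screen device, nothing written to disk
bg_before <- par("bg")

old_par <- par(bg = "darkblue")    # par() returns the previous settings
bg_during <- par("bg")
plot(cherry.df$height, cherry.df$volume,
     xlab = "Height (feet)", ylab = "Volume (cubic feet)",
     pch = 19, fg = "white", col = "white",
     col.axis = "lightblue", col.lab = "white")

par(old_par)                       # restore the saved settings
bg_after <- par("bg")
invisible(dev.off())
```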


20 Lines
• Suppose we want to join up the points for each rat on the rats plot (see data on the next slide)
• We could try
      plot(rats.df$day, rats.df$growth, type="l")
  but this won't work
• Points are plotted in the order they appear in the data frame and each point is joined to the next, so the last point for one rat is joined to the first point for the next rat

21 Rats: the data
> rats.df
   growth group rat change day
1     240     1   1      1   1
2     250     1   1      1   8
3     255     1   1      1  15
4     260     1   1      1  22
5     262     1   1      1  29
6     258     1   1      1  36
7     266     1   1      2  43
8     266     1   1      2  44
9     265     1   1      2  50
10    272     1   1      2  57
11    278     1   1      2  64
12    225     1   2      1   1
13    230     1   2      1   8
...
(more data)

22 Don't want this!

23 Solution
Various solutions, but one is to plot each line separately, using subsetting (these commands assume the variables are visible, e.g. after attach(rats.df)):
    plot(day, growth, type="n")    # draw axes and labels only
    lines(day[rat==1], growth[rat==1])
    lines(day[rat==2], growth[rat==2])
and so on …. (boring!), or (better)
    for(j in 1:16){
      lines(day[rat==j], growth[rat==j])
    }
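A self-contained sketch of the same idea, using a hypothetical miniature of rats.df (3 rats, 4 measurement days, made-up growth values) and the $ form instead of attach. It also shows why type="l" fails: the day values jump backwards wherever one rat's rows end and the next rat's rows begin.

```r
# Hypothetical stand-in for rats.df: 3 rats, each measured on 4 days
rats.df <- data.frame(
  rat    = rep(1:3, each = 4),
  day    = rep(c(1, 8, 15, 22), times = 3),
  growth = c(240, 250, 255, 260,
             225, 230, 235, 238,
             245, 255, 260, 265)
)

# Why type = "l" goes wrong: day decreases between consecutive rows
# at every change of rat, so the line doubles back across the plot
day_goes_backwards <- any(diff(rats.df$day) < 0)

pdf(NULL)                                   # off-screen device for the sketch
plot(rats.df$day, rats.df$growth, type = "n",
     xlab = "Day", ylab = "Growth")         # axes and labels only
for (j in unique(rats.df$rat)) {
  one_rat <- rats.df[rats.df$rat == j, ]    # rows for rat j only
  lines(one_rat$day, one_rat$growth)
}
invisible(dev.off())
```

Looping over unique(rats.df$rat) rather than a fixed 1:16 keeps the code correct if the number of rats changes.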

24 Indicating groups
We want to plot the litters with different colours and add a legend. Rats 1-8 are litter 1, 9-12 litter 2, 13-16 litter 3.
    plot(day, growth, type="n")
    for(j in 1:8) lines(day[rat==j], growth[rat==j], col="white")      # litter 1
    for(j in 9:12) lines(day[rat==j], growth[rat==j], col="yellow")    # litter 2
    for(j in 13:16) lines(day[rat==j], growth[rat==j], col="purple")   # litter 3
The col argument sets the colour of each line.

25 legend
    legend(13, 380,
           legend = c("Litter 1", "Litter 2", "Litter 3"),
           col = c("white", "yellow", "purple"),
           lwd = c(2,2,2),
           horiz = TRUE,
           cex = 0.7)
(Type ?legend for a full explanation of these parameters.)
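A sketch that ties line colour to litter through a lookup vector, so the plot and the legend cannot drift apart. The data frame is a hypothetical miniature (one rat per litter, made-up values), and the colours and the "topleft" keyword position are illustrative choices rather than the ones on the slide.

```r
# Hypothetical miniature: 3 rats, one per litter, 3 days each
rats.df <- data.frame(
  rat    = rep(1:3, each = 3),
  litter = rep(1:3, each = 3),
  day    = rep(c(1, 8, 15), times = 3),
  growth = c(240, 250, 255,
             225, 230, 235,
             245, 255, 260)
)

# One colour per litter, indexed by litter number
litter_cols <- c("black", "blue", "red")

pdf(NULL)                                    # off-screen device
plot(growth ~ day, data = rats.df, type = "n")
for (j in unique(rats.df$rat)) {
  one_rat <- rats.df[rats.df$rat == j, ]
  lines(one_rat$day, one_rat$growth,
        col = litter_cols[one_rat$litter[1]], lwd = 2)
}

# A keyword position avoids hard-coding x, y coordinates for the legend
legend_info <- legend("topleft",
                      legend = paste("Litter", 1:3),
                      col = litter_cols, lwd = 2,
                      horiz = TRUE, cex = 0.7)
invisible(dev.off())
```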


27 Points and text
    x=1:25
    y=1:25
    plot(x, y, type="n")
    points(x, y, pch=1:25, col="red", cex=1.2)


29 Points and text (3)
    x=1:26
    y=1:26
    plot(x, y, type="n")
    text(x, y, letters, col="blue", cex=1.2)


31 Use of pos
    x = 1:10
    y = 1:10
    plot(x, y)
    position = rep(c(2,4), 5)
    mytext = rep(c("Left","Right"), 5)
    text(x, y, mytext, pos=position)
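For reference, pos takes the values 1 to 4, meaning below, to the left of, above and to the right of the point. A small sketch (the label text is made up for illustration) that places all four around a single point:

```r
pdf(NULL)                                   # off-screen device for the sketch

# One point in the middle of the plotting region
plot(1, 1, xlim = c(0, 2), ylim = c(0, 2), xlab = "x", ylab = "y")

# Label the same point once for each value of pos
pos_labels <- c("pos=1 (below)", "pos=2 (left)",
                "pos=3 (above)", "pos=4 (right)")
for (p in 1:4) {
  text(1, 1, pos_labels[p], pos = p)
}
invisible(dev.off())
```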


33 Trellis
• Must load the trellis library first:
      library(lattice)
• General form of trellis plots:
      xyplot(y~x|W*Z, data=some.df)
• We don't need to use the $ form: trellis functions can pick out the variables, given the data frame

34 Main trellis functions
• dotplot: for dotplots; use when X is categorical, Y is continuous
• bwplot: for boxplots; use when X is categorical, Y is continuous
• xyplot: for scatter plots; use when both X and Y are continuous
• equal.count: use to turn a continuous conditioning variable into groups
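A brief sketch of the first three functions. It assumes a stand-in cherry.df built from the built-in trees data, plus a hypothetical categorical variable size made by cutting diameter at arbitrary break points; neither is from the course materials.

```r
library(lattice)

# Assumption: stand-in for cherry.df built from the built-in `trees` data
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

# Hypothetical categorical variable: three diameter classes
cherry.df$size <- cut(cherry.df$diameter,
                      breaks = c(8, 11, 14, 21),
                      labels = c("small", "medium", "large"))

# X categorical, Y continuous
dot_plot <- dotplot(volume ~ size, data = cherry.df)
box_plot <- bwplot(volume ~ size, data = cherry.df)

# Both X and Y continuous
scatter_plot <- xyplot(volume ~ height, data = cherry.df)

# Trellis functions return objects; print() draws them
pdf(NULL)
print(dot_plot)
print(box_plot)
print(scatter_plot)
invisible(dev.off())
```

At the interactive prompt the print() calls are implicit, but inside a loop, function or script they are needed for the plot to appear.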

35 Changing background colour
To change the trellis background to white:
    trellis.par.set(background = list(col="white"))
To change plotting symbols:
    trellis.par.set(plot.symbol = list(pch=16, col="red", cex=1))

36 Equal.count
    xyplot(volume~height|diameter, data=cherry.df)

37 Equal.count (2)
    diam.gp<-equal.count(cherry.df$diameter,number=4,overlap=0)
    xyplot(volume~height|diam.gp, data=cherry.df)
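A runnable sketch of the equal.count step, assuming a stand-in cherry.df built from the built-in trees data. equal.count returns a "shingle": a set of intervals, each holding roughly the same number of observations, which xyplot then uses as conditioning panels.

```r
library(lattice)

# Assumption: stand-in for cherry.df built from the built-in `trees` data
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

# Four non-overlapping diameter groups of roughly equal size
diam.gp <- equal.count(cherry.df$diameter, number = 4, overlap = 0)
levels(diam.gp)                      # the interval used for each panel

# One panel of volume against height per diameter group
cherry_panels <- xyplot(volume ~ height | diam.gp, data = cherry.df)

pdf(NULL)                            # off-screen device for the sketch
print(cherry_panels)
invisible(dev.off())
```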

38 Changing plotting symbols
To change plotting symbols:
    trellis.par.set(plot.symbol = list(pch=16, col="red", cex=1))


40 Non-trellis version
    coplot(volume~height|diameter, data=cherry.df)

41 Non-trellis version (2)
    coplot(volume~height|diameter, data=cherry.df, number=4, overlap=0)
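For a numeric conditioning variable, coplot chooses its intervals with co.intervals, so the effect of number and overlap can be inspected directly. The sketch assumes a stand-in cherry.df built from the built-in trees data.

```r
# Assumption: stand-in for cherry.df built from the built-in `trees` data
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

# The conditioning intervals: one row per panel, lower and upper limits
diam_intervals <- co.intervals(cherry.df$diameter, number = 4, overlap = 0)
diam_intervals

pdf(NULL)                            # off-screen device for the sketch
coplot(volume ~ height | diameter, data = cherry.df,
       number = 4, overlap = 0)
invisible(dev.off())
```

With overlap set above 0, neighbouring intervals share observations, so some points appear in more than one panel.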

42 Other useful functions
• Regular R:
  - scatterplot3d (3d scatter plot; load library scatterplot3d)
  - contour, persp (draw contour plots, surfaces)
  - pairs
• Trellis:
  - cloud (3d scatter plot)
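A short sketch of two of these, pairs and cloud, assuming a stand-in cherry.df built from the built-in trees data. scatterplot3d is left out here only because it needs a separately installed package.

```r
library(lattice)

# Assumption: stand-in for cherry.df built from the built-in `trees` data
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

pdf(NULL)                            # off-screen device for the sketch

# Regular R: scatterplot matrix of every pair of variables
pairs(cherry.df)

# Trellis: 3d scatter plot, written as z ~ x * y
cloud_plot <- cloud(volume ~ height * diameter, data = cherry.df)
print(cloud_plot)

invisible(dev.off())
```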

43 Rotating plots
• You need to install the R330 package
• Create a data frame, e.g. called data.df, with the response in the first column
• Then type
      reg3d(data.df)

