Data Manipulation in R Fish 552: Lecture 6. Recommended reading Data Manipulation with R (Phil Spector, 2006)


1 Data Manipulation in R Fish 552: Lecture 6

2 Recommended reading
Data Manipulation with R (Phil Spector, 2006) http://www.springerlink.com/content/t19776/
–Chapter 4: 4.1
–Chapter 8

3 Combining data sources
We often want to combine two sources of data by merging on a common variable or observation
–e.g. data measured by different instruments at overlapping times, but on different time scales {1, 5, 10, 15, 20, ...} and {1, 10, 20, ...}
Such tasks can be very difficult to code using only logicals, rbind(), cbind(), etc.
merge() combines data very effectively
Go through the help page
–?merge

4 merge()
Two sources of data with overlapping time measurements
> station1
    Time1 Measurements1
1       1   -0.81413329
2       2   -0.36694960
3       3   -0.08176642
4       4    0.47305645
5       5   -2.57701675
..
99     99   -1.23167543
100   100   -0.03074547
> station2
   Time2 Measurements2
1      1             1
2      5             3
3     10             2
4     15             1
5     20             3
..
20    95             2
21   100             3

5 merge()
Create a single data set containing both variables at the times the two sources have in common
> merge(station1, station2, by.x = "Time1", by.y = "Time2")
  Time1 Measurements1 Measurements2
1     1   -0.81413329             1
2     5   -2.57701675             3
3    10    0.40208770             2
4    15    0.60144334             1
5    20    0.71858431             3
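A self-contained sketch of the merge on this slide. The station data are simulated here (the seed, the rnorm() values and the sampled Measurements2 are assumptions, not the lecture's data), so the numbers will differ from the slide. The all.x = TRUE call is an extra illustration of keeping the unmatched rows of the first data frame.

```r
set.seed(552)  # arbitrary seed, only so the sketch is reproducible

# Simulated stand-ins for the two stations (assumed values)
station1 <- data.frame(Time1 = 1:100,
                       Measurements1 = rnorm(100))
station2 <- data.frame(Time2 = c(1L, seq(5L, 100L, by = 5L)),
                       Measurements2 = sample(1:3, 21, replace = TRUE))

# Keep only the times present in both data frames
merged <- merge(station1, station2, by.x = "Time1", by.y = "Time2")

# Keep every row of station1; unmatched times get NA in Measurements2
merged_all <- merge(station1, station2,
                    by.x = "Time1", by.y = "Time2", all.x = TRUE)

nrow(merged)                            # one row per shared time
sum(is.na(merged_all$Measurements2))    # station1 times absent from station2
```

By default merge() drops rows without a partner in the other data frame; all.x, all.y or all switch to keeping them and filling the gaps with NA.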

6 Behind the scenes of merge()
merge() relies on several other functions in R which can also be used for other data manipulation tasks
Finding common elements in vectors: intersect()
> intersect(1:10, 7:20)
[1]  7  8  9 10
Matching positions of common elements in vectors: match()
–match(x, table, nomatch = NA)
–nomatch is the value given to elements that don't match
> match(1:10, c(1,3,5,9))
 [1]  1 NA  2 NA  3 NA NA NA  4 NA
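To see how these two functions can do the work of a simple merge, here is a rough sketch that lines up two sets of measurements by hand. It is an illustration only, not merge()'s actual implementation, and the vectors time1, meas1, time2 and meas2 are made-up data.

```r
# Made-up data: two instruments recording on different time scales
time1 <- 1:20
meas1 <- time1 * 2
time2 <- c(1, 5, 10, 15, 20, 25)
meas2 <- c(1, 3, 2, 1, 3, 2)

# Times observed by both instruments
common <- intersect(time1, time2)

# Positions of those shared times within each source
pos1 <- match(common, time1)
pos2 <- match(common, time2)

# Assemble the combined data set by indexing with the positions
manual <- data.frame(Time = common,
                     Measurements1 = meas1[pos1],
                     Measurements2 = meas2[pos2])
manual
```

Time 25 appears only in the second source, so intersect() leaves it out and it never reaches the combined data frame.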

7 Hands-on Exercise 1
Write code that returns a TRUE/FALSE vector indicating whether each element of one vector matches any element of the other vector (TRUE) or not (FALSE). Continue to use the two vectors
–1:10
–c(1, 3, 5, 9)
Hint: Use the match function. How can you change the results of match to TRUE/FALSE?

8 Dates in R
R has a built-in class, "Date", to handle data entered as a date in various formats
The as.Date() function accepts a variety of input formats through the format = argument
> as.Date('1983/09/22')
[1] "1983-09-22"

9 Dates in R
If your input dates are not in the standard format, a format string can be composed
Code  Value
%d    Day of the month (decimal number)
%m    Month (decimal number)
%b    Month (abbreviated)
%B    Month (full name)
%y    Year (2 digit)
%Y    Year (4 digit)

10 Dates in R
> as.Date('9/22/1983', format = '%m/%d/%Y')
[1] "1983-09-22"
> as.Date('September 22, 1983', format = '%B %d, %Y')
[1] "1983-09-22"
> as.Date('22SEP83', format = '%d%b%y')
[1] "1983-09-22"
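The same format codes also work in the other direction, through format(). The short sketch below uses only numeric codes so that it does not depend on the locale's month names. The second call shows what to expect when the format string does not describe the input; the input string there is made up.

```r
# Parse a month/day/year string into a Date
d <- as.Date("9/22/1983", format = "%m/%d/%Y")

# Write the Date back out in a different layout
format(d, "%d.%m.%Y")   # "22.09.1983"

# A format string that does not fit the input gives NA, not an error
bad <- as.Date("22-09-1983", format = "%m/%d/%Y")
is.na(bad)              # TRUE
```

Checking for NA after conversion is a cheap way to catch rows whose dates were entered in an unexpected layout.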

11 Dates in R
Other components of the date can easily be extracted
–Arguments to these functions must be of the "POSIXt" or "Date" class
> weekdays(as.Date('1983/09/22'))
[1] "Thursday"
> months(as.Date('1983/09/22'))
[1] "September"
> quarters(as.Date('1983/09/22'))
[1] "Q3"

12 Dates in R
Dates read from instruments usually have a finer time scale (hours, minutes, seconds)
The POSIXct class in R should be used for these types of inputs
The default input format for POSIX dates consists of the year, followed by the month and day, separated by slashes or dashes. The date may be followed by white space and a time in the form hour:minutes:seconds or hour:minutes
–e.g. 1983/9/22 23:20:05
–If the dates are not in this format, see the help on strptime()

13 Dates in R
> as.POSIXlt("1983-9-22 23:20:05")
[1] "1983-09-22 23:20:05"
> aDate <- as.POSIXlt("1983-9-22 23:20:05")
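A brief sketch of what can be done with a date-time object once it exists. The tz = "UTC" argument is an addition here, so that the result does not depend on the machine's time zone, and the second input string is a made-up example of a non-default layout handled by strptime().

```r
# POSIXlt stores the date-time as a list of components
aDate <- as.POSIXlt("1983-09-22 23:20:05", tz = "UTC")
aDate$hour          # 23
aDate$min           # 20
aDate$year + 1900   # years are stored as an offset from 1900

# POSIXct stores the same instant as seconds since 1970-01-01
ct <- as.POSIXct(aDate)

# Non-default layouts can be parsed with strptime() and a format string
parsed <- strptime("22/09/1983 23:20", format = "%d/%m/%Y %H:%M", tz = "UTC")

# Arithmetic on date-times works in seconds
ct - as.POSIXct(parsed)
```

POSIXlt is convenient for pulling out components, while POSIXct is the more compact choice for columns in a data frame.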

14 Dates in R
Many common functions can accept objects of a date class
–min(), max(), mean(), ...
The difftime() function computes the difference between two dates or date-times
> difftime(as.Date('2009/09/17'), as.Date('1983/09/22'))
Time difference of 9492 days
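A small sketch combining these functions on a vector of dates. The third date is an arbitrary addition for illustration, and units = "weeks" shows that difftime() can report the gap in other units.

```r
# Two dates from the slide plus one arbitrary extra date
dates <- as.Date(c("1983-09-22", "2009-09-17", "2000-01-01"))

earliest <- min(dates)
latest   <- max(dates)

# The same gap expressed in days and in weeks
span_days  <- difftime(latest, earliest, units = "days")
span_weeks <- difftime(latest, earliest, units = "weeks")

as.numeric(span_days)    # 9492
as.numeric(span_weeks)   # 1356
```

Wrapping the result in as.numeric() drops the units label, which is handy when the difference feeds into further calculations.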

15 Dates in R
Dates can also be coded as factors
–seq() can take objects of various classes
> everyday <- seq(from = as.Date('2009-1-1'),
+                 to = as.Date('2009-12-31'), by = 'day')
> month <- months(everyday)
> month <- factor(month, levels = unique(month), ordered = TRUE)
> table(month)
month
 January February    March    April      May     June ...
      31       28       31       30       31       30 ...
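The levels = unique(...) argument is what keeps the months in calendar order. The sketch below, with variable names chosen here for illustration, contrasts it with the default factor(), which sorts the levels alphabetically.

```r
everyday <- seq(from = as.Date("2009-01-01"),
                to = as.Date("2009-12-31"), by = "day")
month_names <- months(everyday)

# Default: levels are sorted alphabetically
alphabetical <- factor(month_names)

# Levels taken in order of first appearance, i.e. calendar order
calendar <- factor(month_names, levels = unique(month_names), ordered = TRUE)

levels(alphabetical)[1:3]
levels(calendar)[1:3]

# Days per month, January through December
days_per_month <- as.vector(table(calendar))
days_per_month
```

Because everyday is already sorted by date, unique() returns the month names in the order they occur during the year.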

16 Hands-on Exercise 2
Choose any two dates you like
–Save them as two objects in the month/day/year format
Apply R functions to determine
–The weekdays
–The month
–The difference between the two dates

17 Packages in R
In the homework, you loaded the MASS package
–This package contains functions and data sets from Venables and Ripley's Modern Applied Statistics with S
Packages are one of R's salient features, and the user benefits from others' contributions
Before performing an arduous programming task, it's often useful to search the contributed packages, which might already have the features you need built in

18 Installing packages
Packages must be downloaded from the web via a mirror site
–The nearest one to us is at the Fred Hutchinson Cancer Research Institute: http://cran.fhcrc.org/ (this is just a mirror of the R website)
Set the CRAN mirror
–chooseCRANmirror()
If you know the name of the package you want to install
–install.packages(package)
To select a package from a list...

19 List of installed packages Click on the name of a package to get a list of its functions Install a package

20 Loading packages
Once the package has been installed, it needs to be loaded into R with the function library(package) or require(package)
–For right now in the class, these functions are equivalent
R should return a warning if the package was built under a newer version of R than the one the user is currently running, or if a newer version of R is not compatible with an older package
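For later reference, the two functions differ mainly in what happens when the package is missing. The sketch below illustrates this with a made-up package name (notARealPackage552, assumed not to be installed), then loads MASS only if it is available, so nothing is downloaded.

```r
fake <- "notARealPackage552"   # hypothetical name, assumed not installed

# require() returns FALSE (with a warning) when the package is missing
loaded <- suppressWarnings(
  require(fake, character.only = TRUE, quietly = TRUE)
)

# library() stops with an error instead
result <- tryCatch({
  library(fake, character.only = TRUE)
  "loaded"
}, error = function(e) "error")

# Check for MASS without attaching it, then load it if present
if (requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)
} else {
  message("MASS is not installed; run install.packages(\"MASS\") first")
}
```

In a script, library() is usually the safer choice, because a missing package then stops the run immediately instead of failing later at the first call to one of its functions.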

21 How to find packages?
Ask around
Search (rseek.org)
Papers
Task views: http://cran.r-project.org/web/views/

22 Using packages
In addition to function-level help, many packages have vignettes, which are overviews or getting-started guides to the package, usually with examples.
> vignette(all = FALSE)
List vignettes for all attached packages (i.e. just the ones you've called library for)
> vignette(all = TRUE)
List vignettes for all installed packages

23 Using packages
If the package you're using has a vignette, you can load it with the vignette function
> vignette("timedep")
You can extract all the code out of the vignette
> edit(vignette("timedep"))
You can also find vignettes online
http://cran.r-project.org/web/packages/survival/index.html
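The listing returned by vignette() can also be inspected as data, which helps when looking for the vignettes of one particular package. This is a minimal sketch: it relies on the results component of the returned object, and the filter on "survival" assumes that package is installed (otherwise the filtered table simply has no rows).

```r
# All vignettes from installed packages, as a character matrix
vigs <- vignette(all = TRUE)$results
colnames(vigs)

# Keep only the rows for one package; drop = FALSE keeps it a matrix
surv_vigs <- vigs[vigs[, "Package"] == "survival",
                  c("Item", "Title"), drop = FALSE]
surv_vigs
```

The Item column holds the name to pass to vignette(), for example vignette("timedep").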

