Data Manipulation in R. Fish 552: Lecture 6. Recommended reading: Data Manipulation with R (Phil Spector, 2006)

Recommended reading
Data Manipulation with R (Phil Spector, 2006)
–Chapter 4: 4.1
–Chapter 8

Combining data sources
We often want to combine two sources of data by merging on a common variable or observation
–e.g. data measured by different instruments at overlapping times, but on different time scales {1, 5, 10, 15, 20, ...} and {1, 10, 20, ...}
Such tasks can be very difficult to code using only logicals, rbind(), cbind(), etc.
merge() combines data very effectively
Go through the help page
–?merge

merge()
Two sources of data with overlapping time measurements
> station1
Time1 Measurements
> station2
Time2 Measurements

merge()
Create a single data set with both variables having a common time observation
> merge(station1, station2, by.x = "Time1", by.y = "Time2")
Time1 Measurements1 Measurements
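The merge step can be sketched with two small data frames. The values and the column names Measurements1 and Measurements2 are invented for illustration, since the slide's data are not preserved; only the time scales {1, 5, 10, 15, 20} and {1, 10, 20} come from the lecture.

```r
# Hypothetical station data: values are made up for illustration
station1 <- data.frame(Time1 = c(1, 5, 10, 15, 20),
                       Measurements1 = c(2.1, 3.4, 1.8, 2.9, 3.3))
station2 <- data.frame(Time2 = c(1, 10, 20),
                       Measurements2 = c(7.5, 6.2, 8.0))

# Keep only the times present in both data frames (the default behaviour)
merged <- merge(station1, station2, by.x = "Time1", by.y = "Time2")
print(merged)

# all.x = TRUE keeps every row of station1; unmatched rows get NA
merged_all <- merge(station1, station2,
                    by.x = "Time1", by.y = "Time2", all.x = TRUE)
print(merged_all)
```

The key column in the result takes its name from by.x, so here it is called Time1.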

Behind the scenes of merge()
merge() relies on several other functions in R, which can also be used for other data manipulation tasks
Finding common elements in vectors: intersect()
> intersect(1:10, 7:20)
[1]  7  8  9 10
Matching positions of common elements in vectors: match()
–match(x, table, nomatch = NA)
–nomatch is the value given to elements that don’t match
> match(1:10, c(1,3,5,9))
[1]  1 NA  2 NA  3 NA NA NA  4 NA
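One way the positions returned by match() can be put to work is as an index into a second vector. This is a minimal sketch; the site labels are hypothetical and only there to show the lookup.

```r
x   <- 1:10
tbl <- c(1, 3, 5, 9)

# Elements that occur in both vectors
common <- intersect(x, tbl)

# Position of each element of x within tbl (NA where there is no match)
pos <- match(x, tbl)

# nomatch changes the value used for non-matching elements
pos0 <- match(x, tbl, nomatch = 0)

# Hypothetical labels, one per element of tbl, looked up by position
site <- c("A", "B", "C", "D")
looked_up <- site[pos]
print(looked_up)
```

Indexing with NA returns NA, so elements of x that are absent from tbl stay missing in the lookup.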

Hands-on Exercise 1
Write code that returns a TRUE/FALSE vector indicating whether each element of one vector matches any element of the other vector (TRUE if it does).
Continue to use the two vectors
–1:10
–c(1, 3, 5, 9)
Hint: Use the match() function. How can you change the results of match() to TRUE/FALSE?

Dates in R
R has a built-in class, "Date", to handle data entered as dates in various formats
The as.Date() function accepts a variety of input formats through the format = argument
> as.Date('1983/09/22')
[1] "1983-09-22"

Dates in R
If your input dates are not in the standard format, a format string can be composed

Code  Value
%d    Day of the month (decimal number)
%m    Month (decimal number)
%b    Month (abbreviated)
%B    Month (full name)
%y    Year (2 digit)
%Y    Year (4 digit)

Dates in R
> as.Date('9/22/1983', format = '%m/%d/%Y')
[1] "1983-09-22"
> as.Date('September 22, 1983', format = '%B %d, %Y')
[1] "1983-09-22"
> as.Date('22SEP83', format = '%d%b%y')
[1] "1983-09-22"
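The format codes work in both directions: as.Date() uses them to read text, and format() uses them to write a date back out. The sketch below sticks to numeric codes because %b and %B depend on the locale's month names. The input strings are made up for illustration.

```r
# Reading: two different layouts of the same date
d1 <- as.Date("9/22/1983", format = "%m/%d/%Y")
d2 <- as.Date("22.09.83", format = "%d.%m.%y")

# Writing: format() turns a Date back into text in any layout
out <- format(d1, "%d/%m/%Y")
print(out)

# A string that does not fit the format string gives NA, not an error
bad <- as.Date("1983-22-09", format = "%Y-%m-%d")
print(bad)
```

Checking for NA after conversion is a cheap way to catch day/month mix-ups in imported data.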

Dates in R
Other components of the dates can easily be extracted
–Arguments to these functions must be of the "POSIXt" or "Date" class
> weekdays(as.Date('1983/09/22'))
[1] "Thursday"
> months(as.Date('1983/09/22'))
[1] "September"
> quarters(as.Date('1983/09/22'))
[1] "Q3"

Dates in R
Dates read from instruments usually have a finer time scale (hours, minutes, seconds)
The POSIXct class in R should be used for these types of inputs
The default input format for POSIX dates consists of the year, followed by the month and day, separated by slashes or dashes
The date may be followed by white space and a time in the form hour:minutes:seconds or hour:minutes
–e.g. 1983/9/22 23:20:05
–If the dates are not in this format, see the help on strptime()

Dates in R
> as.POSIXlt("1983/9/22 23:20:05")
[1] "1983-09-22 23:20:05"
> aDate <- as.POSIXlt("1983/9/22 23:20:05")
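A short sketch of working with date-times follows. The time zone "UTC" and the second timestamp are assumptions added so the example behaves the same on any machine; they are not from the lecture.

```r
# Default layout: year/month/day, white space, then hour:minutes:seconds
t1 <- as.POSIXct("1983/09/22 23:20:05", tz = "UTC")

# Non-default layout: supply a format string (see ?strptime for the codes)
t2 <- as.POSIXct("22/09/1983 23:50", format = "%d/%m/%Y %H:%M", tz = "UTC")

# POSIXlt stores the components in a list-like object
lt <- as.POSIXlt(t1)
print(c(hour = lt$hour, min = lt$min, sec = lt$sec))

# Time of day as text, and the gap between the two readings in seconds
clock <- format(t1, "%H:%M")
gap   <- as.numeric(difftime(t2, t1, units = "secs"))
print(clock)
print(gap)
```

POSIXct is a single number of seconds and suits columns in a data frame; POSIXlt is handier when individual components are needed.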

Dates in R
Many common functions can accept objects of a date class
–min(), max(), mean(), ...
The difftime() function computes the difference between two dates or date-times
> difftime(as.Date('2009/09/17'), as.Date('1983/09/22'))
Time difference of 9492 days
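As a rough illustration of date arithmetic, the sketch below reuses the two dates from the slide and adds a third, arbitrary one so that range() and mean() have something to summarise.

```r
# The first two dates are from the slide; the third is arbitrary
dates <- as.Date(c("1983-09-22", "2009-09-17", "2000-01-01"))

# Summary functions work directly on Date vectors
print(range(dates))
print(mean(dates))

# difftime() can report the same gap in other units
gap_weeks <- difftime(dates[2], dates[1], units = "weeks")
print(gap_weeks)

# Subtracting two dates is a shorthand that also gives a difftime
gap_days <- dates[2] - dates[1]
print(gap_days)
```

Wrapping a difftime in as.numeric() drops the units and leaves a plain number for further calculations.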

Dates in R
Dates can also be coded as factors
> everyday <- seq(from = as.Date(' '),
+                 to = as.Date(' '), by = 'day')
> month <- months(everyday)
> month <- factor(month, levels = unique(month), ordered = TRUE)
> table(month)
month
January February March April May June
seq() can take objects of various classes
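A runnable version of the factor example needs concrete start and end dates. The range used here (1 January to 30 June 2009) is an assumption chosen to give the six months shown in the table; the dates on the original slide are not preserved.

```r
# Assumed date range, chosen to cover January through June
everyday <- seq(from = as.Date("2009-01-01"),
                to   = as.Date("2009-06-30"), by = "day")

# Month name for each day (names follow the current locale)
month <- months(everyday)

# levels = unique(month) keeps calendar order instead of alphabetical order
month <- factor(month, levels = unique(month), ordered = TRUE)

# Number of days in each month
counts <- table(month)
print(counts)
```

Without the explicit levels, factor() would sort the months alphabetically and the table would start with April.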

Hands-on Exercise 2
Choose any two dates you like
–Save them as two objects in the month/day/year format
Apply R functions to determine
–The weekdays
–The month
–The difference between the two dates

Packages in R
In the homework, you loaded the MASS package
–This package contains functions and data sets from Venables and Ripley’s Modern Applied Statistics with S
Packages are one of R’s most salient features, and the user benefits from others’ contributions
Before starting an arduous programming task, it’s often useful to search the contributed packages for one that already has the features you need built in

Installing packages
Packages must be downloaded from the web via a mirror site
–The nearest one to us is the Fred Hutchinson Cancer Research Institute (this is just a mirror of the R website)
Set the CRAN mirror
–chooseCRANmirror()
If you know the name of the package you want to install
–install.packages("package")
To select a package from a list...

List of installed packages
–Click on the name of a package to get a list of its functions
–Install a package

Loading packages
Once the package has been installed, it needs to be loaded into R with the function library(package) or require(package)
–For right now in the class, these functions are equivalent
R should return a warning if the package was built under a newer version of R than the one the user is currently running, or if a newer version of R is not compatible with an older package
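The install-then-load sequence can be sketched as below. It uses MASS because the homework did; the repository URL and the deliberately non-existent package name are assumptions added for illustration. The last lines show the one practical difference between the two loading functions: library() stops with an error when a package is missing, while require() returns FALSE with a warning.

```r
pkg <- "MASS"

# Install only if the package is not already available
# (the repos URL is an assumption; chooseCRANmirror() works too)
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# Load the package for this session
library(MASS)

# require() reports success or failure as TRUE/FALSE
loaded <- require(MASS)

# Hypothetical package name that should not exist
missing_ok <- suppressWarnings(
  require("notARealPackage12345", character.only = TRUE, quietly = TRUE)
)
print(c(loaded = loaded, missing = missing_ok))
```

Packages are installed once per R installation but must be loaded in every new session.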

How to find packages?
–Ask around
–Search (rseek.org)
–Papers
–Task views:

Using packages
In addition to function-level help, many packages have vignettes, which are overviews or getting-started guides to the package, usually with examples
> vignette(all = FALSE)
–Lists vignettes for all attached packages (i.e. just the ones you’ve called library() for)
> vignette(all = TRUE)
–Lists vignettes for all installed packages

Using packages
If the package you’re using has a vignette, you can open it with the vignette() function
> vignette("timedep")
You can extract all the code out of the vignette
> edit(vignette("timedep"))
You can also find vignettes online
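To see which vignettes a single package offers before opening one, the listing can be restricted with the package argument. This sketch uses the grid package only because it ships with R; that choice is an assumption, not part of the lecture. The calls that open a viewer are guarded so they run only in an interactive session.

```r
# List the vignettes of one package (grid is used as a stand-in)
v <- vignette(package = "grid")

# The listing is stored as a matrix with one row per vignette
listing <- v$results[, c("Item", "Title"), drop = FALSE]
print(listing)

if (interactive() && nrow(listing) > 0) {
  # Open the first vignette in the list
  print(vignette(listing[1, "Item"], package = "grid"))
  # Or browse all of the package's vignettes in a web browser
  browseVignettes(package = "grid")
}
```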