
Review
> unique(plates)   # All of the unique elements in a vector
> is.numeric(plates)   # Family of functions to test types in R
> cut(ages, breaks=c(0,18,65,Inf), labels=c("Kid","Adult","Senior"))   # Convert to factor
> letters   # Letters of the alphabet: built-in constant
> month.name   # Months: built-in constant
> c(Inf, NA, NaN, NULL)   # Special and undefined values
> subset(x=x, subset=b>7, select=a)   # Subsetting data frames
> apply(X=m, MARGIN=1, FUN=quantile, c(0.05,0.5, 0.95))   # Apply function to rows or columns
> rowMeans(data)   # Mean of each row
> tapply(X=lengths, INDEX=genders, FUN=mean)   # Apply a function to groups (INDEX) of data within a vector
> sort(x)   # Sort a vector
> order(x)   # Indices of a sorted vector

Lecture 6 Data manipulation II Trevor A. Branch FISH 552 Introduction to R

Speed testing
system.time() reports the number of seconds taken to run a command
> temp <- sample(1:100, size= , replace=T)   # 10,000,000-long vector of numbers
> system.time( unique(temp) )
user system elapsed
> system.time( temp[!duplicated(temp)] )   # Slower than using unique
user system elapsed
> system.time( temp[which(!duplicated(temp))] )   # Should be slower, but actually is faster???
user system elapsed
> system.time( as.numeric(levels(factor(temp))) )   # MUCH slower
user system elapsed
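A minimal sketch of this kind of timing comparison is shown here. The vector length of 1e5 and the seed are assumptions chosen so the example runs quickly; they are not the values used on the slide, and the measured times will differ between machines.

```r
# Illustrative sketch: vector length and seed are assumptions, not the slide's values.
set.seed(552)
temp <- sample(1:100, size = 1e5, replace = TRUE)

# Assigning inside system.time() keeps the result while timing the call.
t_unique <- system.time(u1 <- unique(temp))
t_dup    <- system.time(u2 <- temp[!duplicated(temp)])
t_factor <- system.time(u3 <- as.numeric(levels(factor(temp))))

# Collect the elapsed (wall-clock) seconds for each approach.
timings <- c(unique     = t_unique[["elapsed"]],
             duplicated = t_dup[["elapsed"]],
             factor     = t_factor[["elapsed"]])
print(timings)

# Check that the approaches agree before comparing their speed.
same_order  <- identical(u1, u2)                     # same values, same order
same_values <- isTRUE(all.equal(sort(u1), u3))       # factor levels come back sorted
```

Checking that the alternatives return the same answer first is worthwhile: the factor-based version returns the values sorted, whereas unique() keeps them in order of first appearance.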

Recommended reading
Data Manipulation with R (Phil Spector, 2008)
– Chapters 4.1, 8

Combining data sources
We often want to combine two sources of data by merging on a common variable or observation
– e.g. data measured by different instruments at overlapping times but on different time scales, {0,5,10,15,…} and {0,10,20,…}
Such tasks can be difficult to code using only logical operations, rbind(), and cbind(). Instead, merge() combines data very effectively
Explore the help page: ?merge

merge()
Two sources of data, station1 and station2, with overlapping time measurements
> station1
       time1 data
  [1,]
  [2,]
  [3,]
  [4,]
 [98,]
 [99,]
[100,]
> station2
      time2 category
 [1,]     0        1
 [2,]     5        2
 [3,]    10        2
 [4,]
[19,]    90        1
[20,]    95        3
[21,]   100        3

Merge using common time
Create a single data set with both variables having common time observations
> merge(station1, station2, by.x="time1", by.y="time2")
time1 data category
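As a small runnable sketch of this merge, the example below builds two tiny data frames. The column names follow the slides, but the measurement values and the shortened time range are made up for illustration.

```r
# Hypothetical measurements: column names follow the slides, values are invented.
station1 <- data.frame(time1 = seq(0, 30, by = 5),
                       data  = c(2.1, 2.4, 2.2, 2.8, 3.0, 2.9, 3.3))
station2 <- data.frame(time2    = seq(0, 30, by = 10),
                       category = c(1, 2, 2, 3))

# Keep only the times present in both sources (the default behaviour).
both <- merge(station1, station2, by.x = "time1", by.y = "time2")
print(both)

# Keep every row of station1; unmatched times get NA in 'category'.
all1 <- merge(station1, station2, by.x = "time1", by.y = "time2", all.x = TRUE)
print(all1)
```

The key column in the result takes its name from by.x, so both data frames end up with a single time1 column.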

Behind the scenes of merge()
The merge() function relies on other R functions that can be used for data manipulation
Finding common elements in vectors: intersect()
> intersect(1:10, 7:20)
[1]  7  8  9 10
Matching positions of common elements in vectors: match(x, table, nomatch=NA)
> match(1:10, c(1,3,5,9))
[1]  1 NA  2 NA  3 NA NA NA  4 NA
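To illustrate how these building blocks relate to merging, here is a hedged sketch that reproduces a simple inner merge by hand with intersect() and match(). It is not how merge() is implemented internally, and the data values are invented for the example.

```r
# Hypothetical data: column names follow the slides, values are invented.
station1 <- data.frame(time1 = seq(0, 30, by = 5),
                       data  = c(2.1, 2.4, 2.2, 2.8, 3.0, 2.9, 3.3))
station2 <- data.frame(time2    = seq(0, 30, by = 10),
                       category = c(1, 2, 2, 3))

# Times that occur in both sources.
common <- intersect(station1$time1, station2$time2)

# Row positions of those common times in each source.
idx1 <- match(common, station1$time1)
idx2 <- match(common, station2$time2)

# Assemble the combined rows by indexing each source.
manual <- data.frame(time1    = common,
                     data     = station1$data[idx1],
                     category = station2$category[idx2])

# Reference result from merge() for comparison.
merged <- merge(station1, station2, by.x = "time1", by.y = "time2")
print(manual)
```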

Hands-on exercise 1
Write code that returns a TRUE/FALSE vector indicating whether the elements in one vector X match (TRUE) any of the elements in another vector Y. Test it on X <- 1:10 and Y <- c(1,3,5,9)
– Hint: use the match() function. Look up help on match(). How can you change the results of match to TRUE/FALSE?
If you have time, try to solve the problem in another way and use system.time() to compare the speed of the different solutions, using: x < y <- sample(x, size= )

How to store and handle dates
R has a built-in class "Date" to handle data entered as a date in various formats
The as.Date() function allows a variety of input formats through the format = argument
> as.Date("2013/10/15")
[1] "2013-10-15"

Formatting dates in R
If your input dates are not in a standard format, you can add a format string, as follows:
Code   Value
%d     Day of the month (decimal number)
%m     Month (decimal number)
%b     Month (abbreviated)
%B     Month (full name)
%y     Year (two digits)
%Y     Year (four digits)

Using date formats
> as.Date('9/22/1983', format = '%m/%d/%Y')
[1] "1983-09-22"
> as.Date('September 22, 1983', format = '%B %d, %Y')
[1] "1983-09-22"
> as.Date('22SEP83', format = '%d%b%y')
[1] "1983-09-22"
> as.Date('22sep83', format = '%d%b%y')
[1] "1983-09-22"
Gracefully handles upper/lower case
R function toupper(x) converts to upper case
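The sketch below shows the same format codes working in both directions: parsing text into a Date with as.Date() and turning a Date back into text with format(). The dotted input string and the deliberately wrong format are assumptions added for illustration; only numeric codes are used so the result does not depend on the locale's month names.

```r
# Parse month/day/four-digit-year text.
d1 <- as.Date("9/22/1983", format = "%m/%d/%Y")

# Parse day.month.two-digit-year text (hypothetical input style).
d2 <- as.Date("22.09.83", format = "%d.%m.%y")

# A format string that does not fit the text gives NA rather than an error:
# here "22" would have to be a month.
bad <- as.Date("9/22/1983", format = "%d/%m/%Y")

# format() uses the same codes to write a Date back out as text.
iso <- format(d1, "%Y-%m-%d")
us  <- format(d1, "%m/%d/%y")

print(d1)
print(c(iso = iso, us = us))
```

Because a mismatched format silently returns NA, it is worth checking for NA values with is.na() after converting a whole column of dates.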

Extracting date components
Components of dates can easily be extracted, provided the items are of class POSIXt or Date
> weekdays(as.Date("2013/10/15"))
[1] "Tuesday"
> months(as.Date("2013/10/15"))
[1] "October"
> quarters(as.Date("2013/10/15"))
[1] "Q4"
> julian(as.Date("2013/10/15"), origin=as.Date("2013/01/01"))   # Number of days from the start of the year
[1] 287
attr(,"origin")
[1] "2013-01-01"
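As a complement to weekdays() and months(), which return names in the current locale, this sketch extracts numeric components of the same date. Using format() with numeric codes and as.POSIXlt() is one possible approach, offered here as an assumption rather than the lecture's method.

```r
d <- as.Date("2013-10-15")

# Numeric components via format codes (independent of locale).
year        <- as.integer(format(d, "%Y"))
month_num   <- as.integer(format(d, "%m"))
day_of_year <- as.integer(format(d, "%j"))   # 1 January counts as day 1

# Days elapsed since 1 January, dropping the "origin" attribute.
days_since_jan1 <- as.numeric(julian(d, origin = as.Date("2013-01-01")))

# Quarter label and numeric weekday (0 = Sunday, ..., 6 = Saturday).
qtr  <- quarters(d)
wday <- as.POSIXlt(d)$wday

print(c(year = year, month = month_num, day_of_year = day_of_year,
        days_since_jan1 = days_since_jan1, wday = wday))
```

Note that the day of the year is one more than the julian() result, because julian() counts elapsed days from the origin.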

Sub-daily time scales
Most dates read from instruments have a finer time scale than days (hours, minutes, seconds)
For these, use the POSIXct class in R (and not the Date class)
The default input format for the POSIXct class consists of the year, month, and day (separated by slashes or dashes), which may be followed by white space and a time in the form hours:minutes:seconds or hours:minutes
– 1983/9/22 23:20:05
If the dates are not in this format, see help on strptime()

Converting to POSIXt
> as.POSIXlt(" :20:05")
[1] " :20:05"
> aDate <- as.POSIXlt(" :20:05")
> as.POSIXct(" :20:05")   # POSIXct includes time zone information
[1] " :20:05 PDT"
> aDate <- as.POSIXct(" :20:05")
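A minimal sketch of working with date-times follows. The timestamp 1983/9/22 23:20:05 is the example format from the lecture; the second timestamp, the dotted input string, and the explicit tz = "UTC" are assumptions added so the result does not depend on the machine's time zone.

```r
# Default formats: slashes or dashes, with or without seconds.
t1 <- as.POSIXct("1983/9/22 23:20:05", tz = "UTC")
t2 <- as.POSIXct("1983-09-23 01:05", tz = "UTC")

# Differences between date-times can be expressed in chosen units.
gap_secs <- as.numeric(difftime(t2, t1, units = "secs"))

# POSIXlt stores the components separately, which makes them easy to extract.
lt   <- as.POSIXlt(t1)
hour <- lt$hour
minute <- lt$min

# Non-default input formats go through strptime() with a format string
# (hypothetical instrument output shown here).
t3 <- as.POSIXct(strptime("22.09.1983 23h20", format = "%d.%m.%Y %Hh%M", tz = "UTC"))

print(t1)
print(gap_secs)
```

POSIXct stores a single number (seconds since 1970), which suits data frame columns, whereas POSIXlt is a list of components and is convenient for pulling out hours or minutes.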

Adding and averaging dates
Many common functions can accept objects of Date class: min(), mean(), max(), ...
The difftime() function computes the difference between two dates
> mean(c(as.Date("2013/10/15"), as.Date("2010/06/14")))
[1] "2012-02-13"
> max(c(as.Date("2013/10/15"), as.Date("2010/06/15")))
[1] "2013-10-15"
> min(c(as.Date("2013/10/15"), as.Date("2010/06/15")))
[1] "2010-06-15"
> difftime(as.Date("2013/10/15"),as.Date("2010/06/14"))
Time difference of 1219 days
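Since Date objects are stored as day counts, ordinary arithmetic also works on them. The short sketch below reuses the two dates from the difftime() example; the 30-day offset is an arbitrary illustration.

```r
d1 <- as.Date("2010-06-14")
d2 <- as.Date("2013-10-15")

# Subtracting two dates gives a difftime in days.
gap_days <- as.numeric(d2 - d1)

# difftime() lets you choose other units explicitly.
gap_weeks <- as.numeric(difftime(d2, d1, units = "weeks"))

# Adding a number shifts a date by that many days.
later <- d1 + 30

# range() returns the earliest and latest date.
rng <- range(c(d2, d1))

print(gap_days)
print(later)
```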

Converting dates to factors
First create a vector of the days of the year
> everyday <- seq(from=as.Date(" "), to=as.Date(" "), by="day")
> everyday
[1] " " " " " "...
Then convert to factors
> month <- months(everyday)
> month <- factor(month, levels=unique(month), ordered=TRUE)
> table(month)   # Number of occurrences of each item
month
January February March
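A runnable sketch of these steps is given below. The year 2013 is an assumption made for illustration, since the start and end dates are not visible in the transcript.

```r
# Assumed year: 2013 (not taken from the slide).
everyday <- seq(from = as.Date("2013-01-01"), to = as.Date("2013-12-31"), by = "day")

# Month name for every day, in the current locale.
month <- months(everyday)

# Using levels = unique(month) keeps calendar order instead of alphabetical order.
month <- factor(month, levels = unique(month), ordered = TRUE)

# Number of days falling in each month.
counts <- table(month)
print(counts)
```

Without the levels argument, factor() would sort the month names alphabetically, so the table would start with April rather than January in an English locale.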

Hands-on exercise 2
Choose any two dates
– Save them as two objects in the year-month-date format
– Apply R functions to determine
   the day of the week
   the month
   the difference between the two dates

Packages in R
In the homework, you loaded the MASS package
– This contains functions and data sets from Venables & Ripley “Modern Applied Statistics with S”
Packages are a key feature in R, allowing users to benefit from others’ contributions
Before performing any truly arduous programming task, ask yourself whether someone else is likely to have already done that
Search for contributed packages that might already have the features you need

[Screenshot of the packages list, with labels: already installed packages; install a new package; click on the name of the package to get a list of its functions; checking the box loads the package]

Installing and loading packages
Once a package has been installed, it needs to be loaded into R
This can be done by ticking the box next to the package
In your R code, loading is done using one of
– library(package) loads the package and stops with an error if it is not installed
– require(package) loads the package and returns FALSE with a warning (instead of an error) if it is not installed (I always use this version)
Neither function reloads a package that is already loaded
R will return a warning if the package was compiled on a newer R version or if the version of R is incompatible with an older package
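One common pattern, sketched here as an assumption rather than as the lecture's recommendation, is to check whether a package is available before attaching it. The helper name load_or_report is hypothetical; requireNamespace() and library() are base R functions.

```r
# Hypothetical helper: attach a package if it is installed, otherwise report it.
load_or_report <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    # character.only = TRUE lets library() take the name from a variable.
    library(pkg, character.only = TRUE)
    TRUE
  } else {
    # A missing package could be installed with install.packages(pkg).
    message("Package '", pkg, "' is not installed.")
    FALSE
  }
}

ok_stats   <- load_or_report("stats")                  # ships with R
ok_missing <- load_or_report("notARealPackage12345")   # deliberately nonexistent name
```

Returning TRUE or FALSE lets a script decide what to do next, for example skipping an optional plot when a plotting package is absent.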

Finding packages
Task views that categorize packages into groups: http://cran.r-project.org/web/views/
Ask other people
Active community on Twitter: use hashtag #Rstats
Search engine (also try
Scientific papers describing new methods, e.g. bathymetry plotting package marmap: Pante E & Simon-Bouhet B (2013) PLOS ONE 8(9):e73051
Start with the most popular 100 downloaded packages: packages-for-2013-jan-may/

Help on packages
Click on the package for a list of functions
Many packages have an overview called a vignette that includes examples of the key functions
You can also find vignettes online, for example
> vignette(all=FALSE)   # Lists vignettes for installed and attached packages
> vignette(all=TRUE)   # Lists vignettes for all installed packages
> vignette("googleVis")   # Open a particular vignette

Hands-on exercise 3
Data: birthdays in the class
– Convert the birthdays into dates (use year = 2015)
– Write R code to test if there are duplicated birthdays
– Find the shortest gap between birthdays
– Find the longest gap between birthdays
Hints:
– Start with 5-10 birthdays, get that working first
– Have two test cases: one with a duplicate, one without a duplicate
Very advanced: write code to estimate the probability of duplicated birthdays in a class of size N