

Review
> system.time(unique(temp))   # How long does a command take
> merge(station1, station2, by.x="time1", by.y="time2")   # Merging two data frames
> match(1:10, c(1,3,5,9))   # Matching items in two vectors
> as.Date('9/22/1983', format = '%m/%d/%Y')   # Date format
> julian(as.Date("2013/10/15"), origin=as.Date("2013/01/01"))   # Number of days after origin
> as.POSIXlt(" :20:05")   # Day and time object
> difftime(as.Date("2013/10/15"),as.Date("2010/06/14"))   # Difference in dates from one date to another
> library(package)   # Loads a new package for use (after installing it)
> require(package)   # Loads a new package if not already loaded
> vignette("googleVis")   # More advanced help for some packages
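As a quick refresher, the date and matching functions from the review can be run on their own. This is only a sketch: the variable names d1, pos, days.after and gap are made up for illustration and do not come from the lecture.

```r
# Parse a date written as month/day/year
d1 <- as.Date("9/22/1983", format = "%m/%d/%Y")

# Position of each of 1:10 within c(1,3,5,9); NA where there is no match
pos <- match(1:10, c(1, 3, 5, 9))

# Days elapsed since the chosen origin
days.after <- julian(as.Date("2013/10/15"), origin = as.Date("2013/01/01"))

# Difference between two dates, forced to days
gap <- difftime(as.Date("2013/10/15"), as.Date("2010/06/14"), units = "days")

d1
pos
days.after
gap
```

Note that julian() returns its value with an "origin" attribute attached, so wrap it in as.numeric() when only the number is needed.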

Lecture 7: Data manipulation in practice
Trevor A. Branch
FISH 552 Introduction to R

Reminder: help in R
Searching for help
> help.search("logarithm")
Finding function names
> apropos("log")
[10] "is.logical" "log" "log10"
[13] "log1p" "log2" "logb"
[16] "Logic" "logical" "logLik"
Getting help for a function
> help("log")
> ?log

Reminder: data types
Vector
– One-dimensional
– All elements must be the same type
Matrix
– Two-dimensional
– All elements must be the same type
– Some functions require matrices as inputs
Data frame
– Same type within a column, all columns the same length
– Most commonly used for data
List
– Contains data of different types and different lengths
– Often the return type for statistical analysis functions
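A small sketch of the four data types, using made-up values. All object names here (lengths.cm, counts, catch, result) are hypothetical and are not part of the CPFV data.

```r
# Vector: one dimension, all elements the same type
lengths.cm <- c(31.5, 42.0, 28.3)

# Matrix: two dimensions, all elements the same type
counts <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: one type per column, all columns the same length
catch <- data.frame(species = c("Bocaccio", "Blue rockfish", "Lingcod"),
                    length = lengths.cm)

# List: elements of different types and different lengths
result <- list(name = "toy example", values = lengths.cm, size = dim(counts))

class(lengths.cm)
dim(counts)
str(catch)
sapply(result, length)
```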

Goal of today’s lecture
The functions in previous lectures were presented individually and usually involved simplified data sources
Today we will use many of these functions as a cohesive whole to process some data: the California Passenger Fishery Vessel (CPFV) data from California’s central coast

CPFV data
Contains information on recreational catch from fishing vessels
Data from two ports: Port San Luis (Avila Beach) and Morro Bay
> speciesCode <- read.csv("speciesCode.csv")
> speciesData <- read.csv("speciesData.csv")
> tripData <- read.csv("tripData.csv")

Tasks to complete
Basic summaries and checks of data
– quantitative aspects of the data
– checks to make sure the data look “OK”
– discovery of NAs and values
Compile species-specific databases
– Bocaccio rockfish (Sebastes paucispinis)
– All Sebastes species (rockfishes)

What is the number one rule of data analysis? Always plot your data (Actually, always check your data first!)

Trip data
Summarizing information about the whole trip
> head(tripData, n=3)
TripNum SimplifiedTripNum Date Year
1 cp1/en.sr
2 cp2/en.sr
3 cp3/en.sr
Port TotalAnglers ObsAnglers
1 San Luis NA 15
2 Morro Bay NA 28
3 San Luis NA 17
ObsAngavg TotalMinutes TotalFish

When do the data begin and end?
Strategy: convert the dates to the Date class and apply basic statistical functions
> tripData$Date <- as.Date(tripData$Date)
Find the start and end date
> ( min.date <- min(tripData$Date) )
[1] " "
> ( max.date <- max(tripData$Date) )
[1] " "
How many days from the start to the end of the data?
> difftime(max.date, min.date)
Time difference of 1209 days
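The same steps can be tried on a toy data frame before running them on tripData. The data frame trips and its three dates are invented for this sketch; only the sequence of function calls mirrors the slide.

```r
# Hypothetical trip dates stored as text, as they would arrive from read.csv()
trips <- data.frame(Date = c("2003-07-20", "2003-07-01", "2003-08-05"))

# Convert to the Date class so that min(), max() and difftime() work on them
trips$Date <- as.Date(trips$Date)

min.date <- min(trips$Date)
max.date <- max(trips$Date)

# Span of the data in days
span <- difftime(max.date, min.date, units = "days")
span
```

Setting units = "days" explicitly avoids relying on the automatic choice of units that difftime() makes by default.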

What is the longest data gap?
Strategy: ensure all the observations in tripData are in ascending order by date, then compute the time differences between successive dates and find the maximum
> tripData <- tripData[order(tripData$Date),]
> diff(c(1,2,4,5,6))
[1] 1 2 1 1
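One way the strategy could look on a small invented vector of dates is sketched below. The names dates, gaps and i are hypothetical, and the real exercise on tripData works with whole rows rather than a bare vector.

```r
# Hypothetical, unsorted trip dates
dates <- as.Date(c("2003-07-20", "2003-07-01", "2003-08-05", "2003-07-03"))

# Put the dates in ascending order
dates <- dates[order(dates)]

# Differences between successive dates, in days (one element fewer than dates)
gaps <- as.numeric(diff(dates))

# Size of the largest gap and where it occurs
max(gaps)
i <- which.max(gaps)

# Gap i lies between element i and element i + 1
dates[c(i, i + 1)]
```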

In-class exercise 1
Find the maximum difference in successive dates
Find the row index of this biggest gap
Display the rows of tripData immediately before (three rows) and after (three rows) the biggest gap

Visualizing trip dates
> plot(x=tripData$Date, y=tripData$TotalMinutes/60,
+ type="h", ylim=c(0,5), xaxs="i", yaxs="i",
+ xlab="Trip date", ylab="Trip length (hr)")

Species codes
The dataset speciesCode contains a coded number (used in speciesData), the scientific name, and the common name of 590 groundfish species.
> head(speciesCode, n=4)
  SpeciesCode          Scientific            Common
1           1    Eptatretus deani     Black hagfish
2           2  Eptatretus stoutii   Pacific hagfish
3           3   Myxine circifrons Whiteface hagfish
4          21 Lampetra tridentata   Pacific lamprey

Species data
The dataset speciesData is the master data set containing data about each individual fish caught
> head(speciesData, n=4)
ID TripNum DropNum SpeciesCode Length Weight Fate TagNum
K NA NA RD NA K NA K NA

What species was caught most?
Strategy: obtain the most frequent species code from speciesData and then find the corresponding species in speciesCode
> speciesCounts <- table(speciesData$SpeciesCode)
> temp <- speciesCounts[which.max(speciesCounts)]
> maxCode <- as.numeric(names(speciesCounts[which.max(speciesCounts)]))
> maxCode
[1] 2330
> max.spp <- speciesCode[speciesCode$SpeciesCode == maxCode,]
> max.spp
SpeciesCode Scientific Common
Sebastes mystinus Blue rockfish
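The lookup can be rehearsed on miniature stand-ins for the two datasets. In this sketch toyData and toyCode are hypothetical, and the codes 10, 20 and 30 are invented for illustration; they are not the real CPFV species codes.

```r
# Hypothetical miniature versions of speciesData and speciesCode
toyData <- data.frame(SpeciesCode = c(30, 20, 30, 30, 20, 10))
toyCode <- data.frame(SpeciesCode = c(10, 20, 30),
                      Common = c("Lingcod", "Bocaccio", "Blue rockfish"))

# Count how often each code occurs
counts <- table(toyData$SpeciesCode)

# table() stores the codes as names, so convert the winning name back to a number
maxCode <- as.numeric(names(counts)[which.max(counts)])

# Look up the row of the code table that matches
max.spp <- toyCode[toyCode$SpeciesCode == maxCode, ]
max.spp
```

If two codes tie for the highest count, which.max() returns only the first of them, so ties need a separate check.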

Creating species-specific datasets
Goal: create a dataset about bocaccio rockfish from all the relevant data sets
Strategy: subset speciesData to only include bocaccio observations, then use the merge() function to fuse all the data sources together
We will use the grep() function here, which returns the indices of the elements that match a pattern
> grep("a", c("a","b","a","c","a","d"))
[1] 1 3 5

Subsetting the database
Find which species code belongs to bocaccio
> bocaccioRows <- grep("Bocaccio",speciesCode$Common)
> speciesCode[bocaccioRows,]
SpeciesCode Scientific Common
Sebastes paucispinis Bocaccio
Subset speciesData to observations where SpeciesCode is the species code for bocaccio
> bocaccioCode <- speciesCode[bocaccioRows, "SpeciesCode"]
> bocaccioData <- subset(speciesData, SpeciesCode==bocaccioCode)
> head(bocaccioData)
ID TripNum DropNum SpeciesCode Length Weight Fate TagNum
NA NA RD NA RA NA
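A sketch of the same subsetting on invented data follows. The objects toyCode, toyData, rows, code and bocData are hypothetical, and the species codes are made up. The sketch uses %in% rather than ==, which is a substitution on my part: it gives the same result when grep() finds one row and still behaves sensibly if it finds several.

```r
# Hypothetical miniature versions of speciesCode and speciesData
toyCode <- data.frame(SpeciesCode = c(10, 20, 30),
                      Common = c("Blue rockfish", "Bocaccio", "Lingcod"))
toyData <- data.frame(ID = 1:5,
                      SpeciesCode = c(20, 10, 20, 30, 20),
                      Length = c(41, 30, NA, 62, 38))

# Rows of the code table whose common name contains "Bocaccio"
rows <- grep("Bocaccio", toyCode$Common)

# The species code (or codes) in those rows
code <- toyCode[rows, "SpeciesCode"]

# Keep only the observations with a matching code
bocData <- subset(toyData, SpeciesCode %in% code)
bocData
```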

Merge the data
We now want to make a single dataset that also includes information about the trip on which the fish were caught
Merge the two datasets
> bocTrip <- merge(bocaccioData, tripData[,-1],
+ by.x="TripNum", by.y="SimplifiedTripNum")
tripData[,-1] removes the first column of tripData: although it is called TripNum, we need to match on SimplifiedTripNum
Specify the by.x and by.y arguments so that merge() knows which columns to match
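The effect of dropping the first column and matching on differently named keys can be seen on two toy data frames. Everything in this sketch (fish, trips and their values) is invented, and it assumes that SimplifiedTripNum holds plain trip numbers.

```r
# Hypothetical fish records keyed by a simple trip number
fish <- data.frame(TripNum = c(2L, 1L, 2L),
                   ID = 1:3,
                   Length = c(41, 38, 45))

# Hypothetical trip table: a long trip label plus the simple number
trips <- data.frame(TripNum = c("cp1/en.sr", "cp2/en.sr", "cp3/en.sr"),
                    SimplifiedTripNum = 1:3,
                    Port = c("San Luis", "Morro Bay", "San Luis"))

# Drop the long label, then match fish$TripNum to trips$SimplifiedTripNum
merged <- merge(fish, trips[, -1],
                by.x = "TripNum", by.y = "SimplifiedTripNum")
merged
```

By default merge() keeps only rows with a match in both data frames, so trip 3, which has no fish in this toy example, does not appear in the result.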

Examine the resulting data
> head(bocTrip, n=4)
TripNum ID DropNum SpeciesCode Length Weight
NA NA
Fate TagNum Date Year Port TotalAnglers
1 RD NA San Luis NA
2 RA NA San Luis 31
3 RA NA Morro Bay 20
4 RD NA Morro Bay 20
ObsAnglers ObsAngavg TotalMinutes TotalFish

In-class exercise 2
1. Create a data frame by subsetting speciesData to include all rockfish species (scientific name Sebastes) using speciesCode to find the corresponding codes
2. Provide a table of the fates of rockfish by species code
3. Calculate the minimum and maximum length recorded for each rockfish species
4. *Advanced: repeat question 2, but obtain a table of fate vs. common name
Useful functions include grep(), %in%, table(), names() and tapply(). Also sort(), unique().
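To see how %in%, table() and tapply() fit together before attempting the exercise, here is a sketch on invented data. The data frame toy, the vector rockfishCodes and the species codes are all hypothetical; the fate labels K, RD and RA are borrowed from the output shown earlier.

```r
# Hypothetical catch records with invented species codes
toy <- data.frame(SpeciesCode = c(10, 10, 20, 20, 20, 30),
                  Fate = c("K", "RD", "K", "K", "RA", "K"),
                  Length = c(30, 28, 41, NA, 38, 62))

# Suppose codes 10 and 20 are the ones of interest
rockfishCodes <- c(10, 20)

# %in% keeps rows whose code is any one of several values
rock <- toy[toy$SpeciesCode %in% rockfishCodes, ]

# Two-way table of counts: species code by fate
fateTable <- table(rock$SpeciesCode, rock$Fate)
fateTable

# Minimum and maximum length per species code, ignoring missing lengths
minLen <- tapply(rock$Length, rock$SpeciesCode, min, na.rm = TRUE)
maxLen <- tapply(rock$Length, rock$SpeciesCode, max, na.rm = TRUE)
minLen
maxLen
```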