Lecture 11: Introduction to R and Accessing USGS Data from Web Services
Jeffery S. Horsburgh
Hydroinformatics, Fall 2013
This work was funded by National Science Foundation Grants EPS and EPS
Objectives
– Discover and access data from major data sources
– Write and execute computer code to automate difficult and repetitive data-related tasks
– Manipulate and analyze data using code
– Retrieve and use data from Web services
– Create reproducible data visualizations
Exercise
– Download daily mean discharge values for the USGS gage in the Logan River above State Dam, near Logan, UT (USGS ) for the past 10 years ( through )
– Create a time series plot of the data (e.g., in Excel)
– Calculate overall summary statistics (min, max, mean)
R Demo
– Retrieve the USGS discharge data using R
– Apply the USGS dataRetrieval package
– Create a plot and summary statistics using R code
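The plotting and summary-statistics steps of the demo can be sketched in base R. This is a minimal sketch, not the lecture's actual demo script: the data frame `flow` and its discharge values are made up for illustration (they are not USGS data), and the retrieval step is left out so the code runs without a network connection.

```r
# Made-up daily mean discharge values, for illustration only (not USGS data)
flow <- data.frame(
  Date = seq(as.Date("2013-01-01"), by = "day", length.out = 7),
  Q    = c(95, 93, 98, 102, 110, 104, 99)
)

# Overall summary statistics (min, max, mean)
flow_stats <- c(min = min(flow$Q), max = max(flow$Q), mean = mean(flow$Q))
print(flow_stats)

# Time series plot, written to a temporary PDF so the script
# also runs in a non-interactive session
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(flow$Date, flow$Q, type = "l",
     xlab = "Date", ylab = "Daily mean discharge",
     main = "Daily mean discharge (illustrative values)")
invisible(dev.off())
```

In an interactive R session the `pdf()` and `dev.off()` calls can be dropped so the plot appears in the graphics window.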
USGS Data Retrieval Package for R
– Obtain streamflow and water quality sample data from the USGS National Water Information System (NWIS)
– Data access is through web services
   – USGS daily discharge data
   – USGS unit discharge values (15-minute data)
   – USGS water quality data
   – EPA STORET water quality data
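How Does the Magic Work?
[Diagram: data flow between NWIS and R]
– The dataRetrieval R package sends a request (GetDVData) to the Daily Values Web Service
– The web service queries the National Water Information System (Oracle) database
– The web service formats the results as WaterML XML
– The dataRetrieval R package parses the WaterML into an R data frame
– A Sites Web Service is also shown in the diagram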
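The request-and-parse steps can be sketched in base R. This is only an illustration under stated assumptions, not the dataRetrieval implementation: `build_dv_url()` and `parse_dv_values()` are hypothetical helper names, the URL pattern is the USGS daily values service as commonly documented and may change, the site number `"00000000"` is a placeholder, and the XML fragment is a simplified, made-up stand-in for a real WaterML response.

```r
# Hypothetical helper: build a request URL for the USGS daily values service.
# "00060" is the USGS parameter code for discharge.
build_dv_url <- function(site, start_date, end_date, parameter = "00060") {
  paste0("https://waterservices.usgs.gov/nwis/dv/?format=waterml,1.1",
         "&sites=", site,
         "&parameterCd=", parameter,
         "&startDT=", start_date,
         "&endDT=", end_date)
}

# Hypothetical helper: pull dates and values out of a simplified
# WaterML-like fragment and return them as a data frame.
parse_dv_values <- function(xml) {
  pattern <- '<value dateTime="([^"]+)">([^<]+)</value>'
  matches <- regmatches(xml, gregexpr(pattern, xml))[[1]]
  data.frame(
    Date = as.Date(substr(sub(pattern, "\\1", matches), 1, 10)),
    Q    = as.numeric(sub(pattern, "\\2", matches))
  )
}

# Placeholder site number; substitute a real USGS site number
dv_url <- build_dv_url("00000000", "2013-01-01", "2013-01-02")

# Made-up fragment standing in for the service response
sample_xml <- paste0(
  '<values>',
  '<value dateTime="2013-01-01T00:00:00.000">95</value>',
  '<value dateTime="2013-01-02T00:00:00.000">93</value>',
  '</values>'
)
dv <- parse_dv_values(sample_xml)
print(dv)
```

A real WaterML document carries namespaces, qualifiers, and site metadata, so production code should use a proper XML parser rather than regular expressions.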
How do I find sites?
USGS Daily Streamflow Values Data Frame
Name      Definition                                                 Units or type
Date      The date                                                   yyyy-mm-dd
Q         The discharge on that date                                 m3/sec
Julian    The date expressed as days starting with Jan 1, 1850       days
Month     Month of the year, from 1 to 12                            months
Day       Day of the year, from 1 to 366                             days
DecYear   Year expressed as a decimal                                years
MonthSeq  Month sequence: an index starting with 1 at Jan, 1850      months
LogQ      ln(Q)                                                      numeric
i         Index of days from the start of the data frame             days
Q7        Mean discharge for the 7 days, up to day i                 m3/sec
Q30       Mean discharge for 30 days, up to day i                    m3/sec
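Several of these columns can be derived from Date and Q alone. The sketch below shows one way to do that in base R. It is an assumption about how such columns could be computed, not the package's own code, and the discharge values are made up for illustration.

```r
# Made-up discharges on ten consecutive days, for illustration only
daily <- data.frame(
  Date = seq(as.Date("2013-01-01"), by = "day", length.out = 10),
  Q    = 1:10
)

daily$Month <- as.numeric(format(daily$Date, "%m"))  # month of the year, 1 to 12
daily$Day   <- as.numeric(format(daily$Date, "%j"))  # day of the year, 1 to 366
daily$LogQ  <- log(daily$Q)                          # natural log of discharge
daily$i     <- seq_len(nrow(daily))                  # index of days from the start

# Trailing 7-day mean: the average of day i and the 6 days before it.
# The first 6 rows are NA because a full 7-day window is not yet available.
daily$Q7 <- as.numeric(stats::filter(daily$Q, rep(1 / 7, 7), sides = 1))

print(daily)
```

The same `stats::filter()` call with a window of 30 would give a Q30-style column.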
What is R?
– R is a programming language and software environment for statistical computing and graphics
– Wide variety of statistical and graphing techniques
– Highly extensible
– Free and open source
R
– R is an interpreted language and can run interactively
   – R statements are converted to machine instructions as they are executed
   – This is flexible, but slower than compiled code
R Packages and Libraries
– Implement many common data analysis and statistical procedures
– Provide excellent graphics functionality
– Serve as a starting point for many data analysis tasks
– A huge community of R developers exists, so it is likely that there is an R package for many of the tasks you commonly do
R Programming Language
– R defaults to a graphical user interface that presents users with a prompt for entering code
– Each input expression is evaluated and then a result is returned
R Graphical User Interface
Simple Mathematical Expressions in R
    > 1 + 1        # Simple arithmetic
    [1] 2
    > 2 + 3 * 4    # Operator precedence
    [1] 14
    > exp(1)       # Basic mathematical functions are available
    [1] 2.718282
    > sqrt(10)
    [1] 3.162278
Variables in R
– Numeric: floating point values
– Boolean (TRUE or FALSE)
– Strings (character sequences)
– Types are determined automatically when a variable is created with the assignment operator “<-”
Variables in R
    > a <- 1       # Variables are defined
    > b <- 30      # using the "<-" operator to set values
    > c <- 3.5
    > a * b * c
    [1] 105
    > A * b * c    # Variable names are case sensitive
    Error: object 'A' not found
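Because R sets a variable's type automatically on assignment, the three basic types (numeric, Boolean, string) can be checked with `class()`. A short sketch with illustrative variable names:

```r
n    <- 3.5            # numeric (floating point)
flag <- TRUE           # Boolean; R calls this type "logical"
name <- "Logan River"  # string; R calls this type "character"

class(n)     # "numeric"
class(flag)  # "logical"
class(name)  # "character"
```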
Vectors in R
– A series of numbers
– Created with
   – c() to concatenate elements or sub-vectors
   – rep() to repeat elements or patterns
   – seq() or m:n to generate sequences
– Most mathematical functions and operations can be applied to vectors, with no looping required!
R Vectors
    > rep(1,10)         # Repeats the number 1, 10 times
     [1] 1 1 1 1 1 1 1 1 1 1
    > seq(1,10)         # Sequence of integers between 1 and 10
     [1]  1  2  3  4  5  6  7  8  9 10
    > seq(5,20,by=5)    # Every 5th integer from 5 to 20
    [1]  5 10 15 20
Vector Operations
    > x <- c(2,0,0,4)    # Creates a vector with elements 2,0,0,4
    > y <- c(1,9,9,9)
    > x + y              # Sums elements of 2 vectors
    [1]  3  9  9 13
    > x * 4              # Multiplies each element by 4
    [1]  8  0  0 16
    > sqrt(x)            # Function applies to each element and returns a vector
    [1] 1.414214 0.000000 0.000000 2.000000
Accessing Vector Elements
    > x <- c(10,20,30,40,50)    # Create a vector called x
    > x[1]                      # Select the first element
    [1] 10
    > x[1] <- 300               # Set the value of an element in a vector
    > x
    [1] 300  20  30  40  50
Data Frames
– A group of related vectors
– The equivalent of a table in R
– Create from scratch using data.frame()
    > newDataFrame <- data.frame(height=c(150,160), weight=c(65,72))
    > newDataFrame
      height weight
    1    150     65
    2    160     72
Data Frames
– Read into R from a text file:
    newDataFrame <- read.table("table.txt", header=TRUE)
– The first line of the file needs to have a name for each column (vector)
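To try `read.table()` without preparing a file by hand, the sketch below first writes a small whitespace-delimited table to a temporary file and then reads it back. The file contents and the name `table_file` are made up for illustration.

```r
# Write a small table to a temporary text file; the first line names the columns
table_file <- tempfile(fileext = ".txt")
writeLines(c("height weight",
             "150 65",
             "160 72"),
           table_file)

# Read the file into a data frame, using the first line as column names
newDataFrame <- read.table(table_file, header = TRUE)
print(newDataFrame)
```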
Accessing Data Frames
– Multiple ways to retrieve columns of data
– The following all select the same column:
    newDataFrame["columnName"]    (returns a one-column data frame)
    newDataFrame[,n]              (n is the column index; returns a vector)
    newDataFrame$columnName       (returns a vector)
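The three access styles select the same data but do not all return the same kind of object, which matters when the result is passed to a function that expects a vector. A short sketch using an illustrative two-column data frame:

```r
newDataFrame <- data.frame(height = c(150, 160), weight = c(65, 72))

by_name   <- newDataFrame["weight"]  # one-column data frame
by_index  <- newDataFrame[, 2]       # plain numeric vector
by_dollar <- newDataFrame$weight     # plain numeric vector

is.data.frame(by_name)               # TRUE
identical(by_index, by_dollar)       # TRUE
identical(by_name[[1]], by_dollar)   # TRUE: [[1]] extracts the vector inside
```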
Lists
– Collections of other R objects (e.g., vectors, data frames)
– Created with the list function:
    newList <- list(x = 1, y = 5)
– Access to components follows rules similar to data frames:
    newList$x       (returns the component itself)
    newList["x"]    (returns a one-element list)
    newList[1]      (returns a one-element list)
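As with data frames, single brackets on a list return a smaller list, while `$` and double brackets return the component itself. A brief sketch with the same `newList`:

```r
newList <- list(x = 1, y = 5)

component <- newList$x       # the number 1 itself
same_item <- newList[["x"]]  # double brackets also return the component
sub_list  <- newList["x"]    # a list of length 1 holding x

is.list(sub_list)                  # TRUE
identical(component, same_item)    # TRUE
identical(sub_list, newList[1])    # TRUE
```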
R Workspaces
– As you create objects in R, they are added to your current workspace
– Use ls() to list your workspace contents
– Use rm() to delete objects from your workspace
– When you quit R, you can save the current workspace for later use and pick up where you left off
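The workspace functions can be tried on a couple of throwaway objects. In this sketch the object names are made up, and `save()` and `load()` on a temporary file stand in for saving the whole workspace at quit (which `save.image()` does), so nothing is written to the working directory.

```r
gage_name  <- "Logan River"
site_count <- 30

ls()           # lists the objects in the current workspace
rm(gage_name)  # removes gage_name from the workspace

# Save one object to a temporary file, remove it, then restore it
ws_file <- tempfile(fileext = ".RData")
save(site_count, file = ws_file)
rm(site_count)
load(ws_file)  # site_count is back in the workspace
```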
Summary
– R is a general purpose statistical computing environment: it is both software and a language
– R can get data directly from the USGS using a custom package
– R provides a powerful environment for manipulating, analyzing, and visualizing data
– Coding analyses in R can make them more reproducible
References
– GitHub repository with USGS R Tools:
– GitHub repository with USGS dataRetrieval package: https://github.com/USGS-R/dataRetrieval