Using Python to Retrieve Data from the CUAHSI HIS Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2015 This work was funded by National Science Foundation Grants EPS and EPS Slides adapted from original version by Jon Goodall University of Virginia, Hydroinformatics, Fall 2014
Objectives Discover and access data from major hydrologic data sources Create reproducible data visualizations Write and execute computer code to automate difficult and repetitive data-related tasks Manipulate data and transform it across file systems, flat files, databases, programming languages, etc. Retrieve and use data from Web services
Class Plan Introduction (setting this material within context) Set up – Software requirements In class demos – Example 1: Getting the site name for a USGS NWIS station using the CUAHSI HIS Web Services – Example 2: Getting the minimum streamflow over the past 5 days for a USGS NWIS station using the CUAHSI HIS Web Services – Example 3: Making a plot of the streamflow data for the past 5 days Challenge problems Wrap up
Big Picture Context 1. Data life cycle 2. Data modeling 3. Database design 4. Database implementation and ODM 5. SQL querying of an ODM database 6. Python programming against an ODM database 7. Sharing data from an ODM using CUAHSI HIS Web services (WaterOneFlow) and WaterML 8. Accessing CUAHSI HIS Web services using HydroDesktop 9. This week: Accessing CUAHSI HIS Web services using Python
Set up …
Required Packages “suds” – a package for making requests to SOAP web services “pandas” – a data analysis library with high-performance data structures “matplotlib” – a package for scientific plotting
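Before running the examples, it can help to confirm that the three packages are installed and to see which versions you have. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `report_versions` are made up for illustration. One assumption to be aware of: on Python 3, suds is often installed from a community fork whose distribution name may differ from "suds", in which case the lookup for "suds" may return None even though `import suds` works.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string for a package, or None if it is not found."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(package_names=("suds", "pandas", "matplotlib")):
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in package_names}
```

Calling `print(report_versions())` lists what is available in the active Python environment.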
What is the “suds” package? “Suds is a lightweight SOAP Python client for consuming Web Services.” SOAP and WSDL are standards for creating web services. You don’t need to know the details behind these standards, but if you are interested, Wikipedia has a good summary of both: – SOAP – WSDL: http://en.wikipedia.org/wiki/Web_Services_Description_Language
What is the “pandas” package? pandas is an open source library providing high performance data structures Some pandas data structures you may be interested in: – Series – a one-dimensional, labeled array capable of holding any data type – axis labels are collectively referred to as the “index” – DataFrame – a 2-dimensional data structure with columns of potentially different types (essentially a high performance table object)
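To make the two data structures concrete, here is a small sketch that builds a Series and a DataFrame. The streamflow numbers, timestamps, and column names are made up purely for illustration.

```python
import pandas as pd

# Made-up streamflow values at three made-up timestamps (illustration only).
times = pd.to_datetime(["2015-10-01 00:00", "2015-10-01 00:15", "2015-10-01 00:30"])
flow = pd.Series([112.0, 110.0, 111.0], index=times, name="flow")

# The axis labels (here, timestamps) are collectively the Series "index".
first_label = flow.index[0]

# A DataFrame is a table: each column can hold a different type.
table = pd.DataFrame({"flow": flow, "qualifier": ["P", "P", "P"]})
```

Because the Series is indexed by time, later steps such as finding when a minimum occurred return a timestamp rather than a row number.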
In class examples …
Example 1: Get the site name for a USGS NWIS gage station using the CUAHSI HIS Web Services Use the GetSiteInfoObject method on the CUAHSI HIS WaterOneFlow USGS Unit Values web service: – Use the suds Client object to call the web service method – We will use siteCode = “NWISUV: ” – Suds will automatically parse the WaterML response from the web service call. – We will need to find the siteName property in the response and print it to the console. The answer for “NWISUV: ” is: LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT
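The steps in Example 1 might be sketched as follows. This is not the in-class script. The helper names `make_service` and `get_site_name` are hypothetical, the WSDL URL and site code are left as parameters because they are not given here, and two things are assumptions: that GetSiteInfoObject takes a site code followed by an authentication token, and that suds exposes the name at `response.site[0].siteInfo.siteName`. The stand-in objects at the bottom only mimic that assumed layout so the parsing step can be tried without a network connection.

```python
from types import SimpleNamespace


def make_service(wsdl_url):
    """Build a suds service proxy from a WSDL URL (needs suds and network access)."""
    from suds.client import Client  # imported lazily so the rest runs without suds
    return Client(wsdl_url).service


def get_site_name(service, site_code, auth_token=""):
    """Call GetSiteInfoObject and return the siteName from the parsed response.

    Assumption: the parsed WaterML response stores the name at
    response.site[0].siteInfo.siteName.
    """
    response = service.GetSiteInfoObject(site_code, auth_token)
    return response.site[0].siteInfo.siteName


# Offline stand-in that mimics the assumed response layout (not real suds objects).
fake_response = SimpleNamespace(
    site=[SimpleNamespace(siteInfo=SimpleNamespace(
        siteName="LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT"))]
)
fake_service = SimpleNamespace(
    GetSiteInfoObject=lambda site_code, auth_token="": fake_response
)
site_name = get_site_name(fake_service, "NWISUV:placeholder-site")
```

With a real service you would replace `fake_service` with `make_service(wsdl_url)`, passing the WSDL address of the USGS Unit Values service and the full site code. Printing the raw response first is a quick way to confirm where siteName actually sits.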
Example 2: Getting the minimum streamflow over the past five days for a USGS NWIS station using the CUAHSI HIS Web Services Use the GetValuesObject method on the CUAHSI HIS WaterOneFlow web services – Like example 1, use the suds Client object to call the web service method We will use siteCode = “NWISUV: ” and ParameterCode = “NWISUV: ”. DateTimes should be in the format “YYYY-MM-DD”. – Example GetValuesObject web service call in a browser that returns a WaterML file: …00&variable=NWISUV:00060&startDate= &endDate= &authToken= We will extract the values and dateTimes and create a pandas Series object to store the time series. We will use the min() and idxmin() methods on the Series object to get the minimum streamflow and the datetime when the minimum streamflow occurred. – Note: This was tested with a specific version of pandas. If you get an error, check your version of pandas in the PyCharm Package Manager and upgrade it if needed.
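A sketch of the Example 2 workflow is shown below, under some stated assumptions. The function names are hypothetical. The argument order for GetValuesObject (site, variable, start date, end date, auth token) is inferred from the query string above. The response layout, `response.timeSeries[0].values[0].value` with each item carrying `.value` and `._dateTime`, is an assumption about how suds exposes the WaterML, so inspect a real response before relying on it. The stand-in data are made up so the pandas steps can run offline.

```python
from datetime import date, timedelta
from types import SimpleNamespace

import pandas as pd


def past_days_range(days=5, today=None):
    """Return (start, end) as 'YYYY-MM-DD' strings covering the past `days` days."""
    end = today or date.today()
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


def response_to_series(response):
    """Build a time-indexed Series from a parsed GetValuesObject response.

    Assumption: observations sit at response.timeSeries[0].values[0].value and
    each one has a .value (the number as text) and a ._dateTime attribute.
    """
    observations = response.timeSeries[0].values[0].value
    flows = [float(obs.value) for obs in observations]
    times = pd.to_datetime([obs._dateTime for obs in observations])
    return pd.Series(flows, index=times, name="streamflow")


def fetch_streamflow(service, site_code, variable_code, days=5, auth_token=""):
    """Request the past `days` days of data and return them as a Series."""
    start, end = past_days_range(days)
    response = service.GetValuesObject(site_code, variable_code, start, end, auth_token)
    return response_to_series(response)


# Offline stand-in with made-up values that mimics the assumed layout.
fake_response = SimpleNamespace(timeSeries=[SimpleNamespace(values=[SimpleNamespace(value=[
    SimpleNamespace(value="105.0", _dateTime="2015-10-01T00:00:00"),
    SimpleNamespace(value="98.5", _dateTime="2015-10-01T00:15:00"),
    SimpleNamespace(value="101.0", _dateTime="2015-10-01T00:30:00"),
])])])

demo = response_to_series(fake_response)
min_flow = demo.min()     # smallest value in the Series
min_time = demo.idxmin()  # index label (timestamp) at which the minimum occurs
```

Keeping the parsing in its own function means the min() and idxmin() steps can be checked against known values before pointing the script at the live service.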
Example 3: Create a Time Series Plot of Streamflow Values for the Past 5 Days Use the GetValuesObject method on the CUAHSI HIS WaterOneFlow web services – Like examples 1 and 2, use the suds Client object to call the web service method We will use siteCode = “NWISUV: ” and ParameterCode = “NWISUV: ”. DateTimes should be in the format “YYYY-MM-DD”. We will extract the values and dateTimes and create a pandas Series object to store the time series. We will create a “figure” object within which we can create our time series plot – We will use the plot() method on the pandas Series object to plot the time series. – Note: This was tested with a specific version of pandas. If you get an error, check your version of pandas in the PyCharm Package Manager and upgrade it if needed.
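The plotting step might look like the sketch below. It uses made-up daily values in place of the web service response so that it runs offline. The axis labels, title, and figure size are illustrative choices rather than anything prescribed by the example, and the "Agg" backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line for on-screen plots
import matplotlib.pyplot as plt
import pandas as pd

# Made-up streamflow values standing in for the data retrieved from the service.
times = pd.date_range("2015-10-01", periods=6, freq="D")
flow = pd.Series([105.0, 102.0, 98.5, 99.0, 101.5, 104.0], index=times)

# Create the figure, add one set of axes, and let the Series draw itself on them.
fig = plt.figure(figsize=(8, 4))
ax = fig.add_subplot(1, 1, 1)
flow.plot(ax=ax)
ax.set_xlabel("Date")
ax.set_ylabel("Streamflow")
ax.set_title("Streamflow for the past 5 days")

# In an interactive session, call plt.show(); in a script, fig.savefig(<path>) writes a file.
```

Passing `ax=ax` to plot() keeps the line on the figure you created instead of letting pandas open a new one.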
Using these principles for other WaterOneFlow Web services CUAHSI HIS Central lists other available web services that can be accessed in a similar way – There are over 400 billion observations available through HIS Central! The HydroServers you created last week with Dr. Ames can also be accessed using this same approach
Challenge problems …
Coding Challenges For the same station (NWISUV: ), modify the example 2 script so that it prints the daily min, max, and average streamflow for the past 5 days. (time permitting) Modify your script so that it prints out the min, max, and average streamflow for EACH DAY during the past 5 days. This is part of what you will need to do for Assignment 8.
Wrap up …
Pros and Cons of using Web Services vs. a local ODM database Pros: – Access to the entire database on which the service is based without the need to store the data locally – No need to keep a local copy in sync with the USGS version Cons: – Requires Internet connection – Speed: Getting data via web services will almost certainly be slower than getting data from your own database – Data is outside your control: breaking changes, unavailable services, etc. – (Some of these could be improved with a data caching strategy)
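One possible shape for the caching strategy mentioned above is a small file cache in front of the web service call. This is a sketch under assumptions, not a recommended design: the function name `cached_fetch`, the JSON file layout, and the one-hour freshness window are all made up for illustration, and the stand-in `fake_fetch` replaces the real service call so the behavior can be seen offline.

```python
import json
import os
import tempfile
import time


def cached_fetch(key, fetch, cache_dir, max_age_seconds=3600):
    """Return fetch() results, reusing a JSON file on disk while it is fresh enough.

    `key` must be safe to use as a file name (for example, avoid ':' on Windows).
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, key + ".json")
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age_seconds:
        with open(path) as handle:
            return json.load(handle)  # fast path: no network needed
    data = fetch()  # slow path: e.g. a web service call
    with open(path, "w") as handle:
        json.dump(data, handle)
    return data


# Offline demonstration: a counting stand-in for the web service call.
calls = []


def fake_fetch():
    calls.append(1)
    return {"dateTime": ["2015-10-01T00:00:00"], "value": [105.0]}


demo_dir = tempfile.mkdtemp()
first = cached_fetch("demo_site_flow", fake_fetch, demo_dir)
second = cached_fetch("demo_site_flow", fake_fetch, demo_dir)  # served from disk
```

A cache like this addresses the speed and availability concerns for repeated requests, at the cost of possibly serving data that is up to `max_age_seconds` old.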
Summary You can use Python to automate the retrieval of hydrologic data via web services The “suds” package enables you to retrieve data as Python objects “pandas” has some nice data structures that make analysis and visualization easier “matplotlib” allows you to make nice plots
Thursday’s Class Introduce Assignment 7 Work on the assignment in class