Abstract OLAP Cube Visualization of Hydrologic Data Catalogs Ilya Zaslavsky a, Matthew Rodriguez a, Bora Beran b, David Valentine a, Jillian Wallis c,

Slides:



Advertisements
Similar presentations
Flux Data Server User Tutorial Deb Agarwal, Catharine van Ingen, Susan Holladay, and Misha Krassovski Berkeley Water Center (UCB, LBL), ORNL, and Microsoft.
Advertisements

Some notes on CyberGIS in hydrology Ilya Zaslavsky Spatial Information Systems Lab San Diego Supercomputer Center UCSD TeraGrid CyberGIS Workshop, February.
HydroServer A Platform for Publishing Space- Time Hydrologic Datasets Support EAR CUAHSI HIS Sharing hydrologic data Jeffery.
This work is funded by the Inland Northwest Research Alliance INRA Constellation of Experimental Watersheds: Cyberinfrastructure to Support Publication.
ICEWATER: INRA Constellation of Experimental Watersheds Cyberinfrastructure to Support Publication of Water Resources Data Jeffery S. Horsburgh, Utah State.
C van Ingen, D Agarwal, M Goode, J Gupchup, J Hunt, R Leonardson, M Rodriguez, N Li Berkeley Water Center John Hopkins University Lawrence Berkeley Laboratory.
Linking HIS and GIS How to support the objective, transparent and robust calculation and publication of SWSI? Jeffery S. Horsburgh CUAHSI HIS Sharing hydrologic.
This work is funded by National Science Foundation Grant EAR Accessing and Sharing Data Using the CUAHSI Hydrologic Information System CUAHSI HIS.
CUAHSI HIS Data Services Project David R. Maidment Director, Center for Research in Water Resources University of Texas at Austin (HIS Project Leader)
SENSORS, CYBERINFRASTRUCTURE, AND EXAMINATION OF HYDROLOGIC AND HYDROCHEMICAL RESPONSE IN THE LITTLE BEAR RIVER OBSERVATORY TEST BED Jeffery S. Horsburgh.
Components of an Integrated Environmental Observatory Information System Cyberinfrastructure to Support Publication of Water Resources Data Jeffery S.
This work was funded by the U.S. National Science Foundation under grant EAR Any opinions, findings and conclusions or recommendations expressed.
HydroServer A Platform for Publishing Space- Time Hydrologic Datasets Support EAR CUAHSI HIS Sharing hydrologic data Jeffery.
Time Series Analyst An Internet Based Application for Viewing and Analyzing Environmental Time Series Jeffery S. Horsburgh Utah State University David.
Development of a Community Hydrologic Information System Jeffery S. Horsburgh Utah State University David G. Tarboton Utah State University.
Two NSF Data Services Projects Rick Hooper, President Consortium of Universities for the Advancement of Hydrologic Science, Inc.
Using GIS in Creating an End-to- End System for Publishing Environmental Observations Data Jeffery S. Horsburgh David G. Tarboton, David R. Maidment, Ilya.
Integrating Historical and Realtime Monitoring Data into an Internet Based Watershed Information System for the Bear River Basin Jeff Horsburgh David Stevens,
Introducing the CUAHSI Hydrologic Information System Desktop Application (HydroDesktop) and Open Development Community Jiří Kadlec, Daniel Ames, Teva Velupillai.
Deployment and Evaluation of an Observations Data Model Jeffery S Horsburgh David G Tarboton Ilya Zaslavsky David R. Maidment David Valentine
SAN DIEGO SUPERCOMPUTER CENTER Developing a CUAHSI HIS Data Node, as part of Cyberinfrastructure for the Hydrologic Sciences David Valentine Ilya Zaslavsky.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
An End-to-End System for Publishing Environmental Observations Data Jeffery S. Horsburgh David K. Stevens, David G. Tarboton, Nancy O. Mesner, Amber Spackman.
A Services Oriented Architecture for Water Resources Data David R. Maidment Center for Research in Water Resources University of Texas at Austin EPA Storet.
Tools for Publishing Environmental Observations on the Internet Justin Berger, Undergraduate Researcher Jeff Horsburgh, Faculty Mentor David Tarboton,
Distributed Data Analysis & Dissemination System (D-DADS) Prepared by Stefan Falke Rudolf Husar Bret Schichtel June 2000.
Online Analytical Processing (OLAP) Hweichao Lu CS157B-02 Spring 2007.
Over-allocation to irrigation Bushfire recovery impacts Expanding plantations Drying and warming climate Uncapped groundwater extraction Expanding farm.
About CUAHSI The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) is an organization representing 120+ universities.
Ocean Sciences What is CUAHSI? CUAHSI – Consortium of Universities for the Advancement of Hydrologic Science, Inc Formed in 2001 as a legal entity Program.
Abstract Archiving and Near Real Time Visualization of USGS Instantaneous Data Ilya Zaslavsky, David Ryan, Thomas Whitenack, David Valentine, Matthew Rodriguez.
Information Requirements for Integrating Spatially Discrete, Feature- Based Earth Observations Jeffery S. Horsburgh Anthony Aufdenkampe, Kerstin Lehnert,
Hydrologic Information System for the Nation Ilya Zaslavsky Spatial Information Systems Lab San Diego Supercomputer Center UCSD UCSD Seminar, February.
About CUAHSI The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) is an organization representing 120+ universities.
Hydrologic Information System for the Nation Ilya Zaslavsky Spatial Information Systems Lab San Diego Supercomputer Center UCSD Alexandria Library talk,
Water Web Services David R. Maidment Center for Research in Water Resources University of Texas at Austin Open Waters Symposium Delft, the Netherlands.
Data Interoperability in the Hydrologic Sciences The CUAHSI Hydrologic Information System David Tarboton, David Maidment, Ilya Zaslavsky, Dan Ames, Jon.
CUAHSI Hydrologic Information System an introduction Ilya Zaslavsky Director, Spatial Information Systems Lab San Diego Supercomputer Center University.
Advancing an Information Model for Environmental Observations Jeffery S. Horsburgh Anthony Aufdenkampe, Richard P. Hooper, Kerstin Lehnert, Kim Schreuders,
Publishing Observations Data: from ODM to HIS Central.
Hydrologic Information System for the Nation Ilya Zaslavsky Spatial Information Systems Lab San Diego Supercomputer Center UCSD UCSD Seminar, February.
Hydrologic Information System for the Nation Ilya Zaslavsky Spatial Information Systems Lab San Diego Supercomputer Center UCSD BOM talk, Melbourne, March.
Hydrologic Information System for the Nation I. Zaslavsky (SDSC) & The CUAHSI HIS Project his.cuahsi.org, hiscentral.cuahsi.org.
CUAHSI, WATERS and HIS by Richard P. Hooper, David G. Tarboton and David R. Maidment.
Chapter 6 SAS ® OLAP Cube Studio. Section 6.1 SAS OLAP Cube Studio Architecture.
Water and Catchment Data Services David R. Maidment Center for Research in Water Resources University of Texas at Austin River Science Symposium Swansea,
CBEO:N Chesapeake Bay Environmental Observatory as a Network Node About CBEO The mission of the CBEO project is development of a Chesapeake Bay Environmental.
CUAHSI Hydrologic Information Systems David R. Maidment Center for Research in Water Resources University of Texas at Austin and Ilya Zaslavsky, David.
The CUAHSI Hydrologic Information System Presented by Dr. Tim Whiteaker The University of Texas at Austin 22 February, 2011.
Bringing Water Data Together David R. Maidment Center for Research in Water Resources University of Texas at Austin Texas Water Summit San Antonio Tx,
CUAHSI HIS Features of Observations Data Model. NWIS ArcGIS Excel NCAR Trends NAWQA Storet NCDC Ameriflux Matlab AccessSAS Fortran Visual Basic C/C++
Abstract Analysis and Visualization of Hydrologic Data and Observations Catalogs Using the OLAP Data Cube Technology Ilya Zaslavsky a, Matthew Rodriguez.
CUAHSI Hydrologic Information Systems David R. Maidment and Ernest To Center for Research in Water Resources, University of Texas at Austin Hydrosystems.
Sharing SRP Water Sample Data Using CUAHSI HIS Infrastructure Ilya Zaslavsky, Thomas Whitenack, Keith Pezzoli, Hiram Sarabia University of California at.
Distributed Data Analysis & Dissemination System (D-DADS ) Special Interest Group on Data Integration June 2000.
The CUAHSI Observations Data Model Jeff Horsburgh David Maidment, David Tarboton, Ilya Zaslavsky, Michael Piasecki, Jon Goodall, David Valentine,
CUAHSI HIS: Science Challenges Linking small integrated research sites (
1 CUAHSI Web Services and Hydrologic Information Systems By David R. Maidment, University of Texas at Austin Collaborators: Ilya Zaslavsky and Reza Wahadj,
Hydroinformatics Lecture 15: HydroServer and HydroServer Lite The CUAHSI HIS is Supported by NSF Grant# EAR CUAHSI HIS Sharing hydrologic data.
The Bear River Watershed Information System Jeffery S. Horsburgh Utah Water Research Laboratory Utah State University David.
Developing a community hydrologic information system David G Tarboton David R. Maidment (PI) Ilya Zaslavsky Michael Piasecki Jon Goodall
The CUAHSI Hydrologic Information System Spatial Data Publication Platform David Tarboton, Jeff Horsburgh, David Maidment, Dan Ames, Jon Goodall, Richard.
Using GIS in Creating an End-to-End System for Publishing Environmental Observations Data Jeffery S. Horsburgh David G. Tarboton, David R. Maidment, Ilya.
Sharing Hydrologic Data with the CUAHSI* Hydrologic Information System
The CUAHSI Hydrologic Information System and NHD Plus A Services Oriented Architecture for Water Resources Data David G Tarboton David R. Maidment (PI)
Jeffery S. Horsburgh Utah State University
Lecture 8 Database Implementation
CUAHSI HIS Sharing hydrologic data
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
HydroDesktop: A Key Component of the CUAHSI/CZO HIS for Hydrologic Data Discovery, Visualization, and Analysis Daniel P. Ames, Ph.D. P.E. Idaho State University.
Presentation transcript:

Abstract OLAP Cube Visualization of Hydrologic Data Catalogs Ilya Zaslavsky a, Matthew Rodriguez a, Bora Beran b, David Valentine a, Jillian Wallis c, and Catharine van Ingen b San Diego Supercomputer Center, UCSD, San Diego, CA a ; Microsoft Research, San Francisco, CA b, and Center for Embedded Networked Sensing, UCLA, Los Angeles, CA C About CUAHSI HIS As part of the CUAHSI Hydrologic Information System project, we assemble comprehensive observations data catalogs that support CUAHSI data discovery services and online mapping interfaces (e.g. the Data Access System for Hydrology, DASH). These catalogs describe several nation-wide data repositories that are important for hydrologists, including USGS NWIS and EPA STORET data collections. The catalogs contain a wealth of information reflecting the entire history and geography of hydrologic observations in the US. Enabling simple interactive data discovery across these catalogs is an integral step to making the data easily accessible for analysis. OLAP (Online Analytical Processing), often referred to as data cube analysis, is an approach to organizing and querying large multi-dimensional data collections. We have applied the OLAP techniques, as implemented in Microsoft SQL Server 2005, to the analysis of the catalogs from several agencies. In this initial report, we focus on the OLAP technology as applied to catalogs, and preliminary results of the analysis. The initial results are related to hydrologic data availability from the observations data catalogs. The results reflect geography and history of available data totals from USGS NWIS and EPA STORET repositories, and spatial and temporal dynamics of available measurements for several key nutrient-related parameters. The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) is an organization representing 120+ universities in the US and abroad. As part of its mission, CUAHSI supports the development of cyberinfrastructure for the hydrologic sciences. The CUAHSI HIS (Hydrologic Information System) project is a multi-year multi-institution effort focused on consistent management of observations data available from several federal agencies (USGS, EPA, USDA, NOAA, etc.) as well as published by individual investigators. CUAHSI HIS develops service-oriented architecture for hydrologic research and education, to enable publication, discovery, retrieval, analysis and integration of hydrologic data. The project team has defined a common information model for organizing hydrologic observation data, designed a common exchange protocol (Water Markup Language) and developed a collection of SOAP web services (WaterOneFlow services) that provide uniform access to different federal, state and local hydrologic data repositories. This system is now implemented as a collection of Hydrologic Information Servers deployed at NSF-supported Hydrologic Observatory test beds. EPA STORET Conclusion OLAP datacubes allow domain scientists to access and mine huge volumes of observations data and catalogs simply and efficiently using familiar desktop tools such as Excel PivotTables. We demonstrate how OLAP datacubes are constructed for several large observations data catalogs which contain site summary information, including site’s period of record for each parameter, or series. The datacubes are analyzed spatially and temporally. The the summary maps and diagrams of data availability by various datacube dimensions and filters appear useful to understanding the nature of the datsets. Constructing datacubes over data values as well can automatically produce what are now referred to as unique data products such as daily and yearly statistics (Mean, Minimum, Maximum, Variance) across one or more sites filtered by quality or other attribute. Future work will include construction of domain specific hierarchies (e.g. using HUC codes), and generation and analysis of watershed-specific cubes that integrate observations from multiple sources. Changes over time Some physical properties by decade: Available Data Total USGS NWIS Daily Values US Map of USGS Daily Values Series 1 degree latitude-longitude bins CUAHSI HIS Service Oriented Architecture: General Outline OLAP Technology for Catalogs Observations data catalogs assembled within CUAHSI HIS follow standard relational schema of the Observations Data Model (ODM). The ODM catalogs collections of observation series, each described by: what: observed parameter variable including metadata such as processing methodology, units, quality assessments what: observed parameter variable including metadata such as processing methodology, units, quality assessments where: location or site where the observation was made including relevant site metadata such as Hydrologic Unit or vegetation class where: location or site where the observation was made including relevant site metadata such as Hydrologic Unit or vegetation class when: start and end date of measurements and time granularity of measurement when: start and end date of measurements and time granularity of measurement OLAP (Online Analytical Processing) is an approach to rapidly analyze such large multi- dimensional databases. OLAP Datacubes organize data along dimensions, enable drilldown on hierarchies, and aggregates such as average or minimum. Using SQL Server Analysis Services, an OLAP cube can be created from any database with a star schema such as ODM. using SQL Server Analysis Services. Simple aggregates (such as daily averages or yearly maxima) can be pre-computed for improved query performance. Links CUAHSI HIS: HIS SDSC : BWC OLAP user manual: Building a Cube from a CUAHSI ODM database: Using OLAP Datacubes Bear River Sample Watershed stream gage comings and goings Daily Discharge Averaged by Year, Month, Week (Top 12 Sites Only) Annual Mean Monthly Mean Weekly Mean Drilling down time hierarchy gives overall sense of wet years and flood events Datacube enables simple temporal drilldown. These examples are for the ODM sample database from the Little Bear River region in Utah (2 sites, 12 series, 593,000 observations). The three graphs to the right plot mean annual, monthly, and weekly discharge at the dozen highest flow gages in the watershed over the years 1980 to The annual graph clearly shows the wet years - years of highest flow. The monthly and weekly graph show differences between years. For example, the high flows in 1984 occur in the middle of the year as the result of snow melt. The high flows in 1986 begin in the early part of the year before the usual snow melt continue through the summer season. Datacubes can be easily browsed with Excel PivotTables and drag-drop over the internetDatacubes can be easily browsed with Excel PivotTables and drag-drop over the internet MatLab and other rich analysis client access integration comingMatLab and other rich analysis client access integration coming Browsing can give fast overview insight into data availability and qualityBrowsing can give fast overview insight into data availability and quality Mean Period of Record Mapping Data Availability Different types of nutrients by decade These are examples from the USGS National Water Information System datacube. The cube contains 1.4 million sites and 12 million series. To the left are “maps” of data availability, by one degree cells. Note higher volumes of measurements accumulated along the coasts, on the east coast especially. Stations with the longest mean period of records are distributed across the country, with peaks in the central parts. To the right is dynamics in data availability for selected parameters. The top chart shows changes in measurement of nutrients over years: note the drop in phosphate and increase in phosphorus and orthophosphate measurements. The lower chart shows that ‘discharge’ dropped as ‘gauge height’ increased in the last two decades To the left, data availability for a parameter, discharge, for a site shows that data availability is not always continuous. Sites have disappeared, and reappeared. USDA SNOTEL The SNOTEL datacube contains 810 sites, 4552 series, and 28 million observations. The data is collected by a standard set of variables, and the number of observations increases over time. The SNOTEL datacube also contains values, so you can rapidly investigate aggregations over county, states, and the entire dataset. Total Available Data The EPA STORET datacube contains 273K sites and 2.7M series. Florida is the source of about 25% of the total records. Water Quality Data Record Counts by Site Classification Number of years of record by start decade About 60% of the water quality records are short term measurements (one year or less in duration). The starting decade of the longer series are to the right. 93% of the series are water quality data.