University of Illinois at Urbana-Champaign National Center for Supercomputing Applications Cyberinfrastructure Challenges for Environmental Observatories.

Presentation transcript:

University of Illinois at Urbana-Champaign, National Center for Supercomputing Applications
Cyberinfrastructure Challenges for Environmental Observatories
Barbara Minsker
Director, Environmental Engineering, Science, & Hydrology Group, National Center for Supercomputing Applications; Professor, Dept. of Civil & Environmental Engineering, University of Illinois, Urbana, IL, USA
January 9, 2007

Background
NSF Office of Cyberinfrastructure is funding NCSA and SDSC to:
– Work with leading-edge communities to develop cyberinfrastructure to support science and engineering
– Incorporate successful prototypes into a persistent cyberinfrastructure
NCSA runs the CLEANER Project Office, which is leading planning for the WATERS Network, one of 3 NSF proposed environmental observatories
– Co-Directors: Barbara Minsker, Jerald Schnoor (U of Iowa), Chuck Haas (Drexel U)
To support WATERS planning, NCSA’s Environmental CyberInfrastructure Demonstrator (ECID) project is creating a prototype CI
– Driven by requirements gathering and close community collaborations

WATERS Network: WATer and Environmental Research Systems Network
Joint collaboration between the CLEANER Project Office and CUAHSI, Inc., sponsored by the ENG & GEO Directorates at the National Science Foundation (NSF)
– CLEANER = Collaborative Large Scale Engineering Analysis Network for Environmental Research
– CUAHSI = Consortium of Universities for the Advancement of Hydrologic Science
Planning underway to build a nationwide environmental observatory network using NSF’s Major Research Equipment and Facility Construction (MREFC) funding
– Target construction date: 2011
– Target operation date: 2015

WATERS DRAFT VISION
The WATERS Network will transform our understanding of the Earth’s water and related biogeochemical cycles across multiple spatial and temporal scales to enable forecasting and management of critical water processes affected by human activities.

WATERS DRAFT GRAND CHALLENGES
– To detect the interactions of human activities and natural perturbations with the quantity, distribution and quality of water in real time.
– To predict the patterns and variability of processes affecting the quantity and quality of water at scales from local to continental.
– To achieve optimal management of water resources through the use of institutional and economic instruments.

Network Design Principles:
Enable multi-scale, dynamic predictive modeling for water, sediment, and water quality (flux, flow paths, rates), including:
– Near-real-time assimilation of data
– Feedback for observatory design
– Point- to national-scale prediction
Network provides data sets and framework to test:
– Sufficiency of the data
– Alternative model conceptualizations
Master Design Variables:
– Scale
– Climate (arid vs humid)
– Coastal vs inland
– Land use, land cover, population density
– Energy and materials/industry
– Land form and geology
Nested (where appropriate) Observatories over Range of Scales:
– Point
– Plot (100 m²)
– Subcatchment (2 km²)
– Catchment (10 km²) – single land use
– Watershed (100–10,000 km²) – mixed use
– Basin (10,000–100,000 km²)
– Continental
Figure labels: Environmental Field Facilities (EFFs); Observatory Scale

CI Requirements Gathering
Interviews at conferences and meetings (Tom Finholt and staff, U. of Michigan)
Usability studies (NCSA, Wentling group)
Community survey (Finholt group)
– AEESP and CUAHSI surveyed in 2006 as proxies for environmental engineering and hydrology communities
– 313 responses out of 600 surveys mailed (52.2% response rate)
– Key findings are driving ECID cyberenvironment development

What is the single most important obstacle to using data from different sources?
Chart (N=278); categories shown: nonstandard/inconsistent units/formats; metadata problems; other obstacles
55% concerned about insufficient credit for shared data

What three software packages do you use most frequently in your work?
*Other: MS Word; MS PowerPoint; statistics applications (e.g., Stata, R, S-Plus); SigmaPlot; PHREEQC; MathCAD; FORTRAN compiler; Mathematica; GRASS GIS; groundwater models; Modflow
Majority are not using high-end computational tools.

Factors influencing technology adoption
Ease of use, good support, and new capabilities are essential.

What are the three most compelling factors that would lead you to collaborate with another person in your field?
Community seeks collaborations to gain different expertise.

WATERS CI Challenges
Clearly, the first requirement for observatory CI is that the community must gain access to observatory data
However, simply delivering the data through a Web portal is not going to allow the observatories to reach their full potential and meet the community’s requirements

WATERS CI Challenges, Cont’d.
Understanding data quality and getting credit for data sharing requires an integrated provenance system to track what has been done with the data
Enabling users who do not have strong computational skills to work with the flood of environmental data requires:
– Easy-to-use tools for manipulating large data sets, analyzing them, and assimilating them into models
– Workflow integrators that allow users to integrate their tools and models with real-time streaming environmental data
The vast community of observatory users & the resources they generate create a need for knowledge networking tools to help them find collaborators, data, workflows, publications, etc.
To address these requirements, cyberenvironments are needed
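To make the provenance requirement above concrete, here is a minimal sketch of a provenance log that records what was done to each data product and by whom, so that lineage and contributors can be looked up later. The class names, fields, and artifact identifiers are all hypothetical illustrations; the slides do not describe how ECID's provenance system is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceRecord:
    """One step in a data product's history (hypothetical structure)."""
    artifact_id: str   # identifier of the data product that was produced
    activity: str      # what was done, e.g. "remove outliers"
    agent: str         # person or tool that performed the activity
    derived_from: List[str] = field(default_factory=list)  # parent artifact ids


class ProvenanceLog:
    """In-memory store of provenance records keyed by artifact id."""

    def __init__(self) -> None:
        self._records: Dict[str, ProvenanceRecord] = {}

    def record(self, rec: ProvenanceRecord) -> None:
        self._records[rec.artifact_id] = rec

    def lineage(self, artifact_id: str) -> List[ProvenanceRecord]:
        """Return the artifact's own record followed by all of its ancestors."""
        seen, out, stack = set(), [], [artifact_id]
        while stack:
            current = stack.pop()
            if current in seen or current not in self._records:
                continue
            seen.add(current)
            rec = self._records[current]
            out.append(rec)
            stack.extend(rec.derived_from)
        return out

    def contributors(self, artifact_id: str) -> List[str]:
        """Everyone who contributed anywhere in the lineage (for giving credit)."""
        return sorted({rec.agent for rec in self.lineage(artifact_id)})


# Hypothetical example: a forecast derived from cleaned sensor data.
log = ProvenanceLog()
log.record(ProvenanceRecord("raw-do", "collect dissolved oxygen", "field-team"))
log.record(ProvenanceRecord("clean-do", "remove outliers", "analyst-a", ["raw-do"]))
log.record(ProvenanceRecord("forecast", "run hypoxia model", "analyst-b", ["clean-do"]))
print(log.contributors("forecast"))
```

Walking the `derived_from` links is what lets a published result credit the group that collected the underlying data, which is the concern raised by the survey respondents.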

Environmental CI Architecture: Research Services
Research Process: Create Hypothesis; Obtain Data; Analyze Data &/or Assimilate into Model(s); Link &/or Run Analyses &/or Model(s); Discuss Results; Publish
Supporting Technology: Knowledge Services; Data Services; Workflows & Model Services; Meta-Workflows; Collaboration Services; Digital Library
Integrated CI
ECID Project Focus: Cyberenvironments
HIS Project Focus

Cyberenvironments
Couple traditional desktop computing environments with the resources and capabilities of a national cyberinfrastructure
Provide unprecedented ability to access, integrate, automate, and manage complex, collaborative projects across disciplinary and geographical boundaries.
ECID is demonstrating how cyberenvironments can:
– Support observatory sensor and event management, workflow and scientific analyses, and knowledge networking, including provenance information to track data from creation to publication.
– Provide collaborative environments where scientists, educators, and practitioners can acquire, share, and discuss data and information.
The cyberenvironments are designed with a flexible, service-oriented architecture, so that different components can be substituted with ease

ECID CyberEnvironment Components
– CyberCollaboratory: Collaborative Portal
– CyberIntegrator: Exploratory Workflow Integration
– CI:KNOW: Network Browser/Recommender
– Tupelo Metadata Services
– Community Event Management/Processing
– SSO: Single Sign-On Security (coming)
– CUAHSI HIS Data Services

CyberIntegrator
Studying complex environmental systems requires:
– Coupling analyses and models
– Real-time, automated updating of analyses and modeling with diverse tools
CyberIntegrator is a prototype workflow executor technology to support exploratory modeling and analysis of complex systems.
Integrates the following tools to date:
– Excel
– IM2Learn image processing and mining tools, including ArcGIS image loading
– D2K data mining
– Java codes, including event management tools
Matlab & Fortran codes to be added soon.
Additional tools will be included based on high-priority needs of beta users.

CyberIntegrator Architecture
Example of CyberIntegrator Use:
– Carrie Gibson created a fecal coliform prediction model in ArcGIS using Model Builder that predicts annual average concentrations.
– Ernest To rewrote the model as a macro in Excel to perform Monte Carlo simulation to predict median and 90th percentile values.
CyberIntegrator’s goal: Reduce manual labor in linking these tools, visualizing the results, and updating in real time.
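As a rough illustration of the Monte Carlo step mentioned above, the sketch below draws many hypothetical concentration values and summarizes them by their median and 90th percentile. The lognormal distribution and its parameters are assumptions chosen only for illustration; the slides do not give the actual fecal coliform model or its inputs.

```python
import random
import statistics


def simulate_concentrations(n_draws, mean_log, sd_log, seed=0):
    """Draw hypothetical concentrations from a lognormal distribution (assumed)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.lognormvariate(mean_log, sd_log) for _ in range(n_draws)]


def median_and_p90(samples):
    """Return the sample median and the sample 90th percentile."""
    # quantiles(n=10) returns the 9 decile cut points; the last one is the 90th percentile.
    deciles = statistics.quantiles(samples, n=10)
    return statistics.median(samples), deciles[-1]


samples = simulate_concentrations(n_draws=10_000, mean_log=4.0, sd_log=1.0)
median, p90 = median_and_p90(samples)
print(f"median={median:.1f}, 90th percentile={p90:.1f}")
```

Reporting percentiles instead of a single annual average is what makes the simulation useful for comparing predicted concentrations against a water quality criterion stated as a median or an upper percentile.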

Real-Time Simulation of Copano Bay TMDL with CyberIntegrator
Workflow diagram labels: CyberIntegrator; Streamflows to Distributions (Excel); USGS Daily Streamflows (web services); Fecal Coliform Concentrations Model (Excel); Load Shapefiles (Im2Learn); Shapefiles for Copano Bay; call; data; Geo-reference and Visualize Results (Im2Learn); Excel Executor; Im2Learn Executor

Sensor Anomaly Detection Scenario
Diagram components: CC Bay Sensor Monitor Page; CyberIntegrator; Dashboard; Event Manager; Anomaly Detector 1; Anomaly Detector 2; CCBay Sensor Map; CI-KNOW Network; data flows labeled Sensor Data and Anomalies
Callouts:
– Listens for data events & creates event when anomaly discovered.
– Shares workflow to server
– User subscribes to anomaly detector workflows
– CyberIntegrator loads recommended workflow. User adjusts parameters to CCBay Sensor.
– Sensor map shows nearby related sensors so user can check data.
– Anomaly detector is faulty. CI-KNOW recommends alternate anomaly detector from Chesapeake Bay observatory.
– Alerts user to anomaly detection, along with other events (logged-in users, new documents, etc.)
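The scenario above follows a publish/subscribe pattern: a detector listens for sensor data events and publishes a new event when it finds an anomaly, and a dashboard subscribes to those anomaly events. The sketch below shows that pattern with a tiny in-process event bus. It is a stand-in for illustration only; the topic names, the sensor id, and the threshold rule are hypothetical, and a deployed system would use a real message broker.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class EventBus:
    """Minimal in-process publish/subscribe hub (stand-in for a message broker)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Copy the list so handlers may subscribe while events are delivered.
        for handler in list(self._subscribers[topic]):
            handler(event)


def make_threshold_detector(bus: EventBus, low: float, high: float) -> Handler:
    """Build a handler that republishes out-of-range readings as anomaly events."""
    def on_data(event: Event) -> None:
        value = event["value"]
        if not (low <= value <= high):
            bus.publish("anomalies", {"sensor": event["sensor"], "value": value})
    return on_data


bus = EventBus()
alerts: List[Event] = []  # plays the role of the dashboard's alert list
bus.subscribe("sensor-data", make_threshold_detector(bus, low=0.0, high=15.0))
bus.subscribe("anomalies", alerts.append)

for reading in (6.2, 5.9, 42.0, 6.1):
    bus.publish("sensor-data", {"sensor": "ccbay-do-1", "value": reading})

print(alerts)
```

Because the detector is just another subscriber, swapping in an alternate detector, as in the recommended-workflow callout, only requires subscribing a different handler to the same topic.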

Cyberenvironment Technologies
Architecture diagram labels: Workflow Publication/Retrieval (Web Services); Raw Data (JMS); JMS Broker (ActiveMQ 4.0.1); Anomaly Subscription (JMS); Data and Anomaly Subscriptions (JMS); CyberDashboard Desktop Application; CyberCollaboratory; CI-KNOW Recommender Network Web Service (SOAP); Workflow Reference URL; CyberIntegrator; Data Subscriptions (JMS); Anomaly Publication (JMS); Workflow Service; CyberIntegrator Workflow (SOAP); Semantic Content; Provenance; Event Topics; Workflow Templates; User Subscriptions; Tupelo; ECID Managed Data/Metadata; Sensor Page Reference URL; Metadata; Anomalies; Data; RDBMS

ECID & Corpus Christi Bay (CCBay) WATERS Observatory Testbed
CCBay WATERS Observatory Testbed is one of 10 observatory testbeds recently funded by NSF
– Collaboration of environmental engineering, hydrology, biology, and information technology researchers
Goal of the testbed:
– Integrate ECID and HIS technology to create end-to-end environmental information system
– Use the technology to study hypoxia in CCBay
  – Use real-time data streams from diverse monitoring systems to predict hypoxia one day ahead
  – Mobilize manual sampling crews when conditions are right

Sensors in Corpus Christi Bay
Map legend: Montagna stations; SERF stations; TCOON stations; USGS gages; TCEQ stations; Hypoxic Regions; NCDC station
Dataset groups: National Datasets (National HIS); Regional Datasets (Workgroup HIS)
Sources: USGS; NCDC; TCOON; Dr. Paul Montagna; TCEQ; SERF

CCBay Environmental Information System
Diagram labels: CCBay Sensors; Anomaly Detector; Hypoxia Predictor; Dashboard Alert; Event-Triggered Workflow Execution; Event-driven Research; Storage for Later Research; CyberIntegrator: Forecast; CyberCollaboratory: Contact Collaborators

CCBay Near-Real-Time Hypoxia Prediction
Diagram labels: Sensor net; Data Archive; Anomaly Detection; Replace or Remove Errors; Update Boundary Condition Models; Hypoxia Machine Learning Models; Hydrodynamic Model; Water Quality Model; Hypoxia Model Integrator; Visualize Hydrodynamics; Visualize Hypoxia Risk
Implementation legend: D2K workflows; Fortran numerical models; IM2Learn workflows; C++ code

CCBay CI Challenges
Automating QA/QC in a real-time network
– David Hill is creating sensor anomaly detectors using statistical models (autoregressive models using naïve, clustering, perceptron, and artificial neural network approaches; and multi-sensor models using dynamic Bayesian networks)
– While statistical models can identify anomalies, it is sometimes difficult to differentiate sensor errors from unusual environmental phenomena
Getting access to the data, which are collected by different groups, stored in multiple formats in different locations
– The project is defining a common data dictionary and units and will build Web services to translate
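As an illustration of the simplest approach named above, the sketch below uses a naive autoregressive forecast (the next reading is predicted to equal the last accepted reading) and flags a point when its prediction error is large relative to recent errors. The window size, the threshold multiplier `k`, and the sample series are assumptions for illustration, not the detectors developed for the testbed.

```python
import statistics
from collections import deque


def detect_anomalies(values, window=10, k=4.0):
    """Return indices whose one-step-ahead residual is unusually large.

    The prediction for each point is the last accepted value (naive forecast).
    A point is flagged when its residual exceeds k times the standard
    deviation of the residuals in the recent window.
    """
    anomalies = []
    residuals = deque(maxlen=window)  # recent residuals of accepted points
    last_good = values[0]
    for i in range(1, len(values)):
        residual = values[i] - last_good
        if len(residuals) >= 3:  # wait for a few residuals before judging
            spread = statistics.pstdev(residuals)
            if spread > 0 and abs(residual) > k * spread:
                anomalies.append(i)
                continue  # do not learn from the flagged point
        residuals.append(residual)
        last_good = values[i]
    return anomalies


# Hypothetical dissolved-oxygen readings with one spike at index 6.
series = [6.0, 6.1, 5.9, 6.2, 6.0, 6.1, 25.0, 6.0, 5.9, 6.1]
flagged = detect_anomalies(series)
print(flagged)
```

A detector like this only says that a reading is statistically unusual. As the slide notes, deciding whether the cause is a sensor fault or a real environmental event still needs additional evidence, such as nearby sensors.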

CCBay CI Challenges, Cont’d.
Integrating data into diverse models
– Calibration uses historical data, typically done by hand
– Near-real-time updating needs automated approaches
– Models are complex and derivative-based calibration approaches would be difficult to implement
Model integration
– Grids change from one type of model to another – defining a common coarse grid, with finer grids overlaid where needed
– Data transformers must be built between models
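One simple kind of data transformer between a fine model grid and a common coarse grid is block averaging, sketched below. This assumes regular, aligned grids whose dimensions divide evenly by the coarsening factor; the slides do not say which transformation the project uses, and real hydrodynamic grids are often irregular.

```python
def coarsen(fine, factor):
    """Block-average a 2-D fine grid (list of rows) onto a coarser grid."""
    rows, cols = len(fine), len(fine[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            # Collect the factor x factor block of fine cells under this coarse cell.
            block = [fine[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Hypothetical 4 x 4 field reduced to a 2 x 2 coarse grid.
fine_grid = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 2.0, 8.0, 8.0],
    [2.0, 2.0, 8.0, 8.0],
]
coarse_grid = coarsen(fine_grid, factor=2)
print(coarse_grid)
```

Averaging suits intensive quantities such as concentration; a quantity that must be conserved in total, such as mass per cell, would be summed over each block instead.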

Conclusions
Creating CI for environmental data is challenging but the benefits in enabling larger-scale, near-real-time research will be enormous
The ECID Cyberenvironment demonstrates the benefits of end-to-end integration of cyberinfrastructure and desktop tools, including:
– HIS-type data services
– Workflow
– Event management
– Provenance and knowledge management, and
– Collaboration
for supporting environmental researchers, educators, and outreach partners
This creates a powerful system for linking observatory operations with flexible, investigator-driven research in a community framework (i.e., the national network).
– Workflow and knowledge management support testing hypotheses across observatories
– Provenance supports QA/QC and rewards for community contributions in an automated fashion.

Acknowledgments
Contributors:
– NCSA ECID team (Peter Bajcsy, Noshir Contractor, Steve Downey, Joe Futrelle, Hank Green, Rob Kooper, Yong Liu, Luigi Marini, Jim Myers, Mary Pietrowicz, Tim Wentling, York Yao, Inna Zharnitsky)
– Corpus Christi Bay Testbed team (PIs: Jim Bonner, Ben Hodges, David Maidment, Barbara Minsker, Paul Montagna)
Funding sources:
– NSF grants BES , BES , and SCI
– Office of Naval Research grant N