Published by Darlene Green. Modified over 9 years ago.
1
University of Illinois at Urbana-Champaign National Center for Supercomputing Applications Cyberinfrastructure Challenges for Environmental Observatories Barbara Minsker Director, Environmental Engineering, Science, & Hydrology Group, National Center for Supercomputing Applications; Professor, Dept of Civil & Environ. Engineering; University of Illinois, Urbana, IL, USA January 9, 2007
2
National Center for Supercomputing Applications Background NSF Office of Cyberinfrastructure is funding NCSA and SDSC to: –Work with leading edge communities to develop cyberinfrastructure to support science and engineering –Incorporate successful prototypes into a persistent cyberinfrastructure NCSA runs the CLEANER Project Office, which is leading planning for the WATERS Network, one of 3 NSF proposed environmental observatories –Co-Directors: Barbara Minsker, Jerald Schnoor (U of Iowa), Chuck Haas (Drexel U) To support WATERS planning, NCSA’s Environmental CyberInfrastructure Demonstrator (ECID) project is creating a prototype CI –Driven by requirements gathering and close community collaborations
3
WATERS Network WATer and Environmental Research Systems Network Joint collaboration between the CLEANER Project Office and CUAHSI, Inc., sponsored by the ENG & GEO Directorates at the National Science Foundation (NSF) CLEANER = Collaborative Large Scale Engineering Analysis Network for Environmental Research CUAHSI = Consortium of Universities for the Advancement of Hydrologic Science Planning underway to build a nationwide environmental observatory network using NSF’s Major Research Equipment and Facilities Construction (MREFC) funding Target construction date: 2011 Target operation date: 2015
4
WATERS DRAFT VISION The WATERS Network will transform our understanding of the Earth’s water and related biogeochemical cycles across multiple spatial and temporal scales to enable forecasting and management of critical water processes affected by human activities.
5
WATERS DRAFT GRAND CHALLENGES To detect the interactions of human activities and natural perturbations with the quantity, distribution and quality of water in real time. To predict the patterns and variability of processes affecting the quantity and quality of water at scales from local to continental. To achieve optimal management of water resources through the use of institutional and economic instruments.
7
Network Design Principles: Enable multi-scale, dynamic predictive modeling for water, sediment, and water quality (flux, flow paths, rates), including: Near-real-time assimilation of data Feedback for observatory design Point- to national-scale prediction Network provides data sets and framework to test: Sufficiency of the data Alternative model conceptualizations Master Design Variables: Scale Climate (arid vs humid) Coastal vs inland Land use, land cover, population density Energy and materials/industry Land form and geology Nested (where appropriate) Observatories over Range of Scales: Point Plot (100 m²) Subcatchment (2 km²) Catchment (10 km²) – single land use Watershed (100–10,000 km²) – mixed use Basin (10,000–100,000 km²) Continental Environmental Field Facilities (EFFs) Observatory Scale
9
National Center for Supercomputing Applications CI Requirements Gathering Interviews at conferences and meetings (Tom Finholt and staff, U. of Michigan) Usability studies (NCSA, Wentling group) Community survey (Finholt group) –AEESP and CUAHSI surveyed in 2006 as proxies for environmental engineering and hydrology communities –313 responses out of 600 surveys mailed (52.2% response rate) –Key findings are driving ECID cyberenvironment development
10
National Center for Supercomputing Applications What is the single most important obstacle to using data from different sources? 55% concerned about insufficient credit for shared data N=278 Nonstandard/inconsistent units/formats Metadata problems Other obstacles
11
National Center for Supercomputing Applications What three software packages do you use most frequently in your work? *Other: MS Word MS PowerPoint Statistics applications (e.g., Stata, R, S-Plus) SigmaPlot PHREEQC MathCAD FORTRAN compiler Mathematica GRASS GIS Groundwater models Modflow Majority are not using high-end computational tools.
12
National Center for Supercomputing Applications Factors influencing technology adoption Ease of use, good support, and new capabilities are essential.
13
National Center for Supercomputing Applications What are the three most compelling factors that would lead you to collaborate with another person in your field? Community seeks collaborations to gain different expertise.
14
National Center for Supercomputing Applications WATERS CI Challenges Clearly, the first requirement for observatory CI is that the community must gain access to observatory data However, simply delivering the data through a Web portal is not going to allow the observatories to reach their full potential and meet the community’s requirements
15
National Center for Supercomputing Applications WATERS CI Challenges, Cont’d. Understanding data quality and getting credit for data sharing requires an integrated provenance system to track what has been done with the data Enabling users who do not have strong computational skills to work with the flood of environmental data requires: –Easy-to-use tools for manipulating large data sets, analyzing them, and assimilating them into models –Workflow integrators that allow users to integrate their tools and models with real-time streaming environmental data The vast community of observatory users & the resources they generate create a need for knowledge networking tools to help them find collaborators, data, workflows, publications, etc. To address these requirements, cyberenvironments are needed
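The provenance requirement on this slide can be illustrated with a minimal sketch. It assumes a simple in-memory log; the class names, method names, and artifact names are hypothetical and are not taken from ECID or Tupelo. It shows how recording each derivation step makes it possible to trace a result back to its sources and to list everyone who should get credit.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class ProvenanceRecord:
    """One derivation step: what was produced, by whom, and from which inputs."""
    artifact: str
    creator: str
    action: str
    inputs: List[str] = field(default_factory=list)


class ProvenanceLog:
    """Hypothetical in-memory provenance store (illustration only)."""

    def __init__(self) -> None:
        self._records: Dict[str, ProvenanceRecord] = {}

    def record(self, artifact: str, creator: str, action: str,
               inputs: Iterable[str] = ()) -> None:
        self._records[artifact] = ProvenanceRecord(
            artifact, creator, action, list(inputs))

    def lineage(self, artifact: str) -> List[str]:
        """Return all upstream artifacts, nearest first (breadth-first walk)."""
        seen: List[str] = []
        queue = list(self._records[artifact].inputs) if artifact in self._records else []
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            upstream = self._records.get(current)
            if upstream is not None:
                queue.extend(upstream.inputs)
        return seen

    def contributors(self, artifact: str) -> List[str]:
        """Everyone who created the artifact or anything it was derived from."""
        chain = [artifact] + self.lineage(artifact)
        return sorted({self._records[a].creator for a in chain if a in self._records})


log = ProvenanceLog()
log.record("raw_do.csv", "sensor_team", "collected")
log.record("clean_do.csv", "qa_analyst", "removed anomalies", ["raw_do.csv"])
log.record("hypoxia_forecast", "modeler", "ran model", ["clean_do.csv"])
print(log.lineage("hypoxia_forecast"))
print(log.contributors("hypoxia_forecast"))
```

A production system would persist these records and attach them to published data sets; the sketch only shows the bookkeeping idea.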
16
National Center for Supercomputing Applications Environmental CI Architecture: Research Services Create Hypothesis Obtain Data Analyze Data &/or Assimilate into Model(s) Link &/or Run Analyses &/or Model(s) Discuss Results Publish Knowledge Services Data Services Workflows & Model Services Meta-Workflows Collaboration Services Digital Library Research Process Supporting Technology Integrated CI ECID Project Focus: Cyberenvironments HIS Project Focus
17
National Center for Supercomputing Applications Cyberenvironments Couple traditional desktop computing environments with the resources and capabilities of a national cyberinfrastructure Provide unprecedented ability to access, integrate, automate, and manage complex, collaborative projects across disciplinary and geographical boundaries. ECID is demonstrating how cyberenvironments can: –Support observatory sensor and event management, workflow and scientific analyses, and knowledge networking, including provenance information to track data from creation to publication. –Provide collaborative environments where scientists, educators, and practitioners can acquire, share, and discuss data and information. The cyberenvironments are designed with a flexible, service-oriented architecture, so that different components can be substituted with ease.
18
National Center for Supercomputing Applications ECID CyberEnvironment Components CyberCollaboratory: Collaborative Portal CyberIntegrator: Exploratory Workflow Integration CI:KNOW: Network Browser/ Recommender Tupelo Metadata Services Community Event Management/Processing SSO Single Sign-On Security (coming) CUAHSI HIS Data Services
19
National Center for Supercomputing Applications CyberIntegrator Studying complex environmental systems requires: –Coupling analyses and models –Real-time, automated updating of analyses and modeling with diverse tools CyberIntegrator is a prototype workflow executor technology to support exploratory modeling and analysis of complex systems. Integrates the following tools to date: –Excel –IM2Learn image processing and mining tools, including ArcGIS image loading –D2K data mining –Java codes, including event management tools Matlab & Fortran codes to be added soon. Additional tools will be included based on high priority needs of beta users.
20
National Center for Supercomputing Applications CyberIntegrator Architecture Example of CyberIntegrator Use: Carrie Gibson created a fecal coliform prediction model in ArcGIS using Model Builder that predicts annual average concentrations. Ernest To rewrote the model as a macro in Excel to perform Monte Carlo simulation to predict median and 90th percentile values. CyberIntegrator’s goal: Reduce manual labor in linking these tools, visualizing the results, and updating in real time.
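The Monte Carlo step described on this slide can be sketched as follows. This is a minimal illustration, not the Excel macro itself: the concentration relation (load divided by flow), the distributions, and every parameter value are assumptions made up for the example. It shows how repeated random draws yield a median and a 90th percentile.

```python
import random
import statistics
from typing import Dict, List


def simulate_concentrations(n_draws: int, load_mean: float, load_sd: float,
                            flow_mu: float, flow_sigma: float,
                            seed: int = 0) -> List[float]:
    """Draw hypothetical concentrations as load / flow.

    Load is sampled from a normal distribution truncated at zero and flow
    from a lognormal distribution; both choices are illustrative assumptions.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_draws):
        load = max(rng.gauss(load_mean, load_sd), 0.0)
        flow = rng.lognormvariate(flow_mu, flow_sigma)  # always positive
        samples.append(load / flow)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Return the median and 90th percentile of the simulated values."""
    deciles = statistics.quantiles(samples, n=10)  # nine cut points
    return {"median": statistics.median(samples), "p90": deciles[8]}


draws = simulate_concentrations(5000, load_mean=200.0, load_sd=40.0,
                                flow_mu=2.5, flow_sigma=0.6)
print(summarize(draws))
```

Fixing the seed makes a run repeatable, which helps when the same workflow is re-executed as new streamflow data arrive.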
21
National Center for Supercomputing Applications Real-Time Simulation of Copano Bay TMDL with CyberIntegrator [Workflow diagram labels: CyberIntegrator; Streamflows to Distributions (Excel); USGS Daily Streamflows (web services); Fecal Coliform Concentrations Model (Excel); Load Shapefiles (Im2Learn); Shapefiles for Copano Bay; call; data; Geo-reference and Visualize Results (Im2Learn); steps 1, 2, 3, 4; Excel Executor; Im2Learn Executor]
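The idea of chaining tools in a workflow can be sketched in a few lines. This is an assumption-laden illustration, not CyberIntegrator's design: the step functions are stand-ins, the streamflow values are made up, and the concentration formula is a placeholder. It shows each step reading from and adding to a shared context while the executor records which steps ran.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], Context]]


def run_workflow(steps: List[Step], initial: Optional[Context] = None) -> Context:
    """Run steps in order; each step returns new values merged into the context."""
    context: Context = dict(initial or {})
    history: List[str] = []
    for name, func in steps:
        context.update(func(context))
        history.append(name)
    context["history"] = history
    return context


def fetch_streamflows(ctx: Context) -> Context:
    # Stand-in for a web-service call; these values are invented.
    return {"flows": [12.0, 15.5, 9.8, 30.2]}


def flows_to_distribution(ctx: Context) -> Context:
    flows = ctx["flows"]
    return {"flow_mean": sum(flows) / len(flows)}


def concentration_model(ctx: Context) -> Context:
    # Placeholder relation (load / mean flow), not the slide's actual model.
    return {"concentration": ctx["load"] / ctx["flow_mean"]}


result = run_workflow(
    [("fetch", fetch_streamflows),
     ("distribute", flows_to_distribution),
     ("model", concentration_model)],
    initial={"load": 135.0},
)
print(result["history"], result["concentration"])
```

Because steps only communicate through the context, one step can be swapped for another tool without touching the rest of the chain.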
22
National Center for Supercomputing Applications Sensor Anomaly Detection Scenario CC Bay Sensor Monitor Page CyberIntegrator Dashboard Sensor data Anomalies Listens for data events & creates event when anomaly discovered. Anomaly Detector 1 Anomaly Detector 2 Anomalies Sensor Data Shares workflow to server Event Manager CCBay Sensor Map User subscribes to anomaly detector workflows CI-KNOW Network CyberIntegrator loads recommended workflow. User adjusts parameters to CCBay Sensor. Sensor map shows nearby related sensors so user can check data. Anomaly detector is faulty. CI-KNOW recommends alternate anomaly detector from Chesapeake Bay observatory. Alerts user to anomaly detection, along with other events (logged-in users, new documents, etc.)
23
National Center for Supercomputing Applications Cyberenvironment Technologies Workflow Publication/ Retrieval Web Services Raw Data JMS JMS Broker (ActiveMQ 4.0.1) Anomaly Subscription JMS Data and Anomaly Subscriptions JMS CyberDashboard Desktop Application CyberCollaboratory CI-KNOW Recommender Network Web Service SOAP Workflow Reference URL CyberIntegrator Data Subscriptions JMS Anomaly Publication JMS Workflow Service CyberIntegrator Workflow SOAP Semantic Content Provenance Event Topics Workflow Templates User Subscriptions Tupelo ECID Managed Data/Metadata Sensor Page Reference URL Metadata Anomalies Data RDBMS
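The publish/subscribe pattern behind the data and anomaly subscriptions can be illustrated with an in-process stand-in. This sketch does not use JMS or ActiveMQ; the broker class, topic names, and range-check detector are hypothetical. It shows a detector that listens for data events and publishes an anomaly event that a dashboard-like subscriber receives.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventBroker:
    """Minimal in-memory topic broker (illustrative stand-in for a message broker)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Dict[str, Any]) -> None:
        # Copy the list so handlers may subscribe during delivery.
        for handler in list(self._subscribers[topic]):
            handler(message)


def make_range_detector(broker: EventBroker, low: float, high: float) -> Handler:
    """Return a handler that republishes out-of-range readings as anomalies."""
    def on_data(reading: Dict[str, Any]) -> None:
        if not (low <= reading["value"] <= high):
            broker.publish("sensor.anomaly", reading)
    return on_data


broker = EventBroker()
alerts: List[Dict[str, Any]] = []
broker.subscribe("sensor.data", make_range_detector(broker, low=0.0, high=15.0))
broker.subscribe("sensor.anomaly", alerts.append)  # dashboard stand-in

broker.publish("sensor.data", {"sensor": "ccbay-1", "value": 6.2})
broker.publish("sensor.data", {"sensor": "ccbay-1", "value": 42.0})
print(alerts)
```

Decoupling publishers from subscribers in this way is what lets a faulty detector be replaced by an alternate one without changing the sensors or the dashboard.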
24
National Center for Supercomputing Applications ECID & Corpus Christi Bay (CCBay) WATERS Observatory Testbed CCBay WATERS Observatory Testbed is one of 10 observatory testbeds recently funded by NSF –Collaboration of environmental engineering, hydrology, biology, and information technology researchers Goal of the testbed: –Integrate ECID and HIS technology to create end-to-end environmental information system –Use the technology to study hypoxia in CCBay Use real-time data streams from diverse monitoring systems to predict hypoxia one day ahead Mobilize manual sampling crews when conditions are right
25
National Center for Supercomputing Applications Sensors in Corpus Christi Bay [Map legend: Montagna stations; SERF stations; TCOON stations; USGS gages; TCEQ stations; Hypoxic Regions; NCDC station] National Datasets (National HIS) Regional Datasets (Workgroup HIS) Data sources: USGS, NCDC, TCOON, Dr. Paul Montagna, TCEQ, SERF
26
National Center for Supercomputing Applications CCBay Environmental Information System Dashboard Alert Anomaly Detector Hypoxia Predictor Event-Triggered Workflow Execution Event-driven Research Storage for Later Research CyberIntegrator: Forecast CyberCollaboratory: Contact Collaborators CCBay Sensors
27
National Center for Supercomputing Applications CCBay Near-Real-Time Hypoxia Prediction Data Archive Hypoxia Machine Learning Models Anomaly Detection Replace or Remove Errors Update Boundary Condition Models Hypoxia Model Integrator Hydrodynamic Model Visualize Hydrodynamics Water Quality Model Sensor net Visualize Hypoxia Risk D2K workflows Fortran numerical models IM2Learn workflows C++ code
28
National Center for Supercomputing Applications CCBay CI Challenges Automating QA/QC in a real-time network –David Hill is creating sensor anomaly detectors using statistical models (autoregressive models using naïve, clustering, perceptron, and artificial neural network approaches; and multi-sensor models using dynamic Bayesian networks) –While statistical models can identify anomalies, it is sometimes difficult to differentiate sensor errors from unusual environmental phenomena Getting access to the data, which are collected by different groups, stored in multiple formats in different locations –The project is defining a common data dictionary and units and will build Web services to translate
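The naïve autoregressive approach mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions, not David Hill's detector: the prediction for each reading is the last accepted reading, and a reading is flagged when its residual exceeds k standard deviations of recent residuals. The window size, k, and the sample series are made-up values.

```python
import statistics
from typing import List


def detect_anomalies(values: List[float], window: int = 5, k: float = 3.0) -> List[int]:
    """Return indices of readings that deviate strongly from the last accepted value.

    Flagged readings are excluded from the history so a single spike does not
    distort later predictions. If recent residuals have zero spread, nothing
    is flagged, which is a limitation of this simple sketch.
    """
    flagged: List[int] = []
    residuals: List[float] = []
    last_good = values[0]
    for i in range(1, len(values)):
        residual = values[i] - last_good  # naive prediction: previous good value
        if len(residuals) >= window:
            spread = statistics.pstdev(residuals[-window:])
            if spread > 0 and abs(residual) > k * spread:
                flagged.append(i)
                continue
        residuals.append(residual)
        last_good = values[i]
    return flagged


dissolved_oxygen = [5.0, 5.1, 4.9, 5.0, 5.2, 5.1, 5.0, 9.0, 5.1, 5.0]
print(detect_anomalies(dissolved_oxygen))
```

As the slide notes, a flag only marks a statistical outlier; deciding whether it is a sensor error or a real environmental event still needs other evidence, such as nearby sensors.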
29
National Center for Supercomputing Applications CCBay CI Challenges, Cont’d. Integrating data into diverse models –Calibration uses historical data, typically done by hand –Near-real-time updating needs automated approaches –Models are complex and derivative-based calibration approaches would be difficult to implement Model integration –Grids change from one type of model to another – defining a common coarse grid, with finer grids overlaid where needed –Data transformers must be built between models
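One kind of data transformer between model grids can be sketched as block averaging from a fine grid onto a common coarse grid. This assumes regular rectangular grids whose sizes divide evenly, which is a simplification; the function name and sample values are hypothetical, and real hydrodynamic and water quality grids may need interpolation instead.

```python
from typing import List

Grid = List[List[float]]


def coarsen(grid: Grid, factor: int) -> Grid:
    """Average each factor-by-factor block of a fine grid into one coarse cell."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse: Grid = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[rr][cc]
                     for rr in range(r, r + factor)
                     for cc in range(c, c + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


fine = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
    [3.0, 3.0, 4.0, 4.0],
]
print(coarsen(fine, 2))  # [[1.0, 2.0], [3.0, 4.0]]
```

Averaging suits quantities such as concentrations; quantities that should be conserved, such as mass or volume, would be summed rather than averaged.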
30
National Center for Supercomputing Applications Conclusions Creating CI for environmental data is challenging but the benefits in enabling larger-scale, near-real-time research will be enormous The ECID Cyberenvironment demonstrates the benefits of end-to-end integration of cyberinfrastructure and desktop tools, including: –HIS-type data services –Workflow –Event management –Provenance and knowledge management, and –Collaboration for supporting environmental researchers, educators, and outreach partners This creates a powerful system for linking observatory operations with flexible, investigator-driven research in a community framework (i.e., the national network). –Workflow and knowledge management support testing hypotheses across observatories –Provenance supports QA/QC and rewards for community contributions in an automated fashion.
31
National Center for Supercomputing Applications Acknowledgments Contributors: –NCSA ECID team (Peter Bajcsy, Noshir Contractor, Steve Downey, Joe Futrelle, Hank Green, Rob Kooper, Yong Liu, Luigi Marini, Jim Myers, Mary Pietrowicz, Tim Wentling, York Yao, Inna Zharnitsky) –Corpus Christi Bay Testbed team (PIs: Jim Bonner, Ben Hodges, David Maidment, Barbara Minsker, Paul Montagna) Funding sources: –NSF grants BES-0414259, BES-0533513, and SCI-0525308 –Office of Naval Research grant N00014-04-1-0437