Proposed Microsoft Water TCI+ Development of the AmeriFlux and Central Valley Data Portals conducted through a partnership with the Berkeley Water Center.

Presentation transcript:

Proposed Microsoft Water TCI+ Development of the AmeriFlux and Central Valley Data Portals conducted through a partnership with the Berkeley Water Center
Susan Hubbard (BWC), Deb Agarwal (LBNL, UCB) & Catharine van Ingen (MSFT)
Feb. 2006

Outline
- Overview of Berkeley Water Center (BWC)
- Motivation and General Objectives for Development of Water Data Portals
- Description of proposed Portals:
  - Carbon-Climate
  - Central Valley Cyber-Infrastructure
- Proposal Specifics
  - Requested Support
  - Project Timeline
- Summary

Berkeley Water Center (BWC): A Water Center of Excellence
- Is developing a new mode of doing business at Berkeley through seamless integration of UCB and LBNL expertise;
- Conducts interdisciplinary investigations that are coordinated through research thrust areas;
- Accelerates thrust area results into applications;
- Develops collaborations between Berkeley water researchers and other expert groups;
- Creates strong, mutually beneficial partnerships between Berkeley and other academic, governmental, and private sector institutions.
BWC involves faculty from 3 UCB Colleges and 3 LBNL Divisions.

Motivation for the Water Data Portals
- Meeting the water needs of humans is one of the greatest challenges of the 21st century.
- Hydrological processes are highly complex and dynamic over various spatial and temporal scales.
- Understanding hydrological processes with sufficient accuracy in the face of anthropogenic and global changes is a prerequisite to successful water management.
- Simple access to curated data and related metadata is a necessary component of a modern cyber-infrastructure that enables researchers and water managers to assimilate complex, multi-scale datasets collected from networked micro sensors to global satellite platforms and to use that data with modeling or mining tools to test hypotheses.

Portal Prototypes: General Objectives
- Demonstrate an advanced approach for tackling 21st century challenges by leveraging web service concepts, Microsoft technologies, and information technology expertise;
- Develop the prototypes in close collaboration with water scientists to ensure that the result is immediately seen as useful for doing water science;
- Focus early on the most critical components needed to address relevant science questions, rather than creating a fully developed problem solving environment;
- Continually demonstrate and "dogfood" prototypes with end-to-end scenarios, and use feedback to refine and augment;
- Work on two different, yet scientifically related, projects that will:
  - Permit us to understand what is common and what is distinct between different water research approaches;
  - Allow us to work with a wide range of water datasets and analysis techniques;
  - Provide demonstration vehicles to two different water research communities.

We propose the development of two data portals, developed based on the needs of different water research communities

CARBON-CLIMATE (leads to the CARBON-CLIMATE DATA PORTAL)
- Community has well-defined research objectives;
- Protocols for acquisition and reporting of AmeriFlux data are well developed;
- Datasets are ripe for synthesis but lack cyberinfrastructure.

CALIFORNIA HYDROLOGY (leads to the CENTRAL VALLEY DATA PORTAL)
- Research objectives not well defined;
- Extremely diverse and 'dirtier' large datasets;
- Curation and infrastructure needed prior to synthesis.

Carbon-Climate Data Portal
- The ability to make global change predictions requires information about carbon stocks and fluxes and the impact of those on climate.
- AmeriFlux datasets are used to assess carbon fluxes.
  - These datasets are collected from 149 environmental observatories located across the Americas.
  - Protocols are already developed for data acquisition and reporting to a central facility.
  - The size of a complete historical dataset is a few hundred MB.

Carbon-Climate Feedbacks: Example
- Climate warming is associated with earlier onset of spring, which is expected to enhance plant growth and lead to an increase in carbon sequestration;
- Berkeley researchers (Angert et al., 2005, PNAS) recently found that:
  - Earlier springs permit more uptake of CO2;
  - However, an increase in droughts (hotter, drier summers) resulted in lower net CO2 uptake, which cancels out the earlier enhanced uptake.
- Carbon-climate feedbacks are important;
- The ability to compute simple correlations across sites, measurements, and seasons will enable other such interactions to be discovered and thereby improve global change predictions.
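The "simple correlations across sites, measurements, and seasons" mentioned above could look roughly like the following. This is only an illustrative sketch: the record layout, the field names (`site`, `spring_onset_day`, `net_co2_uptake`), and the sample numbers are hypothetical and are not taken from AmeriFlux or from the proposal.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlate_by_site(records, x_key, y_key):
    """Group records by site and correlate two measurements within each site."""
    grouped = defaultdict(lambda: ([], []))
    for rec in records:
        xs, ys = grouped[rec["site"]]
        xs.append(rec[x_key])
        ys.append(rec[y_key])
    return {site: pearson(xs, ys) for site, (xs, ys) in grouped.items()}


# Hypothetical yearly summaries for two made-up sites.
SAMPLE = [
    {"site": "site-1", "spring_onset_day": 95, "net_co2_uptake": 4.0},
    {"site": "site-1", "spring_onset_day": 90, "net_co2_uptake": 4.5},
    {"site": "site-1", "spring_onset_day": 85, "net_co2_uptake": 5.0},
    {"site": "site-2", "spring_onset_day": 100, "net_co2_uptake": 3.0},
    {"site": "site-2", "spring_onset_day": 92, "net_co2_uptake": 2.6},
    {"site": "site-2", "spring_onset_day": 84, "net_co2_uptake": 2.2},
]

if __name__ == "__main__":
    for site, r in correlate_by_site(
        SAMPLE, "spring_onset_day", "net_co2_uptake"
    ).items():
        print(site, round(r, 3))
```

Grouping by site before correlating keeps site-to-site differences from masking a relationship that holds within each site; the same helper could group by season or by measurement type instead.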

Examples of Carbon-Climate Datasets
[Figure: observatory datasets and spatially continuous datasets; panels labeled Soils, Climate, and Remote Sensing]

Prototype AmeriFlux Portal Development
- Design a schema capable of versioning and researcher annotation of AmeriFlux data;
- Build a data loading pipeline with basic data cleaning capability by leveraging SQL Server 2005 Integration Services;
- Develop a web portal that can provide simple dataset selection and downloading across measurement sites, parameters, versions, and time windows;
- Integrate with commonly used data visualization tools to allow simple data mining and browsing.
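One possible shape for a schema that supports versioning and researcher annotation, together with selection by site, parameter, and time window, is sketched below. It is an assumption-laden illustration, not the proposed design: it uses Python's built-in `sqlite3` as a stand-in for SQL Server 2005, and the table names, column names, and the `select_latest` helper are all hypothetical.

```python
import sqlite3

# Hypothetical schema: each re-processed value is stored as a new version
# rather than overwriting the old one, and annotations point at one version.
SCHEMA = """
CREATE TABLE measurement (
    site     TEXT    NOT NULL,
    variable TEXT    NOT NULL,
    ts       TEXT    NOT NULL,   -- ISO 8601 timestamp, sorts as text
    version  INTEGER NOT NULL,
    value    REAL,
    PRIMARY KEY (site, variable, ts, version)
);
CREATE TABLE annotation (
    site     TEXT    NOT NULL,
    variable TEXT    NOT NULL,
    ts       TEXT    NOT NULL,
    version  INTEGER NOT NULL,
    author   TEXT    NOT NULL,
    note     TEXT    NOT NULL
);
"""


def open_store():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def select_latest(conn, site, variable, start, end):
    """Return (ts, version, value) for the newest version of each timestamp."""
    query = """
        SELECT m.ts, m.version, m.value
        FROM measurement AS m
        WHERE m.site = ? AND m.variable = ? AND m.ts BETWEEN ? AND ?
          AND m.version = (
              SELECT MAX(v.version) FROM measurement AS v
              WHERE v.site = m.site AND v.variable = m.variable AND v.ts = m.ts
          )
        ORDER BY m.ts
    """
    return conn.execute(query, (site, variable, start, end)).fetchall()


if __name__ == "__main__":
    conn = open_store()
    conn.executemany(
        "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
        [
            ("site-1", "co2_flux", "2005-06-01T00:00", 1, -3.1),
            ("site-1", "co2_flux", "2005-06-01T00:00", 2, -2.9),  # corrected
            ("site-1", "co2_flux", "2005-06-01T00:30", 1, -3.4),
        ],
    )
    conn.execute(
        "INSERT INTO annotation VALUES (?, ?, ?, ?, ?, ?)",
        ("site-1", "co2_flux", "2005-06-01T00:00", 2,
         "researcher", "hypothetical note: recalibrated sensor"),
    )
    print(select_latest(conn, "site-1", "co2_flux",
                        "2005-06-01T00:00", "2005-06-01T23:59"))
```

Keeping the version number in the primary key means earlier values stay retrievable, so a published analysis can still be reproduced against the version it originally used.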

Prototype AmeriFlux Portal Adoption
- Perform end-to-end scenario demonstrations and live dogfooding in collaboration with BWC scientists.
- Refine and augment based on feedback. Potential augmentations:
  - Federate with other data sources, such as MODIS remote sensing, soils, and climate datasets;
  - Link to numerical models that permit hypothesis testing;
  - Leverage workflow components to automate key analysis tasks.
- Locate a long-term home for further development and use of the portal.

Carbon-Climate Workbench Vision
[Diagram: Carbon-Climate Workbench with a web service interface to data and tools]
- Data portals: host AmeriFlux climate data, STATSGO soils data, and MODIS products; import other datasets
- Data types shown: LAI, Temp, FPAR, Veg Index, Surf Refl, NPP, Albedo
- Web-based workbench access: choose AmeriFlux area/transect, time range, and data type
- Design workflow (example): data harvest for sites 1-16; gap fill with technique A or technique B; Canoak model runs for site 1 and site 9; statistical and graphical analysis; version control; network display
- Tools: data cleaning tools; data mining and analysis tools (statistical, graphical); modeling tools; visualization tools; ecology toolbox; knowledge generation tools
- Compute resources

Central Valley Data Portal
- Across the US, groundwater supplies roughly 40 percent of drinking water;
- The State of California alone uses about 16 million acre-feet of groundwater each year, more than any other state in the nation, and 80% of that goes toward crop irrigation;
- The 400-mile-long Central Valley supplies one quarter of the food in the US;
- California groundwater quantity and quality are critical to the economic viability of the state!
- A data portal will enable joint analysis of a range of datasets and tools that are critical to California water resources and water quality.
[Figure: Central Valley]

USGS Projects
- The importance of Central Valley water resources and quality has prompted the USGS to develop a $50M project to monitor groundwater quality;
- The USGS project focuses on intensive data collection, and no plans have been made to curate these data or to federate them with the other water datasets critical for understanding water balance and quality over time in the Central Valley.

Examples of Central Valley Water Datasets
[Figure panels: Basin Boundary and Stream Network; Hydrological Units from Well Logs; Water Levels]

GAMA Water Quality Data: List of Analytes (GAMA Project, Ken Belitz)
- Volatile organic compounds
- Pesticides
- Stable isotopes (D, O-18)
- Tritium-3He / noble gases
- Specific conductance
- Stable isotopes, 3H/He, noble gases
- Carbon isotopes (C-13, C-14)
- Radon, radium, gross alpha/beta
- Field parameters: temp, EC, DO, turbidity, pH, alk.
- Major ions and trace elements
- Arsenic and iron speciation
- Nutrients (nitrates, phosphates)
- Dissolved organic carbon
- Emerging contaminants
- E. coli, total coliform, coliphage
- Selected "emerging contaminants":
  - Pharmaceuticals
  - N-nitrosodimethylamine (NDMA)
  - Perchlorate
  - 1,4-dioxane
  - Chromium (total and VI)

Prototype Central Valley Portal Development
- Follows the approach described for Carbon-Climate portal development (data curation, cleaning, mining, and visualization);
- The data loading pipeline and cleaning will be more challenging because the datasets are larger, more diverse, and 'dirtier' than AmeriFlux;
- Data visualization will likely include some form of mapping to display measurements across the Valley.
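As a rough idea of what "basic data cleaning" in a loading pipeline might involve for such 'dirty' inputs, the sketch below converts raw text fields to numbers and rejects missing-value markers and out-of-range readings. Everything specific here is an assumption made for illustration: the `-9999` sentinel, the `ph` field, and the 0 to 14 plausibility range are not specified anywhere in the proposal.

```python
import math

# Assumed missing-value marker; real datasets may use other conventions.
MISSING_SENTINEL = -9999.0


def clean_value(raw, low, high):
    """Return raw as a float, or None if it is missing or implausible."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None  # empty cells, "n/a", free text
    if math.isnan(value) or value == MISSING_SENTINEL:
        return None
    if not (low <= value <= high):
        return None  # outside the plausible physical range
    return value


def clean_rows(rows, field, low, high):
    """Keep rows whose field passes clean_value; count the rest."""
    kept, rejected = [], 0
    for row in rows:
        value = clean_value(row.get(field), low, high)
        if value is None:
            rejected += 1
        else:
            kept.append({**row, field: value})
    return kept, rejected


if __name__ == "__main__":
    raw_rows = [
        {"well": "w1", "ph": "7.2"},
        {"well": "w2", "ph": "-9999"},
        {"well": "w3", "ph": "n/a"},
        {"well": "w4", "ph": "15.3"},
    ]
    kept, rejected = clean_rows(raw_rows, "ph", 0.0, 14.0)
    print(kept, rejected)
```

Counting rejected rows instead of silently dropping them gives the scientists reviewing the prototype a quick signal of how dirty each source is.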

Prototype Central Valley Portal Adoption
- Follows the approach described for Carbon-Climate portal development (end-to-end scenario demonstrations and live dogfooding in collaboration with BWC scientists).
- Use the portal to link a subset of the data to a modest numerical model and attempt a specific scientific investigation using the data (Kesterson salt balance).
  - Demonstrates value of portal
  - Demonstrates value of dataset
- Refine and augment based on feedback. Potential augmentations are similar to those for the Carbon-Climate portal, although the other data sources, or the data within those sources, will differ.
- Locate a long-term home for further development and use of the portal. BWC will host the prototype portal for demonstrations and dogfooding.

Central Valley Data Vision
Building Water Cyberinfrastructure to Connect Data, Resources, and People
[Diagram components: Distributed Central Valley Data Sets; Data Harvesting and Transformations; BWC Data Gateway; Data Cleaning, Models, Analysis Tools; BWC Analysis Gateway; Computational Resources; BWC Water Portal; Dissemination and Archiving; Knowledge Discovery, Hypothesis Testing, Water Synthesis]

Example research and policy questions for the Central Valley Portal
- Short term: Is salt leaking from the scattered farm irrigation runoff ponds in the Kesterson Valley? If so, when will that become a significant water quality concern?
- Long term: What is the long-term impact of groundwater constituents, such as fertilizers and emerging contaminants, on the human and economic health of California?

Transferability of Central Valley Prototype
- Development of an infrastructure to study Central Valley water is a critical step in fusing science into water management and decision-making processes;
- Because of the importance of the Central Valley in the water community, the infrastructure will serve as a prototype for basins across the world;
- The portal will serve as a springboard for subsequent BWC water research in the Central Valley, such as investigation of the impact of global change on Central Valley productivity.

Proposal Project Parameters
- IT components of the proposed project to be led by Dr. Deb Agarwal (LBNL/UCB) and Dr. Catharine van Ingen (MSFT);
- The BWC will ensure that the prototypes benefit from good scientific input and are distributed through the community;
- We request support for 2 programmers and 1 graduate student per year (~350k/year) for two years;
- The programmers will start with the development of the more straightforward Carbon-Climate portal and will transition to development of the more challenging Central Valley Portal;
- One programmer will primarily focus on development of data loading and web service access, while the other programmer will focus on data cleaning, mining, and visualization tools;
- Bi-weekly seminars will be held to facilitate exchange between the programmers and the BWC scientists involved in the portal development.

Project Timeline
Timeline marks: 03/06, 09/06, 03/07, 09/07, 03/08
Milestones, in order:
- Hire programmers, postdocs, graduate students
- Begin intensive work on prototype AmeriFlux Portal
- Begin conceptual development of architecture for the Central Valley Portal and get data for curation
- Complete early prototype AmeriFlux Portal
- Begin AmeriFlux demonstrations to the carbon flux community
- Begin intensive work on prototype Central Valley Portal
- Refine prototype AmeriFlux Portal based on user feedback and make available to researchers for early use
- Begin federating the AmeriFlux Portal with models and with climate and remote sensing datasets
- Begin prototype Central Valley Portal short- and longer-term model federation demonstrations
- Refine and augment prototype Central Valley Portal as needed
- Transfer Central Valley and AmeriFlux Portals to the respective scientific communities

Summary
- Projects will demonstrate what modern commodity tools and commercial data handling practices can bring to water resources investigations and water management.
- Through close interaction between computer scientists and the BWC water specialists and partners, we envision that the data portals developed through this TCI will be immediately beneficial to water science professionals and serve as an example in the more general e-science community.
- We request support of 700k over two years to support the development of the proposed portals.