Proposed Microsoft Water TCI+
Development of the AmeriFlux and Central Valley Data Portals, conducted through a partnership with the Berkeley Water Center
Susan Hubbard (BWC), Deb Agarwal (LBNL, UCB) & Catharine van Ingen (MSFT)
Feb. 2006
Outline
• Overview of Berkeley Water Center (BWC)
• Motivation and General Objectives for Development of Water Data Portals
• Description of proposed Portals:
  ◦ Carbon-Climate
  ◦ Central Valley Cyber-Infrastructure
• Proposal Specifics
  ◦ Requested Support
  ◦ Project Timeline
• Summary
Berkeley Water Center (BWC): A Water Center of Excellence
• Is developing a new mode of doing business at Berkeley through seamless integration of UCB and LBNL expertise;
• Conducts interdisciplinary investigations that are coordinated through research thrust areas;
• Accelerates thrust area results into applications;
• Develops collaborations between Berkeley water researchers and other expert groups;
• Creates strong, mutually beneficial partnerships between Berkeley and other academic, governmental, and private sector institutions.
BWC involves faculty from 3 UCB Colleges and 3 LBNL Divisions
Motivation for the Water Data Portals
Meeting the water needs of humans is one of the greatest challenges of the 21st century.
Hydrological processes are highly complex and dynamic over various spatial and temporal scales.
Understanding hydrological processes with sufficient accuracy in the face of anthropogenic and global changes is a prerequisite to successful water management.
Simple access to curated data and related metadata is a necessary component of a modern cyber-infrastructure that enables researchers and water managers to assimilate complex, multi-scale datasets collected from networked micro sensors to global satellite platforms and to use that data with modeling or mining tools to test hypotheses.
Portal Prototypes: General Objectives
• Demonstrate an advanced approach for tackling 21st century challenges by leveraging web service concepts, Microsoft technologies, and information technology expertise;
• Develop the prototypes in close collaboration with water scientists to ensure that the result is immediately seen as useful for doing water science;
• Focus early on the most critical components needed to address relevant science questions, rather than creating a fully developed problem solving environment;
• Continually demonstrate and “dogfood” prototypes with end-to-end scenarios and use feedback to refine and augment;
• Work on two different, yet scientifically related, projects that will:
  ◦ Permit us to understand what is common and what is distinct between different water research approaches;
  ◦ Allow us to work with a wide range of water datasets and analysis techniques;
  ◦ Provide demonstration vehicles to two different water research communities.
We propose two data portals, each developed based on the needs of a different water research community

CARBON-CLIMATE
• Community has well-defined research objectives;
• Protocols for acquisition and reporting of AmeriFlux data are well developed;
• Datasets are ripe for synthesis but lack cyberinfrastructure.
Proposed portal: CARBON-CLIMATE DATA PORTAL

CALIFORNIA HYDROLOGY
• Research objectives are not well defined;
• Datasets are large, extremely diverse, and ‘dirtier’;
• Curation and infrastructure are needed prior to synthesis.
Proposed portal: CENTRAL VALLEY DATA PORTAL
Carbon-Climate Data Portal
• The ability to make global change predictions requires information about carbon stocks and fluxes and their impact on climate.
• AmeriFlux datasets are used to assess carbon fluxes.
  ◦ These datasets are collected from 149 environmental observatories located across the Americas.
  ◦ Protocols are already developed for data acquisition and reporting to a central facility.
  ◦ The size of a complete historical dataset is a few hundred MB.
Carbon-Climate Feedbacks: Example
• Climate warming is associated with an earlier onset of spring, which is expected to enhance plant growth and lead to an increase in carbon sequestration;
• Berkeley researchers (Angert et al., 2005, PNAS) recently found that:
  ◦ Earlier springs permit more uptake of CO2;
  ◦ However, an increase in droughts (hotter, drier summers) resulted in lower net CO2 uptake, which cancels out the earlier enhanced uptake.
• Carbon-climate feedbacks are important;
• The ability to compute simple correlations across sites, measurements, and seasons will enable other such interactions to be discovered and thereby improve global change predictions.
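The kind of simple correlation across sites and seasons mentioned on this slide could be sketched as follows. This is only an illustration under stated assumptions, not the portal's design: the record layout (`site`, `season`, `temp_c`, `co2_uptake`), the function names, and all numbers are hypothetical, and Pearson correlation is assumed as the "simple correlation".

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlate_by_group(records, key, x_field, y_field):
    """Group records by `key` (e.g. 'site' or 'season') and correlate two fields per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append((rec[x_field], rec[y_field]))
    # Groups with fewer than two observations are skipped.
    return {
        name: pearson([p[0] for p in pairs], [p[1] for p in pairs])
        for name, pairs in groups.items()
        if len(pairs) >= 2
    }


# Hypothetical records: every value here is made up for illustration only.
records = [
    {"site": "A", "season": "spring", "temp_c": 10.0, "co2_uptake": 2.0},
    {"site": "A", "season": "spring", "temp_c": 12.0, "co2_uptake": 3.0},
    {"site": "A", "season": "spring", "temp_c": 14.0, "co2_uptake": 4.0},
    {"site": "A", "season": "summer", "temp_c": 25.0, "co2_uptake": 3.0},
    {"site": "A", "season": "summer", "temp_c": 28.0, "co2_uptake": 2.0},
    {"site": "A", "season": "summer", "temp_c": 31.0, "co2_uptake": 1.0},
]
by_season = correlate_by_group(records, "season", "temp_c", "co2_uptake")
```

In this made-up data the sign of the temperature-uptake correlation flips between spring and summer, which is the sort of seasonal contrast the slide describes; real AmeriFlux data would need gap handling and quality flags before such a comparison.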
Examples of Carbon-Climate Datasets
[Figure: example datasets labeled Soils, Climate, and Remote Sensing, grouped as observatory datasets and spatially continuous datasets]
Prototype AmeriFlux Portal Development
• Design a schema capable of versioning and researcher annotation of AmeriFlux data;
• Build a data loading pipeline with basic data cleaning capability by leveraging SQL Server 2005 Integration Services;
• Develop a web portal that can provide simple dataset selection and downloading across measurement sites, parameters, versions, and time windows;
• Integrate with commonly used data visualization tools to allow simple data mining and browsing.
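A schema with versioning and researcher annotation, plus selection across sites, parameters, versions, and time windows, might look roughly like the sketch below. It is an assumption-laden illustration: the slide names SQL Server 2005, whereas this uses Python's built-in `sqlite3` as a stand-in, and the table names, column names, and sample values are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per (site, parameter, time, version), so a
# corrected value is stored as a new version rather than overwriting the old one.
SCHEMA = """
CREATE TABLE measurement (
    site      TEXT    NOT NULL,
    parameter TEXT    NOT NULL,
    obs_time  TEXT    NOT NULL,   -- ISO 8601 timestamp
    version   INTEGER NOT NULL,
    value     REAL,
    PRIMARY KEY (site, parameter, obs_time, version)
);
CREATE TABLE annotation (
    id         INTEGER PRIMARY KEY,
    site       TEXT    NOT NULL,
    parameter  TEXT    NOT NULL,
    obs_time   TEXT    NOT NULL,
    version    INTEGER NOT NULL,
    researcher TEXT    NOT NULL,
    note       TEXT    NOT NULL,
    FOREIGN KEY (site, parameter, obs_time, version)
        REFERENCES measurement (site, parameter, obs_time, version)
);
"""


def open_store():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def select_window(conn, sites, parameter, start, end, version=None):
    """Return (site, obs_time, version, value) rows in [start, end).

    If `version` is None, the latest version of each observation is returned.
    """
    placeholders = ",".join("?" for _ in sites)
    sql = f"""
        SELECT m.site, m.obs_time, m.version, m.value
        FROM measurement AS m
        WHERE m.site IN ({placeholders})
          AND m.parameter = ?
          AND m.obs_time >= ? AND m.obs_time < ?
    """
    args = [*sites, parameter, start, end]
    if version is None:
        sql += """
          AND m.version = (SELECT MAX(v.version) FROM measurement AS v
                           WHERE v.site = m.site
                             AND v.parameter = m.parameter
                             AND v.obs_time = m.obs_time)
        """
    else:
        sql += " AND m.version = ?"
        args.append(version)
    sql += " ORDER BY m.site, m.obs_time"
    return conn.execute(sql, args).fetchall()


conn = open_store()
# Made-up sample values, for illustration only.
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
    [
        ("site1", "co2_flux", "2005-06-01T00:00", 1, -4.0),
        ("site1", "co2_flux", "2005-06-01T00:00", 2, -4.2),  # corrected value
        ("site2", "co2_flux", "2005-06-01T00:00", 1, -3.1),
        ("site2", "co2_flux", "2005-07-01T00:00", 1, -2.5),
    ],
)
conn.execute(
    "INSERT INTO annotation (site, parameter, obs_time, version, researcher, note) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("site1", "co2_flux", "2005-06-01T00:00", 2, "example_user", "sensor recalibrated"),
)
latest = select_window(conn, ["site1", "site2"], "co2_flux", "2005-06-01", "2005-07-01")
first_version = select_window(conn, ["site1", "site2"], "co2_flux",
                              "2005-06-01", "2005-07-01", version=1)
```

Keeping the version in the primary key means earlier values stay queryable after a correction, so an analysis can be rerun against the exact data it originally used.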
Prototype AmeriFlux Portal Adoption
• Perform end-to-end scenario demonstrations and live dogfooding in collaboration with BWC scientists.
• Refine and augment based on feedback. Potential augmentations:
  ◦ Federate with other data sources, such as MODIS remote sensing, soils, and climate datasets;
  ◦ Link to numerical models that permit hypothesis testing;
  ◦ Leverage workflow components to automate key analysis tasks.
• Locate a long-term home for further development and use of the portal.
Carbon-Climate Workbench Vision
[Diagram: Web Service Interface to Data and Tools. Data portals host AmeriFlux climate data, STATSGO soils data, and MODIS products (LAI, Temp, Fpar, Veg Index, Surf Refl, NPP, Albedo), with import of other datasets. Web-based workbench access to knowledge generation tools: data cleaning tools, data mining and analysis tools, modeling tools, visualization tools, ecology toolbox, and compute resources. Example workflow elements: choose AmeriFlux area/transect, time range, and data type; data harvest for sites 1-16; gap fill with technique A or technique B; Canoak model runs for site 1 and site 9; statistical and graphical analysis; version control; network display of LAI.]
Central Valley Data Portal
• Across the US, groundwater supplies roughly 40 percent of drinking water;
• The State of California alone uses about 16 million acre-feet of groundwater each year, more than any other state in the nation, and 80% of that goes toward crop irrigation;
• The 400-mile-long Central Valley supplies ¼ of the food in the US;
• California groundwater quantity and quality are critical to the economic viability of the state!
• A data portal will enable joint analysis of a range of datasets and tools that are critical to California water resources and water quality.
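To put the groundwater figures on this slide in perspective, the arithmetic can be worked through as below. The 16 million acre-feet and 80% figures come from the slide; the conversion factor of about 1233.48 cubic meters per acre-foot is the standard one, and the function names are hypothetical.

```python
ACRE_FOOT_M3 = 1233.48  # cubic meters per acre-foot (standard conversion, rounded)


def irrigation_share(total_acre_feet, irrigation_fraction):
    """Acre-feet of groundwater going to irrigation."""
    return total_acre_feet * irrigation_fraction


def acre_feet_to_km3(acre_feet):
    """Convert acre-feet to cubic kilometers (1 km^3 = 1e9 m^3)."""
    return acre_feet * ACRE_FOOT_M3 / 1e9


total_af = 16e6                                   # from the slide
irrigation_af = irrigation_share(total_af, 0.80)  # about 12.8 million acre-feet
total_km3 = acre_feet_to_km3(total_af)            # about 19.7 km^3 per year
irrigation_km3 = acre_feet_to_km3(irrigation_af)  # about 15.8 km^3 per year
```

On these figures, irrigation accounts for roughly 12.8 million of the 16 million acre-feet, or close to 16 cubic kilometers of groundwater per year.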
USGS Projects
• The importance of Central Valley water resources and quality has prompted the USGS to develop a $50M project to monitor groundwater quality;
• The USGS project focuses on intensive data collection, and no plans have been made to curate these data or to federate them with the other water datasets critical for understanding water balance and quality over time in the Central Valley.
Examples of Central Valley Water Datasets
[Figure panels: Basin Boundary and Stream Network; Hydrological Units from Well Logs; Water Levels]
GAMA Water Quality Data (GAMA Project, Ken Belitz)
List of Analytes
• Volatile organic compounds
• Pesticides
• Stable isotopes (D, O-18)
• Tritium-3He / noble gases
• Specific conductance
• Stable isotopes, 3H/He, noble gases
• Carbon isotopes (C-13, C-14)
• Radon, radium, gross alpha/beta
• Field parameters: temp, EC, DO, turbidity, pH, alk.
• Major ions and trace elements
• Arsenic & iron speciation
• Nutrients (nitrates, phosphates)
• Dissolved organic carbon
• Emerging contaminants
• E. coli, total coliform, coliphage
Selected “Emerging Contaminants”
• Pharmaceuticals
• N-nitrosodimethylamine (NDMA)
• Perchlorate
• 1,4-dioxane
• Chromium (total and VI)
Prototype Central Valley Portal Development
• Follows the approach described for Carbon-Climate portal development (data curation, cleaning, mining, and visualization).
• The data loading pipeline and cleaning will be more challenging because the datasets are larger, more diverse, and ‘dirtier’ than AmeriFlux.
• Data visualization will likely include some form of mapping to display measurements across the Valley.
Prototype Central Valley Portal Adoption
• Follows the approach described for Carbon-Climate portal adoption (end-to-end scenario demonstrations and live dogfooding in collaboration with BWC scientists).
• Use the portal to link a subset of the data to a modest numerical model and attempt a specific scientific investigation using the data (Kesterson salt balance).
  ◦ Demonstrates value of portal
  ◦ Demonstrates value of dataset
• Refine and augment based on feedback. Potential augmentations are similar to those for the Carbon-Climate portal, although the other data sources, or the data within those sources, will differ.
• Locate a long-term home for further development and use of the portal. BWC will host the prototype portal for demonstrations and dogfooding.
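The slide names a salt balance investigation but does not describe the model. As a purely illustrative sketch of what a "modest numerical model" of this kind could look like, the code below steps a single well-mixed pond forward in time with an explicit salt and water budget. The well-mixed assumption, the function names, and every parameter value are hypothetical and are not taken from the proposal.

```python
def step_salt_balance(volume_m3, salt_kg, inflow_m3, inflow_conc_kg_m3,
                      seepage_m3, evaporation_m3):
    """Advance a well-mixed pond by one time step.

    Inflow carries salt in, seepage carries salt out at the current pond
    concentration, and evaporation removes water only.
    Returns (new_volume_m3, new_salt_kg, salt_leaked_kg).
    """
    if volume_m3 <= 0:
        raise ValueError("pond volume must be positive")
    conc = salt_kg / volume_m3                  # current pond concentration
    seepage_m3 = min(seepage_m3, volume_m3)     # cannot leak more water than is stored
    salt_in = inflow_m3 * inflow_conc_kg_m3
    salt_leaked = seepage_m3 * conc
    new_volume = volume_m3 + inflow_m3 - seepage_m3 - evaporation_m3
    if new_volume <= 0:
        raise ValueError("pond dried out during this step")
    new_salt = salt_kg + salt_in - salt_leaked
    return new_volume, new_salt, salt_leaked


def simulate(steps, volume_m3, salt_kg, **fluxes):
    """Run several steps with constant fluxes; returns (volume, salt, total_leaked)."""
    total_leaked = 0.0
    for _ in range(steps):
        volume_m3, salt_kg, leaked = step_salt_balance(volume_m3, salt_kg, **fluxes)
        total_leaked += leaked
    return volume_m3, salt_kg, total_leaked


# Hypothetical values chosen only so the example runs.
final_volume, final_salt, leaked_salt = simulate(
    10, volume_m3=1000.0, salt_kg=5000.0,
    inflow_m3=100.0, inflow_conc_kg_m3=2.0,
    seepage_m3=20.0, evaporation_m3=80.0,
)
```

Tracking the leaked salt separately makes the budget easy to check: initial salt plus total salt inflow should equal the salt remaining in the pond plus the salt that seeped out. A real investigation would replace the constant fluxes with measured time series drawn from the portal.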
Central Valley Data Vision
[Diagram: Building Water Cyberinfrastructure to Connect Data, Resources, and People. Labeled components: Distributed Central Valley Data Sets; Data Harvesting and Transformations; BWC Data Gateway; Data Cleaning, Models, Analysis Tools; Computational Resources; BWC Analysis Gateway; Knowledge Discovery, Hypothesis Testing, Water Synthesis; BWC Water Portal; Dissemination and Archiving.]
Example research and policy questions for the Central Valley Portal
Short term: Is salt leaking from the scattered farm irrigation runoff ponds in the Kesterson Valley? If so, when will that become a significant water quality concern?
Long term: What is the long-term impact of groundwater constituents, such as fertilizers and emerging contaminants, on the human and economic health of California?
Transferability of Central Valley Prototype
• Development of an infrastructure to study Central Valley water is a critical step in fusing science into water management and decision-making processes;
• Because of the importance of the Central Valley in the water community, the infrastructure will serve as a prototype for basins across the world;
• The portal will serve as a springboard for subsequent BWC water research in the Central Valley, such as investigation of the impact of global change on Central Valley productivity.
Proposal Project Parameters
• IT components of the proposed project will be led by Dr. Deb Agarwal (LBNL/UCB) and Dr. Catharine van Ingen (MSFT);
• The BWC will ensure that the prototypes benefit from good scientific input and are distributed through the community;
• We request support for 2 programmers and 1 graduate student per year (~350k/year) for two years;
• The programmers will start with the development of the more straightforward Carbon-Climate portal and will transition to development of the more challenging Central Valley Portal;
• One programmer will primarily focus on development of data loading and web service access, while the other programmer will focus on data cleaning, mining, and visualization tools;
• Bi-weekly seminars will be held to facilitate exchange between the programmers and the BWC scientists involved in the portal development.
Project Timeline
Time axis: 03/06, 09/06, 03/07, 09/07, 03/08
Milestones, in the order shown:
• Hire programmers, postdocs, graduate students
• Begin intensive work on prototype AmeriFlux Portal
• Begin conceptual development of architecture for the Central Valley Portal and get data for curation
• Complete early prototype AmeriFlux Portal
• Begin AmeriFlux demonstrations to the carbon flux community
• Begin intensive work on prototype Central Valley Portal
• Refine prototype AmeriFlux Portal based on user feedback and make available to researchers for early use
• Begin federating AmeriFlux Portal with models and with climate and remote sensing datasets
• Begin prototype Central Valley Portal short and longer term model federation demonstrations
• Refine and augment prototype Central Valley Portal as needed
• Transfer Central Valley and AmeriFlux Portals to the respective scientific communities
Summary
• The projects will demonstrate what modern commodity tools and commercial data handling practices can bring to water resources investigations and water management.
• Through close interaction between computer scientists and the BWC water specialists and partners, we envision that the data portals developed through this TCI will be immediately beneficial to water science professionals and serve as an example in the more general e-science community.
• We request 700k over two years to support the development of the proposed portals.