Published by Elvin York. Modified over 9 years ago.
1
Designing CyberInfrastructure to Support End Science
Deb Agarwal (UCB and LBNL), Catharine van Ingen (MSFT)
Berkeley Water Center, Microsoft TCI
IndoFlux Meeting, Chennai, India, July 13, 2006
http://esd.lbl.gov/BWC/
2
Project Motivation
- Data is now being gathered into common data archives
- Data archives provide an opportunity for cross-discipline and cross-site investigations
- Data analysis techniques that worked well on small data sets often do not scale
- Current CS tools have evolved in support of other disciplines
  - Investigate their ability to facilitate data analysis
3
Building BWC Water Cyberinfrastructure to Connect Data, Resources, and People
[Diagram labels: Distributed Data Sets; Science Portal; Data Harvesting and Transformations; Data Cleaning, Models, Analysis Tools; Computational Resources]
4
Web Service Interface to Data and Tools
[Diagram: Carbon Community Workbench, with web-based workbench access. Labels recovered from the figure:]
- Data Providers (Host): Ameriflux; Climate Data; Statsgo Soils Data; MODIS products
- MODIS products: LAI, Temp, Fpar, Veg Index, Surf Refl, NPP, Albedo
- Tools: Statistical, Graphical
- Workflow steps: Choose Ameriflux Area/Transect, Time Range, Data Type; Data harvest Sites 1-16; Gap Fill, A technique; Gap Fill, B technique; Design Workflow; Canoak Model Site 1; Canoak Model Site 9; Statistical & graphical analysis; Version control; Network display LAI
- Toolbox: Data Cleaning Tools; Data Mining and Analysis Tools; Modeling Tools; Visualization Tools; Ecology Toolbox; Knowledge Generation Tools; Compute Resources
- Data sets: Climate; Statsgo; MODIS; Import other Datasets
5
Approach
- Work closely with the end scientists to define, prototype, and test the system
- Provide a solution that leverages both server-based and local desktop/laptop environments
- Leverage commercial tools to the extent possible
6
Some Critical Capabilities
- Support for versioning of data sets
- Work with multiple data sets
- Advanced data selection and plotting capabilities
  - Select data relative to an event
  - Simple calculation across any specified date range
  - Statistical information available
  - Plots: scatter, diurnal, time series, probability density function, tiled, correlation
- Ability to access capabilities from desktop
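The selection capabilities listed above (data relative to an event, a calculation over a date range, the hourly averages behind a diurnal plot) can be sketched in a few lines. This is only an illustration, not the BWC implementation: the record layout, the function names, and the sample values are all assumptions made up for the example.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical hourly records as (timestamp, value) pairs. The values are
# synthetic (each value equals its hour of day), not real flux data.
records = [
    (datetime(2005, 6, 1) + timedelta(hours=h), float(h % 24))
    for h in range(72)
]

def select_range(rows, start, end):
    """Return the rows with start <= timestamp < end."""
    return [(t, v) for t, v in rows if start <= t < end]

def select_relative_to_event(rows, event, before, after):
    """Return the rows inside a window around an event time."""
    return select_range(rows, event - before, event + after)

def range_mean(rows, start, end):
    """Simple calculation across a date range: the mean value."""
    return mean(v for _, v in select_range(rows, start, end))

def diurnal_mean(rows):
    """Mean value per hour of day, the series a diurnal plot would show."""
    by_hour = {}
    for t, v in rows:
        by_hour.setdefault(t.hour, []).append(v)
    return {hour: mean(vals) for hour, vals in sorted(by_hour.items())}

# Usage: a 12-hour window centred on a hypothetical event at noon on June 2.
event = datetime(2005, 6, 2, 12)
window = select_relative_to_event(
    records, event, timedelta(hours=6), timedelta(hours=6)
)
```

Half-open ranges (`start <= t < end`) are used so that adjacent ranges never count the same record twice.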
7
Data Pipeline
[Diagram stages, in order: ORNL Ameriflux Site; CSV Files; BWC SQL Server Database; Data Cube; Excel Pivot Table and Chart]
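A rough sketch of the same flow (CSV files into a SQL database, then an aggregated view of the kind a pivot table would show) is given below. It substitutes Python's built-in `sqlite3` for SQL Server and a plain `GROUP BY` query for the data cube and Excel pivot table; the table name, column names, site names, and values are all invented for illustration.

```python
import csv
import io
import sqlite3

# Made-up CSV content standing in for a site data file.
CSV_TEXT = """site,year,month,precip_mm
SiteA,2004,1,100.0
SiteA,2004,7,2.0
SiteB,2004,1,60.0
SiteB,2004,7,20.0
"""

def load_csv(conn, text):
    """Parse CSV text and insert its rows into the obs table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS obs "
        "(site TEXT, year INTEGER, month INTEGER, precip_mm REAL)"
    )
    rows = [
        (r["site"], int(r["year"]), int(r["month"]), float(r["precip_mm"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", rows)
    return len(rows)

def pivot_totals(conn):
    """Total precipitation per (site, year), like a simple pivot table."""
    cur = conn.execute(
        "SELECT site, year, SUM(precip_mm) FROM obs "
        "GROUP BY site, year ORDER BY site, year"
    )
    return {(site, year): total for site, year, total in cur}

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the database
loaded = load_csv(conn, CSV_TEXT)
totals = pivot_totals(conn)
```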
8
Data Cleaning and Versioning
[Diagram labels: BWC SQL Server Database; Excel spreadsheet of current data; Investigator updated spreadsheet]
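One way to picture this loop is an append-only store in which each investigator update creates a new version while earlier versions stay readable. The slide does not say how BWC stores versions, so the class below is a hypothetical sketch; `VersionedStore`, its methods, and the sample keys and values are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class VersionedStore:
    """Append-only store: key -> list of (version, value) entries."""
    history: dict = field(default_factory=dict)
    version: int = 0

    def snapshot(self, version):
        """Return the data as it looked at the given version."""
        out = {}
        for key, entries in self.history.items():
            for ver, value in entries:  # entries are in increasing version order
                if ver <= version:
                    out[key] = value
        return out

    def current(self):
        """Return the latest data, e.g. what an export would contain."""
        return self.snapshot(self.version)

    def commit(self, updates):
        """Store only the cells that changed, as one new version."""
        now = self.current()
        changed = {k: v for k, v in updates.items() if now.get(k) != v}
        if not changed:
            return self.version  # nothing new, so no new version
        self.version += 1
        for key, value in changed.items():
            self.history.setdefault(key, []).append((self.version, value))
        return self.version

# Usage: initial load, then an investigator replaces a missing-value marker.
store = VersionedStore()
v1 = store.commit({("SiteA", "2004-07-01"): 1.5, ("SiteA", "2004-07-02"): -9999.0})
v2 = store.commit({("SiteA", "2004-07-02"): 2.0})
```

Keeping old versions readable means an analysis can state which version of the data it used.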
9
Analysis Services Data Cube
- An organized view of the data
- A multi-dimensional view into the data
- Can integrate multiple data sources
- Define measures and dimensions
  - Measure: a value you want to be able to plot
  - Dimension: an axis you want to use both to select data and as a plot axis
- Calculations: define new measures
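The measure/dimension vocabulary can be illustrated without Analysis Services. In the sketch below, a measure is aggregated over whichever dimensions are chosen, and a calculation derives a new measure from existing aggregates. The fact rows, field names, and the `aggregate` helper are assumptions invented for the example; the numbers are not real observations.

```python
from collections import defaultdict

# Made-up fact rows: "site" and "season" are dimensions, "precip_mm" is a measure.
facts = [
    {"site": "A", "season": "summer", "precip_mm": 10.0},
    {"site": "A", "season": "winter", "precip_mm": 90.0},
    {"site": "B", "season": "summer", "precip_mm": 40.0},
    {"site": "B", "season": "winter", "precip_mm": 60.0},
]

def aggregate(rows, dimensions, measure, agg=sum):
    """Group rows by the chosen dimensions and aggregate one measure."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        groups[key].append(row[measure])
    return {key: agg(values) for key, values in groups.items()}

# Slice the same measure along different dimensions.
by_site = aggregate(facts, ["site"], "precip_mm")
by_site_season = aggregate(facts, ["site", "season"], "precip_mm")

# Calculation: a new measure, the summer share of total precipitation per site.
summer_fraction = {
    site: by_site_season[(site, "summer")] / total
    for (site,), total in by_site.items()
}
```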
12
Precipitation trends and totals
Summer precipitation:
- Tonzi and Vaira: ~2% of total
- Metolius: ~24% of total
- Walker Branch: ~40% of total
*Plot created by Gretchen Miller of UC Berkeley
13
Other applications
*Plot created by Gretchen Miller of UC Berkeley
14
Observations by latitude
*Plot created by Gretchen Miller of UC Berkeley
15
Observations by ecosystem type
*Plot created by Gretchen Miller of UC Berkeley
16
Some Lessons Learned so Far
- Consistent data naming and units are critical to easy ingest of large amounts of data
- Commercial tools do not necessarily provide all the right analysis capabilities directly
- Scaling capabilities of the tools are not yet clear
- We will need tools to aid in notification of PIs
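The first lesson, on naming and units, can be made concrete with a small normalization step at ingest time. The alias table, canonical names, and `normalize` function below are hypothetical; they show the general idea of mapping site-specific column names and units onto one convention and flagging anything unrecognized, not the scheme BWC actually used.

```python
# Hypothetical alias table: incoming column name -> (canonical name, converter).
CANONICAL = {
    "ta": ("air_temp_c", lambda v: v),
    "tair_k": ("air_temp_c", lambda v: v - 273.15),  # kelvin to Celsius
    "precip": ("precip_mm", lambda v: v),
    "precip_in": ("precip_mm", lambda v: v * 25.4),  # inches to millimetres
}

def normalize(row):
    """Map a row's columns to canonical names and units.

    Returns (normalized_row, unknown_columns) so that unrecognized
    names are reported instead of being silently dropped.
    """
    out = {}
    unknown = []
    for name, value in row.items():
        key = name.strip().lower()
        if key in CANONICAL:
            canonical_name, convert = CANONICAL[key]
            out[canonical_name] = convert(value)
        else:
            unknown.append(name)
    return out, unknown

# Usage with a made-up input row.
clean, unknown = normalize({"Precip_in": 2.0, "TAIR_K": 273.15, "Mystery": 1})
```

Returning the unknown columns separately gives a natural hook for asking the data provider what a column means.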
17
Portal Deployment
- Behind the portal is a collection of databases and data cubes
- Distribution for ease of use
  - Only see the data of interest
  - Private data remains stable
- Distribution for scaling
  - Smaller queries on smaller databases take fewer resources
  - Larger databases and cubes can be replicated across machines
- Batch-job-like infrastructure for managing very long-running queries
19
Acknowledgements
- Science Team
  - Dennis Baldocchi
  - Bev Law
  - Gretchen Miller
- Cyberinfrastructure
  - Matt Rodriguez
  - Monte Goode
- Microsoft
  - Tony Hey
  - Nolan Li
- Oak Ridge National Lab CDIAC personnel
- Berkeley Water Center
  - Yoram Rubin
  - Susan Hubbard
20
URLs and Connection Coordinates
- Web Site
  - http://esd.lbl.gov/BWC
- Blog
  - http://dsd.lbl.gov/BWC/amfluxblog
- E-mail
  - bwc-tci@lists.berkeley.edu
21
http://esd.lbl.gov/BWC/