1
CLIMDB/HYDRODB: A Web Harvester and Data Warehouse Approach to Building a Cross-site Climate and Hydrology Database
Don Henshaw
Andrews Experimental Forest LTER
Pacific Northwest Research Station, USDA Forest Service
Oregon State University, Corvallis, Oregon
EAP ILTER, 9 July 2007
2
Long-Term Research
Long-Term Ecological Research (LTER)
U.S. Forest Service Research (USFS)
International LTER (ILTER)
The 20-year review of LTER challenges the network to enhance its inter-site research activities by adopting a strategy for network-based research
USFS Research intends to increase collaboration and develop network products for existing Experimental Forests/Watersheds
International LTER collaboration
3
LTER Network Information System (NIS) Goals
Allow and enhance discovery and access of information
Foster development of network-level datasets
Commit to populate climate and hydrology datasets
Facilitate synthesis and integration of information
Improve discovery, access, aggregation, and visualization of data across multiple sites
Overcome diversity in individual site information systems
Promote collaboration and community-building
Develop partnerships between Information Technology and science communities
Source: LTER Network Information System Advisory Committee, 2003, 2004
4
ClimDB/HydroDB Objectives
Improve access to long-term collections of climatic and hydrological data
  – Long-Term Ecological Research (LTER): 26 NSF-funded sites; Taiwan Ecological Research Network (ILTER)
  – U.S. Forest Service Research: Experimental Forests / Experimental Watersheds
Use web technologies to facilitate synthetic research
  – Maintain a current data warehouse of multi-site, multi-network, long-term data
  – Provide single-portal accessibility with a query interface to download and graphically display data
6
ClimDB/HydroDB Harvester / Database / Query Interface
[Architecture diagram, three columns]
Data Providers: LTER data, USFS data, USGS data, NWS data; exchange format; harvest triggers (on-demand, auto-harvest, HTTP POST)
Central Site: Harvester; Data Warehouse (centralized ClimDB/HydroDB database); Query interface
Public User: Web page (display, graph, download); Web services (SOAP, WSDL); Access tools (site-specific data mining)
7
ClimDB/HydroDB Components
Data Providers
Individual sites
  – Participating sites manage and control original source data within their local information systems
  – Sites provide data as a static or dynamically created file
Exchange format
  – Consistent, comma-delimited file
  – Flexibility allows contributors to add or remove parameters from harvest files at any time
  – Attributes and units standardized and based on a controlled vocabulary
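The exchange-format idea on this slide (a comma-delimited file whose parameter columns can come and go, with names checked against a controlled vocabulary) can be illustrated with a minimal Python sketch. The column names, units, and vocabulary entries below are hypothetical placeholders, not the actual ClimDB/HydroDB exchange format.

```python
import csv
import io

# Hypothetical controlled vocabulary (parameter name -> unit); the real
# ClimDB/HydroDB vocabulary is not given in the slides.
CONTROLLED_VOCABULARY = {
    "AIRT_DAILY_MEAN": "deg C",
    "PREC_DAILY_TOTAL": "mm",
    "DSCH_DAILY_MEAN": "L/s",
}
# Hypothetical key columns that identify one daily record.
KEY_COLUMNS = ("SITE", "STATION", "DATE")


def read_exchange_file(text):
    """Parse comma-delimited text into (records, unknown_columns).

    Any subset of vocabulary parameters may appear as columns, so a site
    can add or remove parameters without changing the parser.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    params = [c for c in header if c not in KEY_COLUMNS]
    unknown = [c for c in params if c not in CONTROLLED_VOCABULARY]
    records = []
    for row in reader:
        for name in params:
            if name in unknown:
                continue  # reported to the caller, never loaded
            value = (row.get(name) or "").strip()
            if value == "":
                continue  # missing value for this day
            records.append({
                "site": row["SITE"],
                "station": row["STATION"],
                "date": row["DATE"],
                "parameter": name,
                "value": float(value),
                "unit": CONTROLLED_VOCABULARY[name],
            })
    return records, unknown


SAMPLE = (
    "SITE,STATION,DATE,AIRT_DAILY_MEAN,PREC_DAILY_TOTAL\n"
    "AND,PRIMET,20070709,18.4,0.0\n"
    "AND,PRIMET,20070710,19.1,\n"
)
records, unknown = read_exchange_file(SAMPLE)
```

Storing one record per (station, date, parameter) is a design choice of this sketch: it keeps the loader indifferent to which parameter columns a given site supplies.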
8
“Harvester” Mechanics
[Diagram labels: Exchange Data; Harvest; Harvester; Transform, QA, Load; Data Warehouse (Centralized ClimDB/HydroDB Database); Feedback; Error logs; Site contact; ClimHy Admin]
The Quality Assurance (QA)/Feedback System:
Provides feedback through error and warning messages directly to the client’s browser and through e-mail
Specifies errors in exchange format
Identifies data limit and integrity errors
Enables sites to quickly modify their datasets for successful re-harvesting
9
Participant Web Page http://www.fsl.orst.edu/climhy/harvest/harvest.htm
10
Duplicate records found
11
Illegal number of data fields in exchange file
12
Failed min<mean<max relationship
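The three error messages shown on the preceding slides (duplicate records, wrong number of fields, failed min/mean/max relationship) suggest the kind of checks the harvester runs. The sketch below is an illustration only: the six-field row layout is hypothetical, the "Non-numeric value" message is invented for the example, and the comparison is written as non-strict (min <= mean <= max), which is an assumption about how ties are treated.

```python
# Hypothetical row layout: site, station, date, daily min, daily mean, daily max.
EXPECTED_FIELDS = 6


def qa_check(rows):
    """Return a list of (line_number, message) for rows that fail QA."""
    errors = []
    seen = set()
    for line_no, row in enumerate(rows, start=1):
        if len(row) != EXPECTED_FIELDS:
            errors.append((line_no, "Illegal number of data fields in exchange file"))
            continue
        site, station, date, v_min, v_mean, v_max = row
        key = (site, station, date)
        if key in seen:
            errors.append((line_no, "Duplicate records found"))
        seen.add(key)
        try:
            low, mean, high = float(v_min), float(v_mean), float(v_max)
        except ValueError:
            # Message invented for this sketch; not shown in the slides.
            errors.append((line_no, "Non-numeric value"))
            continue
        if not (low <= mean <= high):
            errors.append((line_no, "Failed min<mean<max relationship"))
    return errors


rows = [
    ["AND", "PRIMET", "20070709", "10.0", "15.0", "20.0"],
    ["AND", "PRIMET", "20070709", "10.0", "15.0", "20.0"],  # duplicate key
    ["AND", "PRIMET", "20070710", "10.0", "15.0"],          # missing a field
    ["AND", "PRIMET", "20070711", "10.0", "25.0", "20.0"],  # mean above max
]
errors = qa_check(rows)
```

Returning line numbers with each message mirrors the stated goal of letting sites fix their files quickly and re-harvest.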
13
Georgia Coastal Ecosystems LTER Collaboration
Allows HydroDB to directly harvest U.S. Geological Survey (USGS) gauging station data from the USGS website
Captures near-real-time provisional USGS hydrological data on a weekly schedule
Harvests USGS historical data and replaces the provisional data with final archived versions on a regular basis
Generalized as a service to the broader LTER community
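The provisional-then-final replacement described on this slide can be sketched as a keyed merge. This is a minimal illustration under stated assumptions: the record fields and the station code are hypothetical, and the rule that a later provisional value never overwrites a final one is an assumption, not something the slides state.

```python
def merge_usgs(warehouse, incoming):
    """Merge harvested USGS records into the warehouse dict in place.

    warehouse maps (station, date) -> {"value": float, "status": str},
    where status is "provisional" or "final" (hypothetical labels).
    """
    for rec in incoming:
        key = (rec["station"], rec["date"])
        existing = warehouse.get(key)
        if (existing is not None
                and existing["status"] == "final"
                and rec["status"] == "provisional"):
            continue  # assumed rule: keep the archived final value
        warehouse[key] = {"value": rec["value"], "status": rec["status"]}
    return warehouse


warehouse = {}
# Weekly harvest of near-real-time provisional data.
merge_usgs(warehouse, [
    {"station": "STATION_A", "date": "2007-07-01", "value": 12.0, "status": "provisional"},
])
# Later harvest of the final archived record for the same day.
merge_usgs(warehouse, [
    {"station": "STATION_A", "date": "2007-07-01", "value": 11.5, "status": "final"},
])
# A repeated provisional harvest does not undo the final value.
merge_usgs(warehouse, [
    {"station": "STATION_A", "date": "2007-07-01", "value": 12.0, "status": "provisional"},
])
```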
14
Georgia Coastal Ecosystems LTER Collaboration: USGS Data Harvesting Service
15
Centralized Architecture
Source data is loaded into a global schema in the relational database (RDBMS)
  – Calculates and loads aggregated data (monthly, annual)
The global schema for the data warehouse is based on highly normalized tables within the database
  – Allows simple structures to house all site data and metadata
  – Is extensible to additional daily measurements
The central data warehouse is persistent, and participants can continually update and replace harvested data
[Diagram labels: Harvester; Transform, QA, Load; Data Warehouse (Centralized ClimDB/HydroDB Database)]
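The step "calculates and loads aggregated data (monthly, annual)" can be illustrated with a small Python sketch that groups daily values by month and by year. The choice of reducer (mean for something like air temperature, total for something like precipitation) is an assumption of this example, and any completeness rule the real warehouse applies before computing an aggregate is not shown in the slides and is omitted here.

```python
from collections import defaultdict
from datetime import date


def aggregate(daily, how="mean"):
    """Aggregate (date, value) pairs into monthly and annual summaries.

    how="mean" averages the daily values; how="total" sums them.
    Returns (monthly, annual) dicts keyed by (year, month) and by year.
    """
    if how not in ("mean", "total"):
        raise ValueError("how must be 'mean' or 'total'")
    monthly_groups = defaultdict(list)
    annual_groups = defaultdict(list)
    for day, value in daily:
        monthly_groups[(day.year, day.month)].append(value)
        annual_groups[day.year].append(value)

    def reduce_values(values):
        total = sum(values)
        return total / len(values) if how == "mean" else total

    monthly = {key: reduce_values(vals) for key, vals in monthly_groups.items()}
    annual = {key: reduce_values(vals) for key, vals in annual_groups.items()}
    return monthly, annual


# Made-up daily precipitation values (mm) for illustration.
daily_precip = [
    (date(2007, 7, 1), 2.0),
    (date(2007, 7, 2), 3.0),
    (date(2007, 8, 1), 5.0),
]
monthly_total, annual_total = aggregate(daily_precip, how="total")
monthly_mean, annual_mean = aggregate(daily_precip, how="mean")
```

Because the warehouse is persistent and sites can re-harvest at any time, aggregates like these would need to be recomputed whenever the underlying daily values are replaced.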
16
Data Access Page
Public Access Web Page: http://www.fsl.orst.edu/climhy
17
Data Acquisition: Download or Graphical Display
19
Metadata Reports
Detailed information for the general site, all stations, and all parameters. Metadata descriptions can also be downloaded as a PDF.
20
Georgia Coastal Ecosystems (GCE) MATLAB Data Toolbox
GUI dialog for retrieving ClimDB/HydroDB data
From Wade Sheldon (GCE)
21
Imported data set (GCE tools editor window)
Imported data set (GCE data grid view, with flagged values displayed)
ClimDB/HydroDB metadata template
23
Web Services demonstration of ClimDB
1. The Climate Data Harvester (client) sends an XML request for data to the Web Service
2. One web service queries an LTER Site database, another exports the data, and another issues an e-mail to the LTER Site data manager detailing success of the query
3. LTER Site climate data are returned to the harvester in XML
4. The Web Service wraps the entire centralized ClimDB database
5. The centralized ClimDB database at Andrews LTER is populated
[Diagram labels: Client (Harvester); ClimDB Web Services (Data Service, Metadata Service, Notification Service); LTER Site ClimDB; Centralized ClimDB Database (Andrews LTER); XML ClimDB Config File; EML Resource Description Wizard; SOAP, WSDL, UDDI]
Diagram modified from Longjiang Ding, SDSC
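The request/response exchange in this demonstration (the harvester sends an XML request, and site data come back in XML) can be illustrated with Python's standard XML library. This sketch is not the actual ClimDB web service contract: every element and attribute name below is hypothetical, and the SOAP envelope, WSDL, and UDDI layers named on the slide are left out entirely.

```python
import xml.etree.ElementTree as ET


def build_request(site, station, start, end):
    """Build a hypothetical XML data request as a string."""
    root = ET.Element("ClimDBRequest")  # hypothetical element names throughout
    ET.SubElement(root, "Site").text = site
    ET.SubElement(root, "Station").text = station
    ET.SubElement(root, "StartDate").text = start
    ET.SubElement(root, "EndDate").text = end
    return ET.tostring(root, encoding="unicode")


def parse_response(xml_text):
    """Extract (date, parameter, value) tuples from a hypothetical response."""
    root = ET.fromstring(xml_text)
    return [
        (obs.get("date"), obs.get("parameter"), float(obs.text))
        for obs in root.iter("Observation")
    ]


request_xml = build_request("AND", "PRIMET", "2007-07-01", "2007-07-02")

# Made-up response standing in for what a site's data service might return.
SAMPLE_RESPONSE = """
<ClimDBResponse site="AND" station="PRIMET">
  <Observation date="2007-07-01" parameter="AIRT_DAILY_MEAN">18.4</Observation>
  <Observation date="2007-07-02" parameter="AIRT_DAILY_MEAN">19.1</Observation>
</ClimDBResponse>
"""
observations = parse_response(SAMPLE_RESPONSE)
```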
24
Site Contribution
Site contributions have increased dramatically in the past year for air temperature, precipitation, and stream discharge.
Participation includes:
40 total sites
  – 24 LTER sites + 2 International LTER sites
  – 22 USFS sites
  – 11 sites include USGS gauging stations
281 total measurement stations
  – 143 meteorological, 138 stream gauging (59 USGS)
21 daily measurement parameters
7,200,000 daily values
25
Data Warehouse Content
Parameter (daily values)     % by measured parameter
Stream Discharge             29
Precipitation                26
Air Temperature              22
Relative Humidity            4
Global Radiation             4
Soil Temperature             3
Resultant Wind Speed         3
Resultant Wind Direction     2
Other                        7
[Legend: Primary emphasis; Secondary emphasis]
Observations: Coverage of precipitation, discharge, and air temperature data is strong across sites. We encourage sites to contribute relative humidity, soil temperature, wind speed and direction, and global radiation in datasets.
26
ClimDB/HydroDB Web Access Summary
Type of download     Displays   Plots   Files   Total
Downloads            12%        50%     38%     6700
Values based on data from February 2003 - August 2006
Visitors to the ClimDB/HydroDB web interface are increasing and currently average 30 sessions per day.
27
Status of Type of Use
Type of Use          % of Total
Research             40% (60% general research)
Education            35% (90% students)
Testing/Exploring    25% (50% testing by participants)
Values based on data plots from January - March 2004
28
Keys to Successful Implementation
Scientific interest
  – Scientist/modeler demand for current and comparable data
  – Need for synthetic data products
Organizational
  – Commitment to building network databases
  – Information management (15% of LTER site budget)
  – Data access / release policies
  – Data collection standards
  – Planning meetings included climatologists, information managers, data users/modelers, and field technicians
Incentives
  – Financial incentives
  – Value-added products returned to participating sites: easy access, aggregated data, graphical displays, QA checks
Host site commitment
  – Leadership, time, resources
29
Conclusions
The ClimDB/HydroDB approach is an effective bridge technology between older, more rigid data distribution models and modern service-oriented architectures
Establishes software and service development at the central node, permitting rapid adaptation to changing needs
Maintains low overhead, flexibility, and technological neutrality for data providers
Additional "concentrator nodes" and middleware services can be deployed easily and rapidly within this model to improve efficiency and build bridges to other federated databases
30
Acknowledgement
Funding was provided by:
  – National Science Foundation (NSF) Long-Term Ecological Research (LTER) supplemental funding
  – U.S. Forest Service Research and Development
  – Forest Health Monitoring (FHM) program
  – Pacific Northwest Research Station (PNW)
…to the Andrews Forest LTER at Oregon State University for ClimDB/HydroDB development
…to individual sites for the preparation of climate and hydrology data
Visit ClimDB/HydroDB at http://www.fsl.orst.edu/climhy
31
User Guide Section 1.3: Required Steps for Site Participation
To participate, the site will:
  – Provide the research areas, meteorological stations, gauged watersheds, and gauging station names and code names
  – Restructure local site data into a standardized daily exchange format
  – Use the online metadata forms to provide metadata for the overall research area, for every weather station, and for every parameter
  – Harvest data