EAP ILTER 9 July 2007 Don Henshaw Andrews Experimental Forest LTER Pacific Northwest Research Station, USFS Forest Service Oregon State University Corvallis,

CLIMDB/HYDRODB: A Web Harvester and Data Warehouse Approach to Building a Cross-site Climate and Hydrology Database
Don Henshaw
Andrews Experimental Forest LTER
Pacific Northwest Research Station, USFS Forest Service
Oregon State University, Corvallis, Oregon
EAP ILTER, 9 July 2007

Long-Term Research
Long-Term Ecological Research (LTER), U.S. Forest Service Research (USFS), International LTER (ILTER)
• The 20-year review of LTER challenges the network to enhance its inter-site research activities by adopting a strategy for network-based research
• USFS Research intends to increase collaboration and develop network products for existing Experimental Forests/Watersheds
• International LTER collaboration

LTER Network Information System (NIS) Goals
• Allow and enhance discovery and access of information
  – Foster development of network-level datasets
  – Commit to populate climate and hydrology datasets
• Facilitate synthesis and integration of information
  – Improve discovery, access, aggregation, and visualization of data across multiple sites
  – Overcome diversity in individual site information systems
• Promote collaboration and community-building
  – Develop partnerships between Information Technology and science communities
Source: LTER Network Information System Advisory Committee, 2003, 2004

ClimDB/HydroDB Objectives
• Improve access to long-term collections of climatic and hydrological data
  – Long-Term Ecological Research (LTER)
    • 26 NSF-funded sites
    • Taiwan Ecological Research Network (ILTER)
  – U.S. Forest Service Research
    • Experimental Forests / Experimental Watersheds
• Use web technologies to facilitate synthetic research
  – Maintain a current data warehouse of multi-site, multi-network, long-term data
  – Provide single portal accessibility with a query interface to download and graphically display data

ClimDB/HydroDB Harvester / Database / Query Interface
[Diagram with three columns: Data Providers, Central Site, Public User]
• Data Providers: LTER data, USFS data, USGS data, NWS data; data exchange format; harvest triggered on demand or automatically by HTTP POST
• Central Site: harvester; data warehouse (centralized ClimDB/HydroDB database); query interface
• Public User: web page (display, graph, download); web services (SOAP, WSDL); access tools (site-specific data mining)

ClimDB/HydroDB Components: Data Providers
• Individual sites
  – Participating sites manage and control original source data within their local information systems
  – Sites provide data as a static or dynamically created file
• Exchange format
  – Consistent, comma-delimited file
  – Flexibility allows contributors to add or remove parameters from harvest files at any time
  – Attributes and units standardized and based on a controlled vocabulary
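The exchange format described above could be read roughly as follows. This is only a minimal sketch: the column names (SITE, STATION, DATE), the parameter names, and the function read_exchange_file are hypothetical and are not taken from the actual ClimDB/HydroDB specification. It illustrates how a comma-delimited file can let a site add or drop parameter columns while still being checked against a controlled vocabulary.

```python
import csv
import io

# Hypothetical controlled vocabulary; the real ClimDB/HydroDB names and units may differ.
CONTROLLED_VOCABULARY = {
    "DAILY_AIRTEMP_MEAN_C",
    "DAILY_PRECIP_TOTAL_MM",
    "DAILY_DISCHARGE_MEAN_LPS",
}
# Hypothetical key columns that every exchange file is assumed to carry.
KEY_COLUMNS = ["SITE", "STATION", "DATE"]


def read_exchange_file(text):
    """Parse comma-delimited exchange text into (site, station, date, parameter, value) records."""
    reader = csv.DictReader(io.StringIO(text))
    # Any column that is not a key column is treated as a parameter,
    # so a site can add or remove parameters without changing the parser.
    parameters = [c for c in reader.fieldnames if c not in KEY_COLUMNS]
    unknown = [p for p in parameters if p not in CONTROLLED_VOCABULARY]
    if unknown:
        raise ValueError(f"parameters not in controlled vocabulary: {unknown}")
    records = []
    for row in reader:
        for name in parameters:
            # Skip empty or missing cells instead of inventing a value.
            if row[name] not in (None, ""):
                records.append(
                    (row["SITE"], row["STATION"], row["DATE"], name, float(row[name]))
                )
    return records
```

Treating every non-key column as a parameter is one simple way to get the flexibility the slide mentions; the vocabulary check keeps that flexibility from admitting unrecognized attribute names.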

“Harvester” Mechanics
[Diagram: exchange data is harvested; the harvester transforms, quality-checks, and loads it into the data warehouse (centralized ClimDB/HydroDB database); feedback and error logs go to the site contact and the ClimHy administrator]
The Quality Assurance (QA)/Feedback System:
• Provides feedback through error and warning messages directly to the client’s browser and through email
• Specifies errors in exchange format
• Identifies data limit and integrity errors
• Enables sites to quickly modify their datasets for successful re-harvesting

Participant Web Page

Duplicate records found

Illegal number of data fields in exchange file

Failed min<mean<max relationship
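The three harvest errors named on these slides can be illustrated with a small check routine. This is a sketch under assumptions, not the harvester's actual code: the row layout (site, station, date, min, mean, max) and the function name qa_check are hypothetical, and the comparison is written as min <= mean <= max on the assumption that equal values are acceptable.

```python
def qa_check(rows, expected_fields=6):
    """Return (line_number, message) pairs for rows that fail QA.

    Each row is a list of strings in a hypothetical layout:
    site, station, date, min, mean, max.
    """
    errors = []
    seen = set()
    for line_number, row in enumerate(rows, start=1):
        # Format error: wrong number of comma-delimited fields.
        if len(row) != expected_fields:
            errors.append((line_number, "Illegal number of data fields in exchange file"))
            continue
        # Integrity error: the same site/station/date appears more than once.
        key = tuple(row[:3])
        if key in seen:
            errors.append((line_number, "Duplicate records found"))
        seen.add(key)
        # Limit error: the daily mean must lie between the daily minimum and maximum.
        low, mean, high = (float(value) for value in row[3:6])
        if not (low <= mean <= high):
            errors.append((line_number, "Failed min<mean<max relationship"))
    return errors
```

Returning every problem with its line number, rather than stopping at the first one, matches the stated goal of letting sites correct a file quickly and re-harvest.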

Georgia Coastal Ecosystems LTER Collaboration
• Allows HydroDB to directly harvest U.S. Geological Survey (USGS) gauging station data from the USGS web pages
• Captures near real-time provisional USGS hydrological data on a weekly schedule
• Harvests USGS historical data and replaces the provisional data with final archived versions on a regular basis
• Generalized as a service to the broader LTER community

Georgia Coastal Ecosystems LTER Collaboration: USGS Data Harvesting Service
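The replacement of provisional USGS values by final archived ones can be sketched as a keyed merge. Everything here is an assumption made for illustration: the function merge_usgs, the (station, date) key, and the "provisional"/"final" status labels are hypothetical, and the rule that a provisional value never overwrites a final one is a design choice of this sketch rather than documented behavior of the service.

```python
def merge_usgs(warehouse, harvested):
    """Merge newly harvested USGS values into the existing warehouse values.

    Both arguments map (station, date) -> (value, status), where status is
    "provisional" or "final" (hypothetical labels).
    """
    merged = dict(warehouse)
    for key, (value, status) in harvested.items():
        current = merged.get(key)
        # Assumption: once a final archived value is stored, a later
        # provisional value for the same day must not replace it.
        if current is not None and current[1] == "final" and status == "provisional":
            continue
        merged[key] = (value, status)
    return merged
```

With this rule a weekly provisional harvest and a periodic historical harvest can run in any order and still converge on the final archived values.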

Centralized Architecture
• Source data is loaded into a global schema in the relational database (RDBMS)
  – Calculates and loads aggregated data (monthly, annual)
• The global schema for the data warehouse is based on highly normalized tables within the database
  – Allows simple structures to house all site data and metadata
  – Is extensible to additional daily measurements
• The central data warehouse is persistent, and participants can continually update and replace harvested data
[Diagram: harvester (transform, QA, load) feeding the data warehouse (centralized ClimDB/HydroDB database)]
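A normalized global schema of the kind described above might look like the following. This is a sketch using SQLite for convenience: the table and column names are hypothetical and the slides do not say which RDBMS or schema ClimDB/HydroDB actually uses. One narrow daily_value table holds every measurement, so adding a new daily parameter means adding a row to the parameter table, not a new column.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory warehouse with a hypothetical normalized schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE station (
            station_id INTEGER PRIMARY KEY,
            site_code TEXT NOT NULL,
            station_name TEXT NOT NULL
        );
        CREATE TABLE parameter (
            parameter_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            units TEXT NOT NULL
        );
        CREATE TABLE daily_value (
            station_id INTEGER NOT NULL REFERENCES station(station_id),
            parameter_id INTEGER NOT NULL REFERENCES parameter(parameter_id),
            obs_date TEXT NOT NULL,  -- ISO date, e.g. 2006-01-31
            value REAL NOT NULL,
            PRIMARY KEY (station_id, parameter_id, obs_date)
        );
        """
    )
    return conn


def load_daily(conn, station_id, parameter_id, obs_date, value):
    """Insert a daily value; a re-harvest of the same day replaces the old value."""
    conn.execute(
        "INSERT OR REPLACE INTO daily_value VALUES (?, ?, ?, ?)",
        (station_id, parameter_id, obs_date, value),
    )


def monthly_means(conn, station_id, parameter_id):
    """Aggregate daily values to monthly means, returned as (YYYY-MM, mean) rows."""
    cursor = conn.execute(
        """
        SELECT substr(obs_date, 1, 7), AVG(value)
        FROM daily_value
        WHERE station_id = ? AND parameter_id = ?
        GROUP BY substr(obs_date, 1, 7)
        ORDER BY 1
        """,
        (station_id, parameter_id),
    )
    return cursor.fetchall()
```

The composite primary key together with INSERT OR REPLACE is one way to let participants update and replace harvested data; a mean is used here for the monthly aggregate, although a parameter such as precipitation would more naturally be summed.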

Data Access Page: Public Access Web Page

Data Acquisition: Download or Graphical Display

Metadata Reports
Reports detail information for the general site, all stations, and all parameters. Metadata descriptions can also be downloaded as a PDF.

Georgia Coastal Ecosystems (GCE) Matlab Data Toolbox
GUI dialog for retrieving ClimDB/HydroDB data
From Wade Sheldon (GCE)

[Screenshots]
• Imported data set (GCE tools editor window)
• Imported data set (GCE data grid view, with flagged values displayed)
• ClimDB/HydroDB metadata template

Web Services Demonstration of ClimDB
[Diagram modified from Longjiang Ding, SDSC. Components: Client (Harvester); ClimDB Web Services (Data Service, Metadata Service, Notification Service) using SOAP, WSDL, UDDI; LTER Site ClimDB; Centralized ClimDB Database (Andrews LTER); XML ClimDB Config File; EML Resource Description Wizard]
1. Climate Data Harvester sends XML request for data to Web Service
2. One web service queries an LTER Site database, another exports the data, and another issues an email to the LTER Site data manager detailing success of query
3. LTER Site climate data are returned to harvester in XML
4. Web Service wraps the entire centralized ClimDB database
5. The centralized ClimDB database at Andrews LTER is populated

Site Contribution
Site contributions have increased dramatically in the past year for air temperature, precipitation, and stream discharge.
Participation includes:
• 40 total sites
  – 24 LTER sites + 2 International LTER sites
  – 22 USFS sites
  – 11 sites include USGS gauging stations
• 281 total measurement stations
  – 143 meteorological, 138 stream gauging (59 USGS)
• 21 daily measurement parameters
• 7,200,000 daily values

Data Warehouse Content

Parameter (daily values)     % by measured parameter
Stream Discharge             29
Precipitation                26
Air Temperature              22
Relative Humidity             4
Global Radiation              4
Soil Temperature              3
Resultant Wind Speed          3
Resultant Wind Direction      2
Other                         7
[Chart legend: Primary emphasis, Secondary emphasis]

Observations: Coverage of precipitation, discharge, and air temperature data is strong across sites. We encourage sites to contribute relative humidity, soil temperature, wind speed and direction, and global radiation in datasets.

ClimDB/HydroDB Web Access Summary

Type of download     Displays   Plots   Files   Total
% of downloads       12%        50%     38%     6700 downloads

Values based on data from February - August 2006.
Visitors to the ClimDB/HydroDB web interface are increasing and currently average 30 sessions per day.

Status of Type of Use

Type of Use          % of Total
Research             40% (60% general research)
Education            35% (90% students)
Testing/Exploring    25% (50% testing by participants)

Values based on data plots from January - March 2004.

Keys to Successful Implementation
• Scientific interest
  – Scientist/modeler demand for current and comparable data
  – Need for synthetic data products
• Organizational
  – Commitment to building network databases
  – Information management (15% LTER site budget)
  – Data access / release policies
  – Data collection standards
  – Planning meetings included Climatologists, Information Managers, Data Users/Modelers, and Field Technician participation
• Incentives
  – Financial incentives
  – Value-added products returned to participating sites
    • Easy access, aggregated data, graphical displays, QA checks
• Host site commitment
  – Leadership, time, resources

Conclusions
• The ClimDB/HydroDB approach is an effective bridge technology between older, more rigid data distribution models and modern service-oriented architectures
• Establishes software and service development at the central node, permitting rapid adaptation to changing needs
• Maintains low overhead, flexibility, and technological neutrality for data providers
• Additional "concentrator nodes" and middleware services can also be deployed very easily and rapidly within this model to improve efficiency and build bridges to other federated databases

Acknowledgement
Funding was provided by:
• National Science Foundation (NSF)
  – Long-Term Ecological Research (LTER) supplemental funding
• U.S. Forest Service Research and Development
  – Forest Health Monitoring (FHM) program
  – Pacific Northwest Research Station (PNW)
…to the Andrews Forest LTER at Oregon State University for ClimDB/HydroDB development
…to individual sites for the preparation of climate and hydrology data
Visit ClimDB/HydroDB at

User Guide Section 1.3: Required Steps for Site Participation
To participate, the site will:
• Provide the research areas, meteorological stations, gauged watersheds, and gauging station names and code names
• Restructure local site data into a standardized daily exchange format
• Use the online metadata forms to provide metadata for the overall research area, for every weather station, and for every parameter
• Harvest data