Supersite Relational Database Project: ‘Federated PM Data Warehouse’

Supersite Relational Database Project: ‘Federated PM Data Warehouse’
Rudolf Husar, PI, Center for Air Pollution Impact and Trend Analysis (CAPITA), Washington University, St. Louis, MO
Nov 27, 2001: Proposal presentation to the Supersite Program
A sub-project of the St. Louis Midwest Supersite Project, Jay Turner, PI

Purpose of the Project:
–Design, populate and maintain a Supersite Relational Database System
–Include monitoring data from Supersites and auxiliary projects
–Facilitate cross-Supersite (regional or comparative) data analyses
–Support analyses by a variety of research groups

EPA Specs of the Supersite Relational Data System (from the RFP)
Data Input:
–Data input electronically
–Modest amount of metadata on sites, instruments, data sources/versions, contacts, etc.
–Simple data structures and formats, and convenient submission procedures
Data Storage and Maintenance:
–Data storage in relational database(s), possibly distributed over multiple servers
–A catalog of data holdings and request logs
–Quarterly updates of Supersite data
Data Access:
–User-friendly web access by multiple authorized users
–Data query by parameter, method, location, date/time, or other metadata
–Multiple data output formats (ASCII, spreadsheet, and others such as dbf and XML)
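
The RFP asks for the same query result in several output formats. The following is a minimal sketch of that requirement. It assumes that query results arrive as a list of uniform records with the hypothetical fields site_id, param_id, obs_datetime and value; the RFP does not name any fields. Python's standard csv and xml.etree.ElementTree modules then emit the rows both as comma-delimited ASCII, which spreadsheets open directly, and as XML. The element names and the sample site code STL01 are illustrative assumptions. dbf output would need a third-party library and is omitted.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical uniform record layout; the RFP does not prescribe field names.
FIELDS = ["site_id", "param_id", "obs_datetime", "value"]


def to_csv(records):
    """Serialize records as comma-delimited ASCII text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records):
    """Serialize records as a flat XML document, one <obs> element per record."""
    root = ET.Element("dataset")
    for rec in records:
        obs = ET.SubElement(root, "obs")
        for name in FIELDS:
            ET.SubElement(obs, name).text = str(rec[name])
    return ET.tostring(root, encoding="unicode")


# Illustrative record only; the site code and value are made up.
rows = [{"site_id": "STL01", "param_id": "PM25",
         "obs_datetime": "2001-07-01T00:00", "value": 18.4}]
print(to_csv(rows))
print(to_xml(rows))
```

Both writers work from one record list, so adding another format means adding one more serializer rather than another query path.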

General Approach to SRDS Design
Based on consensus, adopt a uniform relational data structure suitable for regional and cross-Supersite data integration and analysis. We propose a star schema with spatial, temporal, parameter and method dimensions. The ‘original’ data are to be maintained by the respective providers or custodians (Supersites, CIRA, CAPITA, etc.). We propose creating flexible ‘adapters’ and web-submission forms for transferring data subsets into the uniformly formatted ‘Federated Data Warehouse’. Data users would access the data warehouse manually or through software. We propose data access through the modern ‘web services’ protocol, which is suitable for building data viewers and processors (filtering, aggregation and fusion).

The ‘Minimal’ Star Schema
–The minimal Site table includes SiteID, Name and Lat/Lon.
–The minimal Parameter table consists of ParameterID, Description and Unit.
–The time dimension table is usually skipped, since time is self-describing.
–The minimal Fact (Data) table consists of Obs_Value and the three dimension codes Obs_DateTime, Site_ID and Parameter_ID.
For integrative, cross-Supersite analysis with data queries by time, location and parameter, the database must have time, location and parameter as dimensions. This minimal (multidimensional) schema has been used for encoding datasets in CAPITA's data exploration software, Voyager, for the past 22 years. Most Supersite data, however, require a more elaborate schema to capture their content fully.
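
The following is a minimal sketch of the star schema described above. It uses SQLite through Python's sqlite3 module as a stand-in for the project's SQL servers. Table and column names follow the slide. The column types, the composite primary key, and the sample site, parameter and values are assumptions for illustration. The final query selects along all three dimensions at once: one parameter, a latitude/longitude box, and a time range.

```python
import sqlite3

# Minimal star schema: two dimension tables plus a fact table.
# Time has no table of its own; it is stored directly in each fact row.
SCHEMA = """
CREATE TABLE Site (
    SiteID TEXT PRIMARY KEY,
    Name   TEXT,
    Lat    REAL,
    Lon    REAL
);
CREATE TABLE Parameter (
    ParameterID TEXT PRIMARY KEY,
    Description TEXT,
    Unit        TEXT
);
CREATE TABLE Fact (
    Obs_DateTime TEXT NOT NULL,
    Site_ID      TEXT NOT NULL REFERENCES Site(SiteID),
    Parameter_ID TEXT NOT NULL REFERENCES Parameter(ParameterID),
    Obs_Value    REAL,
    PRIMARY KEY (Obs_DateTime, Site_ID, Parameter_ID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative rows only; IDs, coordinates and values are made up.
conn.execute("INSERT INTO Site VALUES ('STL01', 'St. Louis (example)', 38.6, -90.2)")
conn.execute("INSERT INTO Parameter VALUES ('PM25', 'Fine particulate mass', 'ug/m3')")
conn.executemany(
    "INSERT INTO Fact VALUES (?, 'STL01', 'PM25', ?)",
    [("2001-07-01", 18.4), ("2001-07-02", 22.1), ("2001-07-03", 15.0)],
)

# Query by parameter, location (bounding box) and time range.
QUERY = """
SELECT f.Obs_DateTime, s.Name, f.Obs_Value, p.Unit
FROM Fact f
JOIN Site s      ON s.SiteID = f.Site_ID
JOIN Parameter p ON p.ParameterID = f.Parameter_ID
WHERE p.ParameterID = ?
  AND s.Lat BETWEEN ? AND ?
  AND s.Lon BETWEEN ? AND ?
  AND f.Obs_DateTime BETWEEN ? AND ?
ORDER BY f.Obs_DateTime
"""
result = conn.execute(
    QUERY, ("PM25", 38.0, 39.0, -91.0, -90.0, "2001-07-01", "2001-07-02")
).fetchall()
for row in result:
    print(row)
```

Storing dates as ISO strings lets the time-range filter work by plain string comparison. This is one concrete reading of "time is self-describing".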

From Heterogeneous to Homogeneous Schema
Individual Supersite SQL databases have varied designs, usually following the ‘snowflake’ pattern (see Database Schema Design for the Federated Data Warehouse). Though their schemata are complicated, these SQL servers can be queried along spatial, temporal and parameter dimensions. However, the query that retrieves the same information depends on the schema of the particular database. One way to homogenize the distributed data is to access all of it through a Data Adapter. The adapter uses only a subset of the tables/fields of any particular database (shown in red in the figure). The proposed uniform (abstract) schema for the extracted data is the Minimal Star Schema, possibly expanded. The final form of the uniform schema will be arrived at by consensus.
(Figure: Extraction of homogeneous data from heterogeneous sources. A Data Adapter maps the subset used from each native schema into the Fact Table of the Uniform Schema.)
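
The next sketch illustrates the adapter idea with two hypothetical native databases that differ in design, both held in in-memory SQLite. Source A is normalized into Sample and Analysis tables. Source B is a "wide" table with one column per parameter. Each adapter runs a source-specific extraction query that touches only the needed subset of fields and returns rows already in the uniform fact layout (Obs_DateTime, Site_ID, Parameter_ID, Obs_Value). The table names, columns and the SQLDataAdapter class are illustrative assumptions, not the project's code.

```python
import sqlite3

# Uniform (abstract) fact row: (Obs_DateTime, Site_ID, Parameter_ID, Obs_Value).


def make_source_a():
    """Hypothetical normalized source: samples and analyses in separate tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Sample (SampleID INTEGER PRIMARY KEY, SiteCode TEXT, StartTime TEXT);
        CREATE TABLE Analysis (SampleID INTEGER, Species TEXT, Conc REAL, Flag TEXT);
        INSERT INTO Sample VALUES (1, 'STL01', '2001-07-01');
        INSERT INTO Analysis VALUES (1, 'PM25', 18.4, 'V0'), (1, 'SO4', 6.2, 'V0');
    """)
    return db


def make_source_b():
    """Hypothetical 'wide' source: one column per parameter."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Daily (Site TEXT, Day TEXT, PM25 REAL, SO4 REAL);
        INSERT INTO Daily VALUES ('EST02', '2001-07-01', 25.0, 9.1);
    """)
    return db


class SQLDataAdapter:
    """Runs a source-specific extraction query that returns uniform fact rows."""

    def __init__(self, db, extraction_sql):
        self.db = db
        self.extraction_sql = extraction_sql

    def fetch_uniform(self):
        return [tuple(row) for row in self.db.execute(self.extraction_sql)]


adapter_a = SQLDataAdapter(make_source_a(), """
    SELECT s.StartTime, s.SiteCode, a.Species, a.Conc
    FROM Sample s JOIN Analysis a ON a.SampleID = s.SampleID
""")  # only a subset is used: the Flag column is not extracted

adapter_b = SQLDataAdapter(make_source_b(), """
    SELECT Day, Site, 'PM25', PM25 FROM Daily
    UNION ALL
    SELECT Day, Site, 'SO4', SO4 FROM Daily
""")  # unpivots the wide table into one row per parameter

uniform = sorted(adapter_a.fetch_uniform() + adapter_b.fetch_uniform())
for row in uniform:
    print(row)
```

All schema-specific knowledge lives in the extraction SQL. Code downstream of fetch_uniform never sees the native designs.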

Live Demo of the Data Warehouse Prototype
Uniform data query regardless of the native schema: query by parameter, location, time and method.
Online data are currently accessible from the CIRA (IMPROVE) and CAPITA SQL servers.
The hidden DataAdapter:
–accepts the uniform query
–accesses the data server
–transforms the original data into uniform data
–delivers uniform DataSets
A rudimentary viewer displays the data in a table for browsing.
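
The four DataAdapter steps listed above can be sketched as a small template class. This is an assumed design, not the prototype's actual code. UniformQuery, DataAdapter, matches and the legacy text layout are hypothetical. The CustomDataAdapter reads a non-SQL text source whose column names and units (ng/m3) differ from the uniform form. It renames the fields and converts the units to ug/m3 before delivering only the records inside the query's parameter/space/time/method cube.

```python
import csv
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class UniformQuery:
    """Hypothetical uniform query: one parameter in a space/time box, optional method."""
    parameter: str
    lat_range: Tuple[float, float]
    lon_range: Tuple[float, float]
    time_range: Tuple[str, str]  # inclusive ISO date strings
    method: Optional[str] = None


def matches(rec, q):
    """True if a uniform record lies inside the query's parameter/space/time/method cube."""
    return (rec["parameter"] == q.parameter
            and q.lat_range[0] <= rec["lat"] <= q.lat_range[1]
            and q.lon_range[0] <= rec["lon"] <= q.lon_range[1]
            and q.time_range[0] <= rec["datetime"] <= q.time_range[1]
            and (q.method is None or rec["method"] == q.method))


class DataAdapter(ABC):
    """Accept the uniform query, access the server, transform, deliver."""

    def run(self, q: UniformQuery):
        native_rows = self.access(q)
        return [rec for rec in self.transform(native_rows) if matches(rec, q)]

    @abstractmethod
    def access(self, q: UniformQuery):
        """Fetch native rows from the provider's server."""

    @abstractmethod
    def transform(self, native_rows):
        """Yield uniform records built from native rows."""


# Hypothetical legacy text source: its own column names, concentrations in ng/m3.
LEGACY_TEXT = """STN,LAT,LON,DATE,SPECIES,METHOD,CONC_NGM3
STL01,38.6,-90.2,2001-07-01,SO4,IC,6200
STL01,38.6,-90.2,2001-07-02,SO4,IC,7100
EST02,38.6,-90.1,2001-07-01,NO3,IC,1500
"""


class CustomDataAdapter(DataAdapter):
    def __init__(self, text):
        self.text = text  # stands in for a non-SQL provider server

    def access(self, q):
        # A real adapter would push filters to the server; this one reads everything.
        return list(csv.DictReader(io.StringIO(self.text)))

    def transform(self, native_rows):
        for r in native_rows:
            yield {
                "datetime": r["DATE"],
                "site_id": r["STN"],
                "lat": float(r["LAT"]),
                "lon": float(r["LON"]),
                "parameter": r["SPECIES"],
                "method": r["METHOD"],
                "value": float(r["CONC_NGM3"]) / 1000.0,  # ng/m3 -> ug/m3
            }


query = UniformQuery("SO4", (38.0, 39.0), (-91.0, -90.0), ("2001-07-01", "2001-07-01"))
result = CustomDataAdapter(LEGACY_TEXT).run(query)
print(result)
```

Because run() is defined once in the base class, every adapter honors the same query semantics. Only access() and transform() vary per provider.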

Federated Data Warehouse Architecture
Three-tier architecture consisting of:
–Provider Tier: back-end servers containing heterogeneous data, maintained by the federation members
–Proxy Tier: retrieves designated Provider data and homogenizes it into common, uniform Datasets
–User Tier: accesses the Proxy Server and uses the uniform data for presentation, integration or processing
The Provider servers interact only with the Proxy Server, in accordance with the Federation Contract:
–The contract sets the rules of interaction (accessible data subsets, types of queries)
–Strong server security measures are enforced, e.g. through Secure Sockets Layer (SSL)
The data User interacts only with the generic Proxy Server, through a flexible Web Services interface:
–Generic data queries, applicable to all data in the Warehouse (e.g. a space, time, parameter data sub-cube)
–The data query is addressed to a Web Service provided by the Proxy Server of the Federation
–Uniformly formatted, self-describing data packages are handed to the user for presentation or further processing
(Figure: Federated Data Warehouse. Provider Tier: member servers SQLServer1, SQLServer2 and LegacyServer holding heterogeneous data, behind a firewall and the Federation Contract. Proxy Tier: SQLDataAdapter1, SQLDataAdapter2 and CustomDataAdapter on the Proxy Server, performing data homogenization. User Tier: data access and use (presentation, processing, integration) through a Web Service with uniform query and data.)
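
The Proxy Tier's role can be sketched as a fan-out-and-merge step. This is a minimal sketch, assuming each member server is reached through an adapter function that already returns uniform records. It also assumes the Federation Contract can be reduced to a set of parameters each member agrees to expose; the real contract terms are not specified here. Member, ProxyServer, make_adapter and the provider names are hypothetical. The proxy returns a self-describing package that carries the query, the field list and the merged records tagged with their provider.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Record = Dict[str, object]
Adapter = Callable[[str, str, str], List[Record]]  # (parameter, start, end) -> uniform records


@dataclass
class Member:
    """A provider-tier server as the proxy sees it: an adapter plus contract terms."""
    name: str
    adapter: Adapter
    allowed_parameters: Set[str] = field(default_factory=set)


class ProxyServer:
    def __init__(self, members: List[Member]):
        self.members = members

    def query(self, parameter: str, start: str, end: str) -> Dict[str, object]:
        """Fan a uniform query out to members whose contract exposes the parameter,
        then merge the results into one self-describing package."""
        records: List[Record] = []
        for m in self.members:
            if parameter not in m.allowed_parameters:
                continue  # not covered by this member's (hypothetical) contract terms
            for rec in m.adapter(parameter, start, end):
                records.append({**rec, "provider": m.name})
        records.sort(key=lambda r: (r["datetime"], r["site_id"]))
        return {
            "query": {"parameter": parameter, "start": start, "end": end},
            "fields": ["datetime", "site_id", "parameter", "value", "provider"],
            "records": records,
        }


def make_adapter(rows):
    """Build a stand-in adapter over (datetime, site_id, parameter, value) tuples."""
    def adapter(parameter, start, end):
        return [{"datetime": d, "site_id": s, "parameter": p, "value": v}
                for d, s, p, v in rows if p == parameter and start <= d <= end]
    return adapter


provider_a = make_adapter([("2001-07-01", "SITE_A", "SO4", 6.2),
                           ("2001-07-05", "SITE_A", "SO4", 4.0),
                           ("2001-07-01", "SITE_A", "PM25", 15.0)])
provider_b = make_adapter([("2001-07-01", "STL01", "SO4", 7.5),
                           ("2001-07-02", "STL01", "PM25", 20.0)])

proxy = ProxyServer([
    Member("ProviderA", provider_a, {"SO4", "PM25"}),
    Member("ProviderB", provider_b, {"PM25"}),  # SO4 is not shared under this contract
])
so4_package = proxy.query("SO4", "2001-07-01", "2001-07-03")
pm25_package = proxy.query("PM25", "2001-07-01", "2001-07-03")
print(so4_package["records"])
print(pm25_package["records"])
```

In this sketch, ProviderB's SO4 data never leave its server, because the contract check runs before the adapter is called.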

Data Re-Use and Synergy
Data producers maintain their own workspace and resources (data, reports, comments). Part of these resources is shared by creating a common virtual resource. Web-based integration of the resources can span several dimensions:
–Spatial scale: local to global data sharing
–Data content: combination of data generated internally and externally
The main benefits of sharing are data re-use, data complementing and synergy. The goal of the system is for the benefits of sharing to outweigh the costs.
(Figure: each user contributes a shared part of their resources (data, knowledge, tools, methods) to virtual shared resources, arranged along content and local-to-global axes.)

Data Entry to the Supersite Relational Data System
Data enter the system by three routes:
1. Automatic translation and transfer of NARSTO-archived DES data to SQL
2. Web submission of relational tables by the data producers/custodians
3. Batch transfer of large auxiliary datasets to the SQL server
(Figure: data flow diagram with the labels EPA Supersite Data, Coordinated Supersite Relational Tables, EOSDIS Data Archive, NARSTO ORNL DES Data Ingest, DES-SQL Transformer, Supersite SQL Server, DataAdapter, Supersite & other SQL Data, Data Query, Table Output and Direct Web Data Input.)

Data Warehouse Features
–As much as possible, data should reside in their respective home environments. ‘Uprooted’ data in decoupled databases tend to decay, i.e. they cannot easily be updated, maintained or enriched.
–Data from the providers will be transferred to the ‘Federated Data Warehouse’ through (1) online DataAdapters, (2) manual web submission and (3) semi-automated transfer from the NARSTO archive.
–Retrieval of uniform data from the data warehouse facilitates integration and comparison along the key dimensions (space, time, parameter, method).
–The open-architecture data warehouse (see Web Services) promotes the building of further value chains: data viewers, data integration programs, automatic report generators, etc.

Data Preparation Procedures
–Data gathering, QA/QC and standard formatting are to be done by the individual projects.
–The data exchange standards, data ingest and archives are handled by ORNL and NASA.
–Data ingest is to be automated, aided by tools and procedures supplied by this project:
 –NARSTO DES-SQL translator
 –Web submission tools and procedures
 –Metadata catalog and I/O facilities
–Data submission and access will be password-protected, as set by the community.
–Submitted data will be held in a temporary buffer space and, after verification, transferred to the shared SQL database.
–Data access, submissions, etc. will be automatically recorded and summarized in human-readable reports.
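
The buffer-then-verify flow and the automatic activity report can be sketched with three tables in SQLite: Staging for the temporary buffer, Fact for the shared database, and AccessLog for the activity record. This is a minimal sketch under assumptions. The acceptance rule VALID (value present and non-negative, keys non-empty), the table layouts, and the user names are hypothetical. Password protection is not modeled.

```python
import sqlite3
from collections import Counter
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Staging (Obs_DateTime TEXT, Site_ID TEXT, Parameter_ID TEXT,
                          Obs_Value REAL, Submitter TEXT);
    CREATE TABLE Fact (Obs_DateTime TEXT, Site_ID TEXT, Parameter_ID TEXT, Obs_Value REAL);
    CREATE TABLE AccessLog (At TEXT, UserName TEXT, Action TEXT, Detail TEXT);
""")

# Illustrative acceptance rule: value present and non-negative, keys non-empty.
VALID = "Obs_Value IS NOT NULL AND Obs_Value >= 0 AND Site_ID <> '' AND Parameter_ID <> ''"


def log(user, action, detail=""):
    """Record every access or submission with a UTC timestamp."""
    db.execute("INSERT INTO AccessLog VALUES (?, ?, ?, ?)",
               (datetime.now(timezone.utc).isoformat(), user, action, detail))


def submit(user, rows):
    """Submitted rows land in the Staging buffer, not in the shared Fact table."""
    db.executemany("INSERT INTO Staging VALUES (?, ?, ?, ?, ?)",
                   [(*row, user) for row in rows])
    log(user, "submit", f"{len(rows)} rows")


def verify_and_transfer(reviewer):
    """Move rows passing VALID from Staging to Fact; failing rows stay in Staging."""
    n = db.execute(f"SELECT COUNT(*) FROM Staging WHERE {VALID}").fetchone()[0]
    db.execute("INSERT INTO Fact SELECT Obs_DateTime, Site_ID, Parameter_ID, Obs_Value "
               f"FROM Staging WHERE {VALID}")
    db.execute(f"DELETE FROM Staging WHERE {VALID}")
    db.commit()
    log(reviewer, "transfer", f"{n} rows")
    return n


def summary():
    """Human-readable report: one line per (user, action) with a count."""
    counts = Counter(db.execute("SELECT UserName, Action FROM AccessLog"))
    return "\n".join(f"{user}: {n} x {action}" for (user, action), n in sorted(counts.items()))


submit("provider_a", [("2001-07-01", "STL01", "PM25", 18.4),
                      ("2001-07-02", "STL01", "PM25", -999.0),  # e.g. a missing-value code
                      ("2001-07-03", "STL01", "PM25", None)])
moved = verify_and_transfer("curator")
log("user_x", "query", "PM25, July 2001")
print(moved)
print(summary())
```

Rows that fail verification stay in the buffer for follow-up with the submitter instead of being silently dropped.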

Data Catalog
The Data Catalog will be maintained through the CAPITA website. Limited metadata (based on user consensus) will be recorded for each dataset. User feedback on individual datasets will be collected through comments/feedback pages. An example is the data section of the St. Louis Supersite website.

Related CAPITA Projects
–EPA Network Design Project (~$150K/yr – April 2003): Development of novel quantitative methods of network optimization. The network performance evaluation uses the complete PM FRM dataset in AIRS, which will be available for input into the SRDS.
–EPA WebVis Project (~$120K/yr – April 2003): Delivery of current visibility data to the public through a web-based system. Surface meteorological data have been transferred into the SQL database since March 2001 and will be available to the SRDS.
–NSF Collaboration Support Project (~$140K/yr – Dec 2004): Continuing development of interactive websites for community discussions and web-based data sharing (directly applicable to this project).
–NOAA ASOS Analysis Project (~$50K/yr – May 2002): Evaluation of the potential utility of the ASOS visibility sensors (900 sites, one-minute resolution) as a PM surrogate. Data are now available for April to October 2001 and can be incorporated into the Supersite Relational Data System.
–St. Louis Supersite Project website (~$50K/yr – Dec 2003): The CAPITA group maintains the St. Louis Supersite website and some auxiliary data. The website will also be used for this project.

Federated Data Warehouse Applications
–Distributed data of multiple types (spatial, temporal, text)
–Data are rendered by linked Data Views (map, time, text)
–The Broker handles the views, connections, data access and cursor
(Figure: Data sources (satellite, vector GIS data, XDim data, OLAP cube, SQL table, text data, web page) reached through XML Web Services, HTTP Services and OpenGIS Services. The Data Warehouse Tier contains the Data View Manager, Connection Manager, Data Access Manager and Cursor-Query Manager. The Data View & Process Tier shows a layered map, time chart, scatter chart and text/table views linked by a shared cursor.)
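
Linked views driven by one cursor can be sketched with a simple observer arrangement. This is a hypothetical sketch, not the Voyager/DataFed implementation. The Broker here collapses the slide's Data View Manager and Cursor-Query Manager roles into one class. fetch stands in for the Data Access Manager. MapView, TimeView and TextView are illustrative names. Each view turns the shared cursor (time, site, parameter) into its own slice query, so a single cursor move updates the map, the time series and the text readout together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cursor:
    """The shared selection: one time, one site, one parameter."""
    time: str
    site_id: str
    parameter: str


# Illustrative uniform records; values are made up.
DATA = [
    {"time": "2001-07-01", "site_id": "STL01", "parameter": "PM25", "value": 18.4},
    {"time": "2001-07-02", "site_id": "STL01", "parameter": "PM25", "value": 22.1},
    {"time": "2001-07-01", "site_id": "EST02", "parameter": "PM25", "value": 25.0},
]


def fetch(query):
    """Stand-in for data access: records whose fields equal every key in the query."""
    return [r for r in DATA if all(r[k] == v for k, v in query.items())]


class MapView:
    """Spatial slice: all sites at the cursor's time."""
    def __init__(self):
        self.shown = []

    def query_for(self, c):
        return {"time": c.time, "parameter": c.parameter}

    def render(self, recs):
        self.shown = sorted((r["site_id"], r["value"]) for r in recs)


class TimeView:
    """Temporal slice: the time series at the cursor's site."""
    def __init__(self):
        self.shown = []

    def query_for(self, c):
        return {"site_id": c.site_id, "parameter": c.parameter}

    def render(self, recs):
        self.shown = sorted((r["time"], r["value"]) for r in recs)


class TextView:
    """The single value under the cursor."""
    def __init__(self):
        self.shown = []

    def query_for(self, c):
        return {"time": c.time, "site_id": c.site_id, "parameter": c.parameter}

    def render(self, recs):
        self.shown = [r["value"] for r in recs]


class Broker:
    """Keeps registered views in sync with one shared cursor."""
    def __init__(self, fetch_fn):
        self.fetch = fetch_fn
        self.views = []

    def register(self, view):
        self.views.append(view)

    def move_cursor(self, cursor):
        # Each view derives its own query from the cursor, so one move updates all views.
        for view in self.views:
            view.render(self.fetch(view.query_for(cursor)))


broker = Broker(fetch)
map_view, time_view, text_view = MapView(), TimeView(), TextView()
for v in (map_view, time_view, text_view):
    broker.register(v)

broker.move_cursor(Cursor("2001-07-01", "STL01", "PM25"))
print(map_view.shown, time_view.shown, text_view.shown)
```

Clicking a new date on the time chart would amount to calling move_cursor with a new Cursor. The map and text views then refresh without knowing about each other.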

Supersite Relational Data System: Schedule
–The first four months are devoted to designing the relational database, the associated data transformers and I/O; the design will be submitted to the Supersite workgroups for comment.
–After six months, Supersite data preparation and entry begin.
–In Years 2 and 3, datasets will be updated by providers as needed, and the system will be accessible to the data user community.
(Chart: three-year timeline covering RDMS design, feedback, implementation & test, SQL Supersite data entry, auxiliary data entry, other coordinated data entry, and Supersite, coordinated and auxiliary data updates.)

Personnel, Management and Facilities
Personnel: PI, R. B. Husar (10%); Kari Hoijarvi (25%). Software experience at CAPITA, Microsoft and Visala. 20% of the project budget ($12k/yr) goes to consultants: J. Watson, DRI; W. White and J. Turner, WU. Collaborators (CAPITA associates): B. Schichtel, CIRA; S. Falke, EPA; M. Bezic, Microsoft.
Management: This project is a sub-project of the St. Louis-Midwest Supersite project, Dr. Jay Turner, PI. Special focus is on supporting large-scale, crosscutting and integrative analysis. This project will leverage the other CAPITA data-sharing projects.
Resources and Facilities: CAPITA has the ‘largest known privately held collection of air quality, meteorological and emission data’, available in uniform Voyager format and extensively accessed from the CAPITA website. The computing and communication facilities include two servers, ten workstations and laptops, connected internally and externally through high-speed networks. Software development tools include Visual Studio, part of the .NET distributed development environment.