Voyager Data Services Services for Finding, Exploring and Presenting Distributed Environmental Data Outline Prepared by Voyager Interest Group on Environmental.

Presentation transcript:

Voyager Data Services Services for Finding, Exploring and Presenting Distributed Environmental Data Outline Prepared by Voyager Interest Group on Environmental Data Integration June 2002 Coordinated by CAPITA Supported by NSF, EPA and NOAA…..

Processes of the Information Value Chain (after Taylor, 1975)
Stages: Data, Information, Informing Knowledge, Productive Knowledge, Action
Organizing: Grouping, Classifying, Formatting, Displaying
Analyzing: Separating, Evaluating, Interpreting, Synthesizing
Judging: Options, Quality, Advantages, Disadvantages
Deciding: Matching goals, Compromising, Bargaining, Deciding
Examples: CIRA VIEWS, Langley IDEA, AQ Manager, WG Summary Rpt

Data Acquisition and Usage Value Chain
Diagram labels: Monitor, Store, Data (1, 2, … n); Virtual Int. Data; Integrated Data (1, 2, … n)

Data Flow from Providers to AQ Management
Environmental data arise from diverse sources, each having specific history, driving forces, delivery formats etc.
Data analysis, i.e. turning the raw data into ‘actionable’ knowledge, requires combining data from these sources
Primary Data (Diverse Providers):
Aerosol Data: EPA Networks, IMPROVE, Visibility, Satellites ……
Meteorological: Surface Obs., Upper Air, Satellites, Forecasts ….
Emissions Data: National Inv., Local Invent., Satellite Fire…
Derived Data: Filtered, Aggregated, Fused
AQ Managm. Reports (‘Knowledge’ from Data): Status and Trends, AQ Compliance, Exposure Assess., Network Assess., Tracking Progress

Problems of Current Client-Server Architecture
Client-Server Architecture: User accesses, processes and integrates data by customized tools
User Tasks: Find servers; Reformat data; Custom process
Problems in combining data:
–Legacy data systems cannot be altered to support integration
–Data systems use different terms, or different meanings for similar terms
–Some data sources have neither a schema nor formal access methods
User Carries the Burden: In current client-server architectures the user carries much of the burden of integration.
Diagram labels: Server, User

Mediator-Based Integration Architecture (Wiederhold, 1992)
Software agents (mediators) can perform many of the data integration chores
Heterogeneous sources are wrapped by software that translates the local language to the global language
Mediators (web services) obtain data from wrappers or other mediators and pass it on …
Wrappers remove the technical heterogeneity, while mediators resolve the logical heterogeneity
The job of the mediator is to provide an answer to a user query (Ullman, 1997)
In the database theory sense, a mediator is a view of the data found in one or more sources
Diagram (Busse et al., 1999): Wrapper, Service, User, Query, View
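
The wrapper and mediator roles described on this slide can be illustrated with a minimal Python sketch. Everything named here (the Record schema, the two sample sources, the query parameters) is a hypothetical stand-in chosen for illustration and is not taken from the Voyager or DataFed code: each wrapper translates one source's local format into a shared global schema, and the mediator answers a user query as a view over all wrapped sources.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    """One observation in the (hypothetical) global schema."""
    lon: float
    lat: float
    time: str
    parameter: str
    value: float


# A wrapper hides one source's local format and returns global-schema records.
Wrapper = Callable[[], List[Record]]


def csv_wrapper() -> List[Record]:
    # Local format: "lon,lat,date,pm25" text rows (made-up sample data).
    rows = ["-90.2,38.6,2002-06-01,12.5", "-87.6,41.9,2002-06-01,18.0"]
    records = []
    for row in rows:
        lon, lat, date, pm25 = row.split(",")
        records.append(Record(float(lon), float(lat), date, "PM25", float(pm25)))
    return records


def dict_wrapper() -> List[Record]:
    # Local format: dicts with different field names for the same quantity.
    rows = [{"x": -118.2, "y": 34.1, "when": "2002-06-01", "fine_mass": 22.0}]
    return [Record(r["x"], r["y"], r["when"], "PM25", r["fine_mass"]) for r in rows]


def mediator(wrappers: Iterable[Wrapper], parameter: str, min_value: float) -> List[Record]:
    """Answer a user query as a view over every wrapped source."""
    return [
        record
        for wrapper in wrappers
        for record in wrapper()
        if record.parameter == parameter and record.value >= min_value
    ]


result = mediator([csv_wrapper, dict_wrapper], "PM25", 15.0)
```

In this sketch the wrappers deal with the technical differences (text rows versus dictionaries), while the mediator only ever sees the global schema.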

Value-Added Processing in Service Oriented Architecture
Data, services and users are distributed throughout the network
Users compose data processing chains from reusable services
Intermediate data are also exposed for possible further use
Chains can be linked to form compound value-adding processes
User Tasks: Find data and services; Compose service chains; Expose output
User Carries less Burden: In service-oriented peer-to-peer architecture, the user is aided by software ‘agents’
Diagrams: Peer-to-peer network representation (Data, Service, Catalog, User; Control, Data); Service chain representation (Chain 1, Chain 2, Chain 3)
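
As a rough illustration of composing chains from reusable services, the Python sketch below treats each service as a plain function and keeps every intermediate result so that it stays available for further use. The service names (drop_missing, clip_negative, mean) and the sample values are assumptions made up for this example; real services would be remote web services rather than local functions.

```python
from typing import Callable, List, Optional, Sequence

Values = List[Optional[float]]
Service = Callable[[Values], Values]


def drop_missing(values: Values) -> Values:
    """Hypothetical service: remove missing observations."""
    return [v for v in values if v is not None]


def clip_negative(values: Values) -> Values:
    """Hypothetical service: replace negative readings with zero."""
    return [max(v, 0.0) for v in values]


def mean(values: Values) -> Values:
    """Hypothetical service: reduce the series to its average."""
    return [sum(values) / len(values)] if values else []


def compose_chain(services: Sequence[Service]) -> Callable[[Values], List[Values]]:
    """Build a chain that runs the services in order and exposes every intermediate."""
    def run(data: Values) -> List[Values]:
        exposed = [data]
        for service in services:
            exposed.append(service(exposed[-1]))
        return exposed
    return run


# Two chains linked into one compound value-adding process.
cleaning_chain = [drop_missing, clip_negative]
summary_chain = [mean]
compound = compose_chain(cleaning_chain + summary_chain)

exposed = compound([1.0, None, -2.0, 3.0])
```

Here `exposed[-1]` holds the final output, and the earlier entries are the intermediate data that another user could pick up as input to a different chain.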

Peer-to-Peer Computing (P2P) or Service-Oriented Architecture (??)
P2P is an architectural principle based on decentralization and resource sharing
It is replacing the current paradigm of client-server computing
Data are passed directly from peer to peer, rather than through ‘data integrator’ servers
The contribution of the database community to P2P is the introduction of schemas
DATAFED uses the multidimensional data cube as the global schema
P2P was made popular by Napster; now it is an intense academic research topic
Music files were all uniformly formatted MP3 files; science data need more descriptors
Hence the challenge of scientific data sharing, even in a P2P environment

4D Geo-Environmental Data Cube (X, Y, Z, T)
Environmental data represent measurements in the physical world, which has space (X, Y, Z) and time (T) as its dimensions.
The specific inherent dimensions for geo-environmental data are: Longitude X, Latitude Y, Elevation Z and DateTime T.
Additional dimensions may include parameters, pollutant source, etc.
Finding, sharing and integrating geo-environmental data requires that the data be coded in this multidimensional data space
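
One simple way to picture data coded in this (X, Y, Z, T) space is a sparse cube keyed by coordinates. The Python sketch below is an assumption-laden illustration, not the DataFed data model: the class name DataCube, its methods, and the sample PM2.5 values are all hypothetical, and the extra "parameter" dimension is carried as a plain attribute.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (Longitude X, Latitude Y, Elevation Z, DateTime T)
Coord = Tuple[float, float, float, str]


@dataclass
class DataCube:
    """Sparse 4D cube: only coordinates that have a measurement are stored."""
    parameter: str  # an additional dimension, kept here as a simple label
    cells: Dict[Coord, float] = field(default_factory=dict)

    def put(self, lon: float, lat: float, elev: float, time: str, value: float) -> None:
        self.cells[(lon, lat, elev, time)] = value

    def get(self, lon: float, lat: float, elev: float, time: str) -> Optional[float]:
        # Returns None where the cube holds no measurement.
        return self.cells.get((lon, lat, elev, time))


cube = DataCube(parameter="PM25")
cube.put(-90.2, 38.6, 0.0, "2002-06-01", 12.5)
cube.put(-87.6, 41.9, 0.0, "2002-06-01", 18.0)

measured = cube.get(-90.2, 38.6, 0.0, "2002-06-01")
missing = cube.get(0.0, 0.0, 0.0, "2002-06-01")
```

A sparse mapping is used because monitoring data rarely fill the whole cube; a dense array would be an equally valid choice for gridded model or satellite data.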

4+ Dimensional Geo-Environmental Data Space

Environmental Data: Multi-Dimensional
Data can be distributed over 1, 2, … n dimensions
1 Dimensional, e.g. Time dimension
2 Dimensional, e.g. Location & Time
3 Dimensional, e.g. Location, Time & Parameter
Views are generally orthogonal slices through multidimensional data cubes
Spatial and temporal slices through the data space are most common
Diagram labels: Data Granule; axes i, j, k; Data Space; View 1; View 2

Common Views (slices) through 4D Data Space
XY Map: Z, T fixed
Vertical Profile: X, Y, T fixed
Time Chart: X, Y, Z fixed
Vertical Cross sect: Y, T fixed
Vertical Cross sect: X, T fixed
Vertical Profile Trend: X, Y fixed
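
The views listed on this slide all follow one pattern: fix some dimensions and let the rest vary. A minimal Python sketch of that idea is shown below; the function name slice_view, the dictionary-based cube, and the sample values are assumptions for illustration only.

```python
from typing import Dict, Tuple

DIMS = ("X", "Y", "Z", "T")  # Longitude, Latitude, Elevation, DateTime


def slice_view(cells: Dict[Tuple, float], **fixed) -> Dict[Tuple, float]:
    """Keep cells that match the fixed dimensions; key the result by the free ones."""
    free = [d for d in DIMS if d not in fixed]
    view = {}
    for coord, value in cells.items():
        point = dict(zip(DIMS, coord))
        if all(point[d] == v for d, v in fixed.items()):
            view[tuple(point[d] for d in free)] = value
    return view


# Made-up sample cube keyed by (X, Y, Z, T).
cells = {
    (-90.2, 38.6, 0.0, "2002-06-01"): 12.5,
    (-90.2, 38.6, 0.0, "2002-06-02"): 14.0,
    (-87.6, 41.9, 0.0, "2002-06-01"): 18.0,
    (-90.2, 38.6, 500.0, "2002-06-01"): 9.0,
}

xy_map = slice_view(cells, Z=0.0, T="2002-06-01")                      # XY Map
time_chart = slice_view(cells, X=-90.2, Y=38.6, Z=0.0)                 # Time Chart
vertical_profile = slice_view(cells, X=-90.2, Y=38.6, T="2002-06-01")  # Vertical Profile
```

The same function covers the cross sections and the profile trend by fixing two dimensions instead of three.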

Generic Data Flow in DATAFED Services
Physical Data: Resides in autonomous servers; accessed by view-specific wrappers which yield abstract data ‘slices’
Abstract Data: Abstract data slices are requested by viewers; uniform data are delivered by wrapper services
View Data: Processed and portrayed data are delivered to the user for building multi-layer views and web applications
Diagram labels: Physical Data, Wrapper (Data Access), Abstract Data, Process Data, Processed Data, Portrayal/Render, Portrayed Data, DataView 1, DataView 2, DataView 3

Service Chaining in Spatio-Temporal Data Browser
Diagram labels: Data Sources (Vector GIS Data, XDim Data, SQL Table, OLAP, Satellite Images); Maintain Data; Wrapper; Mediator; Catalog; Homogenizer; Find/Bind Data; nDim Data Cube; OGC-Compliant GIS Services (Spatial Slice, Spatial Portrayal, Spatial Overlay); Time-Series Services (Time Slice, Time Portrayal, Time Overlay); Portray; Overlay; Render; Client Browser; Cursor/Controller

Overlay of multiple Datasets
Each DataCube may have 0-n dimensions
Each dimension is assigned a view
In a view, the number of layers is the number of datasets
If a DataCube does not have data for a view, a Null Layer is assigned
Diagram labels: 3D DataCube, 2D DataCube, DataView 1, DataView 2, DataView 3, Layer 1, Layer 2, Null Layer
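
The null-layer rule can be sketched in a few lines of Python. This is only one possible reading of the slide: it assumes that each view needs a particular set of dimensions, and the view names, dataset names and dimension sets below are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional

# Assumption: each view needs a particular set of dimensions to be drawn.
VIEWS: Dict[str, FrozenSet[str]] = {
    "MapView": frozenset({"X", "Y"}),
    "TimeView": frozenset({"T"}),
}


def build_layers(
    cubes: Dict[str, FrozenSet[str]],
    views: Dict[str, FrozenSet[str]],
) -> Dict[str, List[Optional[str]]]:
    """Give every view one layer per dataset; None stands for a Null Layer."""
    return {
        view: [name if needed <= dims else None for name, dims in cubes.items()]
        for view, needed in views.items()
    }


# Hypothetical datasets: one with a time dimension, one without.
cubes = {
    "surface_pm": frozenset({"X", "Y", "T"}),
    "emission_inventory": frozenset({"X", "Y"}),
}

layers = build_layers(cubes, VIEWS)
```

Every view ends up with as many layers as there are datasets, so layer positions line up across views even when a dataset has nothing to show in one of them.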

Overlay of multiple Datasets
Each DataCube may have 0-n dimensions
Each dimension is assigned a view
In a view, the number of layers is the number of datasets
If a DataCube does not have data for a view, a Null Layer is assigned
Diagram labels: 3D DataCube, DataView 1, DataView 2, DataView 3, Data Access Connections, Data Render Connections

Explain Wrappers

An Application Program: Voyager Data Browser
The web program consists of a stable core and adaptive input/output layers
The core maintains the state and executes the data selection, access and render services
The adaptive, abstract I/O layers connect the core to evolving web data, flexible displays and a configurable user interface:
–Wrappers encapsulate the heterogeneous external data sources and homogenize the access
–Device Drivers translate generic, abstract graphic objects to specific devices and formats
–Ports connect the internal parameters of the program to external controls
–WSDL web service description documents
Diagram labels: Data Sources, Controls, Displays; I/O Layer (Device Drivers, Wrappers, Ports, Web Services WSDL); Core (App State, Data Flow Interpreter)

Voyager Data Services Architecture
Heterogeneous data: different types, coding and access protocols
Connects providers with users and transforms heterogeneous into homogeneous data
Homogeneous data used by Viewing and Processing services in ‘agile’ applications
Diagram labels: Providers (Satellite Images, Vector GIS Data, XDim Data, OLAP Cube, SQL Table, Web Text; HTTP, FTP; OpenGIS Services); XML Web Services (Publish, Find, Bind; Access Services; Data & Tool Catalog; Uniform Access/Retrieval); Users (Data View & Process; Layered Map, Time Chart, Scatter Chart, Text, Table; Cursor)

VOYAGER Web Services
Users: Select, Overlay, Explore; Multidimensional data
Providers: Maintain distributed data; Heterogeneous coding, access
Voyager Web Services: Homogenize data access; Catalog, access, transform data
Diagram labels: Layered Map, Time Chart, Scatter Chart; Vector GIS Data, XDim Data, SQL Tables, Web Images; Publish, Find, Bind; Catalog, Data & Tools; Uniform Access; Support, Coordination, Technologies; COMMUNITY

Browser Architecture (0312)
Diagram labels: View State; MapView[1]: Services (Servicetype1[1], Servicetype2[1], Servicetype2[2]), FlowProgram, Controllers; TimeView[1]: Services, FlowProgram, Controllers; MapView[1], TimeView[1], Controllers

Distributed Data Processing and Integration
Client-Server Architecture: User accesses, processes and integrates data by customized tools
User Tasks: Find servers; Reformat data; Custom process
Diagram labels: Server, User, Control, Data

DataFed Topology: Mediated Peer-to-Peer Network, MP2P
Mediated Peer-to-Peer Network (diagram labels: Mediator, Peers)
Broker maintains a catalog of accessible resources
Peers find data and access instructions in the catalog
Peers get resources directly from peer providers
Google Example: Finding Images on Network Topology
Google catalogs the images related to Network Topology
User selects an image from the cached image catalog
User visits the provider web page where the image resides
Source: Federal Standard 1037C
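
The three steps of the mediated peer-to-peer pattern (the broker keeps a catalog, peers look up data there, peers then fetch directly from the provider) can be sketched as follows. The Broker and Peer classes, the dataset name, and the sample values are all hypothetical and only illustrate the flow; they are not the DataFed catalog interface.

```python
from typing import Dict, List, Optional


class Peer:
    """A node that can both provide and consume datasets."""

    def __init__(self, name: str, datasets: Optional[Dict[str, List[float]]] = None):
        self.name = name
        self.datasets = datasets or {}

    def fetch(self, dataset: str) -> List[float]:
        # Called directly by other peers; the broker is not involved.
        return self.datasets[dataset]


class Broker:
    """Keeps only catalog entries (who has what), never the data itself."""

    def __init__(self) -> None:
        self._catalog: Dict[str, Peer] = {}

    def publish(self, dataset: str, provider: Peer) -> None:
        self._catalog[dataset] = provider

    def find(self, dataset: str) -> Optional[Peer]:
        return self._catalog.get(dataset)


# Hypothetical provider with a made-up dataset.
provider = Peer("satellite_archive", {"toms_aerosol_index": [0.4, 1.2, 0.9]})
consumer = Peer("analyst_workstation")

broker = Broker()
broker.publish("toms_aerosol_index", provider)    # 1. catalog the resource

source = broker.find("toms_aerosol_index")        # 2. peer finds it in the catalog
fetched = source.fetch("toms_aerosol_index")      # 3. peer gets it from the provider
consumer.datasets["toms_aerosol_index"] = fetched
```

Keeping the data off the broker is the point of the design: the catalog stays small, and providers keep control of their own data.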

DataFed Topology: Mediated Peer-to-Peer Network, MP2P (Pretentious?)
Google-DataFed Comparison (Mediated Peer-to-Peer Network)
Cataloging
DataFed: Catalogs structured data from numerous sources, e.g. daily TOMS satellite data
Google: Catalogs the Web for data related to a topic, e.g. Network Topology (img, HTML)
Similarity: Data are cataloged, essence summarized and linked to provider
Difference: DataFed catalogs only a subset of structured data; Google catalogs the Web
Selecting
DataFed: User selects data from the DataFed catalog
Google: User selects an image from the cached image catalog (some cached)
Using
DataFed: User can (1) display raw data (2) use generic viewer (3) custom view (4) XML View
Google: User visits the provider web page where the image resides

P2P Network

Distributed Data Processing and Integration
Service Oriented Architecture: Users compose reconfigurable data processing chains from reusable services
User Tasks: Find data and services; Compose service chains; Expose output
Diagram labels: Data, Services, Catalog, User; User-composed service chains; Control, Data; Brokered Peer-to-Peer Network (Broker, Peers, User)

Data Focus Range Rendering Cursor Viewer Layers
Dim1: Lon, Dim2: Lat
Data provided by each dimension of a View:
Dim1.Type, Dim1.Min, Dim1.Max
Dim2.Type, Dim2.Min, Dim2.Max
….
Current Dim.Types: Latitude, Longitude, DateTime, Elevation
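
The per-dimension data listed above (Type, Min, Max) maps naturally onto a small record type. The Python sketch below is a hypothetical rendering of that idea: the class name Dimension, the validation rules and the sample ranges are assumptions, and only the four dimension types come from the slide.

```python
from dataclasses import dataclass
from typing import Union

# The four dimension types named on the slide.
DIM_TYPES = {"Latitude", "Longitude", "DateTime", "Elevation"}


@dataclass(frozen=True)
class Dimension:
    """Mirrors Dim.Type, Dim.Min and Dim.Max for one axis of a View."""
    type: str
    min: Union[float, str]  # numbers for space, ISO date strings for DateTime
    max: Union[float, str]

    def __post_init__(self) -> None:
        if self.type not in DIM_TYPES:
            raise ValueError(f"unknown dimension type: {self.type}")
        if self.min > self.max:
            raise ValueError("min must not exceed max")


# A map view is described by its two spatial dimensions (sample ranges).
map_view = (
    Dimension("Longitude", -125.0, -65.0),
    Dimension("Latitude", 25.0, 50.0),
)

# A time view uses a DateTime dimension; ISO strings sort chronologically.
time_view = (Dimension("DateTime", "2002-06-01", "2002-06-30"),)
```

Validating the type and range when the record is built means a malformed view description fails early instead of producing an empty or misleading display.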