DATA ACCESS, QUERYING, ANALYSIS AND DATA MINING IN A DISTRIBUTED FRAMEWORK FOR EARTH SYSTEM SCIENCE SUPPORT Menas Kafatos * Center for Earth Observing and Space Research (CEOSR) George Mason University *on Behalf of the SIESIP Team GeoComputation 99
SCIENCE Seasonal to Interannual Earth Science Information Partner (SIESIP) Science Driver: Seasonal-Interannual Climate Variations, Predictability and Prediction
Seasonal-Interannual Climate
• Multidisciplinary/Interdisciplinary Research
  - Coupled atmosphere/ocean
  - Effects on Biosphere
  - Connection to Hydrological Cycle (tropical rainfall, convection, etc.)
• Multiple Phenomena
  - ENSO
  - Monsoons
  - Teleconnections (effects at continental & sub-continental levels)
  - Relation to Droughts, Event-driven Phenomena, etc.
• Multiple Time Scales
  - Spans short-scale weather and longer-term climate variability
• Multi-Agency Data Sets (NASA, NOAA, …)
• Communities of Scientists (Data Providers and Users)
  - Input being provided by Advisory Board with representation from S-I, TRMM, NSIPP, SCSMEX & IDS communities
SIESIP Management Committee Science Advisory Board Federation Management & Members
SIESIP Federation Architecture
[Diagram labels: User (Web); Internet; Exchange Protocols; GMU; COLA; GDAAC; Data Ingest; Data Orders; Other Data Sources (e.g. NOAA); Interactive Operations; Batch Operations; Data Delivery; Data Archiving]
VDADC ENGINE (Current GMU Prototype) USER WEB BROWSER LOCAL STORAGE SEARCH ENGINE WORLD WIDE WEB VDADC ENGINE QUERY CONVERSION DATA CONVERSION (Images, Time Series, etc.) DATA RETRIEVAL Data Center 1 Data Center 2 Data Center N User Interface Java Applet SQL Query RDBMS (COTS) GODDARD DAAC Result interface DISCCD
Current SIESIP Data Sets
El Niño Effects on the U.S.
SIESIP Supports SCSMEX Data Analysis
• SIESIP provides TRMM gridded and satellite coincidence data subsets, and GMS data, for Field Campaign, seasonal, and interannual analyses. Data available at /TRMM_FE/scsmex/scsmex.html
• SIESIP is producing a TRMM SCSMEX data CD for international distribution at the SCSMEX Science Team’s request
Tropical Cyclone Leo, 4/29/99 (TSDIS/GMU Orbit Viewer)
Climatology Interdisciplinary Data Collection (CIDC)
(click on "Interdisciplinary" under DISCIPLINE SPECIFIC INFORMATION)
Comes as a 4-CD-ROM set; in addition, all data is available free by electronic transfer.
Over 70 Monthly Mean Global Climate Parameters: Land, Ocean, Sun, Cryosphere, Biosphere, Atmosphere.
The CD-ROM set was produced in collaboration with the Center for Earth Observing and Space Research (CEOSR) at George Mason University, with GrADS developed at the Center for Ocean Land Atmosphere Studies (COLA).
AVERAGE SEASONAL-CYCLE ESTIMATES FOR THE WORLD
Archived are: climatologically averaged values of monthly and annual air temperature (T) and total precipitation (P) reinterpolated to a 0.5x0.5 degree grid, their associated cross-validation fields, and the climatic water balance computed at each grid point from T and P. Gridded datasets are archived on the SIESIP site, as well as on "climate.geog.udel.edu" under the userid "siesip" (password available on request).
AVERAGE SEASONAL-CYCLE ESTIMATES FOR SOUTH AMERICA
Archived are: climatologically averaged values of monthly and annual air temperature (T) and total precipitation (P) interpolated to a 0.5x0.5 degree grid, and their associated cross-validation fields.
Genesis of Available Gridded Datasets
a) Average monthly station T and P drawn from station climatology archives, spatially interpolated to each grid point.
b) Average monthly station T drawn from station climatology archives, spatially interpolated to each grid point using DEM-aided interpolation.
MONTHLY TIME-SERIES ESTIMATES FOR SOUTH AMERICA
Archived are: monthly total precipitation (P) and average air temperature (T) interpolated to a 0.5x0.5 degree grid, and their associated cross-validation fields.
INFORMATION TECHNOLOGY STRATEGY
• Development of science scenarios to serve particular user communities
• Web accessibility
• Development of user queries
• Integration of tool accessibility with data set accessibility to allow meaningful, user-specified queries
• Integration of a freely and easily accessible analysis tool (GrADS), on-line visualization, and data mining (pyramid) with metadata searches (XML and relational database management systems)
Three-Phase Data Access Model
• Phase 1: A user browses and searches the “static” (or descriptive) metadata and content-based metadata provided by the SIESIP system
• Phase 2: The user gets a quick look at the contents of the data through on-line data analysis
• Phase 3: Having located the data of interest, the user orders the data
• The process is interactive and iterative
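The three phases can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy catalog, the dataset ids, the values, and the function names are hypothetical and do not describe the actual SIESIP interfaces.

```python
from statistics import mean

# Toy catalog. Dataset ids, keywords, and values are invented for illustration.
CATALOG = {
    "demo_sst": {"keywords": {"ocean", "temperature"}, "values": [26.5, 27.0, 28.5]},
    "demo_precip": {"keywords": {"atmosphere", "rainfall"}, "values": [3.0, 0.0, 12.0]},
}


def search_metadata(keyword):
    """Phase 1: search the descriptive metadata for matching datasets."""
    return sorted(ds for ds, rec in CATALOG.items() if keyword in rec["keywords"])


def quick_look(dataset_id):
    """Phase 2: summarize the contents on-line, without transferring the data."""
    values = CATALOG[dataset_id]["values"]
    return {"n": len(values), "min": min(values), "max": max(values), "mean": mean(values)}


def place_order(dataset_id, orders):
    """Phase 3: record an order for the dataset; returns the number of orders so far."""
    orders.append(dataset_id)
    return len(orders)


if __name__ == "__main__":
    orders = []
    for ds in search_metadata("rainfall"):      # Phase 1
        summary = quick_look(ds)                # Phase 2
        if summary["max"] > 10.0:               # the user decides the data is of interest
            place_order(ds, orders)             # Phase 3
    print(orders)
```

Because the process is iterative, a user would typically loop back from the quick look to a refined metadata search before ordering.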
COLA IT: GrADS
• Integrated User Interface Already in Place for
  - Selecting, Accessing, and Sampling Data Sets (grids, stations; images in the future)
  - Computing and Deriving New Quantities
  - Quantitatively Visualizing Results
• Designed to Handle Geophysical Data Sets
• Thousands of Users Worldwide
El Niño 1982/83
El Niño Event in March 1983: Sea Surface Temperature Anomaly (SSTA) and Wind Field
High values of SSTA are found near the west coast of S. America
Trade winds have dissipated
Display using GrADS
SIESIP: Distributed Seasonal-Interannual Data System (Implementation Example)
GrADS Server NOAA Data GrADS Server GrADS Server NASA Data GrADS Analysis Workbench Class Libraries J-GrADS Class Libraries SIESIP Data Sets SWIL Local NOAA Server Internet DODS MetaData Server Data Pyramid Server Datamining Interface Applet/Plug-In Content Browsing Analysis Data Order Applet/Plug-In Data Order GUI HTML Data Order Server Data Pyramid Metadata User Interface Driver 1 Interoperability Wrapper MetaData Search HTML/CGI Data and Metadata Systems on the Internet Outside of SIESIP Internet
E-R Diagram for SIESIP
[Diagram labels: Phenomenon Instance, Predefined Region, Cell Value, Specific Parameter, Data Product, Contact, Data File, Parameter, Platform, Instrument, Data Format, Temporal Coverage, Altitude Coverage, Cell]
Pyramid Data Model
• Motivation: to support interactive content-based browsing of large volumes of data
• For example, queries on the statistical properties of the data can be used in a content-based browsing process
• Challenge: query processing performance for large data volumes
• Solution: speed up query evaluation by precomputing intermediate results that contribute to answering user queries
• What kinds of precomputation? How should they be applied?
Precomputed Data Attributes
• Query evaluation performance can be improved through precomputation (i.e., precomputing the predefined data attributes that contribute to query evaluation) and approximation (i.e., deriving approximate query answers from the precomputed data attributes)
• The choice of precomputed data attributes varies with the types of queries to be answered, which in turn depend on the specific domain application
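As a rough illustration of precomputation and approximation, the sketch below summarizes a fine grid into coarse cells and answers statistical queries from those summaries alone. The choice of attributes (count, sum, minimum, maximum), the block layout, and all names are assumptions for this example; the slides do not specify how the SIESIP data pyramid is actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellStats:
    """Precomputed attributes for one coarse cell (hypothetical layout)."""
    count: int
    total: float
    minimum: float
    maximum: float


def build_level(grid, block):
    """Summarize a 2-D grid (list of equal-length rows) into block x block cells."""
    rows, cols = len(grid), len(grid[0])
    level = {}
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            values = [grid[r][c]
                      for r in range(r0, min(r0 + block, rows))
                      for c in range(c0, min(c0 + block, cols))]
            level[(r0 // block, c0 // block)] = CellStats(
                len(values), sum(values), min(values), max(values))
    return level


def cells_possibly_above(level, threshold):
    """Content-based browse: coarse cells whose stored maximum exceeds threshold."""
    return sorted(key for key, stats in level.items() if stats.maximum > threshold)


def region_mean(level, keys):
    """Mean over the given coarse cells, using only stored sums and counts.

    Exact when the region is a union of whole cells; for a region that cuts
    through cells, this is only an approximation of the true mean.
    """
    total = sum(level[k].total for k in keys)
    count = sum(level[k].count for k in keys)
    return total / count


if __name__ == "__main__":
    grid = [[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12],
            [13, 14, 15, 16]]
    level = build_level(grid, 2)
    print(cells_possibly_above(level, 10))
    print(region_mean(level, [(0, 0), (0, 1)]))
```

The trade-off is the one the slide names: the summaries cost storage and build time up front, but a browse query then touches one record per coarse cell instead of every grid point.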
SIESIP GUI
Data Interoperability
SIESIP is one of the DODS data server sites.
GrADS has been added to the DODS suite of client software.
DODS data access is enabled through the SIESIP GUI.
COLA ftp data access is enabled through the SIESIP GUI.
GrADS as part of the DODS server:
  - To manipulate DODS data before transferring
  - To support more data types and data formats