ESSE Environmental Scenario Search Engine for the Data Services Grid Mikhail Zhizhin, Geophysical Center Russian Academy of Sciences Eric Kihn, National Geophysical Data Center NOAA

Geophysical Center, Russian Academy of Sciences
- World Data Centers for Solid Earth and Solar-Terrestrial Physics
- Environmental data archives: paper, tapes, files, databases, e-journals…
- International network for geophysical data exchange with the US, Japan, China, …
- Computer center, Linux cluster, fiber optics
- Part of the European GRID infrastructure EGEE, Russian GRID Virtual Organization e-Earth

50 years ago: International Geophysical Year (IGY), 1957
- Total data volume ~ 1 GB
- Exchange ~ 1 MB/year

Yesterday: databases, Internet, web (Y2K)
- Total data volume ~ 1 TB
- Exchange ~ 1 GB/year

Tomorrow: Electronic Geophysical Year (eGY), 2007
- Total data volume ~ 1 PB
- Exchange ~ 1 TB/year

SPIDR: Space Physics Interactive Data Resource
[Map of SPIDR nodes: Boulder, Moscow, Kamchatka, Nagoya, Sydney, Grahamstown, Beijing; legend: SPIDR 2, SPIDR 3]

Cross-disciplinary data exchange
- Users need data from different disciplines
- Rapid growth in data volume and data demand requires new tools for data management and data mining

“Metcalfe’s law” for databases
- The utility of N independent data sets seems to increase super-linearly
- One can find N(N-1) ≈ N² relations between data sources, so their utility grows as ≈ N²
- It is more efficient to use several data sources than one archive

Sources of data inflation?
1. New versions
2. Derived data products
3. Reanalysis
Products of Level 1 (NASA terminology) take 10% of the Level 0 volume, but the number of Level 1 products is increasing. If the volume of Level 0 data grows as N, then the volume of Level 1 data grows as N².

Observations + Model = Reanalysis
1. Direct observations, including raw and processed data, e.g. from a meteorological station or satellite.
2. A numerical model “knows” the physics and uses direct observations as boundary values, e.g. a Global Circulation Model. The input data volume (irregular grid) is less than the output volume (regular grid).
3. Reanalysis: the accumulated output of numerical model runs based on direct observations over a long time period, say 50 years.

D-day reanalysis: morning (after ECMWF)
[Two map panels: June 6th, 1944, midnight; June 6th, 1944, 6 AM]

D-day reanalysis: evening (after ECMWF)
[Two map panels: June 6th, 1944, 6 PM; June 6th, 1944, 12 AM]

Data inflation after reanalysis
- A modern global atmospheric circulation model (GCM) at 2.5° (latitude) × 2.5° (longitude) × 20 (levels) = 10⁶ gridpoints.
- The GCM outputs "high-frequency" data every six hours of simulation time, so ~ 1 GB of data per simulation day.
- By contrast, the worldwide daily meteorological observational data collected over the Global Telecommunications System is ~ 200 MB.
- As an extreme, running the GCM for 50 years of simulation time will produce 40 TB of data.

Space Weather Reanalysis
- Input: ground and satellite data from SPIDR
- Space weather numerical models
- Output: high-resolution representation of near-Earth space

ESSE solutions
- Do not use data files, use distributed databases
- Optimize the data model for the typical data request
- Virtualize data sources using grid (web) services
- A metadata schema describes parameters, grids, and formulas for virtual parameters (e.g., wind speed from U- and V-wind)
- Search for events in the environment by a “scenario” in natural language terms
- Translate the scenario into a parallel request to the databases using fuzzy logic
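The "virtual parameter" idea (wind speed derived from U- and V-wind) can be sketched as follows. This is only an illustration under assumptions: the dictionary layout, the names `VIRTUAL_PARAMETERS` and `evaluate_virtual`, and the sample values are hypothetical and are not taken from the ESSE metadata schema.

```python
import math

# Hypothetical metadata entry: a virtual parameter is described by the
# stored parameters it needs and a formula that combines them.
VIRTUAL_PARAMETERS = {
    "wind_speed": {
        "inputs": ("u_wind", "v_wind"),
        "formula": lambda u, v: math.hypot(u, v),  # sqrt(u^2 + v^2)
    },
}


def evaluate_virtual(name, stored):
    """Compute a virtual parameter time series from stored time series."""
    spec = VIRTUAL_PARAMETERS[name]
    series = [stored[key] for key in spec["inputs"]]
    # Apply the formula time step by time step.
    return [spec["formula"](*values) for values in zip(*series)]


# Made-up U- and V-wind components (m/s) at one grid point.
stored = {"u_wind": [3.0, 0.0, -6.0], "v_wind": [4.0, 2.0, 8.0]}
speeds = evaluate_virtual("wind_speed", stored)
print(speeds)
```

Keeping the formula in metadata means the derived series never has to be stored, which fits the concern about data inflation from derived products.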

ESSE architecture
- The fuzzy logic engine performs searching and statistical analysis of the distribution of the identified events
- Parallel mining of several distributed data sources, possibly from different subject areas
- Both the fuzzy logic engine and the data sources are implemented as Grid (web) services
- Interfaces and data structures can be obtained from the definitions of the web services (WSDL)
- Web services and the prototype user interface are installed on two mirror servers:
  - Boulder, US
  - Moscow, Russia

Parallel database cluster (NCEP reanalysis)

ESSE “time series” data model
Indexed lat-lon grids of time series in BLOBs
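A minimal sketch of what "indexed lat-lon grids of time series in BLOBs" could look like, assuming one row per parameter and grid point with the whole time series packed into a single binary column. The table layout, the function names, and the use of SQLite and 32-bit floats are assumptions for illustration; the slides do not specify the actual ESSE schema or database engine.

```python
import sqlite3
from array import array

conn = sqlite3.connect(":memory:")
# One row per (parameter, grid point); the primary key acts as the index.
conn.execute(
    "CREATE TABLE series ("
    " parameter TEXT, lat REAL, lon REAL, data BLOB,"
    " PRIMARY KEY (parameter, lat, lon))"
)


def put_series(parameter, lat, lon, values):
    """Pack a time series as 32-bit floats and store it as one BLOB."""
    blob = array("f", values).tobytes()
    conn.execute(
        "INSERT INTO series VALUES (?, ?, ?, ?)", (parameter, lat, lon, blob)
    )


def get_series(parameter, lat, lon):
    """Fetch and unpack the full time series for one grid point."""
    row = conn.execute(
        "SELECT data FROM series WHERE parameter = ? AND lat = ? AND lon = ?",
        (parameter, lat, lon),
    ).fetchone()
    if row is None:
        return None
    values = array("f")
    values.frombytes(row[0])
    return list(values)


# Made-up temperatures (K) for a single grid point.
put_series("air_temperature", 40.0, -75.0, [271.5, 272.0, 274.25])
print(get_series("air_temperature", 40.0, -75.0))
```

The design choice illustrated here is that a typical request ("the history of one parameter at one location") becomes a single indexed row lookup instead of a scan across many per-time-step grid files.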

What is fuzzy logic?
- Fuzzy logic uses set membership values between and including 0 and 1, allowing for partial membership in a set.
- Fuzzy logic is convenient for representing human linguistic terms and imprecise concepts (“slightly”, “quite”, “very”).
[Figure: fuzzy membership functions]
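A membership function maps a measured value to a degree of membership between 0 and 1. The sketch below uses a trapezoidal shape, one common textbook choice; the slides do not say which shapes ESSE uses, and the breakpoints for "warm" are made up.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c],
    linear ramps in between. Requires a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    if x <= c:
        return 1.0                # plateau: full membership
    return (d - x) / (d - c)      # falling edge


def warm(temp_c):
    """Hypothetical fuzzy set "warm" for air temperature in deg C."""
    return trapezoid(temp_c, 15.0, 20.0, 25.0, 30.0)


for t in (10.0, 17.5, 22.0, 28.0):
    print(t, warm(t))
```

A temperature of 17.5 °C is "warm" to degree 0.5 here, which is the partial membership that crisp (0 or 1) sets cannot express.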

What good is fuzzy logic for ESSE?
- The fuzzy engine lets users build queries in human linguistic terms: (VERY LARGE "wind speed") AND (AVERAGE "surface temperature") AND ("relative humidity" ABOUT 60%)
- The same terms can cover different value ranges: AVERAGE TEMPERATURE for Africa is not the same as for Siberia.
- Results are given as a list of "most likely" events. Each event is assigned a value representing its "likeliness".
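The example query can be scored with standard fuzzy-logic operators. This sketch assumes AND is the minimum of the memberships and VERY squares a membership (Zadeh's classic "concentration" hedge); the slides do not state which operators ESSE uses. The value ranges in `RANGES`, the width used for ABOUT, and the sample data are all made up.

```python
def large(x, lo, hi):
    """LARGE: linear ramp from 0 at lo to 1 at hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def very(membership):
    """VERY: concentration hedge, assumed here to be squaring."""
    return membership * membership


def average(x, lo, hi):
    """AVERAGE: triangle peaking at the middle of the range [lo, hi]."""
    mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    return max(0.0, 1.0 - abs(x - mid) / half)


def about(x, target, width):
    """ABOUT: triangle peaking at target, reaching 0 at target +/- width."""
    return max(0.0, 1.0 - abs(x - target) / width)


# Hypothetical per-region value ranges: the same term (e.g. AVERAGE)
# would get different ranges for Africa than for Siberia.
RANGES = {"wind_speed": (0.0, 30.0), "surface_temperature": (-10.0, 30.0)}


def score(sample):
    """Likeliness of the example query; AND is taken as the minimum."""
    return min(
        very(large(sample["wind_speed"], *RANGES["wind_speed"])),
        average(sample["surface_temperature"], *RANGES["surface_temperature"]),
        about(sample["relative_humidity"], 60.0, 20.0),
    )


samples = [
    {"time": "t0", "wind_speed": 27.0, "surface_temperature": 12.0, "relative_humidity": 62.0},
    {"time": "t1", "wind_speed": 10.0, "surface_temperature": 10.0, "relative_humidity": 60.0},
    {"time": "t2", "wind_speed": 30.0, "surface_temperature": 28.0, "relative_humidity": 90.0},
]
# Rank candidate time steps from most to least likely.
ranked = sorted(samples, key=score, reverse=True)
for s in ranked:
    print(s["time"], round(score(s), 3))
```

Swapping in a different `RANGES` table per region is one way to make the same linguistic term mean different things in different climates.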

[Figure panels: “High” wind; “Average” temperature; “About” 60% humidity]

Prototype workflow and UI
- Prototype UI implemented as a web application
- Discover data sources by keyword-based metadata search
- Use predefined weather events (e.g. “ice storm”, “flood”)
- Define the event as a combination of fuzzy conditions on a set of environmental parameters (e.g. “high temperature and low relative humidity”)
- Review statistics for the detected events
- Visualize the selected event as time series plots or contour maps
- Download the event data in a self-describing format (NetCDF or HDF) to the user’s workstation

Setting spatial locations
Select a set of "probes" (representing spatial locations of interest, e.g. New York) where the desired event may occur.

Defining fuzzy search criteria
- Select several parameters for the event from a list.
- Set the fuzzy constraints on the parameters for the event (e.g. “very high temperature”, “very high humidity”).

Working with scenarios
The user may search for a desired scenario by describing several successive events.
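One way a scenario of successive events could be scored is sketched below. It assumes each event already has a fuzzy score per time step, that events must occur in order with each following the previous one within `max_gap` steps, and that the scenario score is the minimum of the chosen event scores. These rules, the function name `best_scenario`, and the sample scores are assumptions; the slides do not describe the actual algorithm.

```python
def best_scenario(event_scores, max_gap):
    """Return (score, end_index) of the best ordered match.

    event_scores: one list of per-time-step scores for each event,
    all on the same time axis, in the order the events must occur.
    """
    n = len(event_scores[0])
    # best[t]: best score of the events matched so far, ending at step t.
    best = list(event_scores[0])
    for scores in event_scores[1:]:
        nxt = [0.0] * n
        for t in range(n):
            # The previous event must end 1..max_gap steps before t.
            window = best[max(0, t - max_gap):t]
            prev = max(window) if window else 0.0
            nxt[t] = min(scores[t], prev)
        best = nxt
    end = max(range(n), key=lambda t: best[t])
    return best[end], end


# Made-up scores: a "cold" event followed by a "rain" event.
cold = [0.1, 0.9, 0.8, 0.2, 0.1]
rain = [0.0, 0.1, 0.3, 0.7, 0.9]
result = best_scenario([cold, rain], max_gap=2)
print(result)
```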

Search Results
- “Score” represents the “likeliness” of each event in numerical form.
- The results page provides links to visualization and data export pages.

Visualizing event as time series

Visualizing event in 5D

Visualizing event from satellites

What do we get at the end?
- Using the “time machine”, we can see the weather on D-Day, during Hurricane Rita, or on a typical September day in San Diego.
- Statistics to estimate risk from natural disasters and global climate change; realistic weather in movies, computer games, and simulators
- When Tim Berners-Lee uses the semantic web to find a photo of the Eiffel Tower on a sunny summer day, ESSE can provide a list of sunny days to be merged with the list of images named with “eiffel”