1 GFDL Data Portal Current Status, Achievements and Future Development NOAATECH-2006 K.Dixon, V.Balaji, S.Nikonov GFDL, Princeton.

2 History
- The Data Portal was launched in 1995 as a simple FTP server.
- The idea and the term “Data Portal” arose 3 years ago.
- Originally it served data in response to occasional requests.
- Now the main assets are IPCC data.

3 Common technical characteristics
Software
- Red Hat Linux
- Apache Web Server
- DODS Aggregation Server
- THREDDS
- LAS Server
- GrADS-DODS

4 Hardware
- Dell PowerEdge 2650 machine
- Dual-processor Intel Xeon, 2.4 GHz
- 3 GB RAM
- 7 Dell PowerVault 220S units with 14 HDs each, 19 TB total (expansion up to 35 TB pending)
- Network bandwidth:
  - Internet: 9 Mbit/s
  - Internet2: 100 Mbit/s

5 Web Site Structure [diagram]

6 Basic Metadata
- Model description
- Experiment description
- Institution
- Extra metadata for handling tripolar grids (including Ferret scripts for their visualization)
- Metadata complies with the CF conventions
- Metadata accompanies each data file

7 Basic features of the GFDL LAS server
- Dynamic data presentation chosen by the user
- Spatial/temporal subsampling with metadata included
- Defining new variables on the fly, calculated from a given formula
- Ferret visualization
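The subsampling and on-the-fly variable features listed on this slide can be illustrated outside LAS with a minimal Python sketch. It does not reproduce how LAS or Ferret implement them. The field layout (a list of time steps, each a list of latitude rows), the helper names `subsample` and `derive`, and the sample values are all assumptions made for illustration.

```python
from typing import Callable, List

Grid = List[List[float]]   # one time step: latitude rows of longitude values
Field = List[Grid]         # a field: a list of time steps


def subsample(field: Field, t_step: int = 1, y_step: int = 1, x_step: int = 1) -> Field:
    """Keep every t_step-th time step and every y_step-th / x_step-th grid point."""
    return [[row[::x_step] for row in grid[::y_step]] for grid in field[::t_step]]


def derive(a: Field, b: Field, formula: Callable[[float, float], float]) -> Field:
    """Build a new variable by applying a user-supplied formula point by point."""
    return [
        [[formula(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(grid_a, grid_b)]
        for grid_a, grid_b in zip(a, b)
    ]


# Hypothetical sample data: 4 time steps on a 2 x 4 grid.
t_max = [[[float(10 * t + 4 * y + x) for x in range(4)] for y in range(2)] for t in range(4)]
t_min = [[[v - 5.0 for v in row] for row in grid] for grid in t_max]

coarse = subsample(t_max, t_step=2, x_step=2)                 # every 2nd time step and column
diurnal_range = derive(t_max, t_min, lambda hi, lo: hi - lo)  # new variable from a formula
```

Passing the formula as a callable keeps the derived-variable step independent of any particular variable pair, which is the property the slide describes.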

8 General Statistics, 01-Oct-2004 to 01-Oct-2005
- Total amount of CM2 Climate Model Data: 12 TB
- More than [count missing] NetCDF files; average file size: 1 GB
- Successful requests: ~62,000
- Average successful requests per day: ~200
- Distinct files requested: 5,000
- Distinct hosts served: ~850
- Data transferred: 15 TB
- Average data transferred per day: ~42 GB
- Number of journal articles submitted that include analyses of GFDL CM2 model output: > 100
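The daily averages on this slide follow from the yearly totals. The short check below assumes a 365-day reporting period and treats the slide's figures as rounded totals. Under those assumptions the ~42 GB per day figure is reproduced when 1 TB is taken as 1024 GB, and the request total works out to about 170 per day, which the slide appears to round up to ~200.

```python
DAYS = 365                # 01-Oct-2004 to 01-Oct-2005
requests = 62_000         # "~62,000" successful requests (rounded on the slide)
transferred_tb = 15       # "15 TB" transferred (rounded on the slide)

requests_per_day = requests / DAYS
gb_per_day_binary = transferred_tb * 1024 / DAYS    # assuming 1 TB = 1024 GB
gb_per_day_decimal = transferred_tb * 1000 / DAYS   # assuming 1 TB = 1000 GB

print(f"requests per day: {requests_per_day:.0f}")
print(f"GB per day (1 TB = 1024 GB): {gb_per_day_binary:.1f}")
print(f"GB per day (1 TB = 1000 GB): {gb_per_day_decimal:.1f}")
```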

9 Current standard procedure for publishing data
- Climate Model Output Rewriter (CMOR) processing
  - configured manually for different models, experiments and variables
  - triggered manually
- Quality Control
  - performed by a scientist; includes checking metadata, time ranges, value ranges, etc.
- Splitting CMORized, QC-ed data into small (<2 GB) NetCDF files and pushing them outside the firewall to the Data Portal
  - the scripts that do this are configured manually
  - the scripts are started manually
- Preparing a checksum report on the Data Portal
  - done by a cron-started script
- Configuring the Aggregation Server and LAS
  - done manually
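The checksum report step could look roughly like the sketch below. The slides do not say which hash algorithm, file pattern or report layout the actual cron script uses, so SHA-256, the `*.nc` pattern, the three-column line format and the function names are all assumptions.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that files near 2 GB are never held in memory whole."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_report(directory: Path, pattern: str = "*.nc") -> List[str]:
    """Return one 'digest  size  name' line per matching file, sorted by file name."""
    lines = []
    for path in sorted(directory.glob(pattern)):
        lines.append(f"{file_digest(path)}  {path.stat().st_size}  {path.name}")
    return lines
```

A cron-started script would call `checksum_report` on the directory that receives the pushed files and write the returned lines to a report file. Comparing that report with one produced inside the firewall shows whether every file arrived intact.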

10 Current Data Portal workflow [diagram]

11 Desirable Features of Data Portal
- Relational database storing metadata that describes
  - model components and model configuration
  - scenarios
  - postprocessing (model output and CMOR)
  - experiments
  - variables
  - formalized rules of Quality Control
  - data locations in the Archive
  - task scheduler
  - user and group accounts
- XML as the data exchange format
  - for compliance with the FMS Runtime Environment (FRE)
  - working format of existing third-party software
  - well suited to hierarchical metadata description
  - widely used, so easy to exchange with other data portals
- Publisher Control Center (PCC)
  - controls the CMOR subsystem
  - controls the Data Publisher Manager
  - controls data quality (QAC)
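To illustrate why XML suits hierarchical metadata exchange, here is a minimal round-trip sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`experiment`, `model`, `scenario`, `variables`, `variable`) and the sample record are hypothetical. They are not the FRE or Curator schema, which the slides do not show.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def experiment_to_xml(exp: Dict) -> str:
    """Serialize an experiment record to an XML string (hypothetical layout)."""
    root = ET.Element("experiment", name=exp["name"])
    ET.SubElement(root, "model").text = exp["model"]
    ET.SubElement(root, "scenario").text = exp["scenario"]
    variables = ET.SubElement(root, "variables")
    for var in exp["variables"]:
        ET.SubElement(variables, "variable", name=var)
    return ET.tostring(root, encoding="unicode")


def experiment_from_xml(text: str) -> Dict:
    """Parse the XML produced by experiment_to_xml back into a record."""
    root = ET.fromstring(text)
    return {
        "name": root.get("name"),
        "model": root.findtext("model"),
        "scenario": root.findtext("scenario"),
        "variables": [v.get("name") for v in root.iter("variable")],
    }


# Illustrative record; only the model name CM2.1 comes from the slides.
record = {
    "name": "example_run",
    "model": "CM2.1",
    "scenario": "example_scenario",
    "variables": ["tas", "pr"],
}
xml_text = experiment_to_xml(record)
restored = experiment_from_xml(xml_text)
```

The nested `variables` element shows the point made on the slide: a list belonging to one experiment maps directly onto child elements, whereas a flat key-value format would need an extra convention for it.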

12 Desirable Features of Data Portal (continued)
- Climate Model Output Rewriter (CMOR) subsystem
  - prepares data consistently with specific project requirements
- Data Publisher Manager
  - transfers data to the target destination according to settings from the DB
- Front-end Data Portal Software Package
  - Configuration Manager (configures the Aggregation Server and the Data Portal Interface)
  - Search Catalog Engine
  - Data Subsampling Engine
  - Data Computation Engine
  - Data Visualization
  - Data Delivery Manager

13 Proposed functional schema of the ‘GFDL Data Factory’ [diagram]

14 Standard scenario for the functioning of the Model Data Factory (ideal picture)
- A scientist builds a model in the existing GFDL FMS Runtime Environment (FRE) using available model components, datasets and a forcing scenario.
- FRE puts metadata about the built model, scenario and experiment into the “curator” DB and runs the experiment.
- The postprocessing subsystem extracts metadata about the postprocessing plan from the “curator” DB, executes it and, on completion, puts metadata about the processed experiment back into the DB.
- The Data Publisher (DP) regularly checks the “curator” DB for new experiments marked as “public” and, if it finds any, invokes CMOR.
- CMOR reads metadata from the “curator” DB and processes the required data following the metadata instructions.
- DP calls QAC and then transfers the data to Data Portal storage.
- The Configuration Manager configures the Aggregation Server and the Data Portal Interface and puts records about the new public data into the “curator” DB.
- End of process: the data is ready to go.
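The Data Publisher part of this scenario (poll the DB, invoke CMOR, run QAC, transfer, record the result) can be sketched as a single function. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the MySQL “curator” DB, the `Experiments` columns `status` and `published` are hypothetical, and `cmor`, `qac` and `transfer` are stub callables rather than the real subsystems. The final `UPDATE` stands in for the record that the Configuration Manager writes back.

```python
import sqlite3
from typing import Callable, List


def publish_new_experiments(
    db: sqlite3.Connection,
    cmor: Callable[[str], str],
    qac: Callable[[str], bool],
    transfer: Callable[[str], None],
) -> List[str]:
    """One polling pass: publish every experiment marked public and not yet published."""
    rows = db.execute(
        "SELECT id, name FROM Experiments "
        "WHERE status = 'public' AND published = 0 ORDER BY id"
    ).fetchall()
    done = []
    for exp_id, name in rows:
        data = cmor(name)        # rewrite model output for the target project
        if not qac(data):        # failed quality control: leave it unpublished
            continue
        transfer(data)           # push to Data Portal storage
        db.execute("UPDATE Experiments SET published = 1 WHERE id = ?", (exp_id,))
        done.append(name)
    db.commit()
    return done


# Demo with hypothetical experiments and stub subsystems.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE Experiments (id INTEGER PRIMARY KEY, name TEXT, "
    "status TEXT, published INTEGER DEFAULT 0)"
)
db.executemany(
    "INSERT INTO Experiments (name, status) VALUES (?, ?)",
    [("exp_a", "public"), ("exp_b", "private"), ("exp_c", "public")],
)
sent: List[str] = []
stub_cmor = lambda name: name + ".cmor"
stub_qac = lambda data: not data.startswith("exp_c")   # pretend exp_c fails QC
first_pass = publish_new_experiments(db, stub_cmor, stub_qac, sent.append)
second_pass = publish_new_experiments(db, stub_cmor, stub_qac, sent.append)
```

Because the published flag is set only after a successful transfer, an experiment that fails QAC is picked up again on the next pass instead of being silently dropped.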

15 Database ‘curator’ design
Database compartments:
- Model Metadata Compartment: contains model descriptions; allows building a coupled model of the needed configuration
- Variables Compartment: list of all related physical variables
- Workflow Compartment: contains scenarios, experiments, institutions, projects and user info
- Postprocessing Compartment: defines the postprocessing plan for an experiment
- Data Portal Compartment: contains info about experiment data

16 Interaction between compartments [diagram]

17 MySQL DB CURATOR [diagram]

18 Model Metadata Compartment (in development) [schema diagram]
Tables shown: Coupled_Models, Model_List, Component_Medias, Models, Experiments (Workflow Compartment), Variables (Variables Compartment)

19 Data Samples from Model Compartment [table samples]
Tables shown: Components_Medias, Coupled_Models, Model_List, Models

20 Variables Compartment [schema diagram]
Tables shown: Variables, Variable_Bundles, Variable_Lists, Variable_List_Contents, Proj_Var_Names, Projects (Workflow Compartment)

21 Data Sample from Variables Compartment [table samples]
Tables shown: Variable_Lists, Variable_List_Contents, Proj_Var_Names, Variables, Variable_Bundles

22 Workflow Compartment [schema diagram]
Tables shown: Institutions, GFDL_USERS, Experiment_Status, Realization, Projects, Experiments, Scenarios

23 Data Samples from Workflow Compartment [table samples]
Tables shown: Experiments, Scenarios

24 Postprocessing Compartment [schema diagram]
Tables shown: Coupled_Models, PP_Units, Post_Proc, PP_Content, Variable_Lists, Projects, GFDL_USERS, Average_Periods
Data Samples from Postprocessing Compartment: PP_Units, PP_Content

25 Data Portal Compartment [schema diagram]
Tables shown: MissedData_Descriptors, Data_Grids, Data_Files, Variables, Experiments, Variable_Bundles, Coupled_Models

26 Data Samples from Data Portal Compartment [table samples]
Tables shown: Data_Files, Data_Grids, MissedData_Descriptors

27 Curator DB on Data Portal stream
- Curator DB is already in use on the GFDL Data Portal.
- It is implemented with JSP technology, with servlets on the back end.
- New data transferred to the Data Portal is automatically registered in Curator DB with all accompanying metadata.
- It has turned out to be the fastest way to search for data on the Data Portal (examples shown: CM2.0, CM2.1).
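Registering files on arrival and then searching by model can be sketched as follows. The portal itself uses JSP and servlets over MySQL. This Python and SQLite version is only an illustration, and the `Data_Files` column layout, the index, the helper names and the sample file names are assumptions (only the table name and the model names CM2.0 and CM2.1 appear in the slides).

```python
import sqlite3
from typing import List, Optional


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory stand-in for the Data_Files table (hypothetical columns)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE Data_Files ("
        "file_name TEXT PRIMARY KEY, model TEXT NOT NULL, "
        "experiment TEXT NOT NULL, variable TEXT NOT NULL)"
    )
    # The index lets a search by model (and variable) avoid a full table scan.
    db.execute("CREATE INDEX idx_model_variable ON Data_Files (model, variable)")
    return db


def register_file(db: sqlite3.Connection, file_name: str, model: str,
                  experiment: str, variable: str) -> None:
    """Record a newly transferred file together with its metadata."""
    db.execute("INSERT INTO Data_Files VALUES (?, ?, ?, ?)",
               (file_name, model, experiment, variable))
    db.commit()


def find_files(db: sqlite3.Connection, model: str,
               variable: Optional[str] = None) -> List[str]:
    """Return file names for a model, optionally restricted to one variable."""
    sql = "SELECT file_name FROM Data_Files WHERE model = ?"
    params = [model]
    if variable is not None:
        sql += " AND variable = ?"
        params.append(variable)
    sql += " ORDER BY file_name"
    return [row[0] for row in db.execute(sql, params)]


# Demo with hypothetical file names.
catalog = open_catalog()
register_file(catalog, "cm2.0_tas_0001.nc", "CM2.0", "run1", "tas")
register_file(catalog, "cm2.1_tas_0001.nc", "CM2.1", "run1", "tas")
register_file(catalog, "cm2.1_pr_0001.nc", "CM2.1", "run1", "pr")
cm21_files = find_files(catalog, "CM2.1")
cm21_tas = find_files(catalog, "CM2.1", "tas")
```

Searching an indexed metadata table answers such queries without opening any data file, which is one plausible reason the database turned out to be the fastest search path.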

28 Other Aspects of Future Development
- Establish model metadata schema standards in the scientific community and develop an SQL metadata schema.
- Populate Curator with real metadata extracted from GFDL models.
- Integrate Curator DB with the GFDL FMS modeling system.
- Customize the LAS server to use the Curator DB.
- Design user interfaces.

29 END. Questions? Thanks!