GFDL Data Portal Update: Curator DB Approach
S. Nikonov, V. Balaji, K. Dixon (GFDL)
The 5th GO-ESSP Workshop, June 19-21, 2006, LLNL

Outline
• GFDL Data Portal hardware upgrade
• Data Portal statistics
• Metadata database design for Data Portal usage and for the whole modeling process

Data Portal Hardware Upgrade
• Dell PowerEdge 2850
• Two Intel 3.2GHz Xeon processors
• 2GB RAM
• 300GB system disk
• Two QLogic QLA2340 fiber channel controllers (2Gb/s)
• Red Hat Enterprise Linux 4.0 ES operating system
• Ten StorageTek FlexLine FLC200 fiber channel disk arrays
• Fourteen 250GB SATA drives per array
• 140 drives, total 35TB raw (27TB usable)
• Data transfer and processing speed increased by 40%
• Future plan: double storage capacity every 2 years
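The storage figures on this slide can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 1000 GB) and reads "double every 2 years" as exact doubling starting from the quoted raw capacity. The helper name `projected_raw_tb` is hypothetical.

```python
# Illustrative arithmetic only; decimal units (1 TB = 1000 GB) are an assumption.
ARRAYS = 10             # StorageTek FlexLine FLC200 arrays
DRIVES_PER_ARRAY = 14   # SATA drives per array
DRIVE_GB = 250          # capacity of one drive, in GB
USABLE_TB = 27          # usable capacity quoted on the slide

total_drives = ARRAYS * DRIVES_PER_ARRAY
raw_tb = total_drives * DRIVE_GB / 1000
usable_fraction = USABLE_TB / raw_tb


def projected_raw_tb(years_from_now: float, doubling_period: float = 2) -> float:
    """Raw capacity if it doubles every `doubling_period` years (hypothetical helper)."""
    return raw_tb * 2 ** (years_from_now / doubling_period)


print(total_drives)               # 140
print(raw_tb)                     # 35.0
print(round(usable_fraction, 2))  # 0.77
print(projected_raw_tb(4))        # 140.0
```

The drive count and raw capacity reproduce the slide's 140 drives and 35TB; the usable fraction of about 0.77 is simply the ratio of the two quoted capacities.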

Data Statistics, 01-Oct-2004 to 1-June-2006
• Total amount of data: 8 TB (up 50% in one year)
• 12,500 NetCDF files, average file size: 650 MB
• Distinct files requested: 6,000
• Distinct hosts served: ~1,200
• Data transferred: ~20 TB (a twofold increase)
• Average data transferred per day: ~25 GB

Metadata Database Design
• Progress has already been made in this area at CGAM (NMM Suite), and the Curator project is partly devoted to developing metadata standards for models and model output. Those ideas and discussions were extremely useful for our design.
• For comprehensive data analysis, the Data Portal should describe not only the data but also how the data was generated.
• It should use the same metadata database as the modeling system (Flexible Runtime Environment). This database is the joining element of the whole system.
• Analysis of existing data through the Data Portal will help modelers improve models and plan new experiments.
• Thus the Data Portal can be considered not a separate, independent system but a subsystem of the modeling system.

Common functionality schema of the modeling system

Metadata Database usage at different stages of the modeling process
[Diagram labels: Metadata Database; Component Building; Model Composition; Experiment Preparation; Postprocessing Plan; Data Portal Service]

Main Database Compartments and their relationships
1. Domains
2. Physical Processes
3. Algorithmization
4. Composition
5. Simulation

Scheme Rationales
• Process Domains: arenas where physical processes play out.
• Physical Processes: descriptions of the accepted theoretical approaches for the processes considered in modeling.
• Algorithmization: describes the program modules that implement elementary physical processes.
• Composition: components, couplers, drivers, and the technical environment.
• Simulation: describes model output data and its location, including all accompanying administrative information.

Process Domains
They define the phase spaces of the equations that express physical phenomena in mathematical form. They also serve as containers that hold elements (gases, aerosols, clouds). Each domain entry contains a common description and the set of elements that constitute the domain.
Example: atmosphere or ocean 3D space for dynamics.

Physical Processes
• Each process entry contains theoretical assumptions, a full description, references, and other process-specific information.
• Processes are identified by name and by the domain where they act.
• They are described individually in different tables.
• All process tables share a subset of the same fields: process id, process name, domain, full description. The remaining fields are process-specific.
• Process name and domain are one of the criteria for preventing the same process from being included twice in a component or coupled model.
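One way to picture "different tables with a shared subset of fields" is the minimal sketch below, written with Python's built-in `sqlite3`. It is not the Curator schema: the table names, the process-specific columns (`equation_set`, `spectral_bands`), the sample values, and the choice to enforce the (name, domain) identity with a `UNIQUE` constraint are all assumptions made for illustration.

```python
import sqlite3

# Shared fields named on the slide; the column types are assumptions.
COMMON_COLUMNS = (
    "process_id INTEGER PRIMARY KEY, "
    "process_name TEXT NOT NULL, "
    "domain TEXT NOT NULL, "
    "full_description TEXT"
)

# Hypothetical process-specific columns, one table per process type.
PROCESS_TABLES = {
    "dynamics": "equation_set TEXT",
    "radiation": "spectral_bands INTEGER",
}


def create_process_tables(conn: sqlite3.Connection) -> None:
    """Create one table per process type, each starting with the shared fields."""
    for table, specific in PROCESS_TABLES.items():
        conn.execute(
            f"CREATE TABLE {table} ({COMMON_COLUMNS}, {specific}, "
            "UNIQUE (process_name, domain))"  # a process is identified by name + domain
        )


conn = sqlite3.connect(":memory:")
create_process_tables(conn)
conn.execute(
    "INSERT INTO radiation (process_name, domain, full_description, spectral_bands) "
    "VALUES (?, ?, ?, ?)",
    ("shortwave", "Atmosphere", "placeholder description", 18),
)
try:
    # Same name and domain again: rejected by the UNIQUE constraint.
    conn.execute(
        "INSERT INTO radiation (process_name, domain) VALUES (?, ?)",
        ("shortwave", "Atmosphere"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

Keeping the shared columns in one string means every process table starts with the same four fields, which is what lets generic tools query any process table by name and domain.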

Algorithmization
• Process codebase: the set of modules implementing a process, including input data descriptions (namelists and datasets), accompanied by a CVS tag
• Numeric artifices: the set of modules implementing numeric smoothing (filters, artificial viscosity, general algorithms, etc.)
• Tracer models: descriptions pointing to the fieldtable files associated with tracers
• Grid specs
• Boundary conditions
• Namelists and datasets (model parameters), fieldtables (tracers): their locations, versions, descriptions, checksums
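The last bullet (location, version, description, checksum) could be captured along the lines of the sketch below. It is an assumption-laden illustration: the slides do not name a checksum algorithm, so SHA-256 is used here, and `InputFileRecord`, `describe_input_file`, and `is_unchanged` are hypothetical names.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class InputFileRecord:
    """Hypothetical record for a namelist, dataset, or fieldtable file."""
    location: str
    version: str
    description: str
    checksum: str


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_input_file(path: str, version: str, description: str) -> InputFileRecord:
    """Build the record that would be stored in the database."""
    return InputFileRecord(os.path.abspath(path), version, description, file_checksum(path))


def is_unchanged(record: InputFileRecord) -> bool:
    """True if the file at the recorded location still matches its checksum."""
    return file_checksum(record.location) == record.checksum


# Usage with a throwaway file standing in for a namelist.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.nml")
    with open(path, "w") as handle:
        handle.write("&example_nml\n  dt = 1800\n/\n")
    record = describe_input_file(path, "v1", "example namelist")
    unchanged_before = is_unchanged(record)
    with open(path, "a") as handle:
        handle.write("! edited\n")
    unchanged_after = is_unchanged(record)

print(unchanged_before, unchanged_after)  # True False
```

Storing the checksum next to the location lets a later run detect that an input file was modified after it was registered, even when its path and version label are unchanged.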

Composition
• The main actors here are components.
• A component can be of two types: physical component or coupler.
• A component consists of modules.
• The modules that constitute a component are determined by the physical processes participating in the final model. These sets of modules are described in the Algorithmization part of the database.
• Another entity of the Composition compartment is the driver, a program unit responsible for running components (alone or as a whole coupled model).
• A component is the minimal unit that can be run by a driver.
• Components have a PMIOD description, which the system should use to decide whether components are compatible. Another criterion applied at the component building stage is that a component or coupled model must not contain the same process of the same domain twice.
• The Coupled Model table describes the set of components that are members of the final coupled model.
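The duplicate-process criterion above can be sketched as a small check. This is an illustration under assumptions, not the system's actual logic: a process is reduced to a (name, domain) pair, the `Component` class and function names are hypothetical, and PMIOD-based compatibility is not modeled at all.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

ProcessKey = Tuple[str, str]  # (process name, domain)


@dataclass
class Component:
    """Hypothetical stand-in for a row of the Components table."""
    name: str
    kind: str  # "physical" or "coupler"
    processes: List[ProcessKey] = field(default_factory=list)


def duplicate_processes(keys: Iterable[ProcessKey]) -> List[ProcessKey]:
    """Return every (name, domain) pair that occurs more than once."""
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


def check_component(component: Component) -> List[ProcessKey]:
    """Duplicates inside a single component."""
    return duplicate_processes(component.processes)


def check_coupled_model(components: Iterable[Component]) -> List[ProcessKey]:
    """Duplicates across all components of a coupled model."""
    return duplicate_processes(key for c in components for key in c.processes)


atmos = Component("atmos", "physical", [("radiation", "Atmosphere"), ("dynamics", "Atmosphere")])
ocean = Component("ocean", "physical", [("dynamics", "Ocean")])
extra = Component("atmos_alt", "physical", [("radiation", "Atmosphere")])

print(check_coupled_model([atmos, ocean]))         # []
print(check_coupled_model([atmos, ocean, extra]))  # [('radiation', 'Atmosphere')]
```

Note that `dynamics` appears twice in the first model without conflict, because the two entries act in different domains; only an identical name and domain pair counts as a duplicate.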

Simulation
Contains tables that fully describe a conducted experiment, including:
• Institution
• Author
• Project
• Scenario
• Experiment
• Realization
• Postprocessing plan
• Variables
• Variable bundles
• Metadata standards
• Data fields
• Files

Compartment Structure of Curator Database
Compartments: Process Domain, Process, Algorithmization, Composition, Simulation
• Process Domain: Domains, Atmosphere, Ocean, Ice, Land, Surf_Boundary Layer, Rivers, Lakes
• Process: Dynamics, Radiation, IceProc, BiotaProc, Hydrology, CloudProc, Chemistry, Convection
• Algorithmization: ProcCodeBase, NumArtificies, GridSpecs, BoundCond, TracerModels, NameLists, DataSets
• Composition: Components, CmpPMIOD, CmpDrivers, Services, Versioning, Compiling, PlatformEnv, Others...
• Simulation: Projects, Experiments, Scenarios, PostProc, Variables, OutDataFields, OutDataFiles, Realizations
Additional tables shown in the diagram: DomConstituents, InitCond, CouplModels

Modes of working with database
• Research mode: the modeler introduces new physical processes, new algorithmizations, and new components built from newly developed modules, for future use in coupled models. New components must be described in the database. Model runs conducted during this development are not recorded in the DB, except for the final ones that demonstrate the physical correctness of the new approach.
• Production mode: the experimenter composes a coupled model from the available components described in the database, builds a scenario and a postprocessing plan, and runs the experiment. All of this activity is recorded in the database.
  A thoroughly elaborated, very friendly GUI is a critical need for these modes; otherwise users will avoid the database-based way of working, the DB will stay empty, and the project will fail.
• Automatic mode: applications fill metadata into the database, grabbing it from data files, or read the metadata they need during execution.
  Most progress so far has been made here, using the Simulation compartment of the Curator DB.

Current usage of Curator DB
• Currently the Simulation part of the DB is designed for operational use; it is kept up to date and used in Data Portal activity.
• The DB serves the GFDL Data Portal web site for data discovery and navigation: IPCC CM2.1. A daemon scans Data Portal storage for newly added data files and records in the DB the metadata extracted from the files, together with system information about them.
• It is used to make the metadata of data files on the Data Portal consistent with the standards defined in the DB. The application queries the DB for the metadata standard assumed for a given file, compares the file's metadata against it, and fixes the file.
• It is used by an automatic tool for configuring the DODS Aggregation Server. The tool checks the experiment status (public/not public) in the DB, requests all the metadata needed to generate the DODS XML configuration file, and creates this file.
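The daemon and the consistency tool described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `out_data_files` table layout, the `.nc` suffix filter, and the function names are hypothetical, only system information (size, modification time) is recorded, and reading real NetCDF attributes is left out because it would need a NetCDF library.

```python
import os
import sqlite3
import tempfile


def open_catalog(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open a stand-in catalog with one hypothetical table of known files."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS out_data_files ("
        "path TEXT PRIMARY KEY, size_bytes INTEGER, mtime REAL)"
    )
    return conn


def record_new_files(conn: sqlite3.Connection, root: str) -> list:
    """One scan pass: record .nc files under `root` that the catalog lacks."""
    known = {row[0] for row in conn.execute("SELECT path FROM out_data_files")}
    added = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not name.endswith(".nc") or path in known:
                continue
            info = os.stat(path)
            conn.execute(
                "INSERT INTO out_data_files VALUES (?, ?, ?)",
                (path, info.st_size, info.st_mtime),
            )
            added.append(path)
    conn.commit()
    return sorted(added)


def metadata_fixes(file_attrs: dict, standard_attrs: dict) -> dict:
    """Return the attributes that must change for a file to match the standard."""
    return {k: v for k, v in standard_attrs.items() if file_attrs.get(k) != v}


with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "a.nc"), "wb").close()
    conn = open_catalog()
    first_pass = record_new_files(conn, root)
    second_pass = record_new_files(conn, root)

print(len(first_pass), len(second_pass))  # 1 0
print(metadata_fixes({"units": "K", "title": "old"}, {"units": "K", "title": "new"}))
# {'title': 'new'}
```

A real daemon would repeat the scan pass on a schedule; the second pass returning nothing shows that files already in the catalog are not recorded again.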

Table examples - 1
• Physical Processes: Dynamics, Radiation
• Algorithmization: ProcCodeBase, TracerModel

Table examples - 2
• Composition: Components, CoupledModels
• Simulation: Experiments, Realization, OutDataFields, OutDataFiles

Data examples - 1
• Experiments

Data examples - 2
• OutDataFields
• OutDataFiles

Thanks! Questions?