Dirk Düllmann CERN Openlab storage workshop 17th March 2003

POOL Project Overview

What is POOL?
- POOL is the LCG Persistency Framework: a "pool of persistent objects for LHC".
- Started by LCG-SC2 in April 2002.
- A common effort in which the experiments take over a major share of the responsibility for defining the system architecture and for developing POOL components.
- The effort has ramped up over the last year from 1.5 to ~10 FTE.

POOL and the LCG Architecture Blueprint
- POOL is a component-based system.
- It offers a technology-neutral API made of abstract C++ interfaces.
- It is implemented by reusing existing technology:
  - ROOT I/O for object streaming: complex data, simple consistency model (write once).
  - RDBMS for consistent metadata handling: simple data, transactional consistency.
- POOL does not replace any of its component technologies; it integrates them to provide higher-level services.
- It insulates physics applications from the implementation details of the components and technologies used today.
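The slide describes POOL as abstract C++ interfaces implemented on top of existing technologies. Below is a minimal sketch of that idea, assuming hypothetical names (IObjectStore, StreamingStore, RelationalStore, makeStore) that are not POOL's real classes: client code depends only on the abstract interface, while in-memory stand-ins play the roles of the ROOT I/O and RDBMS back ends.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical technology-neutral interface: the only type client code sees.
class IObjectStore {
public:
    virtual ~IObjectStore() = default;
    // Store a serialised object and return its position in the store.
    virtual std::size_t write(const std::string& bytes) = 0;
    virtual std::string read(std::size_t position) const = 0;
    virtual std::string technology() const = 0;
};

// Stand-in for a streaming back end (ROOT I/O in POOL): append-only, write once.
class StreamingStore : public IObjectStore {
    std::vector<std::string> records_;
public:
    std::size_t write(const std::string& bytes) override {
        records_.push_back(bytes);
        return records_.size() - 1;
    }
    std::string read(std::size_t position) const override { return records_.at(position); }
    std::string technology() const override { return "streaming"; }
};

// Stand-in for an RDBMS back end: an in-memory table keyed by row id.
class RelationalStore : public IObjectStore {
    std::map<std::size_t, std::string> rows_;
    std::size_t nextKey_ = 0;
public:
    std::size_t write(const std::string& bytes) override {
        rows_[nextKey_] = bytes;
        return nextKey_++;
    }
    std::string read(std::size_t position) const override { return rows_.at(position); }
    std::string technology() const override { return "relational"; }
};

// Factory: the single place that knows about concrete technologies.
std::unique_ptr<IObjectStore> makeStore(const std::string& technology) {
    if (technology == "relational") return std::make_unique<RelationalStore>();
    return std::make_unique<StreamingStore>();
}

// "Physics" code written purely against the abstract interface.
std::string roundTrip(IObjectStore& store, const std::string& payload) {
    const std::size_t position = store.write(payload);
    return store.read(position);
}
```

Because roundTrip() only sees IObjectStore&, switching back ends changes one factory argument and no client code, which is the kind of insulation the slide refers to.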

POOL as an LCG component
- Persistency is just one of several projects in the LCG Applications Area, sharing a common architecture and software process as described in the Blueprint and Persistency RTAG documents.
- Persistency is important… but not important enough to allow uncontrolled direct dependencies, e.g. of experiment code on its implementation.
- A common effort in which the experiments take over a major share of the responsibility for defining the overall and detailed architecture and for developing POOL components.

LCG Blueprint Software Decomposition

POOL Work Package breakdown
Based on the outcome of the SC2 persistency RTAG:
- File Catalog: keeps track of files (and their physical and logical names) and their descriptions; resolves a logical file reference (FileID) into a physical file. Interface: pool::IFileCatalog
- Collections: keep track of (large) object collections and their descriptions. Interface: pool::Collection<T>
- Storage Service: streams transient C++ objects into/from storage; resolves a logical object reference into a physical object.
- Object Cache (DataService): keeps track of already read objects to speed up repeated access to the same data. Interfaces: pool::IDataSvc and pool::Ref<T>
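The Collections work package keeps track of object collections and their descriptions. A minimal sketch, assuming a hypothetical Collection<T> that stores a description plus an ordered list of object tokens and delegates loading to a caller-supplied function; it is not the real pool::Collection<T> interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object collection: a description plus ordered object tokens.
// T is the type each token resolves to when loaded.
template <typename T>
class Collection {
public:
    explicit Collection(std::string description) : description_(std::move(description)) {}

    void add(std::string token) { tokens_.push_back(std::move(token)); }
    const std::string& description() const { return description_; }
    std::size_t size() const { return tokens_.size(); }

    // Visit every member in order; 'load' turns a token into an object
    // (in POOL this would go through the storage service).
    template <typename Loader, typename Visitor>
    void forEach(Loader load, Visitor visit) const {
        for (const auto& token : tokens_) {
            T object = load(token);
            visit(object);
        }
    }

private:
    std::string description_;
    std::vector<std::string> tokens_;
};
```

Keeping only tokens in the collection means a large collection can be described and iterated without holding all objects in memory at once.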

POOL Internal Organisation

POOL and the Grid
- The Grid mostly deals with data at file-level granularity.
- The File Catalog connects POOL to Grid resources, e.g. via our EDG-RLS back end.
- The POOL Storage Service deals with intra-file structure and needs a connection via standard Grid file access.
- Both file-based and object-based collections are seen as important end-user concepts; POOL offers a consistent interface to both types.
- We still need to understand to what extent these can be provided in a Grid environment.

How does POOL fit into the environment?
- POOL will mainly be used from experiment frameworks, mostly as a client library loaded by the user application.
- Production manager: creates and maintains shared file catalogs and (event) collections, e.g. adds the catalog fragment for new simulation data to the published analysis catalog.
- End user: uses shared collections, e.g. iterates over collection X.
[Diagram: a POOL client on a CPU node (user application, experiment framework, POOL) and the services around it: experiment DB services (bookkeeping, production workflow), RDBMS services (collection description, collection location, collection access), and Grid (file) services (replica location, file description, remote file I/O); remote access via ROOT I/O.]

POOL File Catalog
- POOL uses a GUID implementation for the FileID: a unique and immutable identifier for a file, generated at creation time.
- This allows sets of files with internal references to be produced without requiring a central ID allocation service.
- Catalog fragments created independently can later be merged without modifying the data files.
- Object lookup is based only on the FileID-to-physical-filename mapping (the right-hand "Object Lookup" box of the diagram); logical filenames are supported but not required.
[Diagram: "Logical Naming" (logical filenames to FileID) and "Object Lookup" (FileID to physical filenames).]
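The catalog slide relies on GUID FileIDs so that independently created catalog fragments can be merged without renumbering. The sketch below assumes hypothetical FileCatalog and makeGuid names and formats random numbers in the RFC 4122 version-4 UUID layout; a production system would use a proper UUID generator rather than a seeded std::mt19937_64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <random>
#include <set>
#include <string>

// Random GUID in the version-4 UUID text layout; any node can create one
// without asking a central ID allocation service. Illustrative only.
std::string makeGuid(std::mt19937_64& rng) {
    std::uint64_t hi = rng(), lo = rng();
    hi = (hi & 0xFFFFFFFFFFFF0FFFULL) | 0x0000000000004000ULL;  // version nibble = 4
    lo = (lo & 0x3FFFFFFFFFFFFFFFULL) | 0x8000000000000000ULL;  // RFC 4122 variant bits
    char buf[37];
    std::snprintf(buf, sizeof buf, "%08X-%04X-%04X-%04X-%012llX",
                  static_cast<unsigned>(hi >> 32), static_cast<unsigned>((hi >> 16) & 0xFFFF),
                  static_cast<unsigned>(hi & 0xFFFF), static_cast<unsigned>(lo >> 48),
                  static_cast<unsigned long long>(lo & 0xFFFFFFFFFFFFULL));
    return buf;
}

struct CatalogEntry {
    std::set<std::string> pfns;  // physical replicas
    std::set<std::string> lfns;  // optional logical names
};

class FileCatalog {
public:
    std::string registerFile(std::mt19937_64& rng, const std::string& pfn) {
        const std::string id = makeGuid(rng);
        entries_[id].pfns.insert(pfn);
        return id;
    }
    void addLogicalName(const std::string& id, const std::string& lfn) {
        entries_.at(id).lfns.insert(lfn);
    }
    // Object lookup needs only FileID -> PFN; logical names are never consulted.
    std::string lookupPfn(const std::string& id) const {
        const auto& pfns = entries_.at(id).pfns;
        return pfns.empty() ? std::string() : *pfns.begin();
    }
    // Unique FileIDs mean fragments merge without renumbering, so references
    // stored inside data files stay valid.
    void merge(const FileCatalog& fragment) {
        for (const auto& [id, entry] : fragment.entries_) {
            auto& mine = entries_[id];
            mine.pfns.insert(entry.pfns.begin(), entry.pfns.end());
            mine.lfns.insert(entry.lfns.begin(), entry.lfns.end());
        }
    }
    std::size_t size() const { return entries_.size(); }

private:
    std::map<std::string, CatalogEntry> entries_;
};
```

If both fragments already contain the same FileID (e.g. two replicas of one file), merge() simply unions the physical and logical names.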

Use Case: Working in Isolation
- The user extracts a set of interesting files, and a catalog fragment describing them, from a (central) Grid-based catalog into a local (e.g. XML-based) catalog. The selection is performed based on file or collection descriptions.
- After disconnecting from the Grid, the user executes some standard jobs navigating through the extracted data. New output files are registered in the local catalog.
- Once the new data is ready for publishing and the user is connected again, the new catalog fragment is submitted to the Grid-based catalog.
[Diagram: Extraction (Grid file catalog & descriptions, Grid file storage to local file catalog and local files), Local Processing (new files), Result Publishing (new catalog fragment & descriptions).]
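The isolation use case has three steps: extract a fragment selected by description, process locally, then publish the new entries. A minimal in-memory sketch, assuming a hypothetical Catalog type keyed by FileID (short strings stand in for GUIDs) with free-form description attributes; the real local and Grid catalogs would be XML files or remote services.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical minimal catalog: FileID -> (PFN, description attributes).
struct FileRecord {
    std::string pfn;
    std::map<std::string, std::string> description;  // e.g. {"dataset", "minbias"}
};
using Catalog = std::map<std::string, FileRecord>;

// Step 1: extract the fragment whose description matches a selection.
Catalog extract(const Catalog& grid, const std::string& key, const std::string& value) {
    Catalog local;
    for (const auto& [id, record] : grid) {
        auto it = record.description.find(key);
        if (it != record.description.end() && it->second == value) local.emplace(id, record);
    }
    return local;
}

// Step 3: publish the local fragment; entries already known to the Grid catalog
// are left untouched, so only newly produced files are added.
std::size_t publish(Catalog& grid, const Catalog& local) {
    std::size_t added = 0;
    for (const auto& [id, record] : local) added += grid.emplace(id, record).second ? 1 : 0;
    return added;
}
```

Step 2 (local processing) only adds entries to the local Catalog; no Grid connection is needed until publish() runs.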

Use Case: Farm Production
- The production manager may pre-register output files with the catalog (e.g. a "local" MySQL or XML catalog): FileID, physical filename, job ID and optionally also logical filenames.
- A production job runs and creates files and their catalog entries locally.
- During production, the catalog can be used to clean up files (and their registrations) from unsuccessful jobs, based on their associated job ID.
- Once the data quality checks have been passed, the production manager decides to publish the production catalog fragment to the Grid-based catalog.
[Diagram: production nodes 1 to n, each with a local file catalog and local files; post-processing; result publishing of the new files and catalog fragment to the Grid catalog and Grid file storage.]
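In the farm production use case, the job ID recorded at pre-registration is what makes cleanup possible. A sketch assuming a hypothetical ProductionCatalog and publishFragment(); it only removes catalog entries, and deleting the corresponding physical files is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

struct ProductionEntry {
    std::string pfn;
    std::string jobId;
    std::string lfn;  // optional, may stay empty
};

class ProductionCatalog {
public:
    // Pre-registration by the production manager, keyed by FileID.
    void preRegister(const std::string& fileId, const ProductionEntry& entry) {
        entries_[fileId] = entry;
    }
    // Drop the registrations of all files produced by failed jobs.
    std::size_t cleanupFailed(const std::set<std::string>& failedJobs) {
        std::size_t removed = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (failedJobs.count(it->second.jobId)) {
                it = entries_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }
    const std::map<std::string, ProductionEntry>& entries() const { return entries_; }

private:
    std::map<std::string, ProductionEntry> entries_;
};

// Publish the fragment to the (here in-memory) Grid catalog only after QA passed.
bool publishFragment(std::map<std::string, ProductionEntry>& grid,
                     const ProductionCatalog& local, bool qualityOk) {
    if (!qualityOk) return false;
    grid.insert(local.entries().begin(), local.entries().end());
    return true;
}
```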

POOL Storage Hierarchy
- An application may access databases (e.g. ROOT files) from a set of catalogs.
- Each database has containers of one specific technology (e.g. ROOT trees).
- Smart pointers are used to transparently load objects into a client-side cache and to define object associations across file or technology boundaries.
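The storage hierarchy (database, container of one technology, object) can be illustrated by resolving a token that names each level. The Token, Database and Container types below are hypothetical and keep serialised objects as strings; POOL's actual object tokens and storage layout are not reproduced here, and the FileID-to-file step is assumed to have been done by the file catalog.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

enum class Technology { RootTree, RootKeys, Relational };

// A container holds objects of exactly one technology (e.g. a ROOT tree).
struct Container {
    Technology technology;
    std::vector<std::string> objects;  // serialised objects, addressed by entry index
};

// A database (e.g. one ROOT file), identified by its FileID, holds named containers.
struct Database {
    std::map<std::string, Container> containers;
};

// Hypothetical token: just enough information to locate one object.
struct Token {
    std::string fileId;
    std::string container;
    std::size_t entry;
};

// Walk database -> container -> object; unknown names throw std::out_of_range.
const std::string& resolve(const std::map<std::string, Database>& databases, const Token& token) {
    const Container& c = databases.at(token.fileId).containers.at(token.container);
    if (token.entry >= c.objects.size()) throw std::out_of_range("no such entry");
    return c.objects[token.entry];
}
```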

Client Data Access
[Diagram: the client accesses objects through Ref<T>, which goes through the Data Service and its data cache.]
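The client-side access path goes from Ref<T> through the data service and its cache. A minimal sketch of lazy loading, assuming hypothetical DataService and Ref classes (not POOL's pool::Ref<T> or pool::IDataSvc): a Ref holds only a token until its first dereference, and the service reads each token from storage at most once.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical data service: caches type-erased objects by token.
class DataService {
public:
    using Reader = std::function<std::shared_ptr<void>(const std::string&)>;
    explicit DataService(Reader reader) : reader_(std::move(reader)) {}

    // Return the cached object, calling the reader only on first access.
    std::shared_ptr<void> load(const std::string& token) {
        auto& slot = cache_[token];
        if (!slot) slot = reader_(token);
        return slot;
    }

private:
    Reader reader_;
    std::map<std::string, std::shared_ptr<void>> cache_;
};

// Minimal Ref<T>-like smart pointer: stores a token until dereferenced.
// The reader must really produce a T for this token (static cast, no check).
template <typename T>
class Ref {
public:
    Ref(DataService& service, std::string token) : service_(&service), token_(std::move(token)) {}
    T* operator->() const { return get(); }
    T& operator*() const { return *get(); }
    bool isLoaded() const { return static_cast<bool>(object_); }

private:
    T* get() const {
        if (!object_) object_ = std::static_pointer_cast<T>(service_->load(token_));
        return object_.get();
    }
    DataService* service_;
    std::string token_;
    mutable std::shared_ptr<T> object_;
};
```

Two Refs with the same token share one cached object, which is how repeated navigation to the same data avoids a second read.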

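Dictionary: Population/Conversion
[Diagram: C++ headers (.h) are processed by ROOTCINT into CINT dictionary code; headers are also processed by GCC-XML into .xml, from which a code generator produces LCG dictionary code. A gateway connects the LCG dictionary and the CINT dictionary; the CINT dictionary is used for data I/O, and the LCG dictionary's reflection serves other clients.]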
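The dictionary slide is about generating type information (via ROOTCINT or GCC-XML) so that generic I/O code can handle user classes it was never compiled against. Below is a much-simplified, hand-written stand-in for such a dictionary: member names, type codes and byte offsets, walked by a generic function that knows nothing about the concrete class. MemberInfo, ClassInfo, hitDict and describe() are hypothetical and are not the CINT or LCG dictionary APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal dictionary entries (real ones are generated from headers).
enum class MemberType { Double, Int };
struct MemberInfo { std::string name; MemberType type; std::size_t offset; };
struct ClassInfo { std::string name; std::vector<MemberInfo> members; };

// Example user class; standard layout, so offsetof is well-defined.
struct Hit { double x; double y; int channel; };

const ClassInfo hitDict{"Hit", {{"x", MemberType::Double, offsetof(Hit, x)},
                                {"y", MemberType::Double, offsetof(Hit, y)},
                                {"channel", MemberType::Int, offsetof(Hit, channel)}}};

// Generic "streamer": formats any object whose layout the dictionary describes.
std::string describe(const void* object, const ClassInfo& cls) {
    std::ostringstream out;
    out << cls.name << '{';
    const char* base = static_cast<const char*>(object);
    for (std::size_t i = 0; i < cls.members.size(); ++i) {
        const MemberInfo& m = cls.members[i];
        if (i) out << ' ';
        out << m.name << '=';
        if (m.type == MemberType::Double) {
            double v;
            std::memcpy(&v, base + m.offset, sizeof v);
            out << v;
        } else {
            int v;
            std::memcpy(&v, base + m.offset, sizeof v);
            out << v;
        }
    }
    out << '}';
    return out.str();
}
```

For Hit h{1.5, 2.0, 7}, describe(&h, hitDict) yields "Hit{x=1.5 y=2 channel=7}"; a real streamer would write bytes instead of text, but would be driven by the dictionary in the same way.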

Project Status & Plans
- The first four POOL releases delivered the planned functionality on time.
- The schedule so far has been aggressive and focused on adding functionality; no consistent attempt at performance optimisation has been made yet.
- Functionally complete (LCG-1 feature set) POOL V1.0 release scheduled for April: several functional extensions compared to V0.4; automated system tests are being
- Bug-fix and performance release POOL V1.1 in June, aiming to be ready for first deployment together with the LCG-1 environment. Will release
- Work on a proof-of-concept re-implementation of the storage service based on an RDBMS back end is starting.

Summary
- The LCG POOL project provides a hybrid store integrating object streaming (e.g. ROOT I/O) with RDBMS technology (e.g. MySQL) for consistent metadata handling.
- Strong emphasis on component decoupling and well-defined communication and dependencies.
- Transparent cross-file and cross-technology object navigation via C++ smart pointers.
- Integration with Grid technology (via EDG-RLS) while preserving networked and Grid-decoupled working modes.
- The next two releases (V1.0: functionality; V1.1: reliability & performance) will be crucial for POOL acceptance.
- We need tight coupling with the experiments' development and production teams to validate the feature set.
- We assume tight integration with LCG deployment activities.

How to find out more about POOL?
- POOL home page: http://lcgapp.cern.ch/project/persist/
- POOL Savannah portal: http://lcgappdev.cern.ch/savannah/projects/pool