POOL & ARDA / EGEE POOL Plans for 2004 ARDA / EGEE integration Dirk Düllmann, IT-DB & LCG-POOL LCG workshop, 24 March 2004.

Slides:



Advertisements
Similar presentations
Object Persistency & Data Handling Session C - Summary Object Persistency & Data Handling Session C - Summary Dirk Duellmann.
Advertisements

Data Management Expert Panel. RLS Globus-EDG Replica Location Service u Joint Design in the form of the Giggle architecture u Reference Implementation.
Database System Concepts and Architecture
Database Architectures and the Web
1 User Analysis Workgroup Update  All four experiments gave input by mid December  ALICE by document and links  Very independent.
The POOL Persistency Framework POOL Summary and Plans.
RLS Production Services Maria Girone PPARC-LCG, CERN LCG-POOL and IT-DB Physics Services 10 th GridPP Meeting, CERN, 3 rd June What is the RLS -
D. Düllmann - IT/DB LCG - POOL Project1 POOL Release Plan for 2003 Dirk Düllmann LCG Application Area Meeting, 5 th March 2003.
Technical Architectures
GGF Toronto Spitfire A Relational DB Service for the Grid Peter Z. Kunszt European DataGrid Data Management CERN Database Group.
POOL Project Status GridPP 10 th Collaboration Meeting Radovan Chytracek CERN IT/DB, GridPP, LCG AA.
Chapter 1 Introduction to Databases
Web Application Architecture: multi-tier (2-tier, 3-tier) & mvc
Objectives of the Lecture :
MDC Open Information Model West Virginia University CS486 Presentation Feb 18, 2000 Lijian Liu (OIM:
Don Quijote Data Management for the ATLAS Automatic Production System Miguel Branco – CERN ATC
Database System Concepts and Architecture
EGEE is a project funded by the European Union under contract IST Testing processes Leanne Guy Testing activity manager JRA1 All hands meeting,
Scalable Web Server on Heterogeneous Cluster CHEN Ge.
Session-8 Data Management for Decision Support
1 Some initial Design suggestions… Getting started… where to begin? Find out whether your design architecture will work… as soon as possible. If you need.
Frontiers in Massive Data Analysis Chapter 3.  Difficult to include data from multiple sources  Each organization develops a unique way of representing.
ATLAS Detector Description Database Vakho Tsulaia University of Pittsburgh 3D workshop, CERN 14-Dec-2004.
3D Workshop Outline & Goals Dirk Düllmann, CERN IT More details at
Distribution and components. 2 What is the problem? Enterprise computing is Large scale & complex: It supports large scale and complex organisations Spanning.
Replica Management Services in the European DataGrid Project Work Package 2 European DataGrid.
POOL Status and Plans Dirk Düllmann IT-DB & LCG-POOL Application Area Meeting 10 th March 2004.
Feedback from the POOL Project User Feedback from the POOL Project Dirk Düllmann, LCG-POOL LCG Application Area Internal Review October 2003.
DATABASE MANAGEMENT SYSTEM ARCHITECTURE
Metadata Mòrag Burgon-Lyon University of Glasgow.
David Adams ATLAS DIAL/ADA JDL and catalogs David Adams BNL December 4, 2003 ATLAS software workshop Production session CERN.
The POOL Persistency Framework POOL Project Review Introduction & Overview Dirk Düllmann, IT-DB & LCG-POOL LCG Application Area Internal Review October.
From Digital Objects to Content across eInfrastructures Content and Storage Management in gCube Pasquale Pagano CNR –ISTI on behalf of Heiko Schuldt Dept.
DGC Paris WP2 Summary of Discussions and Plans Peter Z. Kunszt And the WP2 team.
D. Duellmann - IT/DB LCG - POOL Project1 The LCG Pool Project and ROOT I/O Dirk Duellmann What is Pool? Component Breakdown Status and Plans.
Some Ideas for a Revised Requirement List Dirk Duellmann.
Object storage and object interoperability
Data Placement Intro Dirk Duellmann WLCG TEG Workshop Amsterdam 24. Jan 2012.
David Adams ATLAS ATLAS-ARDA strategy and priorities David Adams BNL October 21, 2004 ARDA Workshop.
ATLAS-specific functionality in Ganga - Requirements for distributed analysis - ATLAS considerations - DIAL submission from Ganga - Graphical interfaces.
D. Duellmann - IT/DB LCG - POOL Project1 The LCG Dictionary and POOL Dirk Duellmann.
Overview of C/C++ DB APIs Dirk Düllmann, IT-ADC Database Workshop for LHC developers 27 January, 2005.
Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.
Site Services and Policies Summary Dirk Düllmann, CERN IT More details at
David Adams ATLAS ATLAS Distributed Analysis (ADA) David Adams BNL December 5, 2003 ATLAS software workshop CERN.
1 Configuration Database David Forrest University of Glasgow RAL :: 31 May 2009.
1 Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009.
LHCC Referees Meeting – 28 June LCG-2 Data Management Planning Ian Bird LHCC Referees Meeting 28 th June 2004.
D. Duellmann, IT-DB POOL Status1 POOL Persistency Framework - Status after a first year of development Dirk Düllmann, IT-DB.
David Adams ATLAS ADA: ATLAS Distributed Analysis David Adams BNL December 15, 2003 PPDG Collaboration Meeting LBL.
A quick summary and some ideas for the 2005 work plan Dirk Düllmann, CERN IT More details at
E-commerce Architecture Ayşe Başar Bener. Client Server Architecture E-commerce is based on client/ server architecture –Client processes requesting service.
1 LCG Distributed Deployment of Databases A Project Proposal Dirk Düllmann LCG PEB 20th July 2004.
OGSA-DAI.
System Software Laboratory Databases and the Grid by Paul Watson University of Newcastle Grid Computing: Making the Global Infrastructure a Reality June.
(on behalf of the POOL team)
Open Source distributed document DB for an enterprise
3D Application Tests Application test proposals
Distribution and components
POOL persistency framework for LHC
Dirk Düllmann CERN Openlab storage workshop 17th March 2003
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
Database Management System (DBMS)
POOL/RLS Experience Current CMS Data Challenges shows clear problems wrt to the use of RLS Partially due to the normal “learning curve” on all sides in.
Data, Databases, and DBMSs
Grid Data Integration In the CMS Experiment
Lecture 1: Multi-tier Architecture Overview
SISAI STATISTICAL INFORMATION SYSTEMS ARCHITECTURE AND INTEGRATION
Database System Concepts and Architecture
Presentation transcript:

POOL & ARDA / EGEE POOL Plans for 2004 ARDA / EGEE integration Dirk Düllmann, IT-DB & LCG-POOL LCG workshop, 24 March 2004

POOL & ARDA / EGEED.Duellmann2 POOL Focus in 2004 Stabilise POOL s/w productsStabilise POOL s/w products –Focus on performance improvements rather than large functionality changes –In line with the experiments plans for the data challenges Help to simplify the integration into experiment frameworksHelp to simplify the integration into experiment frameworks –Tight coupling between POOL and experiment development and production teams –Automated schema loading, usability tools, documentation improvements Production release of Conditions DBProduction release of Conditions DB –After a initial interface consolidation round Achieve POOL independence of the RDBMS backendAchieve POOL independence of the RDBMS backend –And extend the set of supported RDBMS systems

POOL & ARDA / EGEED.Duellmann Work Plan Draft document has been discussed in POOL and the Architect ForumDraft document has been discussed in POOL and the Architect Forum – –Based on WP work plans and experiment priorities –No significant objections received Hope to finalize the plan soon and present it to PEB and CS2Hope to finalize the plan soon and present it to PEB and CS2 –Significant overlap between the different experiment requests –Thanks to all experiments for their concrete and detailed input

POOL & ARDA / EGEED.Duellmann4 POOL interest in ARDA/EGEE POOL will be one major client of ARDA/EGEE servicesPOOL will be one major client of ARDA/EGEE services –ARDA will be a framework for running POOL based (analysis) tasks –Currently POOL is mainly used in production activities POOL will need to address analysis area this year –POOL needs to be able to integrate with middleware (EGEE) provided services Mainly catalogs, file/database access, meta data POOL will align with the main ARDA concepts and implementationsPOOL will align with the main ARDA concepts and implementations In particular Collections, Filesets and File

POOL & ARDA / EGEED.Duellmann5 ARDA/EGEE interest in POOL? ARDA/EGEE interest in POOL? POOL provides deployment models, interfaces and implementations as input to ARDAPOOL provides deployment models, interfaces and implementations as input to ARDA –Eg model for cross population between implementation using catalog meta data queries POOL file catalog may be seen as an implementation of file catalogsPOOL file catalog may be seen as an implementation of file catalogs –See first successful use of POOL file catalogs in production activities POOL use of cascading catalogs seems to be well accepted and should be considered is input to ARDA/EGEE –Also see first performance problems and design traps for backend services (eg the current RLS - see slides further on) A consistent distribution (extraction and publishing) mechanism will be required for all services which keep/provide meta dataA consistent distribution (extraction and publishing) mechanism will be required for all services which keep/provide meta data –This should be designed into the various catalogs, conditions db, … Ie support for meta data based query and bulk update from eg XML –Could greatly profit from a single basic model and not a different treatment per meta data service

POOL & ARDA / EGEED.Duellmann6 POOL Collections Role of event level collections and meta data in a grid environment needs clarification and prototypingRole of event level collections and meta data in a grid environment needs clarification and prototyping –Expect active collaboration with ARDA to come up with a model for deploying collections in production and analysis environments –Integration of POOL collections with experiment frameworks just starting –Still many open questions about requirements Need a collection catalog (with associated meta data) –Re-use of file catalog interface and implementations? –Or are collections/datasets just more general files? Consistent catalog & meta data distribution – how ? –XML on application level? Directly between backend databases? What collection meta level data needs to be kept? Collection implementation in POOL is only a first stepCollection implementation in POOL is only a first step –The real issue is not the implementation but a conceptual model which fits with the GRID provided services –Need active experiment involvement (and some agreement) in this area How do POOL collections tie in with ARDA and EGEE?How do POOL collections tie in with ARDA and EGEE?

POOL & ARDA / EGEED.Duellmann7 Collections as End-to-end Services End-to-end services in a layered systemEnd-to-end services in a layered system May involve several concepts/services Each of which may “live” on a different layer Example: POOL CollectionsExample: POOL Collections POOL (Event) Collection - end user concept –Access to collections of POOL objects –Integrated with PROOF analysis back end Fileset – ARDA layer concept? –Access to abstract data from a set of closely related files File – EGEE layer concept? –Access to a unstructured set of bytes in file somewhere on the grid

POOL & ARDA / EGEED.Duellmann8 Collections & ARDA Joined Work Package between POOL and ARDAJoined Work Package between POOL and ARDA –Still some uncertainties concerning the ARDA side of the work plan –Will continue work to address the outstanding issues on the POOL side –POOL has asked experiments for principal contacts in this area Collection cataloguing, extraction and publishing toolsCollection cataloguing, extraction and publishing tools –Can we achieve a single baseline model for distributed meta data catalogs? File Catalog, Collection Catalog, Conditions Folder Catalog One basic mechanism of data exchange across RDBMS vendor boundaries based on the POOL relation abstraction layer? Separation of logical and physical collection identificationSeparation of logical and physical collection identification –Introduce a Collection (Fileset?) catalog –First implementation could simply be based on the existing File Catalog components, but as a separate service Integrate POOL collections with ARDA provided servicesIntegrate POOL collections with ARDA provided services –Migrate to ARDA/EGEE provided catalog and meta data(?) services

POOL & ARDA / EGEED.Duellmann9 POOL Storage Manager Plans Optimisation required in several areasOptimisation required in several areas –Client side resource usage - memory, CPU, file handles (Q2) –Mass storage handling (minimise costly requests) (Q3) –“Transparent” double to float mapping (Q3) Automated schema loading (Q2)Automated schema loading (Q2) –Based on SEAL service –In cooperation with ROOT team to allow late integration of data types for already open files Bug fixes - more complex casesBug fixes - more complex cases Eg std containers with user defined allocators which define local data – aka CLHEP Matrix (Q1) RDBMS backend based on the RDBMS Abstraction LayerRDBMS backend based on the RDBMS Abstraction Layer –Storage of simple data structures into a RDBMS via the same interface as for objects stored on the streaming layer –Two step plan: First allow for objects which can trivially be mapped to SQL tables (Q2/Q3) Possibly later an extension to more complex C++ objects (Q1 ‘05?)

POOL & ARDA / EGEED.Duellmann10 POOL and Grid Files POOL needs to access data from the gridPOOL needs to access data from the grid –POOL rarely deals with files directly - almost all I/O is delegated to the storage backend (ROOT or RDBMS) –Open problem of integration with storage element interface How to access data from a file somewhere in the grid –Without hardcoding to the (mass) storage backend Currently POOL can handle any files which ROOT can open SRM integration and data access to files on LCG storage elements needs to be providedSRM integration and data access to files on LCG storage elements needs to be provided –We share problem with vanilla ROOT Proposal: work together with ROOT team to make sure that the LCG SE integration into ROOT fulfills both and POOL needsProposal: work together with ROOT team to make sure that the LCG SE integration into ROOT fulfills both and POOL needs

POOL & ARDA / EGEED.Duellmann11 POOL and RDBMS Today POOL stores all user data in ROOT filesToday POOL stores all user data in ROOT files During this year we plan to provide storage of objects also into relation database back endsDuring this year we plan to provide storage of objects also into relation database back ends –Required for frequently updated meta data, complex server side query support Will need to register backend database connections with associated meta data in a very similar way as filesWill need to register backend database connections with associated meta data in a very similar way as files In the file catalog? In another dedicated database catalog? Other grid services may require access to registered logical/physical database connections as wellOther grid services may require access to registered logical/physical database connections as well –Is there a need for another catalog?

POOL & ARDA / EGEED.Duellmann12 POOL File Catalog Significant developments completed already (Q1)Significant developments completed already (Q1) –Support for LCG-2 (V1.5) –Support for Composite Catalogs (V1.6) File Catalog as model for handling and exchanging data could be a prototype for other (very similar) meta data catalogs (Q2/Q3)File Catalog as model for handling and exchanging data could be a prototype for other (very similar) meta data catalogs (Q2/Q3) –Collection catalog and Collection entries –Condition Folder catalog and Condition Data Cataloguing, extraction based on meta data, publishing are all very similarCataloguing, extraction based on meta data, publishing are all very similar –Even the catalog component implementation could be factorised out and shared –Performance of XML as exchange format for larger data amounts needs to be evaluated

POOL & ARDA / EGEED.Duellmann13 POOL/RLS Experience Current CMS Data Challenges shows clear problems wrt to the use of RLS Partially due to the normal “learning curve” on all sides in using a new systemsPartially due to the normal “learning curve” on all sides in using a new systems Some reasons areSome reasons are –Not yet fully optimised service –Inefficient use of the query facilities POOL and RLS service people works closely with production teams to understand their issuesPOOL and RLS service people works closely with production teams to understand their issues –Which queries are needed? –How to structure the meta data? –Which catalog interface? –Which indices?

POOL & ARDA / EGEED.Duellmann14 More POOL/RLS Experience But poor performance also due to known RLS design problems!But poor performance also due to known RLS design problems! File names and related meta data are used for queriesFile names and related meta data are used for queries –Current RLS split of mapping data from file meta data (LRC vs. RMC) results in rather poor performance for combined queries –Forces the applications (eg POOL) to perform large joins on the client side rather than fully exploit the database backend Many catalog operations are bulk operationsMany catalog operations are bulk operations –Current RLS interface is very low level and results in large overheads on bulk operations (too many network round-trips) Transaction support would greatly simplify the deploymentTransaction support would greatly simplify the deployment –A partially successful bulk insert/update requires recovery “by hand” These are not really special requirements imposed by POOLThese are not really special requirements imposed by POOL –Still acceptable performance and scalability needs a catalog design which keeps the data which is used in one query close to each other –Try to work around some of this know issues on the POOL side

POOL & ARDA / EGEED.Duellmann15Summary POOL Focus for 2004POOL Focus for 2004 –Consolidation and Optimisation –RDBMS vendor independence –Common model for distributed, heterogeneous meta data catalogs –ConditionsDB production release and integration with POOL POOL will be a major ARDA/EGEE clientPOOL will be a major ARDA/EGEE client –Needs to stay aligned with ARDA concepts and EGEE services –Provider of persistent object storage and collections Joint work package between POOL and ARDA in particular in the Collections areaJoint work package between POOL and ARDA in particular in the Collections area –Need more active experiment involvement Gaining valuable real life (data challenge) experience with POOL/RLS as input for next roundGaining valuable real life (data challenge) experience with POOL/RLS as input for next round –Produces concrete experiment requirements as input to ARDA/EGEE –POOL may be able to workaround some of the RLS design problems A real solution will be required from ARDA/EGGE to achieve the performance and scalability goalsA real solution will be required from ARDA/EGGE to achieve the performance and scalability goals

POOL & ARDA / EGEED.Duellmann16 Input for a next software generation Catalogs of “things” annotated with their meta data exist all over the systemCatalogs of “things” annotated with their meta data exist all over the system –These catalogs services could/should share the implementation and distribution mechanism Separation of catalog mapping data from associated meta data makes meta data almost uselessSeparation of catalog mapping data from associated meta data makes meta data almost useless –Efficient queries require that mapping and meta data are handled by (in!) one same database backend Higher level interface for bulk insert and bulk query is requiredHigher level interface for bulk insert and bulk query is required –The current use of SOAP RPC call for each individual data entry will not scale to larger productions Transaction concept is required for a maintainable stable production environmentTransaction concept is required for a maintainable stable production environment –User transactions may span span several services!