1 LCG Distributed Deployment of Databases: A Project Proposal
Dirk Düllmann, LCG PEB, 20 July 2004

2 Why an LCG Database Deployment Project?
- What's missing?
  - LCG provides an infrastructure for distributed access to file-based data and for data replication
  - Physics applications (and Grid services) require a similar infrastructure for data stored in relational databases
    - Several applications and services already use an RDBMS
    - Several sites already have experience in providing RDBMS services
- Need for some standardisation as part of LCG
  - To allow applications to access data in a consistent, location-independent way
  - To allow existing database services to be connected via data replication mechanisms
  - To simplify shared deployment and administration of this infrastructure during 24x7 operation
  - To increase the availability and scalability of the total LCG system
- Need to bring service providers (site technology experts) closer to database users/developers to define an LCG database service for the upcoming data challenges in 2005

3 Project Goals
- Define distributed database services and application access, allowing LCG applications and services to find relevant database back-ends, authenticate and use the provided data in a location-independent way.
- Help to avoid the costly parallel development of data distribution, backup and high-availability mechanisms in each experiment or grid site, in order to limit the support costs.
- Enable a distributed deployment of an LCG database infrastructure with a minimal number of LCG database administration personnel.

4 Project Non-Goals
- Store all database data
  - Experiments are free to deploy databases and replicate data under their own responsibility
- Set up a single monolithic distributed database system
  - Given constraints such as WAN connections, one cannot assume that a single synchronously updated database would work or give sufficient availability
- Set up a single-vendor system
  - Technology independence and a multi-vendor implementation will be required to minimize the long-term risks and to adapt to the different requirements and constraints on different tiers
- Impose a CERN-centric infrastructure on participating sites
  - CERN is an equal partner among the LCG sites at each tier
- Decide on an architecture, implementation, new services or policies
  - The project will instead produce a technical proposal on all of these to the LCG PEB/GDB

5 Database Service Situation at LCG Sites
- Several sites run Oracle-based services for HEP and non-HEP applications
  - Deployment experience and procedures exist
  - ... and cannot be changed easily without affecting other site activities
- MySQL is very popular in the developer community
  - Expected (perhaps only by developers?) to be easy to deploy down to T2 sites, where database administration resources are very limited
  - So far no larger-scale production service exists at LCG sites, but many applications are bound to MySQL
- Expect a significant role for both database flavors
  - In different areas of the LCG infrastructure

6 Situation on the Application Side
- Databases are used by many applications in the physics production chain
- Currently many of these applications are centralized ones
  - One (or more) copies of an application run against a single database
  - Many of these applications expect to move to a distributed model for scalability and availability reasons
  - This move can be simplified by an LCG database replication infrastructure, but it does not happen by magic
- The choice of database vendor is often made by the application developers
  - Not necessarily with the full deployment environment in mind yet
  - Need to continue to make central applications vendor neutral
- DB abstraction layers exist or are being implemented in many foundation libraries
  - ROOT, POOL, OGSA-DAI, ODBC and JDBC are steps in this direction (a minimal access sketch follows this slide)
  - The degree of abstraction achieved varies
- Still many applications are only available for one vendor
  - Or have significant schema differences which prevent direct DB-to-DB replication
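
As an illustration of the vendor neutrality these abstraction layers aim for, the following is a minimal Python sketch of hiding the back-end choice behind one common call interface (Python's DB-API here, standing in for what POOL/RAL, ODBC or JDBC provide for C++ and Java applications). The table name, connection parameters and driver choices are assumptions made only for this example.

```python
# Minimal sketch of vendor-neutral database access behind a common API
# (Python's DB-API used purely as an illustration; POOL/RAL, ODBC and
# JDBC play the analogous role for C++/Java applications).
# The table name "conditions" and all connection parameters are hypothetical.

def connect(vendor, **params):
    """Return a DB-API connection for the requested back-end."""
    if vendor == "oracle":
        import cx_Oracle                       # Oracle client binding
        return cx_Oracle.connect(params["user"], params["password"], params["dsn"])
    if vendor == "mysql":
        import MySQLdb                         # MySQL client binding
        return MySQLdb.connect(host=params["host"], user=params["user"],
                               passwd=params["password"], db=params["db"])
    if vendor == "sqlite":
        import sqlite3                         # file-based local cache
        return sqlite3.connect(params["file"])
    raise ValueError("unsupported back-end: %s" % vendor)

def read_conditions(conn, run_number):
    """Application code written once against the common cursor interface."""
    cur = conn.cursor()
    cur.execute("SELECT payload FROM conditions WHERE run = %d" % int(run_number))
    return cur.fetchall()
```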

7 Application s/w and Distribution Options
[Diagram: client software and applications (APP) access, via RAL or vendor-specific APIs and over the network, Oracle and MySQL database and cache servers, web caches, database file storage and local SQLite files.]

8 Distribution Options and their Impact on Deployment and Apps
- DB vendor native replication
  - Requires the same (or at least a similar) schema for all applications running against replicas of the database
- Relational-abstraction-based replication
  - Requires that applications are based on an agreed mapping between the different back-ends
    - Possibly enforced by the abstraction layer
    - Otherwise by the application programmer
- Application-level replication (sketched below)
  - Requires a common API (or data exchange format) for the different implementations of one application
    - E.g. POOL file catalogs, ConditionsDB (MySQL/Oracle)
  - Free to choose the back-end database schema to exploit specific capabilities of a database vendor
    - E.g. large table partitioning in the case of the Conditions Database
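
To make the application-level option concrete, here is a minimal sketch of replication via a vendor-neutral exchange format, in the spirit of the POOL file catalogue XML interchange: the source is dumped to XML and re-imported into a target whose schema may differ. The table and column names are hypothetical and do not reflect the actual POOL schema.

```python
# Sketch of application-level replication: data is exchanged in a
# vendor-neutral format (XML here), so each side can keep a back-end
# specific schema. Table and column names are illustrative only.
import xml.etree.ElementTree as ET

def export_catalogue(conn):
    """Dump (guid, pfn) pairs from the source database to an XML tree."""
    cur = conn.cursor()
    cur.execute("SELECT guid, pfn FROM file_catalogue")
    root = ET.Element("catalogue")
    for guid, pfn in cur.fetchall():
        ET.SubElement(root, "file", guid=guid, pfn=pfn)
    return ET.ElementTree(root)

def import_catalogue(conn, tree):
    """Load the exchange data into a target database whose schema may differ."""
    cur = conn.cursor()
    for f in tree.getroot():
        # The target site is free to use a different (e.g. partitioned) layout here.
        cur.execute("INSERT INTO files_by_site (guid, pfn) VALUES ('%s', '%s')"
                    % (f.get("guid"), f.get("pfn")))
    conn.commit()
```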

9 Candidate Distribution Technologies
Vendor native distribution
- Oracle replication and related technologies
  - Table-to-table replication via asynchronous update streams
  - Transportable tablespaces
  - Little (but non-zero) impact on application design
  - Potentially extensible to other back-end databases through an API
  - Evaluations done at FNAL and CERN
- MySQL native replication
  - Little experience in HEP so far?
  - Feasible (or necessary) in the WAN? (a simple replica-lag probe is sketched below)
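
If MySQL native replication is evaluated over the WAN, a first feasibility indicator is the replica lag reported by the server. The probe below is a minimal sketch assuming the MySQLdb client binding and the standard SHOW SLAVE STATUS output; host names, credentials and the polling interval are placeholders.

```python
# Minimal replica-lag probe for a MySQL native replication evaluation.
# Assumes the MySQLdb client binding and an account with REPLICATION
# CLIENT privilege on the replica; all names below are placeholders.
import time
import MySQLdb
import MySQLdb.cursors

def replica_lag_seconds(host, user, password):
    """Return the Seconds_Behind_Master value reported by a MySQL replica."""
    conn = MySQLdb.connect(host=host, user=user, passwd=password)
    cur = conn.cursor(MySQLdb.cursors.DictCursor)
    cur.execute("SHOW SLAVE STATUS")
    status = cur.fetchone()
    conn.close()
    if status is None:
        raise RuntimeError("%s is not configured as a replica" % host)
    return status["Seconds_Behind_Master"]    # None while replication is stopped

if __name__ == "__main__":
    while True:
        lag = replica_lag_seconds("t2-mysql.example.org", "monitor", "secret")
        print("lag behind master: %s s" % lag)
        time.sleep(60)
```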

10 Local Database vs. Local Cache
- FNAL experiments deploy a combination of http-based database access with web proxy caches close to the client (a client-side sketch follows this slide)
  - Significant performance gains
    - reduced real database access for largely read-only data
    - lower transfer overhead than SOAP/RPC-based approaches
  - Web caches (e.g. squid) are much simpler to deploy than databases
    - could remove the need for a local database deployment on some tiers
    - no vendor-specific database libraries on the client side
  - "Network friendly": tunneling of requests via a single port
- Expect this technology to play a significant role, especially towards the higher tiers, which may not have the resources to maintain a reliable database service
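
A minimal sketch of the client side of such an http-based access path is shown below: the query is carried in a plain HTTP GET, so a standard squid proxy between the client and the database front-end can serve repeated read-only requests from its cache. The server URL, proxy address and query parameters are hypothetical and not the FNAL implementation.

```python
# Sketch of read-only database access through a caching web proxy.
# The query is encoded in the URL of a plain HTTP GET, so a standard
# squid proxy between client and server can answer repeated requests
# from its cache. Server URL, proxy host and parameters are hypothetical.
import json
import urllib.parse
import urllib.request

PROXY = {"http": "http://squid.example-t2.org:3128"}    # site-local squid
SERVER = "http://dbfrontend.example-t1.org/query"       # http front-end near the database

opener = urllib.request.build_opener(urllib.request.ProxyHandler(PROXY))

def cached_query(table, run):
    """Fetch a read-only slice of a table; identical URLs hit the proxy cache."""
    url = SERVER + "?" + urllib.parse.urlencode({"table": table, "run": run})
    with opener.open(url) as response:
        return json.loads(response.read().decode("utf-8"))

# Example: the first call reaches the database; later identical calls
# for the same run are typically served by the local squid cache.
# rows = cached_query("conditions", 12345)
```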

11 Tiers and Requirements
- Different requirements and service capabilities for different tiers
- Database backbone (Tier1 ↔ Tier1)
  - High volume, often complete replication of the RDBMS data
  - Can expect good, but not continuous, network connection to other T1 sites
  - Symmetric, possibly multi-master replication
  - Large-scale central database service, local DBA team
- Tier1 → Tier2
  - Medium volume, sliced extraction of data
  - Asymmetric, possibly only uni-directional replication
  - Part-time administration (shared with fabric administration)
- Tier1/2 → Tier4 (laptop extraction)
  - Support fully disconnected operation
  - Low volume, sliced extraction from T1/T2
- Need a catalog of implementation/distribution technologies
  - Each addressing parts of the problem
  - But all together forming a consistent distribution model

12 Where to Start?
- Too many options and constraints to solve the complete problem at once
- Need to start from a pragmatic model which can be implemented by 2005
- Extend this model over time to make more applications distributable more freely
  - This is an experiment and LCG Application Area activity which is strongly coupled to the replication mechanisms provided by the deployment side
- No magic solution for 2005
  - Some applications will still be bound to a single database vendor, and therefore, within LCG, to a particular tier
  - Some site-specific procedures will likely remain

13 Starting Point for a Service Architecture?
[Diagram: an autonomous Oracle database at T0 feeds a T1 database backbone of Oracle servers (all data replicated, reliable service) via Oracle Streams; cross-vendor extraction populates local MySQL database caches at T2 sites (subset of the data, local service only); files and proxy caches serve T3/4.]

14 Staged Project Evolution
- Proposal Phase 1 (in place for the 2005 data challenges)
  - Focus on the T1 backbone: understand the bulk data transfer issues
    - Given the current service situation, a T1 backbone based on Oracle with Streams-based replication seems the most promising implementation
    - Start with T1 sites which have sufficient manpower to actively participate in the project
  - Prototype vendor-independent T1 → T2 extraction based on the application level or the relational abstraction level
    - This would allow vendor-dependent database applications to run on the T2 subset of the data
  - Define a MySQL service with interested T2 sites
    - Experiments should point out their MySQL service requirements to the sites
    - Need candidate sites which are interested in providing a MySQL service and are able to actively contribute to its definition
- Proposal Phase 2
  - Try to extend the heterogeneous T2 setup to T1 sites
    - By this time real MySQL-based services should be established and reliable
    - Cross-vendor replication based on either Oracle Streams bridges or relational abstraction may have proven to work and to handle the data volumes

15 Proposed Project Structure
- Data Inventory and Distribution Requirements (WP1)
  - Members are s/w providers from the experiments and from grid services based on RDBMS data
  - Gather data properties (volume, ownership) and requirements, and integrate the provided service into their software
- Database Service Definition and Implementation (WP2)
  - Members are site technology and deployment experts
  - Propose an agreeable deployment setup and common deployment procedures
- Evaluation Tasks
  - Short, well-defined technology evaluations against the requirements delivered by WP1
  - Evaluations are proposed by WP2 (evaluation plan) and typically executed by the people proposing a technology for the service implementation; each results in a short evaluation report

16 Data Inventory
- Collect and maintain a catalog of the main RDBMS data types
  - From this, select a catalog of well-defined replication options which will be supported as part of the service
- Ask the experiments and s/w providers to fill in a simple table for each main data type which is a candidate for storage and replication via this service (an illustrative record format is sketched after this slide)
  - Basic storage properties
    - Data description, expected volume on T0/1/2 in 2005 (and its evolution)
    - Ownership model: read-only, single-user update, single-site update, concurrent update
  - Replication/caching properties
    - Replication model: site-local, all T1, sliced T1, all T2, sliced T2, ...
    - Consistency/latency: how quickly do changes need to reach other sites/tiers?
    - Application constraints: DB vendor and DB version constraints
  - Reliability and availability requirements
    - Essential for whole-grid operation, for site operation, for experiment production?
    - Backup and recovery policy: acceptable time to recover, location of backup(s)
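
One possible, purely illustrative shape for such a per-data-type record is sketched below; the field names and example values are assumptions for illustration, not an agreed format.

```python
# Illustrative record for one entry of the data inventory; field names
# and example values are placeholders, not an agreed format.
from dataclasses import dataclass, field

@dataclass
class DataTypeEntry:
    name: str                     # data description
    owner: str                    # providing experiment or grid service
    volume_gb_2005: dict          # expected volume per tier, e.g. {"T0": 100, "T1": 100, "T2": 10}
    ownership_model: str          # "read-only" | "single-user update" | "single-site update" | "concurrent update"
    replication_model: str        # "site-local" | "all T1" | "sliced T1" | "all T2" | "sliced T2" ...
    max_latency_hours: float      # how quickly changes must reach other sites/tiers
    db_constraints: list = field(default_factory=list)   # e.g. ["Oracle at T0/T1", "MySQL at T2"]
    availability: str = "experiment production"          # essential for grid / site / experiment operation
    backup_policy: str = "backup at T0, 24h acceptable time to recover"

# Example entry, purely illustrative:
conditions = DataTypeEntry(
    name="Conditions data (calibration and alignment)",
    owner="example experiment",
    volume_gb_2005={"T0": 100, "T1": 100, "T2": 10},
    ownership_model="single-site update",
    replication_model="all T1, sliced T2",
    max_latency_hours=24.0,
    db_constraints=["Oracle at T0/T1", "MySQL at T2"],
)
```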

17 DB Service Definition and Implementation
- Service discovery
  - How does a job find a replica of the database it needs? Transparent relocation of services? (a possible lookup step is sketched after this slide)
  - Connectivity, firewalls and constraints on outgoing connections
- Authentication and authorization
  - Integration between the DB vendor and LCG security models
- Installation and configuration
  - Database server and client installation kits
    - Which client bindings are required? C, C++, Java (JDBC), Perl, ...
  - Server administration procedures and tools?
    - Even basic agreements would simplify the distributed operation
  - Server and client version upgrades (e.g. security patches)
    - How, if transparency is required for high availability?
- Backup and recovery
  - Backup policy templates; responsible site(s) for a particular data type?
  - Acceptable latency for recovery?
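
To illustrate what the service-discovery step might look like from a job's point of view, the sketch below resolves a logical database name into a connection descriptor via a hypothetical lookup service; the URL, reply format and selection policy are assumptions, not an existing LCG interface.

```python
# Sketch of a possible database lookup step: a job asks a (hypothetical)
# lookup service for the replicas of a logical database name and picks
# the one closest to its own site. Service URL and reply format are
# assumptions for illustration only.
import json
import urllib.parse
import urllib.request

LOOKUP_URL = "http://db-lookup.example-lcg.org/replicas"   # hypothetical service

def find_replica(logical_name, site):
    """Return a connection descriptor for the closest replica of logical_name."""
    url = LOOKUP_URL + "?" + urllib.parse.urlencode({"db": logical_name})
    with urllib.request.urlopen(url) as response:
        replicas = json.loads(response.read().decode("utf-8"))
    # Prefer a replica at the job's own site, fall back to any T1 copy.
    local = [r for r in replicas if r["site"] == site]
    chosen = (local or [r for r in replicas if r["tier"] == "T1"])[0]
    return chosen   # e.g. {"vendor": "oracle", "connect": "...", "site": "...", "tier": "T1"}
```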

18 Initial List of Possible Evaluation Tasks
- Oracle replication study
  - E.g. continue/extend the work started during CMS DC04
  - Focus: stability, data rates, conflict handling
- DB file based distribution
  - E.g. shipping complete MySQL databases or Oracle tablespaces
  - Focus: deployment impact on existing applications
- Application-specific cross-vendor extraction
  - E.g. extracting a subset of the conditions data to a T2 site
  - Focus: complete support of the experiment computing model use cases
- Web-proxy-based data distribution
  - E.g. integrate this technology into a relational abstraction layer
  - Focus: cache control, efficient data transfer
- Other generic vendor-to-vendor bridges
  - E.g. a Streams interface to MySQL
  - Focus: feasibility, fault tolerance, application impact

19 Proposed Mandate, Timescale & Deliverables
- Define, in collaboration with the experiments and the Tier0-2 service providers, a reliable LCG infrastructure which allows database data to be stored and distributed (where necessary) for use by physics applications and grid services. The target delivery date for a first service should be in time for the 2005 data challenges.
- The project could/should run as part of the LCG deployment area, in close collaboration with the application area as the provider of application requirements and db abstraction solutions.
- Main deliverables should be
  - An inventory of data types and their properties (incl. distribution)
  - A service definition document to be agreed between the experiments and the LCG sites
  - A service implementation document to be agreed between the LCG sites
  - Status reports to the established LCG committees
    - Final decisions are obtained via the PEB and GDB

20 Summary
- Sufficient agreement about the necessity of a database deployment service in ATLAS, CMS and LHCb, and indirectly (via gLite) from ALICE
- Propose to start the database deployment project now, as part of the LCG deployment area, with strong application area and Tier 0/1/2 site involvement
- First step would be to nominate
  - Two participants who represent the experiment s/w development and deployment activities in the data inventory discussions
    - Participants from major grid services (LCG RLS, EGEE)
  - Participants from T1 & T2 sites which can put in technology expertise and some effort to participate in technology evaluations
  - Both will require significant work (>0.3 FTE)
- Present a first data inventory and a more detailed work plan at a GDB two months after project start
  - Validate the proposed distribution model against the requirements from the data inventory
  - Confirm the feasibility of model components in evaluation tasks with the sites
  - Understand the model consequences with the developers of key applications