Slide 1: Distributed Databases in HEP
Igor A. Gaponenko (LBNL/NERSC), IAGaponenko@lbl.gov
HENPC Group Meeting (LBNL), 10/28/2005
Slide 2: Contents
– Why is this a problem at all? (mainly to get a smooth start to the talk)
– The LCG 3D project: goals, clients, approaches, technologies, status...
– RAL
– Oracle Streams
– FroNTier
– News from the ROOT database front!
– The distributed CDB of BaBar (whiteboard drawings?)
– Conclusions
Slide 3: Why is this a problem at all?
In short, because:
– Objective reasons: HEP experiments have outgrown the limits of a single computer center, where all data used to be stored and most of the data processing/analysis used to be conducted.
– Subjective reasons: we have been "spoiled" by tremendous advances in programming and database technologies (RDBMS, OODBMS).
Contemporary experiments have distributed:
– data processing (reconstruction)
– event simulation (production)
– physics analysis
As a result, the data:
– get created in many locations simultaneously
– get consumed in many more locations (if one counts analysis)
Not only do we need to distribute the event data, but the environment needed to interpret the data has to be passed around as well. The typical "environment" includes:
– detector geometry
– detector alignment
– conditions
– calibrations
– run parameters, configurations, etc.
Slide 4: Why databases?
One may ask: why not ship _all_ of the environment along with the events, as used to be done in the "good old days"?
The answer lies in the storage and usage model of the environment data:
– It is significantly more complex and diverse from a structural point of view.
– It is produced separately from the events.
– The same event (or collection) can be produced and/or interpreted in different environments (calibrations are an example), quite often resulting in a one-to-many relationship between events and environments; sometimes the choice must be made dynamically (see the sketch after this slide).
– Besides, storing the environment for each event (or even for each collection) can be too expensive.
The bottom line: even though event data and the environment are related to each other, they are produced and distributed in essentially different ways.
Contemporary databases (as a technology, in a broad sense) provide a good mechanism for storing the environment, and that mechanism is:
– flexible
– extendable
– tunable
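The one-to-many relationship can be made concrete with a minimal C++ sketch (not from the talk; every name below is illustrative): a conditions store keyed by an interval of validity, so that one calibration payload serves many events.

    #include <iterator>
    #include <map>
    #include <string>

    // Hypothetical conditions payload: one calibration serving many events.
    struct Calibration {
        std::string name;
        double      value;
    };

    // Map the *start* of each validity interval to its payload; the interval
    // implicitly ends where the next one begins.
    using ConditionsDB = std::map<long long /*firstValidEvent*/, Calibration>;

    // Find the calibration whose validity interval covers the given event.
    const Calibration* lookup(const ConditionsDB& db, long long event) {
        auto it = db.upper_bound(event);      // first interval starting after event
        if (it == db.begin()) return nullptr; // event precedes all intervals
        return &std::prev(it)->second;
    }

    int main() {
        ConditionsDB db;
        db[0]    = {"pedestal-v1", 0.97};
        db[5000] = {"pedestal-v2", 1.02};      // new calibration from event 5000 on
        const Calibration* c = lookup(db, 7500);  // -> pedestal-v2
        return c ? 0 : 1;
    }

Replacing the calibration for a run range then means inserting one new map entry, not rewriting per-event data, which is exactly why the environment is kept apart from the events.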
Slide 5: LCG 3D (some slides borrowed from others)
Slide 6: LCG 3D
3D stands for "Distributed Databases Deployment".
Established: Fall 2004. Project leader: Dirk Duellmann (CERN/IT).
Web site: https://uimon.cern.ch/twiki/bin/view/ADCgroup/LCG3DWiki
Three workshops so far:
– Oct 2004, Jan 2005: technology oriented, evaluations
– Oct 2005: preparation for large-scale deployments
A joint project between:
– "service users" (experiments, s/w projects)
– "service providers" (LCG tiers)
Major clients/participants/contributors:
– ATLAS, ALICE, LHCb
– CMS goes its own way (FroNTier)!
(3D: 3 workshops, 3 clients.)
Slide 7: LCG 3D: Goals
Given that the experiments are using (or planning to use) RDBMSs, 3D is an attempt to introduce common standards and services as a part of LCG, to help the experiments with the problem of distributing non-event data.
Declared goals (quoted from the LCG 3D web site):
– Define distributed database services and application access allowing LCG applications and services to find relevant database back-ends, authenticate and use the provided data in a location independent way.
– Help to avoid the costly parallel development of data distribution, backup and high availability mechanisms in each experiment or grid site in order to limit the support costs.
– Enable a distributed deployment of an LCG database infrastructure with a minimal number of LCG database administration personnel.
Slide 8: LCG 3D: Non-Goals (Dirk Duellmann's slide)
– Store all database data. Experiments are free to deploy databases and replicate data under their own responsibility.
– Set up a single monolithic distributed database system. Given constraints like WAN connections, one cannot assume that a single synchronously updated database would work or give sufficient availability.
– Set up a single-vendor system. Technology independence and a multi-vendor implementation will be required to minimize the long-term risks and to adapt to the different requirements/constraints on the different tiers.
– Impose a CERN-centric infrastructure on participating sites. CERN is an equal partner of the other LCG sites on each tier.
– Decide on an architecture, implementation, new services or policies. Instead, produce a technical proposal for all of those to the LCG PEB/GDB.
Slide 9: Supported database technologies ("database services" in 3D)
– ORACLE: Tier 0 (and perhaps Tier 1) sites.
– MySQL: Tier 0 and Tier 1 sites. Engines: InnoDB (fully ACID compliant) or MyISAM. Also available in a server-less mode.
– SQLite: Tier 1+ sites. A server-less technology; everything lives in one "database" file.
– ROOT I/O: Tier 1+ sites. Not quite a database technology, and it wasn't originally in the scope of 3D, but users badly want it!
Slide 10: Targeted database applications
Those used in event reconstruction and/or analysis:
– run configurations/parameters
– detector description/geometry
– detector alignment
– conditions (calibrations, constants)
(Quite often three of these are combined into the Conditions/DB.)
More general kinds:
– detector construction
– monitoring
– bookkeeping
– LCG LFC catalogs
– etc.
Slide 11: General approach to the distribution
Generally follow BaBar's CDB approach (deployed 3 years ago): writable master(s) -> read-only replicas. This is simple to manage and synchronize, and offers unlimited scalability (for readers). A code sketch of the read/write split follows this slide.
– Have writable database(s) at central location(s) (Tier 0); use a reliable technology (Oracle, MySQL/InnoDB).
– Produce read-only copies to be used elsewhere (Tier 1, 2, ...); use "free" database technologies (MySQL, SQLite), or translate into a non-database ROOT based format (ALICE).
– Synchronize the database installations using LCG 3D services or by other (experiment specific) methods (see subsequent slides for more info).
An alternative (to local database replicas) option of using automatic caches is also under investigation by 3D: FroNTier (FNAL). Not much progress so far.
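A minimal sketch of the master/replica split, using ROOT's TSQLServer interface (which appears later in these slides); the host names, schema and credentials below are invented, and error handling is reduced to null checks:

    #include "TSQLServer.h"
    #include "TSQLResult.h"
    #include "TSQLRow.h"
    #include <cstdio>

    void masterReplicaDemo()
    {
       // Writes always target the central, reliable master (Tier 0).
       TSQLServer *master =
          TSQLServer::Connect("mysql://master.tier0.example/cdb", "writer", "secret");
       if (master)
          master->Query("INSERT INTO conditions (run, name, value) "
                        "VALUES (42, 'pedestal', 0.97)");

       // Reads go to the nearest read-only replica (Tier 1/2).
       TSQLServer *replica =
          TSQLServer::Connect("mysql://replica.tier1.example/cdb", "reader", "");
       if (replica) {
          TSQLResult *res =
             replica->Query("SELECT value FROM conditions WHERE run=42 AND name='pedestal'");
          if (res) {
             if (TSQLRow *row = res->Next()) {
                printf("pedestal = %s\n", row->GetField(0));
                delete row;
             }
             delete res;
          }
       }
    }

Because readers never touch the master, adding one more Tier 2 replica never adds write load, which is where the "unlimited scalability (for readers)" claim comes from.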
Slide 12: Starting point for a service architecture? (Dirk Duellmann's slide)
[Diagram: an autonomous Oracle master at Tier 0 feeds, via Oracle Streams, a Tier 1 database backbone (all data replicated, reliable service); cross-vendor extraction feeds Tier 2 local MySQL database caches (subset of the data, local service only); files and proxy caches serve Tier 3/4.]
Slide 13: Main issues with the distribution
– A variety of existing database applications.
– The database services (ORACLE, MySQL, etc.) aren't compatible: at the level of implementing SQL standards and database schemas, and at the level of a common programmatic API; there is also a lack of "out-of-the-box, across-the-borders" replication tools.
One of the options suggested by LCG 3D (a sketch of the idea follows this slide):
– Introduce RAL, the Relational Abstraction Layer (in the spirit of ODBC or JDBC). RAL is an (almost) SQL-free, "true OO" C++ API.
– Rewrite applications in terms of RAL.
– This makes it easy to implement the data distribution on top of RAL (one of the methods).
See a separate slideshow.
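To give a flavor of what "SQL-free" table access through an abstraction layer looks like, here is a hypothetical C++ sketch. It is not the actual RAL API (the talk does not show RAL's interfaces); it only illustrates the idea that the application names tables, columns and bind variables through objects, while the layer generates vendor-specific SQL underneath:

    #include <memory>
    #include <string>
    #include <vector>

    // Hypothetical abstraction-layer interfaces (NOT the real RAL API): the
    // application says what it wants; the layer emits Oracle/MySQL/SQLite SQL
    // behind the scenes, so the application code is back-end independent.
    struct Row {
        virtual double getDouble(const std::string& column) const = 0;
        virtual ~Row() = default;
    };

    struct Query {
        virtual Query& addToOutput(const std::string& column) = 0;
        virtual Query& setCondition(const std::string& cond) = 0; // e.g. "run = :run"
        virtual Query& bind(const std::string& var, long value) = 0;
        virtual std::vector<std::unique_ptr<Row>> execute() = 0;
        virtual ~Query() = default;
    };

    struct Session {
        // "mysql://..." or "oracle://..." -- the layer picks the right driver.
        virtual std::unique_ptr<Query> newQuery(const std::string& table) = 0;
        virtual ~Session() = default;
    };

    // Application code: no vendor-dependent SQL strings anywhere.
    double readPedestal(Session& session, long run)
    {
        auto query = session.newQuery("conditions");
        query->addToOutput("value")
              .setCondition("run = :run")
              .bind(":run", run);
        auto rows = query->execute();
        return rows.empty() ? 0.0 : rows.front()->getDouble("value");
    }

Once every application is written against such interfaces, replicating data across back-ends reduces to reading through one driver and writing through another, which is precisely the distribution method the slide alludes to.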
Slide 14: Application s/w and distribution options (Dirk Duellmann's slide)
[Diagram: client software talks through RAL (the relational abstraction layer) either over the network to Oracle/MySQL database and cache servers (possibly via web caches), or to a local SQLite file in database-file storage.]
Slide 15: (All) distribution options, and their impact on deployment and applications (Dirk Duellmann's slide)
– DB vendor native replication: requires the same (or at least a similar) schema for all applications running against replicas of the database.
– Commercial heterogeneous database replication solutions.
– Relational-abstraction-based replication: requires that applications are based on an agreed mapping between the different back-ends, possibly enforced by the abstraction layer, otherwise by the application programmer.
– Application-level replication: requires a common API (or data exchange format) for the different implementations of one application (e.g. POOL file catalogs, ConditionsDB (MySQL/Oracle)); one is free to choose the back-end database schema to exploit specific capabilities of a database vendor (e.g. large table partitioning in the case of the Conditions Database).
Slide 16: DB vendor native distribution
ORACLE:
– Table-to-table replication via asynchronous "streams" (see the next slides).
– Potentially extensible to other database vendors through an API, though there seem to be troubles with this (there is a talk on it at the Oct 2005 LCG 3D workshop).
– Has been successfully evaluated by CERN/IT.
MySQL:
– A native replication mechanism exists.
– ATLAS has made some progress in testing this in cooperation with 3D.
– BaBar is considering this for the migrated CDB and Configuration databases.
Slide 17: (ORACLE) Streams overview (Eva Dafonte Perez's slide)
A flexible feature for information sharing. Basic elements:
– capture
– staging
– consumption
Streams replicate data from one database to one or more databases, and the databases can be non-identical copies.
Slide 18: Streams architecture (Eva Dafonte Perez's slide)
[Diagram: on the source database, user changes land in the redo log; a capture process turns the logged changes into LCRs (logical change records) and places them in a source queue (capture stage). The LCRs are propagated as events to a destination queue on the target database (staging stage), where an apply process applies the changes to the replica (consumption stage).]
Slide 19: Testbed configuration (Eva Dafonte Perez's slide)
[Diagram: a source database at CERN is replicated via Streams to CNAF, RAL, Sinica, FNAL, GridKA and BNL (plus FroNtier & OEM at CERN). A DDL statement such as "create table emp (id number, name varchar2, ...)" and a DML statement such as "insert into emp values (03, 'Manuel', ...)" executed at the source show up in the EMP table at every replica.]
Slide 20: MySQL replication & clusters
"Replication" (a status-check sketch follows this slide):
– One-way asynchronous replication, similar to what's found in ORACLE.
– Designed for the performance of read-only operations (SELECT and similar queries).
– Based on capturing the changes stored in a "binary log" file.
– Full and incremental replication are supported.
– A chain (tree) of replications is also possible.
– Provides a foundation for backups that are non-intrusive to the master database: backups require a server shutdown, replication doesn't, so backups can be made on a slave rather than directly on the master.
"Cluster":
– Synchronous replication.
– Designed for the performance of both update and read-only operations (CREATE, INSERT, UPDATE queries).
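As a small illustration (host name and credentials invented), a slave's replication state can be inspected with MySQL's SHOW SLAVE STATUS statement; issued through ROOT's TSQLServer, that might look roughly like:

    #include "TSQLServer.h"
    #include "TSQLResult.h"
    #include "TSQLRow.h"
    #include <cstdio>

    // Sketch: check whether a MySQL replica is keeping up with its master.
    // The exact column set of SHOW SLAVE STATUS depends on the MySQL version.
    void checkReplica()
    {
       TSQLServer *slave =
          TSQLServer::Connect("mysql://replica.tier1.example", "admin", "secret");
       if (!slave) { printf("cannot connect\n"); return; }

       TSQLResult *res = slave->Query("SHOW SLAVE STATUS");
       if (res) {
          if (TSQLRow *row = res->Next()) {
             // Dump the raw row; a real tool would pick out fields such as
             // Slave_IO_Running / Slave_SQL_Running by column name.
             for (int i = 0; i < res->GetFieldCount(); ++i)
                printf("%s = %s\n", res->GetFieldName(i), row->GetField(i));
             delete row;
          }
          delete res;
       }
       delete slave;
    }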
Slide 22: FroNTier (FNAL) (slides borrowed)
Slide 23: The FroNtier project
Goal: assemble a toolkit, using standard web technologies, to provide high-performance, scalable database access through a stateless, multi-tier architecture.
A pilot project, Ntier, tested the technology:
– Tomcat, HTTP, Squid
– client monitoring with existing CDF tools (UDP messages)
The FroNtier project was then established to provide a production system for CDF and other interested users.
http://whcdf03.fnal.gov/ntier-wiki/FrontPage
Slide 24: FroNtier overview
[Diagram, FroNtier components in yellow: a client uses the FroNtier client API library (with client-side caching and C++ headers/stubs generated from CDF persistent object templates in Java) to talk HTTP to a Squid proxy/caching server, which forwards to the FroNtier server, a servlet running under Tomcat; the servlet uses XML server descriptors and DDL table descriptions, and talks JDBC to the database (or another persistency service).]
Slide 25: The FroNtier servlet
1. The client sends a request (URI).
2. The command parser translates the URI into commands + values.
3. The servicer factory gets the XSD (XML Server Descriptor) from the database and
4. instantiates a servicer.
5. The servicer queries the (calibration) database and
6. the results are sent for encoding.
7. The encoder marshals (serializes) the data to the requesting client.
Slide 26: FroNtier XML Server Descriptor (XSD)
The XSD carries:
– object name and version information
– a response description
– the SQL mapping to the database: the select statement, the from statement, the where clause, and special modifiers (order by, etc.)
The flattened example on the slide corresponds to the mapping (reconstructed): select calib_run, calib_version, data_status from CalibRunLists where cid = @param.
Slide 27: FroNtier's use of Squid
Squid is an HTTP proxy/caching server: http://www.squid-cache.org
– Well documented, with widespread operational experience.
– Easily installed and maintained.
– Highly configurable for access control, disk cache tuning, distributed cache peer relationships, and more.
– Monitoring is built in through an SNMP-2 interface.
Cache refresh options:
– Servlet: an expiration time is sent in the HTTP header.
– Client: a forced object refresh through the request.
– Administrative: delete each Squid's cache files and rebuild the cache.
However, the objects being delivered are generally not changing, so a static cache meets most requirements.
Slide 28: FroNtier client API features
– Compatible with C and C++.
– Portable: 32- and 64-bit systems tested.
– Transparent object access, with type conversion detection; preserves data integrity.
– Multi-object requests.
– Easy runtime configuration.
– Extensive error reporting, with adjustable log levels.
[Diagram: user application -> FroNtier API -> FroNtier service.]
Slide 29: CDF FroNtier testing at FNAL/SDSC (San Diego Supercomputer Center)
[Setup diagram: the FNAL launchpad and the CDF Oracle database at FNAL; a Squid and the CAF at SDSC.]
[Plot: access times for direct Oracle vs. Frontier, for SiChipPed and SvxBeamPosition objects, on a scale from about 1e-03 s to about 1e+01 s.]
SiChipPed objects (silicon chip pedestals) are usually about 0.5 MB, up to 1.7 MB in size. SvxBeamPosition objects (silicon tracker beam position) are 502 bytes. The real savings are also in the reduced DB access.
Slide 30: News from the ROOT v5 front
(The ROOT team has made a bid in the distributed DB business.)
(Slides borrowed from Rene Brun's talk presented at the October 2005 LCG 3D workshop.)
Slide 31: ROOT file types & access (SQL support implemented in 1999)
[Diagram: a local X.root file is served through TFile (TKey/TTree, TStreamerInfo) either locally or via http, rootd/xrootd, RFIO, Chirp, Castor or dCache; a local X.xml file sits alongside; Oracle, SapDB, PgSQL and MySQL are reached through the user-level TSQLServer / TSQLRow / TSQLResult classes.]
Slide 32: RDBC (from V. Onuchin, implemented in 2000)
– The RDBC aims for JDBC 2.0 compliance: it contains a set of classes corresponding to the JDBC 2.0 ones (TSQLDriverManager, TSQLConnection, TSQLStatement, TSQLPreparedStatement, TSQLCallableStatement, TSQLResultSet, TSQLResultSetMetadata, TSQLDatabaseMetadata).
– The RDBC aims for ROOT SQL compliance, e.g. TSQLResult is a subclass of TSQLResultSet.
– The implementation is based on the libodbc++ library (http://orcane.net/freeodbc++) developed by Manush Dodunekov (manush@stendahls.net).
– The connection string can be either JDBC style (a URL) or ODBC style (a DSN), e.g. "dsn=minos;uid=scott;pwd=tiger".
– Exception handling is implemented via ROOT's signal-slot communication mechanism.
– RDBC has an interface which allows storing ROOT objects in a relational database as BLOBs; for example, it is possible to store ROOT histograms and trees as cells of an SQL table.
– RDBC provides connection pooling, i.e. reusing opened connections during a ROOT session.
– RDBC has an interface which allows converting TSQLResultSets to ROOT TTrees.
– RDBC together with Carrot (the ROOT Apache module) allows one to create a three-tier architecture.
Used by PHENIX and MINOS. (A hedged usage sketch follows this slide.)
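The slide lists the class names but shows no usage. Purely as a hedged sketch, inferred from RDBC's stated JDBC 2.0 compliance, a query might look roughly like this; the method signatures are assumptions (JDBC names in ROOT capitalization), and the DSN is the example from the slide:

    #include "TSQLDriverManager.h"
    #include "TSQLConnection.h"
    #include "TSQLStatement.h"
    #include "TSQLResultSet.h"
    #include <cstdio>

    // Assumed RDBC call sequence, mirroring JDBC 2.0:
    // DriverManager -> Connection -> Statement -> ResultSet.
    void rdbcDemo()
    {
       TSQLConnection *conn =
          TSQLDriverManager::GetConnection("dsn=minos;uid=scott;pwd=tiger");
       if (!conn) return;

       TSQLStatement *stmt = conn->CreateStatement();
       TSQLResultSet *rs   = stmt->ExecuteQuery("SELECT run, value FROM calib");
       while (rs && rs->Next())
          printf("run %d -> %f\n", rs->GetInt(1), rs->GetDouble(2));

       conn->Close();
    }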
Slide 33: File types & access in ROOT 5.04
[Diagram: the same picture as slide 31, with one addition: a TTreeSQL class now sits between the TTree machinery and the TSQLServer back-ends (Oracle, SapDB, PgSQL, MySQL).]
Slide 34: New RDBMS interface in v5
– New class TTreeSQL: supports TTrees containing branches created using a leaf list (e.g. hsimple.C).
– Access any RDBMS table from TTree::Draw.
– Create a TTree in split mode, creating an RDBMS table and filling it; the table can then be processed by SQL directly.
– The interface uses the normal I/O engine, including support for automatic schema evolution.
Slide 35: TTreeSQL syntax
Currently:
– ROOT:
    TFile *file = new TFile("simple.root");
    TTree *tree;
    file->GetObject("ntuple", tree);
– MySQL:
    TSQLServer *dbserver = TSQLServer::Connect("mysql://…", user, passwd);
    TTree *tree = new TTreeSQL(dbserver, "rootDev", "ntuple");
Coming:
    TTree *tree = TTree::Open("root:/simple.root/ntuple");
    TTree *tree = TTree::Open("mysql://host../rootDev/ntuple");
(A usage sketch follows this slide.)
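Once a TTreeSQL is constructed, it behaves like any other TTree. A minimal sketch (the server URL and credentials are invented; the database "rootDev" and table "ntuple" follow the slide; the column names "px", "py", "pz" are assumptions) of browsing an RDBMS table with TTree::Draw:

    #include "TSQLServer.h"
    #include "TTreeSQL.h"

    // Sketch: treat a MySQL table as a TTree and histogram its columns.
    void drawFromSQL()
    {
       TSQLServer *db =
          TSQLServer::Connect("mysql://host.example/rootDev", "user", "passwd");
       if (!db) return;

       TTree *tree = new TTreeSQL(db, "rootDev", "ntuple");
       // Standard TTree interface: every row becomes an entry, every column a leaf.
       tree->Draw("px");            // 1-D histogram of column px
       tree->Draw("px:py", "pz>0"); // 2-D plot with a selection
    }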
Slide 36: ROOT & RDBMS: go & no-go
The ROOT interface with RDBMSs is minimal. Because there are many different use cases, we see many users with their own interfaces, which seems appropriate in most cases.
Because of scalability issues, the move to read-only files in a distributed environment is becoming obvious. We prefer to invest in a direction that we believe is very important for data analysis:
– Optimize the use of read-only files in a distributed environment: size, read speed, read-ahead & cache, selective reads (rows & columns) with trees.
– Optimize the performance: xrootd, load balancing, authentication with caching for interaction, robustness.
Slide 37: TArchiveFile and TZIPFile
TArchiveFile is an abstract class that describes an archive file containing multiple sub-files, like a ZIP or TAR archive. The TZIPFile class describes a ZIP archive file containing multiple ROOT sub-files. Notice that the ROOT files should not be compressed when being added to the ZIP file, since ROOT files are normally already compressed.
To create the file multi.zip do:
    zip -n root multi file1.root file2.root
The ROOT files in an archive can then be simply accessed like this:
    TFile *f = TFile::Open("multi.zip#file2.root")
or
    TFile *f = TFile::Open("root://mymachine/multi.zip#2")
A TBrowser and TChain interface will follow shortly. (A small follow-up sketch comes after this slide.)
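As a small follow-up sketch, reading an object out of an archive member works like reading from any other ROOT file; "multi.zip" and "file2.root" follow the slide, while the histogram name "h1" is made up:

    #include "TFile.h"
    #include "TH1.h"

    // Sketch: open a sub-file inside a ZIP archive and fetch an object from it.
    void readFromArchive()
    {
       TFile *f = TFile::Open("multi.zip#file2.root");
       if (!f || f->IsZombie()) return;

       TH1 *h = 0;
       f->GetObject("h1", h);   // histogram name is hypothetical
       if (h) h->Draw();
    }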
Slide 38: Class TGrid (abstract interface)
    //--- General GRID
    const char *GridUrl() const
    const char *GetGrid() const
    const char *GetHost() const
    const char *GetUser() const
    const char *GetPw() const
    const char *GetOptions() const
    Int_t GetPort() const

    //--- Catalogue interface
    virtual TGridResult *Command(const char *command, Bool_t interactive = kFALSE, UInt_t stream = kFALSE)
    virtual TGridResult *Query(const char *path, const char *pattern, const char *conditions, const char *options)
    virtual TGridResult *LocateSites()
    virtual TGridResult *ls(const char *ldn = "", Option_t *options = "")
    virtual Bool_t cd(const char *ldn = "", Bool_t verbose = kFALSE)
    virtual Bool_t mkdir(const char *ldn = "", Option_t *options = "")
    virtual Bool_t rmdir(const char *ldn = "", Option_t *options = "")
    virtual Bool_t register(const char *lfn, const char *turl, Long_t size, const char *se, const char *guid)
    virtual Bool_t rm(const char *lfn, Option_t *option = "")

    //--- Job submission interface
    virtual TGridJob *Submit(const char *jdl)
    virtual TGridJDL *GetJDLGenerator()

    //--- Load the desired plugin and set up the connection to the GRID
    static TGrid *Connect(const char *grid, const char *uid, const char *pw, const char *options)
Slide 39: Access to file catalogues (e.g. the AliEn FC)
The same style of interface could be implemented for other GRID file catalogues.
Slide 40: TGrid example with AliEn
    // Connect
    TGrid *alien = TGrid::Connect("alien://");

    // Query
    TGridResult *res = alien->Query(
       "/alice/cern.ch/user/p/peters/analysis/miniesd/", "*.root");

    // List of files
    TList *listf = res->GetFileInfoList();

    // Create a chain
    TChain chain("Events", "session");
    chain.AddFileInfoList(listf);

    // Start PROOF
    TProof proof("remote");

    // Process your query
    chain.Process("selector.C");
Slide 41: Replica of a DB subset
[Diagram: a remote TZipFile at Tier 0 is copied to a local TZipFile at Tier 1 over http, xrootd, castor, dcache, etc.]
Slide 42: Current status: who uses what
ALICE:
– PostgreSQL: detector construction DB
– ORACLE: detector construction (read-only copy at CERN)
– MySQL: DAQ/online
– ROOT files for the Conditions/DB: basically the only database required to be distributed, using the GRID distributed catalog service; very little (if any?) use of 3D
ATLAS:
– The most advanced use of databases compared to the others.
– ORACLE and MySQL via RAL for conditions (COOL), geometry, detector description, POOL catalogs, and some POOL collections (via object-to-relational mapping).
– SQLite for distributed geometry.
LHCb:
– ORACLE: everything in Tier 0 & Tier 1; no databases in Tier 2+.
CMS (also CDF):
– Much less developed compared to the others.
– FroNtier: for everything? The actual databases are hidden behind the scenes.
BaBar:
– Objectivity/DB (in the process of being phased out).
– ROOT I/O for all database applications.
– MySQL + ROOT (as BLOBs) for the CDB only; MySQL only for the Config/DB.
Slide 43: Conclusions
– Database technology is used extensively in (LHC) HEP experiments, and the use keeps growing.
– Various RDBMSs, and non-relational stores (ROOT), are in use.
– Very little progress in establishing common database distribution services: LCG 3D doesn't yet seem to play the role it potentially could; perhaps it's just a matter of time(?).
– Very little progress in establishing common standards for database applications and their implementations: COOL is the only noticeable exception, though it has its own problems in its conceptual model (not technology neutral, no room for ROOT).
The bottom line:
– Still a "zoo", even within experiments.
– Though significant progress has been made in understanding how things "should look".