Distributed Databases in HEP

Igor A. Gaponenko (LBNL/NERSC) 10/28/2005 Distributed Databases in HENPC Group Meeting (LBNL)

Distributed Databases in HEP @ HENPC Group Meeting (LBNL)
Contents
- Why is that a problem? (Mainly to get a smooth start of the talk)
- LCG 3D Project: goals, clients, approaches, technologies, status…
- RAL
- ORACLE Streams
- FroNTier
- News from the ROOT databases front!
- The distributed CDB of BaBar (whiteboard drawings?)
- Conclusions

Why is that a problem at all?
In short, because…
- Objective reasons: HEP experiments have outgrown the limits of a single computer center, where all data used to be stored and most of the data processing/analysis used to be conducted
- Subjective reasons: we’ve been “spoiled” by tremendous advances in programming and database technologies (RDBMS, OODBMS)
Contemporary experiments have distributed:
- Data processing (reconstruction)
- Event simulation (production)
- Physics analysis
As a result, the data:
- Get created in many locations simultaneously
- Get consumed in many more locations (if analysis is counted)
Not only do we need to distribute the event data; the environment needed to interpret the data has to be passed around as well. The typical “environment” includes:
- Detector geometry
- Detector alignments
- Conditions
- Calibrations
- Run parameters, configurations, etc.

Why Databases?
One may ask: why not ship _all_ the environment along with the events, as it used to be done in the “good” old days?
The answer is found in the storage and usage model for the environment data:
- It is significantly more complex and diverse from a structural point of view
- It is produced separately from events
- The same event (collection) can be produced and/or interpreted in different environments (calibrations are an example), quite often resulting in a one-to-many relationship between events and environments
- Sometimes the choice has to be made dynamically
- Besides, storing the environment for each event (or even for each collection) can be too expensive
The bottom line: even though event data and the environment are related to each other, they are produced and distributed in essentially different ways!
Contemporary databases (as a technology in a broad sense) provide a good mechanism to store the environment. That mechanism is:
- Flexible
- Extendable
- Tunable

LCG 3D (some slides borrowed from others)

LCG 3D
- 3D stands for “Distributed Databases Deployment”
- Established: Fall 2004
- Project Leader: Dirk Duellmann (CERN/IT)
- Web site:
- 3 workshops so far:
  - Oct 2004, Jan 2005: technology oriented, evaluations
  - Oct 2005: preparation for large scale deployments
- Joint project between:
  - “Service users” (experiments, s/w projects)
  - “Service providers” (LCG tiers)
- Major clients/participants/contributors: ATLAS, ALICE, LHCb
- CMS goes its own way (FroNTier)!!!
3D: 3 Workshops, 3 Clients

LCG 3D: Goals
Given: experiments are using (or planning to use) RDBMS-s.
3D is an attempt to introduce common standards and services as a part of LCG, to help experiments with the problem of distributing non-event data.
Declared goals (quoted from the LCG 3D Web Site):
- Define distributed database services and application access allowing LCG applications and services to find relevant database back-ends, authenticate and use the provided data in a location independent way.
- Help to avoid the costly parallel development of data distribution, backup and high availability mechanisms in each experiment or grid site in order to limit the support costs.
- Enable a distributed deployment of an LCG database infrastructure with a minimal number of LCG database administration personnel.

LCG 3D: Non-Goals (Dirk Duellmann’s slide)
- Store all database data
  - Experiments are free to deploy databases and replicate data under their responsibility
- Setup a single monolithic distributed database system
  - Given constraints like WAN connections one can not assume that a single synchronously updated database would work or give sufficient availability.
- Setup a single vendor system
  - Technology independence and multi-vendor implementation will be required to minimize the long term risks and to adapt to the different requirements/constraints on different tiers.
- Impose a CERN centric infrastructure to participating sites
  - CERN is one equal partner of other LCG sites on each tier
- Decide on an architecture, implementation, new services, policies
  - Produce a technical proposal for all of those to LCG PEB/GDB

Supported database technologies (“database services” in 3D)
- ORACLE
  - Tier0 (and perhaps Tier1) sites
- MySQL
  - Tier0 and Tier1 sites
  - Engines: InnoDB (fully ACID compliant) or MyISAM
  - Also available in a server-less mode
- SQLite
  - Tier1+ sites
  - Server-less technology, all in one “database” file
- ROOT I/O
  - Not quite a database technology
  - Wasn’t originally in the scope of 3D
  - But users badly want it!!!

Targeted database applications
- Those used in event reconstruction and/or analysis:
  - Run configurations/parameters
  - Detector description/geometry
  - Detector alignment
  - Conditions (calibrations, constants)
- General kinds:
  - Detector construction
  - Monitoring
  - Bookkeeping
  - LCG LFC catalogs
  - Etc.
Callout on the slide: quite often these three are combined into the Conditions/DB

General approach to the distribution
- Generally follow BaBar’s CDB approach (deployed 3 years ago): writable master(-s) -> read-only replicas
  - Simple to manage/synchronize
  - Unlimited scalability (for readers)
- Have writable database(-s) at central location(-s) (Tier 0)
  - Use a reliable technology (Oracle, MySQL/InnoDB)
- Produce read-only copies to be used elsewhere (Tier 1, 2, …)
  - Use “free” database technologies: MySQL, SQLite
  - Translate into a non-database ROOT based format (ALICE)
- Synchronize database installations using LCG 3D services or by other (experiment specific) methods (see subsequent slides for more info)
- An alternative to local database replicas, automatic caches, is also under investigation (by 3D):
  - FroNTier (FNAL)
  - Not much progress so far

Starting Point for a Service Architecture?
[Diagram, Dirk Duellmann’s slide. Tiers: T0 (autonomous); T1 (db backbone, all data replicated, reliable service); T2 (local db cache, subset of data, only local service); T3/4. Database nodes are marked O or M. Labeled links: Oracle Streams, cross vendor extract, MySQL files, proxy cache.]

Main issues with the distribution
- A variety of existing database applications
- Database services (ORACLE, MySQL, etc.) aren’t compatible:
  - At the level of SQL standard implementation and database schemas
  - At the level of a common programmatic API
- Lack of “out-of-the-box, across-the-borders” replication tools
- One of the options suggested by LCG 3D:
  - Introduce RAL, the Relational Abstraction Layer (sort of ODBC, JDBC)
  - RAL is an (almost) SQL-free, “true OO” C++ API
  - Rewrite applications in terms of RAL
  - This makes it easy to implement data distribution based on RAL (one of the methods)
  - See a separate slideshow
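To illustrate why an abstraction layer makes distribution easier, here is a minimal sketch. It is not the RAL API: every name in it (IRelationalStore, InMemoryStore, copyTable) is hypothetical, and the in-memory back end only stands in for a real Oracle, MySQL or SQLite plug-in. The point is that the copy routine is written once, against the interface, and so works for any pair of back ends.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the RAL API.
using Row = std::map<std::string, std::string>;

// Back-end neutral interface the application codes against.
class IRelationalStore {
public:
    virtual ~IRelationalStore() = default;
    virtual void insertRow(const std::string& table, const Row& row) = 0;
    virtual std::vector<Row> readRows(const std::string& table) const = 0;
};

// Stand-in back end; a real one would wrap Oracle, MySQL or SQLite.
class InMemoryStore : public IRelationalStore {
public:
    void insertRow(const std::string& table, const Row& row) override {
        tables_[table].push_back(row);
    }
    std::vector<Row> readRows(const std::string& table) const override {
        auto it = tables_.find(table);
        return it == tables_.end() ? std::vector<Row>{} : it->second;
    }
private:
    std::map<std::string, std::vector<Row>> tables_;
};

// Replication written only against the interface, so it is independent
// of which back ends sit on either side. Returns the number of rows copied.
std::size_t copyTable(const IRelationalStore& from, IRelationalStore& to,
                      const std::string& table) {
    std::size_t copied = 0;
    for (const Row& row : from.readRows(table)) {
        to.insertRow(table, row);
        ++copied;
    }
    return copied;
}
```

A caller would create two stores, fill one, and call copyTable(master, replica, "emp"); swapping either store for a different back end would not touch copyTable.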

Application s/w and Distribution Options
[Diagram, Dirk Duellmann’s slide. Layers shown: client s/w (APP, RAL, web cache, SQLite file); network; db & cache servers (web cache, Oracle, MySQL); db file storage. RAL = relational abstraction layer.]

(All) Distribution Options - and Impact on Deployment and Apps
(Dirk Duellmann’s slide)
- DB vendor native replication
  - Requires same (or at least similar) schema for all applications running against replicas of the database
- Commercial heterogeneous database replication solutions
- Relational abstraction based replication
  - Requires that applications are based on an agreed mapping between different back-ends
    - Possibly enforced by the abstraction layer
    - Otherwise by the application programmer
- Application level replication
  - Requires common API (or data exchange format) for different implementations of one application
    - E.g. POOL file catalogs, ConditionsDB (MySQL/Oracle)
  - Free to choose backend database schema to exploit specific capabilities of a database vendor
    - E.g. large table partitioning in the case of the Conditions Database

DB Vendor Native Distribution
- ORACLE
  - Table-to-table via asynchronous “streams” (see next slides)
  - Potentially extensible to other database vendors through an API
    - There seem to be troubles with this (there was a talk at the Oct 2005 LCG 3D Workshop)
  - Has been successfully evaluated by CERN/IT
- MySQL
  - Native replication mechanism exists
  - ATLAS has made some progress in testing this in cooperation with 3D
  - BaBar is considering this for the migrated CDB and Configuration databases

(ORACLE) STREAMS Overview
(Eva Dafonte Perez’s slide)
- Flexible feature for information sharing
- Basic elements: capture, staging, consumption
- Replicate data from one database to one or more databases
- Databases can be non-identical copies

STREAMS Architecture
[Diagram, Eva Dafonte Perez’s slide. Three stages: capture, staging, consumption. User changes to the SOURCE DATABASE are logged in the REDO LOG; the CAPTURE PROCESS captures the changes as LCRs and places them in the SOURCE QUEUE; events are propagated to the DESTINATION QUEUE; the APPLY PROCESS applies the LCRs to the TARGET DATABASE (replica).]
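The capture, staging and consumption stages can be pictured with a small toy model. This is only a sketch under stated assumptions: it is not Oracle's API, and the types ChangeRecord, SourceDatabase, propagate and apply are hypothetical names chosen for illustration. A ChangeRecord plays the role of an LCR, and plain in-memory queues stand in for the source and destination queues.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of capture -> staging -> consumption; not Oracle's API.
struct ChangeRecord {            // plays the role of an LCR
    std::string table;
    int key;
    std::string value;
};

using Table = std::map<int, std::string>;
using Database = std::map<std::string, Table>;

class SourceDatabase {
public:
    // A user change is applied locally and logged (the "redo log").
    void write(const std::string& table, int key, const std::string& value) {
        data_[table][key] = value;
        redoLog_.push_back({table, key, value});
    }
    // Capture: turn redo entries not yet captured into change records.
    // Returns the number of records placed in the source queue.
    std::size_t capture(std::deque<ChangeRecord>& sourceQueue) {
        std::size_t n = 0;
        for (; captured_ < redoLog_.size(); ++captured_, ++n)
            sourceQueue.push_back(redoLog_[captured_]);
        return n;
    }
    const Database& data() const { return data_; }
private:
    Database data_;
    std::vector<ChangeRecord> redoLog_;
    std::size_t captured_ = 0;   // redo entries already captured
};

// Staging: move records from the source queue to a destination queue.
inline void propagate(std::deque<ChangeRecord>& from,
                      std::deque<ChangeRecord>& to) {
    while (!from.empty()) {
        to.push_back(from.front());
        from.pop_front();
    }
}

// Consumption: the apply process replays records on the replica in order.
inline void apply(std::deque<ChangeRecord>& queue, Database& replica) {
    while (!queue.empty()) {
        const ChangeRecord& lcr = queue.front();
        replica[lcr.table][lcr.key] = lcr.value;
        queue.pop_front();
    }
}
```

Because the three steps are separate calls, the replica lags the source until apply has run, which is the asynchronous behaviour the slides describe.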

TESTBED Configuration
[Diagram, Eva Dafonte Perez’s slide. Sites shown: CERN, CNAF, RAL, Sinica, FNAL, GridKA, BNL; other labels: SOURCE DATABASE, FroNtier & OEM. Two statements are shown, create table emp ( id number, name varchar2, ….) and insert into emp values ( 03, 'Manuel', ….), and the EMP table with the row (03, Manuel, …) then appears at each site.]

MySQL replication & clusters
- Replication
  - One-way asynchronous replication, similar to what’s found in ORACLE
  - Designed for performance of read-only operations (SELECT and similar queries)
  - Based on capturing changes stored in a “binary log” file
  - Full and incremental replication supported
  - A chain (tree) of replications is also possible
  - Provides a foundation for backups that are non-intrusive to the master database
    - Backups require a server shutdown; replication doesn’t. Therefore backups can be made on a slave rather than directly on the master.
- “Cluster”
  - Synchronous replication
  - Designed for performance of both update and read-only operations (CREATE, INSERT, UPDATE queries)
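The log-based, incremental and chainable nature of this replication can be sketched as follows. This is a toy model, not MySQL's replication protocol; Server, LogEvent and replicateFrom are hypothetical names. Each server keeps a log of applied changes and remembers how far it has read into its upstream's log, so the first pass is a full replay and later passes are incremental.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of log-based one-way replication; not MySQL's protocol.
struct LogEvent {
    std::string key;
    std::string value;
};

class Server {
public:
    // Only a master should receive direct writes.
    void write(const std::string& key, const std::string& value) {
        data_[key] = value;
        binaryLog_.push_back({key, value});
    }
    // Fetch and replay the upstream events this server has not seen yet.
    // Replayed events are logged again, so this server can itself feed
    // a downstream slave (a chain or tree). Returns the events applied.
    std::size_t replicateFrom(const Server& upstream) {
        std::size_t applied = 0;
        while (position_ < upstream.binaryLog_.size()) {
            const LogEvent& e = upstream.binaryLog_[position_++];
            write(e.key, e.value);
            ++applied;
        }
        return applied;
    }
    // Returns an empty string for an unknown key.
    std::string read(const std::string& key) const {
        auto it = data_.find(key);
        return it == data_.end() ? std::string() : it->second;
    }
private:
    std::map<std::string, std::string> data_;
    std::vector<LogEvent> binaryLog_;
    std::size_t position_ = 0;   // how far into the upstream log we have read
};
```

In this model a backup could be taken from a slave between two replicateFrom calls without ever pausing the master, which is the non-intrusive backup idea from the slide.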


FroNTier (FNAL) (slides borrowed)

The FroNtier Project
- Goal: assemble a toolkit, using standard web technologies, to provide high performance, scalable database access through a stateless, multi-tier architecture.
- Pilot project Ntier tested the technology:
  - Tomcat, HTTP, Squid
  - Client monitoring w/ existing CDF tools (udp messages)
- The FroNtier project was established to provide a production system for CDF and other interested users

FroNtier Overview
[Diagram of the request path: Client (FroNtier Client API Library) -> HTTP -> Squid proxy/caching server (HTTP caching) -> HTTP -> FroNtier Server (FroNtier servlet running under Tomcat) -> JDBC -> Database (or other persistency service). Implementation labels alongside: C++ headers and stubs; CDF Persistent Object Templates (Java); XML Server Descriptors; DDL for table descriptions. FroNtier components are drawn in yellow, CDF pieces from the previous solution in white.]
Speaker note: this shows the same structure as the previous slide, along with the implementation explanation on the left.

The FroNtier Servlet
- Client sends request (URI)
- Command Parser translates the URI into commands + values
- Servicer Factory gets the XSD (XML Server Descriptor) from the database and instantiates a Servicer
- Servicer queries the database, and the results are sent for encoding
- Encoder marshals (serializes) the data to the requesting client
[Diagram: numbered flow 1 to 7 between Client, Command Parser, Servicer Factory, XSD Database, Servicer, Calibration Database and Encoder.]
Speaker notes: stress that the descriptor is retrieved from “outside”, not sent by the client. Keeping the descriptor in the database is a convenience. Stress the flexibility given by the descriptor: a database table can be changed without changing code: no client code changes (which was a requirement), and not even servlet code changes!

FroNtier XML Server Descriptor (XSD)
<descriptor type="CalibRunLists" version="1" xsdversion="1">
  <attribute position="1" type="int" field="calib_run" />
  <attribute position="2" type="int" field="calib_version" />
  <attribute position="3" type="string" field="data_status" />
  <select> calib_run, calib_version, data_status </select>
  <from> CalibRunLists </from>
  <where>
    <clause> cid </clause>
    <param position="1" type="int" key="cid"/>
  </where>
  <final> </final>
</descriptor>
Annotations on the slide:
- Object name and version information (the descriptor element)
- Response description (the attribute elements)
- The SQL mapping to the database:
  - Select statement
  - From statement
  - Where clause
  - Special modifiers (order by, etc.)
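How a descriptor like the one above might be turned into a query can be sketched in a few lines. The slides do not show the servlet's classes, so the Descriptor struct and buildQuery function are hypothetical, and reading each clause as an equality test on the named column is an assumption. The sketch emits "?" placeholders, in the style of a JDBC PreparedStatement, so that client-supplied values never become part of the SQL text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory form of a descriptor; the real servlet's classes
// are not shown in the slides.
struct Descriptor {
    std::string select;                 // contents of <select>
    std::string from;                   // contents of <from>
    std::vector<std::string> clauses;   // one column name per <clause>
};

// Build a parameterized statement. Assumption: each clause is an equality
// test on the named column, with the value bound later to the "?" marker.
std::string buildQuery(const Descriptor& d) {
    std::string sql = "SELECT " + d.select + " FROM " + d.from;
    for (std::size_t i = 0; i < d.clauses.size(); ++i) {
        sql += (i == 0 ? " WHERE " : " AND ");
        sql += d.clauses[i] + " = ?";
    }
    return sql;
}
```

Since the SQL is derived entirely from the descriptor, changing a table only requires changing the descriptor, which matches the "no code changes" point made for the servlet.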

FroNtier client API features
- Compatible with C and C++
- Portable: 32 and 64 bit systems tested
- Transparent object access
- Type conversion detection: preserves data integrity
- Multi-object requests
- Easy runtime configuration
- Extensive error reporting
- Adjustable log levels
[Diagram: User application -> FroNtier API -> FroNtier Service]
Speaker note: “Transparent object access” means the user doesn’t have to know about transport, etc.; the user only knows about the object being dealt with.

CDF FroNtier Testing at FNAL/SDSC (San Diego Supercomputer Center)
[Plots: access times for direct Oracle and Frontier, one panel for SiChipPed and one for SvxBeamPosition objects. Axis: access time (s), with tick labels from 1e-03 to 1e+01 in one panel and from 1e-03 to 1.0 in the other. Setup labels: SDSC CAF, SDSC Squid, FNAL Launchpad, CDF Oracle @FNAL.]
- SiChipPed (Silicon Chip Pedestals) objects are usually about 0.5 MB, up to 1.7 MB in size.
- SvxBeamPosition (Silicon tracker beam position) objects are 502 bytes.
- The real savings are also in the reduced DB access.
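The reduced database access comes from the caching tier: because requests are stateless, the same URI can be answered from a cache without reaching the database. The following read-through cache is only an illustration of that idea; CachingProxy is a hypothetical name, the back end is an arbitrary callable, and Squid's expiry and eviction logic is deliberately left out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical read-through cache keyed by request URI; Squid itself also
// handles expiry and eviction, which this sketch omits.
class CachingProxy {
public:
    using Backend = std::function<std::string(const std::string&)>;

    explicit CachingProxy(Backend backend) : backend_(std::move(backend)) {}

    std::string get(const std::string& uri) {
        auto it = cache_.find(uri);
        if (it != cache_.end())
            return it->second;          // served without touching the DB
        ++backendCalls_;
        return cache_[uri] = backend_(uri);
    }

    // Number of requests that actually reached the back end.
    int backendCalls() const { return backendCalls_; }

private:
    Backend backend_;
    std::map<std::string, std::string> cache_;
    int backendCalls_ = 0;
};
```

With many clients requesting the same calibration objects, backendCalls() grows with the number of distinct URIs rather than with the number of requests.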

News from the ROOT v5 front (the ROOT team has made a bid in the distributed DB business) (slides borrowed from Rene’s talk presented at the October 2005 LCG 3D Workshop)

ROOT File types & Access (SQL implemented in 1999)
[Diagram: the user works with TFile (TKey/TTree, TStreamerInfo) and with TSQLServer (TSQLRow, TSQLResult). Storage and access labels: local file X.root, local file X.xml, http, rootd/xrootd, Dcache, Castor, RFIO, Chirp; Oracle, MySQL, PgSQL, SapDb.]

RDBC (from V.Onuchin) (implemented in 2000)
- Used by Phenix and Minos
- RDBC aims for JDBC 2.0 compliance. It contains a set of classes corresponding to the JDBC 2.0 ones: TSQLDriverManager, TSQLConnection, TSQLStatement, TSQLPreparedStatement, TSQLCallableStatement, TSQLResultSet, TSQLResultSetMetadata, TSQLDatabaseMetadata
- RDBC aims for ROOT SQL compliance, e.g. TSQLResult is a subclass of TSQLResultSet
- The RDBC implementation is based on the libodbc++ library (developed by Manush Dodunekov)
- The connection string can be either JDBC style, i.e. <dbms>://<host>[:<port>][/<database>], or ODBC style (as DSN), e.g. "dsn=minos;uid=scott;pwd=tiger"
- Exception handling is implemented via the ROOT signal-slot communication mechanism.
- RDBC has an interface for storing ROOT objects in a relational database as BLOBs. For example, it is possible to store ROOT histograms and trees as cells of an SQL table.
- RDBC provides connection pooling, i.e. reuse of opened connections during a ROOT session.
- RDBC has an interface for converting TSQLResultSets to ROOT TTrees
- RDBC with Carrot (ROOT Apache Module) makes it possible to build a three-tier architecture.

File types & Access in 5.04
[Diagram: the user works with TFile (TKey/TTree, TStreamerInfo), with TSQLServer (TSQLRow, TSQLResult) and, new in this version, with TTreeSQL. Storage and access labels: local file X.root, local file X.xml, http, rootd/xrootd, Dcache, Castor, RFIO, Chirp; Oracle, MySQL, PgSQL, SapDb.]

New RDBMS interface in v5
- New class TTreeSQL
  - Supports TTrees containing branches created using a leaf list (e.g. hsimple.C).
  - Access any RDBMS table from TTree::Draw
  - Creating a TTree in split mode creates an RDBMS table and fills it. The table can then be processed by SQL directly.
- The interface uses the normal I/O engine, including support for Automatic Schema Evolution.

TTreeSQL Syntax
Currently:
  ROOT:
    TFile *file = new TFile("simple.root","RECREATE");
    TTree *tree; file->GetObject("ntuple",tree);
  MySQL:
    TSQLServer *dbserver = TSQLServer::Connect("mysql://…",db,user,passwd);
    TTree *tree = new TTreeSQL(dbserver,"rootDev","ntuple");
Coming:
    TTree *tree = TTree::Open("root:/simple.root/ntuple");
    TTree *tree = TTree::Open("mysql://host../rootDev/ntuple");

ROOT & RDBMS go & nogo
- The ROOT interface with RDBMS is minimal
- Because there are many different use cases, we see many users with their own interface, which seems appropriate in most cases.
- Because of scalability issues, the move to read-only files in a distributed environment is becoming obvious.
- We prefer to invest in a direction that we believe is very important for data analysis:
  - Optimize the use of read-only files in a distributed environment: size, read speed, read ahead & cache, selective reads (rows & columns) with Trees.
  - Optimize the performance: xrootd, load balancing, authentication with caching for interaction, robustness.

TArchiveFile and TZIPFile
- TArchiveFile is an abstract class that describes an archive file containing multiple sub-files, like a ZIP or TAR archive.
- The TZIPFile class describes a ZIP archive file containing multiple ROOT sub-files. Notice that the ROOT files should not be compressed when being added to the ZIP file, since ROOT files are normally already compressed.
- To create the file multi.zip do:
    zip -n root multi file1.root file2.root
- The ROOT files in an archive can be simply accessed like this:
    TFile *f = TFile::Open("multi.zip#file2.root")
  or
    TFile *f = TFile::Open("root://mymachine/multi.zip#2")
- A TBrowser and TChain interface will follow shortly.

Class TGrid (abstract interface)
//--- General GRID
const char *GridUrl() const
const char *GetGrid() const
const char *GetHost() const
const char *GetUser() const
const char *GetPw() const
const char *GetOptions() const
Int_t GetPort() const
//--- Catalogue Interface
virtual TGridResult *Command(const char *command, Bool_t interactive = kFALSE, UInt_t stream = kFALSE)
virtual TGridResult *Query(const char *path, const char *pattern, const char *conditions, const char *options)
virtual TGridResult *LocateSites()
virtual TGridResult *ls(const char *ldn = "", Option_t *options = "")
virtual Bool_t cd(const char *ldn = "", Bool_t verbose = kFALSE)
virtual Bool_t mkdir(const char *ldn = "", Option_t *options = "")
virtual Bool_t rmdir(const char *ldn = "", Option_t *options = "")
virtual Bool_t register(const char *lfn, const char *turl, Long_t size, const char *se, const char *guid)
virtual Bool_t rm(const char *lfn, Option_t *option = "")
//--- Job Submission Interface
virtual TGridJob *Submit(const char *jdl)
virtual TGridJDL *GetJDLGenerator()
//--- Load desired plugin and setup connection to GRID
static TGrid *Connect(const char *grid, const char *uid, const char *pw, const char *options)

Access to File Catalogues, e.g. Alien FC
The same style of interface could be implemented for other GRID file catalogues

TGrid example with Alien
// Connect (TGrid::Connect returns a pointer)
TGrid *alien = TGrid::Connect("alien://");
// Query
TGridResult *res = alien->Query("/alice/cern.ch/user/p/peters/analysis/miniesd/", "*.root");
// List of files
TList *listf = res->GetFileInfoList();
// Create chain
TChain chain("Events", "session");
chain.AddFileInfoList(listf);
// Start PROOF
TProof proof("remote");
// Process your query
chain.Process("selector.C");

Replica of a DB subset
[Diagram: T0 and T1; a local TZipFile and a remote TZipFile reached via http, xrootd, castor, dcache..]

Current status: Who uses What
- ALICE
  - PostgreSQL: Detector Construction DB
  - ORACLE: Detector Construction (read-only copy at CERN)
  - MySQL: DAQ/ONLINE
  - ROOT files: Condition/DB
    - Basically the only database required to be distributed
    - Using the GRID distributed catalogs service
  - Very little (if any?) use of 3D
- ATLAS
  - Most advanced use of databases compared to others
  - ORACLE, MySQL via RAL for Conditions (COOL), Geometry, DD, POOL catalogs, some POOL collections (via Object-to-Relational mapping)
  - SQLite for distributed Geometry
- LHCb
  - Everything in Tier0 & Tier1, no databases in Tier2+
- CMS (also CDF)
  - Much less developed compared to others
  - FroNtier: for everything?
  - Actual databases are hidden behind the scenes
- BaBar
  - Objectivity/DB (in the process of being phased out)
  - ROOT I/O for all database applications
  - MySQL + ROOT (as BLOB-s) for CDB only, MySQL only for Config/DB

Conclusions
- Extensive use of database technology in (LHC) HEP experiments, and it keeps growing
  - Various RDBMS and non-RDBMS (ROOT) technologies are in use
- Very little progress in establishing common database distribution services
  - LCG 3D doesn’t seem to play the role it potentially could; perhaps it’s just a matter of time(?)
- Very little progress in establishing common standards for database applications and their implementations
  - COOL is the only noticeable exception, though it has its own problems in its conceptual model (not technology neutral, no room for ROOT)
- The bottom line:
  - Still a “zoo”, even within experiments
  - Though significant progress has been made in understanding how things “should look”
