CMS Database Projects
Lee Lueking
CMS Activity Coordination Meeting, July 20, 2004
Topics
Status report on the HCAL Testbeam Detector DB project, and…
How does the testbeam work fit into the broader CMS DB context?
Building a POOL plug-in for FroNtier
LCG Distributed Deployment of Databases
HCAL Testbeam Detector DB
HCAL Det DB Focus
Equipment Configuration DB
–Relationships for all HCAL detector components: wedges, layers, read-out boxes (RBX), cables, HCAL Trigger (HTR) cards.
–Test results for various components, e.g. RBX, QIE.
Conditions DB
–DCS Slow Controls logging DB: temperatures, HV, LV, beam properties, etc.
–Calibration DB: pedestals, gains, timing information for each channel (a sketch of such a record follows below).
–Configuration DB: configuration info that is downloaded to the RBX.
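To make the per-channel calibration content concrete, here is a minimal sketch of such a record; all type and field names are invented for illustration and are not the actual HCAL schema.

```cpp
// Minimal sketch of a per-channel calibration record of the kind described
// above (pedestal, gain, timing per channel). Names are invented for
// illustration; this is not the actual HCAL schema.
#include <string>

struct HcalChannelCalib {
    int         channelId;  // read-out channel being calibrated
    double      pedestal;   // pedestal value
    double      gain;       // gain value
    double      t0;         // timing offset
    std::string algorithm;  // algorithm that produced these values
    int         firstRun;   // validity expressed as a run range
    int         lastRun;
};
```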
HCAL Detector Configuration Details
Manpower & Status
FNAL Manpower
–PPD/CMS: Shuichi Kunori (UMD, 0.2), Taka Yasuda (0.2), Stefan Piperov (0.8 → 0.5), Jordan Damgov (0.8 → 0.5), Gennadiy Lukhanin (1.0, new hire)
–CD/CEPA/DBS: Lee Lueking (0.2), Yuyi Guo (0.8)
–CD/CSS: Anil Kumar (0.2), Maurine Mihalek (0.0 → 0.2)
Status
–Schema designs completed for the EqConf, SlowCont, and Calib DBs. Extensive reviews of the DDLs finished. Installation on the development machine is in progress.
–New production servers (Dell PE 2650) at FNAL and CERN w/ RH ES 3.0 loaded. The FNAL machine has OS patches installed and will have Oracle 10g loaded soon. The machine will move to a rack in the machine room.
–A new development server also gets Oracle 10g this week.
–Loading scripts for SC logs are ready, tested, and in CVS. SC data is waiting to be loaded.
HCAL DetDB in the Broader Context of CMS
The LCG Conditions DB Project
Led by Andrea Valassi (CERN IT). Several participants (many from ATLAS). Strong BaBar influence.
The purpose of the ConditionsDB project is to develop software libraries and tools for the LHC experiments to store, retrieve, and manipulate conditions data.
The deliverables of the project include:
–A C++ API to store and retrieve conditions data
–Concrete implementations of the API using different persistent backends (such as Oracle and MySQL)
–Tools to manage, browse, replicate, and manipulate the data
–Test and example programs
Weekly phone conferences to discuss progress and ideas.
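To show the shape of such a deliverable, here is a minimal hypothetical sketch of a store/retrieve API; the class and method names are invented for illustration and are not the ConditionsDB project's actual interface.

```cpp
// Hypothetical sketch of a conditions-DB style C++ API (all names invented;
// not the actual LCG ConditionsDB interface).
#include <vector>

// A conditions payload together with its interval of validity (IOV).
struct ConditionsObject {
    long long since;              // start of validity (time or run number)
    long long till;               // end of validity
    std::vector<double> payload;  // the conditions data itself
};

// A folder holds the history of one conditions quantity.
class IConditionsFolder {
public:
    virtual ~IConditionsFolder() = default;
    // Store an object; it becomes the valid one over [since, till).
    virtual void store(const ConditionsObject& obj) = 0;
    // Retrieve the object valid at a given point in time (or run).
    virtual ConditionsObject retrieve(long long when) const = 0;
};
```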
Schema for HCDB Calibration
[Schema diagram: the HCAL Calib DB and Cond DB link each pedestal, gain, or t0 value to the algorithm used and to a blob with the calibration info.]
Comparing Cond DB HVS w/ HCAL Calib DB (HCDB)
Hardware
–HCDB is for a specific sub-detector structure.
–The Cond DB HVS approach is generic.
Time (contrasted in the sketch below)
–HCDB uses run ranges for the test beam.
–Cond DB has IOVs (Intervals of Validity).
Data
–HCDB is concerned with the relations of the data as it goes in, as well as how it is used; it includes the algorithm used.
–Cond DB seems to be focused on access to the data.
Tagging: similar concept for both.
–Cond DB offers a more flexible approach.
–HCDB is simpler, with the constraints that run ranges impose.
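The time-keying difference is easiest to see in code; the following sketch (invented names, illustration only) contrasts run-range validity with an interval of validity.

```cpp
// Illustration of the two validity-keying schemes (names invented).

// HCDB style: a calibration set is valid over an inclusive run range.
struct RunRange {
    int firstRun, lastRun;
    bool covers(int run) const { return run >= firstRun && run <= lastRun; }
};

// Cond DB style: an interval of validity over a generic timeline,
// half-open [since, till), where the key may be a timestamp or event time.
struct IOV {
    long long since, till;
    bool covers(long long t) const { return t >= since && t < till; }
};
```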
The Equipment Management DB (EMDB)
Designed and implemented at CERN by Frank Glege to track the location of irradiated hardware for French Government legal reasons.
Production version being used for existing components as the detector is built.
Comparison w/ EqConf DB:

Feature                    | EMDB            | EqConfDB
Component relationships    | no (planned)    | yes
Component history          | no (planned)    | yes
Detailed set of components | no (only major) | yes
Currently at CERN          | yes             | no
Works w/ all sub-detectors | yes             | no (HCAL)
Sizing DB Resources for CMS
Gennadiy is working w/ Frank Glege to estimate the DB needs for CMS. A process to get more detailed info from each detector group is planned.
Configuration Database
–Start-of-run info needed daily: 10 GB
–Number of configurations (early): 10/month
–Expected addition each month: ~100 GB
Conditions Database
–Average daily dataflow: 2 GB
–Expected size after 1 month: ~60 GB
POOL Plug-in for FroNtier
RDBMS Abstraction in POOL (from the POOL Project Plan)
Motivation
–Vendor-neutral access to RDBMS backends for the relational components in POOL: file catalog, collections, relational storage manager.
–Driven by CMS requirements: access existing relational data as C++ objects using POOL, e.g. conditions data, configuration data, etc.
Requirements collection and component analysis started late last year and finished in March.
Implementation started in March and is expected to complete in Q3 this year.
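As an illustration of the plug-in pattern this implies, here is a minimal sketch; the names are invented for illustration and this is not the actual POOL RelationalAccess API.

```cpp
// Sketch of the vendor-neutral plug-in pattern (names invented; not the
// actual POOL RelationalAccess interface).
#include <memory>
#include <string>

// Abstract, SQL-free interface the experiment framework codes against.
class IRelationalSession {
public:
    virtual ~IRelationalSession() = default;
    virtual void connect(const std::string& contactString) = 0;
    virtual void disconnect() = 0;
};

// Technology-dependent module, loaded as a plug-in at run time.
class OracleSession : public IRelationalSession {
public:
    void connect(const std::string& contactString) override { /* OCI calls */ }
    void disconnect() override {}
};

// A factory hides which backend (Oracle, SQLite, ODBC, FroNtier) is in use.
std::unique_ptr<IRelationalSession> createSession(const std::string& tech);
```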
POOL Software Design
[Component diagram: the experiment framework uses the abstract RelationalAccess and ObjectRelationalAccess interfaces (built on SEAL reflection); RelationalCatalog, RelationalCollection, and RelationalStorageSvc implement the FileCatalog, Collection, and StorageSvc components; technology-dependent modules (Oracle 9i OCI, SQLite, ODBC, and the proposed FroNtier) implement the abstract interface.]
POOL RDBMS Status and FroNtier Proposal
POOL RDBMS interface status (as of June 2004)
–Interface completed: technology-neutral and SQL-free
–Plug-in modules
»Oracle (9i OCI) and SQLite completed and unit-tested
»ODBC in progress
–Relational file catalog completed and tested: validated the RelationalAccess interface and the existing plug-in modules
FroNtier plug-in proposal
–Build a FroNtier component for POOL
–Reuse the existing CDF XML descriptors or adapt them to LCG standards
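For orientation, the appeal of FroNtier (as used in CDF) is that query results travel over HTTP and can therefore be cached by ordinary web proxies; a minimal sketch of that idea, with invented names and the HTTP layer elided, follows. This is not the proposed plug-in implementation.

```cpp
// Hypothetical sketch of a FroNtier-style access path (names invented).
// The key idea: queries travel as HTTP requests, so ordinary web proxy
// caches can serve repeated requests without the database being contacted.
#include <string>

struct FrontierClient {
    std::string serverUrl;  // URL of the FroNtier server in front of the DB

    // Identical queries map to identical URLs, which is what makes the
    // replies cacheable by intermediate proxies.
    std::string requestUrl(const std::string& query) const {
        return serverUrl + "?query=" + urlEncode(query);
    }

private:
    // Real URL encoding elided for the sketch.
    static std::string urlEncode(const std::string& s) { return s; }
};
```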
Distributed Deployment of Databases
Information from Dirk Duelmann's presentation made 7/20 (today) to the LCG Project Execution Board.
Project Goals
Help avoid costly parallel development of data distribution and backup mechanisms in each experiment and grid site, in order to limit support costs.
Enable distributed operation of the LCG database infrastructure with a minimal number of LCG database administration personnel.
Define application access to database services so that any LCG application or service can find the relevant database back-ends, authenticate, and use the provided data in a location-independent way.
Project Non-Goals
Store all database data
–Experiments are free to deploy databases and replicate data under their own responsibility.
Set up a single monolithic distributed database system
–Given the WAN connections and service levels, we cannot assume that a single synchronously updated database would work or give sufficient availability.
Set up a single-vendor system
–Technology independence and multi-vendor implementations will be required to minimise the risks and to adapt to the different requirements at T1 and T2 sites.
Impose a CERN-centric infrastructure on participating sites
–CERN is an equal partner among the LCG sites.
Starting Point for a Service Architecture?
[Architecture diagram: an autonomous Oracle (O) T0 feeds, via Oracle Streams, a T1 database backbone (all data replicated, reliable service); cross-vendor extraction (MySQL (M), files, proxy cache) feeds T2 local database caches holding a subset of the data with local-only service, and on down to T3/4.]
Staged Project Evolution
Proposal Phase 1 (in place for 2005 data challenges)
–Focus on the T1 backbone; understand the bulk data transfer issues
»Given the current service situation, a T1 backbone based on Oracle with Streams-based replication seems the most promising implementation
»Start with T1 sites that have sufficient manpower to actively participate in the project
–Prototype vendor-independent T1-to-T2 extraction based on the application level or a relational abstraction level
»This would allow running vendor-dependent database applications on the T2 subset of the data
–Define a MySQL service with interested T2 sites
»Experiments should point out their MySQL service requirements to the sites
»Need candidate sites which are interested in providing a MySQL service and are able to actively contribute to its definition
Proposal Phase 2
–Try to extend the heterogeneous T2 setup to T1 sites
»By this time, real MySQL-based services should be established and reliable
»Cross-vendor replication based on either Oracle Streams bridges or relational abstraction may have proven to work and to handle the data volumes
Proposed Project Structure
Data Inventory and Distribution Requirements (WP1)
–Members are s/w providers from experiments and grid services based on RDBMS data
–Gather data properties (volume, ownership) and requirements, and integrate the provided service into their software
Database Service Definition and Implementation (WP2)
–Members are site technology and deployment experts
–Propose an agreeable deployment setup and common deployment procedures
Evaluation Tasks
–Short, well-defined technology evaluations against the requirements delivered by WP1
–Evaluations are proposed by WP2 (evaluation plan), typically executed by the people proposing a technology for the service implementation, and result in a short evaluation report
Data Inventory
Collect and maintain a catalog of main RDBMS data types
–Select from a catalog of well-defined replication options
–Determine which are to be supported as part of the service
Ask the experiments and s/w providers to fill in a simple table for each main data type that is a candidate for storage and replication via this service
–Basic storage properties
»Data description, expected volume on T0/1/2 in 2005 (and evolution)
»Ownership model: read-only, single user update, single site update, concurrent update
–Replication/caching properties
»Replication model: site local, all T1, sliced T1, all T2, sliced T2, …
»Consistency/latency: how quickly do changes need to reach other sites/tiers?
»Application constraints: DB vendor and DB version constraints
–Reliability and availability requirements
»Essential for whole-grid operation, for site operation, for experiment production
»Backup and recovery policy, acceptable time to recover, location of backup(s), etc.
DB Service Definition and Implementation
Service discovery
–How does a job find a replica of the database it needs?
–Do we need transparent relocation of services? How?
Connectivity, firewalls, and constraints on outgoing connections
Authentication and authorization
–Integration between DB vendor and LCG security models
Installation and configuration
–Database server and client installation kits
»Which client bindings are required? C, C++, Java (JDBC), Perl, …
–Server administration procedures and tools?
»Even basic agreements would simplify distributed operation
–Server and client version upgrades (e.g. security patches)
»How, if transparency is required for high availability?
Backup and recovery
–Backup policy templates; responsible site(s) for a particular data type?
–Acceptable latency for recovery?
Initial list of possible evaluation tasks
Oracle replication study
–E.g. continue/extend the work started during CMS DC04
–Focus: stability, data rates, conflict handling, administration, topology
DB file-based distribution
–E.g. shipping complete MySQL DBs or Oracle tablespaces
–Focus: deployment impact on existing applications
Application-specific cross-vendor extraction
–E.g. extracting a subset of conditions data to a T2 site
–Focus: complete support of experiment computing model use cases
Web proxy based data distribution
–E.g. integrate this technology into the relational abstraction layer
–Focus: cache control, efficient data transfer
Other generic vendor-to-vendor bridges
–E.g. a Streams interface to MySQL
–Focus: feasibility, fault tolerance, application impact
Proposed Mandate, Timescale & Deliverables
Define, in collaboration with the experiments and Tier 0-2 service providers, a reliable LCG infrastructure that allows the database data to be stored and distributed (where necessary) for use by physics applications and grid services. The target delivery date for a first service should be in time for the 2005 data challenges.
The project could/should run as part of the LCG deployment area, in close collaboration with the application area as the provider of application requirements and DB abstraction solutions.
Main deliverables should be:
–An inventory of data types and their properties (incl. distribution)
–A service definition document to be agreed between the experiments and the LCG sites
–A service implementation document to be agreed between the LCG sites
Status reports to the established LCG committees
–Final decisions are obtained via the PEB and GDB
How CAN/SHOULD FNAL Participate?
Comments from CD/CSS/DSG
–They are very short on manpower for the short term.
–Their involvement will be limited to the existing testing program with Oracle 10g and unidirectional Streams replication.
–They are willing, and interested, to participate in regular 3D meetings and offer technical advice when possible.
CD/CEPA/DBS is interested in participating
–Help defining the plan
–Development and testing: MMSR, FroNtier, MySQL replication
Need other CD department/group involvement.
The ATLAS DB group at Argonne (David Malon, Alexandre Vaniachine, Jack Cranshaw) is very interested in working together with Fermilab. This could be a productive collaboration.
Summary
The HCAL DetDB project is providing a set of tools for the testbeam, and is giving us valuable contact with other DB projects in CMS and LCG.
A FroNtier plug-in for POOL would enable us to bring our CDF experience to CMS. We are pursuing this with the POOL developers.
The proposed Distributed Deployment of Databases is an important project. We should be involved in its definition and development.