Presentation is loading. Please wait.

Presentation is loading. Please wait.

CHEP 2000, Padova, Feb 2000 Operational Experience with the B A B AR Database David R. Quarrie Lawrence Berkeley National Laboratory for B A B AR Computing.

Similar presentations


Presentation on theme: "CHEP 2000, Padova, Feb 2000 Operational Experience with the B A B AR Database David R. Quarrie Lawrence Berkeley National Laboratory for B A B AR Computing."— Presentation transcript:

1 CHEP 2000, Padova, Feb 2000 Operational Experience with the B A B AR Database David R. Quarrie Lawrence Berkeley National Laboratory for B A B AR Computing Group DRQuarrie@LBL.GOV

2 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 2 Acknowledgements D. Quarrie 5, T. Adye 6, A. Adesanya 7, J-N. Albert 4, J. Becla 7, D. Brown 5, C. Bulfon 3, I. Gaponenko 5, S. Gowdy 5, A. Hanushevsky 7, A. Hasan 7, Y. Kolomensky 2, S. Mukhortov 1, S. Patton 5, G. Svarovski 1, A. Trunov 7, G. Zioulas 7 for the B A B AR Computing Group 1 Budker Institute of Nuclear Physics, Russia 2 California Institute of Technology, USA 3 INFN, Rome, Italy 4 Lab de l'Accelerateur Lineaire, France 5 Lawrence Berkeley National Laboratory, USA 6 Rutherford Appleton Laboratory, UK 7 Stanford Linear Accelerator Center, USA

3 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 3 Introduction l Many other talks describe other aspects of B A B AR Database n A. Adesanya, An interactive browser for B A B AR databases n J. Becla, Improving Performance of Object Oriented Databases, B A B AR Case Studies n I. Gaponenko, An Overview of the B A B AR Conditions Database n A. Hanushevsky, Practical Security in large-Scale Distributed Object Oriented Databases n A. Hanushevsky, Disk Cache Management in Large-Scale Object Oriented Databases n E. Leonardi, Distributing Data around the B A B AR collaboration’s Objectivity Federations n S. Patton, Schema migration for B A B AR Objectivity Federations n G. Zioulas, The B A B AR Online Databases l Focus on some of the operational aspects n Lessons learnt during 12 months of production running

4 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 4 Experiment Characteristics

5 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 5 Performance Requirements l Online Prompt Reconstruction n Baseline of 200 processing nodes n 100 Hz total (physics plus backgrounds) ç30 Hz of Hadronic Physics l Fully reconstructed ç70 Hz of backgrounds, calibration physics l Not necessarily fully reconstructed l Physics Analysis n DST Creation ç2 users at 10 9 events in 10 6 secs (1 month) n DST Analysis ç20 users at 10 8 events in 10 6 secs n Interactive Analysis ç100 users at 100events/secs

6 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 6 Functionality Summary l Basic design/functionality ok l No performance or scaling problems with conditions, ambient and configuration databases l Security and data protection APIs added n Internal to a federation n Access to different federations l Startup problems being resolved n Scaling problems with event store çOnline Prompt Reconstruction çPhysics Analysis n Data Distribution çInternal within SLAC çExternal to/from remote Institutions

7 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 7 SLAC Hardware Configuration

8 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 8 Production Federations l Developer Test n Dedicated server with 500GB disk & two lockserver machines çSaturation of transaction table with a single lock server n Test federations typically correspond to B A B AR software releases n 5 federation ids assigned per developer n Space at a premium – separate journal file area l Shared Test n Developer communities ç(e.g. reconstruction) n Share hardware with developer test federations çSpace becoming a problem – dedicated servers being setup l Production Releases n Used during the software release build process. çOne per release and platform architecture n Share hardware with developer test federations

9 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 9 Production Federations (2) l Online (IR2) n Used for online calibrations, slow controls information, configurations n Servers physically located in experiment hall l Online Prompt Reconstruction (OPR) n Pseudo real-time reconstruction of raw data çDesigned to share IR2 federation as 2 nd autonomous partition çIntermittent DAQ run startup interference caused split l Still planned to recombine n Input from files on spool disk çDecoupling to prevent possible deadtime çThese files also written to tape n 100-200 processing nodes çDesign is 200 with 100Hz input event rate n Output to several database servers with 3TB of disk çAutomatic migration to hierarchical mass store tape l Reprocessing n Clone of OPR for bulk reprocessing with improved algorithms etc. çReprocessing from raw data tapes çBeing configured now – first reprocessing scheduled for March 2000

10 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 10 Production Federations (3) l Physics Analysis n Main physics analysis activities n Decoupled from OPR to prevent interference l Simulation Production n Bulk production of simulated data n Small farm of ~30 machines çAugmented by farm at LLNL writing to same federation çOther production site databases imported to SLAC l Simulation Analysis n Shares same servers as physics analysis federation n Separate federations to allow more database ids çNot possible to access physics and simulation data simultaneously l Testbed federation n Dedicated servers with up to 240 clients n Performance scaling as function of number of servers, filesystems per server, cpus per server, and other configuration parameters

11 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 11 Integration with Mass Store l Data Servers form primary interface l Different regions on disk n Staged: Databases managed by staging/migration/purging service n Resident: Databases are never staged or purged n Dynamic: Neither staged, migrated or purged çMetadata such as federation catalog, management databases çFrequently modified so would be written to tape frequently çOnly single slot in namespace but multiple space on tape çExplicit backups taken during scheduled outages n Test: Not managed. çTest new applications and database configurations l Analysis federation staging split into two servers n User: Explicit staging requests based on input event collections n Kept: Centrally managed access to particular physics runs

12 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 12 Movement of data between Federations l Several federations form coupled sets n Physics çOnline, OPR, Analysis n Simulation çProduction, Analysis l Data Distribution strategy to move databases between federations n Allocation of id ranges avoids clashes between source & destination n Use of HPSS namespace to avoid physical copying of databases çOnce a database has been migrated from source, the catalog of the destination is updated and the staging procedures will read the database on demand l Transfer causes some interference n Still working to understand and minimize n Two scheduled outages per week (10% downtime) çOther administrative activities l Backups, schema updates, configuration updates n ~2 day latency from OPR to physics analysis çHave demonstrated <6 hours in tests

13 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 13 Physicist Access to Data l Access via event collections n Mapping from event collection to databases çData from any event spread across 8 databases l Improved performance to frequently accessed information l Reduction in disk space n Scanning of collections became bottleneck çNeeded for explicit staging of databases from tape n Mapping known by OPR but not saved n Decided to use Oracle database for staging requests çSingle scan of collection to databases mapping çAlso used for production bookkeeping çData distribution bookkeeping l On-demand staging feasible using Objy 5.2 n Has been demonstrated n Prefer explicit staging for production until access better understood

14 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 14 Production Schema Management l Crucial that new software releases compatible with schema in production federations n New software release is a true superset l Reference schema used to preload release build federation l If release build is successful, the output schema forms new reference n Following some QA tests l Offline and online builds can overlap in principle n Token passing scheme ensures sequential schema updates to reference l Production federations updated to reference during scheduled outages l Explicit schema evolution scheme described elsewhere

15 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 15 Support Personnel l Two database administrators n Daily database operations n Develop management scripts and procedures n Data distribution between SLAC production federations n First level of user support l Two data distribution support people n Data distribution to/from regional centers and external sites l Two HPSS (mass store) support people n Also responsible for AMS back-end software çStaging/migration/purging çExtensions (security, timeout deferral, etc.) l Five database developers provide second tier support n Augmented by physicists, visitors

16 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 16 Database Statistics (Jan 2000) l Data n ~33TB accumulated data n ~14000 databases n ~28000 collections l 513 persistent classes l Servers n 12 primary çDatabase servers n 15 secondary çLock servers, catalog & journal servers l Disk Space n ~10TB çTotal for data, split into staged, resident, etc.

17 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 17 Database Statistics (2) l Sites n >30 sites using Objectivity çUSA, UK, France, Italy, Germany, Russia l Users n ~655 licensees çPeople who have signed the license agreement n ~430 users çPeople who have created a test federation n ~90 simultaneous users at SLAC çMonitoring distributed oolockmon statistics n ~60 developers çHave created or modified a persistent class çA wide range of expertise l 10-15 experts

18 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 18 Ongoing and Future Operational Activities l Improved performance n Both hardware and software improvements n Reduced payload per event n Design goals almost met l Improved automation n Less burden on the support staff l Reduced downtime and latency n Outage level <10% n Latency <6 hours l Large file handling issues n Problems handling 10GB database files for external distribution l Better cleanup after problems l Remerge online and OPR federations

19 CHEP 2000, Padova, Feb 2000 David R. Quarrie: Operational Experience with the BaBar Database 19 Conclusions l Basic design and technology ok n No killer problems l Initial performance problems being overcome n Ongoing process n Design goals almost met l Still learning how to manage a large system n Multiple federations n Multiple servers n Multiple sites n Large user community n Large developer community l Still more to automate n Many manual procedures l More features on the way n Multi-dimensional indexing n Parallel iteration


Download ppt "CHEP 2000, Padova, Feb 2000 Operational Experience with the B A B AR Database David R. Quarrie Lawrence Berkeley National Laboratory for B A B AR Computing."

Similar presentations


Ads by Google