BaBar Tier A at CC-IN2P3
Jean-Yves Nief, CC-IN2P3, Lyon
HEPiX-HEPNT, Fermilab, October 22nd – 25th, 2002
Outline
1) Overview of BaBar: motivation for a Tier A.
2) Hardware available for the CC-IN2P3 Tier A (servers, storage, batch workers, network).
3) Software issues (maintenance, data import).
4) Resources usage (CPU used, ...).
5) Problems encountered (hardware, software).
6) BaBar Grid and future developments.
BaBar: a short overview
Study of CP violation using B mesons, located at SLAC.
Since 1999, more than 88 million B-Bbar events collected; ~ 660 TB of data stored (real data + simulation).
How is it handled?
- Object-oriented techniques: C++ software and an OO database system (Objectivity).
- For data analysis @ SLAC: 445 batch workers (500 CPUs), 127 Objy servers + ~50 TB of disk + HPSS.
But:
- heavy user demand (> 500 physicists) => saturation of the system.
- collaborators spread world-wide (America, Europe).
Idea: creation of mirror sites where data analysis and simulation production can be done.
CC-IN2P3 Tier A: hardware (I)
19 Objectivity servers, all Sun machines:
- 8 Sun Netra 1405T (4 CPUs).
- 2 Sun 4500 (4 CPUs).
- 1 Sun 1450 (4 CPUs).
- 8 Sun 250 (2 CPUs).
Roles:
- 9 servers for data access for analysis jobs.
- 2 database catalog servers.
- 6 servers for database transaction handling.
- 1 server for Monte-Carlo production.
- 1 server for data import/export.
20 TB of disks.
Hardware (II): storage system
Mass storage system: 20% of the data available on disk => automatic staging required.
Storage for private use:
- Temporary storage: 200 GB of NFS space.
- Permanent storage (a toy sketch of the size-based split follows this slide):
  - for small files (log files, ...): Elliot archiving system.
  - for large files (ntuples, ...) > 20 GB: HPSS (2% of the total occupancy).
> 100 TB in HPSS.
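The small-file / large-file split above boils down to a simple routing rule. Below is a minimal sketch, under stated assumptions: the 20 GB figure is used as the routing cutoff, and the command names (elliot_archive, rfcp) and the HPSS path are placeholders, not the site's actual tooling.

#!/usr/bin/env python
# Hypothetical sketch: route a user file to permanent storage by size,
# following the policy on this slide (small files -> Elliot archive,
# large files -> HPSS). Command names and paths are assumptions.
import os
import subprocess
import sys

LARGE_FILE_BYTES = 20 * 1024**3          # 20 GB cutoff quoted on the slide

def archive(path, hpss_dir="/hpss/in2p3.fr/user"):   # placeholder HPSS path
    if os.path.getsize(path) >= LARGE_FILE_BYTES:
        # large files (ntuples, ...) go to HPSS, here via an RFIO-style copy
        dest = "%s/%s" % (hpss_dir, os.path.basename(path))
        cmd = ["rfcp", path, dest]
    else:
        # small files (log files, ...) go to the Elliot archiving system
        cmd = ["elliot_archive", path]               # hypothetical command
    return subprocess.call(cmd)

if __name__ == "__main__":
    sys.exit(archive(sys.argv[1]))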
Hardware (III): the network
Massive data import from SLAC (~ 80 TB in one year).
Data needs to be available in Lyon within a short amount of time (max: 24 - 48 hours).
Large bandwidth between SLAC and IN2P3 required. 2 routes:
- CC-IN2P3 <-> Renater <-> US: 100 Mbit/s
- CC-IN2P3 <-> CERN <-> US: 155 Mbit/s (until this week)
- CC-IN2P3 <-> Geant <-> US: 1 Gbit/s (from now on)
Full potential never reached (not understood).
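As a back-of-envelope check (not on the original slide), 80 TB spread evenly over a year corresponds to an average rate of only about 20 Mbit/s, so the links listed above are sized for bursts and catch-up within the 24 - 48 hour window rather than for the average flow. A minimal sketch of the arithmetic, assuming 1 TB = 10^12 bytes:

# Back-of-envelope check of the rates implied by ~80 TB imported per year.
TB = 10**12                          # bytes (assumption: decimal terabytes)
volume_bits = 80 * TB * 8            # one year of imports, in bits
seconds_per_year = 365 * 24 * 3600

avg_mbit_s = volume_bits / seconds_per_year / 1e6
print("average rate needed: %.1f Mbit/s" % avg_mbit_s)        # ~20.3 Mbit/s

# Time to move one average day's worth (~220 GB) on a 100 Mbit/s link:
day_bits = volume_bits / 365.0
hours = day_bits / 100e6 / 3600
print("one day's data at 100 Mbit/s: %.1f hours" % hours)     # ~4.9 hours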
Hardware (IV): the batch and interactive farms
The batch farm (shared):
- 20 Sun Ultra 60 dual-processor.
- 96 Linux PIII-750 MHz dual-processor, NetFinity 4000R.
- 96 Linux PIII-1 GHz dual-processor, IBM X-series.
=> 424 CPUs.
The interactive farm (shared):
- 4 Sun machines.
- 12 Linux machines.
Software (I): BaBar releases, Objectivity
BaBar releases:
- need to keep up with the evolution of the BaBar software at SLAC.
- new BaBar software releases have to be installed as soon as they are available.
Objectivity and related issues:
- Development of tools:
  - to monitor the servers' activity, HPSS and batch resources.
  - to survey the Objectivity processes on the servers ("sick" daemons, transaction locks, ...) (see the sketch after this slide).
- Maintenance: software upgrades, load balancing of the servers.
- Debugging Objy problems on both the client and server side.
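The process survey mentioned above might be structured like the sketch below. This is a hypothetical illustration only: the host names, the Objectivity daemon names (ooamsd, oolockserver) and the CPU-time threshold are assumptions, not the actual CC-IN2P3 monitoring scripts.

#!/usr/bin/env python
# Hypothetical "sick daemon" survey: ssh to each Objectivity server, list the
# data-server / lock-server processes, and flag any that look wedged.
# Daemon names, hosts and the threshold are illustrative assumptions.
import subprocess

SERVERS = ["ccobjy01", "ccobjy02"]       # placeholder host names
DAEMONS = ("ooamsd", "oolockserver")     # assumed Objy daemon names
MAX_CPU_MINUTES = 600                    # arbitrary "stuck" threshold

def cpu_minutes(t):
    # ps TIME field is [[dd-]hh:]mm:ss
    days, _, rest = t.partition("-") if "-" in t else ("0", "", t)
    parts = [int(x) for x in rest.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)
    h, m, s = parts
    return int(days) * 1440 + h * 60 + m + s / 60.0

def survey(host):
    out = subprocess.check_output(
        ["ssh", host, "ps", "-eo", "pid,etime,time,comm"], text=True)
    alerts = []
    for line in out.splitlines()[1:]:
        pid, etime, cputime, comm = line.split(None, 3)
        if comm in DAEMONS and cpu_minutes(cputime) > MAX_CPU_MINUTES:
            alerts.append("%s: %s (pid %s) has used %s of CPU"
                          % (host, comm, pid, cputime))
    return alerts

if __name__ == "__main__":
    for host in SERVERS:
        for alert in survey(host):
            print(alert)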
Software (II): data import mechanism
Data catalog available to users through a MySQL database.
Two transfer routes: (1) SLAC -> CERN -> IN2P3, (2) SLAC -> Renater -> IN2P3.
~ 500 MB using multi-stream transfer (bbftp: designed for big files).
Extraction at SLAC when new or updated databases are available.
Import in Lyon launched when the extraction @ SLAC is finished.
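A driver for this mechanism could be organised roughly as below: poll the MySQL catalog for databases whose extraction at SLAC has finished, then pull each one with a multi-stream bbftp transfer. This is only a sketch under assumptions: the catalog schema, host names and the exact bbftp options are invented for illustration and are not the actual BaBar import scripts.

#!/usr/bin/env python
# Hypothetical import driver: find databases marked as extracted at SLAC,
# fetch them with multi-stream bbftp, then mark them as imported.
# Table/column names, hosts and bbftp flags are illustrative assumptions.
import subprocess
import MySQLdb                      # assumes the MySQLdb driver is available

STREAMS = 10                        # number of parallel bbftp streams
SLAC_HOST = "datamover.slac.stanford.edu"     # placeholder host name

def pending_databases(conn):
    cur = conn.cursor()
    cur.execute("SELECT remote_path, local_path FROM db_catalog "
                "WHERE status = 'extracted'")
    return cur.fetchall()

def fetch(remote_path, local_path):
    # bbftp is designed for big files; '-p' is assumed here to set the
    # number of parallel streams.
    cmd = ["bbftp", "-u", "babar", "-p", str(STREAMS),
           "-e", "get %s %s" % (remote_path, local_path), SLAC_HOST]
    return subprocess.call(cmd) == 0

if __name__ == "__main__":
    conn = MySQLdb.connect(host="cccatalog", db="babar_import")  # placeholders
    for remote_path, local_path in pending_databases(conn):
        if fetch(remote_path, local_path):
            cur = conn.cursor()
            cur.execute("UPDATE db_catalog SET status = 'imported' "
                        "WHERE remote_path = %s", (remote_path,))
            conn.commit()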
Resources usage (I)
Tier A officially opened last fall.
~ 200 - 250 analysis jobs running in parallel (the batch system can handle up to 600 jobs in parallel).
~ 60 - 70 MC production jobs running in parallel:
- already ~ 50 million events produced in Lyon.
- now represents ~ 10 - 15% of the total weekly BaBar MC production.
~ 1/3 of the running jobs are BaBar jobs.
Up to 4500 jobs in the queue during the busiest periods.
Resource usage (II)
BaBar: top CPU-consuming group at IN2P3 over the last 4 months, and second CPU consumer since the beginning of the year.
MC production represents 25 - 30% of the total CPU time used.
~ 25 - 30% of the CPU used for analysis is consumed by remote users.
(*) 1 unit = 1/8 hour on a PIII, 1 GHz.
Resources usage (III)
20% of the data on disk => dynamic staging via HPSS (RFIO interface):
- ~ 80 s for a staging request.
- up to 3000 staging requests per day possible.
Not a limitation for the CPU efficiency.
Needs less disk space => saves money.
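Seen from an analysis job, dynamic staging reduces to a "make sure the file is on disk before opening it" step. Below is a minimal sketch of that idea, assuming an RFIO-style copy command (rfcp) and placeholder paths; it is not the actual BaBar/HPSS staging interface.

#!/usr/bin/env python
# Minimal stage-on-demand sketch: if a database file is not in the disk
# cache, copy it out of HPSS first (~80 s per request according to this
# slide), then return the local path. Paths and the rfcp usage are
# illustrative assumptions.
import os
import subprocess

DISK_CACHE = "/scratch/babar/objy"           # placeholder disk cache
HPSS_ROOT = "/hpss/in2p3.fr/babar/objy"      # placeholder HPSS namespace

def stage(db_name):
    local = os.path.join(DISK_CACHE, db_name)
    if not os.path.exists(local):
        # blocking copy from HPSS through the RFIO interface
        remote = os.path.join(HPSS_ROOT, db_name)
        subprocess.check_call(["rfcp", remote, local])
    return local

if __name__ == "__main__":
    path = stage("run12345-events.db")       # hypothetical database name
    print("database available at", path)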
Problems encountered
A few problems with the availability of data in Lyon due to the complexity of the export/import procedure.
Network bandwidth for data import a bit erratic; the maximum was never reached.
Objectivity-related bugs (most of them due to Objy server problems).
Some HPSS outages, system overloaded (software related + hardware limitations): solved => better performance now.
During peak activity (e.g. before the summer conferences), huge backlog on the batch system.
The Tier A and the outer world: BaBar Grid @ IN2P3
BaBar is involved in using Grid technologies.
Storage Resource Broker (SRB) and Metadata Catalog (MCAT) software installed and tested @ IN2P3:
- allows data sets and resources to be accessed based on their attributes rather than their physical locations (illustrated in the sketch after this slide).
- the future of data distribution between SLAC and IN2P3.
Tests @ IN2P3 of the EDG software using BaBar analysis applications: possible to remotely submit a job @ IN2P3 to RAL and SLAC.
Prototype of a tool to remotely submit jobs: December 2002.
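The attribute-based access idea can be illustrated with a toy catalog lookup: the user asks for data by attributes (stream, run range, ...) and the catalog decides which physical replica to serve, preferring a local copy. This is purely an illustration of the concept, not the real SRB/MCAT API; every entry and name below is made up.

# Toy illustration of attribute-based data access (the SRB/MCAT idea):
# select data sets by attributes, let the catalog pick the physical replica.
# All catalog entries and URLs below are made up.
CATALOG = [
    {"logical": "babar/run10000-10999/stream1",
     "attrs": {"stream": 1, "run_min": 10000, "run_max": 10999},
     "replicas": ["hpss://ccin2p3/babar/s1-10000.db",
                  "hpss://slac/babar/s1-10000.db"]},
    {"logical": "babar/run11000-11999/stream1",
     "attrs": {"stream": 1, "run_min": 11000, "run_max": 11999},
     "replicas": ["hpss://slac/babar/s1-11000.db"]},
]

def select(stream, run):
    """Return replicas matching the requested attributes, preferring a
    local (ccin2p3) copy when one exists."""
    for entry in CATALOG:
        a = entry["attrs"]
        if a["stream"] == stream and a["run_min"] <= run <= a["run_max"]:
            local = [r for r in entry["replicas"] if "ccin2p3" in r]
            return local or entry["replicas"]
    return []

print(select(stream=1, run=10500))   # local CC-IN2P3 replica preferred
print(select(stream=1, run=11500))   # only a SLAC replica exists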
CC-IN2P3 Tier A: future developments
2 new Objy servers + new disks (near future):
- 1 allocated to MC production => goal: double the MC production.
- fewer staging requests to HPSS.
72 new Linux batch workers (PIII, 1.4 GHz) => CPU power increased by 50% (shared with others).
Compression of the databases on disk (client- or server-side decompression on the fly) => HPSS load decreased (see the sketch after this slide).
Installation of a dynamic load balancing system on the Objy servers => more efficient (next year).
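The compression item trades CPU on the client or the server for disk and HPSS space. Below is a minimal sketch of that trade-off using zlib; it only illustrates the principle and is not the Objectivity-specific mechanism, and the file name and compression level are assumptions.

# Minimal sketch of compress-on-disk / decompress-on-the-fly using zlib.
# Illustrates the space-vs-CPU trade-off only.
import os
import time
import zlib

def store_compressed(path, data, level=6):
    with open(path, "wb") as f:
        f.write(zlib.compress(data, level))

def read_decompressed(path):
    t0 = time.time()
    with open(path, "rb") as f:
        data = zlib.decompress(f.read())
    return data, time.time() - t0

if __name__ == "__main__":
    payload = b"event record " * 1000000      # stand-in for database pages
    store_compressed("/tmp/db.z", payload)
    ratio = len(payload) / float(os.path.getsize("/tmp/db.z"))
    data, dt = read_decompressed("/tmp/db.z")
    assert data == payload
    print("compression ratio ~ %.1fx, decompression took %.3f s" % (ratio, dt))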
Conclusion
BaBar Tier A in Lyon running at full steam.
~ 25 - 30% of the CPU consumed by analysis jobs is used by remote users.
Significant resources at CC-IN2P3 dedicated to BaBar (CPU: 2nd biggest user this year; HPSS: first staging requester).
Contribution to the overall BaBar effort increasing thanks to:
- new Objy servers and disk space.
- new batch workers (72 new Linux machines this year, ~ 200 next year).
- new HPSS tape drives.
- database compression and dynamic load balancing of the servers.