8 October 1999 BaBar Storage at CCIN2P3 p. 1 Rolf Rumler BaBar Storage at Lyon HEPIX and Mass Storage SLAC, California, U.S.A. 8 October 1999 Rolf Rumler,


1 BaBar Storage at Lyon
HEPIX and Mass Storage, SLAC, California, U.S.A., 8 October 1999
Rolf Rumler, John O’Neall, Philippe Gaillardon, Internal Group
IN2P3 Computing Center, Villeurbanne, France
URL http://www.in2p3.fr/CC

2 BaBar Experiment
High-energy-physics experiment, started in July at SLAC.
The IN2P3 Computing Center (CCIN2P3) is the “mirror” computing site for BaBar.
We will receive a copy of all BaBar data (well, almost).
We will also produce simulated data, which will be stored here as well as sent to SLAC.
Estimated data rate is on the order of 350 TB per year.
SLAC has chosen HPSS to store this data; the CCIN2P3 is following their example.
Our initial goal is to do the same thing as SLAC for BaBar.
File sizes: about 2 GB or larger.
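To put the 350 TB per year estimate in perspective, here is a rough back-of-envelope calculation. It is only a sketch: the decimal units, the 365-day year, the perfectly uniform rate and the use of exactly 2 GB per file are assumptions, not figures from the talk.

```python
# Back-of-envelope figures derived from the slide's estimates.
# Assumptions (not from the talk): decimal units, 365-day year, uniform rate,
# and every file being exactly 2 GB.
TB = 10**12
GB = 10**9
MB = 10**6

yearly_volume_bytes = 350 * TB        # "on the order of 350 TB per year"
typical_file_bytes = 2 * GB           # files of roughly 2 GB
seconds_per_year = 365 * 24 * 3600

# Sustained average rate needed to absorb the yearly volume.
average_rate_mb_s = yearly_volume_bytes / seconds_per_year / MB

# Number of files this corresponds to.
files_per_year = yearly_volume_bytes // typical_file_bytes
files_per_day = files_per_year / 365

print(f"average rate: {average_rate_mb_s:.1f} MB/s")
print(f"files per year: {files_per_year}")
print(f"files per day: {files_per_day:.0f}")
```

Under these assumptions the yearly volume corresponds to a sustained average of roughly 11 MB/sec, around the clock, and to some 175,000 files per year.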

3 How it works
[Diagram. Components: Objectivity, amshpss, file, file.lock, HPSS, ooss_Mig, ooss_Pur, ooss_Stage. Arrow types: data, control; transfers via pftp. Operation labels: C = Creation, L = Lecture (read), M = Migration, P = Purge, R(1) to R(3) = Recovery.]

4 HPSS Configuration
For the moment, BaBar only ==> like SLAC
One single Storage Class in one single COS
Tape only = Storagetek Redwoods; 9840 and MAGSTARs under study
No mirroring
All access to data via pftp_client
Additional tools from SLAC (Andy Hanushevsky)

5 Objectivity Configuration Summary
1 SUN E4500 (4 CPUs) + 2 SUN A3500, in total about 1.1 TB RAID 5, under Veritas VM/FS, with actual BaBar data
1 SUN E4500 + 2 SUN A3500 as above, no data yet
1 SUN E450 (4 CPUs) linked to IBM VSS disk space, about 400 GB RAID 5, with Veritas: tests starting next week
Intention: to have different Objy servers for different types of data

6 Core Server

7 HPSS Core Server
RS/6000 F50
4 CPUs, 1 GB memory
2 x 4.5 GB mirrored system disks
24 GB internal SSA disks for SFS (mirrored)
AIX 4.3.2
Ethernet (control network)
DCE, Encina, SAMMI
OMI driver for Redwoods
Access to Storagetek ACL by ACSLS

8 Mover Stations

9 HPSS Movers
Preliminary configuration, while waiting for the choice of the best machine to use with Gigabit Ethernet; we also lack a BaBar usage profile
(Historical problem: changed from ATM to high-speed Ethernet just as HPSS was arriving)
RS/6000 390, replacement under study (43P260?)
1 CPU, 256 MB memory
2 x 4.5 GB mirrored system disks
AIX 4.3.2
Ethernet control network, Fast Ethernet data network

10 Storagetek 4400 Silos (6)

11 Performance
Reminder: temporary mover/network configuration
Performance limited by:
–Fast Ethernet data path (100 Mbps ==> < 8 MB/sec).
–Mover CPUs: ~ 50 % occupied.
Single transfer: ~ 5 MB/sec per tape
Overall per-tape rate is slower because of cartridge mount and positioning time: ~ 3.5 MB/sec
Global max transfer rate: > 16 MB/sec (write), ~ 3 MB/sec (read)
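The per-tape figures above can be related by simple arithmetic. The sketch below assumes one 2-GB file per cartridge mount, which the slide does not state; the function names are illustrative only. It converts the Fast Ethernet line rate to MB/sec and derives the mount and positioning overhead implied by the drop from ~ 5 MB/sec to ~ 3.5 MB/sec.

```python
def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert a line rate in Mbit/s to MB/s (8 bits per byte, no protocol overhead)."""
    return mbit_per_s / 8


def implied_overhead_s(file_mb: float, streaming_mb_s: float, effective_mb_s: float) -> float:
    """Seconds of non-transfer time (mount, positioning) implied by the gap
    between the streaming rate and the effective rate for one file."""
    total_time = file_mb / effective_mb_s       # time including overhead
    transfer_time = file_mb / streaming_mb_s    # time spent actually moving data
    return total_time - transfer_time


# Raw Fast Ethernet capacity; the slide's "< 8 MB/sec" is the observed ceiling.
raw_link_mb_s = mbit_to_mbyte_per_s(100)

# Assumption: one 2-GB (2000 MB) file per mount, rates taken from the slide.
overhead_s = implied_overhead_s(file_mb=2000, streaming_mb_s=5.0, effective_mb_s=3.5)

print(f"raw link capacity: {raw_link_mb_s} MB/s")
print(f"implied mount + positioning overhead: {overhead_s:.0f} s per file")
```

Under that one-file-per-mount assumption, the overhead works out to a little under three minutes per file; with several files read per mount, the per-file overhead would be correspondingly smaller.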

12 Errors during 1st test (5 days)

13 Errors during 2nd test (5 days)

14 Particular problem: Tape errors
HPSS and Redwood cartridges, at least with our test usage pattern, do not seem to coexist well, especially for random reading of ~ 2-GB files.
Redwoods need regular maintenance (every 100 hours or less) ==> it needs to be scheduled.
Need stats from controllers.
Need effective maintenance from Storagetek.
Need tools to monitor volume and drive errors.
Need for HPSS to react automatically to volume and drive errors.
(Examples: unable to dismount cartridge ==> HPSS keeps trying indefinitely; drive errors during writing can turn the drive into a “black hole”)
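As an illustration of the kind of automatic reaction asked for above, here is a minimal sketch of a bounded retry policy for the dismount case. It is not HPSS code: `dismount_with_limit`, the `try_dismount` callable and the returned status strings are hypothetical names chosen for this example.

```python
from typing import Callable


def dismount_with_limit(try_dismount: Callable[[], bool], max_attempts: int = 3) -> str:
    """Attempt a dismount a bounded number of times instead of retrying forever.

    try_dismount is a hypothetical callable returning True on success.
    Returns "ok" on success, or "drive_locked" once the attempts are used up,
    signalling that the drive should be taken out of service for inspection.
    """
    for _ in range(max_attempts):
        if try_dismount():
            return "ok"
    return "drive_locked"


# Usage: a drive that never releases its cartridge is given up on after 3 tries.
attempts = []

def stuck_drive() -> bool:
    attempts.append(1)   # record each attempt
    return False         # the dismount always fails

status = dismount_with_limit(stuck_drive)
print(status, len(attempts))
```

The point of the design is that a failing drive ends in a defined state that an operator or script can act on, instead of absorbing requests indefinitely.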

15 The good(?) news
Storagetek is taking our problems seriously
Adopted several measures to “minimize our dissatisfaction” (through end of 1999):
–Maintenance presence > 1 hour/day
–Check cartridges to see if any are from known-bad batches
–Problem “PINNACLE”, max severity, to handle problems
–Procedure to follow up on all tapes and drives sent to Storagetek for analysis or repair
–Permanent spare SD-3 at IN2P3 + replacement priority
–Daily log analysis, to monitor errors and report them back to us
–Goal: anticipate bad volumes or drives and replace them before they break

16 Other problem: HPSS manageability
SAMMI is not sufficient for us.
Need to receive a user-configurable subset of the “alarms and events” messages in a script, which can then take the appropriate actions.
The “appropriate actions” require that appropriate commands be available in command-line form:
–lock a volume or device;
–forward a message via e-mail, Patrol, beeper or other means.
Many messages are not sufficiently precise, or information is lacking.
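The kind of script described above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the message texts, the rule patterns and the `lock_volume` and `notify_operator` actions are hypothetical placeholders, not HPSS commands or real HPSS message formats. The actions only return a description of what they would do.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Rule:
    pattern: str                   # regular expression matched against the message text
    action: Callable[[str], str]   # hypothetical action; returns a description of what it did


def lock_volume(message: str) -> str:
    # Placeholder: a real script would call a command-line lock tool here.
    return f"LOCK requested: {message}"


def notify_operator(message: str) -> str:
    # Placeholder: a real script would send e-mail or page someone here.
    return f"NOTIFY operator: {message}"


# The user-configurable subset: only messages matching a rule trigger anything.
RULES = [
    Rule(r"volume .* read error", lock_volume),
    Rule(r"unable to dismount", notify_operator),
]


def dispatch(lines: Iterable[str], rules: List[Rule] = RULES) -> List[str]:
    """Apply the first matching rule to each message; ignore unmatched messages."""
    performed = []
    for line in lines:
        for rule in rules:
            if re.search(rule.pattern, line, re.IGNORECASE):
                performed.append(rule.action(line))
                break
    return performed


# Usage with made-up messages.
sample = [
    "Volume AB1234 read error on drive 3",
    "Mover heartbeat ok",
    "Unable to dismount cartridge from drive 2",
]
for entry in dispatch(sample):
    print(entry)
```

Keeping the rules in a plain list separates the site-specific choice of messages from the dispatch logic, which is the "user-configurable" part of the requirement.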

17 Summary
Greatest current problem is due to errors from Redwood drives; we are studying this problem with Storagetek France. This problem is exacerbated by the next one.
Greatest long-term problem is manageability, specifically, the lack of adequate non-graphic interfaces to HPSS to permit effective, automatic error detection, performance monitoring and alarm propagation.

