Slide 1: BaBar Storage at Lyon
HEPiX and Mass Storage, SLAC, California, U.S.A., 8 October 1999
Rolf Rumler, John O'Neall, Philippe Gaillardon
Internal Group, IN2P3 Computing Center, Villeurbanne, France

Slide 2: BABAR Experiment
High-energy-physics experiment, started in July at SLAC.
The IN2P3 Computing Center is the "mirror" computing site for BaBar: we will receive a copy of all BaBar data (well, almost). We will also produce simulated data, which will be stored locally as well as sent to SLAC.
The estimated data rate is on the order of 350 TB per year.
SLAC has chosen HPSS to store these data; CCIN2P3 is following their example. Our initial goal is to do the same thing as SLAC for BABAR.
Files are >= ~2 GB.
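As a back-of-the-envelope check of what 350 TB/year means as a sustained bandwidth (using only the figure quoted on this slide):

    # Rough sustained-bandwidth equivalent of the quoted 350 TB/year.
    TB = 1e12                           # decimal terabytes
    SECONDS_PER_YEAR = 365 * 24 * 3600

    rate_mb_s = 350 * TB / SECONDS_PER_YEAR / 1e6
    print(f"350 TB/year ~ {rate_mb_s:.1f} MB/s sustained")   # ~11.1 MB/s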

Slide 3: How it works
[Diagram: the Objectivity AMS (amshpss) serves database files from local disk; the ooss_Mig, ooss_Pur and ooss_Stage daemons migrate, purge and stage files, guarded by a file.lock, between disk and HPSS; data moves over pftp while control messages cover creation (C), read (L, French "lecture"), migration (M), purge (P) and recovery (R) operations.]
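The migrate/purge/stage cycle in the diagram can be sketched in a few lines. This is a hypothetical illustration, not the actual SLAC tools: the paths and the pftp_put/pftp_get transfer hooks are placeholders.

    import os

    DISK = "/objy/databases"   # hypothetical local Objectivity disk area
    HPSS = "/hpss/babar"       # hypothetical HPSS name space

    def migrate(name, pftp_put):
        """ooss_Mig-style step: push a database file into HPSS over pftp,
        using a .lock file so the AMS knows the file is being handled."""
        lock = os.path.join(DISK, name + ".lock")
        open(lock, "w").close()                       # take the lock
        try:
            pftp_put(os.path.join(DISK, name), f"{HPSS}/{name}")
        finally:
            os.remove(lock)                           # release the lock

    def purge(name, is_migrated):
        """ooss_Pur-style step: free disk space for files safely in HPSS."""
        if is_migrated(name):
            os.remove(os.path.join(DISK, name))

    def stage(name, pftp_get):
        """ooss_Stage-style step: bring a purged file back on a read miss."""
        path = os.path.join(DISK, name)
        if not os.path.exists(path):
            pftp_get(f"{HPSS}/{name}", path)
        return path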

Slide 4: HPSS Configuration
For the moment, BaBar only ==> same setup as SLAC.
One single Storage Class in one single Class of Service (COS).
Tape only: StorageTek Redwoods; STK 9840 and IBM Magstar drives are under study.
No mirroring.
All access to data is via pftp_client (illustrated below).
Additional tools from SLAC (Andy Hanushevsky).
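Since pftp is an FTP-protocol client with parallel-transfer extensions, plain FTP calls illustrate the access pattern. The host name, credentials and paths below are placeholders; a real transfer would use pftp_client for parallel streams rather than Python's ftplib.

    from ftplib import FTP

    # Placeholder host and credentials, for illustration only.
    ftp = FTP("hpss.example.in2p3.fr")
    ftp.login("babar", "secret")
    with open("database.db", "wb") as f:
        # Retrieve one database file from the (hypothetical) HPSS name space.
        ftp.retrbinary("RETR /hpss/babar/database.db", f.write)
    ftp.quit()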

Slide 5: Objectivity Configuration Summary
–1 Sun E4500 (4 CPUs) + 2 Sun A3500, about 1.1 TB in total, RAID 5 under Veritas VM/FS, holding actual BaBar data.
–1 Sun E4500 + 2 Sun A3500 as above, no data yet.
–1 Sun E450 (4 CPUs) attached to IBM VSS disk space, about 400 GB RAID 5, with Veritas: tests starting next week.
–Intention: to have different Objectivity servers for different types of data.

Slide 6: Core Server

Slide 7: HPSS Core Server
–RS/6000 F50: 4 CPUs, 1 GB memory
–2 x 4.5 GB mirrored system disks
–24 GB internal SSA disks for SFS (mirrored)
–AIX
–Ethernet (control network)
–DCE, Encina, SAMMI
–OMI driver for the Redwoods
–Access to the StorageTek library via ACSLS

Slide 8: Mover Stations

Slide 9: HPSS Movers
Preliminary configuration, while waiting for the choice of the best machine to use with Gigabit Ethernet, and still lacking a BaBar usage profile. (Historical problem: we changed from ATM to high-speed Ethernet just as HPSS was arriving.)
–RS/6000, replacement under study (43P-260?)
–1 CPU, 256 MB memory
–2 x 4.5 GB mirrored system disks
–AIX
–Ethernet control network, Fast Ethernet data network

Slide 10: StorageTek 4400 Silos (6)

Slide 11: Performance
Reminder: temporary mover/network configuration.
Performance is limited by:
–the Fast Ethernet data path (100 Mbps ==> < 8 MB/s);
–the mover CPUs, ~50% occupied.
Instantaneous ("punctual") transfer rate: ~5 MB/s per tape.
The global rate is slower because of cartridge mount and positioning time: ~3.5 MB/s.
Global maximum transfer rate: > 16 MB/s (write), ~3 MB/s (read).
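The quoted figures hang together. A back-of-the-envelope check; the ~170 s of mount-and-positioning overhead is our assumption, chosen only to show how ~5 MB/s streaming degrades to ~3.5 MB/s for a ~2 GB file:

    # Fast Ethernet ceiling and effective per-tape rate from the quoted figures.
    link_mb_s = 100 / 8        # 100 Mbps = 12.5 MB/s raw; < 8 MB/s is achieved

    punctual = 5.0             # MB/s while the tape is actually streaming
    file_mb = 2000.0           # ~2 GB file, as on slide 2
    overhead_s = 170.0         # assumed mount + positioning time per file

    effective = file_mb / (file_mb / punctual + overhead_s)
    print(f"effective rate ~ {effective:.1f} MB/s")   # ~3.5 MB/s, as quoted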

Slide 12: Errors during 1st test (5 days)

Slide 13: Errors during 2nd test (5 days)

Slide 14: Particular problem: tape errors
HPSS and the Redwood cartridges, at least under our test usage pattern, do not seem to cohabit well, especially for random reading of ~2 GB files.
–Redwoods need regular maintenance (every 100 hours or less) ==> maintenance needs to be scheduled, which requires statistics from the controllers.
–We need effective maintenance from StorageTek.
–We need tools to monitor volume and drive errors.
–We need HPSS to react automatically to volume and drive errors; see the sketch below. (Examples: when a cartridge cannot be dismounted, HPSS keeps trying indefinitely; drive errors during writing can turn a drive into a "black hole".)
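The automatic reaction we are asking for could be as simple as a bounded retry that takes the volume out of service instead of looping forever. A sketch with hypothetical dismount/lock/alert hooks, not an HPSS API:

    import time

    def dismount_with_retry(volume, dismount, lock_volume, alert,
                            max_tries=5, delay_s=60):
        """Retry a failing dismount a bounded number of times, then lock
        the volume rather than retrying indefinitely."""
        for attempt in range(1, max_tries + 1):
            try:
                dismount(volume)
                return True
            except OSError as err:
                alert(f"dismount of {volume} failed (try {attempt}): {err}")
                time.sleep(delay_s)
        lock_volume(volume)    # stop using the suspect volume
        alert(f"{volume} locked after {max_tries} failed dismounts")
        return False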

Slide 15: The good(?) news
StorageTek is taking our problems seriously and has adopted several measures to "minimize our dissatisfaction" (through the end of 1999):
–Maintenance presence > 1 hour/day.
–Checking our cartridges against known-bad batches.
–A problem "PINNACLE", at maximum severity, to handle our problems.
–A procedure to follow up on all tapes and drives sent to StorageTek for analysis or repair.
–A permanent spare SD-3 drive at IN2P3, plus replacement priority.
–Daily log analysis, to monitor errors and report them back to us.
–Goal: anticipate bad volumes or drives and replace them before they break.

Slide 16: Other problem: HPSS manageability
SAMMI does not do the job for us. We need to receive a user-configurable subset of the "alarms and events" messages in a script, which can then take the appropriate actions (see the sketch below). The "appropriate actions" require that the corresponding commands be available in command-line form:
–lock a volume or device;
–forward a message via e-mail, Patrol, beeper or other means.
In addition, many messages are not sufficiently precise, or information is lacking.
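A minimal sketch of the kind of filter script we have in mind. The message patterns and the site_lock_volume/site_notify commands are hypothetical placeholders; HPSS provided no such command-line hooks at the time, which is exactly the complaint:

    import re
    import subprocess
    import sys

    # User-configurable mapping from alarm patterns to actions (hypothetical).
    RULES = [
        (re.compile(r"tape volume (\S+) .* write error"),
         lambda m: subprocess.run(["site_lock_volume", m.group(1)])),
        (re.compile(r"device (\S+) offline"),
         lambda m: subprocess.run(["site_notify", "operator", m.group(0)])),
    ]

    # The "alarms and events" stream would be piped into this script.
    for line in sys.stdin:
        for pattern, action in RULES:
            match = pattern.search(line)
            if match:
                action(match)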

Slide 17: Summary
Our greatest current problem is the errors from the Redwood drives; we are studying this problem with StorageTek France. It is exacerbated by the next one.
Our greatest long-term problem is manageability: specifically, the lack of adequate non-graphical interfaces to HPSS that would permit effective, automatic error detection, performance monitoring and alarm propagation.