Site Report: The Linux Farm at the RCF
HEPIX-HEPNT, October 22-25, 2002
Ofer Rind, RHIC Computing Facility, Brookhaven National Laboratory

RCF - Overview
Provide computing facilities for RHIC users:
- General computing environment
  - General interactive tasks (e-mail, document processing, web)
- Data analysis facility
- Computing infrastructure for RHIC experiments
  - Code development, repository & distribution
  - Raw data recording & reconstruction
  - Data analysis
ACF: US ATLAS Tier 1 Computing Facility
- Shared infrastructure and synergy with RCF
Support staff: 25 FTEs (4 dedicated to the Linux Farm)

RCF - Structure
[Diagram of the facility structure; no text content on this slide.]

RCF - Component Summary
Mass Storage Subsystem:
- StorageTek library managed by HPSS
- 4 silos, 1.2PB capacity (expanding to 4.5PB)
- In Run-2, raw data was recorded at a combined rate of 70MB/sec, for a total of 170TB
- Total data store: ~300TB
Disk Storage:
- Fibre Channel SAN served by NFS: ~110TB RAID5
- 14 Sun 450s running Solaris 8 [2-02] (5 Sun 480s coming online)
- IBM AFS servers (AIX)
Linux Server Farm (detailed on the following slides)

Linux Farm Hardware
- 840 1U and 2U servers (pre-'99 towers have been retired)
- 69 kSPECint95, expanding to 100 kSPECint95 (2+ TFLOPS)
- Most have 1GB of memory (at least 500MB)
- Local SCSI disks, up to 140GB/node
- Allocated by experiment
- Further allocated between Raw Data Reconstruction (CRS) and Reconstructed Data Analysis (CAS)

Vendor      CPU            Nodes   Installed
VA Linux    PIII 450MHz      148   Jun 99
VA Linux    PIII 700MHz       48   Aug 00
VA Linux    PIII 800MHz      168   Nov 00
IBM         PIII 1000MHz     316   Aug 01
IBM         PIII 1400MHz     160   Oct 02

Linux Farm Software Configuration
- RedHat 7.2 with an upgraded kernel
- Image(s) installed via a Kickstart server and customized for the RCF environment via rpm
- NFS + AFS home directory and file access
- Interactive login allowed on selected nodes
- Job management:
  - (CAS) LSF, slightly re-architected for robustness; peak throughput before the summer conferences was >150K jobs/week (see the submission sketch below)
  - (CRS) Locally produced Perl-based batch system (AIX needed for the HPSS API); approx. 670K jobs processed for Run-2
- Expanding use of distributed disk models (rootd, ??)
- ATLAS Grid testbed
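As an illustration of CAS job submission through LSF, here is a minimal sketch; the queue name, log-file pattern, and analysis-script path are hypothetical, and the Python wrapper itself is illustrative rather than the facility's actual tooling.

    #!/usr/bin/env python
    # Illustrative LSF submission wrapper; queue and paths are hypothetical.
    import subprocess

    def submit(script, queue="cas_short", logfile="job.%J.out"):
        # bsub -q <queue> -o <logfile> <script>; LSF expands %J to the job ID.
        return subprocess.call(["bsub", "-q", queue, "-o", logfile, script])

    if __name__ == "__main__":
        for _ in range(5):
            submit("/path/to/analysis.sh")  # hypothetical analysis job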

Tracking LSF Usage
[Plots: weekly job statistics for the STAR queues (week of Oct. 10), showing job starts/hr, average runtime/hr, and runtime distribution.]

Security and Monitoring
Security:
- RCF firewall within the BNL site firewall
- SSH2-only access through gateway bastion nodes (Solaris x86)
- User access restricted to a subset of systems (CAS only)
Monitoring:
- 24 hr. on-call staff for critical systems during RHIC operation
- Cluster management software: VACM (VA Linux), xCAT (IBM)
- Cron scripts to "clean" nodes and head off possible problems (memory leaks, full disks, etc.); a sketch follows below
- CTS system for problem reports
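A minimal sketch of the kind of check such a cron script might perform; the mount points and the 90% threshold are assumptions.

    #!/usr/bin/env python
    # Nightly disk-usage check sketch; mount points and threshold are illustrative.
    import os

    MOUNTS = ["/", "/home", "/data"]
    THRESHOLD = 90.0  # percent full

    def percent_used(path):
        # statvfs reports filesystem capacity in blocks of f_frsize bytes.
        st = os.statvfs(path)
        total = st.f_blocks * st.f_frsize
        avail = st.f_bavail * st.f_frsize
        return 100.0 * (total - avail) / total

    for mount in MOUNTS:
        used = percent_used(mount)
        if used > THRESHOLD:
            print("WARNING: %s is %.0f%% full" % (mount, used))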

Farm Alert System
- Web monitoring (user-accessible) plus paging/e-mail alerts
- Python scripts running locally transfer node status information to a MySQL database (sketched below)
- Notification of problems with NFS/AFS (e.g. stale file handles), LSF daemons, high load, etc.
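A minimal sketch of such a status reporter, assuming a MySQL table named node_status (with a unique key on the node column) and the third-party MySQLdb module; the host, credentials, and schema are all illustrative.

    #!/usr/bin/env python
    # Node-status reporter sketch; table, columns, and credentials are assumptions.
    import os
    import socket

    import MySQLdb  # third-party MySQL client module

    def load_average():
        # The first field of /proc/loadavg is the 1-minute load average.
        with open("/proc/loadavg") as f:
            return float(f.read().split()[0])

    def nfs_ok(mount="/home"):
        # A stale NFS file handle typically surfaces as an OSError on access.
        try:
            os.listdir(mount)
            return 1
        except OSError:
            return 0

    def report():
        db = MySQLdb.connect(host="dbserver", user="monitor",
                             passwd="secret", db="farm")
        cur = db.cursor()
        # REPLACE keeps a single current row per node (unique key on 'node').
        cur.execute(
            "REPLACE INTO node_status (node, loadavg, nfs_ok, updated)"
            " VALUES (%s, %s, %s, NOW())",
            (socket.gethostname(), load_average(), nfs_ok()))
        db.commit()
        db.close()

    if __name__ == "__main__":
        report()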

Network Operation Status
Perl scripts monitor network service connectivity for all nodes (ssh, yp, etc.); an illustrative probe is sketched below.
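The actual scripts were Perl; below is the same idea sketched in Python, with hypothetical node names and a TCP-only service map (NIS/yp is RPC-based and would need a different check).

    #!/usr/bin/env python
    # Service-connectivity probe sketch; node names and port map are illustrative.
    import socket

    SERVICES = {"ssh": 22}  # TCP services to probe, by port

    def probe(host, port, timeout=5.0):
        # True if a TCP connection to host:port succeeds within the timeout.
        try:
            sock = socket.create_connection((host, port), timeout)
            sock.close()
            return True
        except (socket.error, socket.timeout):
            return False

    for host in ("node001", "node002"):  # hypothetical node names
        for name, port in SERVICES.items():
            print("%s %s %s" % (host, name, "OK" if probe(host, port) else "DOWN"))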

Load Monitoring and History
- MySQL database for usage history
- History available back to Sept. '01 via a web interface (a query sketch follows below)
[Plot: CPU load averaged over 98 PHENIX machines during the month of September.]
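As a hedged sketch of the kind of query behind such a history page, assuming status rows are archived to a load_history table with a DATETIME column named updated (all database, table, and column names are assumptions):

    #!/usr/bin/env python
    # Load-history query sketch; schema and credentials are assumptions.
    import MySQLdb  # third-party MySQL client module

    db = MySQLdb.connect(host="dbserver", user="monitor",
                         passwd="secret", db="farm")
    cur = db.cursor()
    # Daily average load for one experiment's nodes over September 2002.
    cur.execute(
        "SELECT DATE(updated), AVG(loadavg) FROM load_history"
        " WHERE node LIKE %s AND updated BETWEEN %s AND %s"
        " GROUP BY DATE(updated) ORDER BY DATE(updated)",
        ("phenix%", "2002-09-01", "2002-09-30 23:59:59"))
    for day, avg_load in cur.fetchall():
        print(day, avg_load)
    db.close()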

Plans for the Near Future
- 160 newly delivered IBM nodes to be brought online
- Purchase bid expected to go out for ~220 more nodes at the beginning of FY03 (pending funding approval)
- Scaling up data storage capacity and throughput for Run-3 (up to a 10X data increase over Run-2, starting in December)
- Ongoing evaluation of LSF 5 and Condor, with an eye towards distributed disk services
- Expanding ATLAS Grid services