Report of Liverpool HEP Computing during 2007

Executive Summary
Substantial and significant improvements in the local computing facilities during the last 12 months.
Hardware and software causes of instabilities in the LCG cluster understood and largely cured.
Many upgrades and improvements to local resources: MAP1 recycled, CDFJIF1 scrapped, BaBar rebuilt as a batch farm.

Further hardware improvements
Upgrade of Force10 switch with extensive recabling.
Dual-core interactive nodes.
New cross-campus link with failover recovery.
New MWS XP cluster.

Further hardware improvements (continued)
New servers set up: T2K-FE, Cockcroft-FE, ATLAS-FE…
New HEPSTORE RAID6.
New secure gateway machine allows SSH access from anywhere.
New HEPWALL node to protect the cluster and balance network load.
VO10 replaced with the SuperVO10 authentication server.
Hundreds of machines repaired, refurbished or upgraded.
Extensive re-cabling of racks in the cluster room (ongoing).
All non-HEP servers moved and isolated in the old CDFJIF1 rack.
3 racks of DELL nodes taken back from AiMes and installed.
Switches replaced in some water-cooled racks (a horrible job).
New network and computers in the former OL library.
10 TB RAID6 added to the LCG cluster.
Complete upgrade of all office desktops finished.
Major repairs to both chiller units on the roof of the OL.

In the pipeline, going as fast as possible
Upgrades to the HEP and MON nodes to improve reliability and speed for services etc. (a big job, many services to be tested).
"Puppet" system for managing all the software installations on all machines in the cluster room.
Complete rebuild of the LCG cluster: replace SE and CE nodes, add UPS to GRID servers, reconfigure dCache, upgrade to SL4 and (hopefully) replace NFS with AFS.
Link most MAP2 nodes into the LCG cluster with job queues for different tasks, e.g. GRID computing, batch farm analysis, local MC production, MPI facilities (use multiple nodes as one computer; see the sketch below). Every node has to earn its keep.
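
To make the MPI point concrete, here is a minimal sketch of the kind of job that treats several nodes as one computer, assuming mpi4py and an MPI launcher (e.g. mpirun) are available on the worker nodes; the event count is an arbitrary placeholder.

```python
# Minimal sketch only: assumes mpi4py and an MPI launcher (e.g. mpirun) on the nodes.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # index of this process within the MPI job
size = comm.Get_size()   # total number of processes across the nodes

# Each rank takes its own slice of a purely illustrative event range.
local_events = range(rank, 1000, size)
local_count = len(local_events)

# Combine per-rank results on rank 0: many nodes acting as one computer.
total = comm.reduce(local_count, op=MPI.SUM, root=0)
if rank == 0:
    print("Processed %d events across %d MPI ranks" % (total, size))
```

Such a job would be launched from the MPI queue with something like `mpirun -np 16 python mpi_sketch.py` (file name illustrative).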

Further software upgrades
Complete database for all hardware.
New hardware monitoring system for all machines.
SL4 rolled out on some interactive nodes.
Automated daily backup of all critical system and user files (a sketch of such a backup follows below).
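
A hedged sketch of what a daily backup of this kind could look like; the source directories, backup destination and cron entry are hypothetical placeholders, and it simply wraps rsync with one dated directory per day.

```python
#!/usr/bin/env python
# Hedged sketch of a daily rsync backup; all paths are hypothetical placeholders.
import datetime
import os
import subprocess
import sys

SOURCES = ["/etc", "/home"]        # hypothetical critical system and user areas
DEST_ROOT = "/backup/daily"        # hypothetical backup area (e.g. a RAID6 volume)

def backup(src):
    stamp = datetime.date.today().isoformat()
    dest = os.path.join(DEST_ROOT, stamp) + src
    os.makedirs(dest, exist_ok=True)
    # -a preserves ownership/permissions/times; --delete mirrors removals
    result = subprocess.run(["rsync", "-a", "--delete", src + "/", dest + "/"])
    return result.returncode

if __name__ == "__main__":
    sys.exit(max(backup(s) for s in SOURCES))
```

Run once a day from cron, e.g. `0 3 * * * /usr/local/bin/daily_backup.py` (path assumed for illustration).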

Problems still outstanding
Monitoring of water-cooled racks to spot cooling failure and make a clean shutdown of the cluster is still needed; we will shut these down this Xmas (hopefully for the last time). A sketch of such a watchdog appears after this list.
Install more air-con units: current ones are running at full capacity so are ageing rapidly. B&E responsibility.
Higher-speed external network connection: currently limited to 1 Gbit/s.
Network within the OL is in need of updating.
Clean room legacy computers need attention.
Current sys admins' office arrangements are inadequate.
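
For the rack-monitoring item above, a minimal watchdog might look like the sketch below. The sensor path, temperature limit and shutdown commands are assumptions for illustration; a real version would drain the batch queues before powering nodes off.

```python
#!/usr/bin/env python
# Hedged sketch of a cooling watchdog; sensor path, limit and commands are assumptions.
import subprocess
import time

SENSOR_FILE = "/sys/class/hwmon/hwmon0/temp1_input"  # hypothetical sensor, millidegrees C
LIMIT_C = 35.0                                       # assumed trip point for a water-cooled rack
CHECK_EVERY_S = 60

def read_temp_c():
    with open(SENSOR_FILE) as f:
        return int(f.read().strip()) / 1000.0

def clean_shutdown():
    # Warn logged-in users, then schedule an orderly halt of this node.
    subprocess.run(["wall", "Cooling failure detected: cluster shutting down in 5 minutes"])
    subprocess.run(["shutdown", "-h", "+5", "cooling failure"])

if __name__ == "__main__":
    while True:
        if read_temp_c() > LIMIT_C:
            clean_shutdown()
            break
        time.sleep(CHECK_EVERY_S)
```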

GRIDPP3 hardware upgrade and NW GRID cluster
Disappointing outcome for the next round of allocations to add to our GRID hardware: the whole of NorthGrid is unhappy with the result.
Currently we will get ~£67K over 2 years; we had been hoping for ~£200K as in the original (2006) plans.
Making a further bid for a share of an extra ~£100K, but the outcome is not clear. We will be lucky to get another £10K.
New NW GRID cluster (~ MAP2 in scale) to be situated in CSD, but the £270K grant comes through Physics, and we plan to couple the two clusters closely. The new cluster is very suitable for MPI jobs.
The contract says the cluster must be installed by the end of January 2008.

HEP Computing in 1983 ~ 1 Pentium II ~ the CPU in a mobile phone

A lot of computing changes in 25 years
Local computing resources have increased by ~7 orders of magnitude in 25 years (2^25 ≈ 3×10^7)!
However the number of sys admins is about the same as in 1983, so there is a clear need to continue to invest RG money to:
(a) build in redundancy, UPS and backup in the critical systems;
(b) have automatic failover recovery where possible;
(c) have extensive monitoring for early warning of failures;
(d) use RAID6 (hardware or software);
(e) ensure specialist computers for clean rooms etc. are future-proofed at the purchase stage, and have spare motherboards.
Fix your laptops to your desk with a security cable and back up the disk! Password-protect them.