Tier1 Status Report Andrew Sansum Service Challenge Meeting 27 January 2005.

Tier1 Status Report Andrew Sansum Service Challenge Meeting 27 January 2005

Overview
What we will cover:
–Andrew: Tier-1 local infrastructure for Service Challenges
–Robin: Site and UK networking for the Tier-1
–John (today): GRIDPP Storage Group plans for SRM
–John (tomorrow): Long-range planning for LCG

SC2 Team
Intend to share the load among several staff:
–RAL to CERN networking: Chris Seelig
–LAN and hardware deployment: Martin Bly
–Local system tuning/RAID: Nick White
–dCache: Derek Ross
–Grid interfaces: Steve Traylen
Also expect to call on support from:
–GRIDPP Storage Group (for SRM/dCache support)
–GRIDPP Network Group (for end-to-end network optimisation)

Tier1A Disk
Existing capacity (… TB):
–Dual-processor servers
–Dual-channel SCSI interconnect
–External IDE/SCSI RAID arrays (Accusys and Infortrend)
–ATA drives (mainly Maxtor)
–Cheap and (fairly) cheerful
–37 servers
2004 (140TB):
–Infortrend EonStore SATA/SCSI RAID arrays
–16 x 250GB Western Digital SATA drives per array
–Two arrays per server
–20 servers

Typical configuration
Server: dual 2.8GHz Xeon, 7501 chipset, U320 SCSI to the arrays, 2 x 1Gbit uplink (1 in use)
EonStore arrays: 15+1 RAID 5 with 250GB SATA drives
Layout: 2 LUNs per device, ext2, 4 filesystems, exported over NFS
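As a quick sanity check of the 140TB quoted for the 2004 purchase, the sketch below multiplies out the array layout described on these two slides; it assumes the 15+1 RAID 5 configuration applies to every array and uses the marketed (decimal) drive size.

```python
# Back-of-envelope check of the 2004 disk capacity, assuming every array uses
# the 15+1 RAID 5 layout above. 250GB is the marketed (decimal) drive size;
# formatted capacity will be somewhat lower.
DRIVE_GB = 250               # Western Digital SATA drives
DATA_DRIVES_PER_ARRAY = 15   # 15+1 RAID 5: one drive's worth of space is parity
ARRAYS_PER_SERVER = 2
SERVERS = 20

usable_tb = DRIVE_GB * DATA_DRIVES_PER_ARRAY * ARRAYS_PER_SERVER * SERVERS / 1000
print(f"Usable RAID capacity: {usable_tb:.0f} TB")
# ~150 TB of raw RAID capacity, consistent with the ~140TB quoted once
# filesystem overhead is taken into account.
```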

Disk
Ideally deploy production capacity for the Service Challenge:
–Tune the kit/software we need to work well in production
–Tried and tested – no nasty surprises
–Don't invest effort in a one-off installation
–OK for an 8-week block provided there is no major resource clash; problematic for longer than that
–4 servers (maybe 8) potentially available
Plan B – deploy batch worker nodes with an extra disk drive:
–Less keen – lower per-node performance …
–Allows us to retain the infrastructure

Device Throughput

Performance Notes
Rough and ready preliminary check – out-of-the-box Red Hat 7.3 kernel.
Just intended to show capability – no time has been put into tuning yet.
At low thread count, system caching is affecting the benchmark; with larger files 150MB/s is more usual for writes.
Read performance seems to have a problem, probably a hyper-threading effect – however, it is pretty good at high thread count.
Probably can drive both arrays in parallel.
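For illustration only, here is a minimal sketch of the kind of threaded write test being described; the mount point, file sizes and thread counts are hypothetical, and the figures quoted above came from a standard disk benchmark rather than this script.

```python
# Minimal threaded write test (illustrative only). Large files and a final
# fsync() reduce, but do not eliminate, the page-cache effects noted above.
import os
import threading
import time

TARGET_DIR = "/mnt/array1"   # hypothetical mount point of one EonStore filesystem
FILE_SIZE = 2 * 1024**3      # 2 GiB per thread
BLOCK = 1024 * 1024          # 1 MiB writes

def writer(idx):
    path = os.path.join(TARGET_DIR, f"bench_{idx}.dat")
    buf = b"\0" * BLOCK
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    for _ in range(FILE_SIZE // BLOCK):
        os.write(fd, buf)
    os.fsync(fd)             # flush dirty pages before the clock stops
    os.close(fd)

for nthreads in (1, 2, 4, 8):
    start = time.time()
    threads = [threading.Thread(target=writer, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    total_mb = nthreads * FILE_SIZE / 1e6
    print(f"{nthreads} threads: {total_mb / elapsed:.0f} MB/s aggregate write")
```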

dCache
dCache is deployed as a production service (plus a test instance, JRA1, developer1 and developer2?).
Now available in production for ATLAS, CMS, LHCb and DTEAM (17TB now configured – 4TB used).
Reliability is good – but the load is light.
Work is underway to provide a tape backend; a prototype is already operational. This will be the production SRM to tape for SC3.
We wish to use dCache (preferably the production instance) as the interface to Service Challenge 2.

Current Deployment at RAL

LAN
Network will evolve in several stages:
–Choose low-cost solutions
–Minimise spend until needed
–Maintain flexibility
Expect to be able to attach to UKLIGHT by March (but see Robin's talk).

Now (Production)
Disk and CPU nodes attach at 1Gbit per link to a Nortel 5510 stack (80Gbit); the stack uplinks via N x 1Gbit to a Summit 7i, which connects to the site router.

Next (Production)
Disk and CPU nodes attach at 1Gbit per link to the Nortel 5510 stack (80Gbit); the stack uplinks via N x 10Gbit to a 10 Gigabit switch, which connects at 10Gb to the site router.

Soon (Lightpath)
Disk and CPU nodes attach at 1Gbit per link to the Nortel 5510 stack (80Gbit); the stack uplinks via N x 1Gbit to the Summit 7i and at 1Gbit to the site router, and is dual-attached via 4 x 1Gbit towards UKLIGHT (10Gb).
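To put the three stages side by side, the sketch below converts the off-site link capacity at each stage into an approximate transfer rate; the link speeds are taken from the diagrams above (as I read them) and the figures ignore protocol overhead.

```python
# Off-site capacity at each stage of the LAN evolution, converted to MB/s.
# Link speeds are read from the diagrams above; real throughput will be lower
# once TCP/IP and filesystem overheads are included.
STAGES = {
    "Now (Production)":  1_000,    # 1 Gbit/s towards the site router
    "Next (Production)": 10_000,   # 10 Gbit/s from the 10 Gigabit switch
    "Soon (Lightpath)":  4_000,    # 4 x 1 Gbit/s trunk towards UKLIGHT
}

for stage, mbit in STAGES.items():
    print(f"{stage:18s}: {mbit:6d} Mbit/s  (~{mbit // 8} MB/s raw)")
```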

MSS Stress Testing
Preparation for SC3 (and beyond) has been underway since August (Tim Folkes).
Motivation – service load has historically been rather low; look for "gotchas" and review known limitations.
Stress testing – part of the way through the process – just a taster here:
–Measure performance
–Fix trivial limitations
–Repeat
–Buy more hardware
–Repeat

Production and test system layout (diagram, 04 November 2004):
–STK 9940 tape drives attached through two Brocade FC switches (ADS_switch_1 and ADS_Switch_2), 4 drives to each switch
–AIX dataservers: ermintrude, florence, zebedee and dougal; test hosts mchenry1 (test flfsys) and basil (test dataserver)
–brian (AIX) runs flfsys with the catalogue and cache; dylan (AIX) handles import/export; buxton (SunOS) runs ACSLS
–Red Hat hosts: ADS0CNTR (counter), ADS0PT01 (pathtape), ADS0SB01 (SRB interface)
–User access via SRB (Inq, S commands, MySRB), sysreq admin commands (create, query) and user pathtape commands; disk arrays array1-array4 with catalogue and cache areas for both production and test systems
–Diagram legend: physical connections (FC/SCSI), sysreq UDP commands, user SRB commands, VTP data transfer, SRB data transfer, STK ACSLS commands, logging
–All sysreq, VTP and ACSLS connections to dougal also apply to the other dataserver machines, but are left out for clarity

Atlas Datastore Architecture (diagram, 28 Feb, B. Strong):
–Catalogue Server (brian): flfsys (+libflf) with the catalogue data, a backup catalogue, stats and flfqryoff (a copy of the flfsys code); it accepts user, farm, admin, tape and import/export commands over sysreq
–Tape path: flfstk, tapeserv, flfscan and flfaio move data (via libvtp) between the farm server / user node and the STK and IBM tape drives and cache disk (copies A, B and C); cellmgr is also shown
–Supporting processes: flfdoexp and flfdoback (+libflf), the datastore script and SE recycling
–Robot Server (buxton): ACSLS (SSI, CSI, LMU), driven through the ACSLS API, sending mount/dismount control information to the tape robot
–Pathtape Server (rusty): servesys/pathtape frontend and backend resolving long and short names over sysreq
–User side: the vtp user program and libvtp data transfers from the user node; the I/E Server (dylan) handles import/export

Code Walkthrough
Limits to growth:
–Catalogue able to store 1 million objects
–In-memory index currently limited to 700,000
–Limited to 8 data servers
–Object names limited to an 8-character username and a 6-character tape name
–Maximum file size limited to 160GB
–Transfer limits: 8 writes per server, 3 per user
All can be fixed by re-coding, although the transfer limits are ultimately constrained by hardware performance.
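Purely as an illustration, the limits above can be written down as a simple admission check; the function and its arguments are hypothetical, and only the numeric limits come from the walkthrough.

```python
# Illustrative encoding of the documented limits; names and the request shape
# are hypothetical, only the numbers come from the code walkthrough.
MAX_CATALOGUE_OBJECTS = 1_000_000
MAX_INDEX_OBJECTS = 700_000        # current in-memory index limit
MAX_DATA_SERVERS = 8
MAX_USERNAME_LEN = 8               # characters in an object's username part
MAX_TAPE_NAME_LEN = 6              # characters in an object's tape part
MAX_FILE_SIZE = 160 * 1024**3      # 160 GB
MAX_WRITES_PER_SERVER = 8
MAX_WRITES_PER_USER = 3

def can_accept_write(username: str, tape: str, size: int,
                     writes_on_server: int, writes_by_user: int) -> bool:
    """Return True if a new write stays inside the documented limits."""
    return (len(username) <= MAX_USERNAME_LEN
            and len(tape) <= MAX_TAPE_NAME_LEN
            and size <= MAX_FILE_SIZE
            and writes_on_server < MAX_WRITES_PER_SERVER
            and writes_by_user < MAX_WRITES_PER_USER)
```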

Catalogue Manipulation

Catalogue Manipulation
Performance breaks down under high load. Solutions:
–Buy faster hardware (faster disk/CPU etc.)
–Replace the catalogue with Oracle for high transaction rates and resilience

Write Performance Single Server Test

Conclusions (Write)
The server accepts data at about 40MB/s until the cache fills up.
There is a problem balancing writes into the cache against the checksum pass followed by the read from cache to tape.
Aggregate read (putting data out to tape) is only 15MB/s until writing becomes throttled; then read performance improves.
Aggregate throughput to the cache disk is 50-60MB/s, shared between 3 threads (write + read).
An estimate suggests 60-80MB/s to tape now.
Buy more/faster disk and try again.
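A toy fill-and-drain model makes this behaviour concrete; the cache size is an assumption (the talk does not give one) and only determines how long the unthrottled 40MB/s phase lasts.

```python
# Toy fill/drain model of the tape-staging cache using the rates quoted above.
# CACHE_GB is an assumed figure, not taken from the talk.
CACHE_GB = 500              # assumed staging cache size
WRITE_IN = 40.0             # MB/s accepted from clients while the cache has room
READ_TO_TAPE = 15.0         # MB/s drained to tape while writes compete for the disk
DISK_AGGREGATE = 55.0       # MB/s total cache-disk bandwidth (50-60 measured)

fill_seconds = CACHE_GB * 1000 / (WRITE_IN - READ_TO_TAPE)
print(f"Cache fills after ~{fill_seconds / 3600:.1f} h of sustained input")

# Once the cache is full, incoming writes are throttled so that the write and
# read streams together fit inside the disk's aggregate bandwidth, which is
# why the read (to tape) rate improves at that point.
print(f"Write + read must then share ~{DISK_AGGREGATE:.0f} MB/s of disk bandwidth")
```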

Timeline
Expect to have UKLIGHT in time for a March start, but the schedule for end-to-end network availability still needs to be finalised.
At least 2 options for disk servers – we prefer to use production servers – but it depends on the timetable.