12.03.2002, Bernd Panzer-Steindel, CERN/IT/ADC: Medium Term Issues for the Data Challenges

Bernd Panzer-Steindel, CERN/IT/ADC, slide 1: Medium Term Issues for the Data Challenges

Bernd Panzer-Steindel, CERN/IT/ADC, slide 2: Available Hardware
Commodity off-the-shelf: dual-processor PCs with Intel CPUs (PIII ~1 GHz, 512 MB)
- + Fast Ethernet controller → CPU server (ATX housing in racks, ~2 kSFr per box plus infrastructure)
- + Gigabit Ethernet controller, EIDE RAID controller, 1 TB of EIDE disks → Disk server (4U rack-mounted, ~11 kSFr per box plus infrastructure)
- + Gigabit Ethernet controller, SCSI or Fibre Channel controller, one or two tape drives → Tape server

Bernd Panzer-Steindel, CERN/IT/ADC, slide 3: Tape infrastructure
- STK: 10 silos with cartridges, 28 x 9940 drives (~10 MB/s each), used by all experiments
- New acquisition in Q3/Q4 → ~20 drives dedicated to the LCG project, with higher-capacity tapes and higher-performance drives
Network infrastructure
- 3COM Fast Ethernet and Gigabit Ethernet switches, ENTERASYS high-end backbone routers
- 10 Gigabit routers are currently being tested and will be incorporated into the testbed soon

Bernd Panzer-Steindel, CERN/IT/ADC, slide 4: LCG Testbed Structure
100 CPU servers on Gigabit Ethernet, 300 on Fast Ethernet, 100 disk servers on Gigabit Ethernet (~50 TB), 10 tape servers on Gigabit Ethernet.
[Diagram: backbone routers linking the disk, tape and CPU server groups via 1, 3 and 8 Gb lines.]

Bernd Panzer-Steindel, CERN/IT/ADC, slide 5: Data Challenge Types
- ALICE DAQ tests of event building, processing and storage → goal for this year: 200 MB/s into CASTOR sustained for one week (peak 300 MB/s; see the volume sketch below)
- Scalable middleware tests from the DataGrid project
- Large-scale productions for the physics TDRs (CMS, ATLAS)
- Installation, configuration and monitoring of large farms → scalability and robustness (the whole LCG facility will be used by quite a number of experiments with different environment needs → reconfigurations)
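To put the ALICE goal in perspective, here is a minimal back-of-envelope sketch of the data volume implied by the rates quoted on this slide; everything in it is plain arithmetic on those two numbers.

    # Volume implied by the ALICE DAQ goal (rates taken from the slide above).
    sustained_mb_s = 200                  # MB/s into CASTOR, sustained
    peak_mb_s = 300                       # MB/s peak
    seconds_per_week = 7 * 24 * 3600

    volume_tb = sustained_mb_s * seconds_per_week / 1e6        # MB -> TB
    peak_volume_tb = peak_mb_s * seconds_per_week / 1e6
    print(f"One week at {sustained_mb_s} MB/s ~ {volume_tb:.0f} TB into CASTOR")        # ~121 TB
    print(f"One week at the {peak_mb_s} MB/s peak would be ~ {peak_volume_tb:.0f} TB")  # ~181 TB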

Bernd Panzer-Steindel, CERN/IT/ADC, slide 6: Requirements for different challenges and productions

Bernd Panzer-Steindel, CERN/IT/ADC, slide 7: Draft schedule for the node allocation in 2002 (clear priority guidelines have been approved)

Bernd Panzer-Steindel, CERN/IT/ADC, slide 8: Problems and Solutions (1)
Linux
- Stability and performance have improved considerably during the last two years (MTBF disk server > 200 days, CPU server > 100 days, scheduled interruptions included); a sketch of what this means in practice follows below
- IO performance to be watched: kernel 2.2.x → 2.4.x showed improvements of ~60%; kernel variations still to be explained, 2.4.x → 2.4.y: 20-30%
- Sometimes hard to follow developments and changes
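As an illustration of what these MTBF figures mean day to day, a hedged sketch using the server counts from slide 4 (400 CPU servers, 100 disk servers); the independent, memoryless-failure assumption is mine, not the slide's.

    # Expected node interruptions per day, given per-node MTBF (figures from this slide)
    # and the server counts from the testbed slide. Assumes independent, memoryless failures.
    def interruptions_per_day(nodes, mtbf_days):
        return nodes / mtbf_days

    print(interruptions_per_day(nodes=400, mtbf_days=100))   # ~4 CPU-server events per day
    print(interruptions_per_day(nodes=100, mtbf_days=200))   # ~0.5 disk-server events per day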

Bernd Panzer-Steindel, CERN/IT/ADC, slide 9: Problems and Solutions (2)
Network
- Working well; needed a few firmware upgrades in the beginning, only one major bug in a high-end router (obviously we are using/stressing the equipment like nobody else does)
Control and management
- Installation, configuration and monitoring → some prototypes used, close collaboration with the DataGrid project
- Low-level fabric infrastructure planned for Q2/Q3 (console, reset, diagnostics) → no PC standard

Bernd Panzer-Steindel, CERN/IT/ADC, slide 10: Problems and Solutions (3)
Hardware / Software
- Level of redundancy, error recovery, fault tolerance
- How does the software (middleware, application) cope with hardware problems?
- TCO considerations, e.g. lack of fail-safety in the software has to be compensated by complexity in the hardware

Bernd Panzer-Steindel, CERN/IT/ADC, slide 11: Problems and Solutions (4)
Disks
- Problems with a certain IBM disk model, high error rate; finally fixed by a firmware upgrade of ~800 disks; only seen with certain data access patterns
- Regular updates of RAID controller firmware
Tapes
- CASTOR HSM successfully tested, but questions about load balancing and scalability still need investigation on larger scales
- The general architecture is fully distributed and asynchronous, with only few tape drives
- 'Impedance' problem when coupling to disk servers: the IO performance of a disk server should be >> tape drive performance → higher-end disk servers, Linux IO improvements, … → introduction of more disk cache levels → replace tapes with disks (see the sketch below)
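A small sketch of the 'impedance' point: how many 9940 drives a single disk server can keep streaming. The ~15 MB/s drive rate comes from the next slide; the usable disk-server IO figure is an illustrative assumption, not a measured value.

    # Rough bandwidth matching between one disk server and the tape drives it feeds.
    tape_drive_mb_s = 15            # 9940 read/write rate (slide 12)
    disk_server_io_mb_s = 50        # assumed usable IO of one EIDE disk server (illustrative)

    drives_per_server = disk_server_io_mb_s // tape_drive_mb_s
    print(f"~{drives_per_server} drives per disk server before its IO becomes the bottleneck")
    # The slide's point: disk-server IO should be >> drive rate, otherwise one needs
    # higher-end disk servers, more disk-cache levels, or no tapes at all.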

Bernd Panzer-Steindel, CERN/IT/ADC, slide 12: Tapes versus disks, some 'naive' calculations (1)
1 PB of tapes with 20 drives
- Our current installation is from STK: 9940 drives with 60 GB cassettes, ~15 MB/s read/write performance per drive
- Costs for silos, drives, servers, tape media and maintenance over 4 years → ~4.1 SFr/GB (1 PB with 0.3 GB/s aggregate throughput)
- With the new types of drives announced by STK, IBM, etc. (Q3/Q4), an estimate would be → ~2.6 SFr/GB (1 PB with 0.6 GB/s aggregate throughput)
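The tape figures can be reproduced from the slide's own inputs; a minimal sketch (the total cost is simply the quoted SFr/GB scaled to 1 PB, not an independent estimate).

    # Tape scenario from the slide: 1 PB served by 20 drives of the current 9940 generation.
    capacity_gb = 1e6                         # 1 PB
    drives = 20
    drive_mb_s = 15                           # read/write performance per 9940 drive

    aggregate_gb_s = drives * drive_mb_s / 1000
    print(f"Aggregate throughput: {aggregate_gb_s:.1f} GB/s")          # 0.3 GB/s, as quoted

    cost_per_gb_sfr = 4.1                     # 4-year cost: silos, drives, servers, media, maintenance
    print(f"Total 4-year cost: ~{cost_per_gb_sfr * capacity_gb / 1e6:.1f} MSFr")   # ~4.1 MSFr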

Bernd Panzer-Steindel, CERN/IT/ADC, slide 13: Tapes versus disks, some 'naive' calculations (2)
1 PB of disk
- 1 TB of disk space per server, EIDE disk server of the current standard type (10 disks of 120 GB) → ~11 SFr/GB (50 GB/s aggregate throughput)
- 10 TB of disk space per server, ~60 disks of 160 GB; currently not easy with EIDE channels, maybe FireWire or USB 2.0 → ~5.5 SFr/GB (5 GB/s aggregate throughput)
- We assume that there are already quite some CPU servers around (2200 nodes); each node is upgraded with 3 x 160 GB disks and just needs to compensate for 10% CPU performance (== 5 MB/s extra IO per node) → ~3.8 SFr/GB (11 GB/s aggregate throughput)
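For comparison, the three disk scenarios restated side by side; the per-scenario numbers are exactly those on the slide, only the 1 PB totals are derived.

    # Disk scenarios from the slide: (SFr/GB, aggregate GB/s) for 1 PB of disk space.
    scenarios = {
        "1 TB EIDE disk servers (10 x 120 GB disks)":     (11.0, 50),
        "10 TB servers (~60 x 160 GB, FireWire/USB 2.0)": (5.5, 5),
        "3 x 160 GB disks added to 2200 CPU nodes":       (3.8, 11),
    }
    for name, (sfr_per_gb, gb_s) in scenarios.items():
        total_msfr = sfr_per_gb * 1e6 / 1e6           # 1 PB = 1e6 GB, cost in MSFr
        print(f"{name}: ~{total_msfr:.1f} MSFr total, {gb_s} GB/s aggregate")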

Bernd Panzer-Steindel, CERN/IT/ADC, slide 14: Tapes versus disks, some 'naive' calculations (3)
But one needs to consider:
- Reliability of tapes versus disks → double disk copies needed
- Can the software cope with this kind of large distribution of disk space?
- Influence of the persistency model, the data storage model and the HSM system
Summary: tapes 2.6 – 4.1 SFr/GB versus disks 3.8 – 11 SFr/GB (a sketch of the effect of double copies follows below)
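Applying the slide's own caveat literally: if disk reliability forces every byte to be kept twice, the effective cost per usable GB doubles. A short sketch of the resulting ranges:

    # Effective cost per *usable* GB if double disk copies are needed (slide 14's caveat).
    tape_range_sfr_gb = (2.6, 4.1)     # from slide 12
    disk_range_sfr_gb = (3.8, 11.0)    # from slide 13
    copies = 2

    mirrored = tuple(round(c * copies, 1) for c in disk_range_sfr_gb)
    print(f"Tapes:           {tape_range_sfr_gb} SFr/GB")
    print(f"Disks, mirrored: {mirrored} SFr/GB")     # (7.6, 22.0): tapes stay cheaper per usable GB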

Bernd Panzer-Steindel CERN/IT/ADC15 When is the correct time to move to IA64 ? Is SAN really an alternative solution ? What is the Analysis model ? Blade systems are interesting, but still very expensive (x4) Full LHC computing is 6 years away  Paradigm changes ?? (PDA, set-top boxes, ‘Xbox’, eLiza,….) Other issues