CERN Computing Fabric Status, LHCC Review, 19 November 2007. Bernd Panzer-Steindel, CERN/IT

Presentation transcript:


Coarse-grained functional split of the CERN computing fabric

1. T0: Central Data Recording, first-pass processing, tape migration, data export to the Tier-1 sites.
2. CAF: selected data copies from the T0 (near real-time), calibration and alignment, analysis, T1/T2/T3 functions; means something different for each experiment.

Both contain the same hardware; the distinction is made via logical configurations in the batch system and the storage system (a rough capacity-split sketch follows below).

Ingredients:
- CPU nodes for processing (~65% of the total capacity for the T0)
- Disk servers for storage (~40% of the total capacity for the T0)
- Tape libraries, tape drives and tape servers
- Service nodes
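A minimal sketch of how such a purely logical split can be tracked; the total pool sizes are hypothetical placeholders, only the ~65% (CPU) and ~40% (disk) T0 fractions come from the slide:

```python
# Hypothetical bookkeeping of the logical T0/CAF split described above.
# Pool sizes are made-up placeholders; only the T0 fractions (~65% of CPU,
# ~40% of disk capacity) are taken from the slide.

TOTAL_CPU_NODES = 4000      # placeholder
TOTAL_DISK_TB   = 5000      # placeholder

T0_CPU_FRACTION  = 0.65     # ~65% of CPU capacity assigned to the T0
T0_DISK_FRACTION = 0.40     # ~40% of disk capacity assigned to the T0

def split(total, t0_fraction):
    """Return (t0_share, caf_share) for one resource type."""
    t0 = total * t0_fraction
    return t0, total - t0

t0_cpu, caf_cpu = split(TOTAL_CPU_NODES, T0_CPU_FRACTION)
t0_disk, caf_disk = split(TOTAL_DISK_TB, T0_DISK_FRACTION)

print(f"CPU nodes : T0 ~{t0_cpu:.0f}, CAF ~{caf_cpu:.0f}")
print(f"Disk [TB] : T0 ~{t0_disk:.0f}, CAF ~{caf_disk:.0f}")
```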

Growth rates based on the latest experiment requirements at CERN: roughly linear growth rates. An underestimate? Experience from the past shows exponential growth rates (a comparison sketch follows below).
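To make the linear-versus-exponential concern concrete, the two extrapolations can be compared from the same starting point. All numbers in this sketch are illustrative; the real requirement figures are in the slide's plot, not in the transcript:

```python
# Linear vs. exponential extrapolation of a resource requirement.
# The starting value and growth parameters are illustrative only.

start = 100.0        # arbitrary units of capacity in year 0
linear_step = 50.0   # capacity added per year (linear assumption)
growth_rate = 0.6    # 60% growth per year (exponential assumption)

for year in range(0, 6):
    linear = start + linear_step * year
    exponential = start * (1.0 + growth_rate) ** year
    print(f"year {year}: linear {linear:7.1f}   exponential {exponential:7.1f}")
```

Even with these made-up parameters the two curves differ by more than a factor of two after five years, which is the sense in which linear planning figures may be underestimates.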

Preparations for 2008 I

- Tendering preparations for the 2008 purchases started in May 2007.
- Deliveries of equipment have started; the first ~100 nodes have arrived and are being installed.
- More deliveries are spread over the next 4 months.
- Heavy logistics operations are ongoing: preparation for the installation of ~2300 new nodes (racks and rack preparation, power/console/network cabling, shipment and unpacking, physical and logical installation, quality control and burn-in tests), plus preparations to 'retire' ~1000 old nodes during the next few months.

Preparations for 2008 II

Resource increase in 2008:
1. More than doubling the amount of CPU resources: ~1200 CPU nodes (~4 MCHF).
2. Increasing the disk space by a factor of 4: ~700 disk servers (~6 MCHF). The experiment requirements for disk space this year were underestimated; we had to increase disk space by up to 50% during the various data challenges and productions.
3. Increase and consolidation of redundant and stable services: ~350 service nodes (~3 MCHF). Grid services (CE, RB, UI, etc.), Castor, Castor databases, conditions databases, VO boxes, experiment-specific services (bookkeeping, production steering, monitoring, etc.), build servers. Don't underestimate the service investments! (The sketch below turns these numbers into rough per-node costs.)
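The node counts and prices quoted above imply rough unit costs, a useful sanity check when comparing the CPU, disk and service procurements. Only the counts and MCHF totals come from the slide; the division is added here:

```python
# Rough per-node cost implied by the 2008 procurement numbers on this slide.
purchases = {
    "CPU nodes":     (1200, 4e6),   # (count, total cost in CHF)
    "disk servers":  ( 700, 6e6),
    "service nodes": ( 350, 3e6),
}

for name, (count, total_chf) in purchases.items():
    print(f"{name:13s}: ~{total_chf / count:,.0f} CHF per node")
```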

Power and Cooling

- The current computer centre has a capacity of 2.5 MW for powering the nodes and 2.5 MW of cooling capacity. A battery-based UPS system allows 10 minutes of autonomy for the 2.5 MW.
- Power for critical nodes is limited to about 340 kW (capacity backed up by the CERN diesel generators); no free capacity is left (DB systems, network, AFS, Web, Mail, etc.).
- We will already reach ~2 MW in March 2008 and will not be able to host the full required capacity in 2010 (a back-of-the-envelope estimate follows below).
- Activities started more than a year ago; progress has been slow, but there is now an active discussion between IT, PH and TS.
- Identify a building in Prevessin (Meyrin does not have enough power available) and start preparations for the infrastructure upgrade. The budget is already foreseen.
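The "~2 MW by March 2008" figure is essentially node count times average power draw per box. A back-of-the-envelope sketch; all node counts and per-node wattages below are assumptions for illustration, not CERN measurements:

```python
# Back-of-the-envelope power estimate: why the 2.5 MW envelope gets tight.
# The node counts and per-node power figures are assumptions for illustration.

node_counts = {
    "CPU nodes":     3500,   # assumed installed base after the 2008 additions
    "disk servers":  1500,   # assumed
    "service/other":  800,   # assumed
}
watts_per_node = {
    "CPU nodes":     350,    # assumed average draw per box [W]
    "disk servers":  400,
    "service/other": 250,
}

total_w = sum(node_counts[k] * watts_per_node[k] for k in node_counts)
print(f"estimated IT load: {total_w / 1e6:.2f} MW of the 2.5 MW envelope")
```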

Material Budget

- Based on the latest round of requirement gathering from the experiments during the late summer period.
- Includes provisioning money for a new computer centre.
- Presented to the CCRB on the 23rd of October.

[Table: material budget and balance per year, in MCHF]

Covers CPU servers, disk storage, tape storage and infrastructure, service nodes, Oracle database infrastructure, LAN and WAN network, testbeds; the new computer-centre costs are spread over 10 years. A small deficit, but within the error of the cost predictions.

Processors I

[Charts: cost of a full server node; cost of a separate single processor]

Processors II

- Less than 50% of a node's cost is the processors plus memory.
- 2007 was a special year: a heavy price war between INTEL and AMD, with INTEL pushing its quad-cores (even competing with its own products).
- New trend: two motherboards per 1U unit, with very good power-supply efficiency, as good as for blades.
- Our purchases will consist of these nodes, with the possibility of also getting blades.

Processors III

Technology trends:
- Aim for a two-year cycle now: architecture improvements and structure-size reduction (45 nm products already announced by INTEL).
- Multi-core... BUT what to do with the expected billion transistors and many cores? The market is not clear.
- Widespread activities at INTEL, e.g.:
  -- initiatives to get multithreading into the software; quite some time away and complicated, especially the debugging (we already have a hard time getting our 'simple' programs to work)
  -- co-processors (audio, video, etc.)
  -- merging of CPU and GPU (graphics): AMD + ATI -> combined processors, NVIDIA -> use the GPU as a processor, INTEL -> move graphics onto the cores
  -- on-the-fly re-programmable cores (FPGA-like)
- Not clear where we are going; specialized hardware in the consumer area -> change of the price structure for us.

Memory I

Memory II

- Still monthly fluctuations in cost, up and down.
- Large variety of memory modules in frequency and latency: 533 vs. 667 MHz is about a 10% cost difference, and a factor of 2 for 1 GHz; higher frequency goes along with higher CAS latency.
- How does HEP code depend on memory speed? (A simple bandwidth probe is sketched below.)
- DDR3 is upcoming, more expensive in the beginning.
- Is 2 GB per core really enough?
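One way to start answering the memory-speed question is a deliberately bandwidth-bound micro-benchmark, run on machines with different memory configurations to bound the possible impact. This is only an illustrative probe in the spirit of STREAM, not a CERN benchmark, and real sensitivity studies would use the experiments' own reconstruction code:

```python
# Minimal STREAM-triad-like bandwidth probe: a = b + s*c on large arrays.
# Purely illustrative; array size chosen to be well beyond any CPU cache.
import time
import numpy as np

N = 50_000_000                      # ~400 MB per float64 array
b = np.random.rand(N)
c = np.random.rand(N)
s = 3.0

t0 = time.perf_counter()
a = b + s * c                       # touches three arrays: read b, read c, write a
dt = time.perf_counter() - t0

bytes_moved = 3 * N * 8             # 8 bytes per float64 element
print(f"triad bandwidth: {bytes_moved / dt / 1e9:.1f} GB/s")
```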

Disk storage I

[Charts: cost of a full disk server node; cost of a separate single disk]

Disk storage II

Trends:
- The cost evolution of single disks is still good (~factor 2 per year, model dependent).
- Lots of infrastructure is needed: upgrades of CPU and memory; growing footprint of the applications (RFIO, GridFTP, buffers, new functions, checksums, RAID-5 consistency checks, data-integrity probes).
- We need disk space AND spindles: use smaller disks or buy more -> increases the overall cost (see the sketch below).
- Solid-state disks are much more expensive (factor ~50) -> database area.
- Hybrid disks are good for VISTA (at least in the future, it does not work yet...) but carry a higher price, e.g. the new Seagate disks with on-board flash == +25% cost. A general trend for notebooks; we can't profit from it in our environment -> seldom any cache reuse.
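The "disk space AND spindles" remark is a simple trade-off: for a fixed target volume, larger disks mean fewer spindles and therefore lower aggregate I/O capability. A sketch with assumed per-disk performance figures; the disk models and rates are illustrative, not the tendered hardware:

```python
# Capacity vs. spindle-count trade-off for a fixed target volume.
# Per-disk IOPS and streaming rates are rough assumptions for illustration.

target_capacity_tb = 1000          # assumed pool size to provision
disk_models = {                    # capacity [TB], random IOPS, streaming MB/s (assumed)
    "500 GB drive": (0.5,  80, 60),
    "750 GB drive": (0.75, 80, 70),
    "1 TB drive":   (1.0,  80, 80),
}

for name, (cap_tb, iops, mbs) in disk_models.items():
    spindles = target_capacity_tb / cap_tb
    print(f"{name}: {spindles:5.0f} spindles -> "
          f"~{spindles * iops / 1e3:.0f}k aggregate IOPS, "
          f"~{spindles * mbs / 1e3:.1f} GB/s streaming")
```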

Internal Network I

The physical network topology (connections of nodes to switches and routers) is defined by space, electricity, cooling and cabling constraints.

[Diagram: network routers connecting service nodes, disk servers and CPU servers]

Internal Network II

Changing access patterns lead to high aggregate I/O on the disk servers: ~3000 nodes running concurrent physics applications try to access ~1000 disk servers and their disks.

[Diagram: logical network topology, CPU servers and disk servers]

Internal Network III

We need to upgrade the internal network infrastructure: decrease the blocking factor on the switches, i.e. spread the existing servers over more switches (a blocking-factor sketch follows below).

Changes since the 2005 LCG Computing TDR:
- disk space increased by 30%
- the number of concurrently running applications increased by a factor of 4 (multi-core technology evolution)
- computing-model evolution: more high-I/O applications (calibration and alignment, analysis)

-> Doubling the number of connections (switches) to the network core routers, which in consequence also requires doubling the number of routers.
-> An additional investment of 3 MCHF in 2008 (already approved by the Finance Committee).
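The blocking factor of an access switch is the ratio of its aggregate server-facing bandwidth to its uplink bandwidth towards the core; spreading the same servers over more switches lowers it. A sketch with assumed port counts and link speeds (none of the numbers below come from the slide):

```python
# Blocking factor of an access switch: downlink capacity / uplink capacity.
# Port counts and link speeds are assumptions chosen for illustration.

def blocking_factor(servers_per_switch, server_link_gbps, uplinks, uplink_gbps):
    downlink = servers_per_switch * server_link_gbps
    uplink = uplinks * uplink_gbps
    return downlink / uplink

# Before: many servers behind one switch with a single 10 Gbit uplink (assumed).
print("before:", blocking_factor(40, 1, 1, 10))   # -> 4.0, heavily oversubscribed

# After: the same servers spread over twice as many switches, i.e. half per switch.
print("after :", blocking_factor(20, 1, 1, 10))   # -> 2.0, half the oversubscription
```

Doubling the number of access switches (and hence of core-facing connections and routers) is what halves the oversubscription in this toy model.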

Batch System

- Some scalability and stability problems in spring and early summer were solved with the upgrade to LSF 7 and a hardware upgrade of the LSF control nodes.
- Much improved response time, throttling bottlenecks removed.
- On average about … jobs/day, peak value … jobs/day, up to … jobs in the queue at any time; tested with … jobs.

Tape Storage

Today we have:
- 10 PB of data on tape, 75 million files
- 5 silos with ~30000 tapes, ~5 PB of free space
- 120 tape drives (STK and IBM)
- during the last month, 3 PB were written to tape and 2.4 PB were read from tape

Small files and the spread of data sets over too many tapes caused a very high mount load in the silos (the sketch below illustrates the effect of file size on effective drive throughput):
- increase the free space to 8 PB in the next 3-4 months
- more drives to cope with high recall rates and small files
- Castor improvements are needed
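The small-file problem can be quantified: every mount/positioning operation costs fixed time during which the drive streams nothing, so the effective drive rate collapses as the average file size shrinks. A sketch with assumed drive parameters; the native rate and per-file overhead are illustrative, not measured values for the STK/IBM drives:

```python
# Effective tape-drive throughput as a function of average file size.
# Drive streaming rate and per-file overhead are assumptions for illustration.

drive_rate_mb_s = 120.0       # assumed native streaming rate [MB/s]
overhead_s_per_file = 30.0    # assumed mount/seek/sync overhead attributed per file [s]

for file_size_mb in (10, 100, 1000, 10000):
    stream_time = file_size_mb / drive_rate_mb_s
    effective = file_size_mb / (stream_time + overhead_s_per_file)
    print(f"{file_size_mb:6d} MB files -> effective ~{effective:6.1f} MB/s")
```

With these assumptions a drive handling 10 MB files delivers well under 1 MB/s, which is why file merging and more drives are both on the list above.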

CASTOR

- Much improved stability and performance during the summer period (Castor Task Force).
- CMS CSA07, the ATLAS export tests and the M5 run: regular running at 'nominal' speed (with 100% beam efficiency assumed).
- Very high load on the disk servers; small-scale problems were observed, identified and fixed (Castor + experiments).
- Complex access patterns and the large number of I/O streams require more disk space for the T0 (probably a factor of 2).
- Successful coupling of the DAQ and the T0 for LHCb, ALICE and ATLAS (not yet at 100% nominal speed); CMS is planned for the beginning of next year.

Data Export

ATLAS successfully demonstrated its nominal data-export speed (~1030 MB/s) for several days, all in parallel to the CMS CSA07 exercise -> no Castor issues, no internal-network issues.

Data Management

- Enhancing the Castor disk-pool definitions: an activity in close collaboration with the experiments; new Castor functionality is now available (access control) -> avoid disk-server overload, better tape-recall efficiency.
- Small files create problems for the experiment bookkeeping systems and the HSM tape system -> Castor improvements are needed in the tape area (some amount of small files will be unavoidable) -> the experiments are investing in file-merging procedures, which creates more I/O streams and activity and needs more disk space.
- Data integrity: the deployment and organization of data checksums needs more work and will create more I/O and bookkeeping (a minimal checksum sketch follows below).
- CPU and data-flow efficiency: to increase the efficiencies one has to integrate the four large functional units much more closely (information exchange): the experiment data-management system, the batch system, the disk storage and the tape storage.
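The checksum work amounts to computing one checksum per file at write time and re-verifying it on access, which is the extra I/O and bookkeeping mentioned above. A minimal sketch using Adler-32 from zlib as an example algorithm; the slide does not say which checksum Castor or the experiments actually chose, and the file path below is hypothetical:

```python
# Minimal file-checksum sketch: compute an Adler-32 checksum in streaming mode.
# Adler-32 is used here only as an example of a cheap checksum.
import zlib

def adler32_of_file(path, chunk_size=1 << 20):
    """Stream the file in 1 MB chunks so memory use stays flat."""
    checksum = 1                          # Adler-32 is seeded with 1
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF

# Example usage (hypothetical path):
# print(f"{adler32_of_file('/data/run1234/file.root'):08x}")
```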

Summary

- A large-scale logistics operation is ongoing for the 2008 resource upgrades.
- Very good Castor performance and stability improvements.
- A large-scale network (LAN) upgrade has started.
- Successful stress tests and productions from the experiments (T0 and partly CAF).
- The power and cooling growth rate requires a new computer centre; planning has started.