Published by Cameron Price. Modified over 9 years ago.
1
Fabric Management for CERN Experiments: Past, Present, and Future
Tim Smith, CERN/IT
2
2000/11/03 Tim Smith: HEPiX @ JLab
Contents
  The Fabric of CERN today
  The new challenges of LHC computing
  What has this got to do with the GRID?
  Fabric Management solutions of tomorrow?
  The DataGRID Project
3
Fabric Elements
Functionalities
  Batch and Interactive
  Disk servers
  Tape servers + devices
  Stage servers
  Home directory servers
  Application servers
  Backup service
Infrastructure
  Job Scheduler
  Authentication
  Authorisation
  Monitoring
  Alarms
  Console managers
  Networks
4
Fabric Technology at CERN
[Figure: multiplicity scale (logarithmic, 1 to 10000) versus year (1989 to 2005). Labelled technologies: Mainframes (IBM, Cray); RISC Workstations; Scalable Systems (SP2, CS2); SMPs (SGI, DEC, HP, SUN); PC Farms.]
5
Architecture Considerations
Physics applications have ideal data parallelism
  mass of independent problems
  no message passing
  throughput rather than performance
  resilience rather than ultimate reliability
Can build hierarchies of mass market components
High Throughput Computing
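The data-parallel model on this slide can be illustrated with a minimal sketch. It is not code from the talk: the `reconstruct` function, the retry count and the use of local threads in place of farm nodes are all assumptions made for illustration. Each event is processed independently, tasks never exchange messages, and a failed task is simply retried rather than relying on perfectly reliable hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def reconstruct(event):
    """Hypothetical per-event work: the result depends only on this event."""
    return sum(event) / len(event)


def process_with_retry(event, attempts=3):
    """Resilience rather than ultimate reliability: retry a failed task."""
    last_error = None
    for _ in range(attempts):
        try:
            return reconstruct(event)
        except Exception as exc:  # a real farm would log and reschedule elsewhere
            last_error = exc
    raise RuntimeError("event failed on every attempt") from last_error


def run_farm(events, workers=4):
    """Process independent events in parallel; workers share no state."""
    # Threads stand in for farm nodes here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order.
        return list(pool.map(process_with_retry, events))


events = [[1.0, 2.0, 3.0], [4.0, 6.0], [10.0]]
results = run_farm(events)
print(results)
```

Because no task waits on another, adding workers raises total throughput without changing the per-event code, which is the property the slide calls High Throughput Computing.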
6
Component Architecture
[Figure: network diagram. CPU nodes attach to 100/1000baseT switches; tape servers, disk servers and application servers attach to 1000baseT switches; all switches connect to a high capacity backbone switch.]
7
Analysis Chain: Farms
[Figure: data-flow diagram. Labels: detector; event filter (selection & reconstruction); raw data; event reconstruction; event simulation; processed data; event summary data; batch physics analysis; analysis objects (extracted by physics topic); interactive physics analysis.]
8
Multiplication!
[Figure: number of CPUs (0 to 1200) from Jul-97 to Jan-00, broken down by facility: tomog, tapes, pcsf, nomad, na49, na48, na45, mta, lxbatch, lxplus, lhcb, l3c, ion, eff, cms, ccf, atlas, alice.]
9
PC Farms
10
Shared Facilities
11
LHC Computing Challenge
The scale will be different
  CPU    10k SI95  ->  1M SI95
  Disk   30 TB     ->  3 PB
  Tape   600 TB    ->  9 PB
The model will be different
  There are compelling reasons why some of the farms and some of the capacity will not be located at CERN
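As a quick check of how different the scale is, the growth factors implied by the three pairs of figures can be computed directly. This sketch assumes the first figure in each pair is the capacity at the time of the talk and the second the LHC requirement, and that 1 PB = 1000 TB; the dictionary names are illustrative.

```python
# Figures from the slide in common units (SI95 for CPU, TB for storage).
# Assumption: the first value of each pair is "today", the second is LHC.
today = {"CPU (SI95)": 10_000, "Disk (TB)": 30, "Tape (TB)": 600}
lhc = {"CPU (SI95)": 1_000_000, "Disk (TB)": 3_000, "Tape (TB)": 9_000}


def growth_factors(before, after):
    """Ratio of the later capacity to the earlier one, per resource."""
    return {name: after[name] / before[name] for name in before}


factors = growth_factors(today, lhc)
for name, factor in factors.items():
    print(f"{name}: x{factor:g}")
```

Under these assumptions CPU and disk both grow by a factor of 100, and tape by a factor of 15.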
12
[Figure: estimated disk storage capacity at CERN, LHC and non-LHC, compared with Moore's Law.]
[Figure: estimated CPU capacity at CERN, LHC and non-LHC; annotation: ~10K SI95, 1200 processors.]
Bad News: IO
  1996: 4 GB disks @ 10 MB/s, so 1 TB delivers 2500 MB/s
  2000: 50 GB disks @ 20 MB/s, so 1 TB delivers 400 MB/s
Bad News: Tapes
  < factor 2 reduction in 8 years
  Significant fraction of cost
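The IO figures follow from simple arithmetic: a terabyte built from larger disks needs fewer spindles, so the aggregate transfer rate falls even though each disk is faster. The sketch below reproduces the two numbers, assuming decimal units (1 TB = 1000 GB) and that every disk streams at its full rate; the function name is illustrative.

```python
TB_IN_GB = 1000  # decimal units assumed


def aggregate_bandwidth_mb_s(total_gb, disk_gb, disk_mb_s):
    """Aggregate streaming rate of total_gb built from identical disks."""
    n_disks = total_gb / disk_gb
    return n_disks * disk_mb_s


bw_1996 = aggregate_bandwidth_mb_s(TB_IN_GB, disk_gb=4, disk_mb_s=10)   # 250 disks
bw_2000 = aggregate_bandwidth_mb_s(TB_IN_GB, disk_gb=50, disk_mb_s=20)  # 20 disks

print(bw_1996, bw_2000, bw_1996 / bw_2000)
```

Per-disk capacity grew 12.5 times while per-disk bandwidth only doubled, so the bandwidth available per terabyte dropped by a factor of 6.25.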
13
Regional Centres: a Multi-Tier Model
MONARC http://cern.ch/MONARC
[Figure: tier hierarchy. CERN (Tier 0); Tier 1: FNAL, RAL, IN2P3; Tier 2: Lab a, Uni b, Lab c, Uni n; then Department; then Desktop. Link speeds shown: 2.5 Gbps, 622 Mbps, 622 Mbps, 155 Mbps.]
14
More realistically: a Grid Topology
DataGRID http://cern.ch/grid
[Figure: the same centres connected as a grid rather than a strict hierarchy. CERN (Tier 0); Tier 1: FNAL, RAL, IN2P3; Tier 2: Lab a, Uni b, Lab c, Uni n; Department; Desktop. Link speeds shown: 2.5 Gbps, 622 Mbps, 622 Mbps, 155 Mbps.]
15
Can we build LHC farms?
Positive predictions
  CPU and disk price/performance trends suggest that the raw processing and disk storage capacities will be affordable, and
  raw data rates and volumes look manageable
    perhaps not today for ALICE
  Space, power and cooling issues?
So probably yes… but can we manage them?
  Understand costs: 1 PC is cheap, but managing 10000 is not!
  Building and managing coherent systems from such large numbers of boxes will be a challenge.
1999: CDR @ 45 MB/s for NA48!
2000: CDR @ 90 MB/s for ALICE!
16
Management Tasks I
Supporting adaptability
Configuration Management
  Machine / Service hierarchy
  Automated registration / insertion / removal
  Dynamic reassignment
Automatic Software Installation and Management (OS and applications)
  Version management
  Application dependencies
  Controlled (re)deployment
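A machine / service hierarchy with registration, removal and dynamic reassignment can be sketched as a small configuration database. This is a toy model, not a tool described in the talk: the `FabricConfig` class, its method names and the node names are hypothetical, and only the service names `lxbatch` and `lxplus` are taken from the facility list earlier in the presentation.

```python
class FabricConfig:
    """Toy configuration database: each machine belongs to exactly one service."""

    def __init__(self):
        self._service_of = {}  # machine name -> service name

    def register(self, machine, service):
        """Insert a new machine into the fabric under a service."""
        if machine in self._service_of:
            raise ValueError(f"{machine} is already registered")
        self._service_of[machine] = service

    def remove(self, machine):
        """Take a machine out of the fabric."""
        del self._service_of[machine]

    def reassign(self, machine, new_service):
        """Move a machine to another service without re-registering it."""
        if machine not in self._service_of:
            raise KeyError(machine)
        self._service_of[machine] = new_service

    def members(self, service):
        """All machines currently assigned to a service, sorted by name."""
        return sorted(m for m, s in self._service_of.items() if s == service)


config = FabricConfig()
for name in ("node001", "node002", "node003"):  # hypothetical node names
    config.register(name, "lxbatch")

config.reassign("node003", "lxplus")  # shift capacity to the interactive service
config.remove("node002")              # retire a machine

print(config.members("lxbatch"), config.members("lxplus"))
```

Keeping the assignment in one place means that installation and monitoring tools can ask which service a machine belongs to instead of carrying their own lists.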
17
Management Tasks II
Controlling Quality of Service
System Monitoring
  Orientation to the service NOT the machine
  Uniform access to diverse fabric elements
  Integrated with configuration (change) management
Problem Management
  Identification of root causes (faults + performance)
  Correlate network / system / application data
  Highly automated
  Adaptive: integrated with configuration management
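Monitoring oriented to the service rather than the machine can be sketched as an aggregation step: individual node states are inputs, and the reported quantity is the health of the service they provide. The function name, the status labels and the 80% threshold below are assumptions made for illustration, not values from the talk.

```python
def service_status(machine_up, min_fraction=0.8):
    """Derive a service-level status from per-machine up/down flags.

    machine_up maps machine name -> True (up) or False (down).
    min_fraction is a hypothetical threshold for a healthy service.
    """
    if not machine_up:
        return "unknown"
    fraction = sum(machine_up.values()) / len(machine_up)
    if fraction >= min_fraction:
        return "ok"
    if fraction > 0:
        return "degraded"
    return "down"


# Nine of ten hypothetical nodes are up: one dead node is not a service alarm.
batch_nodes = {f"node{i:03d}": True for i in range(10)}
batch_nodes["node007"] = False
print(service_status(batch_nodes))
```

With thousands of boxes some are always broken, so alarming on the service level keeps operators focused on what users actually experience.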
18
Relevance to the GRID?
Scalable solutions needed in absence of GRID!
For the GRID to work it must be presented with information and opportunities
  Coordinated and efficiently run centres
  Presentable as a guaranteed quality resource
‘GRID’ification: the interfaces
19
Mgmt Tasks: A GRID centre
GRID enable
  Support external requests: services
  Publication
    Coordinated + ‘map’able
  Security: Authentication / Authorisation
  Policies: Allocation / Priorities / Estimation / Cost
  Scheduling
  Reservation
  Change Management
Guarantees
  Resource availability / QoS
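Reservation with a guarantee of resource availability can be illustrated by an admission check: a request is accepted only if the centre's capacity is never exceeded during the requested interval. This is a minimal sketch under stated assumptions, not a DataGRID interface: the `ReservationBook` class, integer time slots and a single CPU-count resource are all hypothetical.

```python
class ReservationBook:
    """Toy advance-reservation ledger for one resource (CPUs)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.bookings = []  # list of (start, end, cpus), end exclusive

    def load_at(self, t):
        """CPUs already promised at time t."""
        return sum(c for s, e, c in self.bookings if s <= t < e)

    def reserve(self, start, end, cpus):
        """Accept the request only if capacity holds over [start, end)."""
        # Load only rises when a booking starts, so the peak inside the
        # interval occurs at `start` or at a booking start within it.
        checkpoints = {start} | {s for s, _, _ in self.bookings if start <= s < end}
        if any(self.load_at(t) + cpus > self.capacity for t in checkpoints):
            return False
        self.bookings.append((start, end, cpus))
        return True


book = ReservationBook(capacity=100)
first = book.reserve(0, 10, 60)    # accepted
second = book.reserve(5, 15, 50)   # rejected: 60 + 50 exceeds 100 at t = 5
third = book.reserve(10, 20, 50)   # accepted: the first booking has ended
print(first, second, third)
```

Rejecting a request up front, rather than overcommitting, is what lets a centre present itself to the GRID as a guaranteed quality resource.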
20
Existing Solutions?
The world outside is moving fast!!
Dissimilar problems
  Virtual super computers (~200 nodes)
  MPI, latency, interconnect topology and bandwidth
  Roadrunner, LosLobos, Cplant, Beowulf
Similar problems
  ISPs / ASPs (~200 nodes)
  Clustering: high availability / mission critical
The DataGRID: Fabric Management WP4
21
WP4 Partners
  CERN (CH)     Tim Smith
  ZIB (D)       Alexander Reinefeld
  KIP (D)       Volker Lindenstruth
  NIKHEF (NL)   Kors Bos
  INFN (I)      Michele Michelotto
  RAL (UK)      Andrew Sansum
  IN2P3 (Fr)    Denis Linglin
22
Concluding Remarks
Years of experience in exploiting inexpensive mass market components
But we need to marry these with inexpensive highly scalable management tools
Build components back together as a resource for the GRID