Fabric Management for CERN Experiments: Past, Present, and Future
Tim Smith, CERN/IT

Slide 2: Contents
- The Fabric of CERN today
- The new challenges of LHC computing
- What has this got to do with the GRID
- Fabric Management solutions of tomorrow?
- The DataGRID Project

Slide 3: Fabric Elements
- Functionalities
  - batch and interactive services
  - disk servers
  - tape servers + devices
  - stage servers
  - home directory servers
  - application servers
  - backup service
- Infrastructure
  - job scheduler
  - authentication
  - authorisation
  - monitoring
  - alarms
  - console managers
  - networks

Slide 4: Fabric Technology at CERN
[Chart: multiplicity/scale versus year, showing the evolution from mainframes (IBM, Cray), through RISC workstations, SMPs (SGI, DEC, HP, SUN) and scalable systems (SP2, CS2), to RISC workstation farms and PC farms.]

Slide 5: Architecture Considerations
- Physics applications have ideal data parallelism
  - a mass of independent problems
  - no message passing
- Throughput rather than performance
- Resilience rather than ultimate reliability
- Can build hierarchies of mass-market components
- High Throughput Computing (see the sketch below)
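
To make the data-parallelism point concrete, here is a minimal Python sketch, with hypothetical file names and worker counts, of processing a set of independent input files with no message passing between jobs; it illustrates the high-throughput model rather than any actual CERN batch system.

```python
# Minimal sketch (hypothetical file names) of "ideal data parallelism":
# each input file is an independent job, so throughput comes from running
# many jobs at once with no message passing between them.
from concurrent.futures import ProcessPoolExecutor

def reconstruct(raw_file: str) -> str:
    """Stand-in for the per-file reconstruction step; runs independently."""
    summary_file = raw_file.replace(".raw", ".esd")
    # ... real event reconstruction would happen here ...
    return summary_file

if __name__ == "__main__":
    raw_files = [f"run0001_{i:04d}.raw" for i in range(100)]  # hypothetical inputs
    # Failure of an individual job reduces throughput but does not stop the rest:
    # resilience rather than ultimate reliability.
    with ProcessPoolExecutor(max_workers=16) as pool:
        for esd in pool.map(reconstruct, raw_files):
            print("produced", esd)
```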

Slide 6: Component Architecture
[Diagram: CPU nodes on 100/1000baseT switches, and tape, disk and application servers on 1000baseT switches, all connected through a high-capacity backbone switch.]

Slide 7: Analysis Chain: Farms
[Diagram: data flows from the detector through the event filter (selection & reconstruction) to raw data; event reconstruction, together with event simulation, produces event summary data; batch physics analysis turns this processed data into analysis objects (extracted by physics topic) for interactive physics analysis.]
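
The chain in the diagram can be read as a simple pipeline. The sketch below, with hypothetical event fields and function names, only illustrates the ordering of the stages; it is not the experiments' actual software.

```python
# Minimal sketch (hypothetical event fields) of the analysis chain:
# raw data -> event reconstruction -> event summary data (ESD),
# with batch analysis extracting analysis objects by physics topic.
def event_filter(detector_stream):
    """Selection & reconstruction online: keep only triggered events."""
    return [e for e in detector_stream if e.get("trigger", False)]

def reconstruct(raw_events):
    """Turn raw events into event summary data (ESD)."""
    return [{"tracks": len(e.get("hits", [])), **e} for e in raw_events]

def batch_analysis(esd, topic):
    """Extract analysis objects for one physics topic from the ESD."""
    return [e for e in esd if e.get("topic") == topic]

# Usage with toy events (hypothetical fields):
detector_stream = [{"trigger": True, "hits": [1, 2, 3], "topic": "dimuon"},
                   {"trigger": False, "hits": [1]}]
raw = event_filter(detector_stream)
esd = reconstruct(raw)
print(batch_analysis(esd, "dimuon"))
```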

Slide 8: Multiplication!
[Chart: number of CPUs from Jul-97 to Jan-00, broken down by cluster: alice, atlas, ccf, cms, eff, ion, l3c, lhcb, lxplus, lxbatch, mta, na45, na48, na49, nomad, pcsf, tapes, tomog.]

Slide 9: PC Farms

Slide 10: Shared Facilities

Slide 11: LHC Computing Challenge
- The scale will be different (today vs. LHC):
    CPU:   10k SI95  ->  1M SI95
    Disk:  30 TB     ->  3 PB
    Tape:  600 TB    ->  9 PB
- The model will be different
  - there are compelling reasons why some of the farms and some of the capacity will not be located at CERN

Slide 12: Estimated CPU and Disk Capacity at CERN
[Charts: estimated CPU capacity (~10k SI95 processors) and estimated disk storage capacity at CERN, split into LHC and non-LHC, compared against a Moore's Law curve. Annotations: "Bad News: IO" (1 TB – 2500 MB/s; 20 MB/s; 1 TB – 400 MB/s) and "Bad News: Tapes" (less than a factor of 2 reduction in 8 years; a significant fraction of the cost).]

Slide 13: Regional Centres: a Multi-Tier Model
[Diagram (MONARC model): CERN as Tier 0; Tier 1 regional centres (FNAL, RAL, IN2P3); Tier 2 centres (Lab a, Uni b, Lab c, Uni n); departments and desktops below, with links ranging from 155 Mbps to 2.5 Gbps.]

Slide 14: More realistically: a Grid Topology
[Diagram: the same tiers (CERN Tier 0; Tier 1 centres FNAL, RAL, IN2P3; Tier 2 centres Lab a, Uni b, Lab c, Uni n; departments and desktops) interconnected through the DataGRID rather than in a strict hierarchy, with the same 155 Mbps to 2.5 Gbps links.]

Slide 15: Can we build LHC farms?
- Positive predictions
  - CPU and disk price/performance trends suggest that the raw processing and disk storage capacities will be affordable
  - raw data rates and volumes look manageable (perhaps not today for ALICE: 45 MB/s for NA48 in 1999, 90 MB/s for ALICE in 2000)
- Space, power and cooling issues?
- So probably yes... but can we manage them?
  - understand costs: one PC is cheap, but managing it is not!
  - building and managing coherent systems from such large numbers of boxes will be a challenge

Slide 16: Management Tasks I
- Supporting adaptability
- Configuration management (see the sketch below)
  - machine / service hierarchy
  - automated registration / insertion / removal
  - dynamic reassignment
- Automatic software installation and management (OS and applications)
  - version management
  - application dependencies
  - controlled (re)deployment
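
As a concrete illustration of the machine/service hierarchy with automated registration and dynamic reassignment, here is a minimal Python sketch; the class and node names are hypothetical and this is not the actual DataGRID WP4 design.

```python
# Minimal sketch (hypothetical classes, not the actual WP4 tools) of a
# machine/service registry with automated registration and reassignment.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    service: str          # e.g. "lxbatch", "diskserver"
    os_version: str       # desired OS/software profile for this service

@dataclass
class ConfigRegistry:
    nodes: dict = field(default_factory=dict)

    def register(self, node: Node) -> None:
        """Automated insertion: a new box declares itself to the registry."""
        self.nodes[node.name] = node

    def remove(self, name: str) -> None:
        """Automated removal, e.g. when hardware is retired."""
        self.nodes.pop(name, None)

    def reassign(self, name: str, new_service: str) -> None:
        """Dynamic reassignment: move capacity between services."""
        self.nodes[name].service = new_service

    def members(self, service: str) -> list:
        return [n for n in self.nodes.values() if n.service == service]

# Usage with hypothetical node names:
registry = ConfigRegistry()
registry.register(Node("pc0001", "lxbatch", "redhat-6.1"))
registry.reassign("pc0001", "lxplus")
print([n.name for n in registry.members("lxplus")])
```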

Slide 17: Management Tasks II
- Controlling quality of service
- System monitoring (see the sketch below)
  - orientation to the service, NOT the machine
  - uniform access to diverse fabric elements
  - integrated with configuration (change) management
- Problem management
  - identification of root causes (faults + performance)
  - correlate network / system / application data
  - highly automated
  - adaptive, integrated with configuration management
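
A minimal sketch of service-oriented monitoring, assuming hypothetical metric names and an availability threshold: per-machine samples are rolled up to the service they belong to (using the configuration data) and alarms are judged at the service level, not per box.

```python
# Minimal sketch (hypothetical thresholds and metric names) of service-oriented
# monitoring: alarms describe services, not individual machines.
from collections import defaultdict

def service_health(samples, registry, min_available=0.8):
    """samples: {node_name: {"load": float, "alive": bool}} from the fabric."""
    per_service = defaultdict(list)
    for name, metrics in samples.items():
        service = registry.get(name, "unknown")       # from configuration mgmt
        per_service[service].append(metrics)

    alarms = []
    for service, nodes in per_service.items():
        alive = sum(1 for m in nodes if m["alive"]) / len(nodes)
        if alive < min_available:                     # service-level condition
            alarms.append(f"{service}: only {alive:.0%} of nodes available")
    return alarms

# Usage with toy data (hypothetical node and service names):
registry = {"pc0001": "lxbatch", "pc0002": "lxbatch", "pc0003": "lxplus"}
samples = {"pc0001": {"load": 0.9, "alive": True},
           "pc0002": {"load": 0.0, "alive": False},
           "pc0003": {"load": 0.2, "alive": True}}
print(service_health(samples, registry))  # lxbatch drops below 80% availability
```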

Slide 18: Relevance to the GRID?
- Scalable solutions are needed even in the absence of the GRID!
- For the GRID to work it must be presented with information and opportunities
  - coordinated and efficiently run centres
  - presentable as a guaranteed-quality resource
- 'GRID'ification: the interfaces

Slide 19: Mgmt Tasks: a GRID Centre
- GRID-enable
- Support external requests: services
  - publication: coordinated + 'map'able (see the sketch below)
  - security: authentication / authorisation
  - policies: allocation / priorities / estimation / cost
  - scheduling
  - reservation
  - change management
- Guarantees
  - resource availability / QoS
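
As an illustration of publishing a centre as a guaranteed-quality resource, here is a minimal sketch with an assumed JSON record; the centre name, field names and guarantee values are hypothetical, not a DataGRID interface.

```python
# Minimal sketch (hypothetical schema) of how a centre could publish its
# resources and QoS guarantees for a grid scheduler to consume.
import json
import time

def publish_centre_status(registry, queue_lengths):
    """Summarise the fabric into a record a grid scheduler could read."""
    status = {
        "centre": "example-tier1",                 # hypothetical centre name
        "timestamp": int(time.time()),
        "services": {
            service: {
                "cpus": len(nodes),
                "queued_jobs": queue_lengths.get(service, 0),
                "availability_guarantee": 0.95,    # assumed policy value
            }
            for service, nodes in registry.items()
        },
    }
    return json.dumps(status, indent=2)

# Usage with toy data (hypothetical service names):
registry = {"lxbatch": ["pc0001", "pc0002"], "diskserver": ["disk01"]}
print(publish_centre_status(registry, {"lxbatch": 42}))
```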

Slide 20: Existing Solutions?
- The world outside is moving fast!
- Dissimilar problems
  - virtual supercomputers (~200 nodes)
  - MPI, latency, interconnect topology and bandwidth
  - Roadrunner, LosLobos, Cplant, Beowulf
- Similar problems
  - ISPs / ASPs (~200 nodes)
  - clustering: high availability / mission critical
- The DataGRID: Fabric Management WP4

Slide 21: WP4 Partners
- CERN (CH): Tim Smith
- ZIB (D): Alexander Reinefeld
- KIP (D): Volker Lindenstruth
- NIKHEF (NL): Kors Bos
- INFN (I): Michele Michelotto
- RAL (UK): Andrew Sansum
- IN2P3 (FR): Denis Linglin

Slide 22: Concluding Remarks
- Years of experience in exploiting inexpensive mass-market components
- But we need to marry these with inexpensive, highly scalable management tools
- Build the components back together as a resource for the GRID