1 On-line Computing M&O – LHCC RRB SG, 16 Sep. 2004 – P. Vande Vyvre, CERN-PH, for the 4 LHC DAQ project leaders

2 Introduction
Questions raised by the RRB Scrutiny Group:
– System manager profiles
– Number of system managers
– M&O budget category
– Replacement profile of computer/network equipment
Common answer from the 4 LHC experiments.
See also the presentation by A. Ceccucci to the RRB SG in April 2003 on M&O for Online Computing.

3 System manager profiles
Category, function and qualification:
– Level-1: react to alarms, follow predefined procedures, 24/7 operational. Qualification: experience and knowledge of computers.
– Level-2: install/update systems and services, configure/monitor, piquet service. Qualification: as Level-1, plus 2 years of experience, Unix shell scripts, etc.
– Supervisor: overall supervision and direction of these tasks. Qualification: informatics professional.
Continuity is needed for the Level-2 and supervisor personnel.

4 System management effort (1)
Estimates are based on the LCG guidelines: a fixed number of boxes (PCs, network switches, storage elements) per system manager (a rough illustration of this rule follows this slide).
Differences between online and offline systems:
– Wide variety of equipment operated as a single system
– Various PCs with different configurations (trigger farms, dataflow, control, monitoring, file servers)
– Variety compounded by staged procurements
– Very large and highly loaded network (e.g. for event building)
– Failure of any part of the online system will either partially reduce the data-taking efficiency (e.g. loss of an HLT sub-farm) or interrupt data taking altogether (e.g. failure of the central controller), i.e. a complete, coherent system has to be kept running
A dedicated team with the appropriate skills is needed to ensure the reliability and optimal capacity of the online systems.
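The boxes-per-manager rule converts an equipment inventory directly into a staffing estimate. The minimal Python sketch below illustrates the arithmetic; the box counts and per-manager ratios are purely illustrative assumptions, not LCG or experiment figures.

```python
# Minimal sketch of a boxes-per-manager staffing estimate.
# All numbers are illustrative assumptions, not LCG or experiment figures.
import math

# Hypothetical inventory of "boxes" in an online system.
inventory = {
    "trigger-farm PCs": 500,
    "dataflow/control PCs": 120,
    "file servers": 40,
    "network switches": 60,
}

# Hypothetical guideline: how many boxes one system manager can look after.
boxes_per_manager = {
    "trigger-farm PCs": 200,      # uniform nodes, easier to automate
    "dataflow/control PCs": 80,   # more heterogeneous configurations
    "file servers": 40,
    "network switches": 100,
}

total_fte = 0.0
for category, count in inventory.items():
    fte = count / boxes_per_manager[category]
    total_fte += fte
    print(f"{category:22s} {count:4d} boxes -> {fte:4.2f} FTE")

# Round up and add a supervisor, matching the breakdown used in this talk.
print(f"Level-1/Level-2 effort: {math.ceil(total_fte)} FTE + 1 supervisor")
```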

5 System management effort (2)
Manpower from the collaborations?
– The LHC collaborations are very large, but attempts to find suitably qualified system-manager effort have failed even to meet today's needs
– Most people (physicists, engineers) do not have the right profile
– Institutes that have people with the proper qualifications are not prepared to locate them at CERN for adequate periods
Full operation:
– 24/7 cover at Level-1, normal working hours at Level-2 plus a piquet (on-call) service
– At least 5 people at Level-1 and 5 people at Level-2, reduced by some overlap (the shift arithmetic is sketched after this slide)
– The shift crew will contribute to Level-1
These are provisional estimates, to be adapted in 2008-09 in the light of experience running the system and better knowledge of its reliability.
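The figure of roughly 5 people to cover a single 24/7 post follows from ordinary shift arithmetic. The sketch below shows the kind of calculation involved, with assumed working-time figures (leave, training, working week) that are illustrative rather than official numbers.

```python
# Rough shift arithmetic behind "at least 5 people for 24/7 cover".
# The working-time figures are illustrative assumptions.

hours_to_cover = 24 * 365              # one post staffed around the clock
weeks_worked = 52 - 6 - 2              # assumed leave and training deducted
hours_per_person = weeks_worked * 40   # assumed 40-hour working week

people_needed = hours_to_cover / hours_per_person
print(f"{people_needed:.1f} people to keep one 24/7 post covered")
# -> about 5, before any overlap with the Level-2 team is taken into account
```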

6 System management effort (3)
Total effort in FTEs, quoted as total (Level-1 and Level-2 + supervisor), for the successive years of the ramp-up:
– ALICE: 2 (1+1), 3 (2+1), 4 (3+1), 5 (4+1)
– ATLAS: 2 (1+1), 3 (2+1), 5 (4+1), 8 (6+2), 9 (7+2)
– CMS: 1.5 (1+0.5), 3 (2+1), 7 (5+2), 8 (6+2), 10 (8+2)
– LHCb: 2 ( ), 3 ( ), 5 (4+1), 6 (5+1), 6 (5+1)

7 M&O budget category
M&O A:
– Request of the CERN management and the RRB
– No other identified source

8 Replacement of equipment (1)
Equipment: PCs, network and storage used for dataflow and the online trigger.
Motivations:
– Reliability of the equipment as it ages
– Maintainability after a few years (3-year warranty)
– Suitability of old equipment to follow the evolution of the operating system and to work with new equipment
– Need to follow Operating System (OS) evolution:
  – Security patches
  – New PCs (staged installation) not supported by old OS versions
  – Old OS versions no longer supported
  – Code will continue to be developed with dependencies on the OS and compiler versions
  – Online trigger code is based on / uses offline code developed for the current OS version

9 Replacement of equipment (2)
Replacement categories (a rough cost illustration follows this slide):
– Disks and file servers: lower reliability and very rapid evolution. 3 years
– PCs: 4 years. Replacement cost will not directly follow Moore's Law: I/O performance limitations, and new multi-core architectures might require a major increase in system memory
– Network:
  – Central switch: 5 years (= period of maintenance by the manufacturer)
  – Smaller peripheral switches: 4 years (shorter warranty, but less critical)
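To see what these lifetimes imply for a yearly budget, the minimal sketch below spreads the replacement cost of each category over its cycle. The purchase costs are hypothetical placeholders; only the lifetimes come from this slide.

```python
# Minimal sketch of an annual replacement budget from equipment lifetimes.
# Purchase costs are illustrative assumptions; lifetimes are from this slide.

equipment = {
    # category: (assumed replacement cost in kCHF, replacement cycle in years)
    "disks and file servers": (300, 3),
    "PCs": (1200, 4),
    "central network switch": (500, 5),
    "peripheral switches": (200, 4),
}

total_per_year = 0.0
for category, (cost_kchf, lifetime) in equipment.items():
    yearly = cost_kchf / lifetime
    total_per_year += yearly
    print(f"{category:24s} {cost_kchf:5.0f} kCHF / {lifetime} y = {yearly:6.1f} kCHF/year")

print(f"Total replacement budget: {total_per_year:.1f} kCHF/year")
```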

10 Previous practice
LEP and fixed-target era:
– Computers were complete systems qualified by a commercial company
– Maintenance contracts were paid by the experiments
– System managers were in the experiments (some of them CERN staff)
– CERN had operator staff in the computing centre and in groups giving support to the experiments
LHC era:
– Components are tested, qualified and assembled into complete systems by the experiments
– The overall system is much larger and more complex than previously
– Very few operators at CERN are directly employed by CERN