CPU on the farms. Yen-Chu Chen, Roman Lysak, Miro Siket, Stephen Wolbers. June 4, 2003.

Plot found on fnpcc (the farms home page). What does it mean? Why isn’t it higher? Can it get higher?

Each CPU can only deliver 100% in total (user + system), so one has to add the two lines on the plot: roughly 70% (user) plus 10% (system), i.e. about 80% total.

I/O nodes – not available for processing. There are 16 “I/O” nodes on the farms: 8 input and 8 output. No CPU on those nodes is available for reconstruction, and the amount of CPU actually used on them is quite low. Subtracting their 32 processors (16 nodes × 2 CPUs), the total available for processing is 270 CPUs, not 302. This adds roughly 10% to the utilization graph, so the 80% (70 + 10) becomes about 88%.
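
As a quick sanity check of the arithmetic on this slide, here is a minimal Python sketch (a hypothetical illustration, not part of the original talk; the 302/270 CPU counts and the ~70%/10% plot values are taken from the slides):

```python
# Utilization correction described above.
# Slide numbers: 302 CPUs in total, 16 I/O nodes (32 processors,
# i.e. 2 CPUs per node), plot values of ~70% user + ~10% system.

total_cpus = 302
io_cpus = 16 * 2                      # 16 I/O nodes, 2 CPUs each
usable_cpus = total_cpus - io_cpus    # 270 CPUs can run reconstruction

user_frac, system_frac = 0.70, 0.10   # the two lines on the fnpcc plot
raw_util = user_frac + system_frac    # ~80%, averaged over all 302 CPUs

# Rescale to the 270 CPUs that are actually available for processing.
corrected_util = raw_util * total_cpus / usable_cpus

print(f"usable CPUs           : {usable_cpus}")
print(f"raw utilization       : {raw_util:.0%}")
print(f"corrected utilization : {corrected_util:.0%}")   # ~89%, close to the ~88% quoted
```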

Now look at the plots behind these numbers: the number of CPUs used is about 270 × 0.88 = 238, consistent with the plot.

FBS information: plot of the input and output nodes (black and red curves); at most 32 CPUs on the I/O nodes were in use.

FBS utilization: at most 248 CPUs were used for processing (280 in use in total minus 32 on the I/O nodes), out of 270 available for processing. The ratio is 248/270 = 92%, consistent with the 88% above.
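
The cross-check between the two accountings can be written out the same way (again a sketch using only the numbers quoted on the slides):

```python
# Cross-check: CPU accounting vs. batch-system (FBS) accounting.
# All inputs are numbers quoted on the slides.

usable_cpus = 270            # CPUs available for reconstruction
cpu_util = 0.88              # corrected utilization from the CPU plot

# CPU-accounting view: how many CPUs' worth of work was done?
print(f"CPU accounting: ~{usable_cpus * cpu_util:.0f} CPUs busy")   # ~238

# Batch-system view: FBS reported at most 280 slots in use,
# 32 of them on I/O nodes that do no reconstruction.
fbs_processing = 280 - 32                                           # 248
print(f"FBS accounting: {fbs_processing} CPUs assigned, "
      f"{fbs_processing / usable_cpus:.0%} of {usable_cpus}")       # ~92%
```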

What does it mean? While the farm was fully loaded (Friday, May 23 through Wednesday, May 28), its utilization was about 88-92%, whether measured by the fraction of CPU used on the 270 CPUs available for processing or by the number of CPUs assigned by the batch system.

Consistency of the result with the number of events processed. One can calculate the expected number of events/day that can be reconstructed on this farm from the CPU time per event and the total CPU capacity. Total capacity = 317 GHz (… MHz, 64 × 1 GHz, … GHz, … GHz). CPU time/event = 3.3 GHz-s; this number is still being refined, and Yen-Chu may talk about it next week. Maximum number of events possible = (317 / 3.3) × 24 × 3600 ≈ 8.3M events/day. The average processed during this period was 7.5M/day, which gives 7.5/8.3 = 90%.
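
A minimal sketch of this capacity estimate (the 317 GHz total, 3.3 GHz-s per event, and 7.5M events/day average are the slide's numbers):

```python
# Capacity estimate: how many events/day can the farm reconstruct?

total_capacity_ghz = 317.0       # summed clock speed of all farm CPUs
cpu_time_per_event = 3.3         # GHz-seconds per event (preliminary number)
seconds_per_day = 24 * 3600

max_events_per_day = total_capacity_ghz / cpu_time_per_event * seconds_per_day
avg_events_per_day = 7.5e6       # average actually processed in this period

print(f"max possible : {max_events_per_day / 1e6:.1f} M events/day")     # ~8.3 M
print(f"average seen : {avg_events_per_day / 1e6:.1f} M events/day")
print(f"ratio        : {avg_events_per_day / max_events_per_day:.0%}")   # ~90%
```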

CPU Time measurements

Can the farm expand? The issue is whether we are “bottlenecked” in any part of the system: input, the batch system, output, internal transfers, or database accesses.

Farm topology (diagram): a central Gbit switch; 151 worker nodes on 100 Mbit links; I/O nodes on Gbit links; connections to the Enstore nodes and to the servers fcdfsgi1, fcdfsgi2, and CAF1.

Farms expansion. Given the current hardware, no bottlenecks are expected in: input (8 simultaneous Gbit connections), output (8 simultaneous Gbit connections), the batch system (FBSNG should be OK), or internal transfers (32 Gbit backplane speed on the switch); however, the switch-to-switch bandwidth limitation has to be taken into account. We are not sure about DB access: the internal farms DB and the CDF DB (calibrations and DFC access).
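
A rough back-of-the-envelope check of the switch numbers (a sketch, not a measurement: link counts and speeds are taken from the topology and expansion slides, and nominal bandwidth is assumed with no protocol overhead):

```python
# Nominal bandwidth attached to the switch vs. its backplane capacity.
# Link counts/speeds from the topology slide; ignores real traffic patterns.

GBIT = 1.0                          # work in Gbit/s

input_links = 8 * GBIT              # 8 simultaneous Gbit input connections
output_links = 8 * GBIT             # 8 simultaneous Gbit output connections
worker_links = 151 * 0.100 * GBIT   # 151 worker nodes at 100 Mbit each

backplane = 32 * GBIT               # switch backplane speed from the slide

attached = input_links + output_links + worker_links
print(f"aggregate attached bandwidth : {attached:.1f} Gbit/s")   # ~31.1
print(f"switch backplane             : {backplane:.1f} Gbit/s")  # 32
print("headroom" if attached <= backplane else "potential bottleneck")
```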

Conclusion. The production farm is running at about 90% efficiency, we think. We may be able to increase the efficiency a little, but there is no huge increase available. The graph on the fnpcc home page is misleading; we should update it. The farms can expand by adding CPUs or by replacing old, slow CPUs with newer ones. The FY03 plan for the farms adds more CPUs to handle the increased data from online and from reprocessing.