Quantitative Methodologies for the Scientific Computing: An Introductory Sketch
Alberto Ciampa, INFN-Pisa; Enrico Mazzoni, INFN-Pisa


Quantitative Methodologies for the Scientific Computing

We want to evaluate: activity, efficiency, potentiality.
To account the costs: global costs, costs per Group/Experiment, costs per "produced unit" (power costs, investments, operating costs, maintenance, human effort in FTE, etc.); in this example we consider power costs.
To define: optimization strategies, forecasting and scheduling, evaluation of new jobs and external work orders.

The INFN-Pisa Computing Center

Present situation:
GRID: ~1900 production (WN) cores – CMS T2: 53%, Theophys: 39%
National INFN Theophys cluster: 1024 cores (under implementation)
Services
Storage: ~350 TB

The INFN-Pisa Computing Center (figures: floor plan, roof, 3D sketch)

Introduction: A Request from our Director

How can we attribute the various computing costs to the Groups/Experiments?
We started by defining the "resources" and by building a model of how the Groups/Experiments use them.

Introduction: The Resources and Production

Resources (examples): rack space, (production) CPU, network ports, storage space, power, air conditioning.
A resource can be statically or dynamically allocated.
Production model:
GRID (and local queues): dynamic CPU allocation
Farm dedicated to a Group/Experiment: static CPU allocation

Introduction: The Model

The final activity is the "production", measured (for instance) in day-cores.
The main resource is the computing core.
Some resources are "tailored" to it: power, air conditioning, network, rack space.
Some resources are naturally statically allocated: storage space.

An Example: The INFN-Pisa Computing Center (May 2009 data)

Rack space: 34 racks (33 at 42U, 1 at 48U)
Available power: 1380 A + 450 A (roof) = 317.4 KW + 103.5 KW (roof)
Used power (mean): 602 A + 228 A (roof) = 138.5 KW + 52.4 KW (roof), i.e. 43.6% and 50.6% (roof)
Air conditioning: 235 KW installed, 145 KW used (mean) = 61.7%; max potential: 315 KW
LAN: 900 × 1 GbE + 40 × 10 GbE
WAN: 1 Gb/s (GRID) + … Mb/s (Section)
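These figures convert directly from the measured currents if one assumes the usual 230 V single-phase supply (the voltage is not stated on the slide, so treat it as an assumption). A minimal sketch of the conversion:

# Minimal sketch: measured current (A) -> power (KW) and a usage ratio.
# Assumes a 230 V single-phase supply; the voltage is not given on the slide.
VOLTAGE = 230.0  # volts (assumed)

def amps_to_kw(amps: float, voltage: float = VOLTAGE) -> float:
    """Power in KW for a given current draw at the assumed voltage."""
    return amps * voltage / 1000.0

room_available = amps_to_kw(1380)   # ~317.4 KW
roof_available = amps_to_kw(450)    # ~103.5 KW
room_used = amps_to_kw(602)         # ~138.5 KW
roof_used = amps_to_kw(228)         # ~52.4 KW

print(f"room usage: {room_used / room_available:.1%}")   # ~43.6%
print(f"roof usage: {roof_used / roof_available:.1%}")   # ~50.7% (the slide quotes 50.6%)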

The Model: Some Important Definitions

General efficiency: % of allocated (production) cores
– GRID: % of cores running a job
– Farm: all cores count as allocated
Specific efficiency: % of cores in the UN (User/Nice) states
– GRID: cputime/walltime
– Farm: (UN cores)/(total cores)
State of a core: SIWUN (System, Idle, Wait I/O, User, Nice)
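As an illustration only (the numbers below are invented, chosen to echo the 86% / 75% / 65% figures in the survey slide that follows), the two efficiencies combine like this:

# Sketch of the efficiency definitions above; all sample numbers are invented.
def grid_general_efficiency(cores_running_jobs: float, total_cores: float) -> float:
    """GRID general efficiency: fraction of production cores running a job."""
    return cores_running_jobs / total_cores

def grid_specific_efficiency(cputime: float, walltime: float) -> float:
    """GRID specific efficiency: CPU time actually used per unit of wall time."""
    return cputime / walltime

def farm_specific_efficiency(un_cores: float, total_cores: float) -> float:
    """Farm specific efficiency: cores in the User/Nice (UN) states over all cores."""
    return un_cores / total_cores

general = grid_general_efficiency(1290, 1500)      # 86% of the cores are busy
specific = grid_specific_efficiency(75.0, 100.0)   # jobs use 75% of their wall time
print(f"total GRID efficiency: {general * specific:.1%}")   # ~64.5%, i.e. the ~65% quoted in the survey slide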

Survey: Scientific Computing, 1/6/08-31/5/09

Computing cores: 1567 = … (GRID) + … (Experiment Farms)
Computing power: 2.35 MSI2k (… HepSPEC)
Non-production cores for Scientific Computing: 98 (GRID + dCache + GPFS)
Storage space: 300 TB gross
Max potential: … cores + 1 PB storage (quad core, 1 TB disks)

GRID
Day-cores: …, that is 962 year-cores
Usage: 86% (general efficiency), 75% (specific efficiency) = 65% (total)
Power usage (gross): … KW
Power consumption per day-core (gross): 3.44 KWh

Experiment Farms
Day-cores: …, that is 185 year-cores
Usage: 33% (general + specific efficiency)
Power usage (gross): 29.2 KW
Power consumption per day-core (gross): … KWh

GRID vs Farm

Example: GRID and local queues accounting

Example: Farm Accounting

Example of a farm dedicated to an external user (plots: raw data from Ganglia, corrected data from Ganglia)

Example of farm utilization

21 cores: … day-cores, 49%; 18 cores: 3 day-cores, 0% (data derived from Ganglia)

Notes on the Methodology

Scientific computing as a production activity:
– In general, what matters is not the single production item but the production flow.
– The quality of the system is measured by the production level (quantity) and the production efficiency:
  General efficiency = % of infrastructure usage
  Specific efficiency = % of usage efficiency
– Except for specific cases, we do not care what is produced, but how it is produced: how much and how the systems are used.
– It is possible, and necessary, to measure the production costs, globally and for each "production line" (group or experiment): power, investments, consumables, maintenance, FTE.
– It is possible, and we must be ready, to accept other work orders, both internal (from our institute, mandatory) and external.
– We do not claim to cover everything (interactive use, T3, some farms).

Scientific Computing

The context:
– GRID (WNs, middleware, SRM), Group/Experiment farms and clusters
– Storage for the above activities (plus disk servers, SAN, FC switches, etc.)
– Data-center LAN (for SC), WAN dedicated to SC
Usage paradigms:
– GRID: batch only (LSF), resources shared among the VOs, dynamic allocation following the requests ("fair-share"), accounting based on usage (general efficiency). Local queues included.
– Group/Experiment farm/cluster: the owner has free access to reserved resources; accounting is based on the static allocation, not on usage.
Users:
– GRID: all the VOs accepted by INFN-GRID, "welcome" fair-share, support (job success, efficiency)
– Farm/cluster: needs a good motivation; accounting includes services (rack, network, high-speed network, etc.) and hosting.

Quantitative Methodologies

Resources:
– Global (split by % between SC and services): space, power, conditioning, network, services* (* shared services: DNS, DHCP, Auth*, shared storage, etc.)
– SC-specific: servers, storage
Data sources: LSFMON (GRID), Ganglia (Farm)
Measures:
– LSFMON: (#core, #job, #queued) as means only; per VO: (#job, walltime, CPUtime) as totals only; #CPU. Integrals?
– Ganglia: sum over farms (sum over servers (#core, %SIWUN)), with means. Integral evaluation: not trivial.
– CPUPower = (RoomPower + RoofPower) / Σ(CPU_prod + CPU_serv)
Power vs performance; CPU vs core
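As a rough sketch of the last formula (the field names and sample values are assumptions, not the actual LSFMON or Ganglia output), the per-CPU power figure could be computed like this:

# Sketch: per-CPU power from total room+roof power and CPU counts.
# Names and sample values are assumptions, not real LSFMON/Ganglia fields.
def cpu_power_kw(room_power_kw: float, roof_power_kw: float,
                 cpus_production: int, cpus_service: int) -> float:
    """CPUPower = (RoomPower + RoofPower) / sum(CPU_prod + CPU_serv)."""
    return (room_power_kw + roof_power_kw) / (cpus_production + cpus_service)

# e.g. ~190 KW drawn by 420 production CPUs and 40 service CPUs (invented figures)
print(f"{cpu_power_kw(138.5, 52.4, 420, 40):.3f} KW per CPU")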

Methodologies: GRID

Account the core-time allocated to a specific VO (walltime), regardless of the exit status; empty job slots are not accounted; core-time is accounted as SIW+UN (i.e. without considering the specific efficiency).
General Efficiency = 100·(1 − ∫freecore / ∫core) ≈ 100·(1 − mean(freecore)/mean(core))
GRIDPower = CPUPower·∫CPU_GRID ≈ CPUPower·mean(CPU_GRID) ≈ CPUPower·mean(core_GRID)·k
Power/Day-Core = GRIDPower / ∫(core − freecore) ≈ GRIDPower / (mean(core) − mean(freecore))
Efficiency = (General Efficiency)·(Specific Efficiency*)
* Specific efficiency = CPUtime/walltime, both global and per VO
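A minimal sketch of these GRID-side formulas, using the mean-value approximation in place of the time integrals (all variable names and sample values below are assumptions):

# Sketch of the GRID accounting formulas above, approximating integrals with means.
# All names and sample numbers are assumptions for illustration only.
def general_efficiency(mean_free_cores: float, mean_cores: float) -> float:
    """100 * (1 - mean(freecore) / mean(core))."""
    return 100.0 * (1.0 - mean_free_cores / mean_cores)

def grid_power_kw(cpu_power_kw: float, mean_grid_cores: float, cores_per_cpu: int) -> float:
    """GRIDPower ~= CPUPower * mean(core_GRID) * k, with k = 1 / cores_per_cpu."""
    return cpu_power_kw * mean_grid_cores / cores_per_cpu

def power_per_day_core_kwh(grid_power: float, mean_cores: float,
                           mean_free_cores: float) -> float:
    """Power/Day-Core ~= GRIDPower / (mean(core) - mean(freecore)), over a 24 h day."""
    return grid_power * 24.0 / (mean_cores - mean_free_cores)

mean_cores, mean_free = 1500.0, 210.0          # invented monitoring means
gpower = grid_power_kw(0.415, mean_cores, cores_per_cpu=4)
print(f"general efficiency: {general_efficiency(mean_free, mean_cores):.0f}%")          # 86%
print(f"power per day-core: {power_per_day_core_kwh(gpower, mean_cores, mean_free):.2f} KWh")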

Methodologies: Farm

The total cost is independent of the usage level (no specific efficiency: the resources are always allocated).
General Efficiency = 100·(∫core_UN / ∫core)
FarmPower = CPUPower·∫CPU_Farm
Power/Day-Core = FarmPower / ∫core_Farm_UN
Efficiency = (working day-cores)/(allocated day-cores) = ∫(#core·%UN) / ∫#core*
* both global and per farm
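And a corresponding sketch for a dedicated farm, again using means instead of the integrals (names and sample data are assumptions; the ~33% usage echoes the survey slide):

# Sketch of the farm accounting formulas above (names and sample data assumed).
def farm_general_efficiency(mean_un_cores: float, mean_cores: float) -> float:
    """100 * (integral(core_UN) / integral(core)) ~= 100 * mean(core_UN) / mean(core)."""
    return 100.0 * mean_un_cores / mean_cores

def farm_power_kw(cpu_power_kw: float, mean_farm_cores: float, cores_per_cpu: int) -> float:
    """FarmPower = CPUPower * integral(CPU_Farm) ~= CPUPower * mean(core_Farm) / cores_per_cpu."""
    return cpu_power_kw * mean_farm_cores / cores_per_cpu

def power_per_day_core_kwh(farm_power: float, mean_un_cores: float) -> float:
    """Power/Day-Core = FarmPower / integral(core_Farm_UN), expressed over a 24 h day."""
    return farm_power * 24.0 / mean_un_cores

cores, un_cores = 78.0, 26.0                   # invented: ~33% of the allocated cores busy
fpower = farm_power_kw(0.415, cores, cores_per_cpu=4)
print(f"farm efficiency: {farm_general_efficiency(un_cores, cores):.0f}%")            # ~33%
print(f"power per working day-core: {power_per_day_core_kwh(fpower, un_cores):.1f} KWh")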

Further Info

ccr-2009/271-ccr p-calcolo-scientifico-prime-metodologie-quantitative-per-un-ambiente-di-produzione
(sorry, in Italian; for any question you can contact:)
– Me
– Enrico Mazzoni