History of the National INFN Pool P. Mazzanti, F. Semeria INFN – Bologna (Italy) European Condor Week 2006 Milan, 29-Jun-2006

Our first experience (1997). Monte Carlo event generation for the WA92 experiment at CERN: beauty search in a fixed-target experiment. Working conditions: a dedicated farm of 3 Alpha VMS and 6 DECstation Ultrix machines. Results: events/day (0 dead time).

Then Condor came... Production Condor Pool:
– 23 DEC Alpha: 18 Bologna, 2 CNAF (Bologna), 2 Turin, 1 Rome
– 4 HP
– 6 DECstation Ultrix
– 5 Pentium Linux

Then Condor came… (cont.) Throughput of the 23-Alpha subset of the pool: to events/day, plus events/day with the pool in Madison. We got 5x the production at zero cost!

Give me a calculator… At INFN: 1000 PCs used 8 hours/day by their owners (16 hours/day idle). 1000 × 16 = 16,000 hours ≈ 1.8 years. 1.8 years of equivalent CPU time wasted every day!
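Spelling out the slide's arithmetic (assuming 8,766 hours per calendar year for the conversion):

\[
1000\ \text{PCs} \times 16\ \tfrac{\text{h}}{\text{day}} = 16{,}000\ \tfrac{\text{CPU-h}}{\text{day}},
\qquad
\frac{16{,}000}{8{,}766} \approx 1.8\ \tfrac{\text{CPU-years}}{\text{day}}
\]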

The ‘Condor on WAN’ INFN Project. Approved by the Computing Committee in February. Goal: install Condor on the INFN WAN and evaluate its effectiveness for INFN's computational needs. 30 people involved.

The Condor INFN Project (cont.) The INFN structure: 27 sites; more than 10 experiments in nuclear and subnuclear physics; hundreds of researchers involved; distributed and heterogeneous resources (a good frame for a grid…).

The Condor INFN Project (cont.) The first example in Europe of a national distributed computing environment

Collaboration between INFN and the Computer Sciences Department of the University of Wisconsin-Madison. Project coordinators: for Madison, Miron Livny; for INFN, Paolo Mazzanti.

General usage policy Each group of people must be able to maintain full control over their own machines.

General usage policy (cont.) A Condor job sent from a machine of a group must have the maximum access priority on the machines of the same group.

Subpools. A rank expression lets a resource owner give priority to requests from selected groups:
GROUP_ID = "My_Group"
RANK = (TARGET.GROUP_ID == "My_Group")
From the group's point of view, these machines form a pool of their own: a subpool.
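As a fuller illustration, a minimal sketch of how such a subpool could be wired up with the Condor configuration macros of that era; the STARTD_EXPRS publication step and the submit-file attribute are assumptions of this sketch, not taken from the slide:

  # condor_config.local on each machine owned by the group
  GROUP_ID     = "My_Group"
  STARTD_EXPRS = $(STARTD_EXPRS), GROUP_ID        # publish GROUP_ID in the machine ad
  RANK         = (TARGET.GROUP_ID == "My_Group")  # prefer jobs carrying the group tag

  # submit description file of a group member (hypothetical job)
  executable = mc_generate
  +GROUP_ID  = "My_Group"                         # job ad carries the group tag
  queue

Because a startd's RANK expression outranks other requests, group members keep effective priority on their own machines while idle cycles still flow to the rest of the pool.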

Checkpoint Server Domains. The network could be a concern in a computing environment distributed over a WAN. Policy: a job should run in its own checkpoint domain if local resources are available.
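A minimal sketch of a per-site checkpoint-domain setup, assuming the USE_CKPT_SERVER and CKPT_SERVER_HOST knobs of contemporary Condor; the hostname, the CKPT_DOMAIN attribute and the RANK preference are hypothetical illustrations of the stated policy:

  # condor_config.local at one site (hostname hypothetical)
  USE_CKPT_SERVER  = True
  CKPT_SERVER_HOST = ckpt-server.bo.infn.it     # checkpoints stay on the local LAN

  # hypothetical custom attribute to bias matches toward local jobs
  CKPT_DOMAIN  = "bologna"
  STARTD_EXPRS = $(STARTD_EXPRS), CKPT_DOMAIN   # machines advertise their domain
  SUBMIT_EXPRS = $(SUBMIT_EXPRS), CKPT_DOMAIN   # jobs advertise it too
  RANK = (TARGET.CKPT_DOMAIN == "bologna")      # prefer jobs from the same domain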

The INFN-WAN Pool (2000)
  ALPHA/OSF1   111
  INTEL/LINUX   68
  HP/HPUX       10
  SUN/SOLARIS   11
  INTEL/WNT      1
  Total        201

The INFN-WAN Pool (2002)
  ALPHA/OSF1   107
  INTEL/LINUX  122
  SUN/SOLARIS    6
  INTEL/WNT      1
  Total        235

INFN Condor Pool Allocation Time (hours), Feb 1999 – Jan 2000, by month: TOTAL > 36 years of CPU time.

Applications
– Simulation of the CMS detector
– MC event production for CMS
– Simulation of Cherenkov light in the atmosphere (CLUE)
– MC integration in perturbative QCD
– Dynamic chaotic systems
– Extra-solar planet orbits
– Stochastic differential equations
– Maxwell equations

Simulation of Cherenkov light in the atmosphere (CLUE). Without Condor (1 Alpha): 20,000 events/week. With Condor: ~360,000 events in 2 weeks (gain: 9x).

Dynamic chaotic systems. Computations based on complex matrices (multiplication, inversion, determinants, etc.). Very CPU-bound, with little output and no input. Gain with Condor compared to the single Alpha used: 3.5x to 10x.

MC integration in perturbative QCD. CPU-bound; no input, very small output. Gain with Condor: 10x.

Maxwell Equations. 201 jobs, each with a different value of an input parameter. Output: 401 numbers per job. Gain with Condor compared to the single Alpha available: 11x.

People very very very happy!!

The Pool Today. 8 checkpoint servers: Bologna, Milano, Torino, Pavia, Trieste, Padova, LNGS, Napoli. 270 CPUs. 45.5 CPU-years used from January to June 25th -> 91 CPU-years/year.

Why doesn't the pool grow? Why isn't Condor installed on all PCs? Is it difficult to install? Is it difficult to use? Is it difficult to maintain? Do we prefer to buy new machines?

An automatic installation tool. Three types of installation:
– server: binaries and libraries only
– client: configuration files only
– full: client + server
RPM packages are built; a web interface drives the installation.

Server installation. Only binaries and libraries, usually installed on NFS or AFS servers, which export bin and lib to the clients.

Client installation. Installs configuration files using data specified through the web interface; creates startup and shutdown scripts for the Condor daemons; adds the binaries path (from the 'server' installation) to the users' PATH.
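A minimal sketch of what the 'client' installation might write; RELEASE_DIR, LOCAL_DIR and CONDOR_HOST are standard Condor configuration macros, but the paths and hostname here are hypothetical:

  # /etc/condor/condor_config.local, generated from the web-interface data
  RELEASE_DIR = /nfs/condor          # binaries and libraries from the 'server' install
  LOCAL_DIR   = /var/condor          # per-machine log, spool and execute directories
  CONDOR_HOST = condor-cm.infn.it    # hypothetical central manager of the pool

  # line appended to the users' shell profile
  export PATH=$PATH:/nfs/condor/bin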

Full installation. Client + server: the whole Condor distribution and the configuration files on the same machine. NFS and AFS are not required.

Conclusion. The INFN Condor Pool was the first 'pre-grid' wide-area distributed computing system. It is still used by people outside 'big science'.

Conclusion (cont.) BUT: why isn't Condor on every PC? We have not found the answer in 10 years…