Welcome to HTCondor Week #16 (year 31 of our project)


CHTC Team 2014

Driven by the potential of Distributed Computing to advance Scientific Discovery

Claims for “benefits” provided by Distributed Processing Systems:
– High Availability and Reliability
– High System Performance
– Ease of Modular and Incremental Growth
– Automatic Load and Resource Sharing
– Good Response to Temporary Overloads
– Easy Expansion in Capacity and/or Function
P.H. Enslow, “What is a Distributed Data Processing System?”, Computer, January 1978

We are driven by Principles (¬ Hype)

Definitional Criteria for a Distributed Processing System:
– Multiplicity of resources
– Component interconnection
– Unity of control
– System transparency
– Component autonomy
P.H. Enslow and T. G. Saponas, “Distributed and Decentralized Control in Fully Distributed Processing Systems”, Technical Report, 1981

Services (2) Batch:
– SLC6 migration: SLC5 CEs decommissioned, no grid job submission to SLC5; final migration of the SLC5 WNs ongoing
– Batch system migration from LSF to HTCondor:
  – Goals: scalability, dynamism, dispatch rate, query scaling
  – Replacement candidates: SLURM feels too young; HTCondor mature and promising; Son of Grid Engine fast, a bit rough
  – More details of the selection process: /22/material/slides/0.pdf

[Timeline: Enslow’s DPS paper → My PhD → Condor Deployed]

D. H. J. Epema, Miron Livny, R. van Dantzig, X. Evers, and Jim Pruyne, "A Worldwide Flock of Condors: Load Sharing among Workstation Clusters", Journal on Future Generations of Computer Systems, Volume 12.
[Map: a worldwide flock of Condors, showing linked pools and their sizes in Madison, Delft, Amsterdam, Geneva, Warsaw, and Dubna/Berlin.]

In 1996 I introduced the distinction between High Performance Computing (HPC) and High Throughput Computing (HTC) in a seminar at the NASA Goddard Space Flight Center, and a month later at the European Laboratory for Particle Physics (CERN). In June of 1997 HPCWire published an interview on High Throughput Computing.

High Throughput Computing is a 24-7-365 activity and therefore requires automation. FLOPY ≠ (60*60*24*7*52)*FLOPS
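Spelled out, the constant in that product is the number of seconds in a year: 60*60*24*7*52 = 31,449,600. A machine that truly sustained F floating point operations per second around the clock would deliver about 3.1*10^7 * F operations in a year; the inequality is the point: idle time, failures, and manual job handling keep the operations actually completed per year (FLOPY) well below that bound unless the whole pipeline is automated.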

Step IV - Think big!
› Get access (account(s) + certificate(s)) to Globus managed Grid resources
› Submit 599 “To Globus” Condor glide-in jobs to your personal Condor
› When all your jobs are done, remove any pending glide-in jobs
› Take the rest of the afternoon off...

A “To-Globus” glide-in job will...
› ... transform itself into a Globus job,
› submit itself to a Globus managed Grid resource,
› be monitored by your personal Condor,
› once the Globus job is allocated a resource, use a GSIFTP server to fetch Condor agents, start them, and add the resource to your personal Condor,
› vacate the resource before it is revoked by the remote scheduler.
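To make the glide-in mechanism above concrete, here is a minimal sketch of a grid-universe (“Condor-G”) submit description for the 599 glide-in jobs; the gatekeeper address and the glidein_startup.sh script are hypothetical placeholders, not names from this talk:

  # Hypothetical Condor-G submit description for the glide-in jobs.
  # gatekeeper.example.edu and glidein_startup.sh are placeholder names.
  universe      = grid
  grid_resource = gt2 gatekeeper.example.edu/jobmanager-fork
  # The startup script fetches the Condor agents (e.g. from a GSIFTP
  # server), starts them, and joins the resource to your personal Condor.
  executable    = glidein_startup.sh
  output        = glidein_$(Cluster).$(Process).out
  error         = glidein_$(Cluster).$(Process).err
  log           = glidein.log
  queue 599

Once the remote agents report in, your personal Condor schedules work onto them like any local machine, and removing pending glide-ins is an ordinary condor_rm on this cluster.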

THE INFN GRID PROJECT
Scope: study and develop a general INFN computing infrastructure, based on GRID technologies, to be validated (as a first use case) by implementing distributed Regional Center prototypes for the LHC experiments ATLAS, CMS and ALICE and, later on, also for other INFN experiments (Virgo, Gran Sasso...).
Project status:
– Outline of proposal submitted to INFN management
– 3 year duration
– Next meeting with INFN management on the 18th of February
– Feedback documents from the LHC experiments by end of February (sites, FTEs...)
– Final proposal to INFN by end of March

INFN & “Grid Related Projects”
– Globus tests
– “Condor on WAN” as a general purpose computing resource
– “GRID” working group to analyze viable and useful solutions (LHC computing, Virgo...): a global architecture that allows strategies for the discovery, allocation, reservation and management of resource collections
– MONARC project related activities

[Map: GARR-B topology, the 155 Mbps ATM based research network, with its access points (PoPs) and main transport nodes across Italy, overlaid with the INFN Condor pool on WAN and its checkpoint (CKPT) domains; the central manager is at CNAF, with roughly 25 checkpoint servers spread over the sites and a link to the USA via ESnet.]

The Open Science Grid (OSG) was established on 7/20/2005

The OSG is …

… a consortium of science communities, campuses, resource providers and technology developers that is governed by a council. The members of the OSG consortium are united in a commitment to promote the adoption and to advance the state of the art of distributed high throughput computing (dHTC).

OSG adopted the HTCondor principle of Submit Locally and Run Globally

Today, HTCondor manages the execution of more than 600K pilot jobs a day on the OSG, which delivers more than 800M core hours annually

Jack of all trades, master of all? HTCondor is used by the OSG:
– As a site batch system (HTCondor)
– As a pilot job manager (Condor-G)
– As a site “gate keeper” (HTCondor-CE)
– As an overlay batch system (HTCondor)
– As a cloud batch system (HTCondor)
– As a cross site/VO sharing system (Flocking)
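For the last item, flocking between pools is wired up with a pair of configuration knobs, one on each side; a minimal sketch with placeholder host names (a real deployment also needs the matching ALLOW_* security settings):

  # On a submit machine in pool A: central managers jobs may overflow to.
  FLOCK_TO = cm.pool-b.example.edu
  # On the central manager of pool B: submit machines allowed to flock in.
  FLOCK_FROM = submit.pool-a.example.edu

Jobs still enter through the local schedd and only spill over when they cannot be matched locally, which is exactly the Submit Locally and Run Globally principle above.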

Perspectives on Grid Computing
Uwe Schwiegelshohn, Rosa M. Badia, Marian Bubak, Marco Danelutto, Schahram Dustdar, Fabrizio Gagliardi, Alfred Geiger, Ladislav Hluchy, Dieter Kranzlmüller, Erwin Laure, Thierry Priol, Alexander Reinefeld, Michael Resch, Andreas Reuter, Otto Rienhoff, Thomas Rüter, Peter Sloot, Domenico Talia, Klaus Ullmann, Ramin Yahyapour, Gabriele von Voigt
“We should not waste our time in redefining terms or key technologies: clusters, Grids, Clouds... What is in a name? Ian Foster recently quoted Miron Livny saying: ‘I was doing Cloud computing way before people called it Grid computing’, referring to the groundbreaking Condor technology. It is the Grid scientific paradigm that counts!”

Thank you for building such a wonderful HTC community