Condor and the NGS
John Kewley, NGS Support Centre Manager
NGS Innovation Forum, Manchester, 4th November 2008

Outline
– What is High Throughput Computing?
– What is Condor?
– Condor and the NGS

HPC vs. HTC
– HPC (High Performance Computing): large amounts of [simultaneous] computing power for comparatively short periods of time
– HTC (High Throughput Computing): large amounts of computing over significantly longer periods, not necessarily all at the same time

Various job types
– Parallel, Serial, Sequential
– Master-worker
– Embarrassingly Parallel
– Parameter Sweep / Parameter Search / Parameter Studies
– Monte Carlo
– MPI, PVM, OpenMP
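A parameter sweep is the simplest of these: one independent job per parameter value. As a sketch (the program name `sim` and the `--alpha` parameter are made up for illustration), a few lines of Python can generate a Condor submit description that queues the whole sweep:

```python
# Generate a Condor submit description for a parameter sweep:
# one independent job per parameter value (hypothetical program "sim").

def sweep_submit(executable, params):
    """Return submit-file text queueing one vanilla-universe job per value."""
    lines = [
        f"executable = {executable}",
        "universe = vanilla",
        "output = out.$(Process)",   # per-job stdout, numbered by Condor
        "error  = err.$(Process)",
        "log    = sweep.log",
    ]
    for p in params:
        lines.append(f"arguments = --alpha {p}")
        lines.append("queue")       # each "queue" adds one job to the cluster
    return "\n".join(lines) + "\n"

print(sweep_submit("sim", [0.1, 0.2, 0.5]))
```

Because the jobs never communicate, losing one machine costs only that machine's job, which is exactly the property Condor's opportunistic scavenging relies on.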

Terminology
Parallel:
– Tightly-coupled processes
– Need synchronisation
– Information sharing via message passing or shared memory
– If 1 process fails, the whole job fails
– Single large homogeneous resource; processors used simultaneously
Independent:
– Unordered (so not serial/sequential)
– Nothing embarrassing about it
– No communication once a job starts
– Might not need all results
– Could run on different machines with different operating systems

What is Condor?
– A job submission framework which utilises spare computing power
– Works within a heterogeneous computer network: desktop PCs, Linux workstations, servers, clusters and teaching-lab resources can all be included in the Condor pool
– Uses matchmaking to connect jobs with resources
– Supports High Throughput Computing (HTC)
– Developed over the past 20 years at the University of Wisconsin in Madison
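A job is described in a plain-text submit description file and handed to condor_submit; matchmaking then finds a machine whose advertised attributes satisfy the job's requirements. A minimal sketch (the executable name `myprog` is a placeholder, and the OpSys value is an assumption for Windows XP nodes of that era):

```
# Minimal vanilla-universe submit description (submit with: condor_submit job.sub)
executable = myprog
universe   = vanilla
arguments  = input.dat
output     = myprog.out
error      = myprog.err
log        = myprog.log
# Match only Windows XP execute nodes (e.g. a pool like Cardiff's)
requirements = (OpSys == "WINNT51")
queue
```

The `requirements` expression is the user's half of the matchmaking: the execute nodes advertise ClassAds, and the negotiator pairs jobs with machines whose ads satisfy each other.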

Useful Features
– Automatic resubmission when jobs fail
– Ability to cluster groups of jobs
– Checkpointing / migration
– DAGMan: Directed Acyclic Graph / workflow manager
– Integration with Grid resources, especially through Condor-G
– Staging and retrieval of data
– Glide-in: dynamically add Grid worker nodes to your Condor pool
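DAGMan workflows are themselves described in a small plain-text file that names ordinary submit files and their ordering. A two-stage sketch (the submit-file names `prepare.sub` and `analyse.sub` are placeholders):

```
# Two-node DAG: B runs only after A completes successfully
# (submit with: condor_submit_dag workflow.dag)
JOB  A  prepare.sub
JOB  B  analyse.sub
PARENT A CHILD B
# Resubmit node A up to 3 times if it fails
RETRY A 3
```

DAGMan itself runs as a Condor job, so a workflow survives submit-machine restarts and can be resumed from a rescue DAG after partial failure.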

The NGS and Cardiff
– NGS Partner site since April 05
– First resource was a 32-processor SGI cluster (Apr 05)
– Second resource was the Condor pool (Jun 07):
  – Over 1000 Windows XP workstations
  – Mixture of P4s (80%) and C2Ds (20%)
  – Capped at 200 jobs running concurrently
  – Used by 10 different numbered accounts
[Diagram: Central Manager, Submit Nodes, Execute Nodes]

Other Condor on the NGS
– Bristol: ~50 Windows XP machines in a Condor pool fronted by a Linux server
– Reading: ~400 Linux (CoLinux under Windows XP)

What is Condor-G?
[Diagram: condor_submit places jobs (Job 1, Job 2, …) in the queue on a Submit Node; they travel across the Internet, through the remote site's firewall, to a Head Node running Globus, whose batch system dispatches them to Execute Nodes.]
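From the user's side, Condor-G keeps the familiar submit-file interface but routes the job to a remote Globus gatekeeper instead of the local pool. A sketch (the gatekeeper hostname is illustrative; `gt2 …/jobmanager-pbs` is the pre-web-services Globus syntax):

```
# Condor-G: send the job to a remote GT2 gatekeeper fronting a PBS batch system
executable    = myprog
universe      = grid
grid_resource = gt2 grid-compute.example.ac.uk/jobmanager-pbs
output        = myprog.out
error         = myprog.err
log           = myprog.log
queue
```

Condor-G then handles credential delegation, file staging and resubmission on the user's behalf, which is why several of the campus-grid set-ups below use it as their front door to NGS and NW-Grid resources.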

OxGrid: Overview
[Diagram: the Oxford e-Research Centre hosts a Resource Broker/Login node (Condor), SRB storage, a Microsoft cluster, a National Grid Service cluster, and core services (BDII, VOMS, SSO, CA, ...); these link out to a supercomputing centre, departmental/college Condor pools and clusters, and Condor pools and National Grid Service resources at other universities/institutions.]

[Diagram of the Liverpool set-up: users log in to a Condor-G submit host and a Condor-G portal, which use Condor ClassAds, a MyProxy server and Globus file staging, coordinated by a Condor-G central manager, to reach the NW-GRID cluster (ulgbc3), the NW-GRID/POL cluster (ulgp4), the CSD AMD cluster (ulgbc1) and the CSD-Physics cluster (ulgbc2).]

University of Manchester, Research Computing Services
– 100 cores (an additional 400 in a 2nd pool)
– Condor used as backfill for the SGE queues
– IP-tunnelling used to enable connection to the NW-Grid backend nodes from Condor (rather than the provided GCB, the Generic Connection Broker)

Novel Architecture!?
– Condor itself is not that new
– Some NGS users request Windows resources, but most previous NGS nodes used PBS, LSF or SGE on Linux
– Campus Grids are being developed to harness all available processing power (incl. teaching pools, servers and clusters)
– Condor can help the NGS provide access to Windows resources

Windows on the NGS
– Many users are looking for Windows resources on which to run their computations.
– As well as the resources provided by Cardiff, Bristol and Reading, Southampton have made available a group of 100 processors running under Windows Compute Cluster Server.

Other work
– Jean-Alain Grunchec of the University of Edinburgh is trying Condor Glidein to add NGS resources to his Condor pool
– The e-Minerals project utilised a Condor submission mechanism to submit jobs to both local Condor pools and Grid resources such as NGS and NW-Grid
– Both the EGEE resource broker (being trialled by NGS) and the GridWay metascheduler are based on Condor technologies
– STFC Daresbury Laboratory (another NW-Grid site) is collaborating with the Cockcroft Centre in setting up a Campus Grid using NW-Grid and Condor resources

Summary
– Condor pools can be part of the NGS
– Condor can be used in many ways with the NGS
– Condor is being combined with the NGS in many Campus Grids
– Condor can help the NGS provide access to Windows resources
– Information on NGS resources can be found on

Acknowledgements
– Some slides are based on material from the University of Wisconsin-Madison Condor team.
– Some of the slides describing the UK university Condor work are based on ones those sites produced themselves (I hope nothing was "lost in translation"!).