High End Computing at Cardiff University: Focus on Campus Grids. James Osborne

Contents
- High End Computing Spectrum
- Facilities at Cardiff
- Condor at Cardiff
- Success Stories
- High End Computing Futures
- Questions

High End Computing Spectrum

The HEC Spectrum, running from loosely coupled HTC to tightly coupled HPC:
- HTC (loosely coupled): campus grids and small clusters, costing of the order of thousands to hundreds of thousands of pounds
- Large clusters and SMP machines, costing hundreds of thousands to millions of pounds
- HPC (tightly coupled): NUMA machines and supercomputers, costing millions of pounds and upwards

The HPC End - HPC, tightly coupled: supercomputers such as IBM's Blue Gene/L, with 131,072 CPUs

The HTC End - HTC, loosely coupled: campus grids, with 600+ CPUs

Facilities at Cardiff

Facilities at Cardiff - Helix: a large cluster with 200 CPUs, owned by PHARM, CHEMY, EARTH and BIOSI

Facilities at Cardiff - SGI: a small cluster, the SGI Origin 300, owned by WeSC

Facilities at Cardiff - Condor: a campus grid with 600+ CPUs, owned by INSRV (Information Services)

Condor at Cardiff

What is Condor?
- Condor is a software system that creates a High-Throughput Computing (HTC) environment
- Condor effectively utilizes the computing power of workstations that communicate over a network
- Condor's power comes from its ability to effectively harness resources under distributed ownership

What is a Condor Pool?
- A pool is a collection of workstations that communicate over a network
- [Architecture diagram] The central manager runs the master, collector, negotiator, schedd and startd daemons; a submit-only machine runs just the master and schedd; execute-only machines run just the master and startd; the arrows in the diagram mark ClassAd communication pathways and spawned processes
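The daemon layout in that diagram is what each machine's Condor configuration has to express. Below is a minimal sketch of the relevant condor_config fragments for the three roles; the central manager host name is hypothetical and all other configuration is omitted.

# Illustrative condor_config fragments only; a real pool carries much more configuration

# Every machine points at the same central manager (hypothetical host name)
CONDOR_HOST = condor.cf.ac.uk

# Central manager: matchmaking daemons plus the ability to submit and execute
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD

# Submit-only workstation: queues jobs but never runs them
DAEMON_LIST = MASTER, SCHEDD

# Execute-only workstation: runs jobs but cannot submit them
DAEMON_LIST = MASTER, STARTD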

What is a Condor Job?
- A command-line Windows executable
- All files in a self-contained directory structure
- Condor runs jobs in a sandbox (..\execute\...)
- Condor runs jobs as the user condor-reuse-vm1
- One or more input files
- One or more output files
- A submit script
- One or more logs, which are useful for debugging
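As a concrete illustration of such a self-contained directory, the job used on the next two slides might be laid out as follows; the directory name is hypothetical and the file names simply match the submit script shown next.

myjob\
  myprog.exe          the command-line executable
  myscript.sub        the submit script
  myin.0 ... myin.99  one input file per queued process
  (Condor writes myout.N, myerr.N and any log files back into this directory as each process finishes)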

What Goes In A Submit Script? Running myprog 100 times:

universe = vanilla
executable = myprog.exe
input = myin.$(PROCESS)
output = myout.$(PROCESS)
error = myerr.$(PROCESS)
queue 100

queue 100 creates processes 0 to 99, and $(PROCESS) expands to each process number, so the first copy reads myin.0 and writes myout.0 and myerr.0.

What Else Can Go In?

root_dir = c:\mydirectory
transfer_files = ALWAYS
transfer_input_files = $(ROOT_DIR)\afile.txt
transfer_output_files = $(ROOT_DIR)\afile.txt
log = mylog.$(PROCESS)
notification = NEVER | ERROR
arguments = -arg1 -arg2

What Else Can Go In? Requirements expressions, for example:

requirements = OpSys == "WINNT51"
requirements = Machine == "hostname.cf.ac.uk"
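Putting the last three slides together, a complete submit script might look like the following sketch. It is illustrative only: the paths, file names and host name come from the examples above, and the two requirements clauses are combined into a single expression with &&, which is how Condor expects a requirements expression to be written.

# Illustrative submit script combining the options from the previous slides
universe = vanilla
executable = myprog.exe
arguments = -arg1 -arg2

# user-defined macro, referenced as $(ROOT_DIR) below
root_dir = c:\mydirectory
transfer_files = ALWAYS
transfer_input_files = $(ROOT_DIR)\afile.txt

input = myin.$(PROCESS)
output = myout.$(PROCESS)
error = myerr.$(PROCESS)
log = mylog.$(PROCESS)

# do not send notification emails
notification = NEVER

# clauses are joined with && into one boolean expression; the Machine clause
# pins the jobs to a single named host, so drop it when you want real throughput
requirements = OpSys == "WINNT51" && Machine == "hostname.cf.ac.uk"

queue 100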

How Do I Submit A Job ?  In the first instance by sending all your files to to allow us to tailor your jobs to our  In time by seeking permission to submit your own jobs to to allow us to enable your workstation as a submit  Currently requires IP address change

How Do I Submit A Job ?  Submitting your job condor_submit myscript.sub  Checking your job’s progress condor_q  Checking the pool condor_status

Terms of Use
Any local researcher can use the campus grid on the proviso that they:
- write a short summary of their research that we can use to publicise their use of the campus grid
- provide references to journal articles and conference proceedings containing appropriate acknowledgements

Success Stories

Prof Tim Wess, OPTOM
Chair of the Non-Crystalline Diffraction Community and Chair of CCP13 for Non-Crystalline Materials
- X-ray diffraction: determining the shape of molecules
- Time on a single workstation = 2-3 days
- Time on the campus grid = 2-3 hours
- Speed-up factor of ~20

Prof Tim Wess
- “This capability provides the final link in the chain that Cardiff has established to solve macromolecular structures”
- “Our involvement with synchrotron sources such as DIAMOND … and the residence of CCP13 … ensures that we are well placed to be in the vanguard of structure determination”

Soyeon Lee, CARBS (Research Student)
- Monte Carlo simulation: 20,000 parameters for 90 different models
- Time on a single workstation = 42 days
- Time on the campus grid = 2 days
- Speed-up factor of ~20

Dr Kevin Ashelford, BIOSI (Research Fellow)
- Distributed search: identifying corrupt records in a DNA database
- Time on a single workstation = 2.4 years
- Time on the campus grid = 2.6 weeks
- Speed-up factor of ~50

Dr Kevin Ashelford: “This is a significant contribution to microbial research and will hopefully be the required impetus for the world-wide research community to improve current methods”

High End Computing Futures

The HEC Spectrum again, with Cardiff's current facilities placed on it: the Cardiff Condor campus grid at the HTC end, the SGI Origin 300 among the small clusters, and Helix among the large clusters.

The HEC Spectrum again, now also showing the planned SRIF 3 system alongside the SGI Origin 300, Helix and the Cardiff Condor campus grid.

Questions?