Leveraging HTC for UK eScience with Very Large Condor Pools: Demand for transforming untapped power into results.
Paul Wilson 1, John Brodholt 1 and Wolfgang Emmerich 2
1. Department of Earth Sciences, University College London, Gower Street, London WC1E 6BT, UK
2. Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
eMinerals: Environment from the Molecular Level, a NERC eScience testbed project

This talk: Part 1
1. The eMinerals problem area
2. The computational job types this generates
3. How Condor can help to sort these jobs out
4. What we gain from Condor and where to go next
5. UK institutional Condor programmes and the road ahead

This talk: Part 2
1. Condor's additional features and how we use them
2. The eMinerals mini grid
3. Conclusion

THE PROBLEM AREA.
1. Simulation of pollutants in the environment: binding of heavy metals and organic molecules in soils.
2. Studies of materials for long-term nuclear waste encapsulation: radioactive waste leaching through ceramic storage media.
3. Studies of weathering and scaling: mineral/water interface simulations, e.g. oil well scaling.

Codes relying on empirical descriptions of interatomic forces:
DL_POLY - molecular dynamics simulations
GULP - lattice energy/lattice dynamics simulations
METADISE - interface simulations

Codes using a quantum mechanical description of the interactions between atoms:
CRYSTAL - Hartree-Fock implementation
SIESTA - density functional theory, with numerical basis sets to describe the electronic wave function
ABINIT - DFT, with plane-wave descriptions of the electronic wave functions

WHAT TYPE OF JOBS WILL THESE PROBLEMS MANIFEST AS?

2 TYPES OF JOB:
1) High to mid performance: requiring powerful resources, potential process intercommunication and long execution times; CPU and memory intensive.
2) Low performance/high throughput: requiring access to many hundreds or thousands of PC-level CPUs; no process intercommunication, short execution times, low memory usage.

WHERE CAN WE GET THE POWER?
TYPE 1 JOB: masses of UK HPC resources around; it seems that UK grid resources are largely HPC!
TYPE 2 JOB: ????????

THERE HAS GOT TO BE A BETTER WAY TO OPTIMISE TYPE 2 JOBS!

…AND THERE IS: WE USE WHAT'S ALREADY THERE. 930 Win2K PCs (1 GHz P3, 256/512 MB RAM, 1 Gbit Ethernet) clustered in 30 student cluster rooms across every department on the UCL campus, with the potential to scale up to ~3000 PCs. These machines waste 95% of their CPU cycles 24/7: a massive untapped resource, and a coup for eMinerals!

This is where Condor enters the scene, as the only freely available, off-the-shelf resource manager and job broker for Windows: install Condor on our clusters and we harness 95% of the power of 930+ machines 24 hours a day, without spending any money.

Is it really this simple?

YES! It has surpassed all expectations, with diverse current use and ever-rising demand from smiley, happy people (our current group of users, increasing monthly): the eMinerals project, the eMaterials project, UCL Computer Science, the UCL medical school, the University of Marburg, the Universities of Bath and Cambridge, Birkbeck College, The Royal Institution…

- Over 1,000,000 hours of work completed in 6 months (105 CPU-years equivalent and counting).
- Codes migrated to Windows represent a huge variety: environmental molecular work (all the eMinerals codes!), materials polymorph prediction, financial derivatives research, quantum mechanical codes, climatic research, medical image realisation…

NUMBER 1 METRIC FOR SUCCESS: users love it. It is simple to use, it doesn't break, and they can forget about their jobs.
NUMBER 2 METRIC FOR SUCCESS: UCL admin love it. 100% utilisation 24/7 on the entire cluster network, with no drop in performance and negligible costs, satisfies our dyed-in-the-wool, naturally paranoid sys admins.
NUMBER 3 METRIC FOR SUCCESS: eMinerals developers love it. Fast deployment, tweakable, can be built on top of, low admin, integrable with Globus, great metadata, great free support, great workflow capabilities, Condor-G.
NUMBER 4 METRIC FOR SUCCESS: eScience loves it. Other institutions are following our example, and interest is high.
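To make "simple to use" concrete: a type 2 run of the kind described earlier boils down to a short submit description file handed to condor_submit. The sketch below is illustrative only; the executable name, input file naming and 500-job sweep are assumptions rather than an actual eMinerals workload.

    # Hypothetical high-throughput sweep: 500 independent GULP-style runs
    universe                = vanilla
    executable              = gulp.exe
    arguments               = input_$(Process).dat
    transfer_input_files    = input_$(Process).dat
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output                  = run_$(Process).out
    error                   = run_$(Process).err
    log                     = sweep.log
    # Match only the Windows execute nodes (the OpSys string depends on the
    # Windows and Condor versions in use)
    requirements            = (OpSys == "WINNT50") && (Arch == "INTEL")
    queue 500

One condor_submit of this file queues 500 independent jobs; Condor matches each to an idle machine, ships the input, runs the code and returns the output files, with no further attention from the user.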

- This is the largest single Condor pool in the UK (according to Condor).
- This is the first fully cross-departmental institutional Condor pool in the UK.
- Several other institutions have followed our lead: Cambridge, Cardiff.
- Much scope for combining resources (flocking, glide-in).

WHAT IS MOST IMPORTANT? Condor ENABLES any scientist to do their work in a way they previously only dreamed about, beginning to make real the ability to match unbounded science with unbounded resources. Condor has slashed time-to-results from years to weeks; scientists using our Condor resource have redefined their ability to achieve their goals.

Condor has organised resources at many levels:
Desktop - June 2002 (2 nodes)
Cluster - Sept 2002 (18 nodes)
Department - Jan 2003 (150 nodes)
Campus - October 16th 2003 (930 nodes)

WHERE NEXT (?????? nodes, ???? pools)? One million Condor nodes in a hollowed-out volcano! Mwahahaha… …Regional and national Condor resources are next…
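Combining resources does not have to wait for new hardware: Condor flocking links existing pools with a couple of configuration entries on each side. A minimal sketch, with hypothetical host names standing in for the real machines:

    # On our pool's submit (schedd) machines: send jobs that cannot be matched
    # locally to a partner pool's central manager
    FLOCK_TO = cm.partner-pool.example.ac.uk

    # On the partner pool's central manager: accept flocked jobs from our submit host
    FLOCK_FROM = submit.ucl-pool.example.ac.uk
    # ...and the partner's machines must also grant that host write access
    HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), submit.ucl-pool.example.ac.uk

Glide-in works in the opposite direction, temporarily adding remote (for example Globus-managed) machines to the local pool, so in either case users keep a single point of submission.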

…Regional and national Condor resources, continued: many UK institutions have small or medium Condor pools, and some (Southampton, Imperial, Cardiff, Cambridge) have large and expanding pools. Many UK institutions have resources wasting millions of CPU cycles, and we have proved the usefulness of large Windows Condor resources.

Assurances regarding security, authorisation, authentication, access and reliable job execution are essential to the take-up of Condor on this scale in the UK. Many potential resources are Windows, which complicates matters (for example, the poor GSI port to Windows and the lack of Windows checkpointing).

With education, awareness, support and a core group to lead the way, UK institutions can form a national-level Condor infrastructure leveraging HTC resources for scientists within UK eScience.

It hasn't all been plain sailing though…

Issues with Very Large Condor Installations.
Political: the biggest problem. Resistance to change; ownership.
Technical: usually surmountable. Networks, deployment, admin, load.
Policy: changes to I.S. usage. New usage; which is the primary use?
Security: trust-based or certificate-based. Trust is easy and works; certificates are a pain.
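Much of the political and policy resistance is answered by the fact that the machine's primary use always stays primary: Condor only runs when a machine is idle and backs off the moment it is wanted. A sketch of that kind of condor_config start/suspend policy follows; the macro names are standard Condor ones, while the thresholds are illustrative assumptions rather than UCL's actual settings.

    # Start a job only after 15 minutes of console idleness on a lightly loaded machine
    START    = (KeyboardIdle > 15 * 60) && (LoadAvg < 0.3)
    # Suspend as soon as a student touches the keyboard or mouse
    SUSPEND  = (KeyboardIdle < 60)
    CONTINUE = (KeyboardIdle > 15 * 60)
    # Evict jobs that have stayed suspended for more than 10 minutes; with no
    # checkpointing on Windows they simply restart on another machine
    PREEMPT  = (Activity == "Suspended") && ((CurrentTime - EnteredCurrentActivity) > 600)

The owner of the machine always wins; Condor only consumes cycles that would otherwise go to waste, which is what makes the "which is the primary use?" question answerable.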

5) The latest from the Condor pool…

2) Latest UK Condor research: FC-UK… UCL, Cambridge and the Condor team at Wisconsin-Madison: a Microsoft-funded (50%) one-year project to develop web-services-based Condor scheduler and administrative interfaces on the eMinerals mini-grid, using Microsoft .NET. This may extend into WS-RF (the grid standard?) if it appears. This is a fully integrated Condor project, and will form part of future releases.

Who? Me, Clovis and Wolfgang Emmerich (UCL); Martin Dove and Mark Calleja (Cambridge); Miron Livny, Todd Tannenbaum and Matt Farrellee (Condor); and all you prolific users!
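For orientation, the stock web-service route this work builds on is Condor's SOAP interface (Birdbath), switched on through condor_config. A minimal sketch, assuming a Birdbath-capable Condor release, is shown below; the host name is a placeholder, and the FC-UK interfaces themselves are separate development work, not shown here.

    # Expose the daemons' SOAP endpoints (Condor's Birdbath web-service interface)
    ENABLE_SOAP = TRUE
    # Limit which hosts may invoke SOAP operations (placeholder host name)
    ALLOW_SOAP  = portal.example.ucl.ac.uk
    # A client generated from the schedd's WSDL (e.g. a .NET proxy class) can then
    # submit and query jobs over HTTP on the schedd's normal command port.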

3) Where next, given the lack of volcanoes? UK eScience to lead in Condor-based HTC. Here's the idea…
1. UCL to host the UK Condor download mirror (imminent).
2. A UK Condor support network working through the new Grid Operations Centre (discussions with the UK Grid Exec and GOC are current).
3. A UK Condor working group to develop a National HTC Condor Service, and to formalise long-term Condor integration across the UK.
4. UCL to integrate WS Condor into existing infrastructure: more choice…
5. UCL kicked this all off by proposing and co-leading the inaugural UK Condor Week 2004…

4) UK Condor Week. Jolly exciting it is too. October 11th to 15th 2004, National eScience Centre, Edinburgh. For anyone with an interest in Condor, creating HTC resources and the future of UK eScience: project members, leaders, scientists, institutional I.S. leaders and administrators, eScience decision makers and leaders.

Fully endorsed and encouraged by the Condor team, who will attend along with Miron Livny (Condor godfather and a top bloke) and give two days of tutorials, hands-on sessions, Q&A and demos of new technology. Three days will be discussions, breakout sessions etc., with the aim of formalising a Condor/HTC roadmap for the short and near term for the UK, and agreeing on a group of people to actually do the work. See for details.

…AND FINALLY, THE MILLION DOLLAR QUESTION: when was the millionth recorded hour of work completed?
DATE: April 2nd 2004…
HOUR: ~09:03 AM…
JOB: …
JOB LENGTH: 23 hrs 41 minutes…
WHO GETS THE GLORY? DR SAM FRENCH, eMaterials Project, The Royal Institution. A.K.A. 'The Poolmeister'

Summary. Condor has enabled eMinerals scientists and their UK colleagues to perform their science:
1. in significantly new ways,
2. on previously untapped resources,
3. on previously unutilised operating systems,
4. in weeks rather than years,
5. in an integrated, heterogeneous, grid-enabled environment,
6. easily, painlessly and at no cost,
7. with equal importance given to data handling,
8. using out-of-the-box tools.

Conclusion: THIS MUST CONTINUE! Condor has an important part to play in the UK eScience programme:
1. through meeting the increasing demand from users for large-scale, accessible Condor-enabled HTC resources,
2. through harnessing the significant volumes of existing, under-utilised, heterogeneous UK institutional hardware,
3. through providing functionality for secure access to heterogeneous compute and data resources,
4. through engaging with the UK eScience programme on Condor's grid/web-service and standardisation developments.

Elvis from the Molecular Level: a NERC eScience testbed project. Uhhh, thankyouverymuch. You're beautiful. The eMinerals project.