Issues with Production Grids
Tony Hey, Director, UK e-Science Core Programme

The ‘Grid’ is a set of core middleware services running on top of high-performance global networks to support research and innovation.

Grids of Grids of Simple Services
Overlay and compose Grids of Grids:
– Functional Grids: methods and services
– Compute Resource Grids: CPUs, clusters, MPPs
– Data Resource Grids: databases, federated databases, sensors, sensor nets

NGS “Today”
Projects: e-Minerals, e-Materials, Orbital Dynamics of Galaxies, Bioinformatics (using BLAST), GEODISE project, UKQCD singlet meson project, census data analysis, MIAKT project, e-HTPX project, RealityGrid (chemistry)
Users: Leeds, Oxford, UCL, Cardiff, Southampton, Imperial, Liverpool, Sheffield, Cambridge, Edinburgh, QUB, BBSRC, CCLRC
Interfaces: OGSI::Lite

NGS Hardware

Compute Cluster
– 64 dual-CPU Intel 3.06 GHz (1 MB cache) nodes, 2 GB memory per node
– 2x 120 GB IDE disks (1 boot, 1 data)
– Gigabit network; Myrinet M3F-PCIXD-2
– Front end (as node)
– Disk server (as node) with 2x Infortrend 2.1 TB U16U SCSI arrays (UltraStar 146Z10 disks)
– PGI compilers; Intel compilers, MKL
– PBSPro; TotalView debugger
– RedHat ES 3.0

Data Cluster
– 20 dual-CPU Intel 3.06 GHz nodes, 4 GB memory per node
– 2x 120 GB IDE disks (1 boot, 1 data)
– Gigabit network; Myrinet M3F-PCIXD-2
– Front end (as node)
– 18 TB Fibre SAN (Infortrend F16F 4.1 TB Fibre arrays, UltraStar 146Z10 disks)
– PGI compilers; Intel compilers, MKL
– PBSPro; TotalView debugger
– Oracle 9i RAC; Oracle Application Server
– RedHat ES 3.0
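
To make the compute environment above concrete, here is a rough sketch of how a user might drive the PBSPro batch system named in the list from Python. The resource request, walltime and executable name are assumptions made for illustration, not an NGS-documented submission recipe.

# Hypothetical sketch: submitting a job to a PBS-family batch system from Python
# by writing a job script and calling qsub. The resource request and the MD
# executable below are illustrative placeholders, not NGS documentation.
import subprocess
import tempfile

JOB_SCRIPT = """#!/bin/bash
#PBS -l nodes=4:ppn=2
#PBS -l walltime=02:00:00
#PBS -N md_test
cd $PBS_O_WORKDIR
mpirun -np 8 ./my_md_code input.conf
"""

with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
    f.write(JOB_SCRIPT)
    script_path = f.name

# qsub prints the identifier of the newly queued job on success
submitted = subprocess.run(["qsub", script_path], capture_output=True,
                           text=True, check=True)
print("submitted job", submitted.stdout.strip())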

NGS Software

RealityGrid AHM Experiment
Measuring protein-peptide binding energies: ΔG_bind is vital for, e.g., understanding fundamental physical processes at play at the molecular level and for designing new drugs. Computing a peptide-protein binding energy traditionally takes weeks to months. We have developed a grid-based method to accelerate this process. We computed ΔG_bind during the UK AHM, i.e. in less than 48 hours.
[Figure: Src SH2 domain and its peptide ligand]
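
For reference, the binding free energy referred to above is, in the standard thermodynamic sense, the free-energy difference between the bound complex and its separated components; the slide does not state the exact computational protocol, so this is only the defining relation:

\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - \left( G_{\mathrm{protein}} + G_{\mathrm{peptide}} \right)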

Experiment Details
A Grid-based approach, using the RealityGrid steering library, enables us to launch, monitor, checkpoint and spawn multiple simulations. Each simulation is a parallel molecular dynamics simulation running on a supercomputer-class machine. At any given instant we had up to nine simulations in progress (over 140 processors) on machines at 5 different sites, e.g. 1x TG-SDSC, 3x TG-NCSA, 3x NGS-Oxford, 1x NGS-Leeds, 1x NGS-RAL.
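
The launch/monitor/checkpoint/spawn pattern described above can be pictured as a simple orchestration loop. The Python below is a hypothetical illustration only; it does not use the real RealityGrid steering library API, and the Simulation class, step counts and polling behaviour are placeholders.

# Hypothetical sketch of steering a pool of simulations across several sites:
# launch up to a fixed number at once, poll progress, record checkpoints, and
# retire finished runs. All site names and job behaviour are stand-ins.
import random
import time
from dataclasses import dataclass, field

SITES = ["TG-SDSC", "TG-NCSA", "NGS-Oxford", "NGS-Leeds", "NGS-RAL"]
MAX_CONCURRENT = 9  # up to nine simulations in progress at any instant

@dataclass
class Simulation:
    site: str
    steps_done: int = 0
    steps_total: int = 1000
    checkpoints: list = field(default_factory=list)

    def advance(self):                      # stand-in for polling a remote MD job
        self.steps_done += random.randint(10, 50)

    def finished(self):
        return self.steps_done >= self.steps_total

def run_campaign(n_sims=26):
    launched, active, done = 0, [], []
    while launched < n_sims or active:
        # launch new simulations while capacity is available
        while launched < n_sims and len(active) < MAX_CONCURRENT:
            active.append(Simulation(site=SITES[launched % len(SITES)]))
            launched += 1
        # monitor and checkpoint running simulations
        for sim in active:
            sim.advance()
            sim.checkpoints.append(sim.steps_done)   # periodic checkpoint
        # retire finished simulations (a real steerer could also spawn
        # new runs from a stored checkpoint at this point)
        done += [s for s in active if s.finished()]
        active = [s for s in active if not s.finished()]
        time.sleep(0.01)
    return done

if __name__ == "__main__":
    print(f"completed {len(run_campaign())} simulations")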

Experiment Details (2)
In all, 26 simulations were run over 48 hours, simulating over 6.8 ns of classical molecular dynamics in this time. Real-time visualization and off-line analysis required bringing back data from simulations in progress; we used UKLight between UCL and the TeraGrid machines (SDSC, NCSA).

The e-Infrastructure
[Network diagram: computation sites Leeds, Oxford, RAL (UK NGS), SDSC, NCSA, PSC (US TeraGrid), UCL and Manchester; UKLight connectivity via network PoPs at Starlight (Chicago) and Netherlight (Amsterdam); a service registry; steering clients at AHM 2004 on local laptops and a Manchester vncserver. All sites connected by the production network (not all shown).]

The scientific results …
Some simulations require extending, and more sophisticated analysis needs to be performed.

… and the problems
– Restarted the GridService container Wednesday evening
– Numerous quota and permission issues, especially at TG-SDSC
– NGS-Oxford was unreachable from Wednesday evening to Thursday morning
– The steerer and launcher occasionally failed
– We were unable to checkpoint two simulations
– The batch queuing systems occasionally did not like our simulations
– 5 simulations died of natural causes
Overall, up to six people were working on this calculation to solve these problems.

NGS “Tomorrow”
– Grid Operations Support Centre
– Web Services-based National Grid Infrastructure

‘WS-I+’ profile (Web Service Grids: An Evolutionary Approach to WSRF)
– WS-I standards that have broad industry support and multiple interoperable implementations
– Specifications that are emerging from the standardisation process and are recognised as being ‘useful’
– Specifications that have entered or will enter a standardisation process but are not yet stable and are still experimental

OMII Vision
– To be the national provider of reliable, interoperable, open-source grid middleware
– Provide a one-stop portal and software repository for grid middleware
– Provide quality-assured software engineering, testing, packaging and maintenance for our products
– Lead the evolution of Grid middleware through a managed programme and wide-reaching collaboration with industry

OMII Distribution 1 Oct 2004
– A collection of tested, documented and integrated software components for Web Service Grids
– A base built from off-the-shelf Web Services technology
– A package of extensions that can be enabled as required
– An initial set of Web Services for building file-compute collaborative grids
– A technical preview of the Web Service version of the OGSA-DAI database middleware
– Sample applications

OMII future distributions
Include the services in previous distributions, plus:
– OMII managed programme contributions: database service, workflow service, registry service, reliable messaging service, notification service
– Interoperability with other grids

Why Workflows and Services?
Workflow = a general technique for describing and enacting a process
Workflow = describes what you want to do, not how you want to do it
Web Service = how you want to do it
Web Service = automated programmatic internet access to applications
Automation
– Capturing processes in an explicit manner
– Tedium! Computers don’t get bored/distracted/hungry/impatient
– Saves repeated time and effort
– Modification, maintenance, substitution and personalisation
– Easy to share, explain, relocate, reuse and build on
– Available to a wider audience: you don’t need to be a coder, just need to know how to do Bioinformatics
– Releases scientists/bioinformaticians to do other work
Record
– Provenance: what the data is like, where it came from, its quality
– Management of data (LSID: Life Science IDentifiers)
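
To make the “describe what you want to do, not how” distinction concrete, here is a minimal, self-contained Python sketch: the workflow is plain data (named steps and their inputs) and a small generic engine decides how to execute it. The step functions are toy placeholders, not real bioinformatics services.

# The "what": an ordered description of steps, expressed as data; the "how"
# lives only in the generic run() engine below. Step functions are toy stand-ins.
def fetch_sequence(accession):
    return ">" + accession + "\nATGCATGCATGC"      # placeholder sequence record

def blast(sequence):
    return [("hit-1", 1e-30), ("hit-2", 1e-5)]     # placeholder BLAST hits

WORKFLOW = [
    {"name": "get_seq", "task": fetch_sequence, "inputs": ["accession"]},
    {"name": "search",  "task": blast,          "inputs": ["get_seq"]},
]

def run(workflow, **initial_inputs):
    # a generic engine: wires each step's named inputs to earlier outputs
    results = dict(initial_inputs)
    for step in workflow:
        args = [results[name] for name in step["inputs"]]
        results[step["name"]] = step["task"](*args)
    return results

print(run(WORKFLOW, accession="AB012345")["search"])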

Workflow Components
– Scufl: Simple Conceptual Unified Flow Language
– Taverna: writing and running workflows and examining results
– SOAPLAB: makes applications available (as Web Services)
– Freefluo: workflow engine to run workflows
[Diagram: Freefluo invokes a SOAPLAB Web Service wrapping any application, and external Web Services such as DDBJ BLAST]
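
As an aside on what automated programmatic access to such services can look like in practice, the fragment below uses the zeep SOAP client from Python; the WSDL URL and operation names are hypothetical placeholders, not the real SOAPLAB or DDBJ BLAST interfaces.

# Hypothetical sketch of calling a SOAP service from Python with zeep.
# The WSDL URL and operation names below are placeholders, not real endpoints.
from zeep import Client

client = Client("http://example.org/soaplab/ddbj_blast?wsdl")    # placeholder WSDL
job_id = client.service.createAndRun(query=">seq1\nMKTAYIAKQR")  # placeholder operation
report = client.service.getResults(job_id)                       # placeholder operation
print(report)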

The Williams Workflows
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence

The Workflow Experience
Correct and biologically meaningful results
Automation
– Saved time, increased productivity
– Process split into three; you still require humans!
Sharing
– Other people have used the workflows and want to develop them
Change of work practices
– Post hoc analysis: don’t analyse data piece by piece; receive all the data at once
– Data stored and collected in a more standardised manner
– Results amplification
– Results management and visualisation
Have workflows delivered on their promise? YES!

Future UK e-Infrastructure?
[Diagram: LHC, ISIS TS2, HPCx + HECToR, GOSC, regional and campus grids, VRE, VLE, IE, integrated internationally]
Users get common access, tools, information and nationally supported services through the NGS, with robust, standards-compliant middleware from the OMII.