From PC Clusters to a Global Computational Grid David Abramson Head of School Computer Science and Software Engineering Monash University Thanks to Jon.

Presentation transcript:

From PC Clusters to a Global Computational Grid. David Abramson, Head of School, Computer Science and Software Engineering, Monash University. Thanks to: Jon Giddy, DSTC; Rok Sosic, Active Tools; Andrew Lewis, QPSF; Ian Foster, ANL; Rajkumar Buyya, Monash; Tom Peachy, Monash

2 ©David Abramson Applications Nimrod/G ‘98 - DSTC Nimrod/O ‘97 - ‘99 ARC Research Model Nimrod ‘94 - ‘98 ActiveSheets ‘00 - DSTC Commercialisation (‘97 -)

3 ©David Abramson Parametrised Modelling: Killer App for the Grid?  Study the behaviour of some of the output variables against a range of different input scenarios.  Computations are uncoupled (file transfer)  Allows real time analysis for many applications  More realistic simulations
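
A parametrised study of this kind amounts to running the same model once per combination of input values. The sketch below shows one way to enumerate those combinations; the function name `expand_sweep` and the parameter names and ranges are hypothetical and are not taken from Nimrod or Clustor.

```python
from itertools import product

def expand_sweep(parameters):
    """Return one dict of parameter values per job (full cross product)."""
    names = sorted(parameters)  # fixed order so the job list is reproducible
    return [dict(zip(names, values))
            for values in product(*(parameters[n] for n in names))]

# Hypothetical input scenarios, for illustration only.
sweep = {"thickness_mm": [1.0, 1.5, 2.0], "load_kN": [10, 20]}
jobs = expand_sweep(sweep)  # each entry describes one independent model run
```

Because the runs share no state, each entry in `jobs` can be sent to a different machine, which is what makes the workload a natural fit for clusters and grids.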

4 ©David Abramson Working with Small Clusters  Nimrod ( ) – DSTC Funded project – Designed for Department level clusters – Proof of concept  Clustor ( ( ) – Commercial version of Nimrod – Re-engineered  Features – Workstation Orientation – Access to idle workstations – Random allocation policy – Password security

5 ©David Abramson Execution Architecture: Input Files, Substitution, Output Files, Root Machine, Computational Nodes
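
The substitution step can be pictured as filling placeholders in an input-file template with the values for one job. The following is a minimal sketch using Python's standard `string.Template`; the placeholder syntax and file layout are assumptions for illustration and do not reproduce the Nimrod/Clustor plan file format.

```python
from string import Template

def substitute(template_text, values):
    """Fill $name placeholders; raises KeyError if a value is missing."""
    return Template(template_text).substitute(values)

# Hypothetical input-file template and parameter values.
template = "thickness = $thickness_mm\nload = $load_kN\n"
input_file = substitute(template, {"thickness_mm": 1.5, "load_kN": 20})
```

Using `substitute` rather than `safe_substitute` means a job with a missing parameter fails before it is dispatched, not on the remote node.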

Clustor Tools

7 ©David Abramson Clustor by example: Physical Model. Time to crack in this position (Courtesy Prof Rhys Jones, Dept Mechanical Engineering, Monash University)

8 ©David Abramson Dispatch cycle using Clustor...

9 ©David Abramson Sample Applications of Clustor: Bioinformatics: Protein Modelling; Sensitivity experiments on smog formation; Combinatorial Optimization: Meta-heuristic parameter estimation; Ecological Modelling: Control Strategies for Cattle Tick; Electronic CAD: Field Programmable Gate Arrays; Computer Graphics: Ray Tracing; High Energy Physics: Searching for Rare Events; Physics: Laser-Atom Collisions; VLSI Design: SPICE Simulations; Fuzzy Logic: Parameter setting; ATM Network Design

10 ©David Abramson SMOG Sensitivity Experiments [figure labels: Control ROC, Control NOx, $$$]

11 ©David Abramson Physics - Laser Interaction

12 ©David Abramson Electronic CAD

13 ©David Abramson Current Application Drivers: Public Health Policy (Dr Dinelli Mather, Monash University & MacFarlane Burnett); Health Standards (Lew Kotler, Australian Radiation Protection and Nuclear Safety Agency); Airframe Simulation (Dr Shane Dunn, AMRL, DSTO); Network Simulation (Dr Mahbun Hassan, Monash)

14 ©David Abramson Evolution of the Global Grid: Global Clusters, Desktop, Department Clusters, Shared Supercomputer, Enterprise-Wide Clusters

15 ©David Abramson The Nimrod Vision... Can we make it 10% smaller? We need the answer by 5 o’clock

16 ©David Abramson Source: & updated Towards Grid Computing…. The Gusto Testbed

17 ©David Abramson What does the Grid have to offer? “Dependable, consistent, pervasive access to [high-end] resources”  Dependable: Can provide performance and functionality guarantees  Consistent: Uniform interfaces to a wide variety of resources  Pervasive: Ability to “plug in” from anywhere Source:

18 ©David Abramson Challenges for the Global Grid: Security; Resource Allocation & Scheduling; Data locality; Network Management; System Management; Resource Location; Uniform Access

19 ©David Abramson Nimrod on Enterprise Wide Networks and the Global Grid  Manual resource location – Static file of machine names  No resource scheduling – First come, first served  No cost model – All machines/users cost alike  Homogeneous access mechanism

20 ©David Abramson Requirements  Users & system managers want to know – Where it will run – When it will run – How much it will cost – That access is secure – Will support a range of access mechanisms

21 ©David Abramson Source: The Globus Project  Basic research in grid-related technologies – Resource management, QoS, networking, storage, security, adaptation, policy, etc.  Development of Globus toolkit – Core services for grid-enabled tools & applications  Construction of large grid testbed: GUSTO – Largest grid testbed in terms of sites & apps  Application experiments – Tele-immersion, distributed computing, etc.

22 ©David Abramson Layered Globus Architecture. Applications. High-level Services and Tools: DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status. Core Services: Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS. Local Services: LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX. Source:

23 ©David Abramson Some issues for Nimrod/G

24 ©David Abramson Resource Location  Need to locate suitable machines for an experiment – Speed – Number of processors – Cost – Availability – User account  Available resources will vary across experiment  Supported through Directory Server (Globus MDS)
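
The selection criteria listed on this slide can be read as a filter over machine records. The sketch below assumes the records have already been fetched into memory; it does not query Globus MDS, and the `Machine` fields, units and the `locate` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    speed: float         # relative speed (hypothetical unit)
    processors: int
    cost_per_job: float  # in whatever cost unit the site uses
    available: bool
    has_account: bool    # does the user hold an account on this machine?

def locate(machines, min_speed, min_processors, max_cost):
    """Keep machines that are usable now and meet the experiment's needs."""
    return [m for m in machines
            if m.available and m.has_account
            and m.speed >= min_speed
            and m.processors >= min_processors
            and m.cost_per_job <= max_cost]

# Hypothetical directory contents.
directory = [
    Machine("alpha", 2.0, 16, 1.0, True, True),
    Machine("beta", 4.0, 64, 9.0, True, True),
    Machine("gamma", 3.0, 32, 2.0, False, True),
    Machine("delta", 3.0, 32, 2.0, True, False),
]
suitable = locate(directory, min_speed=1.5, min_processors=8, max_cost=5.0)
```

Since available resources vary during an experiment, a filter like this would need to be re-run against fresh directory data rather than evaluated once.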

25 ©David Abramson Resource Scheduling  User view – solve problem in minimum time  System – Spread load across machines  Soft real time problem through deadlines – Complete by deadline – Unreliable resource provision – Machine load may change at any time – Multiple machine queues

26 ©David Abramson Resource Scheduling...  Need to establish rate at which a machine can consume jobs  Use deadline as metric for machine performance  Move jobs to machines that are performing well  Remove jobs from machines that are falling behind
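
The idea of measuring how fast a machine consumes jobs and comparing that with the deadline can be written down in a few lines. This is a sketch under simple assumptions (a constant rate estimated from completed jobs); the function names are hypothetical and the actual Nimrod/G estimator is not described in the slides.

```python
def consumption_rate(jobs_completed, elapsed_seconds):
    """Observed jobs per second on one machine (0.0 before any time passes)."""
    if elapsed_seconds <= 0:
        return 0.0
    return jobs_completed / elapsed_seconds

def can_meet_deadline(jobs_remaining, rate, seconds_left):
    """True if the machine should finish its queue at the observed rate."""
    return rate * seconds_left >= jobs_remaining

rate = consumption_rate(jobs_completed=50, elapsed_seconds=100)  # 0.5 jobs/s
on_track = can_meet_deadline(jobs_remaining=20, rate=rate, seconds_left=60)
falling_behind = not can_meet_deadline(jobs_remaining=40, rate=rate, seconds_left=60)
```

A machine flagged as falling behind is a candidate for having jobs removed, while one that is on track can be given more.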

27 ©David Abramson Computational Economy  Resource selection based on real money and a market-based model  A large number of sellers and buyers (resources may be dedicated/shared)  Negotiation: call for tenders/bids and select those offers that meet the requirements  Trading and Advance Resource Reservation  Schedule computations on those resources that meet all requirements
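
Selecting among tenders can be illustrated as filtering offers against the user's deadline and budget and then ranking what remains. The offer fields and the `select_offers` function below are hypothetical; the slides do not specify a bid format or a negotiation protocol.

```python
def select_offers(offers, deadline_hours, budget):
    """Keep offers that meet both requirements, cheapest first."""
    acceptable = [o for o in offers
                  if o["finish_hours"] <= deadline_hours
                  and o["price"] <= budget]
    return sorted(acceptable, key=lambda o: (o["price"], o["finish_hours"]))

# Hypothetical bids from three resource owners.
offers = [
    {"seller": "site_a", "price": 40, "finish_hours": 6},
    {"seller": "site_b", "price": 25, "finish_hours": 10},
    {"seller": "site_c", "price": 30, "finish_hours": 4},
]
chosen = select_offers(offers, deadline_hours=8, budget=35)
```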

28 ©David Abramson Cost Model  Without cost ANY shared system becomes unmanageable  Charge users more for remote facilities than their own  Choose cheaper resources before more expensive ones  Cost units may be – Dollars – Shares in global facility – Stored in bank

29 ©David Abramson Cost Model...  Non-uniform costing  Encourages use of local resources first  Real accounting system can control machine usage [Figure: cost table for User 1 to User 5 against Machine 1 to Machine 5]
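
A cost model in which a user's own machines are cheaper than remote ones leads directly to a cheapest-first allocation. The sketch below assumes a per-job price for each machine as seen by one user and a fixed capacity per machine; all names and numbers are hypothetical.

```python
def allocate_cheapest_first(n_jobs, prices, capacity):
    """Fill the cheapest machines first.

    Returns (plan, total_cost, unplaced), where plan maps machine -> jobs.
    """
    plan = {}
    remaining = n_jobs
    for machine in sorted(prices, key=lambda m: (prices[m], m)):
        if remaining == 0:
            break
        take = min(remaining, capacity.get(machine, 0))
        if take > 0:
            plan[machine] = take
            remaining -= take
    total_cost = sum(prices[m] * n for m, n in plan.items())
    return plan, total_cost, remaining

# Hypothetical prices for one user: the local machine is the cheapest.
prices = {"local": 1, "remote_a": 5, "remote_b": 3}
capacity = {"local": 4, "remote_a": 10, "remote_b": 2}
plan, total_cost, unplaced = allocate_cheapest_first(7, prices, capacity)
```

With these numbers the local machine is filled before any remote machine is used, which is the behaviour the non-uniform costing is meant to encourage.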

30 ©David Abramson Security  Uses Globus Security Layer  Generic Security Service API using an implementation of SSL (Secure Sockets Layer)  RSA encryption algorithm employing both public and private keys  X.509 certificate consisting of – duration of the permissions, – the RSA public key, – signature of the Certificate Authority (CA).

31 ©David Abramson Uniform Access  Resource Allocation Module (GRAM) provides interface to range of schemes – Fork – Queue (Easy, LoadLeveler, Condor, LSF)  Multiple pathways to same machine (if supported)  Integrated with Security scheme

32 ©David Abramson Nimrod/G Architecture: Nimrod/G Client, Grid Directory Services, Schedule Advisor, Resource Discovery, Grid Middleware Services, Dispatcher, GUSTO Test Bed, Parametric Engine, Persistent Info.

33 ©David Abramson Nimrod/G Interactions: MDS server (resource location); Queuing System; GRAM server (resource allocation, local); User process; File access (GASS server); Gatekeeper node; Job Wrapper; Computational node; Dispatcher; Root node; Scheduler; Parametric Engine. Additional services used implicitly: GSI (authentication & authorization), Nexus (communication)

34 ©David Abramson A Nimrod/G Client: Cost, Deadline, Available Machines

35 ©David Abramson Nimrod/G Scheduling Algorithm: Find a set of machines (MDS search). Distribute jobs from root to machines. Establish job consumption rate for each machine. For each machine: can we meet the deadline? If not, return some jobs to root; if yes, distribute more jobs to the resource. If the deadline cannot be met with the current resources, find additional resources.
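
One pass of the scheduling steps above can be sketched as a rebalancing function: machines that cannot finish their queue by the deadline hand jobs back to the root, faster machines pick up spare jobs, and anything left over signals that more resources are needed. This is an illustrative reading of the slide, not the Nimrod/G implementation; the function `rebalance`, its arguments and the whole-job capacity estimate are assumptions.

```python
def rebalance(queues, rates, seconds_left, root_pool=0):
    """One scheduling pass.

    queues: machine -> jobs currently assigned
    rates:  machine -> observed jobs per second
    Returns (new_queues, jobs_left_at_root, need_more_resources).
    """
    queues = dict(queues)  # do not mutate the caller's dict
    capacity = {m: int(rates[m] * seconds_left) for m in queues}

    # Machines that are falling behind return their excess jobs to the root.
    for m in sorted(queues):
        if queues[m] > capacity[m]:
            root_pool += queues[m] - capacity[m]
            queues[m] = capacity[m]

    # Machines that are performing well receive more jobs, fastest first.
    for m in sorted(queues, key=lambda name: (-rates[name], name)):
        give = min(capacity[m] - queues[m], root_pool)
        queues[m] += give
        root_pool -= give

    # Jobs still at the root cannot be finished with the current machines.
    return queues, root_pool, root_pool > 0

# Hypothetical state: two machines, ten jobs each.
state = {"a": 10, "b": 10}
speeds = {"a": 0.5, "b": 0.25}
balanced = rebalance(state, speeds, seconds_left=30)
overloaded = rebalance(state, speeds, seconds_left=10)
```

When the third element of the result is true, the caller would go back to the directory search to locate additional machines before the next pass.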

36 ©David Abramson Nimrod/G Scheduling algorithm... Locate Machines; Distribute Jobs; Establish Rates; Meet Deadlines?; Re-distribute Jobs; Locate more Machines

44 ©David Abramson Some results experiments

48 Optimal Design using computation - Nimrod/O  Clustor allows exploration of design scenarios – Search by enumeration  Search for local/global minima based on objective function – How do I minimise the cost of this design? – How do I maximise the life of this object?  Objective function evaluated by computational model – Computationally expensive  Driven by applications

49 ©David Abramson Application Drivers  Complex industrial design problems – Air quality – Antenna Design – Business Simulation – Mechanical Optimisation

50 ©David Abramson Cost function minimization  Continuous functions - gradient descent  Quasi-Newton BFGS algorithm – find gradient using finite difference approximation – line search using bound constrained, parallel method
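
The finite difference gradient is the part of this approach that parallelises most naturally, because each perturbed evaluation of the cost function is independent. The sketch below shows a forward-difference gradient with the evaluations submitted to a thread pool; it is not BFGS itself and not the Nimrod/O code, and the function `fd_gradient`, the step size and the test function are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def fd_gradient(f, x, h=1e-6, max_workers=4):
    """Forward-difference gradient of f at x.

    All n + 1 evaluations (base point plus one per dimension) are
    independent, so they are submitted together.
    """
    points = [list(x)]
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h
        points.append(shifted)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(f, points))  # results come back in input order
    base = values[0]
    return [(v - base) / h for v in values[1:]]

# Hypothetical cheap stand-in for an expensive computational model.
def quadratic(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2

grad = fd_gradient(quadratic, [0.0, 0.0])  # exact gradient is [-2.0, 2.0]
```

In a setting like the one described, each call to `f` would be a full model run dispatched as a job, so an n-dimensional problem yields n + 1 jobs per gradient.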

51 ©David Abramson Implementation  Master - slave parallelization  Gradient-determination & line-searching – tasks queued via IBM LoadLeveler – (adapt to number of CPUs allocated by the Resource Manager)  Interfaced to existing dispatchers – Clustor – Nimrod/G

52 ©David Abramson Architecture: BFGS; Meta-heuristic Search; Clustor Plan File; Clustor Dispatcher; Jobs; Function Evaluations; Supercomputer or Cluster Pool

53 ©David Abramson Ongoing research  Increased parallelism – Multi-start for better coverage – High dimensioned problems – Addition of other search algorithms – Simplex algorithm  Mixed integer problems – BFGS modified to support mixed integer – Mixed search/enumeration – Meta-heuristic based search – Adaptive Simulated Annealing (ASA)

54 ©David Abramson Further Information  Nimrod: www.csse.monash.edu.au/~davida/nimrod.html  DSTC: www.dstc.edu.au  Globus: www.globus.org  Active Tools: www.activetools.com  Our Cluster: hathor.csse.monash.edu.au