From PC Clusters to a Global Computational Grid. David Abramson, Head of School, Computer Science and Software Engineering, Monash University. Thanks to Jon Giddy (DSTC), Rok Sosic (Active Tools), Andrew Lewis (QPSF), Ian Foster (ANL), Rajkumar Buyya (Monash), Tom Peachy (Monash).
2 ©David Abramson Applications; Nimrod/G '98 - (DSTC); Nimrod/O '97 - '99 (ARC); Research Model; Nimrod '94 - '98; ActiveSheets '00 - (DSTC); Commercialisation ('97 -)
3 ©David Abramson Parametrised Modelling – Killer App for the Grid? Study the behaviour of some of the output variables against a range of different input scenarios. Computations are uncoupled (file transfer). Allows real-time analysis for many applications. More realistic simulations.
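As a concrete illustration of this style of application, here is a minimal Python sketch (not Nimrod itself) of a parametric study as a set of uncoupled jobs: each scenario takes its inputs from a file, runs independently, and returns its results in an output file. The parameter names, directory layout and the 'simulate' executable are invented stand-ins for a user's sequential model.

import itertools
import json
import subprocess
from pathlib import Path

# Hypothetical parameter ranges for the study.
parameters = {
    "temperature": [280.0, 300.0, 320.0],
    "pressure": [0.8, 1.0, 1.2],
}

def run_scenario(index, values, workdir="runs"):
    """Run one uncoupled computation: write inputs, run the model, read outputs."""
    run_dir = Path(workdir) / f"job_{index:04d}"
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "input.json").write_text(json.dumps(values))
    # 'simulate' stands in for the user's sequential model executable.
    subprocess.run(["simulate", "input.json", "output.json"], cwd=run_dir, check=True)
    return json.loads((run_dir / "output.json").read_text())

jobs = [dict(zip(parameters, combo)) for combo in itertools.product(*parameters.values())]
results = [run_scenario(i, values) for i, values in enumerate(jobs)]

Because the jobs share nothing but files, each run_scenario call could equally be dispatched to a remote machine, which is what makes this application style a good fit for the grid.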
4 ©David Abramson Working with Small Clusters: Nimrod (1994 - ) – DSTC-funded project – designed for department-level clusters – proof of concept. Clustor (www.activetools.com) (1997 - ) – commercial version of Nimrod – re-engineered. Features – workstation orientation – access to idle workstations – random allocation policy – password security.
5 ©David Abramson Execution Architecture: Input Files, Substitution, Output Files; Root Machine, Computational Nodes.
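The "Substitution" step can be pictured with a short sketch: on the root machine the parameter values for one job are substituted into a template input file before the file is shipped to a computational node. The ${name} placeholder syntax and field names are assumptions for illustration, not the actual Clustor template format.

from string import Template

# Template for the model's input file; placeholders mark the swept parameters.
template_text = """\
temperature = ${temperature}
pressure    = ${pressure}
iterations  = 1000
"""

def make_input_file(values: dict, path: str) -> None:
    """Produce a concrete input file for one job from the shared template."""
    with open(path, "w") as f:
        f.write(Template(template_text).substitute(values))

make_input_file({"temperature": 300.0, "pressure": 1.0}, "input_0001.dat")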
Clustor Tools
7 ©David Abramson Clustor by example – Physical Model: time to crack in this position (Courtesy Prof Rhys Jones, Dept Mechanical Engineering, Monash University).
8 ©David Abramson Dispatch cycle using Clustor...
9 ©David Abramson Sample Applications of Clustor: Bioinformatics: Protein Modelling; Sensitivity experiments on smog formation; Combinatorial Optimization: Meta-heuristic parameter estimation; Ecological Modelling: Control Strategies for Cattle Tick; Electronic CAD: Field Programmable Gate Arrays; Computer Graphics: Ray Tracing; High Energy Physics: Searching for Rare Events; Physics: Laser-Atom Collisions; VLSI Design: SPICE Simulations; Fuzzy Logic Parameter Setting; ATM Network Design.
10 ©David Abramson SMOG Sensitivity Experiments (charts: Control ROC, Control NOx, $$$).
11 ©David Abramson Physics - Laser Interaction
12 ©David Abramson Electronic CAD
13 ©David Abramson Current Application Drivers: Public Health Policy – Dr Dinelli Mather, Monash University & MacFarlane Burnett; Health Standards – Lew Kotler, Australian Radiation Protection and Nuclear Safety Agency; Airframe Simulation – Dr Shane Dunn, AMRL, DSTO; Network Simulation – Dr Mahbub Hassan, Monash.
14 ©David Abramson Evolution of the Global Grid: Desktop → Department Clusters → Enterprise-Wide Clusters → Shared Supercomputer → Global Clusters.
15 ©David Abramson The Nimrod Vision... Can we make it 10% smaller? We need the answer by 5 o’clock
16 ©David Abramson Towards Grid Computing…. The GUSTO Testbed (Source: www.globus.org & updated).
17 ©David Abramson What does the Grid have to offer? “Dependable, consistent, pervasive access to [high-end] resources” Dependable: Can provide performance and functionality guarantees Consistent: Uniform interfaces to a wide variety of resources Pervasive: Ability to “plug in” from anywhere Source: www.globus.org
18 ©David Abramson Challenges for the Global Grid: Security; Resource Allocation & Scheduling; Data Locality; Network Management; System Management; Resource Location; Uniform Access.
19 ©David Abramson Nimrod on Enterprise-Wide Networks and the Global Grid: Manual resource location – static file of machine names; No resource scheduling – first come, first served; No cost model – all machines/users cost alike; Homogeneous access mechanism.
20 ©David Abramson Requirements: Users & system managers want to know – where it will run – when it will run – how much it will cost – that access is secure – that a range of access mechanisms is supported.
21 ©David Abramson Source: www.globus.org The Globus Project Basic research in grid-related technologies – Resource management, QoS, networking, storage, security, adaptation, policy, etc. Development of Globus toolkit – Core services for grid-enabled tools & applns Construction of large grid testbed: GUSTO – Largest grid testbed in terms of sites & apps Application experiments – Tele-immersion, distributed computing, etc.
22 ©David Abramson Layered Globus Architecture (Source: www.globus.org). Applications. High-level Services and Tools: DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status. Core Services: Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS. Local Services: LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX.
23 ©David Abramson Some issues for Nimrod/G
24 ©David Abramson Resource Location Need to locate suitable machines for an experiment – Speed – Number of processors – Cost – Availability – User account Available resources will vary across experiment Supported through Directory Server (Globus MDS)
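To make the resource-location step concrete, here is a small Python sketch that filters a catalogue of machine descriptions, of the kind a directory service such as the Globus MDS could supply, against an experiment's requirements. The record fields, machine names and thresholds are invented for illustration.

# Catalogue of candidate machines (fields invented for illustration).
machines = [
    {"name": "hathor", "cpus": 16, "cost": 1, "up": True,  "has_account": True},
    {"name": "lemans", "cpus": 64, "cost": 5, "up": True,  "has_account": True},
    {"name": "origin", "cpus": 32, "cost": 3, "up": False, "has_account": True},
]

def locate(machines, min_cpus=8, max_cost=4):
    """Return the machines that are available, affordable and big enough."""
    return [m for m in machines
            if m["up"] and m["has_account"]
            and m["cpus"] >= min_cpus and m["cost"] <= max_cost]

candidates = locate(machines)  # note: the usable set may change during the experiment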
25 ©David Abramson Resource Scheduling User view – solve problem in minimum time System – Spread load across machines Soft real time problem through deadlines – Complete by deadline – Unreliable resource provision – Machine load may change at any time – Multiple machine queues
26 ©David Abramson Resource Scheduling... Need to establish rate at which a machine can consume jobs Use deadline as metric for machine performance Move jobs to machines that are performing well Remove jobs from machines that are falling behind
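A minimal sketch of this idea: measure the rate at which each machine has actually consumed jobs so far, and use the deadline to decide whether its current queue should be drained or partly handed back. All names and values are illustrative, not Nimrod/G code.

import time

def consumption_rate(completed_jobs: int, elapsed_seconds: float) -> float:
    """Jobs per second this machine has actually delivered so far."""
    return completed_jobs / elapsed_seconds if elapsed_seconds > 0 else 0.0

def can_meet_deadline(queued_jobs: int, rate: float, deadline: float) -> bool:
    """True if the queued jobs should finish before the deadline (epoch seconds)."""
    if rate == 0.0:
        return queued_jobs == 0
    return time.time() + queued_jobs / rate <= deadline

# A machine that is falling behind has jobs taken away; one running ahead of
# schedule is given more work.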
27 ©David Abramson Computational Economy: Resource selection based on real money and market mechanisms. A large number of sellers and buyers (resources may be dedicated/shared). Negotiation: tenders/bids, selecting those offers that meet the requirements. Trading and advance resource reservation. Schedule computations on those resources that meet all requirements.
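A hedged sketch of the tender/bid step: resource owners quote a price and an expected completion time, and the broker keeps the cheapest offers that still satisfy the user's deadline and budget. The offer records and numbers are invented for illustration.

offers = [
    {"resource": "siteA", "price_per_job": 0.5, "finish_hours": 3.0},
    {"resource": "siteB", "price_per_job": 0.2, "finish_hours": 8.0},
    {"resource": "siteC", "price_per_job": 1.5, "finish_hours": 1.0},
]

def select_offers(offers, deadline_hours, budget_per_job):
    """Accept the cheapest offers that still meet the deadline."""
    feasible = [o for o in offers
                if o["finish_hours"] <= deadline_hours
                and o["price_per_job"] <= budget_per_job]
    return sorted(feasible, key=lambda o: o["price_per_job"])

accepted = select_offers(offers, deadline_hours=4.0, budget_per_job=1.0)  # siteA only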
28 ©David Abramson Cost Model: Without cost, ANY shared system becomes unmanageable. Charge users more for remote facilities than their own. Choose cheaper resources before more expensive ones. Cost units may be – dollars – shares in a global facility – stored in a bank.
29 ©David Abramson Cost Model...: Non-uniform costing encourages use of local resources first; a real accounting system can control machine usage. (Example cost table over User 1 / User 5 and Machine 1 / Machine 5, with a user's own machine the cheapest.)
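A minimal sketch of non-uniform costing, assuming the table on the slide gives each (user, machine) pair its own tariff with a user's local machine cheapest: a scheduler that orders machines by this per-user cost then prefers local resources automatically. The tariff values below are invented.

# Per-user, per-machine tariff (values invented for illustration).
cost = {
    ("user1", "machine1"): 1,  # user1's own machine: cheap
    ("user1", "machine5"): 3,  # remote machine: dearer
    ("user5", "machine1"): 2,
    ("user5", "machine5"): 1,  # user5's own machine: cheap
}

def cheapest_machines(user, machines):
    """Order machines by what they cost this particular user."""
    return sorted(machines, key=lambda m: cost.get((user, m), float("inf")))

print(cheapest_machines("user1", ["machine1", "machine5"]))  # local machine first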
30 ©David Abramson Security Uses Globus Security Layer Generic Security Service API using an implementation of SSL, Secure Sockets Layer. RSA encryption algorithm employing both public and private keys. X509 certificate consisting of – duration of the permissions, – the RSA public key, – signature of the Certificate Authority (CA).
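Purely as an illustration of the certificate fields listed above (and not part of Globus or its security layer), the sketch below reads the validity period, public key and issuer from an X.509 certificate using the third-party Python 'cryptography' package; the file name is hypothetical.

from cryptography import x509

def describe_certificate(pem_path: str) -> None:
    """Print the X.509 fields mentioned on the slide."""
    with open(pem_path, "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    print("valid from:", cert.not_valid_before)  # duration of the permissions...
    print("valid to:  ", cert.not_valid_after)   # ...granted by the certificate
    print("public key:", cert.public_key())      # e.g. an RSA public key
    print("issuer:    ", cert.issuer)            # the signing Certificate Authority
    print("signature: ", cert.signature.hex()[:32], "...")

# describe_certificate("user_cert.pem")  # hypothetical certificate file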
31 ©David Abramson Uniform Access: The Globus Resource Allocation Manager (GRAM) provides an interface to a range of schemes – Fork – Queue (Easy, LoadLeveler, Condor, LSF). Multiple pathways to the same machine (if supported). Integrated with the security scheme.
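To show what a uniform access layer buys, here is a small sketch of a single submit() front end that either forks a job on the local node or hands it to a batch queue. It only mimics the idea behind GRAM and is not the GRAM API; the Condor submit file name is hypothetical.

import subprocess

def submit(command, manager="fork"):
    """Submit a job through one interface, whatever the local resource manager is."""
    if manager == "fork":
        subprocess.Popen(command)                                  # run directly on the node
    elif manager == "condor":
        subprocess.run(["condor_submit", "job.sub"], check=True)   # hypothetical submit file
    elif manager == "lsf":
        subprocess.run(["bsub"] + list(command), check=True)       # hand to the LSF queue
    else:
        raise ValueError(f"unknown resource manager: {manager}")

submit(["./simulate", "input.dat"], manager="fork")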
32 ©David Abramson Nimrod/G Architecture: Nimrod/G Client; Parametric Engine; Persistent Info; Schedule Advisor; Resource Discovery; Dispatcher; Grid Directory Services; Grid Middleware Services; GUSTO Test Bed.
33 ©David Abramson Nimrod/G Interactions: the Parametric Engine, Scheduler and Dispatcher run on the root node; the MDS server is used for resource location; the GRAM server on the gatekeeper node handles resource allocation to the local queuing system; the job wrapper and user process run on a computational node, with file access via the GASS server. Additional services used implicitly: GSI (authentication & authorization), Nexus (communication).
34 ©David Abramson A Nimrod/G Client: Cost, Deadline, Available Machines.
35 ©David Abramson Nimrod/G Scheduling Algorithm: Find a set of machines (MDS search); distribute jobs from the root to machines; establish the job consumption rate for each machine. For each machine: can we meet the deadline? If not, return some jobs to the root; if yes, distribute more jobs to the resource. If the deadline cannot be met with the current resources, find additional resources.
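A self-contained sketch of the loop just described, assuming we already have a measured jobs-per-hour rate for each machine (for example from the rate estimate sketched earlier): jobs are shared out in proportion to rate, and if the estimated finish time misses the deadline, another machine is located and added. Machine names and rates are invented; this is not the Nimrod/G implementation.

def schedule(num_jobs, machines, deadline_hours, extra_machines):
    """machines / extra_machines map a machine name to its measured jobs-per-hour rate."""
    pool = dict(machines)
    while True:
        total_rate = sum(pool.values())
        finish = num_jobs / total_rate if total_rate else float("inf")
        if finish <= deadline_hours:
            # share the jobs in proportion to each machine's measured rate
            return {m: round(num_jobs * r / total_rate) for m, r in pool.items()}
        if not extra_machines:
            raise RuntimeError("deadline cannot be met with the available resources")
        name, rate = extra_machines.popitem()  # locate one more machine
        pool[name] = rate

plan = schedule(
    num_jobs=1000,
    machines={"hathor": 40.0, "lemans": 120.0},     # jobs/hour observed so far
    deadline_hours=5.0,
    extra_machines={"origin": 90.0, "sp2": 200.0},  # found via a wider MDS search
)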
36 ©David Abramson Nimrod/G Scheduling algorithm... Locate Machines → Distribute Jobs → Establish Rates → Meet Deadlines? → Re-distribute Jobs → Locate more Machines.
44 ©David Abramson Some results from experiments
48 Optimal Design using computation - Nimrod/O. Clustor allows exploration of design scenarios – search by enumeration. Search for local/global minima based on an objective function – How do I minimise the cost of this design? – How do I maximise the life of this object? Objective function evaluated by a computational model – computationally expensive. Driven by applications.
49 ©David Abramson Application Drivers Complex industrial design problems – Air quality – Antenna Design – Business Simulation – Mechanical Optimisation
50 ©David Abramson Cost function minimization: Continuous functions – gradient descent; Quasi-Newton BFGS algorithm – find the gradient using a finite-difference approximation – line search using a bound-constrained, parallel method.
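The gradient step lends itself to a short sketch: each component of a forward finite-difference gradient needs one extra objective evaluation, and those evaluations are independent, so they can be farmed out in parallel exactly like any other parametric jobs. The quadratic objective below is a stand-in for the expensive simulation; the step size is illustrative.

from concurrent.futures import ProcessPoolExecutor

def objective(x):
    # placeholder for a computationally expensive model evaluation
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def fd_gradient(f, x, h=1e-4):
    """Forward-difference gradient; the perturbed evaluations run concurrently."""
    points = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        points.append(xp)
    f0 = f(x)
    with ProcessPoolExecutor() as pool:
        fvals = list(pool.map(f, points))
    return [(fi - f0) / h for fi in fvals]

if __name__ == "__main__":
    print(fd_gradient(objective, [0.0, 0.0]))  # approximately [-2.0, 40.0]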
51 ©David Abramson Implementation Master - slave parallelization Gradient-determination & line-searching – tasks queued via IBM LoadLeveler – (adapt to number of CPUs allocated by the Resource Manager) Interfaced to existing dispatchers – Clustor – Nimrod/G
52 ©David Abramson Pool Architecture: Meta-heuristic Search and BFGS produce function evaluations, which are sent as jobs (via a Clustor plan file) to the Clustor Dispatcher and run on a supercomputer or cluster pool.
53 ©David Abramson Ongoing research Increased parallelism – Multi-start for better coverage – High dimensioned problems – Addition of other search algorithms – Simplex algorithm Mixed integer problems – BFGS modified to support mixed integer – Mixed search/enumeration – Meta-heuristic based search – Adaptive Simulated Annealing (ASA)
54 ©David Abramson Further Information: Nimrod – www.csse.monash.edu.au/~davida/nimrod.html; DSTC – www.dstc.edu.au; Globus – www.globus.org; Activetools – www.activetools.com; Our Cluster – hathor.csse.monash.edu.au