Nimrod/G GRID Resource Broker and Computational Economy
  David Abramson, Rajkumar Buyya, Jon Giddy
  School of Computer Science and Software Engineering, Monash University, Melbourne, Australia
  Email: {davida, rajkumar, jon}@csse.monash.edu.au
Computing Power (HPC) Drivers
  Solving grand challenge applications using computer modeling, simulation and analysis:
  - E-commerce/anything
  - Life Sciences
  - Aerospace
  - Digital Biology
  - CAD/CAM
  - Military Applications
Computing Platforms Evolution: Breaking Administrative Barriers
  [Figure: PERFORMANCE plotted against Administrative Barriers, with a marker at 2100 ?]
  Administrative barriers: Individual, Group, Department, Campus, State, National, Globe, Inter Planet, Universe
  Platforms: Desktop (Single Processor?), SMPs or SuperComputers, Local Cluster, Enterprise Cluster/Grid, Global Cluster/Grid, Inter Planet Cluster/Grid ??
Global Computational Grids
Grid Resource Management: Challenging Issues
  - Authentication (once)
  - Specify simulation (code, resources, etc.)
  - Discover resources
  - Negotiate authorization, acceptable use, cost, etc.
  - Acquire resources
  - Schedule jobs
  - Initiate computation
  - Steer computation
  - Access remote data-sets
  - Collaborate on results
  - Account for usage
  [Figure labels: Domain 1, Domain 2]
  Ack.: globus..
Grid Components
  Grid Apps. (Applications and Portals): Scientific, Engineering, Collaboration, Prob. Solving Env., Web enabled Apps, …
  Grid Tools (Development Environments and Tools): Languages, Libraries, Debuggers, Monitoring, Resource Brokers, Web tools, …
  Grid Middleware (Distributed Resources Coupling Services): Comm., Sign on & Security, Information, Process, Data Access, QoS, …
  Grid Fabric
    Local Resource Managers: Operating Systems, Queuing Systems, Libraries & App Kernels, TCP/IP & UDP, …
    Networked Resources across Organisations: Computers, Clusters, Storage Systems, Data Sources, Scientific Instruments, …
Computational Market Model for Grid Resource Management
  [Figure: a Grid User's Application submits work through a Grid Resource Broker, which reaches Grid Resource/Control Domains via Grid Middleware]
  Grid Resource Broker: Grid Explorer, Schedule Advisor, Trade Manager, Job Control Agent, Deployment Agent
  Grid Middleware: Grid Information Server(s), Info ?, Health Monitor, Trading
  Grid Node 1, Grid Node 2, …, Grid Node N: Trade Server, Charging Alg., Accounting, Resource Reservation, Resource Allocation, Other services, …, resources R1, R2, …, Rm
  Flow labels: Jobs
What is Nimrod/G?
  A global scheduler for managing and steering task-farming (parametric simulation) applications on computational grids, based on deadline and computational economy.
  Key Features
  - A single window to manage and control an experiment
  - Resource discovery
  - Trading for resources
  - Scheduling
  - Steering and data management
  It lets users study how selected output variables behave across a range of different input scenarios.
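A parametric simulation runs the same program once for every combination of input values. As a minimal sketch of that idea, the set of jobs in an experiment can be generated as a cross-product of the parameter ranges. The function name expand_sweep and the parameters angle and speed are illustrative assumptions, not Nimrod/G syntax.

```python
from itertools import product


def expand_sweep(parameters):
    """Return one job (a dict of parameter values) per point in the cross-product."""
    names = sorted(parameters)  # fixed order so job numbering is reproducible
    value_lists = [parameters[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical experiment: 3 angles x 2 speeds = 6 independent jobs.
jobs = expand_sweep({"angle": [0, 15, 30], "speed": [1.0, 2.0]})
for number, job in enumerate(jobs):
    print(number, job)
```

Each job is independent of the others, which is what makes this kind of application suitable for farming out to many grid nodes.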
Nimrod/G Grid Resource Broker Architecture
  [Figure: several Nimrod/G Clients connect to one Nimrod/G Engine]
  Broker components: Schedule Advisor, Grid Bookkeeper, Trading Manager (TM), Grid Dispatcher, Grid Explorer (GE)
  Grid Middleware: Globus, Legion, Condor-G, Ninf, etc.
  Grid Information Server(s) (GIS)
  Grid nodes, each with RM & TS, labelled L, N, G or C
  Legend: G: Globus enabled node. N: Ninf enabled node. C: Condor enabled node. RM: Local Resource Manager. TS: Trade Server.
Nimrod/G Interactions
  [Figure labels]
  Root node: Prmtc.. Engine, Scheduler, Dispatcher
  Gatekeeper node: Process server, I/O server, Queuing System
  Computational node: Job Wrapper, User process
  Other components: Grid Info servers, Trade Server
  Interactions: Resource location, Resource allocation (local), File access
A Nimrod/G Client
  [Screenshot labels: Cost, Deadline, Legion hosts, Globus hosts]
  Bezek is in both the Globus and Legion domains.
Change deadline/budget + Monitor activities
Adaptive Scheduling algorithms ...
  [Flowchart]
  - Locate machines
  - Establish rates
  - Distribute jobs
  - Meet requirements? (deadlines and budget)
  - If not: locate more machines and re-distribute jobs
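One pass of such a scheduling loop can be sketched as follows. This is a simplified greedy illustration under stated assumptions, not the Nimrod/G algorithm itself: the Resource class, the plan function, and the idea that each resource has a known completion rate and a fixed price per job are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    jobs_per_hour: float  # established rate (assumed to be measured already)
    cost_per_job: float   # price quoted by the resource owner (assumed fixed)


def plan(resources, n_jobs, hours_left, budget):
    """Distribute jobs to the cheapest resources that can finish before the deadline."""
    allocation = {}
    remaining = n_jobs
    spent = 0.0
    for r in sorted(resources, key=lambda r: r.cost_per_job):
        if remaining == 0:
            break
        capacity = int(r.jobs_per_hour * hours_left)  # jobs it can finish in time
        if r.cost_per_job > 0:
            affordable = int((budget - spent) // r.cost_per_job)
        else:
            affordable = remaining
        take = min(remaining, capacity, affordable)
        if take > 0:
            allocation[r.name] = take
            remaining -= take
            spent += take * r.cost_per_job
    # remaining > 0 means the requirements are not met with the known machines.
    return allocation, remaining, spent


machines = [Resource("fast", 10.0, 3.0), Resource("cheap", 2.0, 1.0)]
allocation, unplaced, cost = plan(machines, n_jobs=20, hours_left=5, budget=50)
print(allocation, unplaced, cost)
```

In an adaptive setting this function would be called again at each scheduling interval with updated rates, remaining jobs, remaining time and remaining budget. A non-zero count of unplaced jobs corresponds to the "locate more machines" branch of the flowchart.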
Nimrod/O: Automatic Design Optimization
  Search the parameter space rather than exploring all options.
  [Figure labels]
  Nimrod/O input: Declarative Plan File
  Optimization methods: Simulated Annealing, Divide & Conquer, Simplex, P-BFGS
  Job Control: Function Requests, Values
  Execution: Nimrod or Clustor, on a Super computer or Cluster; Jobs, Results; Nim Cache
  Active Sheets: Excel cell func()s executed on the Grid (NimCache, Nimrod/G)
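To illustrate searching instead of enumerating, here is a minimal simulated annealing sketch over a discrete parameter grid, with a dictionary that plays the role of a result cache so that no point is simulated twice. The cost function, the 50 x 50 grid, the neighbourhood, and the cooling schedule are all assumptions chosen for the example; they do not describe how Nimrod/O implements its methods.

```python
import math
import random


def anneal(cost, start, neighbours, steps=200, t0=1.0, seed=0):
    """Simulated annealing with a result cache; returns (best point, best cost, points evaluated)."""
    rng = random.Random(seed)
    cache = {}  # each parameter point is "simulated" at most once

    def evaluate(point):
        if point not in cache:
            cache[point] = cost(point)
        return cache[point]

    current = best = start
    for k in range(steps):
        temperature = t0 * (1 - k / steps) + 1e-9  # linear cooling, never zero
        candidate = rng.choice(neighbours(current))
        delta = evaluate(candidate) - evaluate(current)
        # Always accept improvements; accept worse points with a shrinking probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if evaluate(current) < evaluate(best):
            best = current
    return best, evaluate(best), len(cache)


def toy_cost(point):
    """Stand-in for an expensive simulation (hypothetical)."""
    x, y = point
    return (x - 30) ** 2 + (y - 12) ** 2


def grid_neighbours(point):
    """Adjacent points on a 50 x 50 grid."""
    x, y = point
    steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in steps if 0 <= a < 50 and 0 <= b < 50]


best_point, best_cost, evaluated = anneal(toy_cost, (0, 0), grid_neighbours)
print(best_point, best_cost, "evaluated", evaluated, "of", 50 * 50, "points")
```

With 200 steps the search evaluates at most 201 of the 2500 grid points, whereas a full sweep would run all of them.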
Related Work & Further Info.
  - AppLeS (UC San Diego): application-level scheduling; templates case-by-case for different Apps, soon PST.
  - NetSolve (UTK/ORNL): API for creating farms.
  - SETI@home, Distributed.net, ….
  - Millennium (UC Berkeley): remote execution environment on clusters; supports computational economy.
  - CODINE/GRD (Genias/Gridware): meets deadlines by dominating over others' share.
  - Mariposa (UC Berkeley): distributed database system; a query comes with a budget, creates sub-queries and divides the budget, and trades with (remote) servers.
  More Info: www.csse.monash.edu.au/~davida/nimrod.html