The Gridbus Toolkit for Building and Deploying eScience Applications on Utility Grids Rajkumar Buyya Fellow of Grid Computing Grid Computing and Distributed.


1 The Gridbus Toolkit for Building and Deploying eScience Applications on Utility Grids Rajkumar Buyya Fellow of Grid Computing Grid Computing and Distributed Systems (GRIDS) Lab. Dept. of Computer Science and Software Engineering The University of Melbourne, Australia www.gridbus.org

2 2 Outline: Introduction to eScience and Challenges; Introduction to the Gridbus Project; An Overview of Gridbus Components; Grid Service Broker; Architecture, Design and Implementation; Scheduling Algorithms; BioGrid Demo or Performance Evaluation; A Case Study in High Energy Physics; Economy-based Scheduling in Data Grids; Summary

3 3 Prominent Grid Drivers: Emerging eScience and eBusiness Apps. Next-generation experiments, simulations, sensors, satellites, even people and businesses are creating a flood of data. They all involve numerous experts and resources from multiple organizations in synthesis, modeling, simulation, analysis, and interpretation. Life Sciences; Digital Biology; Finance: Portfolio analysis; ~PBytes/sec; Newswire & data mining: Natural language engineering; Astronomy; Internet & Ecommerce; High Energy Physics; Brain Activity Analysis; Quantum Chemistry

4 4 E-Science Elements Distributed instruments Distributed computation Distributed data Peers sharing ideas and collaborative interpretation of data/results E-Scientist 2100 Remote Visualization Data & Compute Service

5 5 Grids have Emerged as Scalable Cyberinfrastructure for e-Science Applications [Diagram labels: Application; Grid Resource Broker; Resource Broker; Grid Information Service; database; resources R1, R2, R3, R4, R5, R6, … RN]

6 6 Types of Services Modern Grids Offer. Computational Services (Computational Grid): CPU cycles, e.g., SETI@Home, NASA IPG, TeraGrid, I-Grid, … Data Services (Data Grid): data replication, management, and secure access, e.g., LHC Grid/Napster. Application Services (ASP Grid): access to remote software/libraries and license management, e.g., NetSolve. Information Services (Information Grid): extraction and presentation of data with meaning. Knowledge Services (Knowledge Grid): the way knowledge is acquired and managed, e.g., data mining. Utility Computing Services (Utility Grid): towards market-based Grid computing, leasing and delivering Grid services as ICT utilities.

7 7 Grid Challenges Security Resource Allocation & Scheduling Data locality Network Management System Management Resource Discovery Uniform Access Computational Economy Application Construction

8 8 Some Grid Initiatives Worldwide. Australia: Nimrod-G, Gridbus, DISCWorld, GrangeNet, APACGrid, ARC eResearch? Brazil: OurGrid, EasyGrid, LNCC-Grid + many others. China: ChinaGrid (education), CNGrid (application). Europe: UK eScience, EU Grids, and many more... India: I-Grid. Japan: NAREGI. Korea: N*Grid. Singapore: NGP. USA: Globus, NASA IPG, AccessGrid, TeraGrid, Cyberinfrastructure, and many more... Industry Initiatives: IBM On Demand Computing, HP Adaptive Computing, Sun N1, Microsoft .NET, Oracle 10g, Infosys Business Grid, StorageTek Grid, and many more. Public Forums: Global Grid Forum, Australian Grid Forum. Conferences: CCGrid, Grid, P2P, HPDC. http://www.gridcomputing.com 1.3 billion – 3 yrs; 1 billion – 5 yrs; 450million – 5 yrs; 486million – 5 yrs; 1.3 billion (Rs); 27 million; 2? billion; 120million – 5 yrs

9 9 The Gridbus Project @ Melbourne: Enable Leasing of ICT Services on Demand WWG World Wide Grid!  On Demand Utility Computing Gridbus Distributed Data

10 10 The Gridbus Project: http://www.gridbus.org A multi-institutional “Open Source” R&D Project with focus on: Architecture, Specification, and Open Source Reference Implementation; Service-Oriented Grid, Utility Computing & Distributed Data and Computation Economy; Scaling from Desktops, Clusters, Cluster Federation, Enterprise Grids to Global Grids. Components: Grid Market Directory and Web Services; Grid Bank: Accounting and Transaction Management; Visual Tools for Creation of Distributed Applications; Workflow Composition and Deployment Services; Data Grid Brokering and Grid Economy Services; Data Replication Strategies; GridSim Toolkit: Enhanced to support Data Grid, Reservation, etc.; Libra: Economic Cluster Scheduler, Coupling of Clusters and Computational Economy; Alchemi: Harnessing .NET/Windows-based Resources; WWG: Global Data Intensive Grid Testbed. Application Enabler Projects: High-Energy Physics, Astronomy, Brain Activity Analysis (Osaka U.), Natural Language Processing, Portfolio Analysis (Spain), BioGrid - WEHI (via APACGrid), SensorGrid (NICTA), Medical Imaging (HFI).

11 11 Grid Economy: Methodology for Sustained Resource Sharing and Managing Supply-and-Demand for Resources

12 12 New Challenges of Grid Economy. Grid Service Providers (GSPs): How do I decide service pricing models? How do I specify them? How do I translate them into resource allocations? How do I enforce them? How do I advertise and attract consumers? How do I do accounting and handle payments? … Grid Service Consumers (GSCs): How do I decide expenses? How do I express QoS requirements? How do I trade between timeframe and cost? How do I map jobs to resources to meet my QoS needs? … They need mechanisms and technologies for value expression, value translation, and value enforcement.

13 GRACE: Service Oriented Grid Architecture GRid Architecture for Computational Economy (GRACE)

14 14 GRACE: A Reference Service-Oriented Grid Architecture for Computational Economies [Diagram labels: Grid Consumer; Programming Environments; Grid Resource Broker; Grid Service Providers; Grid Explorer; Schedule Advisor; Trade Manager; Job Control Agent; Deployment Agent; Trade Server; Resource Allocation; Resource Reservation; Misc. services; Information Service; Pricing Algorithms; Accounting; Grid Middleware Services; Health Monitor; Grid Market Services; JobExec; Info; Secure; Trading; QoS; Storage; Sign-on; Grid Bank; Applications; Data Catalogue; Grid Node 1 … Grid Node N; resources R1, R2, … Rm]

15 15 Gridbus and Complementary Grid Technologies – realizing GRACE [Layered diagram labels: Grid Applications (Portals, Science, Commerce, Engineering, Collaboratories, …); User-Level Middleware (Grid Tools); Core Grid Middleware; Grid Fabric Software; Grid Fabric Hardware; Worldwide Grid; AIX, Solaris, Windows, Linux, IRIX, OSF1, Mac; .NET; JVM; Condor, SGE, Tomcat, PBS; MPI; PDB, CDB; Globus, Unicore, NorduGrid, XGrid, …; Alchemi; Libra; Grid Bank; Grid Exchange & Federation; Grid Market Directory; Grid Storage Economy; Grid Economy; Grid Brokers: X-Parameter Sweep Lang., Gridbus Data Broker, Nimrod-G; Workflow; Workflow Engine; ExcelGrid; GridSim; Gridscape]

16 16 Gridbus Technologies. Application Construction Tools: Visual Parametric Modeller (VPM). Grid Economy Services: Grid Market Directory, a registry for publication of GSPs and their services (VO/VE); Grid Bank, Grid accounting services; Grid Trading Services. Data Grid Service Broker: QoS-based scheduling of distributed data-oriented apps on global Grids. Grid Workflow Management System. Gridscape: interactive Grid testbed portal generator. G-monitor: Grid application execution management portal. GridSim: a Grid simulation toolkit. Libra: economy-based cluster scheduling.

17 17 Alchemi: .NET-based Enterprise Grid Platform & Web Services [Diagram labels: Internet; Alchemi Worker Agent; Alchemi Manager; Alchemi Users; Web Services] SETI@Home-like model; general purpose; dedicated/non-dedicated workers; role-based security; .NET and Web Services; C# implementation; GridThread and Job Model programming; easy to set up and use

18 18 On Demand Assembly of Services: Putting Them All Together Data Source (Instruments/distributed sources) Data Replicator (GDMP) ASP Catalogue Grid Info Service Grid Market Directory GSP (Accounting Service) Gridbus GridBank Data GSP (e.g., UofM) PE GSP (e.g., VPAC) PE GSP (e.g., IBM) CPU or PE Grid Service (GS) (Globus) Alchemi GS GTS Cluster Scheduler Grid Service Provider (GSP) (e.g., CERN) PE Cluster Scheduler Job 8 Grid Resource Broker 2 Visual Application Composer Application Code Explore data 1 36 45 Results 97 Results+ Cost Info 10 11 Bill 12 Data Catalogue

19 Creation and Operation of Virtual Enterprises Grid Market Directory Grid Bank

20 20 A Market-Oriented Grid Environment

21 21 Grid Market Infrastructure Grids need to provide an infrastructure that supports: (a) the creation of one or more GMP registries; (b) the contributors to register themselves as GSPs along with their resources/application services that they wish to provide; (c) GSPs to publish themselves in one or more GMPs along with service prices; and (d) Grid resource brokers to discover resources/services and their attributes (e.g., access price and usage constraints) that meet user QoS requirements.
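The publish-and-discover cycle in points (b) to (d) can be sketched as a toy in-memory registry. This is only an illustration under stated assumptions: the class names, the provider names (`gsp-a`, `gsp-b`, `gsp-c`), and the prices are hypothetical, and nothing here reflects the actual Grid Market Directory interface.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceOffer:
    provider: str                 # hypothetical GSP name
    service: str                  # kind of service offered, e.g. "cpu"
    price: float                  # access price, assumed to be in G$ per CPU-second
    constraints: dict = field(default_factory=dict)  # usage constraints, if any


class MarketDirectory:
    """Toy registry: GSPs publish offers, brokers query them by price."""

    def __init__(self):
        self._offers = []

    def publish(self, offer):
        # A GSP registers one service together with its price.
        self._offers.append(offer)

    def discover(self, service, max_price):
        # A broker looks up offers for a service within its price limit,
        # cheapest first.
        matches = [o for o in self._offers
                   if o.service == service and o.price <= max_price]
        return sorted(matches, key=lambda o: o.price)


gmd = MarketDirectory()
gmd.publish(ServiceOffer("gsp-a", "cpu", 2.0))
gmd.publish(ServiceOffer("gsp-b", "cpu", 6.0))
gmd.publish(ServiceOffer("gsp-c", "storage", 1.0))
cheap_cpu = gmd.discover("cpu", max_price=4.0)
```

A real directory would also need authentication, persistence, and richer QoS attributes; the sketch only shows the registration and lookup roles.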

22 22 Grid Bank: Grid Transactions Authorization, Accounting, & Payment Infrastructure Grid Resource Broker (GRB) GridBank Payment Module Grid Trade Server GridBank Charging Module GridBank Server Establish Service Costs Applications Grid Agent Grid Resource Meter GridCheque Deploy Agent and Submit Jobs Usage Agreement Resource Usage GridCheque Grid Service Provider (GSP) GridCheque + Resource Usage (GSC Account Charge) Grid Service Consumer (GSC) R1 R2 R3 R4 User

23 Grid Applications: Composition and Deployment – A Broker Perspective Nimrod-G Broker: A Grid Broker for Computational Grids Gridbus Broker: A Grid Service Broker for Data Grids

24 24 Grid Applications and Parametric Computing. Bioinformatics: Drug Design / Protein Modelling; Sensitivity experiments on smog formation; Natural Language Engineering; Ecological Modelling: Control Strategies for Cattle Tick; Electronic CAD: Field Programmable Gate Arrays; Computer Graphics: Ray Tracing; High Energy Physics: Searching for Rare Events; Finance: Investment Risk Analysis; VLSI Design: SPICE Simulations; Aerospace: Wing Design; Network Simulation; Automobile: Crash Simulation; Data Mining; Civil Engineering: Building Design; Astrophysics

25 25 Thesis: Build a task farming application (parameter sweep or bag of tasks) and execute it on the Grid within “T” hours or earlier, at a cost not exceeding $M. Three options/solutions, ranging from manual to automated: use pure Globus commands; build your own distributed application and scheduler; use the Gridbus Resource Broker to compose and schedule.

26 The Gridbus Grid Service Broker for Data Grid Applications Builds on the Nimrod-G Computational Grid Broker and Computational Economy [Buyya, Abramson, Giddy, Monash University, 1999-2001] And Extends its notion for Data and Service Grids

27 27 Grid Service Broker (GSB): A resource broker for scheduling task farming data Grid applications with static or dynamic parameter sweeps on global Grids. It uses the computational economy paradigm for optimal selection of computational and data services depending on their quality, cost, and availability, and on users’ QoS requirements (deadline, budget, and T/C optimisation). Key Features: a single window to manage and control an experiment; programmable task farming engine; resource discovery and resource trading; optimal data source discovery; scheduling and predictions; generic dispatcher and Grid agents; transportation of data and sharing of results; accounting.

28 28 Gridbus Broker at a Glance [Diagram labels: Home Node/Portal; Gridbus Broker; Globus (Job manager: fork(), batch(); PBS, Condor, SGE); Alchemi; Gateway; Unicore (fork(), batch(); PBS, Condor, Alchemi); Data Store; Access Technology: Grid FTP, SRB; Gridbus agent; Data Catalog; Credential Repository: MyProxy]

29 29 Gridbus Broker Architecture Grid Middleware Gridbus Client Gridbus Client Grid Info Server Schedule Advisor Trading Manager Gridbus Farming Engine Record Keeper Grid Explorer GE GIS, NWS TM TS RM & TS Grid Dispatcher RM: Local Resource Manager, TS: Trade Server G G C U Globus enabled node. A L Alchemi enabled node. (Data Grid Scheduler) Data Catalog Data Node Unicore enabled node. $ $ $ App, T, $, Opt (Bag of Tasks Applications)

30 30 Gridbus Services for eScience Applications. Application Development Environment: XML-based language for composition of task farming (legacy) applications as parameter sweep applications; task farming APIs for new applications; Web APIs (e.g., Portlets) for Grid portal development; workflow interface and Gridbus-enabled workflow engine. Resource Allocation and Scheduling: dynamic discovery of optimal computational and data nodes that meet user QoS requirements. Hide low-level Grid middleware interfaces: Globus, Alchemi, Unicore, NorduGrid, XGrid, etc.

31 31 Gridbus Broker: XML file 1 1 main./calc $X $Y
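The transcript keeps only the command `./calc $X $Y` from this slide's XML, so the parameter definitions are not recoverable. As a rough illustration of what a parameter sweep over such a command amounts to, the sketch below expands every combination of two parameters into one job command. The function name `expand_sweep` and the value lists for `X` and `Y` are assumptions made for the example, not content from the slide or the broker's XML schema.

```python
from itertools import product
from string import Template


def expand_sweep(command, parameters):
    """Return one command line per combination of parameter values."""
    names = list(parameters)
    template = Template(command)
    jobs = []
    # Cartesian product: every value of X is paired with every value of Y.
    for values in product(*(parameters[name] for name in names)):
        jobs.append(template.substitute(dict(zip(names, values))))
    return jobs


# Hypothetical value ranges; the slide's real ranges are not in the transcript.
jobs = expand_sweep("./calc $X $Y", {"X": [1, 2], "Y": [10, 20, 30]})
for line in jobs:
    print(line)
```

Each resulting line is an independent task, which is what makes this class of application easy to farm out across Grid nodes.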

32 32 Portal-based Access to Grid Broker for Launching and Steering Applications Grid Broker World-Wide Grid

33 33 Figure 3 : Logging into the portal. Drug Design Made Easy!

34 34 Excel Plugin to Access Gridbus Services [Diagram labels: Excel; ExcelGrid Add-In; ExcelGrid Runner; ExcelGridJob; ExcelGrid Middleware; Gridbus Broker; Enterprise Grid]

35 35 Adaptive Scheduling Steps [Flowchart labels: Discover Resources; Distribute Jobs; Establish Rates; Meet requirements? (Remaining Jobs, Deadline, & Budget?); Evaluate & Reschedule; Discover More Resources; Compose & Schedule]
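One way to read the adaptive scheduling steps is as a loop that repeatedly checks whether the currently used resources can finish the remaining jobs before the deadline and within budget, and pulls in more resources when they cannot. The sketch below is a simplified, deterministic illustration of that idea, not the broker's implementation: time is counted in abstract rounds, and the resource tuples `(name, jobs_per_round, cost_per_job)` and all numbers are hypothetical.

```python
def adaptive_schedule(total_jobs, deadline_rounds, budget, resource_pool, initial=1):
    """Return (jobs_left, spent, rounds_used, names_of_resources_used)."""
    active = resource_pool[:initial]          # resources discovered so far
    remaining, spent, rounds = total_jobs, 0.0, 0
    while remaining > 0 and rounds < deadline_rounds:
        rounds_left = deadline_rounds - rounds
        capacity = sum(rate for _, rate, _ in active)
        # Evaluate: if the active set cannot finish in time, discover more.
        while capacity * rounds_left < remaining and len(active) < len(resource_pool):
            active = resource_pool[:len(active) + 1]
            capacity = sum(rate for _, rate, _ in active)
        # Distribute: give each resource as many jobs as it can take this
        # round without exceeding the remaining budget.
        for _, rate, cost in active:
            affordable = int((budget - spent) // cost)
            n = min(rate, remaining, affordable)
            remaining -= n
            spent += n * cost
        rounds += 1
    return remaining, spent, rounds, [name for name, _, _ in active]


pool = [("a", 10, 2.0), ("b", 10, 4.0), ("c", 20, 6.0)]   # hypothetical resources
result = adaptive_schedule(100, deadline_rounds=5, budget=1000, resource_pool=pool)
```

With these numbers the loop adds resource `b` in the first round because `a` alone cannot finish 100 jobs in five rounds, and it never needs the more expensive `c`.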

36 36 Deadline (D) and Budget (B) Constrained Scheduling Algorithms

| Algorithm | Execution Time (D) | Execution Cost (B) | Compute Grid / Data Grid |
|---|---|---|---|
| Cost Opt | Limited by D | Minimize | Yes |
| Cost-Time Opt | Minimize if possible | Minimize | Yes |
| Time Opt | Minimize | Limited by B | Yes |
| Conservative-Time Opt | Minimize | Limited by B, jobs have guaranteed minimum budget | Yes |
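The difference between cost optimisation (minimise cost, limited by the deadline) and time optimisation (minimise time, limited by the budget) can be illustrated with two small greedy planners. This is a static sketch under simplifying assumptions, not the broker's scheduling code: every job takes a fixed number of seconds on a given resource, prices are per second, and the two resources `cheap` and `fast` are hypothetical.

```python
import heapq


def cost_opt(n_jobs, resources, deadline):
    """Fill the cheapest resources first, as far as the deadline allows."""
    plan = {}
    for name, secs, price in sorted(resources, key=lambda r: r[1] * r[2]):
        capacity = int(deadline // secs)      # jobs this resource can finish in time
        take = min(capacity, n_jobs)
        plan[name] = take
        n_jobs -= take
    if n_jobs > 0:
        raise ValueError("deadline cannot be met")
    return plan


def time_opt(n_jobs, resources, budget):
    """Give each job to the resource that finishes it earliest, if affordable."""
    heap = [(secs, name, secs, secs * price) for name, secs, price in resources]
    heapq.heapify(heap)                       # ordered by next finish time
    plan = {name: 0 for name, _, _ in resources}
    spent = 0.0
    for _ in range(n_jobs):
        while heap and spent + heap[0][3] > budget:
            heapq.heappop(heap)               # this resource is no longer affordable
        if not heap:
            raise ValueError("budget cannot be met")
        finish, name, secs, job_cost = heapq.heappop(heap)
        plan[name] += 1
        spent += job_cost
        heapq.heappush(heap, (finish + secs, name, secs, job_cost))
    return plan


# (name, seconds per job, price per second): hypothetical values.
resources = [("cheap", 60, 1), ("fast", 20, 6)]
relaxed = cost_opt(10, resources, deadline=600)
tight = cost_opt(10, resources, deadline=300)
fastest = time_opt(10, resources, budget=10000)
```

With a relaxed deadline the cost-optimising planner uses only the cheap resource; tightening the deadline forces part of the work onto the fast, expensive one, while the time-optimising planner favours the fast resource from the start.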

37 37 Sample Applications of Gridbus Broker. Molecular Docking (WEHI): drug discovery. Brain Activity Analysis (Osaka University): neuroscience studies. Natural Language Engineering (Melbourne): NLP indexing of newswire data. High Energy Physics (School of Physics, Melbourne): Belle experiment data analysis. Finance, Portfolio Analysis (U. Comp. Madrid, Spain): investment risk analysis. Astronomy: Australian Virtual Observatory. Spreadsheet Processing: Microsoft Excel.

38 Economy-based Data Grid Scheduling High Energy Physics as eScience Application Case Study

39 39 Case Study: High Energy Physics. What is High Energy Physics (HEP)? The study of the fundamental constituents of matter and forces. High Energy Physics: using high energies enables the probing of smaller distances/structures and study in early-universe-like environments. Particle Physics: quanta of matter/forces and their properties. The Belle Experiment: KEK B-Factory, Japan. Investigating fundamental violation of symmetry in nature (Charge Parity), which may help explain the universal matter-antimatter imbalance. Collaboration: 400 people, 50 institutes; 100s of TB of data currently.

40 40 Case Study: Event Simulation and Analysis B0->D*+D*-Ks Simulation and Analysis Package - Belle Analysis Software Framework (BASF) Experiment in 2 parts – Generation of Simulated Data and Analysis of the distributed data Only the Analysis is discussed here

41 41 Australian Belle Data Grid Testbed

42 42 Case Study: Input File for Analysis
parameter jobf Gridfile lfn:/users/winton/fsimddks/fsimdata*.mdst;
task main
copy runme.grid2 node:runme.grid2
node:execute ./runme.grid2 $jobf $jobname
endtask
Annotations: jobf is a dynamic parameter defined to describe an input data file. The logical file name (lfn) points to the location in the replica catalog that contains a mapping to where the physical files are present. 100 data files (30MB each) were equally distributed among the five nodes.
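The wildcard in the logical file name suggests that the broker creates one job per matching data file. A minimal sketch of that expansion follows, assuming a replica catalog that can be modelled as a dictionary from logical file names to hosts; the catalog entries, the host names `node-a`, `node-b`, `node-c`, and the job naming scheme are hypothetical and stand in for whatever the real replica catalog returns.

```python
from fnmatch import fnmatchcase


def expand_gridfile(pattern, catalog):
    """Return one (jobname, lfn, replica_hosts) tuple per matching logical file."""
    matches = sorted(lfn for lfn in catalog if fnmatchcase(lfn, pattern))
    return [(f"j{i}", lfn, catalog[lfn]) for i, lfn in enumerate(matches, start=1)]


# Hypothetical catalog: logical file name -> hosts holding a physical copy.
catalog = {
    "/users/winton/fsimddks/fsimdata1.mdst": ["node-a"],
    "/users/winton/fsimddks/fsimdata2.mdst": ["node-b", "node-c"],
    "/users/winton/fsimddks/readme.txt": ["node-a"],
}
jobs = expand_gridfile("/users/winton/fsimddks/fsimdata*.mdst", catalog)
for jobname, lfn, hosts in jobs:
    print(jobname, lfn, hosts)
```

In this reading, `$jobf` would be bound to each matching file in turn and `$jobname` to the generated job name, so the 100 data files of the case study would yield 100 jobs.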

43 43 Resources Used and their Service Price

| Organization | Node details | Role | Cost (in G$/CPU-sec) |
|---|---|---|---|
| CS, UniMelb | belle.cs.mu.oz.au: 4 CPU, 2GB RAM, 40 GB HD, Linux | Broker host, Data host, NWS server | N.A. (Not used as a compute resource) |
| Physics, UniMelb | fleagle.ph.unimelb.edu.au: 1 CPU, 512 MB RAM, 40 GB HD, Linux | Replica Catalog host, Data host, Compute resource, NWS sensor | 2 |
| CS, University of Adelaide | belle.cs.adelaide.edu.au: 4 CPU (only 1 available), 2GB RAM, 40 GB HD, Linux | Data host, NWS sensor | N.A. (Not used as a compute resource) |
| ANU, Canberra | belle.anu.edu.au: 4 CPU, 2GB RAM, 40 GB HD, Linux | Data host, Compute resource, NWS sensor | 4 |
| Dept of Physics, USyd | belle.physics.usyd.edu.au: 4 CPU (only 1 available), 2GB RAM, 40 GB HD, Linux | Data host, Compute resource, NWS sensor | 4 |
| VPAC, Melbourne | brecca-2.vpac.org: 180 node cluster (only head node used), Linux | Compute resource, NWS sensor | 6 |

44 44 Network Cost (in Grid $/Currency!)

45 45 Deploying Application Scenario: A data grid scenario with 100 jobs, each accessing remote data of ~30MB. Deadline: 3hrs. Budget: G$ 60K. Scheduling Optimisation Scenarios: Minimise Time; Minimise Cost. Results:

46 46 Time Minimization in Data Grids [Plot: number of jobs completed (axis 0 to 80) against time in minutes (1 to 42), one series each for fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, and brecca-2.vpac.org]

47 47 Results: Cost Minimization in Data Grids [Plot: number of jobs completed (axis 0 to 100) against time in minutes (1 to 63), one series each for fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, and brecca-2.vpac.org]

48 48 Observation

| Organization | Node details | Cost (in G$/CPU-sec) | Total Jobs Executed: Time | Total Jobs Executed: Cost |
|---|---|---|---|---|
| CS, UniMelb | belle.cs.mu.oz.au: 4 CPU, 2GB RAM, 40 GB HD, Linux | N.A. (Not used as a compute resource) | - | - |
| Physics, UniMelb | fleagle.ph.unimelb.edu.au: 1 CPU, 512 MB RAM, 40 GB HD, Linux | 2 | 3 | 94 |
| CS, University of Adelaide | belle.cs.adelaide.edu.au: 4 CPU (only 1 available), 2GB RAM, 40 GB HD, Linux | N.A. (Not used as a compute resource) | - | - |
| ANU, Canberra | belle.anu.edu.au: 4 CPU, 2GB RAM, 40 GB HD, Linux | 4 | 2 | 2 |
| Dept of Physics, USyd | belle.physics.usyd.edu.au: 4 CPU (only 1 available), 2GB RAM, 40 GB HD, Linux | 4 | 72 | 2 |
| VPAC, Melbourne | brecca-2.vpac.org: 180 node cluster (only head node used), Linux | 6 | 23 | 2 |
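A quick arithmetic check of these figures, reading the fused digits of the table as price, jobs under time minimisation, and jobs under cost minimisation for each compute node (2/3/94, 4/2/2, 4/72/2, 6/23/2). That reading is an interpretation of the transcript, supported by both job columns then summing to the 100 jobs of the scenario. The price-weighted totals below are only a relative indicator: they assume every job uses the same CPU time on every node and they ignore network cost, neither of which the slide states.

```python
# node -> (price in G$/CPU-sec, jobs under time opt, jobs under cost opt)
rows = {
    "fleagle.ph.unimelb.edu.au": (2, 3, 94),
    "belle.anu.edu.au": (4, 2, 2),
    "belle.physics.usyd.edu.au": (4, 72, 2),
    "brecca-2.vpac.org": (6, 23, 2),
}

time_jobs = sum(t for _, t, _ in rows.values())
cost_jobs = sum(c for _, _, c in rows.values())

# Price-weighted job counts: proportional to compute cost only under the
# equal-CPU-time assumption stated above.
time_weighted = sum(p * t for p, t, _ in rows.values())
cost_weighted = sum(p * c for p, _, c in rows.values())

print(time_jobs, cost_jobs)          # 100 100
print(time_weighted, cost_weighted)  # 440 216
```

Under those assumptions, cost minimisation concentrates 94 of the 100 jobs on the cheapest node and roughly halves the price-weighted total relative to time minimisation, which instead spreads jobs towards the nodes that complete them soonest.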

49 49 Grid Workflow Management System and Broker Services [Diagram labels: Scientific Portal; Application Composition; Workflow description & QoS; Workflow Planner; Workflow Submission Handler; Workflow Language Parser (Tasks, Parameters, Dependencies); Workflow Enactment Engine; Workflow Scheduler; Resource Discovery; Dispatcher; Data Movement; Database; GMD; Replica Catalog; Info Service (MDS); Gridbus Broker; Globus; Web services; Data transfer: HTTP, GridFTP]

50 50 The GridSim Toolkit: A Java-based tool for Grid Scheduling Simulations [Layered diagram labels: Basic Discrete Event Simulation Infrastructure; Virtual Machine (Java, cJVM, RMI); Distributed Resources (PCs, Clusters, Workstations, SMPs, ...); GridSim Toolkit; Application Modeling; Information Services; Resource Allocation; Grid Resource Brokers or Schedulers; Simulation; Statistics; Resource Modeling and Simulation (with Time and Space shared schedulers); Job Management; Clusters; Single CPU; Reservation; SMPs; Load Pattern; Application Configuration; Resource Configuration; Visual Modeler; Grid Scenario; Network; SimJava; Distributed SimJava; Resource Entities; Output; Application, User, Grid Scenario's Input and Results; Add your own policy for resource allocation]

51 51 Selected GridSim Users - 2002

52 52 Summary and Conclusion. Introduced requirements for an eScience application. Demonstrated the suitability of Grid computing as Cyberinfrastructure for eScience and eBusiness. Grids exploit synergies that result from cooperation of autonomous entities: resource sharing, dynamic provisioning, and aggregation at global level. Grids allow users to dynamically lease Grid services at runtime based on their quality, cost, and availability, and on users' QoS requirements, delivering ICT services as computing utilities. Grids offer enormous opportunities for realizing eScience and eBusiness at global level.

53 53 Any Questions ? http://www.gridbus.org

54 54 Testbed Details. All nodes ran Globus 2.4.2. ANU and Melbourne CS had 4 CPUs each. The Sydney node was effectively 1 processor (SMP kernel was disabled). The Adelaide Globus Gatekeeper was not functioning; however, we could get data off it. BASF was pre-installed on all the machines (a gigabyte of code).

