

1 From Clusters to Grids
October 2003 – Linköping, Sweden
Andrew Grimshaw, Department of Computer Science, University of Virginia
CTO & Founder, Avaki Corporation

2 Agenda
- Grid Computing Background
- Legion
- Existing Systems & Standards
- Summary

3 Grid Computing

4 First: What is a Grid System?
A Grid system is a collection of distributed resources connected by a network. Examples of distributed resources:
- Desktop and handheld hosts
- Devices with embedded processing resources, such as digital cameras and phones
- Tera-scale supercomputers

5 What is a Grid?
A grid is all about gathering resources together and making them accessible to users and applications. A grid enables users to collaborate securely by sharing processing, applications, and data across heterogeneous systems and administrative domains, for faster application execution and easier access to data.
- Compute Grids
- Data Grids

6 What are the characteristics of a Grid system?
- Numerous resources
- Ownership by mutually distrustful organizations and individuals
- Potentially faulty resources
- Different security requirements and policies
- Required resources are heterogeneous
- Geographically separated
- Different resource management policies
- Connected by heterogeneous, multi-level networks


8 Technical Requirements of a Successful Grid Architecture
- Simple
- Secure
- Scalable
- Extensible
- Site autonomy
- Persistence & I/O
- Multi-language and legacy support
- Single namespace
- Transparency
- Heterogeneity
- Fault tolerance & exception management
Success requires an integrated solution AND flexible policy. Manage complexity!

9 Implication: Complexity is THE Critical Challenge
How should complexity be addressed?

10 High-level versus low-level solutions
[Chart: robustness versus time & cost to develop]
A low-level "sockets & shells" approach is low in robustness and high in time and cost to develop. An integrated approach is high in robustness and low in time and cost to develop. As application complexity increases, the differences between the two approaches increase dramatically.

11 The Importance of Integration in a Grid Architecture
- If separate pieces are used, then the programmer must integrate the solutions.
- If all the pieces are not present, then the programmer must develop enough of the missing pieces to support the application.
Bottom line: both raise the bar by putting the cognitive burden on the programmer.

12 Misconceptions about Grids
- "Grids are simple cycle aggregation": the current state of practice is essentially scheduling and queuing for CPU cluster management.
- These definitions sell short the promise of Grid technology.
AVAKI believes grids are not just about aggregating and scheduling CPU cycles, but also about:
- Virtualizing many types of resources, internally and across domains
- Empowering anyone to have secure access to any and all resources through easy administration

13 Compute Grid Categories
- Sons of SETI@home (United Devices, Entropia, Data Synapse): low-end desktop cycle aggregation; a hard sell in corporate America.
- Cluster load management (LSF, PBS, SGE): high end, great for managing local clusters, but not well proven in multi-cluster environments.
As soon as you go beyond the local cluster to cross-domain, multi-cluster environments, the game changes dramatically with the introduction of three major issues: data, security, and administration. To address these issues, you need a fully integrated solution, or a toolkit to build one.

14 Typical Grid Scenarios
- Desktop cycle aggregation: desktop only (United Devices, Entropia, Data Synapse)
- Cluster & departmental grids: single owner, platform, domain, file system, and location (Sun SGE, Platform LSF, PBS)
- Enterprise grids: single enterprise; multiple owners, platforms, domains, file systems, locations, and security policies (Sun SGE EE, Platform Multi-cluster)
- Global grids: multiple enterprises, owners, platforms, domains, file systems, locations, and security policies (Legion, Avaki, Globus)

15 What are grids being used for today? Typical customer situations:
- Multiple sites with multiple data sources (public and private)
- Need secure access to data and applications for sharing
- Partnership relationships with other organizations: internal, partners, or customers
- Computationally challenging applications
- Distributed R&D groups across companies, networks, and geographies
- Staging large files
- Want to utilize and leverage heterogeneous compute resources
- Need for accounting of resources
- Need to handle multiple queuing systems
- Considering purchasing compute cycles for spikes in demand

16 Legion

17 Legion Grid Software
Wide-area access to data, processing, and application resources in a single, uniform operating environment that is secure and easy to administer.
[Diagram: users and applications reach desktops, servers, clusters, applications, data, and load management & queuing systems across Department A, Department B, a partner, and a vendor through the Legion grid]
Legion grid capabilities:
- Wide-area data access
- Distributed processing
- Global naming
- Policy-based administration
- Resource accounting
- Fine-grained security
- Automatic failure detection and recovery

18 Legion Combines Data and Compute Grids
[Diagram: the Legion grid connects users and applications to desktops, servers, clusters, applications, data, and load management & queuing systems across Department A, Department B, a partner, and a vendor]

19 The Legion Data Grid

20 Data Grid
Wide-area access to data at its source location based on business policies, eliminating manual copying and the errors caused by accessing out-of-date copies.
[Diagram: users and applications reach data across departments, a partner, and a vendor through the Legion grid]
Data grid capabilities:
- Federates multiple data sources
- Provides global naming
- Works with local and virtual file systems (NFS, XFS, CIFS)
- Accesses data in DAS, NAS, and SAN
- Uses standard interfaces
- Caches data locally

21 Data Grid: Share
[Diagram: users and applications on Linux, NT, and Solaris at headquarters, a research center, an informatics partner, and a tools vendor share data through the grid]
- Data is mapped into the grid namespace via Legion ExportDir.
- The Legion Data Grid transparently handles client and application requests, maps them to the global namespace, and returns the data.
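The ExportDir idea above, mapping site-local directories into one global namespace, can be sketched in a few lines of Python. This is an illustrative model only: the class, method names, and paths are hypothetical, not Legion's actual API.

```python
# Sketch of a global namespace: sites export local directories under a
# global prefix, and requests on global paths resolve back to the
# exporting site's local path (longest matching prefix wins).
class GridNamespace:
    def __init__(self):
        self.exports = {}  # global prefix -> (site, local prefix)

    def export_dir(self, site, local_prefix, global_prefix):
        """Map a site-local directory into the global namespace."""
        self.exports[global_prefix] = (site, local_prefix)

    def resolve(self, global_path):
        """Translate a global path to (site, local path)."""
        for gp in sorted(self.exports, key=len, reverse=True):
            if global_path.startswith(gp):
                site, lp = self.exports[gp]
                return site, lp + global_path[len(gp):]
        raise FileNotFoundError(global_path)

ns = GridNamespace()
ns.export_dir("research-center", "/data/sequences", "/grid/informatics/sequences")
print(ns.resolve("/grid/informatics/sequences/sequence_a"))
# -> ('research-center', '/data/sequences/sequence_a')
```

The point of the indirection is that clients name data by its global path and never need accounts on, or knowledge of, the machine where the data actually lives.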

22 Data Grid: Access
[Diagram: users and applications at headquarters, a research center, an informatics partner, and a tools vendor access files (sequence_a, sequence_b, sequence_c, App_A, BLAST) on servers and clusters (RD-2, PM-1, HQ-1) through a fine-grained-security access point]
- Access files using the standard NFS protocol or Legion commands; NFS security issues are eliminated, and caches exploit the access semantics.
- Access files using a global name.
- Access is based on specified privileges.

23 Data Grid: Access Using Virtual NFS
[Diagram: Legion-NFS connects Department A, Department B, and a partner, with fine-grained security, sharing files such as sequence_a and sequence_c]
- Complexity = servers + clients
- Clients mount the grid; servers share files to the grid.
- Clients access data using the NFS protocol.
- Wide-area access to data outside the administrative domain.

24 Keeping Data in the Grid: Legion Storage Servers
[Diagram: a namespace rooted at / with files a through h spread across five Legion storage servers, each with its own local disk]
- Data is copied into Legion storage servers that execute on a set of hosts. The particular set of hosts used is a configuration option; here five hosts are used.
- Access to the different files is completely independent and asynchronous.
- Very high sustained read/write bandwidth is possible using commodity resources.
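Why independent access yields aggregate bandwidth is easiest to see from the placement scheme: each file lives on one of the configured hosts, so requests for different files hit different servers. The sketch below uses simple hash-based placement; it is illustrative and not Legion's actual placement algorithm.

```python
# Illustrative file-to-storage-server placement across a configurable
# host set: a deterministic hash picks the server holding each file, so
# reads of different files go to different servers independently.
import hashlib

HOSTS = ["host1", "host2", "host3", "host4", "host5"]  # configuration option

def storage_host(filename, hosts=HOSTS):
    """Deterministically pick the storage server holding a file."""
    digest = hashlib.sha256(filename.encode()).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]

# Files a..h scatter across the five hosts; access to each is independent.
placement = {name: storage_host(name) for name in "abcdefgh"}
```

With placement deterministic, any client can locate a file's server without a central lookup, and sustained bandwidth scales with the number of servers actually hit.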

25 I/O Performance
[Chart: read performance in NFS, Legion-NFS, and the Legion I/O libraries. The x-axis indicates the number of clients simultaneously performing 1 MB reads on 10 MB files; the y-axis indicates total read bandwidth.]
All results are the average of multiple runs. All clients ran on 400 MHz Intel machines; the NFS server ran on an 800 MHz Intel server.
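The measurement described can be reproduced in miniature: several client threads each stream a 10 MB file in 1 MB reads, and aggregate bandwidth is total bytes over wall-clock time. This local sketch exercises local disk rather than NFS or Legion, so the absolute numbers will differ from the chart.

```python
# Minimal local analogue of the benchmark: n clients each perform 1 MB
# reads over their own 10 MB file; report aggregate read bandwidth.
import os
import tempfile
import threading
import time

CHUNK = 1 << 20          # 1 MB reads
FILE_SIZE = 10 * CHUNK   # 10 MB files

def make_test_file():
    f = tempfile.NamedTemporaryFile(delete=False)
    f.write(os.urandom(FILE_SIZE))
    f.close()
    return f.name

def client(path, totals, i):
    """One client: read the whole file in 1 MB chunks."""
    read = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(CHUNK):
            read += len(chunk)
    totals[i] = read

def aggregate_bandwidth(n_clients):
    paths = [make_test_file() for _ in range(n_clients)]
    totals = [0] * n_clients
    start = time.perf_counter()
    threads = [threading.Thread(target=client, args=(p, totals, i))
               for i, p in enumerate(paths)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    for p in paths:
        os.unlink(p)
    return sum(totals) / elapsed  # bytes/second across all clients

bw = aggregate_bandwidth(4)
```

Averaging multiple runs, as the slide notes, matters here: caching and scheduling noise dominate single runs at these file sizes.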

26 Data Grid Benefits
- Easy, convenient, wide-area access to data, regardless of location, administrative domain, or platform
- Eliminates time-consuming copying and obtaining accounts on machines where data resides
- Provides access to the most recent data available
- Eliminates confusion and errors caused by inconsistent naming of data
- Caches remote data for improved performance
- Requires no changes to legacy or commercial applications
- Protects data with fine-grained security and limits access privileges to those required
- Eases data administration and management
- Eases migration to new storage technologies

27 The Legion Compute Grid

28 Compute Grid
Wide-area access to processing resources based on business policies, managing utilization of processing resources for fast, efficient job completion.
[Diagram: users and applications reach desktops, servers, clusters, applications, and data across departments, a partner, and a vendor through the Legion grid]
Compute grid capabilities:
- Job scheduling and priority-based queuing
- Easy integration with third-party load management and queuing software
- Automatic staging of data and applications
- Efficient processing of both sequential and parallel applications
- Failure detection and recovery
- Usage accounting

29 Compute Grid: Access
[Diagram: users and applications submit jobs (App_A, BLAST) through a login/submission point with fine-grained security to Solaris, NT, and Linux servers and clusters (RD-2, PM-1, HQ-1) at headquarters, a research center, an informatics partner, and a tools vendor, with scheduling, queuing, usage management, accounting, and recovery]
The grid:
- Locates resources
- Authenticates and grants access privileges
- Stages applications and data
- Detects failures and recovers
- Writes output to the specified location
- Accounts for usage
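The submission lifecycle listed above can be sketched as one function: authorize, locate a resource, run, retry elsewhere on failure, and record usage. All names and data structures here are illustrative, not Legion's actual interfaces.

```python
# Hypothetical sketch of the grid job lifecycle: authorization check,
# resource location, execution with failure detection and retry, and
# usage accounting on success.
def submit(job, resources, acl, max_retries=3):
    usage_log = []
    # Authenticate / authorize: only resources the user may use qualify.
    if acl.get(job["user"]) != "allowed":
        raise PermissionError("user not authorized")
    for attempt in range(max_retries):
        resource = resources[attempt % len(resources)]  # locate a resource
        try:
            result = resource["run"](job)               # stage + execute (may fail)
        except RuntimeError:
            continue                                    # detect failure, recover
        usage_log.append((job["user"], resource["name"]))  # account for usage
        return result, usage_log
    raise RuntimeError("job failed on all attempts")

# Demo: a resource that crashes on the first attempt, then succeeds.
attempts = {"n": 0}
def flaky_run(job):
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("node crashed")
    return "output.dat"

resources = [{"name": "hq-1", "run": flaky_run}]
acl = {"alice": "allowed"}
result, log = submit({"user": "alice"}, resources, acl)
```

The retry-and-account structure is the essential point: failure recovery and usage accounting are handled by the grid, not by each application.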

30 Tools (all are cross-platform)
- MPI
- P-space studies (multi-run)
- Parallel C++
- Parallel object-based Fortran
- CORBA binding
- Object migration
- Accounting
- legion_make (remote builds)
- Fault-tolerant MPI libraries
- Post-mortem debugger
- "Console" objects
- Parallel 2D file objects
- Collections
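"P-space studies (multi-run)" refers to parameter-space studies: the same application executed over many parameter combinations, with the grid farming the runs out to available hosts. A minimal local sketch, with a thread pool standing in for grid scheduling and a placeholder computation standing in for the application:

```python
# Illustrative parameter-space (P-space) study: run one simulation per
# point in a small parameter grid, in parallel, and collect results.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def simulate(params):
    """Stand-in for one application run; returns (params, result)."""
    temperature, pressure = params
    return params, temperature * pressure  # placeholder computation

# Cartesian product of parameter values = the points to evaluate.
grid = list(product([100, 200, 300], [1.0, 2.5]))
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(simulate, grid))  # one run per parameter point
```

On a real grid the runs would be independent jobs scheduled across machines; the embarrassingly parallel structure is what makes P-space studies such a natural grid workload.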

31 One Favorite

32 Related Work

33 Related Work
- Avaki
- All distributed systems literature
- Globus
- AFS/DFS
- LSF, PBS, ...
- Global Grid Forum (OGSA)

34 Avaki Company Background
- Grid pioneers: a Legion spin-off
- Over $20M capitalization
- The only commercial grid software provider with a solution that addresses data access, security, and compute power challenges
- Standards efforts leader
- Partners, standards organizations, customers

35 AFS/DFS Comparison with the Legion Data Grid
- AFS presumes that all files are kept in AFS; there is no federation with other file systems. Legion allows data to be kept in Legion, or in an NFS, XFS, PFS, or Samba file system.
- AFS presumes that all sites use Kerberos and that realms "trust" each other. Legion assumes nothing about the local authentication mechanism, and there is no need for cross-realm trust.
- AFS semantics are fixed (copy-on-open); Legion can support multiple semantics. The default is Unix semantics.
- AFS is volume-oriented (sub-trees); Legion can be volume-oriented or file-oriented.
- AFS caching semantics are not extensible; Legion caching semantics are extensible.
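The fixed-versus-pluggable semantics contrast above can be sketched as a strategy pattern: a file object delegates coherence decisions to a policy object, so copy-on-open, Unix semantics, or a custom rule can be swapped in per file. The classes and interfaces here are purely illustrative, not Legion's real design.

```python
# Pluggable file-coherence semantics as a strategy pattern.
class CopyOnOpen:
    """AFS-style: the snapshot taken at open time is what reads see."""
    def read(self, store, snapshot):
        return snapshot

class UnixSemantics:
    """Unix-style (Legion's default): every read sees the latest data."""
    def read(self, store, snapshot):
        return store["current"]

class GridFile:
    def __init__(self, store, policy):
        self.store, self.policy = store, policy
        self.snapshot = store["current"]  # captured at open time

    def read(self):
        return self.policy.read(self.store, self.snapshot)

store = {"current": b"v1"}
afs_like = GridFile(store, CopyOnOpen())
unix_like = GridFile(store, UnixSemantics())
store["current"] = b"v2"  # another client writes the file
# afs_like.read() still returns b"v1"; unix_like.read() returns b"v2"
```

Making the policy an object, rather than baking it into the file system, is what "extensible caching semantics" buys: new coherence rules need no changes to the file abstraction itself.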

36 Legion & Globus GT2
Projects with many common goals:
- Metacomputing (or the "Grid")
- Middleware for wide-area systems
- Heterogeneous resource sets
- Disjoint administrative domains
- High-performance, large-scale applications

37 Legion-Specific Goals
- Shared collaborative environment, including a shared file system
- Fault tolerance and high availability
- Both HPC applications and distributed applications
- Complete security model, including access control
- Extensible
- Integrated: create a meta-operating system

38 Many "Similar" Features
- Resource management support
- Message-passing libraries, e.g., MPI
- Distributed I/O facilities: Globus GASS/remote I/O vs. the Avaki Data Grid
- Security infrastructure

39 Globus: the "toolkit" approach
Provide services as separate libraries, e.g., Nexus, GASS, LDAP.
Pros:
- Decoupled architecture: easy to add new services into the mix
- Low buy-in: use only what you like! (In practice, all the pieces use each other.)
Cons:
- No unifying abstractions: a very complex environment to learn in full, and composition of services becomes difficult as the number of services grows
- Interfaces keep changing due to an ever-evolving design
- Does not cover the space of problems

40 Standards: GGF
Background: grid standards are now being developed at the Global Grid Forum (GGF).
- The in-development Open Grid Services Infrastructure (OGSI) standard will extend Web Services (SOAP/XML, WSDL, etc.) with:
  - Names and a two-level naming scheme
  - Factories and lifetime management
  - A mandatory set of interfaces, e.g., discovery interfaces
- OGSA (Open Grid Services Architecture): the over-arching architecture, still in development
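Two of the OGSI ideas named above, factories and lifetime management, fit in a small sketch: a factory creates named service instances, and each instance carries a lease that must be renewed or the instance is reclaimed. This models the concept only; it is not the actual OGSI interface.

```python
# Conceptual model of OGSI-style factories and lifetime management:
# instances are created with a bounded lifetime (a lease) and reclaimed
# once the lease expires without renewal.
import itertools
import time

class ServiceFactory:
    def __init__(self):
        self._counter = itertools.count()
        self._instances = {}  # handle -> lease expiry time

    def create(self, lifetime_s):
        """Create a service instance with a bounded lifetime."""
        handle = f"svc-{next(self._counter)}"
        self._instances[handle] = time.monotonic() + lifetime_s
        return handle

    def renew(self, handle, lifetime_s):
        """Extend an instance's lease before it expires."""
        self._instances[handle] = time.monotonic() + lifetime_s

    def sweep(self):
        """Reclaim instances whose leases have expired."""
        now = time.monotonic()
        expired = [h for h, t in self._instances.items() if t <= now]
        for h in expired:
            del self._instances[h]
        return expired

factory = ServiceFactory()
h = factory.create(lifetime_s=0.01)
time.sleep(0.05)
factory.sweep()  # h's lease has lapsed, so it is reclaimed
```

Bounded lifetimes are what make the scheme robust in a wide-area setting: if a client vanishes without cleaning up, its service instances expire on their own rather than leaking forever.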

41 Summary
- Grids are about resource federation and sharing.
- Grids are here today. They are being used in production computing in industry to solve real problems and provide real value: compute grids and data grids.
- We believe that users want high-level abstractions and don't want to think about the grid. That means low activation energy and legacy support.
- There are a number of challenges still to be solved, and different applications and organizations want to solve them differently:
  - Policy heterogeneity
  - Strong separation of policy and mechanism
- Several areas where really good policies are still lacking:
  - Scheduling
  - Security and security-policy interactions
  - Failure recovery (and the interaction of different policies)

