Published by Reynold Ryan. Modified over 9 years ago.
1
DRM/Computational Grids
Bill DeSalvo
April 14, 2004
2
Computational Grids
3
© Platform Computing Inc. 2003
Ian Foster’s Three-Point Grid Checklist
1. Coordinates resources not subject to centralized control
   - One or more (virtual) organizations
   - Geographic distribution of users/resources is common
2. Standard, open, general-purpose protocols and interfaces
3. Delivers nontrivial qualities of service
   - SLAs vs. policies vs. QoS
   - Translates business objectives into IT objectives
   - Enables effective utilization, resource aggregation, and remote access to specialized resources
Clusters are NOT grids! A cluster is a local-area, logical arrangement of independent entities that collectively provide a service.
4
Virtual Organizations
5
Grid “Ready-ness”
6
Evolution of the Grid
7
8
Everyone’s Aware of “The Grid”
9
Platform Grid Competencies
- Resource Leasing
- Job Forwarding
- Account Mapping
- Grid Fairshare Scheduling
- Advance Reservations
- User Authentication
- Reliable Data Transfer
Outgrowth of Platform’s experience in Grid and Distributed Computing
10
Platform MultiCluster
11
Three-Point Grid Checklist & Platform MultiCluster
1. Coordinates resources not subject to centralized control
   - ‘Single’ organization (“Enterprise Grid”)
   - Geographic distribution of users/resources is common
2. Proprietary protocols and interfaces
3. Delivers nontrivial qualities of service
   - SLAs vs. policies: common queues, advance reservation, resource leasing, fairshare, SLAs
   - Translates business objectives into IT objectives
   - Enables effective utilization, resource aggregation, and remote access to specialized resources
12
Why MultiCluster
Global Sharing, Local Ownership (“politics of the grid”)
Providing … while maintaining …
- Providing: Increased Capacity, Increased Capability, Increased Scalability (Growing Computational Needs)
- Maintaining: Local Autonomy
[Diagram: Dept A, Dept B, Dept C, Dept D]
13
Job Forwarding Model
“HPC Center” Configuration
- Enhanced transparency
  - FCFS guarantee, pending reason support, chunk jobs, host type/queue status aware scheduling, checkpoint/migration
[Diagram: Cluster A, Cluster B, Cluster C, HPC Center]
14
Job Forwarding Model
[Diagram: compute servers at Site A (send queue) and Site B (receive queue)]
You submit; we do:
- Job transfer
- Data staging
- Account mapping
- Accounting
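As a rough illustration of the account-mapping step in job forwarding, the sketch below rewrites a job's user when it moves from the submitting site to the receiving site. The mapping table, the `Job` fields and the `forward_job` helper are all hypothetical names made up for this example; they are not Platform LSF MultiCluster APIs.

```python
from dataclasses import dataclass

# Hypothetical mapping table: (submission site, local user) -> user at the receiving site.
ACCOUNT_MAP = {
    ("siteA", "alice"): "alice_b",
    ("siteA", "bob"): "guest",
}


@dataclass
class Job:
    job_id: int
    user: str
    command: str


def forward_job(job, from_site, account_map):
    """Return the job as the receiving site would run it, or None if no mapping exists."""
    remote_user = account_map.get((from_site, job.user))
    if remote_user is None:
        # Assumption for this sketch: an unmapped user's job is simply not forwarded.
        return None
    return Job(job.job_id, remote_user, job.command)


forwarded = forward_job(Job(1, "alice", "sim.sh"), "siteA", ACCOUNT_MAP)
```

Here `forwarded.user` is the mapped remote account, while the job ID and command travel unchanged.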
15
Resource Leasing Model
Accelerating Enterprise Grid Adoption
- Single system image, ease of administration, scalability
- Enable fairshare, preemption, pending reason support, chunk jobs, advance reservation, interactive jobs, parallel jobs, … across clusters
16
[Diagram: compute servers at Site A and Site B]
Configuration

    Begin Queue
    QUEUE = lease
    HOSTS = all@siteB
    End Queue

    Begin HostExport
    PER_HOST = hopper curie
    DISTRIBUTION = [siteA, 10]
    SLOTS = 10
    End HostExport
17
Common Resource Leases
[Three plots of utilization over time t]
- By Admin: lease 528 CPUs to Site A (Site B project completes)
- By Load: IF (load < threshold(X)) lease 528 CPUs to Site A, ELSE reclaim (Site B hits an extended low-utilization period, then goes up)
- By User Req: lease based on Advance Rsv req (Site B is always loaded)
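The load-based lease rule can be sketched in a few lines. This is only an illustration of the IF/ELSE rule on the slide: the function name, the 0.5 threshold and the load trace are assumptions, and only the 528-CPU figure comes from the slide.

```python
def cpus_to_lease(load, threshold, cpus=528):
    """Lease the CPUs while the owning site's load is below the threshold, else reclaim them."""
    return cpus if load < threshold else 0


# Hypothetical Site B load samples over time (fraction of capacity in use).
load_trace = [0.9, 0.3, 0.2, 0.8]
leases = [cpus_to_lease(load, threshold=0.5) for load in load_trace]
```

With this trace the CPUs are leased to Site A only during the two low-load samples and reclaimed once Site B's load rises again.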
18
Advance Reservation
[Diagram: nodes dedicated to User A for a time duration]
- Reserve nodes for exclusive access for user or user group
- Ensures critical work is done without interference
- Useful for benchmarking or system maintenance
- One-time and recurring reservation
- Administrator defines reservation for users
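A minimal sketch of what exclusive access during a reservation window means in practice, assuming a simple model in which times are integers and the end of a window is exclusive. The `Reservation` class and both helper functions are hypothetical and do not reflect how Platform LSF stores or checks reservations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    user: str
    hosts: frozenset
    start: int  # e.g. minutes since midnight
    end: int    # exclusive


def overlaps(a, b):
    """Two reservations conflict if they share a host and their time windows intersect."""
    return bool(a.hosts & b.hosts) and a.start < b.end and b.start < a.end


def may_use(user, host, t, reservations):
    """A host is usable at time t unless someone else holds a reservation covering it."""
    return all(
        r.user == user
        for r in reservations
        if host in r.hosts and r.start <= t < r.end
    )


r1 = Reservation("userA", frozenset({"n1", "n2"}), 60, 120)
r2 = Reservation("userB", frozenset({"n2"}), 100, 150)
```

In this model an administrator would reject `r2` because it overlaps `r1` on host `n2`, and other users are kept off `n1` and `n2` until `r1` ends.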
19
Use Cases
20
21
DoD HPCMP Grid
DoD HPCMP Challenge
- Initiative to share HPCMP’s resources easily & transparently: SMDC, TACOM, NRL, NAVO and WSMR, …
- Build a meta-queuing system to integrate the centers
Primary Benefit
- The capability to submit a job to a single, common queue, from which it is sent to the best available computer in the Grid
22
DoD HPCMO Solution
- DoD HPCMP Grid Requirement: Fire and Forget
  Solution: Platform LSF MultiCluster
  - Resource reservation protocol
  - Transparent job control
  - Accounting
- DoD HPCMP Grid Requirement: Full Kerberos 5 Support
  Solution: Client-server interactions Kerberized
  - Ticket forwarding/renewal
  - Multi-realm support
  - Account mapping
- DoD HPCMP Grid Requirement: Reliable, Secure File Transfer
  Solution: Platform FTA
  - Kerberized
  - Fault tolerant
23
DoD HPCMP Grid
[Diagram: sites linked by DREN]
- NAVO: SUN E10K, 64 PEs
- AEDC: Origin 2000, 64 PEs
- NRL: Origin 2000, 128 PEs
- TACOM/TARDEC: Onyx2, 32 PEs
- RTTC: Origin 2000, 32 PEs
- SMDC: Origin 2000, 64 PEs
- SSCSD: HP Superdome, 44 PEs
- AFFTC: Origin 3000, 64 PEs
- WSMR: Origin 2000, 64 PEs
GRID Challenges
- Logistics / Coordination
- People
- User Accounts
- Geographic locations
- Site configurations
- Time zones/schedules
- Network Security/Firewalls
- Intro of batch queuing systems to environments
- Training & skills transfer
24
SHARCNET
[Diagram label: External Grids/Portal]
25
SHARCNET
- The network is no longer ‘passive plumbing’
  - True resource that can be managed in real time, with guaranteed QoS
- Potential projects: -based resource leasing, advance reservation; IP-based topology awareness
- Enables new classes of Grid applications
- Operational results
- Real-time, remote visualization
- Virtual storage
- Persistent/pervasive
- On demand
26
The Globus Toolkit V2
27
Sharing pains…physical login
[Diagram: compute servers at Site A and Site B]
You have to
- Get and maintain multiple accounts
- Use different batch systems
- Do without consolidated accounting
- Move files manually
28
The Globus Toolkit™ Version 2 (GT2)
- A software toolkit that addresses key technical problems in the development of Grid-enabled tools, services, and applications
  - Offers a modular “bag of technologies”
  - Enables incremental development of grid-enabled tools and applications
  - Implements standard Grid protocols and APIs
  - Made available under liberal Open Source license
- Provided by The Globus Alliance
http://www.globus.org
29
Globus Toolkit: Evaluation (+)
- Good technical solutions for key problems, e.g.
  - Authentication and authorization
  - Resource discovery and monitoring
  - Reliable remote service invocation
  - High-performance remote data access
- This & good engineering is enabling progress
  - Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support
  - Growing community code base built on tools
30
Globus Toolkit: Evaluation (-)
- Protocol deficiencies, e.g.
  - Heterogeneous basis: HTTP, LDAP, FTP
  - No standard means of invocation, notification, error propagation, authorization, termination, …
- Significant missing functionality, e.g.
  - Databases, sensors, instruments, workflow, …
  - Virtualization of end systems (hosting envs.)
- Little work on total system properties, e.g.
  - Dependability, end-to-end QoS, …
  - Reasoning about system properties
  - Scalability
31
LSF MC & Globus
MC: Transparent, dynamic, intelligent, scalable inter-cluster sharing
- User does not need to know about clusters: total transparency
- MC dynamically chooses the “best cluster” to run the job
Globus
- User chooses which cluster to submit job to via Globus interface
- Static, non-intelligent sharing
- Lacks transparency
[Diagram: Cluster A, Cluster B, Cluster C; Globus; inter-cluster protocols]
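To make the contrast concrete, here is a small sketch of scheduler-side cluster selection, as opposed to the user naming a cluster. The slides do not say how MultiCluster ranks clusters, so the criterion used here (matching host type, enough free slots, then fewest pending jobs) is an assumption, and all names and data are hypothetical.

```python
def choose_cluster(clusters, slots_needed, host_type):
    """Pick an eligible cluster with the fewest pending jobs, or None if none qualifies."""
    candidates = [
        (info["pending_jobs"], name)
        for name, info in clusters.items()
        if info["host_type"] == host_type and info["free_slots"] >= slots_needed
    ]
    # Ties on pending jobs are broken by cluster name so the result is deterministic.
    return min(candidates)[1] if candidates else None


# Hypothetical snapshot of three clusters.
CLUSTERS = {
    "A": {"host_type": "linux", "free_slots": 4, "pending_jobs": 10},
    "B": {"host_type": "linux", "free_slots": 16, "pending_jobs": 2},
    "C": {"host_type": "irix", "free_slots": 64, "pending_jobs": 0},
}

best = choose_cluster(CLUSTERS, slots_needed=8, host_type="linux")
```

The user only states what the job needs; which cluster ends up running it is decided from the current state of all clusters.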
32
Globus Toolkit 3 (OGSA)
33
Prelude to OGSA: An Analogy
Every product an island unto itself
34
Prelude to OGSA: An Analogy
Differentiated products, integrated stack
35
36
Open Grid Services Architecture (OGSA)
- Next-generation architecture
  - Consequence of technology refresh (i.e., refactoring the Globus Toolkit) and research into Autonomic Computing
- Convergence of Grid Computing and Web Services
  - Globus Toolkit
    - Access services, e.g., CLIs, GUIs, portals and CoGs
    - Resource and allocation management
    - Monitoring and discovery services, e.g., sensing and indexing
    - Data management services, e.g., file transfer, replica management, etc.
    - Security, e.g., the Grid Security Infrastructure
  - Initially SOAP, WSDL and WS-Inspection
- The Global Grid Forum (GGF) serves as the standards authority
- Two layers
  - Core Grid platform: OGSA platform interfaces and models
  - Core Grid infrastructure: Open Grid Services Infrastructure (OGSI)
http://www.gridforum.org
http://www.globus.org/ogsa
37
Importance of OGSA to Customers
- Grid-enabled Web Services transforming IT
  - Analyst feedback (e.g., Gartner)
  - Customer experience
- Customers demand standards-compliant products, solutions and services. Why?
  - Vendors guilty of over-promising and under-delivering
  - Avoid single-vendor lock-in
    - Proprietary implementations based on open standards
  - Seek multi-vendor deliverables
    - Framework for partner collaboration
  - Demanding professionalism in software engineering
  - Seek to be engaged in the process
38
Platform Embraces Open Standards
- Platform developing software for over 11 years
  - Standards efforts are recent activities
  - Existing implementations are proprietary
- Platform is an NPi founder
  - NPi merged with GGF (4/02)
  - NPi being leveraged in OGSA
- Platform committed to open standards
  - Proprietary implementations based on open standards
- Platform experienced in Open Source arena
  - Offering Linux solutions for over 6 years
  - Offering Globus Toolkit solutions for about 2 years
  - Source-code available for components of Platform LSF
39
Platform and Globus
40
OGSI Compliance
What is it?
- Through GT3 and the CSF contribution, Platform LSF is now OGSI compliant
- This also implies that applications can access Platform LSF via Web Services APIs
Benefits
- Protects customer investment
- Provides a standardized approach for customers to access Platform LSF and interoperate Platform LSF with 3rd party systems, providing a best-of-breed solution to the market
Users: UCLA, UNC, NEC, etc.
41
OGSI Compliance
[Diagram labels: OGSI-compliant service interfaces …; CSF MetaScheduler; Globus Toolkit V3.0; Platform LSF V6.0 Connector (twice); Platform LSF V6.0 (twice)]
42
Platform Globus Toolkit
- CSF Plus
  - Advanced CSF-based metascheduler
  - Job persistence; enhanced scalability (6x GT 3)
  - Cluster load balancing and host type matching (LSF only)
- Globus Toolkit 3
- Community Scheduler Framework (CSF)
  - Round robin job scheduling
  - Advance reservation booking, query, & control
  - Reservation based scheduling
  - Job throttling for increased reliability
  - Connectors for 3rd party workload management systems (e.g., SGE, PBS)
  - Native command line interface support
- Platform Globus Toolkit
  - One step installation
[Legend: Open Source; Platform Enhancements]
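Round robin scheduling combined with job throttling can be sketched as follows. This is an illustrative model only, not CSF code: the function name, the `max_in_flight` parameter and the job and cluster labels are assumptions.

```python
from itertools import cycle


def round_robin_dispatch(jobs, clusters, max_in_flight):
    """Assign jobs to clusters in rotation, releasing at most max_in_flight jobs per pass."""
    if not clusters:
        raise ValueError("at least one cluster is required")
    rotation = cycle(clusters)
    dispatched = [(job, next(rotation)) for job in jobs[:max_in_flight]]
    held = jobs[max_in_flight:]  # throttled: these wait for a later pass
    return dispatched, held


dispatched, held = round_robin_dispatch(
    ["j1", "j2", "j3", "j4", "j5"], ["lsf", "pbs", "sge"], max_in_flight=4
)
```

Round robin ignores cluster load entirely, which is why load balancing and host type matching appear separately under CSF Plus.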
43
CSF
44
What is CSF?
- CSF (Community Scheduler Framework). Not a Platform product.
- Contributed as the industry’s first open source meta-scheduler enhancement to Globus Toolkit V3.X.
- Developed with the latest version of OGSI, the grid guideline being developed with the Global Grid Forum.
- Open source “meta-scheduler” framework
  - Provides basic protocols and interfaces to help resources work together in heterogeneous environments
  - Enables global access and maintains local control of resources
45
Key Benefits of OGSA Compliance
- Future-proof & protect grid investment using standards-based solutions
- Standardized approach to access Platform LSF
- Interoperate with 3rd party systems
46
Metaschedulers
- Scheduler that coordinates communication between heterogeneous schedulers that operate at a local level
- Enables global access and coordination while maintaining local control and ownership of resources
- Future: possible to schedule not only workload execution but also storage, network bandwidth, etc.
47
CSF Grid Services
- Job Service: creates, monitors and controls compute jobs
- Reservation Service: guarantees resources are available for running a job
- Queueing Service: provides a service where administrators can customize and define scheduling policies at the VO level and/or at the different resource manager level
- RM Adaptor Service: provides a Grid service interface that bridges the Grid service protocol and resource managers (LSF, PBS, SGE, Condor and other RMs)
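The division of labour between the Job Service and the RM Adaptor Service can be pictured with a small interface sketch. Everything below is hypothetical: the class and method names are invented for illustration, CSF itself is implemented as OGSI Grid services rather than Python classes, and the in-memory adapter merely stands in for a real resource manager.

```python
from abc import ABC, abstractmethod


class RMAdapter(ABC):
    """Common interface that hides the differences between resource managers."""

    @abstractmethod
    def submit(self, command: str) -> str: ...

    @abstractmethod
    def status(self, job_id: str) -> str: ...


class InMemoryAdapter(RMAdapter):
    """Stand-in for an LSF, PBS or SGE adapter; keeps job state in a dict."""

    def __init__(self, name):
        self.name = name
        self._jobs = {}

    def submit(self, command):
        job_id = f"{self.name}-{len(self._jobs) + 1}"
        self._jobs[job_id] = "PENDING"
        return job_id

    def status(self, job_id):
        return self._jobs.get(job_id, "UNKNOWN")


class JobService:
    """Creates and monitors jobs through whichever adapter the caller names."""

    def __init__(self, adapters):
        self.adapters = adapters

    def create(self, rm, command):
        return self.adapters[rm].submit(command)

    def monitor(self, rm, job_id):
        return self.adapters[rm].status(job_id)


service = JobService({"lsf": InMemoryAdapter("lsf"), "pbs": InMemoryAdapter("pbs")})
job_id = service.create("lsf", "run_sim.sh")
```

The point of the adapter layer is that the Job Service code does not change when another resource manager is added; only a new adapter is registered.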
48
CSF: Out of The Box
[Diagram labels: Grid Service Hosting Environment; MetaScheduler (Job Service, Reservation Service, Queuing Service); Globus Information Service; RM Adapter for LSF, GRAM for PBS, GRAM for SGE; Provider for LSF, RIPS for PBS, RIPS for SGE; LSF, PBS, SGE. Legend: Platform; Open Source; GT3.0 Existing Services]
49
CSF Architecture
[Diagram labels: Platform LSF User; Globus Toolkit User; LSF Meta-scheduler Plugin; Grid Service Hosting Environment; Meta-Scheduler (Job Service, Reservation Service, Queuing Service); Global Information Service; RIPS and GRAM for SGE; RIPS and GRAM for PBS; RIPS and RM Adapter for Platform LSF; Third Party Workload Management System]
RIPS = Resource Information Provider Services
GRAM = Grid Resource & Allocation Management
50
[Chart: Profile (High to Low) against Awareness/Knowledge, Liking/Preference/Conviction, Commitment; plotted: Grid Canada, OMII]
51
What are the Multi-Domain Tools and What Do They Do?
Platform MultiCluster
- Enables global access and coordination while maintaining local control and ownership of resources
- Joins geographically dispersed clusters
- Production quality solution to build enterprise grids
- Platform proprietary solution that is standards-based & OGSA compliant
Globus Toolkit
- Tools to join geographically dispersed clusters
- A bunch of “bricks” to build grids (that’s why it’s called a toolkit)
- Users have to specify which cluster they would like their job to be sent to: not transparent
- Open source solution
- Platform adds commercial support: documentation, training, tech support, professional services
52
Key Components of Platform MultiCluster
53
Data Grids
54
55
Data Grid Spectrum
[Chart: update frequency (No Updates, Periodic Updates, Frequent Updates) against sharing scope (User private, Intra-project sharing, Inter-project sharing)]
- Grids plotted: GOV/EDU Grid, Life Sciences Grid, Auto Grid, HEP Grid, Aero Grid, EDA Grid
- Capabilities plotted: Partial replication; Efficient & reliable file transfer; Intelligent transfer; Workload-directed caching; Cache-aware scheduling; Data pipeline; Efficient data sync; Intelligent data scheduling
56
Data Grid Spectrum
[Chart: update frequency (No Updates, Periodic Updates, Frequent Updates) against sharing scope (User private, Intra-project sharing, Inter-project sharing)]
- Technologies plotted: GridFTP, Replica Catalog, FTA, DataGrid
57
Summary
58
Summary
- OGSA applies to e-Science and e-Business
  - Rich architectural framework
  - Existing, emerging and planned specifications
    - Ultimately resulting in Open Standards
  - Existing, emerging and planned implementations
- The Community Scheduler Framework
  - Standards-based
  - Choice of implementations
  - Ushers existing grids towards OGSA compliance
  - Spectrum of potential use cases
59
Thank you.